Google DeepMind Launches Gemini Omni: Video Generation and Editing Using Natural Language Is Here

May 20, 2026 Daniel Cesak

Google DeepMind introduced a completely new model called Gemini Omni at the Google I/O 2026 conference. It's not just another text chatbot: it's a multimodal generative system that can create and edit video through natural dialogue, much as if you were instructing an experienced video editor. The first version, Gemini Omni Flash, launches on May 20, 2026, and promises that shooting and editing video will never be the same again.

What Is Gemini Omni and How It Differs from Veo

Gemini Omni is an independent branch of the Gemini model family, built on combining reasoning and generation capabilities. While previous models like Veo 3.1 were "just" for creating video from a text prompt, Omni goes a step further: you can break the work on a video into multiple steps, entering a new prompt at each stage, and the model remembers the full context of previous edits.

Google states that Gemini Omni understands physics (gravity, kinetic energy, fluid dynamics) and can apply these principles to generated scenes. At the same time, it draws on extensive knowledge of history, science, and culture, so the resulting video is meant to be not just visually impressive but also logically coherent. Compared with competitors such as OpenAI Sora and Runway Gen-4, the advantage Omni offers is precisely this combination: generation, editing, and reasoning in a single model.

Video Editing Through Conversation: Like Talking to an Editor

The core of Omni lies in natural video editing. The model maintains consistency of characters, environments, and physical properties across individual prompts. Official demos show how the model transfers a character from one environment to another, changes the camera angle, or makes a specific object disappear — all without losing scene coherence.

A practical example from the official presentation: you upload a video of a violinist playing in a room to Gemini Omni. You type "move the violinist to the environment in this reference photo". The model does it. Then you add "change the camera angle to an over-the-shoulder view of the violinist" and the model adjusts the scene without losing context.
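The multi-step flow in this example can be pictured as a session that carries its own history. The following is only a minimal sketch in Python: it does not call any Google API (the developer API is announced for later), and the class name EditSession, its fields, and the file names are hypothetical, chosen purely for illustration. It shows one way an editing conversation could bundle every earlier instruction with each new prompt, so that later edits are interpreted in the context of earlier ones.

```python
from dataclasses import dataclass, field


@dataclass
class EditSession:
    """Hypothetical local model of a multi-step video edit conversation."""

    source_video: str
    turns: list[dict] = field(default_factory=list)

    def add_turn(self, prompt: str, references: tuple[str, ...] = ()) -> dict:
        """Record a new instruction and return the request it would produce.

        The request bundles all earlier turns; this is how the sketch
        represents the model "remembering" previous edits.
        """
        request = {
            "source_video": self.source_video,
            "history": [dict(turn) for turn in self.turns],
            "prompt": prompt,
            "references": list(references),
        }
        self.turns.append({"prompt": prompt, "references": list(references)})
        return request


# The two instructions from the violinist example, in order.
session = EditSession(source_video="violinist.mp4")  # hypothetical file name
first = session.add_turn(
    "move the violinist to the environment in this reference photo",
    references=("reference_photo.jpg",),  # hypothetical file name
)
second = session.add_turn(
    "change the camera angle to an over-the-shoulder view of the violinist"
)

# The second request carries the first instruction as context.
print(len(first["history"]), len(second["history"]))
```

In this sketch the second request contains the first instruction in its history, which mirrors the behavior described above: the camera change is applied to the already relocated violinist, not to the original clip.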

The model can replace any object in a video with a simple description ("swap the spaceship for a car"), change visual style ("convert the scene to a retro-futuristic style"), or add sound effects synchronized with the action on screen.

Reference Inputs: Combine Anything with Anything

One of the most powerful features of Gemini Omni is working with reference inputs. You can upload an image, video, audio, or text and the model combines them into a single consistent output. For example:

Upload a photo of a character and a video with a specific motion — the model transfers the motion to your character
Upload a sketch and the model converts it into a realistic video, using the drawing exclusively as a motion guide
Combine a reference environment, character, and music track to create a complete scene

This capability primarily targets creative professionals who want to quickly prototype visual ideas, but also everyday users who want to bring their photos or drawings to life.

Educational Potential: From Proteins to the Alphabet

Google explicitly demonstrates that Omni can create educational content. In one demo, the model generates a protein folding animation in the style of a stop-motion clay animation, made without human hands and presented as scientifically accurate. In another, it creates an alphabet with 26 items (capybara for C, disco ball for D, lava lamp for L), where each letter gets its own shot with a corresponding caption.

This opens doors for educational video creators, teachers, and science communicators who can generate illustrative animations of complex concepts in minutes.

Digital Avatars: Video with Your Face and Voice

A more controversial yet fascinating feature is the creation of digital avatars. Users can create a digital copy of themselves that looks and speaks like them. This feature is being tested in pilot mode on YouTube Shorts, and Google says it is approaching it cautiously: the voice and dialogue editing features are undergoing further testing.

SynthID and Transparency: Every Video Will Be Labeled

All videos created with Gemini Omni will contain a SynthID digital watermark and metadata according to the C2PA Content Credentials standard. The origin of the video will be verifiable directly in the Gemini app, in the Chrome browser, or in Google Search. Google is building on its strategy of AI content transparency — the SynthID technology was recently adopted by OpenAI for its models as well.

Availability and Pricing

Gemini Omni Flash is available starting May 20, 2026:

For Google AI Plus, Pro, and Ultra subscribers — globally, including the Czech Republic, through the Gemini app and Google Flow studio
YouTube Shorts and YouTube Create — free for all users, rollout begins this week
API for developers and businesses — planned for June 2026

For Czech users, the important point is that both the Gemini app and Google Flow support Czech as an input language for text prompts, so you can instruct the model in your native language. The Czech localization of the Gemini interface is fully available. Subscription prices start at CZK 549 per month (Google One AI Premium).

Google further announces that outputs in the form of images and audio will be added in the future, making Omni a truly universal generative model.

What This Means for Creators and Businesses in the Czech Republic

For Czech creators, marketing agencies, and educational institutions, Gemini Omni is potentially a breakthrough tool: for the first time, a model is available that not only generates video but also lets you edit it smoothly in conversational mode, without the need for expensive software or advanced technical knowledge.

European regulation (EU AI Act) does impose stricter transparency requirements on generative AI, but Google is meeting these requirements head-on with SynthID and C2PA technology. For Czech companies deploying AI into creative workflows, this means the tool is designed from the ground up with an emphasis on transparency and safety — reducing regulatory risks during adoption.

Main Competitors: Sora, Veo, Runway

Gemini Omni enters a field where others already operate:

OpenAI Sora — text-to-video generation, limited editing, priced within ChatGPT Plus ($20/month)
Runway Gen-4 — professional video generation and editing tool, starting at $15/month
Google Veo 3.1 — Omni's predecessor, primarily focused on generation, without advanced editing capabilities

What sets Omni apart is the combination of reasoning abilities with generation. While Sora and Runway generate video from a text description and offer only limited subsequent editing, Omni remembers the entire conversation history, maintains scene consistency, and applies physical and scientific knowledge.

Is Gemini Omni available for free?

Partially, yes. Within YouTube Shorts and YouTube Create, Gemini Omni Flash is available for free to all users. For full-featured use in the Gemini app and Google Flow, you need a Google One AI Premium subscription (in the Czech Republic starting at CZK 549/month) or a higher Pro or Ultra plan.

Does Gemini Omni support the Czech language?

Yes. Both the Gemini app and Google Flow support Czech for entering text prompts. The Czech Gemini interface is fully localized. The model itself understands prompts in Czech, so you don't need to know English for video generation and editing.

How can I tell if a video was created by AI?

All videos from Gemini Omni contain an invisible SynthID digital watermark and C2PA metadata. You can verify the video's origin directly in the Gemini app, in the Chrome browser, or in Google Search. Google is also expanding tools for verifying AI content across the web.

When will Gemini Omni be available via API for businesses?

Google states that API access for developers and enterprises will launch within a few weeks of the announcement, which means roughly June 2026.