Google DeepMind Launches Gemini Omni: Video Generation and Editing Using Natural Language Is Here

May 20, 2026 Daniel Cesak

Google DeepMind introduced a new model called Gemini Omni at the Google I/O 2026 conference. It is not just another text chatbot: it is a multimodal generative system that creates and edits video through natural dialogue, much as you would instruct an experienced video editor. The first version, Gemini Omni Flash, launches on May 20, 2026, and promises to change how video is shot and edited.

What Gemini Omni is and how it differs from Veo

Gemini Omni is a separate branch of the Gemini model family, built on a combination of reasoning and generation capabilities. While previous models like Veo 3.1 served "merely" to create video from a text prompt, Omni goes a step further: you can split the work on a video into multiple steps, enter a new prompt at each stage, and the model remembers the entire context of previous edits.

Google states that Gemini Omni understands physics — gravity, kinetic energy, fluid dynamics — and can apply these principles to generated scenes. At the same time, it draws on extensive knowledge of history, science and culture, so the resulting video should be not only visually impressive but also logically meaningful. In direct comparison with competitors (OpenAI Sora, Runway Gen-4), Omni offers precisely this combined advantage: generation + editing + reasoning in a single model.

Video editing through conversation: as if you were talking to an editor

The core of Omni lies in natural video editing. The model maintains consistency of characters, environments and physical properties across individual prompts. Official demos show how the model transfers a character from one environment to another, changes the camera angle, or makes a specific object disappear — all without losing scene coherence.

A practical example from the official presentation: you upload a video of a violinist playing in a room to Gemini Omni and type "move the violinist into the setting from this reference photo". The model does it. Then you add "change the camera angle to a view from behind the violinist's shoulder", and the model edits the scene without losing context.

The model can replace any object in the video with a simple description ("swap the spaceship for a car"), change the visual style ("convert the scene to a retro-futuristic style"), or add sound effects synchronized with on-screen action.
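The step-by-step editing described above can be pictured as a session that keeps every instruction in order. The following is only an illustrative sketch: `EditTurn`, `EditSession`, and the file names are hypothetical, and nothing here reflects Google's actual API, which had not been released at the time of writing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EditTurn:
    """One instruction in the conversation (hypothetical structure)."""
    prompt: str
    reference_files: list[str] = field(default_factory=list)


@dataclass
class EditSession:
    """Keeps the source video and every edit instruction in order."""
    source_video: str
    turns: list[EditTurn] = field(default_factory=list)

    def add_turn(self, prompt: str, reference_files: list[str] | None = None) -> None:
        # Each new prompt is appended, so earlier edits are never lost.
        self.turns.append(EditTurn(prompt, list(reference_files or [])))

    def context(self) -> list[str]:
        """Return the full instruction history a new edit would build on."""
        return [turn.prompt for turn in self.turns]


# The violinist example from the presentation, replayed as two turns.
session = EditSession(source_video="violinist.mp4")
session.add_turn(
    "move the violinist into the setting from this reference photo",
    ["setting.jpg"],
)
session.add_turn(
    "change the camera angle to a view from behind the violinist's shoulder"
)
print(session.context())
```

The point of the sketch is that the second prompt never repeats the first: the session carries the history, which is what lets a short follow-up like "change the camera angle" stay unambiguous.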

Reference inputs: combine anything with anything

One of the most powerful features of Gemini Omni is working with reference inputs. You can give the model images, video, audio, and text, and it combines them into one consistent output. For example:

Upload a photo of a character and a video with a specific movement — the model transfers the movement onto your character
Upload a sketch and the model converts it into a realistic video, with the drawing serving purely as a motion guide
Combine a reference environment, character, and music track to create a complete scene
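Before combining references like these, a creator's own tooling might first sort the files by media type. The sketch below does that with Python's standard `mimetypes` module; the `classify_references` helper and the file names are hypothetical, and the set of supported kinds is simply the four input types named in this article, not a published specification.

```python
import mimetypes

# Media kinds the article says can be combined as references.
SUPPORTED_KINDS = {"image", "video", "audio", "text"}


def classify_references(paths):
    """Group reference files by media kind, guessed from the file extension."""
    grouped = {}
    for path in paths:
        mime, _ = mimetypes.guess_type(path)
        # "image/jpeg" -> "image"; unrecognized extensions become "unknown".
        kind = mime.split("/")[0] if mime else "unknown"
        if kind not in SUPPORTED_KINDS:
            raise ValueError(f"unsupported reference: {path}")
        grouped.setdefault(kind, []).append(path)
    return grouped


# A character photo, a movement clip, and a music track.
refs = classify_references(["character.jpg", "dance.mp4", "theme.mp3"])
print(refs)
```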

This capability primarily targets creative professionals who want to quickly prototype visual ideas, but also regular users who want to bring their photos or drawings to life.

Educational potential: from proteins to the alphabet

Google explicitly demonstrates that Omni can create educational content. In one demo, the model generates an animation of protein folding in the style of stop-motion clay animation, with no human hands in the frame and, according to Google, scientifically accurate. In another, it creates an illustrated alphabet of 26 items (capybara for C, discus for D, lava lamp for L), where each letter gets its own shot with a matching caption.

This opens doors for educational video creators, teachers, and science communicators, who can generate illustrative animations of complex concepts within minutes.

Digital avatars: video with your face and voice

A more controversial yet fascinating feature is the creation of digital avatars. Users can create a digital copy of themselves that looks and talks like them. This feature is being piloted on YouTube Shorts, and Google says it is approaching it cautiously: the voice and dialogue editing feature is still undergoing further testing.

SynthID and transparency: every video will be labeled

All videos created using Gemini Omni will contain a SynthID digital watermark and metadata according to the C2PA Content Credentials standard. The video's origin will be verifiable directly in the Gemini app, in the Chrome browser, or in Google Search. Google is thus building on its AI content transparency strategy — the SynthID technology was recently adopted by OpenAI for its models as well.

Availability and pricing

Gemini Omni Flash is available starting May 20, 2026:

For Google AI Plus, Pro, and Ultra subscribers — globally, including the Czech Republic, through the Gemini app and Google Flow studio
YouTube Shorts and YouTube Create — free for all users, rollout begins this week
API for developers and businesses — planned for June 2026

For Czech users, the key point is that both the Gemini app and Google Flow support Czech as an input language for text prompts, so you can instruct the model in your native language. The Gemini interface is fully localized into Czech. Subscription prices start at 549 CZK per month (Google One AI Premium).

Google also says that image and audio outputs will be added in the future, making Omni a truly universal generative model.

What this means for creators and businesses in Czechia

For Czech creators, marketing agencies, and educational institutions, Gemini Omni represents a potentially game-changing tool — for the first time, a model is available that not only generates video but allows it to be seamlessly edited in conversational mode, without the need for expensive software or advanced technical knowledge.

Although European regulation (the EU AI Act) imposes stricter transparency requirements on generative AI, Google is proactively meeting these requirements with SynthID and C2PA technology. For Czech companies deploying AI into creative workflows, this means the tool is designed from the ground up with an emphasis on transparency and safety — reducing regulatory risks during adoption.

Main competitors: Sora, Veo, Runway

Gemini Omni enters a field where the following already operate:

OpenAI Sora — text-to-video generation, limited editing, priced within ChatGPT Plus ($20/month)
Runway Gen-4 — professional video generation and editing tool, starting at $15/month
Google Veo 3.1 — Omni's predecessor, primarily focused on generation, without advanced editing capabilities

The key differentiator of Omni is the combination of reasoning capabilities with generation. While Sora or Runway generate video based on a text description with limited subsequent editing, Omni remembers the entire conversation history, maintains scene consistency, and applies physical and scientific knowledge.

Is Gemini Omni available for free?

Partly. Within YouTube Shorts and YouTube Create, Gemini Omni Flash is free for all users. For full use in the Gemini app and Google Flow, you need a Google One AI Premium subscription (in Czechia from 549 CZK/month) or a higher Pro or Ultra plan.

Does Gemini Omni support the Czech language?

Yes. Both the Gemini app and Google Flow support Czech for entering text prompts, and the Gemini interface is fully localized into Czech. The model itself understands Czech input, so you don't need to know English for video generation and editing.

How can I tell a video was created by AI?

All Gemini Omni videos contain an invisible SynthID digital watermark and C2PA metadata. You can verify the video's origin directly in the Gemini app, in the Chrome browser, or in Google Search. Google is also expanding AI content verification tools across the web.

When will Gemini Omni be available via API for businesses?

Google states that API access for developers and enterprises will launch within a few weeks of the announcement, roughly during June 2026.