Google OMNI: What We Know About the Leaked Video, Image & Audio Model

May 11, 2026 Daniel Cesak

Days before Google I/O 2026, the first screenshots leaked from the Gemini app revealed something many AI enthusiasts had been waiting for: a model codenamed Google OMNI. Rather than another incremental update to existing tools, it appears to be a new type of model that combines video, image, and audio generation in a single unified system. What do we know about it, how does it differ from Veo, and why could it change the game for short-form video?

What is Google OMNI?

Google OMNI (also referred to as Gemini Omni Video Generator) is a unified multimodal model that can generate video, static images, and synchronized audio in a single pass. While creators previously had to combine Veo (video), a separate image generator, and an external audio tool, OMNI brings everything together in one prompt.

The model was first spotted in the Gemini app UI by TestingCatalog in early May 2026. Screenshots show a revamped video editor directly within Gemini, capable of generating clips of 5, 8, or 10 seconds in 16:9, 9:16, and 1:1 aspect ratios.

What Capabilities Does OMNI Offer?

According to the leaked interface and subsequent analysis by sites like geminiomni.org, OMNI is expected to support:

Text-to-Video — creating video from a text prompt
Image-to-Video — animating a static image
Reference Video/Audio — uploading a sample video or audio as reference
Automatic Audio Generation — the audio track is created together with the video, including beats, ambient sounds, and dialogue
Template Support — pre-built scenarios for product videos, explainers, and social media

One generation cycle reportedly takes about 30 to 90 seconds, depending on length and resolution. If accurate, that would be considerably faster than workflows that require separate generation steps followed by manual editing.

OMNI vs. Veo — What's the Difference?

Veo is Google's existing video generation model. Its latest version, Veo 3, can create minute-long clips at up to 4K resolution and already generates native audio alongside the video, but it remains a dedicated video model: still images come from a separate model such as Imagen.

OMNI, by contrast, represents a truly unified approach — one model, one prompt, one result containing video, images, and audio. Simply put: Veo is a specialized tool, OMNI is all-in-one.

Another key difference is that OMNI can work with reference video or audio: creators can upload a style sample and the model adapts to it. Veo has not offered this to the same extent.

Timing — UI Leak Ahead of I/O 2026

The leak comes just days before Google I/O 2026, which takes place on May 19–20, 2026 in Mountain View. The timing is unlikely to be a coincidence: Google appears to be testing the model in its final stages ahead of an official unveiling at the conference.

Speculation suggests OMNI could be part of a broader Gemini 3.1 or even Gemini 3.2 launch, although official confirmation is still lacking. Screenshots from TestingCatalog show a fully functional user interface, suggesting the model is in an advanced stage of development.

Competitive Landscape — Seedance, Sora, and the Chinese Onslaught

OMNI is not being developed in a vacuum. Competition in AI video generation is exceptionally fierce in 2026:

ByteDance Seedance 2.0 — a Chinese competitor that also combines video and audio, available since early 2026
OpenAI Sora — its second generation, Sora 2, also generates synchronized dialogue and sound effects alongside video
Alibaba's model — as of May 2026, reportedly at the top of global benchmarks for realism and motion smoothness
Kling and others — Chinese models rapidly catching up to Western competition

OMNI looks like Google's direct response to Seedance 2.0 and a signal that it does not want to fall behind on unified models.

Availability, Pricing, and Formats

Official pricing has not been announced yet, but speculation points to a credit system similar to Veo or Imagen. According to leaked data from the Gemini interface, pricing could be:

Pro quality (480p): 65 credits/second
Pro quality (720p): 135 credits/second
Fast mode (480p): 50 credits/second
Fast mode (720p): 110 credits/second

Supported resolutions: 480p, 720p, and likely 1080p. Clip lengths: 5, 8, and 10 seconds, ideal for YouTube Shorts, TikTok, Instagram Reels, and product videos.
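To see what the leaked rates would mean per clip, here is a minimal Python sketch. It assumes that cost scales linearly with clip length (credits per second × seconds) and uses only the leaked, unconfirmed figures above; the function name `clip_cost` and the "pro"/"fast" labels are hypothetical and not part of any Google API.

```python
# Leaked, unconfirmed per-second rates (credits/second) from the Gemini UI.
RATES = {
    ("pro", "480p"): 65,
    ("pro", "720p"): 135,
    ("fast", "480p"): 50,
    ("fast", "720p"): 110,
}

# Clip lengths seen in the leaked editor, in seconds.
ALLOWED_DURATIONS = (5, 8, 10)


def clip_cost(mode: str, resolution: str, seconds: int) -> int:
    """Estimate credits for one clip (hypothetical helper).

    Assumes cost is simply rate * length; real billing may differ.
    """
    if seconds not in ALLOWED_DURATIONS:
        raise ValueError(f"unsupported clip length: {seconds}s")
    try:
        rate = RATES[(mode, resolution)]
    except KeyError:
        # No rate leaked for this combination (e.g. 1080p), so refuse to guess.
        raise ValueError(f"no leaked rate for {mode} at {resolution}") from None
    return rate * seconds


if __name__ == "__main__":
    # Print a small cost table: one row per mode/resolution pair.
    for mode, res in RATES:
        costs = [clip_cost(mode, res, s) for s in ALLOWED_DURATIONS]
        cells = ", ".join(f"{s}s={c}" for s, c in zip(ALLOWED_DURATIONS, costs))
        print(f"{mode:>4} {res}: {cells}")
```

Under these assumptions, a 10-second Pro clip at 720p would cost 1,350 credits, while a 10-second Fast clip at 480p would cost 500. No 1080p rate appeared in the leak, so the sketch rejects that resolution instead of extrapolating one.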

Impact on the Czech and European Market

For European users, it matters that OMNI is expected to run on Google Cloud infrastructure, which has a strong presence in the EU. Gemini already handles Czech well, and OMNI is expected to understand Czech too, including prompts written in natural language.

Under the EU AI Act, Google will need to ensure regulatory compliance, which for video-generating models means disclosing deepfake content and marking synthetic output in a machine-readable way. This should not be a hurdle: Google already watermarks Veo and Imagen output with SynthID.

For Czech creators, marketers, and small business owners, this means one thing: professional-looking video with audio in a matter of tens of seconds, without the need to edit, fine-tune audio tracks, or combine three different tools.

Conclusion — What to Expect at I/O 2026?

Google OMNI could be the most significant generative video announcement since the launch of Veo. If Google does unveil a unified model at I/O 2026 that handles video, images, and audio from a single prompt, it could significantly reshape short-form video on social media, e-commerce, and advertising.

The official announcement is expected in less than 10 days — and if the leaks prove accurate, it will be one of the strongest moments of the entire conference.

Will Google OMNI be available for free?

Probably not entirely free. Google typically offers a limited free tier with watermarks and low resolution, while full quality would likely require a Google AI Pro subscription (formerly Google One AI Premium, roughly €20/month) or a separate credit package.

Will OMNI completely replace Veo?

It doesn't look that way. Veo 3 can create longer clips (up to a minute) and higher resolution (4K). OMNI focuses on short formats (5–10s) and versatility. Coexistence is more likely: Veo for demanding projects, OMNI for quick social media and product videos.

How well does OMNI handle Czech?

Since OMNI is built on Gemini's language foundation, it should understand Czech as well as the text-based Gemini. A prompt in Czech should work — including descriptions of scenes, mood, and sound effects. We'll test it once the model is officially launched.