For decades, spoken machine translation has worked "turn by turn": one person speaks, the system waits, processes the text, and then reads the result aloud in another language. This process creates unnatural pauses that break the rhythm of a conversation. With the Gemini 3.5 Live Translate model, Google aims to remove this barrier, offering technology that behaves more like a human simultaneous interpreter than conventional translation software.
Technological Breakthrough: How Simultaneous Processing Works
The main difference from previous generations is the way the model handles the data stream. Gemini 3.5 Live Translate does not follow the traditional pipeline (speech → text → translation → speech), whose sequential steps add delay. Instead, it processes audio in real time while the speaker is still talking. The model must constantly balance two competing demands: gathering enough context for an accurate translation and producing output quickly enough to stay synchronized with the speaker.
The result is a latency of only a few seconds, which allows a dialogue that feels organic. The model can also preserve emotional tone. If the speaker conveys enthusiasm or seriousness through intonation, the model attempts to reproduce these nuances in the translation, a crucial difference from the robotic voices that have been the standard until now.
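The trade-off between context and immediacy described above can be illustrated with a wait-k policy, a scheduling rule from the research literature on simultaneous translation: the system reads k source tokens before producing its first output token and then stays k tokens behind the speaker. This is a toy sketch and not a description of how Gemini 3.5 Live Translate works internally; the function name and the token counts are hypothetical.

```python
from typing import List


def wait_k_context(source_len: int, target_len: int, k: int) -> List[int]:
    """Return, for each output token, how many source tokens have been heard.

    Under a wait-k policy the t-th output token (0-indexed) is emitted after
    k + t source tokens have arrived, capped at the full source length.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [min(k + t, source_len) for t in range(target_len)]


if __name__ == "__main__":
    # Hypothetical 6-token utterance translated into 6 output tokens.
    for k in (1, 3, 6):
        context = wait_k_context(source_len=6, target_len=6, k=k)
        # The first entry is the delay (in tokens) before any output starts.
        print(f"k={k}: first output after {context[0]} tokens, context={context}")
```

When k equals the length of the utterance, the schedule degenerates into turn-by-turn translation: nothing is produced until the speaker has finished. Smaller values of k reduce the delay but give each output token less context to work with.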
Comparison with the Competition: Gemini vs. GPT-4o and ElevenLabs
In the field of multimodal models, Google's biggest rival today is OpenAI with its GPT-4o model. While GPT-4o excels in interactive voice mode, where the AI acts as a conversation partner, Gemini 3.5 Live Translate focuses specifically on translation fidelity and on integration into existing ecosystems such as Google Meet. Here Google is betting on the breadth of its distribution channels.
When it comes to the quality of synthetic voice, Gemini now competes directly with specialized services such as ElevenLabs. While ElevenLabs leads in generating voices from text (text-to-speech), Google is trying to dominate the speech-to-speech segment, where the goal is not only to create a pleasant voice but also to preserve the identity of the original speaker during translation. According to benchmark tests of the model's new capabilities, Gemini 3.5 shows significantly lower latency in complex, multilingual conversations than older versions of Gemini Pro.
Practical Use: From Travel to Global Business
The deployment possibilities are broad, and Google has already begun implementing them into its key products:
- Google Meet for businesses: For companies, this means the end of language barriers in video conferences. Google plans to expand the supported language pairs to more than 2,000 combinations, enabling smooth meetings even in less common languages.
- Google Translate for travelers: The mobile app gets a new "speaker listening mode." Users in noisy environments (e.g., at an airport or in a restaurant) can receive translations discreetly through the phone's speaker without needing to wear headphones.
- Developers and API: Through the Gemini Live API, developers can integrate these capabilities into their own applications, for example into systems for online education or real-time customer support.
One of the first major testers is the Grab platform, which uses the model for communication between drivers and passengers, an indication that the model remains stable even in demanding, noisy conditions.
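For developers, streaming speech to a real-time service generally means cutting the microphone signal into small frames and sending them as they arrive, rather than uploading a finished recording. The sketch below shows only that framing step. The audio format (16 kHz, 16-bit mono PCM) and the 20 ms frame length are assumptions chosen for illustration, not documented requirements of the Gemini Live API, and the helper names are hypothetical.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # assumed input sample rate
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM
FRAME_MS = 20            # assumed frame length in milliseconds


def frame_size_bytes(
    sample_rate_hz: int = SAMPLE_RATE_HZ,
    frame_ms: int = FRAME_MS,
    bytes_per_sample: int = BYTES_PER_SAMPLE,
) -> int:
    """Number of bytes in one audio frame of the given duration."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * bytes_per_sample


def iter_frames(pcm: bytes, frame_bytes: int) -> Iterator[bytes]:
    """Yield consecutive frames; the last one may be shorter."""
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


if __name__ == "__main__":
    one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    frames = list(iter_frames(one_second_of_silence, frame_size_bytes()))
    print(len(frames), "frames of", frame_size_bytes(), "bytes")
```

Shorter frames reduce the delay before the service hears new audio but increase the number of messages sent, which mirrors the latency trade-off the model itself has to manage.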
Availability in the Czech Republic and European Context
For Czech users, the key question is Czech language support. The Gemini 3.5 Live Translate model supports over 70 languages. Given that Czech is among the major European languages and has long been a priority in Google's services, full integration can be expected both in the Google Translate app and in Workspace (Meet).
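The two figures quoted in this article, over 70 supported languages and more than 2,000 language pairs, can be related by simple counting. The sketch below assumes that every supported language can be paired with every other one, which the announcement does not confirm; the function names are hypothetical.

```python
from math import comb


def unordered_pairs(languages: int) -> int:
    """Language pairs when A-B and B-A count as the same pair."""
    return comb(languages, 2)


def directed_pairs(languages: int) -> int:
    """Language pairs when each translation direction counts separately."""
    return languages * (languages - 1)


def min_languages_for_pairs(pairs: int) -> int:
    """Smallest number of languages that yields at least `pairs` unordered pairs."""
    n = 2
    while comb(n, 2) < pairs:
        n += 1
    return n


if __name__ == "__main__":
    print(unordered_pairs(70))            # 2415
    print(directed_pairs(70))             # 4830
    print(min_languages_for_pairs(2000))  # 64
```

Under that assumption, 70 languages give 2,415 unordered pairs, so "more than 2,000 combinations" is consistent with "over 70 languages" if both directions of a pair are counted once.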
From a regulatory perspective, it is worth noting that Google has implemented its SynthID technology. This is a digital watermark in the audio track that is imperceptible to the human ear but allows the content to be identified as AI-generated. In the context of the strict EU AI Act, this is a crucial step toward preventing disinformation and the misuse of the technology for creating deepfake voices.
Pricing Policy
Model availability varies by usage:
- Regular users: Features in Google Translate will be available for free (standard model).
- Businesses (Google Workspace): Integration into Google Meet will be part of the subscription for enterprise customers (prices vary by Workspace plan, typically starting at around 10–20 USD/month per user).
- Developers: Access through Google AI Studio is available for free during the "Public Preview" with certain limits, followed by pay-as-you-go pricing based on the number of tokens/minutes.
Is Czech speech fully supported for simultaneous translation?
Most likely. Google confirms support for over 70 languages, and given Google's global strategy Czech is expected to be among them, although the exact date of full deployment of all features in Meet may depend on the regional roll-out phase.
What are the main security risks with this model?
The main risk is misuse of the technology to create realistic voice clones, for example for voice phishing (vishing). Google counters this with SynthID, which inaudibly marks audio as AI-generated and helps meet the requirements of the EU AI Act.
Can the model work without an internet connection?
No, Gemini 3.5 Live Translate is a cloud-based model requiring data streaming for real-time processing. An internet connection is necessary for full functionality.