Google's New Strategy: From Text to Multimodality
For many years, the primary training data for large language models (LLMs) was text from the internet: Wikipedia, blogs, digital books. In 2026, however, the era of purely text-based models is over. The trend is toward multimodal models that can understand text, images, sound, and video at the same time. To push its models, such as Gemini, toward human-level perception, Google needs vast amounts of data from real-world interactions.
According to The Verge, Google will draw on data from three key services:
- Google Lens: Photos you take using visual search help the model better understand visual context and relationships between objects in the real world.
- Search Live audio: Voice interactions during search provide data on intonation, speech speed, and nuances of human speech.
- Google Translate: Audio recordings from translations are invaluable for training models in simultaneous translation and dialect recognition.
Comparison with Competitors: How Do Others Do It?
Google is not the only company competing in multimodal training. OpenAI uses similar strategies for its GPT-4o models, relying heavily on partnerships with platforms such as Reddit and Shutterstock. Meta, in turn, uses vast amounts of data from Facebook and Instagram to train its Llama models. The difference lies in integration: Google controls the entire ecosystem, from the Android operating system through the search engine to apps for translation and image recognition. This gives it access to data that is largely out of reach for competitors.
Technical Context: Why Are Audio and Images So Important?
This data plays a key role in how models are benchmarked. While text-based models are evaluated on logic and knowledge, multimodal models are tested on their ability to "see" and "hear." In Visual Question Answering (VQA) tasks, for example, the model must not only identify an object in a Lens photo but also understand its function. In the audio domain, the task is speech-to-text followed by contextual understanding, which is critical for future generations of AI assistants that are expected to communicate more naturally.
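To make the idea of "testing" a VQA model more concrete, here is a minimal sketch of one common way such answers can be scored: exact-match accuracy after light normalization. It is an illustration only, not Google's method or the protocol of any particular benchmark. The function names and the sample answers are hypothetical.

```python
import string


def normalize(answer: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    cleaned = answer.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


# Hypothetical model answers to questions about a photo, with human references.
predictions = ["A corkscrew.", "red", "to open bottles"]
references = ["a corkscrew", "blue", "To open bottles"]

score = exact_match_accuracy(predictions, references)
print(f"Exact-match accuracy: {score:.2f}")
```

Here two of the three answers match, so the score is about 0.67. Real benchmarks tend to use more forgiving scoring, for instance accepting several human reference answers per question.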
Privacy at Risk? The EU and Czech Republic Perspective
For Czech users, the most important factors are the General Data Protection Regulation (GDPR) and the newly implemented EU AI Act. Unlike in the US, where rules on data use are often much looser, Google must demonstrate in the European Union that it has a legal basis for processing this data.
What does this mean for you?
- Availability in Czech: All these services (Lens, Translate, Gemini) work very well in Czech. This means that even your specific Czech interactions — the way we speak or what objects look like in our environment — will be part of the training sets.
- Opt-out option: Google should let users choose whether their data can be used for AI development. However, you need to check these settings yourself in your Google Account (Privacy & Personalization).
- Legal protection: If Google were to violate EU AI Act rules, it could face enormous fines. This is our European advantage compared to the global market.
In terms of cost, the basic versions of these services remain free for regular users in the Czech Republic, but the "payment" for this free model is precisely your data. If you want the advanced versions (e.g., Gemini Advanced), expect a subscription costing hundreds of crowns per month, although for business clients this usually also comes with a higher level of data protection.
Practical Impact: What Should You Do as a User?
If you value your privacy, we recommend reviewing your Google Account settings without delay. Check the "Data & Privacy" section, where you can limit the storage of search history and voice recordings. Remember, however, that the less data you provide, the less personalized and accurate the service the AI can offer you. It is a constant trade-off between convenience and protecting your identity.
Can I completely disable the use of my Google Lens photos for AI training?
Yes, in your Google Account's privacy settings, you can manage what data is stored and used. However, it is important to distinguish between storing a photo for your personal use and using it to improve services. Google must offer control options in compliance with EU regulations.
Will my voice recordings from search be anonymized?
Google states that when training models, it uses processes designed to remove identifiable personal data. However, with AI there is always a risk of "re-identification" through unique speech patterns, which is why it is worth following updates to Google's privacy documentation.
Is this change relevant for businesses in the Czech Republic as well?
Yes, especially if your company uses Google Workspace. Businesses often have stricter privacy settings and different data processing terms than regular users, which is crucial for complying with corporate security policies.