
Google is changing how we will use AI: New Interactions API positions Gemini as an autonomous agent

Google has officially launched the Interactions API into General Availability. The release represents a fundamental shift in how developers interact with Gemini family models. Instead of merely sending queries and waiting for a response, developers can now work with full-fledged agents that have their own memory, can work in the background, and use tools like Google Search or Maps to solve complex tasks.

Until now, we were accustomed to interacting with AI in a "query-response" format. You write a prompt, the model generates text, and the interaction ends. With the arrival of the Interactions API, this model is changing. Google is striving to create an ecosystem where AI is not just a tool but an autonomous partner that remembers context, performs calculations in an isolated environment, and can keep working on complex tasks even when the user isn't waiting for an immediate response.

From Text Model to Autonomous Agent

The main difference between the old way of calling models (known as generateContent) and the new Interactions API is state management. In previous versions, the developer had to manually store the conversation history and send it back to the model with each subsequent query so that it "knew" what was being discussed. This was demanding on both computational power and data transfer.

The Interactions API introduces server-side state. This means that Google manages the interaction history itself, and a follow-up request refers to the earlier exchange through the previous_interaction_id identifier instead of resending the whole conversation. For developers, this means simpler code, and for the end user, a smoother experience where the AI keeps the context of a long-term collaboration.
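The difference between the two approaches can be sketched with a small in-memory simulation. This is an illustration only: FakeInteractionServer and its create method are hypothetical stand-ins, not the real client library, and only the previous_interaction_id name comes from the article. The sketch compares how many characters each request has to carry.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeInteractionServer:
    """Hypothetical in-memory stand-in for a service that keeps state server-side."""
    _store: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def create(self, user_input, previous_interaction_id=None):
        # The history is rebuilt from the stored chain, not from the request body.
        history = list(self._store.get(previous_interaction_id, []))
        history.append(user_input)
        interaction_id = f"int-{next(self._ids)}"
        self._store[interaction_id] = history
        return {"id": interaction_id, "turns_seen": len(history)}


def stateless_payload_sizes(turns):
    """Client-managed history: every request resends all earlier turns."""
    history, sizes = [], []
    for turn in turns:
        history.append(turn)
        sizes.append(sum(len(t) for t in history))
    return sizes


def stateful_payload_sizes(server, turns):
    """Server-managed history: each request carries only the new turn plus an id."""
    previous_id, sizes, last = None, [], None
    for turn in turns:
        last = server.create(turn, previous_interaction_id=previous_id)
        previous_id = last["id"]
        sizes.append(len(turn))
    return sizes, last


turns = ["Plan a trip to Brno.", "Add a museum.", "Make it two days."]
server = FakeInteractionServer()
stateless = stateless_payload_sizes(turns)
stateful, last = stateful_payload_sizes(server, turns)
print("resent each time:", stateless)
print("new turn only:   ", stateful)
print("server saw", last["turns_seen"], "turns")
```

In the client-managed variant the payload grows with every turn, while in the server-managed variant it stays proportional to the new message, even though the server still sees the full conversation.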

Key technical parameters of the new interface:

  • Managed Agents: Google provides remote Linux sandboxes. An agent can write code, run it, analyze the result, and only then present it to you.
  • Background Execution: Thanks to the background=true parameter, you can assign a task to an agent (for example, "analyze these ten PDF documents and create a presentation from them") and close the application. The agent will complete the work on Google's server, and you will access the result later.
  • Tool Mixing: With a single call, you can combine the model's capabilities with tools like Google Search or Google Maps, allowing the agent to draw from the real world.
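The background execution pattern described above can be sketched as "submit, get a handle, collect the result later". The FakeAgentService class below is a hypothetical local stand-in that simulates the remote work with a thread pool; its method names, status strings, and the polling helper are assumptions made for illustration, and only the background flag comes from the article. Polling is just one possible way to pick up the result.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class FakeAgentService:
    """Hypothetical local stand-in for a remote agent service."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._jobs = {}

    def create(self, task, background=False):
        future = self._pool.submit(self._run, task)
        if not background:
            # Foreground call: block until the work is finished.
            return {"id": None, "status": "completed", "output": future.result()}
        # Background call: return a handle immediately, keep working.
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = future
        return {"id": job_id, "status": "in_progress", "output": None}

    def get(self, job_id):
        future = self._jobs[job_id]
        if not future.done():
            return {"id": job_id, "status": "in_progress", "output": None}
        return {"id": job_id, "status": "completed", "output": future.result()}

    @staticmethod
    def _run(task):
        time.sleep(0.2)  # simulate long-running work on the server
        return f"done: {task}"


def wait_for(service, job_id, poll_seconds=0.05, timeout=5.0):
    """Poll until the job completes or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = service.get(job_id)
        if result["status"] == "completed":
            return result
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")


service = FakeAgentService()
handle = service.create("summarise ten PDF documents", background=True)
print(handle["status"])  # the caller is free to do other things here
final = wait_for(service, handle["id"])
print(final["output"])
```

In a real application the handle would be persisted (for example in a database) so that the result can be fetched after the application has been closed and reopened.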

Comparison: Gemini Interactions API vs. Competition

If we compare this approach with other players on the market, the closest is the OpenAI Assistants API. Both systems address a similar problem: how to maintain context and provide tools to the model. However, thanks to integration with its own ecosystem (Search, Maps, Workspace), Google has an advantage in the depth of data that its agents can access.

Feature              | Gemini Interactions API             | OpenAI Assistants API
State Management     | Fully server-side (Google managed)  | Server-side (OpenAI managed)
Ecosystem            | Deep Google Search/Maps integration | Primarily third-party tools
Background Execution | Native asynchronous support         | Requires polling/webhooks

Digital Marketing in the Era of "Machine Relations"

This technological shift has a huge impact on how information about companies will be searched for and found in the future. The concept of Machine Relations is becoming key. If people stop searching for information directly on websites and start delegating searches to agents (e.g., "Find me the best office in Prague with available parking"), traditional SEO will no longer be sufficient.

For Czech companies, this means that their website must be not only user-friendly but, above all, machine-readable. An agent using the Interactions API will not browse pages like a human but will look for structured data, clear citations, and verifiable information that it can immediately extract into its workflow. If your website is unreadable chaos to an agent, you will effectively disappear from its results.
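One common way to make a page machine-readable is to embed schema.org structured data as JSON-LD. The sketch below builds such a snippet for the office example; the business name and the helper function are made up for illustration, and the article does not state which formats an agent actually reads, so treat this as one plausible option.

```python
import json

# Hypothetical listing; the types and properties are from the schema.org vocabulary.
office_listing = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Office Centre",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Prague",
        "addressCountry": "CZ",
    },
    "amenityFeature": [
        {"@type": "LocationFeatureSpecification", "name": "Parking", "value": True}
    ],
}


def to_json_ld_script(data):
    """Wrap a dictionary as a JSON-LD script tag for the page's HTML."""
    body = json.dumps(data, ensure_ascii=False, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


snippet = to_json_ld_script(office_listing)
print(snippet)
```

The point of the design is that "has parking" becomes an explicit, typed fact rather than a phrase buried in marketing copy.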

Price and Availability for the Czech Market

For developers in the Czech Republic, the Interactions API is available through the Google AI Studio and Vertex AI platforms. Google typically offers several tiers:

  • Free Tier: For experimentation and smaller projects (with limits on the number of requests per minute).
  • Pay-as-you-go: Payment for actual usage (number of tokens, interaction length, agent runtime in the sandbox). Prices vary by model (e.g., Gemini 1.5 Flash is significantly cheaper than Gemini Pro).
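How the pay-as-you-go components add up can be sketched with a small estimator. The rates below are placeholders, not Google's actual prices, and the split into input tokens, output tokens, and sandbox minutes is an assumption about how such a bill might be structured; consult the current price list before relying on any figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder rates; replace with values from the current price list."""
    usd_per_million_input_tokens: float
    usd_per_million_output_tokens: float
    usd_per_sandbox_minute: float


def estimate_cost(rates, input_tokens, output_tokens, sandbox_seconds=0.0):
    """Sum the token cost and the sandbox runtime cost in USD."""
    token_cost = (
        input_tokens * rates.usd_per_million_input_tokens
        + output_tokens * rates.usd_per_million_output_tokens
    ) / 1_000_000
    sandbox_cost = (sandbox_seconds / 60) * rates.usd_per_sandbox_minute
    return token_cost + sandbox_cost


placeholder = Rates(
    usd_per_million_input_tokens=1.0,
    usd_per_million_output_tokens=2.0,
    usd_per_sandbox_minute=0.5,
)
cost = estimate_cost(
    placeholder, input_tokens=200_000, output_tokens=50_000, sandbox_seconds=120
)
print(f"estimated cost: ${cost:.2f}")
```

Swapping in the rates of a cheaper model changes only the Rates instance, which makes it easy to compare models for the same workload.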

From a regulatory perspective, it is important to mention that Google must comply with the rules of the EU AI Act when deploying these agents. Autonomous agents that can make decisions or work with user data in the background are subject to strict transparency and security requirements, which should give European companies and developers greater assurance that the tools comply with local legislation.

Czech Language Availability: Gemini models have excellent support for the Czech language. The Interactions API allows conversations and tasks to be conducted in Czech, with the agent capable of working with Czech information sources on the web.

Do I have to pay for each interaction separately?

No, you pay for resource usage. This includes the number of processed tokens (text/image) and, if applicable, the time the agent spent in the sandbox environment performing tasks.

Is the Interactions API safe for sensitive company data?

Yes, especially when used via Vertex AI, where Google guarantees that your data is not used to train its foundation models, which is crucial for compliance with GDPR and EU regulations.

Can an agent directly modify files on my computer using this API?

Not directly on your drive. The agent works in an isolated, secure cloud sandbox. However, you can upload files to it, which it will modify and then provide to you for download.
