
Google Launches Gemini 3.1 Flash-Lite: The Fastest Model for Agentic AI Costs $0.25 per Million Tokens

Google officially launched Gemini 3.1 Flash-Lite yesterday, its fastest and most affordable model in the Gemini 3 lineup to date. After a two-month public preview, the model is now generally available on the Gemini Enterprise Agent Platform. At $0.25 per million input tokens and with response times under two seconds even under heavy load, Google is targeting companies that need to scale AI operations without trading off speed, quality, and budget against one another.


What is Gemini 3.1 Flash-Lite and why it matters

Gemini 3.1 Flash-Lite is a lightweight variant of the Gemini 3.1 Flash model, optimized for ultra-fast responses and high request volumes. While models like Gemini 3.1 Pro or Claude Opus 4.7 excel at complex reasoning and long analytical tasks, Flash-Lite targets a completely different type of deployment — automated pipelines, agent orchestration, request triage, and real-time interactions where every millisecond counts.

Key technical parameters:

  • Context window: up to 1,048,576 input tokens (roughly 750,000 words)
  • Output limit: 65,535 tokens
  • Multimodal input: text, images, audio, video, PDF documents
  • Output: text only
  • Thinking mode: configurable reasoning level, from minimal to high
  • Function calling, grounding (Google Search), code execution: fully supported
  • Availability in the EU: yes, including the multi-region eu endpoint

The model runs on both global and European Google Cloud servers, which means European companies' data can remain within the EU — an important detail for compliance with GDPR and the EU AI Act.

A price that changes the calculation: from $0.25 per million tokens

The biggest asset of Flash-Lite is its price. In the base Standard PayGo tier, you pay:

  • $0.25 per 1 million input tokens (text, images, video)
  • $0.50 per 1 million input tokens (audio)
  • $1.50 per 1 million output tokens
  • $0.025 per 1 million cached input tokens (10× cheaper than standard input)

For comparison, Gemini 3.1 Flash (the full version) costs $0.50 per million input tokens, and Gemini 3.1 Pro costs $2.50. Flash-Lite input is therefore half the price of Flash and one-tenth the price of Pro. In Flex/Batch mode the price is halved again: $0.125 per million input tokens and $0.75 per million output tokens. For companies processing millions of requests daily, this is a substantial difference in operating costs.
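To make the arithmetic concrete, here is a minimal cost-estimation sketch in Python. The per-million prices are the ones quoted above. The function name, the mode labels, and the example workload (one million requests per month at 2,000 input and 300 output tokens each) are illustrative assumptions, not figures from Google.

```python
# Prices in USD per 1 million tokens, taken from the figures quoted in this article.
PRICES = {
    "standard": {"input": 0.25, "audio_input": 0.50, "output": 1.50, "cached_input": 0.025},
    # The article quotes only input and output prices for Flex/Batch mode.
    "batch": {"input": 0.125, "output": 0.75},
}


def estimate_cost_usd(mode="standard", **token_counts):
    """Return the estimated cost in USD for the given token counts.

    token_counts maps a price category (e.g. input, output) to a number of tokens.
    Raises KeyError if a category has no quoted price in the chosen mode.
    """
    prices = PRICES[mode]
    total = 0.0
    for category, tokens in token_counts.items():
        total += tokens / 1_000_000 * prices[category]
    return total


# Hypothetical workload: 1,000,000 requests per month,
# each with 2,000 input tokens and 300 output tokens.
requests_per_month = 1_000_000
monthly_standard = estimate_cost_usd(
    "standard",
    input=2_000 * requests_per_month,
    output=300 * requests_per_month,
)
monthly_batch = estimate_cost_usd(
    "batch",
    input=2_000 * requests_per_month,
    output=300 * requests_per_month,
)
print(f"Standard: ${monthly_standard:,.2f}, Batch: ${monthly_batch:,.2f}")
```

Under these assumed volumes the standard tier comes to $950 per month ($500 for input plus $450 for output) and Flex/Batch to $475. Note that output tokens account for almost half the bill even though they are a small share of the traffic.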

Who has already deployed Flash-Lite and with what results

JetBrains: AI assistant in IDE with real-time response

JetBrains, known to Czech programmers through products like IntelliJ IDEA, PyCharm, and WebStorm, has integrated Flash-Lite into its AI assistant and its agent Junie. "The combination of high intelligence and minimal latency makes Flash-Lite the perfect model for real-time developer support," said Vladislav Tankov, AI Director at JetBrains. For Czech developers, this means AI code completion and suggestions in JetBrains tools will be faster and cheaper to operate.

Gladly: 60% lower costs in customer support

The Gladly platform, which provides customer service for large retail brands, built the core of its text-based AI agent on Flash-Lite. While processing millions of conversations per week across SMS, WhatsApp, and Instagram, they achieved:

  • ~60% cost savings compared to comparable "thinking" models
  • p95 latency of 1.8 seconds for full response generation
  • sub-second p95 for classifiers and tool calls
  • ~99.6% success rate under high concurrent load

The model at Gladly manages the entire agent lifecycle — from tool selection and scenario classification to deciding when to hand off a request to a human operator.
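For readers unfamiliar with the metric: a p95 latency of 1.8 seconds means that 95% of responses finished in 1.8 seconds or less. The sketch below shows one common way to compute it, the nearest-rank method. Gladly has not published how it measures latency, so the method and the sample values here are assumptions for illustration only.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method.

    pct is an integer in the range 1..100.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up latencies in seconds: 0.1, 0.2, ..., 2.0 (20 samples).
latencies = [i / 10 for i in range(1, 21)]
print(percentile_nearest_rank(latencies, 95))  # 1.9
```

A mean would hide slow outliers. A high percentile such as p95 is usually reported instead because it describes the experience of the slowest users who still matter at scale.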

Ramp and OffDeal: Finance in real time

Financial platform Ramp uses Flash-Lite for its highest-volume and most latency-sensitive functions. "Gemini 3.1 Flash-Lite powers many of our most heavily used features without compromising on quality," said Anton Biryukov, Applied AI Engineer at Ramp.

Startup OffDeal has deployed Flash-Lite in its "Archie" agent, which investment bankers use during Zoom calls for real-time financial data lookup. According to OffDeal, Flash-Lite was the only model capable of delivering answers fast enough not to slow down the conversation.

Astrocade and Krea.ai: Creativity and gaming industry

Astrocade, a platform enabling game creation through natural language, uses Flash-Lite for multimodal safety checks — before game generation begins, the model analyzes both text and images. It also handles inline translation of comments, allowing players from different countries to collaboratively improve the same game. Krea.ai uses Flash-Lite as a "prompt enhancer" — turning a brief user idea into a detailed prompt for image generation.

What this means for European companies and developers

Gemini 3.1 Flash-Lite is available through the Google Cloud Console and the Gemini API. All you need is a Google Cloud account with the relevant API enabled. The European multi-region endpoint (eu) ensures that data is processed within the EU, making it easier to meet GDPR requirements.
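As a rough sketch of what a first call might look like, the Python below builds a `generateContent` request in the JSON shape used by the public Gemini API REST interface, using only the standard library. The model identifier `gemini-3.1-flash-lite` is an assumption (check the model list in your console for the exact ID), and the example targets the global Gemini API host. Access through Google Cloud's eu multi-region endpoint uses a different host and authentication scheme, which is not shown here.

```python
import json
import urllib.request

API_BASE = "https://generativelanguage.googleapis.com/v1beta"
MODEL_ID = "gemini-3.1-flash-lite"  # assumed identifier, verify before use


def build_request(prompt, api_key, model=MODEL_ID):
    """Build (but do not send) a generateContent POST request."""
    url = f"{API_BASE}/models/{model}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def generate(prompt, api_key):
    """Send the request and return the text of the first candidate."""
    request = build_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["candidates"][0]["content"]["parts"][0]["text"]
```

Calling `generate("Classify this email: ...", api_key)` performs the network request. `build_request` is kept separate so the payload can be inspected or logged without spending tokens.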

For European companies, this is an opportunity to deploy AI agents into production at dramatically lower costs than before. Typical scenarios include:

  • Automated customer support — chatbots and voice agents with response times under 2 seconds
  • Email triage and classification — automatic routing of requests to the right departments
  • Real-time software development assistance — code completion in IDEs with minimal latency
  • Content safety moderation — automatic scanning of text and images before publication

The model does not yet support localization for the Live API (voice interactions), but text communication works without issues. According to developers in the region, the model understands and responds in multiple languages, including those spoken across Europe.

What is the difference between Gemini 3.1 Flash and Flash-Lite?

Flash-Lite is optimized for lower latency and lower price (half the input cost of Flash), while the full Flash offers higher "intelligence" for more complex tasks. Flash-Lite's performance roughly matches the level of Gemini 2.5 Flash, but at a fraction of the price. For simple classification, data extraction, or quick responses, Flash-Lite is ideal — for complex analysis and long text generation, the Flash or Pro version is more suitable.

Can I use Gemini 3.1 Flash-Lite for free?

Unlike the Gemini chatbot (gemini.google.com), which is free, Flash-Lite is available only through Google Cloud API as a paid service. However, Google offers introductory credits for new Google Cloud users. At a price of $0.25 per million input tokens, even smaller projects will cost only a few dollars per month — 1 million tokens corresponds to roughly 750,000 words, the volume of several novels.

Is Flash-Lite suitable for deployment in the EU from a GDPR perspective?

Yes. Google Cloud offers a European multi-region endpoint (eu) that ensures data processing within the European Union. The model supports Customer-Managed Encryption Keys (CMEK), VPC Service Controls, and Access Transparency, three key security features that help meet GDPR and EU AI Act requirements. Even so, we recommend reviewing any deployment with your legal team or DPO.
