Google Launches AI Edge Gallery for Mac: Gemma Models Will Run Locally Even on 16GB RAM

June 5, 2026 Daniel Cesak

On June 3, 2026, Google launched the AI Edge Gallery app for macOS, a tool that lets you run large language models from the Gemma family directly on your Mac, without an internet connection and without sending data to the cloud. Along with the app came a new model, Gemma 4 12B, a multimodal model that handles text, image, and audio and runs even on laptops with just 16 GB of RAM. With this, Google is sending a clear signal: the future of AI isn't just in the cloud, but also locally on your device.

What is Google AI Edge Gallery and why you should care

Google AI Edge Gallery is no newcomer to the market. It has been available on Android and iOS for several months and has earned over 23 thousand stars on GitHub. But now it's heading to macOS for the first time — and that brings several interesting implications.

It's an app that serves as a hub for running Google's open language models locally. Unlike well-known tools like Ollama or LM Studio, which let you download and run virtually any model from Hugging Face, AI Edge Gallery is limited exclusively to Gemma family models. Google has opted for a curated approach — it gives you five models and guarantees they will run well on your hardware.

Current model lineup in the macOS version:

Gemma-4-12B-it — new multimodal model, the star of the lineup
Gemma-4-E2B-it — smaller variant with 2 billion parameters
Gemma-4-E4B-it — middle ground with 4 billion parameters
Gemma-3n-E2B-it — previous generation, 2 billion parameters
Gemma-3n-E4B-it — previous generation, 4 billion parameters

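All models are in the "it" (instruction-tuned) variant, meaning they are optimized for following instructions, not just for text completion. This matters for the average user: the models answer queries much as ChatGPT does, rather than acting as "smart autocomplete" like older base GPT models. In the Gemma 3n naming, the "E" in E2B and E4B stands for "effective" parameters, so those figures are effective rather than total parameter counts.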

Gemma 4 12B — the biggest draw of the whole package

The Gemma 4 12B model is what makes the entire macOS launch a truly significant event. It's a 12-billion-parameter model that, according to Google, achieves performance comparable to a much larger 26-billion-parameter model based on a mixture-of-experts architecture. And yet it fits into a consumer laptop with 16 GB RAM.

What does that mean in practice? Recent Macs with Apple Silicon ship with at least 16 GB RAM, with the exception of the MacBook Neo, as AppleInsider pointed out. Older machines are a different matter: many base M1, M2, and M3 configurations were sold with 8 GB. So if you own a MacBook Air, MacBook Pro, iMac, or Mac mini from the last five years and it has at least 16 GB of RAM, Gemma 4 12B will run on it.
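
A rough back-of-the-envelope calculation suggests why 12 billion parameters can fit in 16 GB. The sketch below is only an estimate under stated assumptions: the bit widths are illustrative (the article does not say which quantization the app ships), the function name is hypothetical, and the figures cover model weights only, not the context cache, activations, or the memory macOS itself needs.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Illustrative bit widths; the quantization actually used is an assumption.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(12, bits):.0f} GB")
```

Under these assumptions, 16-bit weights (about 24 GB) would not fit into 16 GB at all, while 8-bit (about 12 GB) or 4-bit (about 6 GB) weights leave headroom for the system and the conversation context. On Apple Silicon that memory is shared between the CPU and GPU, so the headroom matters.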

Moreover, the model is multimodal — it supports text, image, and audio inputs. Google also highlights solid programming capabilities, so the model can analyze data, write scripts, or help with code without anything leaving your computer.

Architecturally, it's an encoder-free model, meaning it processes images and audio directly, without a separate encoder. The result is lower memory requirements and faster response times — two factors that determine usability for local models.

Comparison with the competition: Ollama, LM Studio, and cloud alternatives

If you already use local models on your Mac, you probably know Ollama (an open-source terminal tool) or LM Studio (a GUI app with a nice interface). Both tools have one key advantage: they let you download and run thousands of models from Hugging Face from various vendors — from Meta Llama to Microsoft Phi to China's DeepSeek.

Against them, Google AI Edge Gallery offers:

Simplicity — no complicated setup, download and go
Guaranteed compatibility — models are tested directly by Google for Apple Silicon
Integration with the Google ecosystem — connection to other tools like AI Edge Eloquent
Limited selection — only Gemma models, nothing else

That last point is a double-edged sword. For some it's an advantage (you don't have to research which model is best); for others it's a fundamental limitation. If you want to experiment with models from different vendors, Ollama or LM Studio remain the better choice.

Compared to cloud models (ChatGPT, Claude, Gemini in the cloud), local models still lag behind in capability. But complete privacy, no network latency (no round trip to a server), and independence from the internet are decisive advantages for many users. Working with sensitive corporate data? A local model is the clear choice.

AI Edge Eloquent — a smart dictation tool that stays on your Mac

Alongside AI Edge Gallery, Google AI Edge Eloquent also arrived on macOS. It's a dictation tool that transcribes speech to text, removes filler words, corrects grammar, and improves readability — all locally on the device.

Eloquent has been on iOS for several months, and the macOS version brings several improvements. The key feature is Voice Edit — you select any text in any app and use your voice to rewrite, summarize, or translate it. Everything runs via a system-wide keyboard shortcut.

Users can customize the writing style and add a custom dictionary with names, technical terms, or corporate jargon. This solves the classic problem of dictation apps that struggle with specific terminology.

The app is free, but currently only in English. Google promises support for additional languages, though Czech has not been officially mentioned yet. For Czech users, this means Eloquent is currently usable mainly when working in English.

Why local AI makes sense — and for whom

The shift toward local AI processing is not just a marketing trend. It has several concrete advantages that will be appreciated especially by:

Companies working with sensitive data — law firms, healthcare, the financial sector. Data never leaves company devices.
Developers — the ability to test models offline, without API limits and without monthly fees.
Travelers and field workers — an AI assistant that works even without an internet connection.
Privacy-conscious users — conversations with the model stay only on your disk.

With this move, Google is responding to the broader trend of decentralized AI. While OpenAI and Anthropic are betting primarily on the cloud, Google — much like Apple with Apple Intelligence — is investing in a hybrid approach: part of the AI runs in the cloud, part locally. AI Edge Gallery is a showcase of what can be run directly on your Mac today.

How to download and run AI Edge Gallery

The app is available for free download from the Google AI Edge GitHub repository (Releases section). It's also available via the official Google AI Edge page. Installation is straightforward: open the downloaded .dmg file, drag the app to Applications, and launch it.

System requirements:

macOS 14 (Sonoma) or newer
Apple Silicon (M1/M2/M3/M4/M5) — the app uses Neural Engine acceleration
Minimum 16 GB RAM for Gemma 4 12B; smaller models also work on 8 GB
An internet connection is required for installation and for downloading models; after that, the models run offline
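
The requirements above can be turned into a quick self-check. This is a minimal sketch, not an official tool: the function names and verdict strings are hypothetical, the thresholds are taken from the list above, and the article gives no figure for machines with less than 8 GB, so that case is reported as unverified. It relies on Python's standard platform module and the macOS sysctl key hw.memsize.

```python
import platform
import subprocess

MIN_MACOS_MAJOR = 14   # macOS 14 (Sonoma), from the list above
RAM_FOR_12B_GB = 16    # minimum for Gemma 4 12B, from the list above
RAM_FOR_SMALL_GB = 8   # smaller models, from the list above


def check_requirements(system: str, machine: str, macos_major: int, ram_gb: float) -> str:
    """Return a short verdict for a described machine (hypothetical helper)."""
    if system != "Darwin" or machine != "arm64":
        return "unsupported: Apple Silicon Mac required"
    if macos_major < MIN_MACOS_MAJOR:
        return "unsupported: macOS 14 or newer required"
    if ram_gb >= RAM_FOR_12B_GB:
        return "ok: Gemma 4 12B and smaller models"
    if ram_gb >= RAM_FOR_SMALL_GB:
        return "ok: smaller models only"
    return "unverified: the article lists no figure below 8 GB"


def detect_local_machine() -> tuple[str, str, int, float]:
    """Read OS, CPU architecture, macOS major version, and RAM in GiB (macOS only)."""
    major = int(platform.mac_ver()[0].split(".")[0] or 0)
    mem_bytes = int(subprocess.check_output(["sysctl", "-n", "hw.memsize"]).decode().strip())
    # Note: a Python build running under Rosetta reports x86_64, not arm64.
    return platform.system(), platform.machine(), major, mem_bytes / 2**30


if __name__ == "__main__":
    if platform.system() == "Darwin":
        print(check_requirements(*detect_local_machine()))
    else:
        print(check_requirements(platform.system(), platform.machine(), 0, 0.0))
```

Keeping the decision logic in a pure function separate from the hardware detection makes it easy to check a machine you are only considering buying, for example check_requirements("Darwin", "arm64", 15, 8).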

For Czech users, it's important to note that Gemma models handle Czech to some extent — Gemma 4 was trained on multilingual data including Slavic languages. However, the quality of responses in Czech will be lower than in English, which is standard for models of this size.

The entire project is open source under the Apache 2.0 license, meaning developers can study, modify, and contribute to the application. The community around the project is active: GitHub shows over 400 commits and hundreds of discussions.

What this means for the future of AI on the desktop

The launch of AI Edge Gallery on macOS is further proof that local AI is becoming mainstream. Just a year ago, running an LLM on a regular laptop was the domain of enthusiasts. Today, Google is releasing an official app that handles it with a single click.

Compared to cloud solutions, local models still have their limits: a 12-billion-parameter model cannot compete with the far larger models running in data centers. But for many practical tasks, such as text summarization, translation, basic data analysis, and coding assistance, it's already sufficient today.

From a European perspective, local AI is also interesting with regard to regulation. The EU AI Act emphasizes transparency, and European data protection rules are strict. Models running purely on the user's device do not send data to a third-party service, which may simplify compliance and be attractive for companies in the EU.

Can I use AI Edge Gallery on an older Intel-based Mac?

Officially, Google only states support for Apple Silicon (M1 and newer). Older Macs with Intel processors are not supported because the app uses Neural Engine hardware acceleration, which Intel Macs lack. If you have an Intel Mac, you can look at alternatives such as Ollama or LM Studio, but check each tool's current system requirements first, since not every one supports Intel Macs, and expect significantly lower performance.

How does Gemma 4 12B compare to ChatGPT or Gemini in the cloud?

Gemma 4 12B is a significantly smaller model than cloud variants like GPT-4 or Gemini Pro. In tests of general comprehension and logical reasoning, it lags behind the cloud giants. Its strength lies elsewhere: data that never leaves your device, no network latency, and offline operation. For tasks like text summarization, translation, basic programming, or analyzing smaller documents, it is fully adequate. For complex tasks such as legal analysis or advanced mathematics, reaching for a cloud model is still the better choice.

Will Google AI Edge Eloquent ever support Czech?

Google has not yet officially announced a timeline for adding languages to AI Edge Eloquent. Given that it's a relatively new product and Google historically adds languages gradually (usually starting with major world languages), Czech will likely not arrive in the coming months. For dictating in Czech on a Mac, you can use the built-in macOS dictation, which supports Czech, albeit without AI text editing.