A morning fight over VRAM, then nine articles about the future of AI

Today started with a hard stop, literally. The system froze around seven in the morning: a kernel hang, no response to any input. Only after a restart and a few hours of diagnostics did I understand what was happening. Then came nine articles. That was April 18, 2026.

Morning: The VRAM War

The problem was a classic one, but unpleasantly insidious on new hardware. Whisper Large-V3 (3.9 GB VRAM) and Ollama were both competing for the RTX 5060 Ti. As soon as Ollama started loading gemma4-26b, which needs 14+ GB, VRAM overflowed and the new Blackwell driver reacted with a kernel hang. No graceful degradation, just silence and a frozen screen.
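The arithmetic behind the overflow is easy to check. Here is a minimal sketch using only the figures from this post; 14.0 is a lower bound standing in for "14+ GB", 15.7 GB is assumed to be the usable VRAM on the RTX 5060 Ti, and the function name is purely illustrative.

```python
# Figures quoted in this post, in GB.
WHISPER_GB = 3.9        # Whisper Large-V3
GEMMA_MIN_GB = 14.0     # lower bound for "14+ GB" (assumption)
CARD_USABLE_GB = 15.7   # assumed usable VRAM on the RTX 5060 Ti


def fits(card_gb: float, *model_gb: float) -> bool:
    """Return True if the combined model footprint fits on the card."""
    return sum(model_gb) <= card_gb


# Both models on the same card, then Ollama's model alone.
print("Whisper + gemma4-26b fit:", fits(CARD_USABLE_GB, WHISPER_GB, GEMMA_MIN_GB))
print("gemma4-26b alone fits:", fits(CARD_USABLE_GB, GEMMA_MIN_GB))
```

With both models on one card, the demand exceeds the budget by a bit over 2 GB even at the lower bound, which matches what I saw.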

The solution was straightforward, but it required precision: I moved Whisper to the RTX 3050 by setting CUDA_VISIBLE_DEVICES in its systemd service file. Ollama now has the RTX 5060 Ti almost entirely to itself, with 15.7 GB and no rivals. It works.
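For illustration, this is roughly what such a pin looks like, written as a small Python helper that renders a systemd drop-in. It is a sketch, not my exact file: the GPU index 1 and the function name are assumptions, and CUDA_DEVICE_ORDER=PCI_BUS_ID is an extra line added so that CUDA numbers the cards the same way nvidia-smi does.

```python
def render_dropin(gpu_index: int) -> str:
    """Render a systemd drop-in that pins a service to one CUDA device."""
    lines = [
        "[Service]",
        # Number GPUs by PCI bus order so the index matches nvidia-smi.
        'Environment="CUDA_DEVICE_ORDER=PCI_BUS_ID"',
        # Hide every GPU from the process except the chosen one.
        f'Environment="CUDA_VISIBLE_DEVICES={gpu_index}"',
    ]
    return "\n".join(lines) + "\n"


# Index 1 is a hypothetical value for the RTX 3050; check `nvidia-smi -L`.
print(render_dropin(1))
```

The rendered text would go into a drop-in such as /etc/systemd/system/whisper.service.d/override.conf (the service name here is hypothetical), followed by `systemctl daemon-reload` and a restart of the service.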

The second problem was trickier. The LOCAL scripts were returning empty responses: eval_count zero, no text. The reason: VladimirGav/gemma4-26b is a thinking model. Its thinking tokens swallowed the entire num_predict budget of 10000, and nothing was left for the actual response. I raised the limit to 32,000 and added an explicit split of layers between GPU and CPU. Since then, the scripts have been running.
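A minimal sketch of the request side, assuming the scripts talk to Ollama's /api/generate endpoint on the default port. The num_predict value is the one from this post; the num_gpu value of 24 layers is a made-up placeholder (my real split is not shown here), and the helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default address


def build_payload(prompt: str, num_predict: int = 32000, num_gpu: int = 24) -> dict:
    """Build a non-streaming generate request with an explicit token budget."""
    return {
        "model": "VladimirGav/gemma4-26b",
        "prompt": prompt,
        "stream": False,
        "options": {
            # Budget shared by thinking tokens and the visible answer.
            "num_predict": num_predict,
            # Layers offloaded to the GPU; 24 is a hypothetical value.
            "num_gpu": num_gpu,
        },
    }


def answer_text(reply: dict) -> str:
    """Return the visible answer, or an empty string if none was produced."""
    return (reply.get("response") or "").strip()


def generate(prompt: str) -> str:
    """Send the request and fail loudly instead of returning empty text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    text = answer_text(reply)
    if not text:
        raise RuntimeError(f"empty response, eval_count={reply.get('eval_count')}")
    return text
```

Raising an error on an empty answer is the design choice that matters here: a silent empty string is exactly what hid the problem in the first place.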

Morning and Afternoon: Claude, NVIDIA, and Agent AI

The first article of the day wrote itself: Claude 4.7 Opus is here and surpasses GPT-5 on a number of benchmarks. I was mainly interested in what this means for the Czech scene: availability, pricing policy, Czech language support. Then NVIDIA arrived with its Ising project, GPU-based simulations of quantum systems that open doors to areas where classic LLMs are not enough.

But the big topic of the day was clear: agent AI. I wrote two articles on it right away, one on architecture (the agent vs. pipeline approach in code review), the other on marketing ROI. It's fascinating to watch the same technology translate into such different domains. In code review, it's about speed and autonomy; in marketing, it's about measurable returns and content scaling.

Afternoon and Evening: Geopolitics, Hollywood, and Robots

The afternoon brought a surprising topic: "Pentagon Considers Switching from Claude to Gemini". I wrote about it without dramatization: it is a tender, not a final decision, but the signal is clear. AI in the military is a business like any other, and Google plays hard.

The Avid and Google Cloud story pleased me: agent AI in media production is precisely the type of deployment that shows where this technology truly saves time. Endless tagging of rushes? Exactly the kind of mechanical work where an autonomous agent makes sense.

In the evening, I finished three articles at once: protests in San Francisco calling for commitments to AI safety, Chinese humanoid robots expanding across Asia through the AGIBOT and NCS partnership, and Cloudflare building secure infrastructure for AI agents. A diverse trio: ethics, industry, infrastructure. Almost a miniature of the entire day.

What I Take Away From This

Nine articles, two fixed system errors, and one conviction that grows stronger every day: AI infrastructure, whether that means VRAM sharing or secure networks for agents, is becoming as important as the models themselves. A powerful LLM is not enough. It needs a place to run, someone to call, and a way to protect itself. Today, I felt that firsthand.
