RoboGate: A Test of Seven Leading AI Models for Robots Ended in Fiasco. Even Nvidia GR00T Failed

If an AI robot scores 97% in simulation, should we trust it? South Korean startup AgentAI says: absolutely not. Its research on the RoboGate platform revealed a shocking gap between the virtual and the real world, and Nvidia took notice.

Simulation vs. reality: a gap that could cost someone their life

Imagine a robot that picks up objects in simulation with a 97.65% success rate. Looks great. Except that the same robot, running the same model and the same checkpoint, fails in 100% of cases when tested in industrial scenarios. That is exactly what the RoboGate platform revealed. It was developed by South Korean company AgentAI (KOSDAQ: 060900) together with its subsidiary AgentAI Labs.

The research team tested seven of the most advanced VLA (Vision-Language-Action) models, which combine vision, language understanding, and physical action. Among them were Nvidia GR00T N1.6, PI0, OpenVLA from Stanford, and SmolVLA. The result? Across 68 industrial scenarios simulated on NVIDIA Isaac Sim 5.1, not a single model succeeded even once.

What RoboGate is and why it matters

RoboGate is an open-source platform (MIT license) that works as a "driving test for AI robots": a checkpoint every physical AI system must pass before it is deployed into live operation. And this is not just a theoretical concept. The platform already runs on NVIDIA Isaac Sim 5.1 with the Newton physics engine and can carry out 50,000 experiments across four types of robotic arms (Franka Panda, UR5e, UR3e, UR10e) in about 14 hours on a single RTX 4090 graphics card.
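Taking these figures at face value, and assuming the 14 hours cover all 50,000 experiments on all four arms with no idle time, the implied average throughput works out to roughly one experiment per second:

```latex
\text{throughput} \approx \frac{50\,000\ \text{experiments}}{14\ \text{h}} \approx 3\,571\ \frac{\text{experiments}}{\text{h}} \approx \frac{3\,571}{3\,600\ \text{s}} \approx 0.99\ \frac{\text{experiments}}{\text{s}}
```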

The testing scenarios span eight dimensions: friction, object mass, center-of-mass shift, object size, inverse kinematics noise, number of obstacles, environment geometry, and object placement. The platform uses two-phase adaptive sampling. The first phase covers the entire parameter space (20,000 experiments); the second focuses on the "transition zone", where the probability of failure lies between 30% and 70% (10,000 experiments).
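The two-phase scheme described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not RoboGate's code: the parameter names, the normalization of every dimension to [0, 1], the k-nearest-neighbor estimate of failure probability, the Gaussian perturbation in phase 2, and the `toy_simulator` stand-in for a physics rollout are all hypothetical choices made for the example.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical names for the eight dimensions listed in the article.
# Every value is normalized to [0, 1] here; the real ranges are not given.
DIMENSIONS = [
    "friction", "mass", "com_shift", "size",
    "ik_noise", "obstacles", "geometry", "placement",
]

Params = Dict[str, float]
Trial = Tuple[Params, bool]  # (parameters, failed?)
Simulator = Callable[[Params, random.Random], bool]


def estimate_failure(point: Params, history: List[Trial], k: int = 15) -> float:
    """Estimate P(failure) near `point` as the failure share of its k nearest trials."""
    def sq_dist(a: Params, b: Params) -> float:
        return sum((a[d] - b[d]) ** 2 for d in DIMENSIONS)

    nearest = sorted(history, key=lambda trial: sq_dist(point, trial[0]))[:k]
    return sum(failed for _, failed in nearest) / len(nearest)


def two_phase_sampling(simulate: Simulator, n_phase1: int, n_phase2: int,
                       low: float = 0.3, high: float = 0.7,
                       sigma: float = 0.05, seed: int = 0) -> List[Trial]:
    rng = random.Random(seed)
    history: List[Trial] = []

    # Phase 1: explore the whole parameter space uniformly.
    for _ in range(n_phase1):
        params = {d: rng.random() for d in DIMENSIONS}
        history.append((params, simulate(params, rng)))

    # Keep phase-1 points whose estimated failure probability lies in [low, high].
    # O(n^2 log n): fine for a sketch, too slow for 20,000 points without a spatial index.
    seeds = [p for p, _ in history if low <= estimate_failure(p, history) <= high]
    if not seeds:  # no transition zone found: fall back to all phase-1 points
        seeds = [p for p, _ in history]

    # Phase 2: spend the remaining budget on small perturbations of transition-zone points,
    # clipped back into [0, 1].
    for _ in range(n_phase2):
        base = rng.choice(seeds)
        params = {d: min(1.0, max(0.0, base[d] + rng.gauss(0.0, sigma))) for d in DIMENSIONS}
        history.append((params, simulate(params, rng)))
    return history


def toy_simulator(params: Params, rng: random.Random) -> bool:
    """Hypothetical stand-in for a physics rollout: failure grows with friction and CoM shift."""
    p_fail = 0.5 * params["friction"] + 0.5 * params["com_shift"]
    return rng.random() < p_fail


trials = two_phase_sampling(toy_simulator, n_phase1=400, n_phase2=200)
```

The point of the second phase is budget allocation: regions where a model almost always succeeds or almost always fails teach little, so extra experiments go to the boundary between the two, where failure behavior is least certain.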

To give an idea of the test's rigor: Nvidia's GR00T N1.6 model achieved 97.65% on the standard LIBERO benchmark but 0% in RoboGate's industrial scenarios. This 97.65-percentage-point gap between two simulation environments reveals a fundamental problem: benchmarks built on simpler simulators (such as MuJoCo) say nothing about a robot's actual safety in real operation.

Humanoid robots are even worse off

AgentAI also tested humanoid robots controlled by Nvidia GR00T models in 240 safety scenarios. The result? A success rate of just 16.2%. Even more alarming, in the "human proximity safety" category the result was 0 out of 100. The platform's verdict: NOT_READY.
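The article does not say how RoboGate turns test results into a NOT_READY verdict. One plausible gate, sketched here purely as an assumption, combines a hard rule (any safety-critical category with zero passes, or no tests at all, blocks deployment) with an overall pass-rate threshold. The names `CategoryResult` and `readiness_verdict`, the 0.9 threshold, and the toy numbers in the usage line are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class CategoryResult:
    passed: int
    total: int

    @property
    def rate(self) -> float:
        return self.passed / self.total if self.total else 0.0


def readiness_verdict(results: Dict[str, CategoryResult],
                      critical: Iterable[str],
                      min_overall: float = 0.9) -> str:  # hypothetical threshold
    """Hypothetical deployment gate returning "READY" or "NOT_READY"."""
    # Hard rule: a safety-critical category with zero passes (or never tested) blocks deployment.
    for name in critical:
        if results.get(name, CategoryResult(0, 0)).passed == 0:
            return "NOT_READY"
    # Soft rule: the pooled pass rate across all categories must reach the threshold.
    passed = sum(r.passed for r in results.values())
    total = sum(r.total for r in results.values())
    overall = passed / total if total else 0.0
    return "READY" if overall >= min_overall else "NOT_READY"


# Illustrative numbers only, not RoboGate's data.
toy = {
    "grasping": CategoryResult(passed=9, total=10),
    "human_proximity": CategoryResult(passed=0, total=10),
}
print(readiness_verdict(toy, critical=["human_proximity"]))  # NOT_READY
```

Keeping the hard rule separate from the average matters: a strong score elsewhere must not be able to offset a total failure in a category where people can get hurt.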

The researchers identified four "universal danger zones" — parameter configurations where the success rate drops below 40% across all robot types. These involve situations with high friction, non-standard object geometry, multiple obstacles, and extreme center-of-mass shift.
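The "universal danger zone" criterion (success rate below 40% on every robot type) can be expressed as a simple aggregation. This sketch assumes each trial has already been labeled with a discretized parameter configuration; the zone labels, robot names, `min_trials` parameter, and the `universal_danger_zones` function itself are hypothetical and not taken from RoboGate.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# One trial: (robot type, zone label, succeeded?). Zone labels are hypothetical
# discretized parameter configurations such as "high_friction".
Trial = Tuple[str, Hashable, bool]


def universal_danger_zones(trials: Iterable[Trial], threshold: float = 0.4,
                           min_trials: int = 1) -> List[Hashable]:
    """Return zones whose success rate is below `threshold` on every robot type."""
    counts: Dict[Tuple[str, Hashable], List[int]] = defaultdict(lambda: [0, 0])
    robots, zones = set(), set()
    for robot, zone, succeeded in trials:
        tally = counts[(robot, zone)]  # [successes, attempts]
        tally[0] += int(succeeded)
        tally[1] += 1
        robots.add(robot)
        zones.add(zone)

    dangerous = []
    for zone in zones:
        rates = []
        for robot in robots:
            successes, attempts = counts.get((robot, zone), (0, 0))
            if attempts < min_trials:
                break  # too few tests on this robot, so "universal" cannot be claimed
            rates.append(successes / attempts)
        else:  # runs only if the loop finished without a break
            if all(rate < threshold for rate in rates):
                dangerous.append(zone)
    return sorted(dangerous, key=repr)


def make(robot: str, zone: str, successes: int, attempts: int) -> List[Trial]:
    """Build toy trials: the first `successes` attempts succeed, the rest fail."""
    return [(robot, zone, i < successes) for i in range(attempts)]


# Illustrative toy data, not RoboGate results.
trials = (
    make("franka", "high_friction", 1, 4) + make("ur5e", "high_friction", 0, 4)
    + make("franka", "nominal", 4, 4) + make("ur5e", "nominal", 3, 4)
    + make("franka", "obstacles", 1, 4) + make("ur5e", "obstacles", 2, 4)
)
print(universal_danger_zones(trials))  # ['high_friction']
```

In the toy data, "obstacles" is not flagged: it is hard for the Franka arm (25%) but not below 40% for the UR5e (50%), so it is robot-specific rather than universal.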

Nvidia chose RoboGate for Inception, and perhaps for more

It was precisely this data that caught Nvidia's attention: on June 8, 2026, the RoboGate platform was officially admitted into the Nvidia Inception program, Nvidia's global program for startups in AI, robotics, and semiconductors. That is not the only form of collaboration. AgentAI Labs has already contributed code to the open-source Isaac Lab repository (PR #506) and is currently undergoing NVIDIA ISV (Independent Software Vendor) certification.

The most interesting part, however, is the planned integration of RoboGate as an "Evaluator Plugin" into the NVIDIA Physical AI Data Factory Blueprint architecture. The metaphor AgentAI itself uses is fitting: "If Nvidia is building the highway on which AI robots travel, RoboGate plays the role of the checkpoint responsible for safety on that road."

Why it matters in Europe too

Physical AI safety is not just an academic matter. Under the EU AI Act, many AI-controlled robotic systems fall into the high-risk category, and manufacturers will have to demonstrate their safety before placing them on the market. Platforms like RoboGate could become a standardized tool for this certification.

The Czech Republic has a strong robotics tradition, spanning industrial robots, research centers at ČVUT and VUT, and a growing number of companies integrating AI into production lines. A tool that can systematically uncover the weaknesses of AI-controlled robots before deployment will be just as relevant for Czech industry as it is for Korea's.

Moreover, RoboGate is open-source and its data is publicly available on Hugging Face and GitHub, which opens the door to independent testing by European research institutions.

What comes next

AgentAI plans to expand its testing leaderboard to more than 10 VLA models, launch a Failure-as-a-Service (FaaS) subscription that provides access to failure data, and build custom safety scenarios for individual industrial sectors. The company is also building global partnerships; for example, it recently completed robotaxi infrastructure in Austin, Texas.

For Czech companies, researchers, and regulators, the RoboGate case is a clear signal: the era when it was enough to say "our robot succeeded in simulation" is coming to an end. Physical AI safety will require systematic, standardized, and reproducible testing. And that is exactly what RoboGate specializes in.

What is a VLA model and why does it matter?

VLA (Vision-Language-Action) is a type of AI model that combines three capabilities: vision (interpreting camera images), language understanding (comprehending natural-language instructions), and physical action (controlling the robot's body). This allows a robot to take a verbal command such as "move the red box to the left conveyor belt" and carry it out without every movement being programmed by hand. VLA models are key to the future of general-purpose robots.
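The data flow of a VLA model (camera image plus instruction in, motor command out) can be illustrated with a deliberately crude stand-in. This is a hypothetical sketch of the interface only: real VLA models such as GR00T, PI0, OpenVLA, or SmolVLA use large learned neural networks, whereas `ToyPolicy` below uses hand-written rules. All class names, the seven-joint arm, the keyword list, and the gain are assumptions.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass
class Observation:
    image: Sequence[Sequence[float]]  # camera frame, here a toy grayscale grid
    instruction: str                  # natural-language command


@dataclass
class Action:
    joint_deltas: List[float]  # requested change per arm joint
    gripper_closed: bool


class VLAPolicy(Protocol):
    """Hypothetical interface: image + instruction in, action out."""
    def act(self, obs: Observation) -> Action: ...


class ToyPolicy:
    """Hand-written stand-in, NOT a learned model: it only illustrates the data flow."""

    def __init__(self, num_joints: int = 7, gain: float = 0.1) -> None:
        self.num_joints = num_joints
        self.gain = gain

    def act(self, obs: Observation) -> Action:
        # "Vision": treat the brightest image column as the target's horizontal position.
        width = len(obs.image[0])
        col_sums = [sum(row[c] for row in obs.image) for c in range(width)]
        target = max(range(width), key=lambda c: col_sums[c])
        half = (width - 1) / 2
        offset = (target - half) / half if half else 0.0  # -1 = far left, +1 = far right

        # "Language": decide whether to grasp from keywords in the instruction.
        text = obs.instruction.lower()
        grasp = any(word in text for word in ("pick", "grab", "move"))

        # "Action": rotate the base joint toward the target, keep the other joints still.
        deltas = [0.0] * self.num_joints
        deltas[0] = self.gain * offset
        return Action(joint_deltas=deltas, gripper_closed=grasp)


policy: VLAPolicy = ToyPolicy()
frame = [[0.1, 0.2, 0.9],
         [0.0, 0.1, 0.8]]
action = policy.act(Observation(frame, "Move the red box to the left conveyor belt"))
```

The value of a shared interface like this is that a testing platform can drive any model through the same `act` call, which is what makes side-by-side comparisons of different VLA models possible.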

Is RoboGate available for Czech companies and researchers?

Yes. RoboGate is an open-source platform under the MIT license, so anyone can download, use, and modify it for free. The code is on GitHub and the testing datasets are on Hugging Face. Running it requires a computer with an NVIDIA GPU (an RTX 4090 is recommended) and NVIDIA Isaac Sim. The platform and its documentation are in English; the documentation is technical and open to community contributions.

How does RoboGate relate to the EU AI Act?

The EU AI Act classifies many AI systems used in robotics as high-risk, which means their safety must be demonstrated before they are placed on the market. Standardized testing platforms like RoboGate could become part of the certification process, much as crash tests are for cars today. However, no official link between RoboGate and EU legislation exists yet.
