Perceptron Mk1: New AI model for video analysis is 80–90% cheaper than GPT-5, Claude and Gemini

On Tuesday, May 12, 2026, the American startup Perceptron unveiled its flagship model Perceptron Mk1, a multimodal system for video analysis and spatial reasoning that, according to official benchmarks, outperforms top models from OpenAI, Anthropic, and Google. The surprise, however, is not just the performance but above all the price: developers will pay 80–90% less for API usage than with the competition. For Czech companies and developers, this opens up new possibilities in automation, robotics, and security systems.

A small startup with big ambitions

Perceptron Inc. is a two-year-old startup based in Bellevue, Washington. It was founded by Armen Aghajanyan and Akshat Shrivastava, former researchers at the prestigious Meta FAIR lab. Both founders contributed to the development of Chameleon and MoMa, families of multimodal models with so-called early-fusion architecture. That experience is what enabled them to develop a new "recipe" for understanding the physical world from the ground up in just 16 months.

The company was officially established in November 2024 and set out from the start to create artificial intelligence that can not only recognize objects in images but truly understand causal relationships, object dynamics, and the laws of physics, an area Perceptron refers to as "physical AI".

Benchmarks: Where Mk1 dominates

Perceptron Mk1 was tested on a range of specialized benchmarks focused on spatial and temporal reasoning. The results speak for themselves:

  • EmbSpatialBench: 85.1 points — better than Google Robotics-ER 1.5 (78.4) and Alibaba Q3.5-27B (approximately 84.5).
  • RefSpatialBench: 72.4 points — a huge lead over GPT-5m (9.0) and Claude Sonnet 4.5 (2.2).
  • EgoSchema (Hard Subset): 41.4 points — on par with Alibaba Q3.5-27B and significantly above Gemini 3.1 Flash-Lite (25.0).
  • VSI-Bench: 88.5 points — the highest recorded score among the compared models.

On so-called "Efficiency Frontier" graphs, which plot the relationship between performance and price, Mk1 finds itself in a unique position: it achieves results comparable to top models GPT-5 and Gemini 3.1 Pro, but with a price profile closer to cheaper "Lite" variants.

A price tag that changes the rules of the game

Perceptron Mk1 costs 0.15 USD per million input tokens and 1.50 USD per million output tokens. For a typical input-heavy mix of tokens, this works out to a blended cost of approximately 0.30 USD per million tokens.

For comparison: according to Perceptron's data, the blended cost for GPT-5 is around 2.00 USD and for Gemini 3.1 Pro approximately 3.00 USD per million tokens. This means that Mk1 is truly eight to ten times cheaper.
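The arithmetic behind those figures is easy to reproduce. The prices come from the article; the 8:1 input-to-output token ratio below is an assumption (the article does not say which mix Perceptron used to reach its 0.30 USD blended figure), chosen because it reproduces that number exactly:

```python
# Blended per-million-token cost from separate input/output prices.
# Prices are from the article; the 8:1 input-to-output mix is assumed.

def blended_cost(input_price: float, output_price: float,
                 input_share: float) -> float:
    """Weighted average price per million tokens."""
    return input_price * input_share + output_price * (1 - input_share)

mk1 = blended_cost(0.15, 1.50, input_share=8 / 9)  # -> 0.30 USD
gpt5_blended = 2.00    # figure quoted by Perceptron
gemini_blended = 3.00  # figure quoted by Perceptron

print(f"Mk1 blended:  {mk1:.2f} USD / 1M tokens")
print(f"vs GPT-5:     {gpt5_blended / mk1:.1f}x cheaper")
print(f"vs Gemini:    {gemini_blended / mk1:.1f}x cheaper")
```

With these inputs, the multiples land at roughly 7x against GPT-5 and 10x against Gemini 3.1 Pro, consistent with the "eight to ten times" claim depending on the assumed token mix.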

This aggressive pricing strategy is no accident — the startup deliberately set it so that advanced video AI would be accessible for large-scale industrial deployment, not just for experiments in research labs. For Czech companies, this means that real-time video analytics is becoming economically viable even for medium-sized businesses.

Why Mk1 understands video better than others

The technical core of the model is the ability to process native video at a speed of up to 2 frames per second (FPS) in a context window of 32,000 tokens. This enables analysis of long video sequences without loss of continuity.
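How much footage actually fits in that window depends on the per-frame token cost, which Perceptron has not published. The sketch below is a back-of-envelope estimate in which TOKENS_PER_FRAME and PROMPT_OVERHEAD are both assumptions, not documented figures:

```python
# Rough estimate of how much video fits in Mk1's 32,000-token context
# when sampled at 2 FPS. TOKENS_PER_FRAME and PROMPT_OVERHEAD are
# assumed values for illustration only.

CONTEXT_TOKENS = 32_000
SAMPLE_FPS = 2
TOKENS_PER_FRAME = 256   # assumed
PROMPT_OVERHEAD = 1_000  # assumed budget for the text prompt

frames = (CONTEXT_TOKENS - PROMPT_OVERHEAD) // TOKENS_PER_FRAME
seconds = frames / SAMPLE_FPS

print(f"{frames} frames ≈ {seconds / 60:.1f} minutes of continuous video")
```

Under these assumptions, a single request covers on the order of a minute of continuous footage; longer sequences would need chunking or a cheaper per-frame encoding.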

Unlike traditional vision-language models (VLMs), which often process video as a sequence of unrelated static images, Mk1 is designed for temporal continuity. The model can track objects through occlusion, maintain their identity over time, and return structured timestamps for specific events in the stream.
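What "structured timestamps" might look like in practice can be sketched as follows. The JSON shape here is invented for illustration; Perceptron has not published its actual response schema:

```python
# Working with structured timestamps for a tracked object.
# The response shape below is assumed, not a documented API format.
import json

response = json.loads("""
{
  "track_id": 7,
  "events": [
    {"label": "object enters frame", "t": 3.5},
    {"label": "occlusion begins",    "t": 8.0},
    {"label": "object re-acquired",  "t": 9.5},
    {"label": "object exits frame",  "t": 21.0}
  ]
}
""")

# How long was the object hidden while its identity was maintained?
times = {e["label"]: e["t"] for e in response["events"]}
gap = times["object re-acquired"] - times["occlusion begins"]
print(f"track {response['track_id']}: occluded for {gap:.1f}s")
```

The point of the exercise: because events carry timestamps and a stable track ID, downstream code can reason about gaps and ordering instead of re-matching objects frame by frame.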

Perceptron places special emphasis on "Physical Reasoning". For example, the model can analyze a basketball scene and determine whether a shot was made before or after time expired — based on joint reasoning about the ball's position in the air and the state of the clock. It can also point to and count hundreds of objects with pixel precision in dense scenes, or read analog gauges and clocks that digital systems have historically handled poorly.
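The basketball example boils down to comparing two jointly inferred timestamps. A toy version of that final decision step, with made-up numbers (the model's actual internal representation is not public):

```python
# Toy version of the buzzer-beater judgment described above: given
# (hypothetical) model estimates of when the ball left the shooter's
# hands and when the clock hit zero, decide whether the shot counts.
# A shot is valid if it is released before the buzzer.

def shot_counts(release_t: float, buzzer_t: float) -> bool:
    return release_t < buzzer_t

# Made-up timestamps for illustration.
print(shot_counts(release_t=47.82, buzzer_t=48.00))  # released in time
print(shot_counts(release_t=48.10, buzzer_t=48.00))  # too late
```

The hard part, of course, is not this comparison but producing the two timestamps from pixels, which is exactly the joint reasoning over ball position and clock state that Perceptron highlights.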

Developer platform and open Isaac models

In addition to the API, Perceptron is launching an expanded developer platform with a Python SDK. It offers specialized functions such as "Focus" (automatic zoom and crop of an area based on a text prompt), "Counting" (counting objects in clusters), and "In-Context Learning" (adaptation using a few examples).
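To make the three features concrete, here is a hypothetical sketch of what requests to them might look like. The SDK itself is not publicly documented, so every name below (functions, fields, file names) is an assumption, not Perceptron's actual API:

```python
# Hypothetical request shapes for the three SDK features described
# above. All names and fields are assumed for illustration.

def focus_request(video: str, prompt: str) -> dict:
    """'Focus': auto zoom/crop to the region matching a text prompt."""
    return {"task": "focus", "video": video, "prompt": prompt}

def counting_request(video: str, target: str) -> dict:
    """'Counting': count instances of a target object in the scene."""
    return {"task": "count", "video": video, "target": target}

def icl_request(video: str, examples: list[dict]) -> dict:
    """'In-Context Learning': adapt from a few labeled examples."""
    return {"task": "icl", "video": video, "examples": examples}

req = counting_request("line_cam_03.mp4", target="bottle caps")
print(req["task"], "->", req["target"])
```

The takeaway is the shape of the workflow, not the names: each feature is a task plus a video reference plus a small amount of task-specific guidance (a prompt, a target class, or a handful of examples).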

The company also maintains a dual licensing strategy. The flagship Mk1 is a closed model accessible via API, intended for enterprise deployment. Alongside it, however, exists the Isaac series — an open-weights alternative. The Isaac 0.2-2b-preview model (2 billion parameters) is available on Hugging Face and optimized for sub-200ms latency, making it ideal for edge devices and real-time applications.
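The sub-200ms figure translates directly into a throughput ceiling for a simple sequential capture-then-infer loop on an edge device. A back-of-envelope check (ignoring capture and pre/post-processing time, which would lower the number further):

```python
# What sub-200ms inference latency buys on an edge device: the maximum
# sustainable rate for a sequential capture -> infer loop.
# Back-of-envelope only; ignores capture and pre/post-processing time.

LATENCY_MS = 200  # upper bound quoted for Isaac 0.2-2b-preview

max_fps = 1000 / LATENCY_MS
print(f"<= {LATENCY_MS} ms per call -> up to {max_fps:.0f} inferences/s")
```

Five inferences per second comfortably exceeds the 2 FPS sampling rate Mk1 itself uses, which is presumably why the Isaac line targets this latency class for real-time applications.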

Practical use and Czech context

Perceptron has already announced several partner deployments: automatic clipping of sports highlights, teleoperation of robotic arms, multimodal quality control on production lines, and assistants for smart glasses.

For Czech users and companies, the key point is that the model is available globally via a public demo and API, so it is not geographically restricted. Since this is video analysis, Czech companies should keep the EU AI Act in mind, as it places strict requirements on biometric identification and real-time surveillance. Ordinary analysis of production processes, sports recordings, or social media content, however, falls into a less regulated category.

The model does not yet support Czech as a primary language, but since communication with the API takes place in English, Czech developers can use it without difficulty. How it performs on Czech text appearing in images remains to be verified in practice.

Where Perceptron is heading

Aghajanyan stated that these releases are the culmination of research whose goal is to make AI as good as possible in the physical world. The vision is clear: "physical AI" should be as common as digital AI.

With a price tag that undercuts existing market standards and benchmarks that place it among the absolute top, Perceptron Mk1 could be a breakthrough not only for large technology companies, but also for Czech startups and industrial enterprises looking for affordable AI for video analysis.

Can Perceptron Mk1 work in real time on standard hardware?

Mk1 itself is a cloud model accessible via API, so local hardware requirements are minimal. For edge deployment without an internet connection, the open Isaac 0.2-2b-preview model is intended, which is optimized for sub-200ms response on local devices.

What about personal data protection when analyzing video using Mk1?

Perceptron offers the closed enterprise model Mk1 via API, but has not yet published detailed information about data processing, including GDPR compliance. Companies should therefore verify exactly where data processing takes place and whether it meets EU personal data protection requirements before deployment.

Does Perceptron plan support for Czech and other smaller languages?

The company has not yet officially announced a roadmap for language localization. Given its focus on physical AI and video analysis, the primary communication language is English. The ability to recognize text in images in Czech will depend on training data and will only be shown in practice.
