Skip to main content

Qwen3.7-Plus: Alibaba Introduces an Autonomous Agent That Can "See" and Control Your Computer

Ilustrační obrázek
Alibaba Cloud has just entered the most intense part of the technological race for the future of artificial intelligence. Their new model Qwen3.7-Plus isn't just another improvement to a language model; it's an attempt to create a truly autonomous agent. Unlike typical chatbots that only generate text, Qwen3.7-Plus can interpret visual screen content, control the mouse and keyboard within graphical user interfaces (GUI), and simultaneously execute commands in the terminal. This hybrid approach puts Alibaba head-to-head with market leaders like OpenAI and Anthropic.

Listen to this article:

From text to action: What makes Qwen3.7-Plus different?

Most current models operate on a "question–answer" principle. Even advanced models that can write code still require a human to copy that code, run it, and report back to the model if there's an error. Qwen3.7-Plus closes this loop. The model is built on an architecture that integrates visual perception directly into the agent's decision-making loop.

This means the model doesn't just work with text tokens, but "sees" screenshots of applications, graphical elements, and the structure of web pages. As a result, it can perform "computer use" tasks, where the AI navigates the browser on its own, clicks buttons, or looks up information in desktop applications. This capability is called GUI grounding — the model's ability to precisely identify which pixel on the screen corresponds to a specific control element.

Extreme autonomy: Programming for 11 hours

One of the most stunning proofs of the new model's capabilities is a demo where the agent worked without human intervention for over 11 hours. The result was a complete English vocabulary learning application. During this process, the model:

  • Compiled requirements documentation.
  • Generated more than 10,000 lines of code.
  • Performed installation, created test scenarios, and independently solved GUI (graphical interface) errors.
This process involved over 1,000 individual agent calls, demonstrating the model's ability to sustain complex planning over a long time horizon.

Benchmarks: Where Qwen wins and where it falls short?

To understand how Qwen3.7-Plus stacks up against giants like GPT-5.4, Claude Opus 4.6, or Gemini 3.1 Pro, we need to look at the hard data. According to available tests, the model excels particularly in interface control.

In the ScreenSpot Pro benchmark, Qwen3.7-Plus achieved a score of 79.0, putting it in direct competition with the best models on the market in GUI automation. This metric is critical for the next generation of assistants that will handle our emails, book flights, or configure cloud servers.

However, the model is not perfect. While it dominates in computer control, it still lags behind models like Gemini 3.1 Pro or GPT-5.4 in purely scientific and complex logical tasks (e.g., the MedXpertQA-MM benchmark). Qwen is therefore currently a top-tier "operator," but is still finding its balance in deep abstract reasoning.

Model GUI Grounding (ScreenSpot Pro) Primary Focus
Qwen3.7-Plus 79.0 Autonomous agents, GUI/CLI control
GPT-5.4 (xhigh) High (comparable) Multimodality, complex reasoning
Claude Opus 4.6 Top-tier in code Programming, nuance in text

Price and availability: Alibaba's strategic advantage

From an economic standpoint, Qwen3.7-Plus is very aggressive. Alibaba Cloud offers this model at a fraction of the cost compared to American competitors. While top-tier models from OpenAI or Anthropic can be extremely expensive for companies deploying at massive scale, Qwen3.7-Plus pricing is set as follows:

  • Input (input tokens): $0.40 per 1 million tokens (approx. 9.20 CZK)
  • Output (output tokens): $2.40 per 1 million tokens (approx. 55.20 CZK)

For comparison, its "bigger brother" Qwen3.7-Max is significantly more expensive ($2.50 for input tokens). This pricing policy allows developers and companies to experiment with agent systems that require thousands of consecutive model calls without meaning immediate financial ruin.

What does this mean for the Czech market and EU companies?

For Czech developers and technology companies, Qwen3.7-Plus brings two key opportunities. First, the model is available via API and supports the Anthropic protocol, meaning easy integration into existing tools. Second, its ability to work with the terminal and GUI can significantly reduce automation costs in IT operations (DevOps) and administration.

However, the EU AI Act regulation must be taken into account. Since this is a model from a Chinese provider, companies in the EU must pay attention to how the model processes data, especially when it comes to sensitive information during user interface control. If you plan to deploy Qwen as an autonomous agent for working with corporate data, we recommend using instances within cloud services that comply with European privacy standards.

Czech language availability: Although Qwen is known for its excellent multilingual capability, testing is necessary for specific agent tasks (e.g., clicking buttons in Czech e-commerce systems). The model understands text in many languages, but its ability to "see" and understand the context of the Czech user environment depends on the quality of visual training.

Conclusion

Qwen3.7-Plus is not just another step in the evolution of LLMs. It's a signal that the era of "mere chatbots" is ending and the era of autonomous workers is beginning. With this move, Alibaba aims to democratize access to advanced agents through highly competitive pricing, which could change the market dynamics for European startups and large corporations alike.

Can Qwen3.7-Plus work directly with my computer and see my private data?

The model itself does not have access to your computer. It must be implemented through an application or extension (e.g., "Qwen for Chrome") that, with your explicit permission, provides it with screenshots or terminal access.

Is this model suitable for developers in the Czech Republic?

Yes, thanks to support for the Anthropic API protocol and very low per-token pricing, it is an excellent choice for automating code testing or creating scripts. However, compliance with EU regulations must be monitored when working with sensitive data.

How does it differ from regular ChatGPT?

ChatGPT is primarily a text/multimodal assistant for interaction. Qwen3.7-Plus is designed as an "agent," meaning its goal is not just to answer, but to perform a sequence of actions in real software (clicking, writing code, and immediately executing it).

X

Don't miss out!

Subscribe for the latest news and updates.