
The End of the "Demo Effect"? OpenAI Reveals How Expert Agents Learn from Mistakes and Change the World of Finance

OpenAI has just presented proof that the era of "demo agents" that look great in the lab but fail in practice is coming to an end. The Tax AI project, developed in collaboration with Thrive Holdings, demonstrates a self-improvement mechanism using expert feedback. Instead of endless prompt tuning, agents now use corrections from human accountants as structured data for their own growth.

Most current AI agents behave like enthusiastic but inattentive interns: they are impressive in short demos, but once real stakes and complex data come into play, their reliability plummets. For the finance and tax industries, this uncertainty is unacceptable. That is why OpenAI's Tax AI project is a crucial step towards creating truly autonomous and reliable systems.

How does the "self-correction loop" work?

The key innovation is not the agent's ability to process tax returns itself, but the way it learns. Traditional software development requires an engineer to find an error, analyze it, and then modify the code or instructions. This process is slow and inefficient. Tax AI changes this model using Codex technology.

In practice, it works like this: an accountant (the practitioner) reviews the prepared document and corrects any mistake the agent has made. The correction is not just a text edit; it serves as a structured signal. Codex then analyzes the record of the agent's decision-making (its so-called reasoning traces) and identifies where the logical error occurred, and engineers use these findings to permanently improve the system. This creates a closed loop in which every expert correction feeds back into the system so that the agent does not repeat the same mistake next time.
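To make the idea of a correction as a "structured signal" more concrete, here is a minimal Python sketch. It is not OpenAI's implementation: the article does not describe the actual data format, so the `Correction` record, its field names, the `rank_failure_modes` helper, and the sample values are all hypothetical. The sketch only shows how corrections stored as records, rather than as free text, can be grouped to reveal which step of the agent's reasoning fails most often.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    """One expert correction, stored as data (hypothetical schema)."""
    return_id: str     # which tax return was reviewed
    field: str         # which field the expert changed
    agent_value: str   # what the agent originally entered
    expert_value: str  # what the accountant corrected it to
    trace_step: str    # reasoning-trace step blamed for the error


def rank_failure_modes(corrections):
    """Count corrections per (field, trace_step), most frequent first."""
    counts = Counter((c.field, c.trace_step) for c in corrections)
    return counts.most_common()


# Invented sample data, for illustration only.
corrections = [
    Correction("r1", "wages", "52000", "52500", "read_w2"),
    Correction("r2", "wages", "41000", "41900", "read_w2"),
    Correction("r3", "filing_status", "single", "head_of_household",
               "classify_household"),
]

ranked = rank_failure_modes(corrections)
for (field, step), count in ranked:
    print(f"{field} via {step}: {count} correction(s)")
```

In this toy data the `read_w2` step accounts for two of the three corrections, so it would be the first candidate for a fix. How the real system attributes an error to a trace step is not described in the article.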

Real-world results: From chaos to precision

Collaboration with a network of over 30 accounting firms within the Crete project shows that this method works. During the pilot period, Tax AI processed 7,000 tax returns (types 1040 and 1041). The results are fascinating:

  • Growth Rate: Initially, only 25% of returns achieved 75% accuracy in key fields. In just six weeks, this proportion increased to 86%.
  • Time Savings: Practitioners saved approximately one third of the time spent preparing returns thanks to the tool.
  • Efficiency: Work throughput increased by 50%, allowing accountants to dedicate more time to clients instead of administration.
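
The figures above can be read as simple ratios. The following Python sketch shows one plausible way to compute them; it is an illustration under stated assumptions, not the pilot's actual methodology. It assumes that "accuracy in key fields" means the share of key fields where the agent's value matches the reviewer's final value, and that a return counts if it reaches at least the 75% threshold. The function names and sample returns are hypothetical.

```python
from fractions import Fraction


def field_accuracy(predicted: dict, reviewed: dict) -> float:
    """Share of key fields where the agent matches the reviewer (assumed definition)."""
    if not reviewed:
        raise ValueError("reviewed return has no key fields")
    matches = sum(1 for key, value in reviewed.items()
                  if predicted.get(key) == value)
    return matches / len(reviewed)


def share_meeting_threshold(pairs, threshold=0.75) -> float:
    """Proportion of returns whose field accuracy is at least `threshold`."""
    accuracies = [field_accuracy(p, r) for p, r in pairs]
    return sum(a >= threshold for a in accuracies) / len(accuracies)


def throughput_multiplier(time_saved: Fraction) -> Fraction:
    """Returns completed per hour, relative to baseline, if each return takes less time."""
    return 1 / (1 - time_saved)


# Invented sample: four returns with four key fields each.
truth = {"wages": 100, "interest": 5, "status": "single", "dependents": 0}
pairs = [
    (dict(truth), truth),                                # 4 of 4 fields match
    ({**truth, "interest": 7}, truth),                   # 3 of 4 fields match
    ({**truth, "interest": 7, "wages": 90}, truth),      # 2 of 4 fields match
    ({"wages": 100}, truth),                             # 1 of 4 fields match
]

share = share_meeting_threshold(pairs)
gain = throughput_multiplier(Fraction(1, 3))
print(share, gain)
```

On this sample, two of the four returns reach the threshold. The last function also shows that the two efficiency figures are arithmetically consistent: if each return takes one third less time and working hours stay the same, throughput rises by a factor of 3/2, that is, by 50%. Whether the pilot derived one figure from the other is not stated in the article.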

In competitive terms, common LLMs (such as standard ChatGPT or Claude) can analyze documents, whereas Tax AI uses a specific agent infrastructure built on Codex that focuses on logical chains and data validation, not just text generation. This places it above common chatbot models in tasks requiring extreme precision.

Practical Impact: What does this mean for the Czech Republic and Europe?

Although the pilot program was primarily implemented for American tax forms, the technology itself has enormous potential for the Czech market. Czech tax legislation is known for its complexity and frequent changes. For Czech accounting firms, implementing a similar system would mean:

  1. Automation of "dirty" data: Tax AI can work with unstructured PDFs, spreadsheets, images, and client notes. This is precisely the situation that Czech companies face when converting paper documentation into digital systems.
  2. Availability: Currently, Tax AI is not available as a standalone product for end-users in the Czech Republic. OpenAI offers it more as an enterprise solution or via API. For Czech companies, however, it would mean utilizing Codex/GPT-4o models through specialized local software.
  3. Price: While standard ChatGPT Plus costs 20 USD per month, Tax AI-type systems are designed for the B2B segment, and their price will likely be based on the volume of processed documents (tokens/documents) via API, which may be more cost-effective for large companies than hiring additional human operators.

Regulation and Security in the EU

For European entities, compliance with the EU AI Act is a key issue. Tax systems fall into the category of applications that can have a high impact on financial stability or individual decision-making. This means that any agent similar to Tax AI must meet strict requirements for transparency and human oversight (human-in-the-loop). The good news is that the OpenAI Tax AI model is precisely designed so that an expert performs the final check – AI here is not a substitute for responsibility, but a tool for increasing performance.

Conclusion

The Tax AI project shows that the future of AI agents does not lie in "knowing everything," but in how quickly they can learn from human corrections. If this model expands, the line between software and an intelligent colleague will definitively blur. For the Czech accounting sector, this could be a way to alleviate administrative burden, provided local regulations are successfully integrated into the system.

Can Tax AI prepare a Czech tax return?

In its current phase, the system is optimized for American forms (1040, 1041). For use in the Czech Republic, the model would need to be "retrained" on Czech tax law, which is technically possible thanks to the Codex architecture.

Is my sensitive financial data safe?

For its enterprise solutions, OpenAI guarantees that data submitted for analysis is not used to train its publicly available foundation models without consent. However, when deploying in the EU, it is always necessary to verify that the provider complies with GDPR requirements and local data security regulations.

What is the difference between Tax AI and regular ChatGPT?

ChatGPT is a general assistant for text and queries. Tax AI is a specialized agent that has an integrated mechanism for working with complex documents (PDFs, spreadsheets) and, most importantly, possesses a "feedback loop" that allows it to learn from corrections made by experts.
