Listen to this article:
How agents select tools and where the risk arises
Modern AI agents work with so-called tool registries — central lists of tools they can call. Each tool has metadata: name, function description, parameters, and declared outputs. The agent reads these descriptions, compares them with the task, and decides which tool to use. The problem is that the description is simultaneously a metadata document and a potential instruction. The agent processes it with the same language model that drives its decision-making. This means a manipulated description can influence tool selection just like a direct user command.
Kale in Issue #141 in the CoSAI secure-ai-tooling repository divided threats into two categories: threats at tool selection time (impersonation, metadata manipulation) and threats at execution time (behavior change, runtime contract violation). Tool registry poisoning is thus not a single bug, but an entire family of vulnerabilities across the tool lifecycle.
Why digital signatures are not enough
The logical reaction of enterprise security teams is to reach for proven tools: code signing, SBOM (Software Bill of Materials), SLSA (provenance), or Sigstore. These technologies verify artifact integrity — whether a file is actually what it claims to be and whether it was not modified after publication. That is important, but in the case of AI agents, insufficient.
What tool registries really need is behavioral integrity — a guarantee that the tool behaves as it declares and does not work with anything else. Existing artifact integrity checks do not solve this question. As Kale pointed out in his article on VentureBeat, applying SLSA and Sigstore to agent tool registries and declaring the problem solved would mean repeating the HTTPS certificate mistake from the turn of the millennium: strong guarantees of identity and integrity, while the actual question of trust remains unanswered.
Attack through description and attack through time
Imagine a specific scenario: an attacker publishes a tool with clean code, a valid signature, and a complete SBOM. But in the description, they hide a prompt-injection payload: an instruction that convinces the agent to always prefer this tool over alternatives. The agent reads the description, considers the tool most suitable, and calls it. All artifact integrity checks pass, but the agent decided based on manipulation.
Even more dangerous is so-called behavioral drift — a tool is verified at publication time, but after weeks or months, the operator changes behavior on the server. It starts exfiltrating data, redirecting requests, or returning falsified results. The digital signature hasn't changed, the provenance is valid, but the behavior is completely different. Artifact integrity has no chance of detecting this problem.
Solution: Runtime verification in MCP
Kale proposes an architecture that combines provenance with runtime verification. The core is a verification proxy — a lightweight layer between the MCP (Model Context Protocol) client (agent) and the MCP server (tool). The proxy performs three key checks on every call:
Discovery binding verifies that the called tool matches the tool whose specification the agent previously evaluated and accepted. Prevents bait-and-switch attacks where the server offers one tool during discovery and another during execution.
Endpoint allowlisting means the proxy monitors outgoing network connections of the MCP server and compares them with the declared list of allowed endpoints. If a currency converter declares access to api.exchangerate.host but actually connects to an unknown server, the tool is immediately terminated.
Output schema validation ensures that the proxy checks the tool's response against the declared output schema. It detects unexpected fields or data patterns typical of prompt injection in the response.
The foundation of the entire approach is behavioral specification — a machine-readable declaration similar to permissions in Android, describing which external endpoints the tool contacts, what data it reads and writes, and what side effects it produces. This specification is part of the tool's signed attestation, which protects it from manipulation and enables runtime verification.
According to the author's estimate, adding a lightweight proxy for schema validation and network inspection extends each call by less than 10 milliseconds. Full data analysis is more demanding and suitable for environments with high security requirements, but basic protection should be standard.
What this means for Czech companies
Implementation does not need to be radical. Kale recommends a gradual approach that even smaller teams can adopt without massive investment. Start with endpoint allowlisting during deployment: each tool declares external contact points and the proxy enforces them. You don't need any additional tools beyond a network sidecar.
Next, add output schema validation. Compare returned values with what the tool declared. You will detect data exfiltration and prompt injection hidden in responses.
For tools processing credentials, personal data (PII), or financial information, deploy discovery binding. Riskier tools should undergo full bait-and-switch control. Full behavioral monitoring should then be deployed only where the level of risk justifies the costs.
For Czech companies and institutions, this topic has direct relevance. With the arrival of the EU AI Act, organizations using AI systems in high risk (including autonomous agents) must demonstrate appropriate security measures. Tool registry poisoning represents a new category of risk that existing security frameworks do not address. Czech businesses should ask suppliers of AI agents how they ensure the behavioral integrity of tools that their agents call. Especially if agents work with personal data or sensitive company data, it is not possible to rely solely on digital signatures.
Most tools for AI agents are currently localized to English and primarily hosted in foreign clouds. For the Czech market, this means that data may leave jurisdiction during a routine tool call. Endpoint allowlisting offers at least partial control over where data is physically transferred.
How does artifact integrity differ from behavioral integrity?
Artifact integrity ensures that a tool's code has not been modified and comes from the declared publisher. Behavioral integrity guarantees that the tool behaves exactly as it declares and does not perform hidden operations. For AI agents, behavioral integrity is more critical because a tool can be unchanged and still harmful.
Can Czech companies deploy runtime verification for AI agents today?
Basic forms such as endpoint allowlisting can be implemented relatively easily using existing network sidecars. More advanced solutions like MCP proxy are still in the early stages of standardization within initiatives such as CoSAI. Companies should start with allowlisting and monitor standard development.
How does tool poisoning relate to prompt injection?
Prompt injection in the context of tools means that an attacker inserts manipulative instructions directly into a tool's description or response. The agent processes these instructions as part of its decision-making context and may thus choose harmful tools or pass sensitive data.