Why Saying "Just Do It" Isn't Enough
The demo starts deceptively simply: you upload a Swedish car accident form (17 checkboxes) and a hand-drawn collision sketch to the console. The first prompt is minimalist: "here is the form and sketch, determine what happened and who's at fault."
Claude responds that it was a skiing accident on Chappangan street, which is wrong: the model guessed it was Sweden, but lacked the context to recognize a traffic accident form. This is exactly the moment when, according to Anthropic, real prompt engineering begins: an iterative, empirical process where you test and improve each version of the prompt.
10 Steps of a Professional Prompt
Anthropic internally uses a ten-point structure that the video gradually reveals. Not every prompt needs all ten points — it's more of a checklist of everything you should consider.
1. Task context — tell the model who it is and what it should do
The absolute foundation. Instead of "analyze the form" say: "You are an AI assistant helping an insurance claims handler. You analyze Swedish traffic accident forms." The model needs to understand its role just like a new colleague on their first day at work.
2. Tone context — teach the model when to say "I don't know"
A crucial instruction: "If you're not sure, don't guess. Only answer when you are fully confident." Without this safeguard, the model will make up data that isn't in the form. In insurance, such a hallucination could mean incorrectly assigned fault — at best embarrassment, at worst a lawsuit.
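Here is a minimal sketch of how steps 1 and 2 might look with the Anthropic Python SDK; the model id and the exact wording are illustrative, not taken from the workshop:

```python
# Steps 1-2: task context and tone context live in the system prompt.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = (
    # 1. Task context: who the model is and what it does
    "You are an AI assistant helping an insurance claims handler. "
    "You analyze Swedish traffic accident forms.\n"
    # 2. Tone context: explicit permission to say "I don't know"
    "If you're not sure, don't guess. Only answer when you are fully confident."
)

response = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model id
    max_tokens=1024,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": "Here is the form and sketch ..."}],
)
print(response.content[0].text)
```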
3. Background data — give the model what doesn't change
A key insight from the video: static information belongs in the system prompt. The form always has 17 rows, two columns for vehicle A and B. This never changes, so you put it in the system prompt once — and ideally use prompt caching, which saves tokens and money. Dynamic data (the specific filled-in form) then goes into the user prompt.
In practice this means describing the document structure: "The form is named X, the left column is vehicle A, the right column is vehicle B. Row 1 means X, row 2 means Y…" The more the model knows in advance, the less time it wastes deciphering.
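A sketch of this split, assuming the Anthropic API's prompt-caching syntax (cache_control on a system block); the form description is abbreviated and the row meanings are placeholders:

```python
# Step 3: static background data in the (cached) system prompt,
# dynamic data in the user turn.
import anthropic

client = anthropic.Anthropic()

FORM_DESCRIPTION = (
    "The form is named X. The left column is vehicle A, the right column "
    "is vehicle B. Row 1 means X, row 2 means Y ..."  # placeholder rows
)

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": FORM_DESCRIPTION,
            # Cache the static part so repeated calls reuse it.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    # Dynamic data: the specific filled-in form goes in the user prompt.
    messages=[{"role": "user", "content": "Filled-in form: ..."}],
)
```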
4. Detailed instructions — break it down step by step
Perhaps the most important discovery from the demo: the order of steps matters. When the model first analyzes the confusing hand sketch and only then the structured form, it often struggles. The reverse — form first (clear data), then sketch (interpretation) — works much better.
"First carefully go through the form. Verify every checkbox. Write down what you found. Only then move to the sketch and confront it with what you already know from the form."
💡 Practical tip
When you tell the model to "carefully examine every field," it really will — and will output the entire analysis. If you don't want that, phrase the instruction more loosely: "Go through the form and identify relevant information."
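One way to lay out such ordered instructions in the user prompt, with XML tags separating the sections; the wording paraphrases the workshop and the curly-brace fields are placeholders:

```python
# Step 4: ordered instructions: the structured form comes before the
# ambiguous sketch. {form_transcription} and {sketch_description} are
# placeholders filled in per request.
USER_PROMPT = """\
<instructions>
1. First carefully go through the form and verify every checkbox.
2. Write down what you found.
3. Only then move to the sketch and compare it against what you
   already know from the form.
</instructions>

<form>
{form_transcription}
</form>

<sketch>
{sketch_description}
</sketch>
"""
```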
5. Examples — show what you want
Few-shot prompting is, according to Anthropic, one of the most powerful tools. When you hit a borderline case the model fails on, add it as an example to the system prompt. For an insurance company, this could mean dozens to hundreds of historical accidents with manual evaluations; the model learns to recognize patterns from them.
Wrap the examples in XML tags (<examples>), ideally separating the input data from the expected response.
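A possible shape for such a block; the accidents and verdicts below are invented placeholders, not real claims data:

```python
# Step 5: few-shot examples in XML, input data separated from the
# expected response. Append this to the static system prompt.
EXAMPLES = """\
<examples>
  <example>
    <input>Row 1 checked for vehicle A, row 8 checked for vehicle B.</input>
    <expected_output>Vehicle B is at fault.</expected_output>
  </example>
  <example>
    <input>No rows checked for either vehicle.</input>
    <expected_output>Fault cannot be determined from the form.</expected_output>
  </example>
</examples>
"""
```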
6. Conversation history
For automated systems running in the background (like form processing), conversation history is not used. But for chatbots and assistants, where the user communicates over many turns, history is crucial and must be carried along with every request.
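Continuing the earlier sketch, in the Messages API the previous turns are typically passed back as alternating user/assistant messages; the dialogue below is invented:

```python
# Step 6: conversation history for a chat assistant, resent with
# every request as alternating turns.
history = [
    {"role": "user", "content": "What does row 1 on the form mean?"},
    {"role": "assistant", "content": "Row 1 covers a parked vehicle."},
    {"role": "user", "content": "And who is at fault in that case?"},
]
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=SYSTEM_PROMPT,
    messages=history,  # the new question rides on top of the history
)
```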
7. Reminders — repeat what's important
At the end of the prompt, repeat the key instructions. The model tends to "forget" rules mentioned only at the beginning. Remind it: "If you cannot determine with certainty who caused the accident, explicitly say so. Don't make up data that isn't in the form."
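Continuing the sketch, this can be as simple as appending a reminder block to the end of the assembled prompt:

```python
# Step 7: repeat the safeguards as the last thing the model reads.
REMINDER = """\
Remember:
- If you cannot determine with certainty who caused the accident,
  explicitly say so.
- Don't make up data that isn't in the form.
"""
full_user_prompt = USER_PROMPT + "\n" + REMINDER
```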
8. Output formatting — force the model to structured output
For production deployment you need parseable output. Anthropic recommends:
- XML tags: Wrap the final verdict in <final_verdict> and parse only this block (a parsing sketch follows the list).
- Prefill technique: Start the response for the model: write the opening <final_verdict> tag as the last characters of the prompt and let the model fill in the content.
- JSON / Structured Outputs: For strict schemas, use native structured outputs.
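A sketch of the parsing side, assuming the verdict is wrapped in the <final_verdict> tag from the list above; the regex is a generic extraction pattern, not an Anthropic-specific API:

```python
# Parse only the <final_verdict> block out of the model's response text.
import re

def extract_verdict(text: str) -> str | None:
    match = re.search(r"<final_verdict>(.*?)</final_verdict>", text, re.DOTALL)
    return match.group(1).strip() if match else None

verdict = extract_verdict(response.content[0].text)
```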
9. Pre-filled responses
For older versions of Claude, you could "hint" the beginning of the response to the model and force it to continue in the given format. Newer models (Claude 4.6+) no longer support prefill on the last assistant turn; it has been replaced by Structured Outputs and direct instructions like "Answer without preamble, straight into XML tags."
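For models that still support it, prefill is expressed as a partial assistant turn at the end of the messages list; a sketch continuing the earlier examples, with an illustrative model id:

```python
# Step 9: prefill on an older model: the conversation ends with a
# partial assistant turn, and the model continues from the opening tag.
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative older model id
    max_tokens=1024,
    system=SYSTEM_PROMPT,
    messages=[
        {"role": "user", "content": full_user_prompt},
        # Prefill: the response is forced to start inside the tag.
        {"role": "assistant", "content": "<final_verdict>"},
    ],
)
print("<final_verdict>" + response.content[0].text)
```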
10. Extended thinking
Claude 3.7 and 4.x support hybrid thinking (extended / adaptive thinking) — the model can "think" in an internal scratchpad before answering. This isn't just for complex tasks. It can also be used as a debugging tool: you read how the model reasoned about the data and find where it makes mistakes. Then you incorporate these insights back into the system prompt.
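Enabling and inspecting the scratchpad might look like this, assuming the Anthropic API's thinking parameter; the token budget is illustrative:

```python
# Step 10: extended thinking as a debugging tool: read the thinking
# blocks to see where the model's reasoning goes wrong.
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=8192,
    thinking={"type": "enabled", "budget_tokens": 4096},
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": full_user_prompt}],
)
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking)  # the model's scratchpad
    elif block.type == "text":
        print(block.text)  # the final answer
```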
What to Take Away for Your Own Prompts
Let's summarize practical techniques that work regardless of whether you use Claude, GPT, or Gemini:
- Separate static from dynamic. What doesn't change (document structure, company rules) → system prompt. What changes (specific input) → user prompt.
- Order of steps is critical. Arrange instructions so the model processes clear data before ambiguous ones.
- XML tags work better than Markdown. The model is trained on them and understands them better.
- Teach the model to say "I don't know." Explicit instructions against hallucinations are essential for factual tasks.
- Iterate. Prompt engineering isn't an academic discipline — it's a craft. Test, watch for errors, improve.
The full workshop with code is available on Anthropic's YouTube channel and more detailed documentation in their official prompt engineering guide.
Do I have to use XML tags, or is plain text enough?
For simple prompts, plain text is sufficient. XML tags pay off when the prompt combines multiple types of information: instructions, examples, input data. They help the model keep these parts apart, so it makes fewer mistakes. Plus, XML tags are part of Claude's training data, so it understands them instinctively.
How do I know whether to put something in the system prompt or the user prompt?
A simple rule: if the information doesn't change between individual queries, it belongs in the system prompt. If it changes with each query, it belongs in the user prompt. The system prompt can also be cached, which saves tokens and money — at high volumes, this makes a noticeable difference.
Does this framework work for models other than Claude?
Yes, the vast majority of techniques are universal. Separating system/user prompt, chain-of-thought, few-shot examples, output formatting — all of this works across GPT-5, Gemini 2.5, and open-source models. Only the specific "taste" of the model differs — for example, GPT responds better to Markdown, Claude prefers XML.