Claude Code in massive codebases: How the AI assistant handles millions of lines without getting lost

May 19, 2026 Daniel Cesak

Anthropic has revealed best practices for deploying Claude Code in large enterprises. From monorepos with millions of lines of code to decades-old legacy systems and distributed architectures across dozens of repositories — Claude Code navigates them like an experienced developer. But the key to success is not the AI model itself. It's what you build around it.

How Claude Code Navigates Massive Codebases

Unlike many other AI coding assistants, Claude Code does not use RAG (Retrieval-Augmented Generation) indexing, a method in which the entire codebase is converted into vector embeddings and the relevant parts are retrieved at query time. According to Anthropic, this approach fails in practice for large teams: by the time a developer asks a question, the index can be hours, days, or even weeks out of date. The result is that the AI works with a function the team renamed two weeks ago, or references a module deleted in the last sprint.

Instead, Claude Code uses agentic search. It traverses the file system, reads files, uses grep for precise searching, and follows references across the code — exactly as an experienced software engineer would. It runs locally on the developer's machine and requires no index to build, maintain, or upload to a server. Each instance works with the live version of the code.
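To make the contrast with a prebuilt index concrete, here is a minimal Python sketch of the idea, not Claude Code's actual implementation. The function names `grep` and `follow_reference` and the restriction to `.py` files are assumptions made for illustration. The point is that every query reads the files as they exist at that moment, so a rename is visible immediately.

```python
import re
from pathlib import Path


def grep(root: Path, pattern: str, suffixes: tuple[str, ...] = (".py",)) -> list[tuple[Path, int, str]]:
    """Search the live files under root and return (path, line number, line) matches."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Read the file as it exists right now; nothing is cached or pre-indexed.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path, number, line.strip()))
    return hits


def follow_reference(root: Path, symbol: str) -> list[tuple[Path, int, str]]:
    """Find where a symbol is defined, then every other line that mentions it."""
    definitions = grep(root, rf"^\s*def {re.escape(symbol)}\b")
    mentions = grep(root, rf"\b{re.escape(symbol)}\b")
    # Definitions come first, followed by the remaining usages.
    return definitions + [hit for hit in mentions if hit not in definitions]
```

Calling `follow_reference(Path("."), "charge_card")` (a hypothetical function name) right after a rename simply returns no definition, which is the kind of signal a stale index would hide.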

This approach has one catch, however: it works best when Claude has enough starting context to know where to look. That is where what Anthropic calls the harness comes in: a set of extension components that determines Claude Code's performance more than the model itself.

Harness: 7 Components That Matter More Than the Model

The biggest mistake teams make, according to Anthropic, is focusing purely on model benchmarks. In reality, success is determined by the ecosystem built around the model — the harness — consisting of seven components:

1. CLAUDE.md Files — The Foundation of Everything

Context files that Claude reads automatically at the start of every session. The root file contains the overall picture of the project, while files in subdirectories describe local conventions. They are the one step you cannot skip. It's important to keep them lean: because they are read in every session, cramming everything into them makes them more of a burden than a help.

2. Hooks — Scripts That Improve Setup on the Fly

Most teams see hooks as a tool to prevent Claude from making mistakes. But their real value lies elsewhere: a stop hook can, after a session ends, suggest an update to the CLAUDE.md file based on the work just completed. A start hook can dynamically load context specific to a given team or module. Unlike prompts that Claude merely "reads," hooks enforce rules deterministically.
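As an illustration of what "deterministic" means here, the following Python sketch shows the kind of rule a hook script could enforce. It is a sketch under stated assumptions, not Anthropic's implementation: the payload shape (`tool_input.file_path`), the protected directory names, and the convention that exit code 2 means "block" are all assumptions for this example, so check the Claude Code hooks documentation for the actual contract.

```python
import json

# Hypothetical policy for this sketch: directories that must not be edited by hand.
PROTECTED_PREFIXES = ("generated/", "vendor/")


def check_edit(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message) for a proposed file edit.

    Assumes the target path is found under payload["tool_input"]["file_path"];
    that field name is an assumption of this sketch.
    """
    path = payload.get("tool_input", {}).get("file_path", "")
    normalized = path.replace("\\", "/").removeprefix("./")
    for prefix in PROTECTED_PREFIXES:
        if normalized.startswith(prefix):
            # Non-zero exit code (assumed here: 2) means "block this edit".
            return 2, f"Edits under {prefix} are not allowed; change the source template instead."
    return 0, ""


# Example payload, parsed the same way a script would parse JSON it receives.
sample = json.loads('{"tool_input": {"file_path": "./generated/client.py"}}')
exit_code, message = check_edit(sample)
```

The same input always produces the same decision, regardless of how the prompt was worded. In a real hook script, the payload would presumably be read from standard input and the exit code passed to `sys.exit`.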

3. Skills — On-Demand Expertise

In a massive codebase, you don't need all expertise in every session. Skills leverage the principle of progressive disclosure — specialized knowledge is loaded only when the task requires it. For example, a security audit skill activates only when checking for vulnerabilities, a documentation skill only when code changes require it. Skills can also be restricted to specific directories — the team owning the payment service can bind their deployment skill only to that directory.
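Progressive disclosure and directory scoping can be modeled in a few lines of Python. This is a toy model, not how Claude Code stores or selects skills: the `Skill` fields, the `scope` convention, and the example paths are hypothetical. It only shows that short descriptions stay available up front, while the full instructions are read when a skill is actually selected.

```python
from dataclasses import dataclass
from pathlib import Path, PurePosixPath


@dataclass(frozen=True)
class Skill:
    name: str
    description: str   # short summary, cheap to keep in context at all times
    body_path: str     # full instructions, read only when the skill is selected
    scope: str = ""    # directory the skill is bound to ("" means anywhere)


def applicable_skills(skills: list[Skill], working_dir: str) -> list[Skill]:
    """Keep only skills whose scope contains the current working directory."""
    cwd = PurePosixPath(working_dir)
    result = []
    for skill in skills:
        scope = PurePosixPath(skill.scope) if skill.scope else None
        if scope is None or cwd == scope or scope in cwd.parents:
            result.append(skill)
    return result


def summaries(skills: list[Skill]) -> str:
    """What sits in context up front: one line per skill, not the full body."""
    return "\n".join(f"{s.name}: {s.description}" for s in skills)


def load_body(skill: Skill, root: Path) -> str:
    """Read the full instructions on demand, once the task calls for the skill."""
    return (root / skill.body_path).read_text(encoding="utf-8")
```

With a deployment skill scoped to `services/payments`, a session started in `services/search` would never see it, while a session in `services/payments/api` would.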

4. Plugins — Distributing What Works

One of the biggest problems in large teams: good configurations remain isolated to individuals. A plugin packages skills, hooks, and MCP configurations into a single installable bundle. A new developer installs the plugin on day one and immediately has the same capabilities as more experienced colleagues. Updates can be distributed via a managed marketplace within the organization.

5. LSP Integration — Symbol-Level Navigation

The Language Server Protocol (LSP) is the technology that lets your IDE "jump to definition" or "find all references." When Claude Code also gets access to it, it gains the ability to distinguish identically named functions in different languages and precisely follow references. Without LSP, Claude searches by text — and in a million-line codebase, it will run into thousands of false matches. For multi-language codebases, Anthropic considers this one of the most valuable investments.
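The gap between text search and symbol-level navigation is easy to demonstrate. The Python sketch below uses the standard `ast` module as a stand-in for a language server, which is an assumption made to keep the example self-contained; a real LSP server works across files and languages. The sample source and the helper names are hypothetical.

```python
import ast
import re

SOURCE = '''
def process(order):
    """Process an order."""
    return order

class Report:
    def process(self):
        # process the report, not an order
        return "processed"

result = process({"id": 1})
'''


def text_matches(source: str, name: str) -> int:
    """Count every textual occurrence, the way a plain-text search would."""
    return len(re.findall(rf"\b{re.escape(name)}\b", source))


def definitions(source: str, name: str) -> list[tuple[str, int]]:
    """Return (qualified name, line) for each function actually defined with that name."""
    found = []

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                visit(child, f"{prefix}{child.name}.")
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if child.name == name:
                    found.append((f"{prefix}{child.name}", child.lineno))
                visit(child, f"{prefix}{child.name}.")
            else:
                visit(child, prefix)

    visit(ast.parse(source), "")
    return found
```

Here `text_matches(SOURCE, "process")` counts comments and call sites alongside definitions, while `definitions(SOURCE, "process")` separates the module-level function from `Report.process`. Scale the same ambiguity up to a million lines and the value of symbol-level answers becomes clear.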

6. MCP Servers — Gateway to Internal Tools

Model Context Protocol (MCP) servers allow Claude to connect to internal tools, data sources, and APIs it wouldn't otherwise be able to reach — from internal documentation to ticketing systems to analytics platforms.

7. Subagents — Separating Exploration from Editing

A subagent is an isolated instance of Claude with its own context window that receives a task, completes it, and returns only the final result. Some teams use a read-only subagent to map out a subsystem and write findings to a file before the main agent starts editing.
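The isolation can be pictured with a small Python analogy. This is not how subagents are implemented: the function names, the task string, and the findings format are hypothetical. It only illustrates the pattern described above, where a read-only helper does the bulky reading, writes its findings to a file, and hands back a single short result.

```python
from pathlib import Path


def explore_subsystem(root: Path, findings_file: Path) -> str:
    """Read-only exploration: scan files, write findings to a file, return a short summary."""
    scratch = []  # the helper's private working context; never returned to the caller
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        scratch.append((path.relative_to(root).as_posix(), text))

    lines = [f"{name}: {len(text.splitlines())} lines" for name, text in scratch]
    findings_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return f"Mapped {len(scratch)} files; details in {findings_file.name}"


def main_agent(root: Path, findings_file: Path) -> list[str]:
    """The main context receives only the one-line result, not the file contents."""
    context = ["task: refactor the payments module"]
    context.append(explore_subsystem(root, findings_file))
    return context
```

However many files the helper reads, the main context grows by one line, and the detailed findings remain available on disk if the main agent needs them.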

Three Configuration Patterns That Work

Creating a Navigable Codebase

Anthropic recommends several specific practices:

Keep CLAUDE.md files lean and layered. The root file contains only critical information and references. More detailed instructions belong in subdirectories.
Initialize in subdirectories, not in the repository root. Claude automatically traverses the directory tree upward and loads all CLAUDE.md files, so you never lose the root context.
Specify test and lint commands for each subdirectory separately. Running the entire test suite when changing a single service wastes both time and context window space.
Use .ignore files to exclude generated files, build artifacts, and third-party code from what Claude sees. Exclusion rules placed in .claude/settings.json are committed with the repository, so the entire team shares them.
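The upward traversal mentioned in the list above can be sketched in Python as follows. This is an illustration of the described behavior, not Claude Code's actual loader: the function name is hypothetical, and returning the root file first is an assumption about ordering.

```python
from pathlib import Path


def collect_context_files(start: Path, repo_root: Path, filename: str = "CLAUDE.md") -> list[Path]:
    """Walk from start up to repo_root and return the context files found, root first."""
    start = start.resolve()
    repo_root = repo_root.resolve()
    if start != repo_root and repo_root not in start.parents:
        raise ValueError("start must be inside repo_root")

    found = []
    current = start
    while True:
        candidate = current / filename
        if candidate.is_file():
            found.append(candidate)
        if current == repo_root:
            break
        current = current.parent
    # Reverse so that general guidance (root) precedes local conventions (assumed order).
    return list(reversed(found))
```

Starting in a hypothetical `services/payments/api` directory, the walk picks up the payment team's file and the root file, which is why a session started deep in the tree still has the project-wide picture.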

Active Configuration Maintenance

As models evolve, instructions written for an older version can hold a newer model back. A rule that helped an older model stay on track may prevent a newer one from making coordinated edits across files. Anthropic recommends a thorough configuration review every 3–6 months and after every major model release.

Ownership and Adoption

Claude Code spread fastest in companies that invested in the infrastructure before the broad rollout. A small team (sometimes even one person) prepared the tooling so that Claude Code worked well for developers from their very first session. A new role is also emerging: the agent manager, a hybrid between a product manager and an engineer who manages the Claude Code ecosystem across the organization.

Anthropic also warns against a purely bottom-up approach without centralization: enthusiasm spreads quickly, but without someone to unify conventions and maintain shared skills and plugins, know-how remains fragmented and adoption hits a ceiling.

What This Means for Czech Developers and Companies

Claude Code is available globally, including in the Czech Republic and the European Union. It works as a CLI tool that developers install locally and needs no codebase index uploaded to a server, which is appreciated not only by companies subject to GDPR or working with sensitive data. The code Claude reads is, however, still sent to Anthropic's cloud for processing with each query. Pricing starts at $20 per month (Claude Pro), the Team plan costs $25 per user per month, and the Enterprise solution has custom pricing.

For Czech software houses that often manage extensive legacy systems — whether in Java, C#, or PHP — the insights from this article are highly relevant. Anthropic explicitly mentions that Claude Code performs better than most teams expect, especially in languages like C, C++, C#, Java, and PHP. Investing in proper CLAUDE.md file setup and LSP integration can mean the difference between a frustrating tool and a productive partner.

Is Claude Code suitable for smaller teams too, or only for corporate deployment?

The described practices scale both ways. Even a three-person team can benefit from a well-written CLAUDE.md file and basic hooks. The difference is that a smaller team doesn't need to solve plugin distribution or marketplace management — a shared configuration in Git is enough.

Do I need to migrate to a different version control system or change my project structure for Claude Code?

No. Claude Code is designed for conventional developer environments — Git, standard directory structures. It works with what you already have. Anthropic only notes that extremely non-standard environments (such as game engines with large binary assets or legacy systems on a VCS other than Git) may require additional configuration.

How does Claude Code handle GDPR and sensitive data in code?

Claude Code runs locally on the developer's machine and does not require uploading a complete codebase index to a server. Communication with the Claude API is encrypted, but the code itself is sent to Anthropic's cloud for processing with each query. For companies working with strictly regulated data (banking, healthcare), it is therefore advisable to consult the terms of the Enterprise plan, which offers more advanced data management options.