
DAAF: Open-source framework that turns Claude Code into a meticulous researcher. And it's free

Brian Heseung Kim, a PhD student at the University of Virginia, has created DAAF, a free open-source framework that turns Anthropic's Claude Code into a rigorous research assistant for data analysis. In just a few months, the tool has gained over 1,300 users and 197 GitHub stars, and workshops on it have been held at institutions including Stanford and Northwestern University. What can it do, and why should Czech researchers care?


What is DAAF and why was it created

The Data Analyst Augmentation Framework (DAAF) is a free open-source tool that acts as an "intelligent lab manager" for Anthropic's Claude Code. Its author, Brian Heseung Kim, built it on a simple premise: large language models can write analytical code and process data, but they also hallucinate, cut corners, and sound more confident than the evidence warrants. DAAF systematically counters these weaknesses by making Claude behave like a careful, accountable researcher.

Kim has been working on the responsible use of AI in research since 2018, well before ChatGPT was released, and devoted his dissertation to teaching others how to use AI tools responsibly. DAAF is the product of those years of research and rests on five principles: transparency, scalability, rigor, reproducibility, and accountability.

How DAAF works in practice

DAAF is installed with a single command from the terminal (macOS/Linux) or PowerShell (Windows) and runs in a Docker container that isolates Claude from the rest of your file system, which protects sensitive data from unwanted access. Once it is running, you talk to Claude Code in natural language: describe what you want to analyze, and DAAF chooses the most suitable workflow type on its own.

The framework offers nine work modes, from quick consultations to dataset searches to a complete research pipeline:

  • Ad Hoc Collaboration — informal brainstorming, help with code or methodology
  • Data Lookup — quick queries on datasets (e.g. "What socioeconomic variables are in the College Scorecard?")
  • Data Onboarding — teaches Claude to understand your own data and documentation
  • Data Discovery — connects information across different data sources
  • Full Pipeline — from research question to finished analysis with a report
  • Revision & Extension — improves and expands existing analyses
  • Reproducibility Verification — verifies that the analysis is fully reproducible
  • Framework Development — extends DAAF itself with new methods and libraries
  • User Support — helps with installation, Docker setup, and technical questions
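To make the idea behind the Reproducibility Verification mode more concrete, here is a minimal Python sketch of one common way to check reproducibility: run the same seeded analysis twice and compare SHA-256 fingerprints of its serialized results. The function names (`run_analysis`, `fingerprint`, `verify_reproducible`) and the toy bootstrap are hypothetical illustrations, not DAAF's code; DAAF's own verification is carried out by its agents and may work quite differently.

```python
import hashlib
import json
import random
import statistics


def run_analysis(seed: int) -> dict:
    """Hypothetical stand-in for an analysis script: a seeded bootstrap of a mean."""
    rng = random.Random(seed)  # private RNG, so results depend only on the seed
    data = [12.0, 15.5, 9.8, 14.2, 11.1, 13.7]
    boot_means = []
    for _ in range(200):
        resample = [rng.choice(data) for _ in data]
        boot_means.append(statistics.mean(resample))
    return {
        "mean": statistics.mean(data),
        "bootstrap_se": statistics.stdev(boot_means),
    }


def fingerprint(result: dict) -> str:
    """SHA-256 of a canonical JSON serialization (sorted keys) of the results."""
    canonical = json.dumps(result, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_reproducible(seed: int, runs: int = 2) -> bool:
    """Re-run the analysis and report whether every run yields identical output."""
    digests = {fingerprint(run_analysis(seed)) for _ in range(runs)}
    return len(digests) == 1
```

Comparing hashes rather than eyeballing numbers catches even tiny differences. The flip side is that legitimately nondeterministic steps (for example, unseeded randomness or parallel floating-point reductions) will also be flagged and need to be seeded or rounded first.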

What sets DAAF apart from ordinary AI use

The key innovation of DAAF is agentic orchestration. During a Full Pipeline analysis, DAAF does not run everything in a single thread. Instead, it automatically splits the work among specialized assistants: one writes code, another checks it adversarially (looking for errors, methodological gaps, and logical missteps), and a third verifies reproducibility. The human always stays in control, because at each stage DAAF asks for your approval before proceeding.
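The division of labor described above can be pictured with a small Python sketch: three stand-in stages (writer, adversarial reviewer, reproducibility verifier) with a human approval gate after each one. Everything here is hypothetical. In DAAF the stages are LLM sub-agents coordinated by Claude Code, whereas these functions are deterministic toys, and the `approve` callback stands in for the researcher's decision.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Artifact:
    code: str
    notes: List[str] = field(default_factory=list)


def write_code(task: str) -> Artifact:
    # Stand-in for the code-writing sub-agent (an LLM call in a real system);
    # this toy ignores the task text and always returns the same draft.
    return Artifact(code="mean = sum(values) / len(values)")


def adversarial_review(art: Artifact) -> Artifact:
    # Stand-in for the reviewer: probe the draft for one well-known pitfall.
    if "/ len(values)" in art.code and "if values" not in art.code:
        art.notes.append("reviewer: division fails on an empty list")
    return art


def check_reproducibility(art: Artifact) -> Artifact:
    # Stand-in for the verifier: execute the toy code twice on the same input.
    # (exec is acceptable here only because the code string is our own constant.)
    outputs = []
    for _ in range(2):
        scope = {"values": [2.0, 4.0, 6.0]}
        exec(art.code, {}, scope)
        outputs.append(scope["mean"])
    if outputs[0] != outputs[1]:
        art.notes.append("verifier: results differ between runs")
    return art


Approver = Callable[[str, Artifact], bool]


def run_pipeline(task: str, approve: Approver) -> Optional[Artifact]:
    """Run writer -> reviewer -> verifier, stopping if the human rejects a stage."""
    art = write_code(task)
    if not approve("write", art):
        return None
    for stage_name, stage in (("review", adversarial_review),
                              ("verify", check_reproducibility)):
        art = stage(art)
        if not approve(stage_name, art):  # human gate after every stage
            return None
    return art
```

In an interactive session, `approve` could be something like `lambda stage, art: input(f"Approve {stage}? [y/n] ") == "y"`, so a rejection at any gate stops the run before the next stage starts.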

Another safeguard against hallucinations is an extensive library of verified reference guides (agent skills) that DAAF dynamically loads based on context. Instead of Claude relying on its "general knowledge," it works with specific citations and methodological procedures — from causal inference through geospatial analysis to machine learning.
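As a rough illustration of context-based loading, the Python sketch below picks the reference guides whose descriptions share the most words with a request. The keyword-overlap scoring is a deliberately simple stand-in chosen for this example: in Claude Code, the model itself decides which skills are relevant, and the skill names and descriptions below are invented.

```python
import re
from typing import Dict, List, Set

# Hypothetical skill index: guide name -> short description (invented examples).
SKILLS: Dict[str, str] = {
    "did-guide": "difference-in-differences parallel trends event study panel",
    "rdd-guide": "regression discontinuity cutoff running variable bandwidth",
    "geo-guide": "geospatial maps shapefiles geopandas projections",
    "polars-guide": "polars dataframe joins group_by lazy queries",
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into word-like tokens (hyphens kept)."""
    return set(re.findall(r"[a-z0-9_-]+", text.lower()))


def select_skills(request: str, skills: Dict[str, str], k: int = 2) -> List[str]:
    """Return up to k guide names ranked by word overlap with the request."""
    words = tokenize(request)
    scored = []
    for name, description in skills.items():
        overlap = len(words & tokenize(description))
        if overlap:  # only guides that share at least one word are candidates
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]
```

For example, `select_skills("Estimate a difference-in-differences with parallel trends", SKILLS)` returns `["did-guide"]`. The point of the pattern is that only the relevant guides enter the context, rather than the whole library.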

Methodological scope: from regression discontinuity to geospatial analysis

DAAF comes with pre-built support for more than twenty statistical methods including difference-in-differences, instrumental variables, regression discontinuity, propensity score matching, time series analysis, cluster analysis, and algorithmic fairness assessment. On top of that, it offers expert knowledge of libraries such as polars, pyfixest, statsmodels, scikit-learn, geopandas, plotly, SHAP and many others.
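For readers new to one of the listed methods, the core arithmetic of the classic two-group, two-period difference-in-differences design fits in a few lines: the estimate is the change in the treated group's mean minus the change in the control group's mean. The sketch below uses made-up numbers and plain Python; it is not DAAF output.

```python
from statistics import mean
from typing import Sequence


def did_2x2(treat_pre: Sequence[float], treat_post: Sequence[float],
            control_pre: Sequence[float], control_post: Sequence[float]) -> float:
    """Classic 2x2 difference-in-differences estimate from group means."""
    treated_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    # The control group's change proxies for what would have happened to the
    # treated group without treatment (the parallel-trends assumption).
    return treated_change - control_change


# Made-up outcomes for illustration only.
estimate = did_2x2(
    treat_pre=[10.0, 12.0], treat_post=[15.0, 17.0],
    control_pre=[9.0, 11.0], control_post=[11.0, 13.0],
)
```

With these toy numbers the treated mean rises by 5 and the control mean by 2, so the estimate is 3. In practice one would fit the equivalent regression (outcome on treatment, period, and their interaction, e.g. with pyfixest or statsmodels), which gives the same point estimate in this simple setting along with standard errors.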

Another useful feature is support for translating code from R/tidyverse and Stata into Python, so researchers used to those tools can move into the DAAF ecosystem without having to learn Python from scratch.
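To give a flavor of such a translation, here is a common tidyverse pipeline (with a roughly equivalent Stata command) shown as comments, followed by one possible Python equivalent. For a dependency-free example, this sketch uses only the standard library; DAAF lists polars among its supported libraries, so its own translations would likely look different. The data and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable

# R/tidyverse original:
#   df %>%
#     filter(year >= 2015) %>%
#     group_by(state) %>%
#     summarise(mean_income = mean(income))
#
# Roughly equivalent Stata:
#   collapse (mean) mean_income=income if year >= 2015, by(state)


def mean_income_by_state(rows: Iterable[dict], min_year: int = 2015) -> Dict[str, float]:
    """Plain-Python equivalent of the pipeline above; each row is a dict."""
    groups = defaultdict(list)
    for row in rows:
        if row["year"] >= min_year:                     # filter(year >= 2015)
            groups[row["state"]].append(row["income"])  # group_by(state)
    # summarise(mean_income = mean(income)), returned sorted by state
    return {state: mean(incomes) for state, incomes in sorted(groups.items())}


rows = [
    {"state": "VA", "year": 2014, "income": 50.0},
    {"state": "VA", "year": 2016, "income": 60.0},
    {"state": "VA", "year": 2018, "income": 70.0},
    {"state": "CA", "year": 2020, "income": 80.0},
]
summary = mean_income_by_state(rows)
```

Keeping the original R verbs as comments next to each Python step makes it easy for a researcher to check the translation line by line.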

Who uses DAAF and what it costs

DAAF is completely free and open-source under the LGPL-3.0 license. Brian Kim explicitly rules out premium tiers or "bait-and-switch" models; the framework is meant to serve as a public good for the research community. The only cost is access to Claude itself: an Anthropic Max subscription at $100–200 per month (approximately 2,200–4,400 CZK), or alternatively an API key with per-token billing.

In less than a year since launch, DAAF has gained over 1,300 unique users and 197 GitHub stars. Workshops have been held at institutions such as Northwestern University, University of Virginia, Stanford University, Bowdoin College, and the Urban Institute. The community gathers on Discord, and the framework has 328 commits from contributors.

What this means for Czech researchers

For the Czech academic scene — from sociologists to economists to data analysts in public administration — DAAF represents an interesting opportunity. The framework is free, runs on a standard computer with Docker, and its methodological scope covers most techniques used in the social sciences. No special hardware is needed — computations run locally in Python, with Claude only handling the "thinking."

Language is, of course, a consideration: DAAF communicates in English, and Claude Code does not officially support Czech. For researchers with at least basic English, however, this should not be an obstacle, especially since analytical code and documentation are written in English as standard even at Czech universities.

Also worth noting is the GUIDE-LLM reporting standard, an internationally recognized checklist for transparently disclosing AI use in research, which DAAF fills in automatically. If you publish in international journals, this can make it easier to meet their AI disclosure requirements.

Security and reproducibility come first

DAAF runs in a Docker container, meaning Claude has no access to your personal files outside the designated working directory. The framework also includes protection against destructive commands (e.g. rm -rf) and credential scanning. Every analytical session is automatically versioned via Git, and a complete transcript including all of Claude's "thoughts" is archived for later review.
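The two safeguards mentioned here, blocking destructive commands and scanning for credentials, are often implemented as pattern checks that run before a command executes or before a file is committed. The Python sketch below shows that idea with a few regular expressions; the patterns are illustrative assumptions, far from exhaustive, and do not reflect DAAF's actual rules.

```python
import re
from typing import List

# Illustrative deny-list for shell commands (not exhaustive).
DESTRUCTIVE_PATTERNS = [
    # rm with both a recursive (r/R) and a force (f) flag in one cluster, e.g. -rf, -fr
    re.compile(r"\brm\s+-(?=[a-zA-Z]*[rR])(?=[a-zA-Z]*f)[a-zA-Z]+"),
    # force-pushing over remote Git history
    re.compile(r"\bgit\s+push\b.*--force\b"),
]

# Illustrative credential patterns.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "secret assignment": re.compile(
        r"(?i)\b(api_key|secret|password|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def is_destructive(command: str) -> bool:
    """Return True if the command matches any deny-list pattern."""
    return any(p.search(command) for p in DESTRUCTIVE_PATTERNS)


def find_credentials(text: str) -> List[str]:
    """Return the labels of all credential patterns found in the text."""
    return [label for label, p in SECRET_PATTERNS.items() if p.search(text)]
```

Pattern matching like this is only a coarse first line of defense and is easy to evade (for instance, `rm -r -f` written with separate flags is not caught above), which is one reason the Docker isolation described here matters as a second layer.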

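For researchers working with sensitive data (e.g. health data or personal respondent information), this matters: computations run locally in the container, so datasets do not have to be uploaded wholesale to a third-party cloud. Keep in mind, though, that Claude itself runs on Anthropic's servers, so anything Claude reads or prints during a session (code, outputs, or samples of the data) is sent to Anthropic for processing.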

Availability and how to get started

Installation requires Docker Desktop (free for personal, educational, and small-business use) and an Anthropic account with a Max subscription or an API key. After running the installation script, you are ready within minutes:

  • macOS/Linux: curl -fsSL https://raw.githubusercontent.com/DAAF-Contribution-Community/daaf/main/scripts/host/install.sh | bash
  • Windows: irm https://raw.githubusercontent.com/DAAF-Contribution-Community/daaf/main/scripts/host/install.ps1 | iex

Complete documentation is available at daaf.openaugments.org, including video tutorials and an interactive guide to the anatomy of a complete analysis. The community is active on Discord and the DAAF Field Guide blog is published on Substack.

Do I need to know Python to use DAAF?

Not necessarily. DAAF communicates with you in plain English and writes and checks all the code itself; for basic analyses, you only need to describe what you want to find out. For advanced use and for reviewing outputs, however, knowing Python (or R/Stata, which DAAF can translate from) is a big advantage, because the framework is designed so that the human researcher remains the final arbiter of all analytical decisions.

Can DAAF be used with AI models other than Claude?

Officially, DAAF is built for Claude Code by Anthropic. However, the documentation mentions that most features can be ported to other agent platforms such as Gemini CLI, OpenAI Codex, or OpenCode — though this requires technical knowledge of the given platform. The community on GitHub welcomes contributions in this direction.

Is DAAF suitable for commercial research or only for academics?

DAAF is designed primarily for academic and public-interest research, but the LGPL-3.0 license also permits commercial use. You can use it internally within a company, modify it as needed, and build your own tools on top of it. The license's obligations apply mainly when you distribute the software: if you distribute a modified version of the framework itself, those modifications must also be released under the LGPL-3.0.
