Over the past month, I used Claude Code (Anthropic's AI coding assistant) to help build a large R-based research pipeline. Survey data from multiple sources, latent variable models, multiple imputation, regression analysis. The kind of project that grows to 40+ R files and hundreds of variables scattered across config files and model specifications.
Claude Code logs every session as a structured JSON transcript. 1,237 sessions over 33 days, across all my research projects. I wrote a Python script to extract every R error from those transcripts, parsing stderr output and matching error patterns. 713 unique errors across 168 sessions.
I expected to find exotic failures — hallucinated function names, impossible syntax, the kind of mistakes only an AI would make. Instead, I found my own mistakes staring back at me.
## The Numbers
86% of sessions produced zero R errors. The AI is not constantly breaking things. But those 713 errors across the remaining 14% tell a story, and the story is not about AI.
| Error Category | Count | % | In Plain English |
|---|---|---|---|
| Variable/object not found | 160 | 22% | Renamed in one file, forgot another |
| Type and dimension errors | 137 | 19% | haven_labelled, subscript out of bounds, factor/numeric confusion |
| File path and IO errors | 123 | 17% | Spaces in paths, wrong root directory, corrupt files |
| Package and API issues | 65 | 9% | Missing package, API renames, Quarto/igraph |
| Infrastructure failures | 51 | 7% | HPC workers, targets lock, C++ compilation |
| Syntax errors | 47 | 7% | Escape characters, unexpected symbols |
| Runtime logic errors | 44 | 6% | if() on NA, pipe chain failures, grid/ggplot |
| Wrong function arguments | 29 | 4% | Unused arguments, misspecified options |
| Missing function/method | 25 | 4% | Function removed or not exported |
| Numerical/convergence | 24 | 3% | Singular matrix, memory limits, hIRT initialization |
| Known pitfalls (mice) | 7 | 1% | Documented bugs, hit repeatedly |
The top three categories — wrong names, wrong types, wrong paths — account for 58% of all errors. Every R programmer will recognize them immediately. They share a single root cause: R gives you no feedback about code correctness until the code actually runs. This is what the rest of this post is about.
## R Has No Compiler (And That Changes Everything)
The single largest category: 160 naming errors, 22% of the total. The AI renames a variable in one file — say, changing income_level to income_percentile in the data cleaning script — and misses a reference downstream. A regression formula, a config file, a visualization three pipeline steps later.
```
x Column `income_level` doesn't exist.
Error: object 'birthyear' not found
```
This is not an AI-specific problem. It is the most common mistake every R programmer makes. I've done it myself more times than I'd like to admit. But it reveals something fundamental about why R is different from the languages where AI coding assistants shine.
In Python, referencing an undefined name raises NameError immediately. In Go or Rust, the compiler refuses to build. In R, your pipeline hums along for thirty minutes — loading data, fitting models, running imputation — and only crashes at minute 31 when it finally reaches the line with the wrong name.
```r
# This runs for 30 minutes before failing
data    <- load_all_surveys()          # 2 min
data    <- harmonize_variables(data)   # 3 min
models  <- fit_irt_models(data)        # 10 min
imps    <- run_mice(models, m = 20)    # 15 min
results <- run_regressions(imps)
# Error: object 'income_level' not found   # minute 31
```
The AI has no way to "compile" R code. It writes the edit, it looks correct, and nothing validates it until runtime. This is why naming errors were the only category that persisted across the entire month. Type errors were fixed once and stayed fixed. Path issues were resolved and didn't return. Only naming errors kept recurring, from the first week to the last day, because R's lack of static analysis is a permanent condition.
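The gap fits in a few lines. This toy snippet (with income_level standing in for the renamed column) shows that R accepts a function referencing a name that exists nowhere, and only objects when the call executes:

```r
# R defines this function without complaint; nothing checks that
# income_level exists anywhere in the project
finalize <- function() {
  mean(income_level)   # stale name: renamed upstream
}

# The failure surfaces only when the call finally runs:
finalize()
# Error in mean(income_level) : object 'income_level' not found
```

Python would raise NameError the moment the bare name was evaluated; a compiled language would refuse to build. R defers the check to the last possible moment.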
### Chronic vs. acute
Most error categories follow an "acute" pattern: they cluster around a specific refactoring, get fixed, and don't return. The haven_labelled type errors appeared in late January, were resolved, and never came back. The mice configuration bugs, same story. Only naming errors are chronic — a permanent condition of working in a language with no compile step.
## Your Data Is Lying About Its Type
If you're a quantitative social scientist, this section is about you specifically.
137 errors — 19% of the total — came from type and dimension mismatches. The most revealing variety involves survey data. When you load a Stata .dta file with haven::read_dta(), what you get back is not a normal data frame. The columns carry invisible metadata — value labels, variable labels, format specifications — encoded as a special haven_labelled type. It looks like a numeric vector. It prints like one. It isn't one.
Basic arithmetic fails. bind_rows() fails. The data looks numeric but doesn't behave numerically. The fix is a single function call — haven::zap_labels() — but you have to know to call it immediately after loading, before the labelled values propagate through the pipeline. Three files and two hours later, something breaks with an incomprehensible type error, and the root cause is a data type that never should have existed past line 10 of your cleaning script.
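Here is a minimal demonstration of the trap and the fix, constructing the labelled column directly so the example runs without an actual .dta file:

```r
library(haven)
library(dplyr)

# What read_dta() hands you for a labelled Stata column,
# built here by hand so the example is self-contained
x <- labelled(c(1, 2, 9), labels = c(low = 1, high = 2, refused = 9))
typeof(x)   # "double" -- stored as plain numbers
class(x)    # "haven_labelled" (plus vctrs classes) -- invisible metadata

# The defensive pattern: zap labels immediately after loading,
# before the special class can propagate through the pipeline
clean <- tibble(income = x) |>
  mutate(across(where(is.labelled), zap_labels))
class(clean$income)   # back to a plain numeric vector
```

The `across(where(is.labelled), zap_labels)` line is the one worth memorizing: it strips every labelled column in one pass, right at the boundary where Stata data enters R.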
This is a trap that only affects people who work with survey data stored in Stata or SPSS format — which is to say, a large fraction of quantitative social scientists. The AI doesn't know about haven_labelled unless it's been told. It sees a data frame, assumes it's a normal data frame, and writes perfectly reasonable code that fails for invisible reasons.
The broader category includes other R type traps: subscript-out-of-bounds after unexpected filtering, factor/numeric confusion when columns were silently coerced, dimension mismatches after merges that dropped rows. All invisible until runtime. All correct-looking code meeting incorrect-at-runtime data.
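The factor trap in particular fits in three lines. A factor built from numeric-looking strings silently converts to its internal level codes, not its values:

```r
f <- factor(c("10", "20", "30"))
as.numeric(f)                 # 1 2 3 -- the level codes, not the values
as.numeric(as.character(f))   # 10 20 30 -- the actual values
```

No error, no warning. Just wrong numbers flowing quietly into the next model.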
## The Gap Between Your Code and Your Computer
174 errors — 24% of the total — have nothing to do with analytical logic. They come from the messy reality of how research computing actually works.
The biggest sub-category: file paths (123 errors). If you're on macOS and your project lives in Dropbox, you probably have a path like:
```
/Users/you/Dropbox/My Research Project/code/R/model.R
```
That space in "My Research Project" is a ticking time bomb. Inside R, here::here() handles it fine. But the moment a path gets passed to a shell command via system() or Rscript, the space breaks things:
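A sketch of the failure mode and the standard fix, using base R's shQuote():

```r
path <- "/Users/you/Dropbox/My Research Project/code/R/model.R"

# Breaks: the shell splits the unquoted argument at the space,
# so Rscript receives "/Users/you/Dropbox/My" as its file
system(paste("Rscript", path))

# Works: shQuote() wraps the path in shell-safe quotes
system(paste("Rscript", shQuote(path)))
```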
The AI writes the system() call without quoting the path — the same mistake I've made dozens of times. The fix is always the same: wrap the path in quotes. But the AI doesn't retain this lesson across sessions, because each new system() call is a new context.

A second path trap: R's here::here() function can resolve to the wrong project root when a renv.lock file sits in a subdirectory. In our project, this sent every file lookup to code/Data/ instead of Data/. A symlink fixed it, but the AI had to rediscover the problem. Multiple times.
Then there's infrastructure (51 errors): the code is correct, but the environment fails. HPC workers crash mid-computation. Network connections drop between nodes. The targets pipeline manager accumulates stale lock files that block new runs. C++ dependencies fail to compile on a remote cluster's older toolchain. None of this is the AI's fault. But the AI cannot distinguish "my code is wrong" from "the infrastructure had a hiccup" — both produce identical-looking errors in the log, and both send it down the wrong debugging path.
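One mitigation that helps with the diagnosis, shown as a minimal sketch (with_retries is a hypothetical helper, not code from the project): retry the failed step a few times before treating the error as a bug. A transient HPC hiccup usually succeeds on the second attempt; a genuine code error fails every time.

```r
# Hypothetical helper: transient infrastructure failures tend to
# pass on retry; real code bugs fail all attempts identically
with_retries <- function(f, times = 3, wait = 5) {
  for (i in seq_len(times)) {
    result <- tryCatch(f(), error = identity)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < times) Sys.sleep(wait)
  }
  stop(result)   # re-signal the last error after exhausting retries
}

# e.g. imps <- with_retries(function() run_mice(models, m = 20))
```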
## What the AI Remembers (and What It Doesn't)
Seven errors, barely 1% of the total, but they reveal something important about how AI coding assistants actually work.
The mice package for multiple imputation has a known quirk: its parallel variant futuremice() already passes the printFlag argument internally, so you must not supply it yourself. Supply it anyway, and R rejects the call with a duplicate-argument error.
This was documented in our project notes from the start. Yet the AI kept re-introducing printFlag = FALSE every time it touched the imputation code for other reasons. Each time, from the AI's perspective, passing printFlag = FALSE to a mice-like function is the obviously correct thing to do — it has seen thousands of examples of exactly this pattern in its training data.
This is the tension at the heart of AI-assisted coding: the model has strong priors from training data that can override project-specific knowledge. Our project note saying "don't pass printFlag to futuremice" is one instruction competing against an overwhelming statistical prior. The prior wins, repeatedly, until the code is structured to make the error impossible rather than merely inadvisable. For researchers relying on AI assistants, the lesson is architectural: don't tell the AI what not to do; design your code so the wrong thing can't be done.
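What that looks like in practice, as a sketch (run_imputation is a hypothetical wrapper, not the project's actual code): intercept printFlag before it can ever reach futuremice().

```r
library(mice)

# Hypothetical wrapper: the pitfall becomes structurally impossible,
# because printFlag is stripped before futuremice() sees it
run_imputation <- function(data, m = 20, ...) {
  args <- list(...)
  args$printFlag <- NULL   # futuremice() handles this internally
  do.call(futuremice, c(list(data, m = m), args))
}
```

Now a printFlag = FALSE slipped in by the AI (or a human) is silently discarded instead of crashing a fifteen-minute run. The prior can't win an argument it never gets to have.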
## Errors Follow the Calendar
Errors are not uniformly distributed. They cluster around major refactoring events:
- Full-scale imputation run + regression model rewrite
- Unified loading of 33 surveys into a single function + new analysis module
- Visualization refactoring + beamer presentation pipeline
- First full-scale multiple imputation run (m = 20, distributed across HPC)
Four days, 42% of all errors. The pattern is clear: errors spike when the project is being restructured, not when the AI is doing routine analytical work. And the errors that dominate during refactoring are almost entirely naming errors — the AI restructures code correctly in the files it touches, but misses references in files it didn't think to update. This is the same blind spot every human programmer has. You remember to update the three files you have open. You forget about the config file you last touched two weeks ago.
## So What?
Here's what I keep coming back to: AI coding errors in R are not AI-specific errors. They are R-specific errors. The AI doesn't hallucinate function names or invent impossible syntax. It makes the same mistakes I make, for the same structural reasons.
R is dynamically typed, lazily evaluated, and has no compile step. A misspelled variable is only caught when the line executes, potentially hours into a pipeline. A type mismatch — haven_labelled vs. double — is invisible until an operation fails. A file path with a space in it works inside R but breaks when passed to the shell. None of these have compile-time equivalents. There is no way to "build" a multi-file R project and check for consistency before running it.
These aren't bugs in R. They are design choices that prioritize interactivity and flexibility — exactly what makes R excellent for exploratory data analysis. But they also mean that neither humans nor AI gets any structural feedback about code correctness until something breaks at runtime. In languages with static tooling (TypeScript, Rust, Go), the same AI assistant makes far fewer of these errors, because the compiler catches them before execution. R doesn't have that feedback loop.
### What I'm trying next
If the AI's biggest weakness in R is the same as mine — no structural feedback before runtime — then the solution probably isn't making the AI smarter. It's giving it better tools. R's Language Server already provides some of this for IDE users: undefined variable detection, cross-file references, scope-aware renaming. I've been experimenting with making these same capabilities available to the AI assistant directly. Whether that actually reduces the error rate is a question I'll take up in a future post.
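For a concrete sense of what that feedback can look like today: lintr's object_usage_linter() (which builds on codetools' usage checks) flags names that are used inside a function but defined nowhere visible — without executing anything. A sketch, assuming the project's R files live in an R/ directory:

```r
library(lintr)

# Reports warnings like "no visible binding for global variable
# 'income_level'" across every file, before any code runs
lint_dir("R", linters = linters_with_defaults(object_usage_linter()))
```

It's not a compiler, and it can't see through non-standard evaluation, but it catches exactly the chronic category above: the stale name in the file nobody thought to reopen.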
Siyao Zheng is an Assistant Professor at the School of International and Public Affairs, Shanghai Jiao Tong University. His research focuses on AI for Social Science and Digital Politics. The error data and extraction script are available on GitHub.