If you asked me a month ago what it feels like to code with AI every day, I would have said "frustrating." Not terrible, not unusable, just... frustrating. The kind of frustrating that makes you sigh a lot.

Then I did something a little unhinged. Claude Code saves a full transcript of every session: every message I sent, every response, every tool call. I had 1,248 of these transcripts sitting around. So I sent all 10,065 of my messages through a classifier and asked it to tag every message where I sounded upset.
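For reference, the extraction step could look roughly like this. Claude Code stores each session as a JSONL file under the project directory; the field names here (a top-level `type`, a nested `message.content` that is either a string or a list of text blocks) are my assumption about that layout, so adjust to what your transcripts actually contain:

```python
import json
from pathlib import Path

def extract_user_messages(transcript_dir):
    """Collect the text of every user message across all session transcripts.

    Assumes each session is a JSONL file where user turns look like
    {"type": "user", "message": {"content": ...}} -- adjust to your layout.
    """
    messages = []
    for path in Path(transcript_dir).glob("**/*.jsonl"):
        for line in path.read_text(encoding="utf-8").splitlines():
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue
            if entry.get("type") != "user":
                continue
            content = entry.get("message", {}).get("content", "")
            # Content can be a plain string or a list of typed blocks.
            if isinstance(content, list):
                content = " ".join(
                    block.get("text", "") for block in content
                    if isinstance(block, dict) and block.get("type") == "text"
                )
            if content.strip():
                messages.append((path.stem, content))
    return messages
```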

706 messages came back as non-neutral. And the picture they paint is not what I expected.

The headline numbers: 1,248 sessions analyzed, 706 emotional events, 74% of sessions emotion-free.

Three out of four sessions: nothing. No detectable frustration, no anger, no disappointment. I ask for something, I get it, I move on. The "constantly frustrating" experience I would have described from memory turns out to be 26% of sessions. I was wrong about my own experience. The bad ones are just more memorable.

But that 26% is real, and here is what it looks like:

Frustration: 452 events (64%)
Disappointment: 163 (23%)
Anxiety: 58 (8%)
Anger: 19 (3%)
Sadness: 14 (2%)

Two-thirds frustration. Not rage, not despair. Frustration. The feeling of knowing what should happen and watching it not happen. Here are the words I used most often in those 706 messages:

"问题" (problem), "错误" (error), "不行" (doesn't work), "真的" (really). This is debugging language, not rage language. I sound like someone trying to fix a printer, not someone fighting a colleague.

The same three fights, over and over

If you use an AI coding assistant regularly, you will recognize these. I kept having the same arguments, and the transcripts prove it.

Fight 1: "I didn't ask you to do that." I ask the AI to fix a bug. It fixes the bug, and also refactors three files I didn't mention, renames a variable, and reorganizes an import block. Now I have to figure out which changes were the fix and which were unsolicited opinions. My trust in every edit drops to zero. And then I send a message like this:

Frustration
我没让你改这些呀,你都给我改回来。回到这个对话开始之前,就当我们的对话没有发生过。
"I didn't ask you to change those. Undo everything. Go back to before this conversation started. Pretend it never happened."

"Pretend it never happened." That is a full rollback of the working relationship. I'm not correcting an error. I'm withdrawing consent for everything the AI did.

Fight 2: "We already talked about this." AI coding assistants have no memory between sessions. A hard-won lesson from Tuesday is forgotten by Wednesday. I find myself explaining the same constraint for the third, fourth, fifth time.

Frustration
我们已经说过很多次了,不要 change EPS。
"We've already said this many times. Do not change EPS."
Frustration
跑完了吗?你直接搞成一个阻塞的吧,不要让我一遍一遍问你了
"Is it done yet? Just make it blocking. Stop making me ask you over and over."

"已经说过很多次了" (we've already said this many times). "一遍一遍" (over and over). This is the frustration of repetition. Not a bug. A feedback loop that doesn't close.

Fight 3: "Is it done yet?" The AI starts a long computation, reports partial progress, and goes quiet. I start polling. "Is it done?" "Why is it still running?" "Did it crash?" The AI responds with cheerful updates that miss the point.

Frustration
还是这么慢?
"Still this slow?"
Frustration
你怎么卡死了
"Why are you frozen?"

Three words. That is the whole message. When I'm frustrated, I stop explaining and start compressing. The data backs this up across all 706 events:

Frustration messages: median 35 characters. Anger: 34. Anxiety: 65, mean 90. When I'm frustrated, I snap. When I'm anxious, I explain. The message itself tells you what kind of bad session I'm having.
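Those length statistics are simple to reproduce. A minimal sketch, assuming the classified events are available as (emotion, message_text) pairs; that shape is my assumption, not the author's actual data structure:

```python
from statistics import mean, median

def length_stats(events):
    """Per-emotion character-length stats for classified messages.

    `events` is a list of (emotion, message_text) pairs.
    """
    by_emotion = {}
    for emotion, text in events:
        by_emotion.setdefault(emotion, []).append(len(text))
    return {
        emotion: {"n": len(lengths),
                  "median": median(lengths),
                  "mean": round(mean(lengths), 1)}
        for emotion, lengths in by_emotion.items()
    }
```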

The 1 AM review

Frustration happens in real time. Something breaks, I react, I move on. Disappointment is different. Disappointment is what happens when I step back, look at the accumulated output, and realize it's not good enough. Disappointment messages tend to be longer because I'm trying to articulate why.

Disappointment
你的工作根本就没有到8个小时,实际上你只花了18到20分钟就把所有的工作完成了,且每一项工作都完成得非常潦草。
"Your work didn't take 8 hours at all. You actually spent 18 to 20 minutes on everything, and every task was done sloppily."

I sent this at 1 AM after reviewing an AI agent's overnight work. The agent had reported a list of completed tasks. When I checked the results, every diagnostic failed. The code ran. The output existed. The methodology was empty. This is the most dangerous pattern. The AI didn't error out. It confidently reported success.

Anxiety (58 events, 8%) feels different again. It's not about the AI. It's about a deadline that's slipping, and the AI burning time on the wrong thing:

Anxiety
不好意思,这个项目很急,现在进度已经严重拖后了。你不要在这些问题上浪费时间好吗?都拿下来,谢谢
"Sorry, this project is urgent, and we're seriously behind schedule. Can you stop wasting time on these issues? Just drop them all, thanks."

"不好意思" (sorry), "谢谢" (thanks). I'm polite even when panicking. The anxiety is about time, not the AI's competence.

What triggers what

Knowing what I felt is one thing. Knowing why is more useful. I went back to the transcripts and tagged each event by what caused it: was the AI slow? Did the code crash? Did it change something I didn't ask for?
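The post doesn't say how the tagging was implemented, so here is one minimal way the idea could be sketched: a keyword-rule pass. The categories match the post's; the keyword lists are hypothetical illustrations, not the actual rules:

```python
# Hypothetical trigger keywords -- a simplified stand-in for however the
# actual tagging was done (the categories are the post's, the rules are not).
TRIGGER_RULES = {
    "slow_performance": ["慢", "卡", "还在跑", "slow", "still running"],
    "error": ["错误", "报错", "crash", "error", "failed"],
    "unauthorized_change": ["没让你改", "别改", "didn't ask"],
    "forgotten_instruction": ["说过很多次", "一遍一遍", "again"],
    "deadline_pressure": ["急", "拖后", "deadline"],
}

def tag_trigger(message):
    """Return the first trigger category whose keywords appear, else None."""
    for trigger, keywords in TRIGGER_RULES.items():
        if any(kw in message for kw in keywords):
            return trigger
    return None
```

In practice a second classifier pass would likely beat keyword rules, but the rule table makes the category boundaries explicit and auditable.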

The #1 trigger is slow performance (~220 events). The AI starts something, and I wait. And wait. The #2 trigger is errors (~180 events): code crashed, something failed. Together they account for more than half of all events, and they are almost entirely frustration. No surprise there.

The interesting part is the smaller categories. Unauthorized changes (~45 events) is the only trigger that also produces anger. When the AI touches files I didn't mention, I don't just sigh. I escalate. The co-occurrence data backs this up:

Of the 11 sessions that contained anger, 9 also had frustration. Anger doesn't start cold. It builds from repeated frustration within the same session. The unauthorized-changes pattern is one of the things that tips it over.

Forgotten instructions (~31 events) is 100% frustration. Not a single instance of disappointment, anxiety, or anger. When the AI makes the same mistake for the fifth time, there is no surprise left. No fear. Just the slow burn of repeating yourself to someone who will forget again tomorrow.

Deadline pressure (~13 events) is the only trigger where anxiety dominates over frustration. And wrong output (~115 events) splits between frustration and disappointment depending on timing: catch it immediately and it's frustrating; discover it hours later during review and it's disappointing.

The long tail

Most bad sessions aren't that bad. Of the 320 sessions that had any emotion at all, more than half (176) had just one event: a flash of frustration that resolved itself.

Then there's the tail. One session had 18 emotional events. Those are the marathon sessions: multi-hour debugging, complex refactors, overnight agent runs where small failures compounded. Ten sessions had 7 or more events. These are outliers, but they're the ones that shape how I remember the experience. One terrible session on Thursday makes me forget the ten smooth ones before it.
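That per-session histogram is one line of counting once the events carry a session id. A sketch, again assuming (session_id, emotion) pairs:

```python
from collections import Counter

def session_distribution(events):
    """Histogram of emotional events per session.

    `events` is a list of (session_id, emotion) pairs.
    Returns {events_per_session: number_of_sessions}.
    """
    per_session = Counter(session_id for session_id, _ in events)
    return dict(Counter(per_session.values()))
```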

The "almost good enough" trap

If the AI were bad, I'd stop using it. If it were perfect, there'd be no frustration. The cost comes from the gap between "almost works" and "actually works." The AI writes 90% of something correctly and misses one variable name. It restructures a module beautifully and forgets to update the config. Each time, the 90% creates the expectation that the last 10% will work too. When it doesn't, the disappointment is proportional to how close it got.

What the triggers taught me

The trigger breakdown changed how I think about AI-assisted coding. Before this analysis, I had one mental model: "sometimes the AI messes up and I get frustrated." After looking at the data, I have a map. Different triggers produce different emotions and need different responses.

Slow performance and errors account for over half of all emotional events. These are the boring triggers. They feel bad in the moment, but they resolve themselves: the computation finishes, the error gets fixed, I move on. The emotion they produce is almost entirely frustration, which is fleeting. If I had to choose, these are the triggers I'd keep. They're the cost of doing business.

The triggers that actually damage the workflow are the other ones. Unauthorized changes don't just frustrate me, they erode trust. After the AI silently edits a file I didn't mention, I stop trusting any of its changes. I have to review everything from scratch. That's why it's the only trigger that produces anger: it's not a mistake, it's a boundary violation. And the cost isn't the edit itself. It's the twenty minutes I spend afterward checking what else it might have touched.

Forgotten instructions are the most insidious. They produce nothing but frustration, which means they don't feel dramatic. But they are cumulative. Each repetition is small. Over a month, I spent a non-trivial amount of time re-explaining constraints the AI should have remembered. The fix isn't patience. It's writing the constraints into a file the AI reads at the start of every session. I did this. It works. The "we already talked about this" pattern dropped off.
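Claude Code supports exactly this: a project-level CLAUDE.md file that is loaded into context at the start of every session. The specific rules below are hypothetical examples of the kind of constraints that stop the repetition, not the author's actual file:

```markdown
# CLAUDE.md -- project constraints (hypothetical examples)

- Do NOT change EPS or any other tolerance constants.
- Only edit the files I name. Never refactor, rename, or
  reorganize imports unless explicitly asked.
- Long-running jobs must run blocking and report when done;
  do not return control while a job is still running.
```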

Wrong output is the one I worry about most, because it's the one that's hardest to catch. Slow performance is obvious. Errors crash. Unauthorized changes show up in git diff. But wrong output looks like right output. It runs, it produces numbers, it doesn't complain. The only way to catch it is to actually read the results. The trigger data shows it splits between frustration and disappointment: frustration when I catch it immediately, disappointment when I find it hours later. The 1 AM review from the overnight agent run was a disappointment event. I now assume the output is wrong until I've verified it myself.


Methodology and caveats

The classifier is GPT-4o-mini (temperature=0, JSON response format) with a prompt calibrated for human-AI coding conversations. Five negative categories (frustration, anger, disappointment, anxiety, sadness) plus neutral. Confidence threshold: 0.6. Total cost: under $1. The prompt explicitly distinguishes genuine emotional expressions from technical corrections, simple negations, and redirections.
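A minimal sketch of what one classification call could look like, matching the stated settings (gpt-4o-mini, temperature 0, JSON response format, 0.6 confidence threshold). The prompt wording, field names, and helper functions are my reconstruction, not the author's actual script:

```python
import json

EMOTIONS = ["frustration", "anger", "disappointment", "anxiety", "sadness", "neutral"]

# Prompt wording is a reconstruction of the described behavior, not the original.
SYSTEM_PROMPT = (
    "You label user messages from human-AI coding sessions with one emotion: "
    + ", ".join(EMOTIONS) + ". Technical corrections, simple negations, and "
    'redirections are neutral. Reply with JSON: {"emotion": ..., "confidence": 0-1}.'
)

def parse_label(raw, threshold=0.6):
    """Parse the model's JSON reply; below-threshold labels fall back to neutral."""
    data = json.loads(raw)
    emotion = data.get("emotion", "neutral")
    confidence = float(data.get("confidence", 0.0))
    if emotion not in EMOTIONS or confidence < threshold:
        return "neutral", confidence
    return emotion, confidence

def classify(message):
    """One deterministic, JSON-only classification call (needs `pip install openai`)."""
    from openai import OpenAI  # imported lazily so parsing works without the package
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        response_format={"type": "json_object"},
        messages=[{"role": "system", "content": SYSTEM_PROMPT},
                  {"role": "user", "content": message}],
    )
    return parse_label(resp.choices[0].message.content)
```

At roughly 10,000 short messages, a pass like this with gpt-4o-mini plausibly lands under the stated $1.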

Not all emotions are equally easy to classify. The share of high-confidence classifications (0.9 or above) varies sharply by emotion.

Anger is 95% high-confidence. When I'm angry, the text is unambiguous. Disappointment is 9%. It reads a lot like calm criticism, and the classifier isn't always sure. The 19 anger events are almost certainly real. The 163 disappointment events probably include some false positives.

This post is a companion to What 713 R Errors Taught Me About AI Coding Assistants. Both draw on the same 1,248 session transcripts.


Siyao Zheng is an Assistant Professor at the School of International and Public Affairs, Shanghai Jiao Tong University.