How I Cut My Claude Token Usage in Half Without Losing Capability
I use Claude Code daily for development — building features, debugging, deploying, the full cycle. But I kept hitting my usage limits faster than expected. After digging into why, I realized the problem wasn't how much I was asking Claude to do. It was how much invisible context was being sent with every single message.
Here's what I found and the specific changes that cut my token consumption roughly in half.
The Hidden Cost: Context Accumulates Silently
Every message you send to Claude includes the entire conversation history. Message #1 is cheap. Message #30 carries all 29 previous messages — every file read, every tool result, every response. The cumulative cost grows quadratically, not linearly.
On top of that, Claude Code loads persistent context on every message: your CLAUDE.md files, memory indexes, system prompts. If these files are bloated, you're paying a tax on every single interaction.
1. One Task, One Conversation
This is the highest-impact change. Stop reusing long conversations for multiple unrelated tasks. Each new task should be a fresh conversation. A 10-message conversation uses roughly 55 message-equivalents of tokens (1+2+3+...+10). Two 5-message conversations use only 30. Same work, 45% fewer tokens.
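The arithmetic is easy to verify. A quick Python sketch, using only the per-message cost model described above:

```python
def message_equivalents(n: int) -> int:
    # Message k re-sends the k-1 earlier messages plus itself,
    # so an n-message conversation costs 1 + 2 + ... + n = n(n+1)/2.
    return n * (n + 1) // 2

one_ten = message_equivalents(10)       # 55
two_fives = 2 * message_equivalents(5)  # 30
print(one_ten, two_fives, round(1 - two_fives / one_ten, 2))  # 55 30 0.45
```

The savings grow with conversation length, because the n(n+1)/2 term dominates: splitting a 30-message conversation into three 10-message ones saves far more than splitting a 10-message one in half.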
The instinct to keep a conversation going ("while you're here, can you also...") is the single most expensive habit in AI-assisted development.
2. Audit Your Persistent Context
Claude Code loads your CLAUDE.md files and memory indexes on every message. I found that mine had accumulated significant bloat:
- Duplicate information — the same SSH credentials stored in CLAUDE.md AND a memory file AND referenced in the index
- Verbose instructions — 18 lines saying the same thing three different ways
- Inlined content in indexes — full paragraphs where a one-line link would suffice
- Stale memories — project context from completed work that would never be relevant again
I trimmed my memory index from 48 lines to 9, deleted 6 redundant memory files, and compressed my CLAUDE.md instructions by 35%. That's roughly 65 fewer lines loaded into context on every single message. Over a 30-message conversation, that's nearly 2,000 lines' worth of wasted tokens eliminated.
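A simple way to start an audit like this is to measure what you're actually loading. Here's a sketch; the paths are illustrative (your CLAUDE.md and memory files may live elsewhere), and line count is only a rough proxy for token count:

```python
from pathlib import Path

# Illustrative paths -- point these at wherever your persistent context lives.
CONTEXT_PATHS = [Path("CLAUDE.md"), *Path(".claude/memories").glob("*.md")]

def audit(paths):
    # Report line counts largest-first so the worst offenders surface,
    # then return the total loaded into every message.
    counts = [(sum(1 for _ in p.open()), p) for p in paths if p.exists()]
    for n, p in sorted(counts, reverse=True):
        print(f"{n:5d}  {p}")
    return sum(n for n, _ in counts)

total = audit(CONTEXT_PATHS)
print(f"{total} lines loaded with every message")
```

Run it before and after trimming; the difference is what you stop paying for on every single interaction.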
3. Delegate Research to Sub-Agents
When Claude reads a file, the entire content gets permanently added to your conversation context. Read 5 files of 200 lines each, and you've added 1,000 lines that get re-sent with every subsequent message.
Sub-agents solve this. When Claude spawns an agent to explore your codebase, that agent gets its own context window. It can read 20 files, search broadly, and return only a short summary to the main conversation. The main context stays lean.
Rule of thumb: If a task requires reading more than 2-3 files, delegate it to an exploration agent.
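A back-of-envelope model shows why this matters. The numbers below are assumptions for illustration (file sizes match the earlier example; the summary length and remaining message count are made up):

```python
# Assumed numbers: 5 files of 200 lines, a 10-line sub-agent summary,
# and 20 more messages sent after the research step.
LINES_PER_FILE = 200
FILES = 5
SUMMARY_LINES = 10
REMAINING_MESSAGES = 20

# Inline reading: all file content is re-sent with every later message.
inline_cost = FILES * LINES_PER_FILE * REMAINING_MESSAGES  # 20000 line-sends
# Delegated: only the summary lives in the main context.
delegated_cost = SUMMARY_LINES * REMAINING_MESSAGES        # 200 line-sends
print(inline_cost, delegated_cost)
```

Under these assumptions, delegation cuts the ongoing cost of that research by two orders of magnitude, because the main conversation never carries the raw file contents.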
4. Be Specific in Your Prompts
Vague prompts trigger broad exploration. "Something's broken with auth" causes Claude to read multiple files, search for patterns, and investigate — all of which inflates context. "Fix the JWT expiry check in AuthController.php line 45" goes straight to the fix.
The more precise your prompt, the fewer exploration tokens Claude burns before doing the actual work. Include file paths, line numbers, and specific function names when you have them.
5. Control File Reading
Never read entire files when you only need a section. Use offset and limit parameters to read just the lines you need. Pipe long bash output through head or tail. These seem like small optimizations, but they compound across a conversation.
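The idea of offset/limit reading is simple enough to sketch. This is an illustrative Python function, not Claude Code's actual implementation — the point is that you pay for 40 lines instead of 4,000:

```python
def read_slice(path, offset, limit):
    """Return `limit` lines starting at 1-indexed line `offset`."""
    with open(path) as f:
        # Stream the file so large files never sit fully in memory.
        return [line for i, line in enumerate(f, start=1)
                if offset <= i < offset + limit]
```

The same principle applies to command output: `head`, `tail`, and `grep` are all ways of shrinking what enters the context before it gets there.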
6. Skip the Summaries
By default, Claude often summarizes what it just did at the end of each response. If you can read the diff yourself, tell Claude to skip trailing summaries. Every unnecessary paragraph of output becomes part of the context for every future message.
The Math That Matters
Here's a simplified model. Assume each message averages 500 tokens of new content:
| Approach | Messages | Total Tokens Sent |
|---|---|---|
| One long conversation | 30 | ~232,500 |
| Three 10-message conversations | 30 | ~82,500 |
| Six 5-message conversations | 30 | ~45,000 |
Same total messages. The six-conversation approach uses 80% fewer tokens. This is the single most important thing to understand about token economics with conversational AI.
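The table follows directly from the stated model. A minimal script reproducing the figures (the 500-token per-message average is the assumption given above):

```python
AVG_NEW_TOKENS = 500  # assumed average of new content per message

def total_tokens(conversations: int, msgs_each: int) -> int:
    # Each conversation of n messages sends 1+2+...+n message-equivalents.
    per_conv = msgs_each * (msgs_each + 1) // 2
    return conversations * per_conv * AVG_NEW_TOKENS

print(total_tokens(1, 30))  # 232500
print(total_tokens(3, 10))  # 82500
print(total_tokens(6, 5))   # 45000
```

Real conversations aren't this uniform, but the shape of the result holds: shorter conversations beat longer ones at any realistic per-message size.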
What I Didn't Do
I didn't build a custom prompt compression pipeline. I didn't write middleware to summarize conversation history. I didn't switch to a cheaper model. The wins came from understanding how context windows work and adjusting my workflow accordingly.
The most effective token optimization isn't technical — it's behavioral. Start new conversations. Keep persistent context lean. Be specific. Delegate research. The tools already exist; most people just aren't using them.