Claude Opus 4.8 in Plain English: Every Feature and Setting That Matters

On May 28, 2026, Anthropic shipped Claude Opus 4.8 — its most capable model yet, and the one I now run for hours a day. If you've opened Claude Code or the Claude app since the update and stared at a menu full of words like effort, thinking, xhigh, and Ultracode, this post is for you.

I'm going to walk through every feature and setting in plain English — what each one does, when to touch it, and what it costs you. No jargon for jargon's sake. Just the working knowledge you'd get if you sat next to me while I used it.

The 30-second version

What it is: Anthropic's flagship model — the smartest one they make, tuned for hard reasoning, long coding sessions, and doing real work on its own without babysitting.
The headline number: a 1,000,000-token memory (context window) — big enough to hold an entire codebase or a stack of books in a single conversation.
The settings that matter: just three — Effort (how hard it works), Thinking (whether it reasons before answering), and Fast mode (how quickly it types). Everything else you can leave alone.
The honest upgrade: Anthropic says 4.8 is 4× less likely than the previous version to let a flaw in its own code slip by unmentioned. In daily use, that's the difference you feel most.

That's the whole thing. The rest of this post is the detail behind each line.

What Opus 4.8 actually is

Anthropic makes models in three sizes, and Opus is the biggest. Think of it as hiring the senior engineer instead of the intern: slower and pricier per task, but you hand it the gnarly problems and trust the result. Opus 4.8 is the current top of that line, built for "complex reasoning, long-horizon agentic coding, and high-autonomy work" — which is a fancy way of saying it can be pointed at a big, messy job and left to finish it.

Two facts shape everything else about it:

Claude Opus 4.8 has a 1,000,000-token context window — roughly 750,000 words, 15 novels, or an entire codebase, at standard pricing. — A million tokens is a genuinely large working memory — and you're not charged extra for using all of it.

1. It has a huge memory. The "context window" is how much the model can hold in mind at once — your question, the files you've shown it, the whole back-and-forth. Opus 4.8's window is a million tokens (roughly 750,000 words). You can drop a complete project into the conversation and it keeps all of it in view. Better still, Anthropic charges the same per-token rate whether you use 9,000 tokens or 900,000 — there's no "large request" surcharge.

2. Its knowledge stops in January 2026. Like every model, it was trained on data up to a cutoff — here, January 2026. It doesn't inherently know what happened after that. That's exactly why pairing it with live tools (web search, your codebase, your docs) matters: the tools supply the fresh facts, the model supplies the judgment.

Which Claude should you actually use?

Opus isn't always the right answer. Anthropic ships three models, and picking the right one saves you real money. Here's the plain-English version:

Model	Think of it as…	Best for	Memory
Opus 4.8	The senior engineer	Hard reasoning, big refactors, long autonomous jobs	1M tokens
Sonnet 4.6	The fast, sharp generalist	The best balance of speed and smarts — most everyday work	1M tokens
Haiku 4.5	The quick specialist	Cheap, instant tasks at near-frontier quality	200K tokens

My rule of thumb: reach for Opus when getting it wrong is expensive — architecture decisions, security-sensitive code, anything I'm going to ship without re-checking line by line. For drafting, quick edits, and high-volume tasks, Sonnet is the workhorse and Haiku is the sprinter.

The settings menu, decoded

Here's where most people freeze. When you open the model/settings menu in Claude Code, it looks like this — and the labels assume you already know what they mean:

The Claude Code settings menu: Switch model, Effort (Ultracode — xhigh + workflows), Thinking, and Fast mode — each explained. — Four controls do the heavy lifting. Once you know what each one means, the menu stops being intimidating.

Four lines matter. Let's take them one at a time.

1. Switch model

Exactly what it says: choose Opus, Sonnet, or Haiku. "Default (recommended)" lets the tool pick a sensible model for you. If you don't want to think about it, leave it on Default. If you're about to do something hard, switch to Opus 4.8 on purpose.

2. Effort — one dial for thoroughness

This is the most useful setting almost nobody understands. Effort controls how much work Claude puts into a task — how many tokens it's willing to spend across its answer, its tool use, and its private reasoning. Low effort is quick and cheap. High effort is slow and thorough. Same brain either way; you're just telling it how hard to dig.

The five effort levels: low, medium, high (the default), xhigh, and max — from quick and cheap to deepest and slowest. Ultracode in Claude Code means xhigh effort plus multi-agent workflows. — Five levels, from a quick glance to a deep, multi-step grind. The default — high — is the right call most of the time.

There are five levels:

low — fast and cheap. Good for simple, obvious tasks.
medium — a balanced middle gear.
high — the default. If you change nothing, this is what you get, and it's the right setting for most work.
xhigh — for long coding and agentic jobs that run for half an hour or more and burn through a lot of tokens.
max — the deepest, slowest, most thorough setting. Reach for it when correctness beats speed and cost.

And then there's the word that confuses everyone: Ultracode. It is not a secret super-model. In Claude Code, "Ultracode" simply means xhigh effort turned up and standing permission for Claude to spin up multi-agent workflows — i.e. to fan a big job out across several helper agents working in parallel. It's the top shelf for serious, long-running builds. For a quick question, it's overkill.

How I set it: I leave effort on high for most things and bump it to xhigh when I kick off a real coding session. I rarely touch low — saving a few cents isn't worth a worse answer.

3. Thinking — let it reason before it answers

"Thinking" lets Claude work on a private scratch-pad before it replies — planning, weighing options, catching its own mistakes — instead of blurting out the first thing. For anything with more than one moving part, it produces noticeably better answers.

With thinking off, Claude answers immediately. With adaptive thinking on, it plans and reasons on a private scratch-pad first, then answers — and decides for itself how much thinking the problem needs. — Thinking off: instant answer. Thinking on: it plans first — and on 4.8, it decides how much planning the task actually needs.

Here's the genuinely new part in 4.8: thinking is now adaptive. On older models you had to hand-set a "think for N tokens" budget. Opus 4.8 drops that knob entirely and decides for itself how much to think based on how hard the problem is — a quick question gets a quick answer, a thorny one gets real deliberation. You just turn it on and let it right-size itself.

How I set it: on, for anything non-trivial. Off only when I want a one-line answer as fast as possible.

4. Fast mode — same brain, quicker typing

Fast mode does one thing: it makes the model stream its answer faster — up to about 2.5× the normal output speed. The crucial detail: it's the exact same model. You are not being quietly downgraded to a smaller, dumber model to save time — Opus 4.8 stays Opus 4.8; it just types quicker.

Fast mode streams Opus 4.8's output up to about 2.5x faster from the exact same model — toggled with /fast, available on Opus 4.8, 4.7 and 4.6, at a higher per-token price. — Same intelligence, faster delivery. You pay for the speed in a higher per-token price.

You toggle it with /fast in Claude Code, and it works on Opus 4.8, 4.7, and 4.6. The trade-off is price: faster output costs more per token. I switch it on when I'm actively watching a long answer stream and I want it now; I leave it off for background jobs where I don't care if it finishes a few seconds later.

The rest of the menu (quick hits)

Account & usage — your plan, limits, and how much you've used. Check it if you're hitting caps.
Attach file / Mention file — feed a specific file into the conversation so Claude can see it.
Clear conversation — wipe the slate and start fresh (frees up that big memory).
Rewind — step back to an earlier point in the chat. Underrated: if a session goes sideways, rewind instead of starting over.

What's genuinely better in 4.8

Version bumps can be hype. This one earns it in a way you feel rather than read on a chart. The single most useful change: Anthropic reports Opus 4.8 is four times less likely than the previous version to let a flaw in code it wrote pass by without flagging it. In practice, that means fewer confident-but-wrong answers and more "heads up, this edge case isn't handled." For anyone shipping real software, that honesty is worth more than a benchmark point.

On the benchmarks themselves: 4.8 posts top-tier numbers for fixing real-world coding issues (high-80s on SWE-bench Verified, the standard "can it resolve an actual GitHub bug" test) and for multi-step browsing and computer-use tasks. I treat those as directional — the day-to-day improvement I actually notice is sharper judgment and a longer attention span before it loses the plot on a big job.

What it costs

You don't pay for any of this with a subscription tier alone — through the API it's billed per token. The plain numbers for Opus 4.8:

What	Price	In plain terms
Input (what you send)	$5 / million tokens	Reading is cheap
Output (what it writes)	$25 / million tokens	Writing is the cost driver
Prompt caching	~90% off repeated input	Reuse the same context, pay almost nothing to re-read it
Batch jobs	50% off both ways	For work that isn't time-sensitive

Two things keep the bill sane: prompt caching (if you keep handing it the same big file or instructions, you pay full price once and a fraction thereafter) and the fact that the 1M window is standard-priced. The levers that raise the bill are the ones above — higher effort and fast mode both spend more tokens or charge more per token. That's the trade you're making, and now you know you're making it.

Where you can use it

Opus 4.8 is everywhere that matters: the Claude apps, Claude Code, and the API, plus Amazon Bedrock, Google Vertex AI, Microsoft Foundry (note: 200K memory there, not the full million), and GitHub Copilot. Same model, your choice of front door.

How I'd set it, if you just want an answer

For the 90% case, here's my default loadout and you can stop thinking about the menu:

Model: Opus 4.8 for hard or ship-it work; Sonnet for everything else.
Effort: high normally, xhigh when you start a real coding session.
Thinking: on. Let it plan.
Fast mode: off by default; flip it on when you're watching a long answer and want it quickly.

That's it. Opus 4.8 is the most capable assistant I've worked with, and the settings aren't there to confuse you — they're a small set of honest trade-offs between thorough, fast, and cheap. Pick the two that matter for the task in front of you.

This is the same model I use to build production software for clients every day — if you want that kind of AI woven into your own product instead of bolted on, that's exactly the work I do.

See how I build with AI Try a trustworthy AI assistant I shipped

Links mentioned