Sonnet 5 Is Basically Opus 4.8 at One-Fifth the Price

    Key Takeaways:
    – Sonnet 5 posted a significant jump on the knowledge-work benchmark. Opus 4.8? Just slightly behind.
    – Introductory pricing sets a competitive rate for input and output tokens, valid until the introductory period ends.
    – After that: standard pricing. Still much cheaper than Opus-tier.
    – Worth knowing before you bet the farm: misaligned behavior shows up more often in Sonnet 5 than in previous models.

    The number that should’ve gotten more attention

    Anthropic ships Sonnet 5.

    Its performance metrics are significantly improved over previous models, including on the knowledge-work benchmark.

    Opus 4.8 sits just behind it.

    Nobody’s spinning that as a marketing win, but nobody’s disputing it either.

    Here’s what actually got me: impressive scores on agentic coding and terminal work.

    Those aren’t incremental tweaks over previous models.

    Those are real jumps.

    It maps out steps.

    Executes terminal commands.

    Pulls live web pages and actually thinks about what it’s reading. Manages files start to finish without hand-holding.

    This is what agents look like when you design them to run while you’re asleep.

    Solo operators. Small teams. Cheap autonomy. Capabilities that used to demand Opus budgets now come at Haiku-class rates. The economics of automated pipelines just broke.

    Side note: the timing is almost certainly not random. OpenAI dropped a new model the week before. Google pushed an agentic play back in May. Every lab is scrambling to commoditize autonomous agents, and Sonnet 5 shows up with the strongest agentic numbers in its bracket.

    And a price tag that makes automation viable for teams that couldn’t justify Opus costs.

    The price difference in plain numbers

    Opus 4.8 charges a higher per-million-token rate for both input and output.

    Sonnet 5? Roughly a fifth of that.

    That introductory pricing, valid through the end of the introductory period, makes testing a no-brainer.

    Already running agentic workflows?

    Switching costs basically nothing.

    This won’t last.

    Anthropic isn’t keeping prices at this floor.

    Btw, their pricing page layout is rough: the per-million-token rates don’t visually line up with the daily and monthly estimates, which makes them super easy to misread. Check the numbers yourself before committing.
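    If you want to sanity-check the pricing page yourself, converting per-million-token rates into a daily or monthly figure is simple arithmetic. Here’s a minimal Python sketch. The rates and token volumes in the usage lines are hypothetical placeholders, not Anthropic’s actual prices; plug in the figures from the pricing page and your own traffic.

```python
"""Convert per-million-token API rates into daily and monthly cost estimates.

Every number passed in is an assumption you supply; nothing here is an
official Anthropic price.
"""

TOKENS_PER_RATE_UNIT = 1_000_000  # pricing pages quote rates per million tokens


def daily_cost(input_tokens: int, output_tokens: int,
               input_rate: float, output_rate: float) -> float:
    """Dollar cost of one day's traffic, given USD-per-million-token rates."""
    input_part = (input_tokens / TOKENS_PER_RATE_UNIT) * input_rate
    output_part = (output_tokens / TOKENS_PER_RATE_UNIT) * output_rate
    return input_part + output_part


def monthly_cost(daily: float, days: int = 30) -> float:
    """Scale a daily figure to a month (30 days by default, an assumption)."""
    return daily * days


if __name__ == "__main__":
    # Hypothetical placeholder rates and volumes: replace with real ones.
    day = daily_cost(input_tokens=500_000, output_tokens=100_000,
                     input_rate=2.0, output_rate=10.0)
    print(f"daily: ${day:.2f}, monthly: ${monthly_cost(day):.2f}")
    # prints: daily: $2.00, monthly: $60.00
```

    Run the same calculation with both the Sonnet 5 and the Opus 4.8 rates. If the input and output discounts aren’t identical, the ratio between the two results at your own input/output mix can differ from the headline ratio.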

    What actually changes: how teams tier the models they run internally.

    Routine stuff goes through Sonnet 5.

    Opus inference gets saved for problems that actually need it. The volume work runs cheap enough that continuous automation finally makes sense. Not just when someone’s watching the clock.
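    A tiering policy like this can start as a few lines of routing logic. The sketch below is one way to express it in Python, assuming you already tag each task as routine or not. The tier labels `sonnet-tier` and `opus-tier` and the `Task` fields are hypothetical names, not Anthropic API model IDs; map them to whatever model identifiers your account actually uses.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical tier labels; map them to real model IDs in your own config.
CHEAP_TIER = "sonnet-tier"
PREMIUM_TIER = "opus-tier"


@dataclass(frozen=True)
class Task:
    name: str
    routine: bool  # your own judgment of whether this is volume work


def pick_tier(task: Task) -> str:
    """Routine volume work goes to the cheaper tier; everything else to premium."""
    return CHEAP_TIER if task.routine else PREMIUM_TIER


def route_all(tasks: list[Task]) -> Counter:
    """Count how many tasks land on each tier, handy for a quick cost estimate."""
    return Counter(pick_tier(t) for t in tasks)


if __name__ == "__main__":
    tasks = [
        Task("summarize-tickets", routine=True),
        Task("refactor-billing-module", routine=False),
        Task("tag-pull-requests", routine=True),
    ]
    print(route_all(tasks))  # Counter({'sonnet-tier': 2, 'opus-tier': 1})
```

    The hard part in practice is setting the `routine` flag, not the routing itself. Start conservative and move a task type down a tier only after it passes your own tests.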

    The safety thing nobody’s bringing up

    Anthropic said it plainly: they didn’t train Sonnet 5 on cybersecurity tasks.

    Misaligned behavior rate?

    Higher than in previous models. It’s in the published behavioral specs, not buried in a footnote.

    For most people this doesn’t matter.

    Code generation, research, document processing, multi-step automation. Sonnet 5 handles all of it cheaper and you’re fine.

    But if your pipeline touches security boundaries (vulnerability scanning, access control automation, penetration testing, anything where a wrong agent action opens an attack surface), keep that work on Opus-tier for now, until the behavioral gaps close.
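    One way to enforce that rule in an agent pipeline is a guard that refuses to dispatch security-tagged work to the cheaper tier. This Python sketch assumes your tasks carry string tags; the tag names mirror the examples in the paragraph above, and the tier labels are hypothetical placeholders rather than real model IDs.

```python
from typing import Iterable

# Hypothetical tier labels; substitute your real model identifiers.
CHEAP_TIER = "sonnet-tier"
PREMIUM_TIER = "opus-tier"

# Tags marking work that touches security boundaries (names are assumptions).
SECURITY_TAGS = frozenset({
    "vulnerability-scanning",
    "access-control",
    "penetration-testing",
})


def requires_premium(tags: Iterable[str]) -> bool:
    """True if any tag marks the task as security-sensitive."""
    return not SECURITY_TAGS.isdisjoint(tags)


def check_dispatch(tags: Iterable[str], tier: str) -> None:
    """Raise before dispatch if security-sensitive work is headed to the cheap tier."""
    tag_set = set(tags)  # materialize so a one-shot iterator isn't consumed twice
    if requires_premium(tag_set) and tier != PREMIUM_TIER:
        raise PermissionError(
            f"security-sensitive task {sorted(tag_set)} must run on "
            f"{PREMIUM_TIER}, not {tier}"
        )
```

    Failing closed (raising an error) rather than silently rerouting makes misrouted tasks show up in your logs, which is the point while the behavioral gaps are still open.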

    CodeRabbit’s internal testing showed a precision bump: more real bugs caught, fewer false positives.
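    Precision and catch rate are separate numbers, and it’s worth tracking both when you run your own comparison. Precision is the share of flagged issues that are real bugs; recall is the share of real bugs that got flagged. Here’s a minimal Python sketch, assuming you have a labeled set of known bugs for a test repo. The bug IDs in the usage lines are made up.

```python
def precision_recall(flagged: set[str], real_bugs: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a reviewer's flagged findings.

    precision = true positives / everything flagged
    recall    = true positives / all known real bugs
    """
    true_positives = len(flagged & real_bugs)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(real_bugs) if real_bugs else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical labeled data: four known bugs in a test repo.
    known = {"bug-1", "bug-2", "bug-3", "bug-4"}
    old_model = {"bug-1", "bug-2", "noise-1", "noise-2"}
    new_model = {"bug-1", "bug-2", "bug-3", "noise-1"}
    print(precision_recall(old_model, known))  # (0.5, 0.5)
    print(precision_recall(new_model, known))  # (0.75, 0.75)
```

    In this made-up example both numbers rise together. In real runs they often trade off, which is why a precision gain alone doesn’t tell you the catch rate went up.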

    Makes sense for what a small team or solo operator actually needs.

    The tradeoff works for most workflows. Test your specific use case before you bet the farm on it.

    How to run a test this week

    Simple decision tree.

    Running automated workflows today?

    Sonnet 5 is the cheapest path to agentic capability at this performance tier.

    Intro pricing expires at the end of the introductory period.

    After that, standard per-million-token pricing kicks in. Still a fraction of Opus-class inference costs.

    Anthropic made it the default for Free and Pro plans.

    No enterprise negotiation required. Individual devs and small teams get access right now through the Claude API.

    Benchmarks check out on paper. Price undercuts the field.

    Agentic design fits real production workflows, not demos.

    Ship a test this week. Run your existing pipeline through it.

    If there’s no degradation, every automated workflow you operate just got cheaper, for good.
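    “No degradation” is easier to call if you define it up front. Here’s a minimal Python sketch of an A/B check: run the same test cases through your current setup and through Sonnet 5, compare pass rates, and accept the switch only if the new rate stays within a tolerance. The runner functions are stand-ins for your own pipeline calls, and the 2-point default tolerance is an arbitrary assumption; pick one that matches your risk appetite.

```python
from typing import Callable, Sequence, Tuple

# A test case is an input plus a check that decides whether the output is acceptable.
Case = Tuple[str, Callable[[str], bool]]


def pass_rate(run: Callable[[str], str], cases: Sequence[Case]) -> float:
    """Fraction of cases whose output passes its check."""
    if not cases:
        raise ValueError("need at least one test case")
    passed = sum(1 for prompt, check in cases if check(run(prompt)))
    return passed / len(cases)


def no_degradation(baseline: float, candidate: float, tolerance: float = 0.02) -> bool:
    """Accept the candidate if its pass rate is at most `tolerance` below the baseline."""
    return candidate >= baseline - tolerance


if __name__ == "__main__":
    # Stand-in runners; in practice each would call your existing pipeline
    # with a different model behind it.
    cases: list[Case] = [
        ("alpha", lambda out: out == "ALPHA"),
        ("beta", lambda out: out == "BETA"),
    ]
    baseline = pass_rate(str.upper, cases)           # 1.0
    candidate = pass_rate(lambda p: "ALPHA", cases)  # 0.5
    print(no_degradation(baseline, candidate))       # False
```

    Keep the case set fixed between runs; if the cases change, the two pass rates aren’t comparable.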

    Sources

    Anthropic announcement
    Claude API pricing
    Behavioral specifications
