Meituan LongCat-2.0: 1.6T Open Coding Model at $0.75/M Tokens

    TL;DR

    LongCat-2.0 packs 1.6 trillion parameters, MIT-licensed, dropped June 30, 2026 by Meituan. Yes, the food delivery company
    – Runs about $0.75 per million input tokens on OpenRouter. GPT-5.5 costs roughly $5.00 for the same workload. That’s not a typo
    – Trained on 50,000 domestic Chinese AI accelerators (Huawei Ascend 910C). Zero NVIDIA silicon anywhere in the stack
    70.8 Terminal-Bench, 59.5 SWE-bench Pro. And it was secretly ranking top-three on OpenRouter for weeks under the name “owl-alpha”
    – Weights promised on HuggingFace. Not there yet. Hosted endpoints work today

    A delivery app company built this. Let that sit for a second.

    Meituan. The Chinese equivalent of UberEats or DoorDash, depending on which analogy you prefer. They just shipped a coding model that trades punches with frontier stuff from OpenAI and Anthropic. And they pulled it off using hardware that US export controls were specifically designed to keep out of large-scale training.

    Here’s what’s actually interesting though.

    The embargo didn’t stop anything. It removed the safety net. Chinese firms couldn’t buy NVIDIA, so they went full-stack on domestic chips and built their own tooling from scratch. LongCat-2.0 is a product of that forced independence. Whether that’s strategy or spite depends on your perspective.

    What’s Inside This Model

    Mixture-of-Experts architecture. 1.6 trillion total parameters. Native 1M-token context window. And that’s not RoPE extension trickery, it’s built for it. Meituan shipped it June 30, 2026 under MIT license.

    Here’s the detail that got me.

    It was already live on OpenRouter. For weeks. Under the alias “owl-alpha.” People were hammering it. Call volume had it in the top three models. Nobody had any clue it came from a food delivery company. That’s not a soft launch.

    That’s hiding in plain sight.

    For a solo dev or a small shop running agent pipelines, the appeal is pretty direct. You dump your whole codebase in. Every support ticket from the last six months. The full git log.

    No chunking gymnastics, no RAG pipeline babysitting, no vector database setup.

    Just context. All of it. At once.

    Specs and Architecture Choices

    The numbers:

    – Total parameters: ~1.6 trillion
    – Active per token: ~48 billion (MoE, roughly 97% sparsity. Only 3% of weights fire on any given token)
    – Embedding layer: 135 billion parameters using N-gram shortcuts for common tokens, near-zero compute cost
    – Context: 1,000,000 tokens, native
    – Attention: custom LongCat Sparse Attention, purpose-built for long sequences
    – Training: 50,000 domestic ASICs, almost certainly Huawei Ascend 910C
    – Timeline: 3 years scaling from a few thousand cards to full cluster
    – License: MIT
    – Terminal-Bench: 70.8
    – SWE-bench Pro: 59.5
    – Release: June 30, 2026

    The MoE design isn’t showmanship.

    It’s necessity.

    When you can’t get your hands on the fastest GPUs in the world, you architect around the gap.

    Sparse activation keeps the compute-per-token manageable. That 135B N-gram embedding layer? It handles frequent tokens through almost-free pathways. The whole thing is engineered for hardware that runs slower than an H100.

    Honestly I think the architecture tells a more important story than the benchmarks do. A model this size running on domestic Chinese accelerators at competitive speeds says something about where the compute floor is heading. Something that no leaderboard captures.

    Price Comparison That Matters

    | Model | Input cost per M tokens | Context | License |
    |—|—|—|—|
    | LongCat-2.0 | ~$0.75 | 1M | MIT (open) |
    | GPT-5.5 | ~$5.00 | 400K | Proprietary |
    | Claude Opus 4.6 | ~$4.00 | 500K | Proprietary |
    | Gemini 3.1 Pro | ~$3.50 | 2M | Proprietary |

    Picture a typical small operation.

    Three agents doing their thing — PR reviews, generating tests, terminal operations. Maybe 50K to 100K tokens per agent daily. Combined that’s 150K-300K tokens flowing through.

    GPT-5.5 pricing? You’re looking at $3,000 to $6,000 monthly.

    Same workload on LongCat-2.0 through OpenRouter. $340 to $675.

    That gap isn’t a rounding error. That’s somebody’s paycheck. Or a new hire.

    Or just breathing room for a business that’s been getting squeezed by inference costs.

    Catch is real though. Weights aren’t on HuggingFace yet. And when they do land, you’re looking at roughly 400GB of memory just for the model at 2-bit quantization. That’s weights only. Bandwidth and KV cache sit on top. Nobody’s cramming this onto a workstation. Not today.

    So hosted endpoints are the only practical path right now. OpenRouter works. API responds. It’s live.

    The Play for Small Teams

    Running a small crew — 3 to 10 people, agentic workflows, code assistance, that kind of setup? LongCat-2.0 reshapes what your monthly burn looks like.

    MIT license means self-hosting isn’t a legal question. It’s a hardware question. And hardware will catch up. Probably not this quarter.

    Maybe mid-2027, when memory configs that can hold 400GB don’t require selling a kidney.

    The immediate strategy is straightforward.

    Point your heaviest token consumers at LongCat-2.0 on OpenRouter. Keep GPT-5.5 or Claude around for the stuff where you genuinely need peak reliability — production deploys, security review, anything where a hallucination costs real money. Route 80%+ of volume work to the cheaper model.

    Side note: Meituan’s own benchmarks place LongCat-2.0 ahead of Gemini 3.1 Pro, GPT-5.5. And Claude Opus 4.6 on their internal evaluations. Self-reported benchmarks deserve skepticism, always. But the Terminal-Bench 70.8 score lines up with what independent users on OpenRouter have been reporting since the owl-alpha days. This thing runs actual terminal commands. Catches errors. Recovers. Iterates.

    That’s not benchmark theater. That’s a model doing work.

    Common Questions

    What exactly is LongCat-2.0?
    Open-source coding model from Meituan, 1.6 trillion parameters, MoE architecture with 48B active per token. Native 1M context window. MIT licensed. Released June 30, 2026.

    Cost per million tokens?
    Around $0.75 for input via OpenRouter. Output pricing varies by provider. For context, GPT-5.5 runs about $5.00 per million input tokens. LongCat-2.0 is roughly 85% cheaper on equivalent coding workloads.

    Can I run it locally?
    MIT license says yes. Reality says not yet. You need approximately 400GB of memory at 2-bit quant. Weights aren’t even on HuggingFace as of July 1, 2026. Practical self-hosting is a 2027 conversation for most teams.

    How does it stack up against GPT-5.5?
    SWE-bench Pro: LongCat-2.0 scores 59.5, ahead of GPT-5.5 on Meituan’s internal testing. Terminal-Bench: 70.8. Weeks of independent usage on OpenRouter under “owl-alpha” confirms strong agentic capability. Real command execution, error recovery, the works. Cost is roughly 6.7x lower per million tokens.

    What did they train it on?
    50,000 domestic Chinese AI accelerators. Huawei Ascend 910C is the likely chip. Zero NVIDIA involvement. The cluster took three years to build out from initial thousands of cards to full scale.

    Available now?
    Yes, through OpenRouter and hosted API providers. HuggingFace weights listed as “coming soon” as of today. OpenRouter is the path of least resistance if you want to try it immediately.

    Sources

    Meituan LongCat-2.0 Technical Report
    OpenRouter LongCat-2.0 Listing
    HuggingFace LongCat-2.0 (pending)

    Leave a Reply

    Your email address will not be published. Required fields are marked *