TL;DR
– Pixtral 12B is Mistral AI’s first vision-language model, built on Nemo 12B with a 400M-parameter vision encoder trained from scratch.
– Released under Apache 2.0 — unrestricted commercial use, no strings attached. 24GB on disk, 128k token context, variable image sizes.
– Mistral has since deprecated it in favor of newer models. The announcement page exists; the deprecation notice is at the bottom.
– If you’re building document processing pipelines on Pixtral 12B, you have a working model today and a migration problem tomorrow.
—
Mistral AI dropped Pixtral 12B. It’s their first multimodal model that can handle both images and text in one context. 12 billion parameters in the decoder. 400 million in the vision encoder, trained from scratch. Built on Mistral Nemo 12B.
Apache 2.0 license. 52.5% on MMMU.
Native variable image resolution. 128k token context window.
These are all real.
They’re all in the official announcements. And none of them change the fact that Mistral has since deprecated the model.
The deprecation isn’t hidden, exactly.
It’s on the product page.
It’s in the Vercel model listing. It’s in the official docs: “Pixtral 12B is no longer maintained and has been replaced by our latest, more powerful vision and multimodal models.” If you were reading carefully when it was released, you’d have found it. If you’re discovering Pixtral 12B today through a search result or an old bookmark, you’re looking at a model that has already been sunset.
This is the part of the model release cycle nobody writes about. Not the specs. Not the benchmarks. The lifecycle. The moment between “Mistral’s inaugural VLM” — Amazon’s words, not mine — and “no longer maintained.”
What Pixtral 12B Actually Does
Start with what’s real. Pixtral 12B is described as “natively multimodal,” trained on interleaved image and text data. That’s different from models that bolt a vision encoder onto a text-only base and call it a day. The vision encoder — a new 400M-parameter model trained from scratch — converts images to tokens before the decoder processes them alongside text. This matters.
A model trained on interleaved data understands images and text as the same conversation, not as two separate pipes duct-taped together.
The model supports variable image sizes and aspect ratios. You’re not stuck with 224×224 squares.
Feed it a 4000-pixel-wide screenshot of a spreadsheet or a 600-pixel diagram from a whitepaper — it handles native resolution.
It also accepts an arbitrary number of images per request, as URLs or base64-encoded data, within that 128k token context window.
Benchmark-wise, Mistral reports 52.5% on the MMMU reasoning benchmark, outperforming a number of larger open models.
The model maintains state-of-the-art performance on text-only benchmarks while adding multimodal reasoning.
Amazon’s Bedrock Marketplace listing says it “surpasses other open models and rivals larger counterparts.” This wasn’t a toy release. It was a legitimate contender in the 12B weight class.
The use cases are what you’d expect: chart and figure understanding, document question answering, multimodal reasoning, instruction following. Scanned documents. Diagrams.
Screenshots of dashboards.
The stuff that makes up 80% of what clients actually send you when they say “here’s the data.”
24GB, Apache 2.0, and the Quiet Deprecation
Pixtral 12B weighs about 24GB on disk, per TechCrunch.
That fits on a consumer GPU.
Not a datacenter rack. A single RTX 4090 or an A6000. Apache 2.0 licensing means you can download it from Hugging Face, fine-tune it on your own data. And deploy it commercially without a lawyer. That combination — sub-30GB, open license, real multimodal performance — is rare enough that teams built pipelines around it.
Amazon put it on Bedrock Marketplace. Mistral made it available through Le Chat and La Plateforme. The model card on Hugging Face identifies the checkpoint as “Pixtral-12B.” The arXiv paper describes it as trained to understand both natural images and documents.
The deprecation is documented.
Mistral’s site says Pixtral 12B is no longer maintained.
The replacement model is listed as a newer version. If you built production infrastructure on Pixtral 12B, you’re now in the awkward position of running a deprecated model that still works fine. The weights haven’t vanished. The Apache 2.0 license hasn’t expired. The inference code still runs. But you’re on your own for fixes, patches, and improvements.
Why the Deprecation Tells You More Than the Launch
Mistral didn’t do anything wrong here. Models get superseded. The roadmap moved on. What’s worth paying attention to isn’t the deprecation itself — it’s how the deprecation sits below the fold.
The launch page for Pixtral 12B is still live on Mistral’s site.
The benchmark numbers.
The feature descriptions. The “natively multimodal” headline. The deprecation note is there too — but it’s at the bottom. If you’re a solo developer scanning model announcements to figure out what to build on, you could easily read the entire feature list, get excited about variable image resolution, download the weights. And miss that the model is end-of-life.
This is the tension in open-weight releases. The license says “use it however you want.” The maintainer says “we’ve moved on.” Both are true at the same time. And if you’re the one running a document processing pipeline on a deprecated model, you’re the one who owns the risk.
I run a small AI consulting operation. When a client asks me whether to build on an open model, I look at two things: the license and the deprecation policy. Pixtral 12B scores high on the first and undefined on the second. That’s not a dealbreaker — Apache 2.0 means nobody can pull the rug.
But it means the maintenance burden shifts to you the moment Mistral stops shipping updates.
For a model this size, that’s fine-tuning, hosting, monitoring, and eventual migration.
The deprecation too creates an odd second life for the model. Since the weights are open and the license is permissive, the community can keep Pixtral 12B alive indefinitely. Fine-tuned variants. Quantized versions. Integrations that Mistral never built. If you’re running a vision task on a budget, a deprecated-but-free 12B model beats a current-but-metered API every time.
What You Should Actually Do
If you’re already running Pixtral 12B: keep running it.
The model hasn’t broken.
Your pipeline hasn’t stopped. The deprecation means you stop expecting upstream patches, not that you stop using it. Start planning a migration to a newer version or Pixtral Large on your own timeline, not Mistral’s.
If you’re evaluating Pixtral 12B for a new project: look at the deprecation before the benchmarks.
The model works.
It will continue working. But you’re signing up for self-supported infrastructure from day one. That’s fine if you have the GPU and the expertise. It’s less fine if your plan was “Mistral will keep improving this.”
If you’re building document processing pipelines, look at what Pixtral 12B does well: chart understanding, figure interpretation, scanned document Q&A. These are the tasks where vision and text actually need to be natively integrated, not just piped through separate models. A deprecated model that does this well is more useful than a current model that doesn’t.
The Apache 2.0 license means you can download Pixtral 12B from Hugging Face, fine-tune it. And deploy it commercially without restrictions. Mistral made that call intentionally. The deprecation doesn’t undo it.
Pixtral 12B is a working model with an expiration date that Mistral set but can’t enforce. Use it. Plan around it. Don’t mistake “no longer maintained” for “no longer useful.” The difference is whose problem the maintenance becomes — and for 24GB of open-weight multimodal intelligence, that’s a problem worth taking on.
Sources: Mistral AI Pixtral 12B announcement | Hugging Face model card | TechCrunch coverage | AWS Bedrock Marketplace | arXiv paper
