Mercury 2 vs Cerebras
Mercury 2 delivers ~1,000 tok/s on standard NVIDIA GPUs. Haiku-tier quality, no minimum contract, self-serve API key.
THROUGHPUT
tok/s
on standard NVIDIA GPUs
QUALITY TIER
Haiku & GPT-5 Mini
comparable quality
PRICING
/ $0.75 at 1M
Pay as you go
MIN COMMITMENT
Zero commitment
API Key ready in 60s
Mercury 2 vs Cerebras.
Side by side.
Mercury 2
Cerebras
Architecture
Diffusion LLM on NVIDIA GPUs
Wafer-scale custom silicon (WSE-3)
Model
Mercury 2 — one proprietary model
Open-weight catalog you select (Llama, Qwen3, gpt-oss, GLM, Kimi K2)
Quality tier
~Claude Haiku 4.5 / GPT-5 mini
Depends on model run — entry to frontier-class
Peak output speed
~1,000 tok/s
~1800 tok/s
Pricing / 1M tok
$0.25 in / $0.75 out — flat
$0.10–$2.75, ~25× spread by model
Path to production
Self-serve, pay as you go — no contract
Self-serve dev tier; enterprise contract for production capacity
Capacity model
Elastic, standard cloud GPU supply
Allocation-gated; Code Pro/Max self-serve sold out¹
API
OpenAI-compatible drop-in
OpenAI-compatible
*Comparison reflects publicly disclosed information at time of publication. Quality and speed tier benchmarks via Artificial Analysis. Methodology in our docs.
Verify it yourself
Don’t take our word. Run the harness on your traffic
Same prompts, both endpoints, your requests. It prints p50/p95 latency and cost per request. If we're wrong about your workload, you'll know in ten minutes.
Real-time reasoning
Coding agents
Sub-seconds loops
Autocomplete, multi-step agents loops, and refactors that land before developer breaks flow.
Voice
Real-time pipelines
Conversational interfaces where latency is the product. Tightest latency budgets in AI.
Enterprise Search
Reasoning at retrieval speed
Multi-hop retrieval, reranking, and summarization without blowing the latency budget.
Cerebras achieves speed through wafer-scale custom chips and through the capacity constraints and contract minimums that come with them.
Mercury 2 hits comparable throughput through a fundamentally different path: parallel difussion-based generation running on widely available NVIDIA instrastructure.
Is the quality really there at this speed?
Mercury 2 sits in the speed-optimized tier alongside Claude 4.5 Haiku and GPT-5 Mini on Artificial Analysis’s Agentic Index, not competing with frontier models like Opus or GPT-5.
If your workload runs well on Haiku or Mini today, Mercury 2 will hit it at roughly an order of magnitude more throughput. If you need frontier reasoning, you need a frontier model. We’re honest about which tier this is.
Benchmark it on your own workload
Spin up an API key, drop in the OpenAI-compatible endpoint, run it against Cerebras on your real prompts.
