The Fastest LLMs Ever Built

Diffusion LLMs: A Breakthrough for Speed and Quality

Powering Cutting-Edge AI Applications

The Mercury Diffusion Models

Blazing fast inference with frontier quality at a fraction of the cost.

The Diffusion Difference: From Sequential to Parallel Text Generation

Traditional LLMs generate text one token at a time. Mercury diffusion LLMs (dLLMs) generate tokens in parallel, increasing speed and maximizing GPU efficiency.
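To make the difference concrete, here is a toy Python sketch, illustrative only and not Mercury's actual algorithm: the autoregressive loop spends one forward pass per token, while the masked-diffusion loop starts from a fully masked sequence and commits several positions per denoising step, finishing in far fewer passes. The dummy_predict model and the unmasking schedule are stand-ins.

```python
# Toy decoding comparison: illustrative only, not Mercury's actual algorithm.
import random

VOCAB = ["the", "quick", "brown", "fox", "jumps"]
MASK = "<mask>"

def dummy_predict(tokens):
    """Stand-in for one model forward pass: proposes a token for every masked slot."""
    return [random.choice(VOCAB) if t == MASK else t for t in tokens]

def autoregressive_decode(length):
    """One forward pass per generated token: `length` passes total."""
    tokens, passes = [], 0
    for _ in range(length):
        tokens.append(dummy_predict([MASK])[0])
        passes += 1
    return tokens, passes

def diffusion_decode(length, steps=4):
    """Start fully masked; each denoising step commits a batch of positions in parallel."""
    tokens = [MASK] * length
    per_step = -(-length // steps)  # ceiling division so every slot gets committed
    passes = 0
    for _ in range(steps):
        proposal = dummy_predict(tokens)  # one forward pass fills many slots
        passes += 1
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        for i in masked[:per_step]:
            tokens[i] = proposal[i]
        if MASK not in tokens:
            break
    return tokens, passes

print(autoregressive_decode(8))  # 8 forward passes for 8 tokens
print(diffusion_decode(8))       # 4 forward passes for the same 8 tokens
```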

AI Applications Made Possible with Mercury

Lightning-fast code editing

Stay in flow with responsive autocomplete, intelligent tab suggestions, fast chat responses, and more.

Real-time voice agents

Engage naturally with AI for customer support, translation, and beyond.

Fast, creative co-pilots

Supercharge editorial and creative work—less waiting, more creating.

Rapid enterprise search

Instantly surface the right data from across your organization’s knowledge base.

Seamless enterprise workflows

Automate complex routing, analytics, and decision processes with ultra-responsive AI.

Our Models

Mercury Coder

dLLM optimized to accelerate coding workflows

Streaming, tool use, and structured output

128K context window

Input $0.25 | Output $1 per 1M tokens

Mercury

General-purpose dLLM that provides ultra-low latency 

Streaming, tool use, and structured output

128K context window

Input $0.25 | Output $1 per 1M tokens
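At these rates, per-request costs are easy to estimate. A quick worked example in Python, using hypothetical request sizes:

```python
# Published rates for both Mercury models: $0.25 per 1M input tokens,
# $1.00 per 1M output tokens. The request sizes below are hypothetical.
INPUT_RATE = 0.25 / 1_000_000   # dollars per input token
OUTPUT_RATE = 1.00 / 1_000_000  # dollars per output token

input_tokens, output_tokens = 10_000, 2_000
cost = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
print(f"${cost:.4f}")  # $0.0045 for this request
```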

An Enterprise AI Partner

We’re available through major cloud platforms, including Amazon Bedrock on AWS. Talk with us about fine-tuning, private deployments, and forward-deployed engineering support.

Integrate in Seconds

Our models are OpenAI API compatible and a drop-in replacement for traditional LLMs.
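As a minimal sketch of what drop-in means, the snippet below points the official OpenAI Python SDK at an OpenAI-compatible endpoint and streams a chat completion. The base URL and model name are assumptions for illustration; substitute the values from your account documentation.

```python
# Minimal sketch: the official OpenAI Python SDK against an
# OpenAI-compatible endpoint. Base URL and model name are assumed
# for illustration; use the values from your provider's docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

# Standard streaming chat completion, printed token by token.
stream = client.chat.completions.create(
    model="mercury-coder",  # assumed model name
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```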

What Customers are Saying

"I was amazed by how fast it was. The multi-thousand tokens per second was absolutely wild, nothing like I've ever seen."

Jacob Kim

Software Engineer

"After trying Mercury, it's hard to go back. We are excited to roll out Mercury to support all of our voice agents."

Oliver Silverstein

CEO

"We cut routing and classification overheads to sub-second latencies even on complex agent traces."

Damian Tran

CEO