The Fastest LLMs Ever Built

Diffusion LLMs: A Breakthrough for Speed and Quality

Powering Cutting-Edge AI Applications

The Mercury Diffusion Models

Blazing fast inference with frontier quality at a fraction of the cost.

The Diffusion Difference: From Sequential to Parallel Text Generation

Traditional LLMs generate text one token at a time. Mercury diffusion LLMs (dLLMs) generate tokens in parallel, increasing speed and maximizing GPU efficiency.
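To make the difference concrete, here is a toy Python sketch, illustrative only and not Mercury's actual algorithm: the autoregressive loop spends one forward pass per token, while the masked-diffusion loop starts from a fully masked sequence and commits several positions per denoising step, finishing in far fewer passes. The dummy_predict model and the unmasking schedule are stand-ins.

```python
# Toy decoding comparison: illustrative only, not Mercury's actual algorithm.
import random

VOCAB = ["the", "quick", "brown", "fox", "jumps"]
MASK = "<mask>"

def dummy_predict(tokens):
    """Stand-in for one model forward pass: proposes a token for every masked slot."""
    return [random.choice(VOCAB) if t == MASK else t for t in tokens]

def autoregressive_decode(length):
    """One forward pass per generated token: `length` passes total."""
    tokens, passes = [], 0
    for _ in range(length):
        tokens.append(dummy_predict([MASK])[0])
        passes += 1
    return tokens, passes

def diffusion_decode(length, steps=4):
    """Start fully masked; each denoising step commits a batch of positions in parallel."""
    tokens = [MASK] * length
    per_step = -(-length // steps)  # ceiling division so every slot gets committed
    passes = 0
    for _ in range(steps):
        proposal = dummy_predict(tokens)  # one forward pass fills many slots
        passes += 1
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        for i in masked[:per_step]:
            tokens[i] = proposal[i]
        if MASK not in tokens:
            break
    return tokens, passes

print(autoregressive_decode(8))  # 8 forward passes for 8 tokens
print(diffusion_decode(8))       # 4 forward passes for the same 8 tokens
```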

AI Applications Made Possible with Mercury

Lightning-fast code editing

Stay in flow with responsive autocomplete, intelligent tab suggestions, fast chat responses, and more.

Real-time voice agents

Engage naturally with AI for customer support, translation, and beyond.

Fast, creative co-pilots

Supercharge editorial and creative work—less waiting, more creating.

Rapid enterprise search

Instantly surface the right data from across your organization’s knowledge base.

Seamless enterprise workflows

Automate complex routing, analytics, and decision processes with ultra-responsive AI.

Our Models

Mercury Coder

dLLM optimized to accelerate coding workflows

Streaming, tool use, and structured output

128K context window

Input $0.25 | Output $1 per 1M tokens

Mercury

General-purpose dLLM that provides ultra-low latency 

Streaming, tool use, and structured output

128K context window

Input $0.25 | Output $1 per 1M tokens
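At these rates, per-request costs are easy to estimate. A quick worked example in Python, using hypothetical request sizes:

```python
# Published rates for both Mercury models: $0.25 per 1M input tokens,
# $1.00 per 1M output tokens. The request sizes below are hypothetical.
INPUT_RATE = 0.25 / 1_000_000   # dollars per input token
OUTPUT_RATE = 1.00 / 1_000_000  # dollars per output token

input_tokens, output_tokens = 10_000, 2_000
cost = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
print(f"${cost:.4f}")  # $0.0045 for this request
```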

An Enterprise AI Partner

We’re available through major cloud platforms, including Amazon Bedrock on AWS. Talk with us about fine-tuning, private deployments, and forward-deployed engineering support.

Integrate in Seconds

Our models are OpenAI API compatible and a drop-in replacement for traditional LLMs.
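As a minimal sketch of what drop-in means, the snippet below points the official OpenAI Python SDK at an OpenAI-compatible endpoint and streams a chat completion. The base URL and model name are assumptions for illustration; substitute the values from your account documentation.

```python
# Minimal sketch: the official OpenAI Python SDK against an
# OpenAI-compatible endpoint. Base URL and model name are assumed
# for illustration; use the values from your provider's docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

# Standard streaming chat completion, printed token by token.
stream = client.chat.completions.create(
    model="mercury-coder",  # assumed model name
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```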

What Customers are Saying

"I was amazed by how fast it was. The multi-thousand tokens per second was absolutely wild, nothing like I've ever seen."

Jacob Kim

Software Engineer

"After trying Mercury, it's hard to go back. We are excited to roll out Mercury to support all of our voice agents."

Oliver Silverstein

CEO

"We cut routing and classification overheads to sub-second latencies even on complex agent traces."

Damian Tran

CEO