A new frontier in LLM speed

Inception’s breakthrough diffusion-based approach to language generation enables the world’s fastest, most efficient AI models with best-in-class quality.

How can I help you today?

Explain what Inception does

Inception builds and deploys next‑generation large language models (LLMs) that are powered by diffusion rather than traditional auto‑regressive generation. By using diffusion, their models can produce many tokens in parallel, making them several times faster and less than half the cost of conventional LLMs. The diffusion framework also provides fine‑grained control over outputs, allowing adherence to specific schemas and semantic constraints. Additionally, it offers a unified paradigm for combining language with other data modalities such as audio, images, and video. The company’s team includes leading researchers and engineers from Stanford, UCLA, Cornell, Google DeepMind, Meta AI, Microsoft AI, and OpenAI, and they are currently deploying these diffusion LLMs at Fortune 500 companies.

Suggest 10 prompts for a dLLM

Here are some prompts you can try with a diffusion-style LLM:

1. Explain a complex topic step by step, showing intermediate reasoning.
2. Generate multiple variations of a product tagline and refine them progressively.
3. Write a short story that improves its wording over several iterations.
4. Brainstorm startup ideas and evolve the best one through revisions.
5. Refactor a piece of code and show incremental improvements.
6. Describe an image concept and refine the details in stages.
7. Compare two technologies with increasingly deeper analysis.
8. Draft a landing page headline and iterate toward a clearer version.
9. Simulate a design critique that becomes more precise each step.
10. Turn rough notes into a polished summary through gradual refinement.

Create a JavaScript animation

Trusted by teams at

The Mercury diffusion models introduce blazing fast inference with frontier quality at a fraction of the cost of other top-tier models.

[Chart: Speed Benchmark, tokens/sec]

The diffusion difference: from sequential to parallel

Conventional autoregressive LLMs generate text one token at a time. Mercury diffusion LLMs (dLLMs) generate tokens in parallel, increasing speed and maximizing GPU efficiency.
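As a rough intuition, not Inception's actual algorithm, diffusion-style decoding can be pictured as starting from a fully masked sequence and filling in several positions per refinement step, while autoregressive decoding emits exactly one token per step. This toy sketch (all names are illustrative) counts the steps each approach needs:

```python
# Toy comparison of decoding step counts. This is an illustration of the
# parallel-vs-sequential idea only, not Mercury's real denoising procedure.
import random

TARGET = "the quick brown fox jumps over the lazy dog".split()

def autoregressive_steps(target):
    # Sequential decoding: one token per step, so steps == sequence length.
    return len(target)

def diffusion_steps(target, tokens_per_step=3, seed=0):
    # Diffusion-style decoding: begin fully masked, then unmask a few
    # positions per step until no masks remain.
    rng = random.Random(seed)
    masked = set(range(len(target)))
    steps = 0
    while masked:
        for pos in rng.sample(sorted(masked), min(tokens_per_step, len(masked))):
            masked.discard(pos)
        steps += 1
    return steps

print(autoregressive_steps(TARGET))  # 9 steps: one per token
print(diffusion_steps(TARGET))       # 3 steps: 3 tokens revealed per step
```

The step count for the parallel sketch shrinks roughly linearly with the number of tokens revealed per step, which is the source of the speedup the page describes.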

[Animation: side-by-side comparison. Mercury (Parallel Generation) reveals many tokens at once; ChatGPT (Sequential Generation) emits "The quick brown fox jumps over the lazy dog" one token at a time.]

Blazing-fast performance you can notice

Write code

Real-Time Voice

Instant Agents

Build the future of AI apps with Mercury

Lightning fast agents

Automate complex coding and other business workflows with ultra-responsive AI.

Real-time voice

Engage naturally with AI in voice-powered workflows like customer support, translation, and immersive gaming.

Instant code editing

Stay in the flow with responsive autocomplete, intelligent tab suggestions, and fast chat responses.

Fast, creative co-pilots

Supercharge editorial and creative work—less waiting, more creating.

Rapid search

Instantly surface the right data from across your organization’s knowledge base.

Foundational models

Meet our family of diffusion models

Mercury 2

The fastest reasoning LLM and the first reasoning dLLM. Ideal for complex applications where performance and speed are crucial.

Input $0.25 per 1M tokens

Output $0.75 per 1M tokens

Mercury Edit

A small, coding-focused dLLM. Ideal for code editing and other extremely latency-sensitive components of coding workflows.

Input $0.25 per 1M tokens

Output $0.75 per 1M tokens
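At the listed rates ($0.25 per 1M input tokens, $0.75 per 1M output tokens), a back-of-the-envelope cost estimate is simple arithmetic. A minimal sketch; the function name is illustrative:

```python
# Cost estimate at the listed per-million-token rates:
# $0.25 per 1M input tokens, $0.75 per 1M output tokens.
def cost_usd(input_tokens, output_tokens,
             input_rate=0.25, output_rate=0.75):
    """Return the USD cost of a request at per-1M-token rates."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# Example: 2M input tokens and 1M output tokens.
print(cost_usd(2_000_000, 1_000_000))  # 2 * 0.25 + 1 * 0.75 = 1.25
```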

Enterprise-grade privacy and reliability

We’re available through major cloud providers like Amazon Bedrock and Azure AI Foundry. Talk with us about fine-tuning and private deployments.

Integrate in seconds

Our models are OpenAI API compatible and a drop-in replacement for traditional LLMs.
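Because the API is OpenAI-compatible, an existing chat-completions request works unchanged; only the base URL and model name differ. The endpoint and model identifier below are placeholders, not confirmed values, so check the official docs before use. The snippet builds the request offline, and the commented lines show how it would be sent with the standard `openai` Python client:

```python
# Sketch of an OpenAI-compatible chat-completions request for a Mercury
# model. The model name and endpoint here are placeholders.
import json

payload = {
    "model": "mercury",  # placeholder model ID; see the docs for real names
    "messages": [
        {"role": "user", "content": "Explain what Inception does"}
    ],
}

# To send it for real with the standard OpenAI client (requires an API key
# and the correct base URL from the provider's documentation):
#
#   from openai import OpenAI
#   client = OpenAI(base_url="https://<inception-endpoint>/v1", api_key="...")
#   resp = client.chat.completions.create(**payload)
#   print(resp.choices[0].message.content)

print(json.dumps(payload, indent=2))
```

Since the request shape is identical to a standard chat-completions call, switching an existing application over is a configuration change rather than a code rewrite.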

Enterprise AI partner

We’re available through major cloud providers like Amazon Bedrock and Azure AI Foundry.

Reliability at scale

Get 99.5%+ uptime and priority support with custom SLAs.

The future of LLMs is here
