Models

Build high-performance AI apps with Mercury

Inception’s diffusion LLMs (dLLMs) deliver frontier LLM quality at 5x greater speed.

Trusted by teams at

Overview

(Un)paralleled speeds

Our models run at 1000+ tokens per second on commercial NVIDIA GPUs, enabling instant, in-the-flow AI solutions.

Exceptional Quality

We match the intelligence of speed-optimized autoregressive models like GPT-5 mini and Claude Haiku 4.5.

Seamless integration

Our models are OpenAI compatible and a drop-in replacement for traditional LLMs.
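A minimal sketch of what "drop-in replacement" means in practice: the request shape is the standard OpenAI chat-completions payload, so switching providers only changes the base URL and API key (the endpoint URL and `mercury-2` model name below are taken from the example later on this page).

```python
def chat_request(base_url: str, api_key: str, model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions request.

    The body is provider-agnostic; only base_url and api_key vary
    between OpenAI-compatible providers.
    """
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

# Point the same request builder at Inception instead of another provider:
req = chat_request(
    "https://api.inceptionlabs.ai/v1",
    "INCEPTION_API_KEY",  # replace with your real key
    "mercury-2",
    "What is a diffusion model?",
)
```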

Discover our models

Mercury 2

The fastest reasoning LLM and our most powerful model. Ideal for complex applications where performance and speed are crucial.

Pricing

Input

$0.25 / 1M Tokens

Cached Input

$0.025 / 1M Tokens

Output

$0.75 / 1M Tokens
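A quick sketch of estimating request cost from the per-million-token rates above. One assumption to flag: this treats cached tokens as a subset of input tokens billed at the lower rate, which is the common convention for OpenAI-compatible pricing but is not stated explicitly on this page.

```python
# Mercury 2 rates from the pricing list above, in USD per 1M tokens.
RATES = {"input": 0.25, "cached_input": 0.025, "output": 0.75}

def cost_usd(input_toks: int, output_toks: int, cached_toks: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumes cached_toks is the portion of input_toks billed at the
    cached-input rate (an assumption, not stated on this page).
    """
    billed_input = input_toks - cached_toks
    return (
        billed_input * RATES["input"]
        + cached_toks * RATES["cached_input"]
        + output_toks * RATES["output"]
    ) / 1_000_000

# e.g. 10K input tokens (8K of them cached) plus 2K output tokens:
print(round(cost_usd(10_000, 2_000, cached_toks=8_000), 6))  # → 0.0022
```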

Features

128K context window

Reasoning

Tool use

Structured Output
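On OpenAI-compatible APIs, structured output is typically requested through a `response_format` block carrying a JSON schema. A hedged sketch of such a request body follows; whether Mercury uses this exact field name is an assumption here, and the `location` schema is purely illustrative.

```python
# Hypothetical chat-completions body asking for JSON matching a schema
# (response_format follows the OpenAI structured-output convention;
# Mercury's exact support for this field is assumed, not confirmed here).
payload = {
    "model": "mercury-2",
    "messages": [
        {"role": "user",
         "content": "Extract the city and country from: 'I live in Lyon, France.'"}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "location",  # illustrative schema name
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
            },
        },
    },
}
```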

Use cases

Rapid Coding Iteration

Workflow Subagents

Customer Support

Realtime Voice

Enterprise Search

Mercury Edit

A small, coding-focused dLLM. Ideal for code editing and other extremely latency-sensitive components of coding workflows.

Pricing

Input

$0.25 / 1M Tokens

Cached Input

$0.025 / 1M Tokens

Output

$0.75 / 1M Tokens

Features

128K context window

Tool use

Structured Output

Use cases

Autocomplete

Next Edit

* Mercury 1 remains supported for existing customers. For access or migration guidance, contact your Inception representative.

Get started with Mercury today

1

Create your account

Create an Inception Platform account or sign in directly if you already have one.

2

Create your API Key

Go to API Keys and create a new API key. New API keys come with 10 million free tokens.

3

Make your first request

We are OpenAI API compatible and are supported through libraries including AISuite, LiteLLM, and LangChain.

import requests

response = requests.post(
    'https://api.inceptionlabs.ai/v1/chat/completions',
    headers={
        'Content-Type': 'application/json',
        'Authorization': 'Bearer INCEPTION_API_KEY'
    },
    json={
        'model': 'mercury-2',
        'messages': [
            {'role': 'user', 'content': 'What is a diffusion model?'}
        ],
        'max_tokens': 1000
    }
)
data = response.json()
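The `data` dict above follows the standard OpenAI chat-completions response shape, where the assistant's reply sits under `choices[0].message.content`. Illustrated here with a stub response rather than a live call (the stub's content and token counts are made up):

```python
# Stub of an OpenAI-style chat-completions response body; a real call
# returns a dict of this shape from response.json().
data = {
    "choices": [
        {"message": {
            "role": "assistant",
            "content": "A diffusion model generates data by iteratively denoising noise.",
        }}
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 14, "total_tokens": 26},
}

# Extract the assistant's reply:
reply = data["choices"][0]["message"]["content"]
print(reply)
```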

Pricing

Choose the access plan that works best for your needs

Free

For people who want to try out our playground.

Access all models

10 million free tokens

Developer

For developers and teams building production applications.

Usage-based pricing

Generous rate limits

Priority support

Enterprise

For organizations that need custom scale, security, and support.

Custom rate limits

SLA guarantees

Security and privacy

Volume-based pricing

The future of LLMs is here
