Models

Build high-performance AI apps with Mercury

Inception’s diffusion LLMs (dLLMs) deliver frontier LLM quality at 5x greater speed.

Trusted by teams at

Overview

(Un)paralleled speeds

Our models run at 1000+ tokens per second on commercial NVIDIA GPUs, enabling instant, in-the-flow AI solutions.

Exceptional Quality

We match the intelligence of speed-optimized autoregressive models like GPT-5 mini and Claude Haiku 4.5.

Seamless integration

Our models are OpenAI compatible and a drop-in replacement for traditional LLMs.
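A minimal sketch of what "drop-in replacement" means in practice: the request shape is the standard OpenAI chat-completions payload, so switching providers only changes the base URL and API key (the endpoint URL and `mercury-2` model name below are taken from the example later on this page).

```python
def chat_request(base_url: str, api_key: str, model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions request.

    The body is provider-agnostic; only base_url and api_key vary
    between OpenAI-compatible providers.
    """
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

# Point the same request builder at Inception instead of another provider:
req = chat_request(
    "https://api.inceptionlabs.ai/v1",
    "INCEPTION_API_KEY",  # replace with your real key
    "mercury-2",
    "What is a diffusion model?",
)
```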

Discover our models

Mercury 2

The fastest reasoning LLM and our most powerful model. Ideal for complex applications where performance and speed are crucial.

Pricing

Input

$0.25 / 1M Tokens

Cached Input

$0.025 / 1M Tokens

Output

$0.75 / 1M Tokens
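A quick sketch of estimating request cost from the per-million-token rates above. One assumption to flag: this treats cached tokens as a subset of input tokens billed at the lower rate, which is the common convention for OpenAI-compatible pricing but is not stated explicitly on this page.

```python
# Mercury 2 rates from the pricing list above, in USD per 1M tokens.
RATES = {"input": 0.25, "cached_input": 0.025, "output": 0.75}

def cost_usd(input_toks: int, output_toks: int, cached_toks: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumes cached_toks is the portion of input_toks billed at the
    cached-input rate (an assumption, not stated on this page).
    """
    billed_input = input_toks - cached_toks
    return (
        billed_input * RATES["input"]
        + cached_toks * RATES["cached_input"]
        + output_toks * RATES["output"]
    ) / 1_000_000

# e.g. 10K input tokens (8K of them cached) plus 2K output tokens:
print(round(cost_usd(10_000, 2_000, cached_toks=8_000), 6))  # → 0.0022
```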

Features

128K context window

Reasoning

Tool use

Structured Output
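On OpenAI-compatible APIs, structured output is typically requested through a `response_format` block carrying a JSON schema. A hedged sketch of such a request body follows; whether Mercury uses this exact field name is an assumption here, and the `location` schema is purely illustrative.

```python
# Hypothetical chat-completions body asking for JSON matching a schema
# (response_format follows the OpenAI structured-output convention;
# Mercury's exact support for this field is assumed, not confirmed here).
payload = {
    "model": "mercury-2",
    "messages": [
        {"role": "user",
         "content": "Extract the city and country from: 'I live in Lyon, France.'"}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "location",  # illustrative schema name
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
            },
        },
    },
}
```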

Use cases

Rapid Coding Iteration

Workflow Subagents

Customer Support

Realtime Voice

Enterprise Search

Mercury Edit

A small, coding-focused dLLM. Ideal for code editing and other extremely latency-sensitive components of coding workflows.

Pricing

Input

$0.25 / 1M Tokens

Cached Input

$0.025 / 1M Tokens

Output

$0.75 / 1M Tokens

Features

128K context window

Tool use

Structured Output

Use cases

Autocomplete

Next Edit

* Mercury 1 remains supported for existing customers. For access or migration guidance, contact your Inception representative.

Get started with Mercury today

1

Create your account

Create an Inception Platform account or sign in directly if you already have one.

2

Create your API Key

Go to API Keys and create a new API key. New API keys come with 10 million free tokens.

3

Make your first request

We are OpenAI API compatible and are supported through libraries including AISuite, LiteLLM, and LangChain.

import requests

response = requests.post(
    'https://api.inceptionlabs.ai/v1/chat/completions',
    headers={
        'Content-Type': 'application/json',
        'Authorization': 'Bearer INCEPTION_API_KEY'
    },
    json={
        'model': 'mercury-2',
        'messages': [
            {'role': 'user', 'content': 'What is a diffusion model?'}
        ],
        'max_tokens': 1000
    }
)
data = response.json()
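The `data` dict above follows the standard OpenAI chat-completions response shape, where the assistant's reply sits under `choices[0].message.content`. Illustrated here with a stub response rather than a live call (the stub's content and token counts are made up):

```python
# Stub of an OpenAI-style chat-completions response body; a real call
# returns a dict of this shape from response.json().
data = {
    "choices": [
        {"message": {
            "role": "assistant",
            "content": "A diffusion model generates data by iteratively denoising noise.",
        }}
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 14, "total_tokens": 26},
}

# Extract the assistant's reply:
reply = data["choices"][0]["message"]["content"]
print(reply)
```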

Pricing

Choose the access plan that works best for your needs

Free

For people who want to try out our playground.

Access all models

10 million free tokens

Developer

For developers and teams building production applications.

Usage-based pricing

Generous rate limits

Priority support

Enterprise

For organizations that need custom scale, security, and support.

Custom rate limits

SLA guarantees

Security and privacy

Volume-based pricing

The future of LLMs is here
