Apr 2, 2025
Product
Announcing our API
A few weeks ago, we announced the Mercury models, the first commercial-scale diffusion large language models (dLLMs). Today, we are excited to announce the release of the Inception API, providing programmatic access to our dLLMs!
We are launching the API service with support for Mercury Coder Small. Mercury Coder Small is a coding-focused model that runs more than 5x faster than speed-optimized frontier models like GPT-4o Mini and Claude 3.5 Haiku while matching them in quality.

Mercury Coder's speed means that developers can stay in the flow while coding, enjoying rapid chat-based iteration and responsive code completion suggestions. On Copilot Arena, Mercury Coder ranks 1st in speed and ties for 2nd in quality.
Mercury Coder is available on OpenRouter and integrates with many popular IDEs (see our docs). Our partners at Continue, a leading AI extension for VSCode, wrote a blog post describing how Mercury Coder represents a paradigm shift for developers.
Getting Started
To get started with the API, do the following:
1. Visit the Inception Platform and create your account. Please note that this account is separate from the account you may have created to access our playground.
2. Sign up for a billing plan and generate an API key in the dashboard.
3. Check out our quick start guide for implementation examples.
If you encounter any issues or have questions, please email support@inceptionlabs.ai or join our Discord channel.
Features
The Inception API exposes an OpenAI-compatible interface, which means you can use existing OpenAI client libraries or direct REST calls to access our services.
Here is an example curl call (a minimal sketch; the base URL and model name shown are illustrative assumptions, so check the docs for the exact values):
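```bash
# Illustrative chat/completions request. The base URL, API key
# placeholder, and model name are assumptions; see the docs.
curl https://api.inceptionlabs.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $INCEPTION_API_KEY" \
  -d '{
    "model": "mercury-coder-small",
    "messages": [
      {"role": "user", "content": "Write a Python function that reverses a string."}
    ]
  }'
```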
Mercury Coder Small can be queried via two endpoints:
chat/completions – This supports standard conversational interactions.
fim/completions – This supports fill-in-the-middle (FIM) workflows, where the model infills given a prefix and suffix (see the sketch below).
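Here is a sketch of a FIM request. The prompt (prefix) and suffix field names follow the common fill-in-the-middle convention and are assumptions here; consult the docs for the exact schema.

```bash
# Illustrative fim/completions request. The "prompt"/"suffix" field
# names follow the common FIM convention and are assumptions here.
curl https://api.inceptionlabs.ai/v1/fim/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $INCEPTION_API_KEY" \
  -d '{
    "model": "mercury-coder-small",
    "prompt": "def fibonacci(n):\n    ",
    "suffix": "\n    return result",
    "max_tokens": 64
  }'
```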
Notable features of Mercury Coder Small include the following:
Context Window – 32k tokens
Max Output Tokens – 16k tokens
Supported Languages – Python, JavaScript, Java, TypeScript, Bash, SQL, C, C++, PHP, HTML, and more.
Support for tool use
The API supports streaming and other standard LLM parameters.
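Streaming, for example, works the same way as in the OpenAI API: set "stream": true in the request body and consume the response as server-sent events. A minimal sketch, with the same assumed base URL and model name as above:

```bash
# Illustrative streaming request; -N disables curl's output buffering
# so tokens print as they arrive. Base URL and model name are
# assumptions, as above.
curl -N https://api.inceptionlabs.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $INCEPTION_API_KEY" \
  -d '{
    "model": "mercury-coder-small",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Explain Python list comprehensions in one sentence."}
    ]
  }'
```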
Diffusing Mode
We are providing a diffusing parameter that enables users to visualize the diffusion process. When this parameter is set to true, the model will stream blocks of noisy tokens that are steadily refined into the correct output. Note that the noisy tokens returned when this effect is enabled are not counted for billing purposes.
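A minimal sketch of enabling the effect (the diffusing parameter comes from this post; combining it with streaming and the other fields, which mirror the earlier chat example, are assumptions):

```bash
# Illustrative request with the diffusing visualization enabled.
# Combining it with "stream": true is an assumption here; the other
# fields mirror the earlier chat example.
curl -N https://api.inceptionlabs.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $INCEPTION_API_KEY" \
  -d '{
    "model": "mercury-coder-small",
    "diffusing": true,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a bubble sort in Python."}
    ]
  }'
```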

See the docs for implementation details.
What’s Next
We are actively adding features to the public API, including the following:
Support for longer contexts
Support for structured object generation
Access to reasoning models
You can submit feature requests by emailing support@inceptionlabs.ai.