Nov 18, 2025

Partnerships

Mercury Diffusion LLM Now Available on Azure AI Foundry

Akash Palrecha

Today, we're thrilled to announce that Mercury is available on Azure AI Foundry, bringing the first commercial-scale diffusion large language model (dLLM) to enterprise developers. This release combines Inception's breakthrough diffusion architecture with Azure's enterprise-ready infrastructure, giving developers access to a model that delivers both exceptional speed and quality.

Built by the team behind foundational AI technologies including Flash Attention, Direct Preference Optimization, and the original diffusion models for images, Mercury represents a fundamental shift in how language models generate text. Developers on Azure AI Foundry now have access to a model that redefines what's possible in real-time AI applications.

A New Architecture for Language Generation

Traditional language models generate text sequentially, one token at a time. This creates an inherent bottleneck where each token must wait for all previous tokens to be generated. Mercury uses a diffusion-based architecture to generate multiple tokens in parallel, enabling dramatically faster inference. The result: Mercury runs up to 10x faster than comparable autoregressive models.
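To make the difference concrete, here is a toy sketch of the two decoding loops. This is purely illustrative and not Mercury's actual algorithm; predict_next and refine_step are hypothetical stand-ins for model forward passes:

# Illustrative sketch only: not Mercury's actual algorithm.
# predict_next and refine_step stand in for model forward passes.

def autoregressive_decode(predict_next, prompt, n_tokens):
    # One forward pass per token: position t must wait for positions 0..t-1.
    tokens = list(prompt)
    for _ in range(n_tokens):
        tokens.append(predict_next(tokens))  # strictly sequential
    return tokens

def diffusion_decode(refine_step, prompt, n_tokens, n_steps):
    # Start from a fully masked draft and refine every position at once.
    # Total forward passes = n_steps, typically far fewer than n_tokens.
    draft = ["<mask>"] * n_tokens
    for _ in range(n_steps):
        draft = refine_step(list(prompt), draft)  # parallel update of all positions
    return list(prompt) + draft

The speedup comes from amortization: a handful of parallel refinement passes replaces hundreds of sequential next-token passes.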

Performance: Speed & Quality

Mercury delivers frontier quality at unparalleled speed. Across knowledge, coding, instruction-following, and mathematical benchmarks, Mercury performs on par with models like Gemini 2.5 Flash and Claude 4.5 Haiku, while running up to 10x faster.

Enterprise-Ready on Azure

Mercury on Azure AI Foundry is production-ready out of the box. It features a 128K token context window for processing large documents and maintaining extensive conversations, with native tool calling and structured output support using JSON schemas for building agentic workflows. The API is OpenAI-compatible, making integration with existing codebases seamless.
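As a sketch of what tool calling looks like against the OpenAI-compatible API (assuming that compatibility covers the standard tools parameter; the endpoint URL, deployment name, and get_weather tool below are illustrative placeholders):

from openai import OpenAI

# Placeholders: substitute your own endpoint, deployment, and key.
client = OpenAI(
    base_url="https://<your-endpoint>.inference.ml.azure.com/v1",
    api_key="<your-api-key>",
    default_headers={"azureml-model-deployment": "mercury-1"},
)

# A hypothetical tool, described with a JSON schema for its arguments.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="mercury",
    messages=[{"role": "user", "content": "What's the weather in Toronto?"}],
    tools=tools,
)

# If the model chooses to call the tool, the arguments arrive as JSON text.
print(response.choices[0].message.tool_calls)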

Azure AI Foundry provides enterprise-grade infrastructure:

  • Network isolation

  • Data privacy guarantees: your data stays in your Azure environment and is never used for training

  • Azure compliance standards, including SOC 2 and HIPAA

  • Comprehensive observability through Azure Monitor and Application Insights

Real-World Use Cases

Mercury's speed-quality combination makes applications faster and more responsive:

  • Coding assistants: Stay in flow with responsive autocomplete, intelligent tab suggestions, fast chat responses, and more.

  • Real-time voice agents: Engage naturally with AI for customer support, translation, and beyond.

  • Seamless enterprise workflows: Automate complex routing, analytics, and decision processes with ultra-responsive AI.

  • Rapid enterprise search: Instantly surface the right data from across your organization’s knowledge base.

Deploy on Azure AI Foundry

Mercury is available in the US and Canada regions through Azure AI Foundry. Deployment is straightforward: provision your model endpoint and configure your infrastructure through Azure AI Foundry's unified catalog. The Mercury software license costs $0.78/hour, with compute costs billed separately through your Azure account based on the resources you provision.

Azure AI Foundry Integration

Mercury integrates seamlessly with the broader Azure ecosystem:

  • Deploy using Azure AI Foundry's model catalog

  • Apply Azure AI Content Safety for content filtering

  • Monitor performance and costs in real time

  • Manage access with Azure RBAC and managed identities

  • Build multi-model applications using Azure's agent framework
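If your deployment is provisioned for token-based (Microsoft Entra ID) authentication rather than API keys, a minimal sketch with azure-identity might look like the following. Whether your endpoint accepts Entra ID tokens, and the exact token scope, depend on how the deployment was configured:

import requests
from azure.identity import DefaultAzureCredential

# Acquire a token with whatever identity is available (managed identity,
# Azure CLI login, etc.). The scope below is the standard Azure ML scope;
# verify it against your deployment's configuration.
credential = DefaultAzureCredential()
token = credential.get_token("https://ml.azure.com/.default").token

url = "https://<your-endpoint>.inference.ml.azure.com/v1/chat/completions"
headers = {
    "azureml-model-deployment": "mercury-1",
    "Authorization": f"Bearer {token}",
}
body = {
    "model": "mercury",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

response = requests.post(url, headers=headers, json=body)
print(response.json())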

Get Started in Three Steps

  1. Navigate to Azure AI Foundry’s Model Catalog, search for Mercury, and click on the model card. Alternatively, you can visit our model card here.

  2. Click on Use this model, choose a project, and follow the UI prompts to provision your desired capacity.

  3. While the default configuration should work for most developers, we recommend ND-H100-v5 instances for optimal speed. Deployment should take a few minutes to complete.

That’s it! You should now be able to start building with your newly provisioned API endpoint:

curl --location 'https://mercury-endpoint-ftecx.eastus.inference.ml.azure.com/v1/chat/completions' \
--header 'azureml-model-deployment: mercury-1' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <your-api-key>' \
--data '{
  "model": "mercury",
  "messages":  [
    { "role": "user", "content": "Hello!" }
  ],
  "stream": true,
  "temperature": 0.0,
  "max_tokens": 512
}'

In Python:

import requests

# Endpoint and deployment provisioned in the steps above
url = "https://mercury-endpoint-ftecx.eastus.inference.ml.azure.com/v1/chat/completions"

headers = {
    "azureml-model-deployment": "mercury-1",
    "Content-Type": "application/json",
    "Authorization": "Bearer <your-api-key>",
}

payload = {
    "model": "mercury",
    "messages": [
        {"role": "user", "content": "Hello!"}
    ],
    "stream": False,
    "temperature": 0,
    "max_tokens": 512,
}

# requests serializes the dict to JSON when it is passed via json=
response = requests.post(url, headers=headers, json=payload)
print(response.json())
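Because the endpoint is OpenAI-compatible, you can also point the official openai Python SDK at it. A minimal streaming sketch, assuming the SDK's default request paths line up with the /v1/chat/completions URL above:

from openai import OpenAI

# base_url must end at /v1; the deployment header matches the examples above.
client = OpenAI(
    base_url="https://mercury-endpoint-ftecx.eastus.inference.ml.azure.com/v1",
    api_key="<your-api-key>",
    default_headers={"azureml-model-deployment": "mercury-1"},
)

stream = client.chat.completions.create(
    model="mercury",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,  # tokens arrive as they are generated
    max_tokens=512,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)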

The Future of Real-Time AI

The launch of Mercury on Azure AI Foundry marks a turning point for production AI applications. Developers no longer have to choose between speed and quality, or between real-time responsiveness and frontier-model capabilities.

Deploy Mercury on Azure AI Foundry →

For detailed documentation, benchmarks, and integration guides, visit the Mercury model card in Azure AI Foundry.
