Jul 6, 2025
Buildglare is a natural-language, AI-powered low-code platform for rapidly creating full-stack websites. Users build pages and components through chat, and Buildglare deploys them instantly via Cloudflare Workers. In practice, however, websites grow complex, and even a single file can contain hundreds of lines of code. Regenerating entire files for every change would be painfully slow and expensive. Instead, Buildglare needed a way to apply partial code edits, updating only the relevant segments of a file in place.
“When we first tried Inception’s diffusion model, we were convinced it was exactly what we needed. Its speed, quality, and cost amazed us,” recalls Zuch Chuang, CEO of Buildglare.
Inception’s Mercury Coder, the first commercial diffusion large language model (dLLM), proved ideal for Buildglare’s workflow. Mercury’s diffusion approach generates code in parallel passes rather than token by token, making in-place edits fast and efficient. Buildglare integrated Mercury Coder to handle all “code apply” tasks (similar to the Apply feature in AI IDEs like Cursor), reserving a larger model for initial planning only. This hybrid strategy (Claude for high-level suggestions, Mercury for patching) dramatically cut latency and token costs.
Challenge: Efficient Partial Code Updates
In AI site building, users frequently refine specific features of a page. For example, adding a new form field or changing a style rule should only update a fragment of the code, not regenerate every line. But most LLMs excel at full-file generation and struggle with large-context edits. Moreover, state-of-the-art chat models like Anthropic’s Claude are powerful but costly and relatively slow (Claude Opus 4 charges $15 per million input tokens and $75 per million output tokens). Running such a model on every user change would make the editor laggy and expensive.
Buildglare’s solution was to use Claude (or a similar model) only to draft what needs to change, then employ a fast diffusion model to actually apply those changes. This mirrors the approach taken by the AI code editor Cursor: a heavy model generates a diff, then a lighter model applies it. By doing this, Buildglare ensures that major code edits happen in real time without blowing up infrastructure costs.
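To make the split concrete, here is a minimal sketch of the two roles. The types, names, and prompt format are illustrative assumptions, not Buildglare’s actual implementation:

```ts
// Illustrative only: the planning model produces a compact EditPlan,
// and the apply model receives the full file plus that plan.
interface EditPlan {
  file: string;        // path of the file to patch, e.g. "src/Header.tsx"
  instruction: string; // concise description of the change
  snippet: string;     // the new code fragment to merge in
}

// Build the prompt for the apply model. The apply model returns the
// complete updated file, rewriting only the affected spans.
function buildApplyPrompt(original: string, plan: EditPlan): string {
  return [
    "Apply the following edit to the file and return the complete updated file.",
    `Edit: ${plan.instruction}`,
    "--- snippet ---",
    plan.snippet,
    "--- original file ---",
    original,
  ].join("\n");
}
```

Note the asymmetry: the expensive model emits only the small EditPlan, while the cheap, fast model produces the bulk of the output tokens.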
Solution: Mercury Coder Diffusion LLM
Inception’s Mercury Coder excels at this use case. Unlike autoregressive LLMs, Mercury Coder uses a diffusion architecture, refining its output in parallel passes. According to third-party benchmarking from Artificial Analysis, Mercury Coder matches the quality of speed-optimized models like Claude 3.5 Haiku and GPT-4o Mini while running 5-10x faster. And on real-world developer tests from Copilot Arena, Mercury ranks first in speed while tying for second in quality. For Buildglare, these figures translated into real-world gains: with Mercury Coder applying edits, multi-page and multi-file changes happen almost instantly in Buildglare’s chat interface, keeping users in “flow”. Buildglare’s engineers note that Mercury Coder’s parallel edits feel like an ideal solution for the partial-update scenario.
Workflow & Implementation
Buildglare’s editor uses a mixed-model pipeline. When a user asks to change code, the system may invoke Claude or ChatGPT to interpret the intent and outline the change (e.g. “add a search input to the header component”). That model outputs a concise code diff or patch description. Next, Mercury Coder takes over: it receives the existing file content plus the diff instructions, and generates the updated file content in place. Thanks to its diffusion architecture, Mercury efficiently integrates the snippet into the file, updating only the relevant token spans.
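A condensed sketch of this pipeline is below. It assumes OpenAI-compatible chat endpoints for both providers; the base URLs, model IDs, and prompts are placeholders rather than Buildglare’s actual configuration:

```ts
// Sketch of the mixed-model pipeline. Assumes both providers expose an
// OpenAI-compatible /chat/completions endpoint; the URLs and model IDs
// below are placeholders, not Buildglare's real configuration.
async function chat(baseUrl: string, apiKey: string, model: string, prompt: string): Promise<string> {
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// Step 1: a large model drafts the change. Step 2: Mercury applies it.
async function editFile(userRequest: string, fileContent: string): Promise<string> {
  const plan = await chat(
    "https://planner.example.com/v1",       // placeholder planning endpoint
    process.env.PLANNER_API_KEY!,
    "claude-or-similar",                    // placeholder model ID
    `Outline the minimal code change for: "${userRequest}". ` +
      "Output a short diff-style patch description only."
  );
  return chat(
    "https://apply-model.example.com/v1",   // placeholder apply endpoint
    process.env.MERCURY_API_KEY!,
    "mercury-coder",                        // placeholder model ID
    `Apply this change and return the complete updated file.\n` +
      `Change:\n${plan}\n--- file ---\n${fileContent}`
  );
}
```

Because the planning model emits only a short patch description while the full rewritten file comes from Mercury, the expensive model contributes little to the per-edit bill.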
This approach minimizes overall cost: the larger model only needs to output a small snippet of text, while Mercury (which costs far less to run) handles the heavy lifting of embedding that snippet into the codebase. In fact, Mercury’s cost structure is exceptionally low: just $0.25 per million input tokens and $1 per million output tokens. By contrast, even Claude’s smaller “Sonnet” model charges around $15 per million output tokens. In Buildglare’s usage, that means Mercury is roughly an order of magnitude cheaper for the actual code output phase.
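A back-of-envelope calculation illustrates the gap, using the per-token prices quoted above; the token counts are invented for the example:

```ts
// Cost comparison for one "apply" call, using the published per-token
// prices quoted above. The token counts are invented for illustration.
const MERCURY_INPUT_PER_M = 0.25; // $ per million input tokens
const MERCURY_OUTPUT_PER_M = 1.0; // $ per million output tokens
const SONNET_OUTPUT_PER_M = 15.0; // $ per million output tokens

const inputTokens = 4_000;  // existing file + patch instructions
const outputTokens = 4_000; // the rewritten file

const mercuryCost =
  (inputTokens / 1e6) * MERCURY_INPUT_PER_M +
  (outputTokens / 1e6) * MERCURY_OUTPUT_PER_M;                       // ≈ $0.005
const sonnetOutputCost = (outputTokens / 1e6) * SONNET_OUTPUT_PER_M; // $0.06

console.log({ mercuryCost, sonnetOutputCost }); // ~12x cheaper per edit
```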
Conclusion
Buildglare demonstrates how diffusion-based LLMs can transform real-world developer tools. By integrating Inception’s Mercury Coder, Buildglare optimized its low-code site builder for instant partial code modifications. The platform now offers users a near real-time AI coding assistant: changes are applied within seconds, with production-quality code, and at minimal cost.
In summary, Mercury Coder gave Buildglare exactly the recipe it needed: matching the performance of heavyweight models on code tasks while running 5-10x faster and charging only pennies per edit. As one metrics slide puts it, Mercury “pushes the frontier of AI capabilities” by delivering high-quality output at blazing speed. This combination of speed, quality, and efficiency makes Mercury an ideal fit for Buildglare’s vision of AI-powered, instant website creation.