How we scaled our platform to 50 million agent runs daily

A deep dive into the infrastructure decisions that let us handle massive scale without breaking a sweat — or the bank.

Published on Apr 25, 2026

Written by David Park

Last month, we hit a milestone: 50 million agent runs processed in a single day. Zero downtime. Average latency under 100ms. And our infrastructure costs actually went down compared to the previous quarter.

This post is a behind-the-scenes look at how we got here.

When we started, like most startups, we ran everything on a single Kubernetes cluster. It worked fine for our first few customers. But as usage grew, we started hitting walls. Cold starts were killing our latency. Scaling was reactive, not predictive. And our AWS bill was becoming a recurring nightmare in our board meetings.

The first big change was moving to a multi-region architecture. We now run inference workloads across 12 regions globally, automatically routing requests to the nearest healthy cluster. This alone cut our p99 latency by 60%. Users in Singapore were no longer waiting for round-trips to us-east-1.
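
As an illustration only, the "nearest healthy cluster" rule can be sketched in a few lines of Python. The `Region` type, its fields, and the latency figures below are hypothetical and stand in for whatever health checks and latency measurements a real router would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str          # hypothetical region identifier
    latency_ms: float  # assumed: measured round-trip time from the caller
    healthy: bool      # assumed: result of the most recent health check


def pick_region(regions: list[Region]) -> Region:
    """Return the healthy region with the lowest measured latency."""
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=lambda r: r.latency_ms)


# Example: a caller in Singapore whose closest region is failing health checks.
regions = [
    Region("us-east-1", 210.0, True),
    Region("ap-southeast-1", 12.0, False),  # nearest, but unhealthy
    Region("ap-northeast-1", 70.0, True),
]
print(pick_region(regions).name)  # ap-northeast-1
```

The point of the sketch is the ordering of the two checks: health filters first, latency decides second, so an unhealthy region never wins just because it is close.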

The second change was rethinking how we handle bursty workloads. AI agents are inherently unpredictable. A single customer might go from 100 requests per minute to 10,000 in seconds. Traditional auto-scaling couldn't keep up — by the time new instances spun up, the burst was over.

Our solution was predictive scaling based on historical patterns combined with aggressive warm pooling. We maintain a reserve of pre-warmed instances that can absorb traffic spikes instantly. The system learns each customer's usage patterns and pre-provisions capacity before they need it. It sounds expensive, but it's actually cheaper than reactive scaling because we waste fewer resources on cold starts.
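
One simple way to picture the warm-pool sizing is to forecast each customer's peak from past peaks and keep enough pre-warmed instances to cover it. The sketch below is an assumption-laden illustration, not a description of the production system: the 95th-percentile forecast, the function names, and the `rpm_per_instance` capacity figure are all hypothetical.

```python
import math
from statistics import quantiles


def forecast_peak_rpm(history: list[float], percentile: int = 95) -> float:
    """Estimate the next window's peak requests/minute from past peaks."""
    if len(history) < 2:
        return max(history, default=0.0)
    # quantiles(n=100) returns 99 cut points; index p-1 is the p-th percentile.
    return quantiles(history, n=100, method="inclusive")[percentile - 1]


def warm_pool_size(history: list[float], rpm_per_instance: float,
                   min_warm: int = 1) -> int:
    """Number of pre-warmed instances needed to absorb the forecast peak."""
    needed = math.ceil(forecast_peak_rpm(history) / rpm_per_instance)
    return max(needed, min_warm)


# Hypothetical per-window peaks for one bursty customer (requests per minute).
history = [100, 120, 10_000, 150, 9_000, 110]
print(forecast_peak_rpm(history))                     # 9750.0
print(warm_pool_size(history, rpm_per_instance=500))  # 20
```

Sizing to a high percentile rather than the mean is what lets the pool absorb a jump from 100 to 10,000 requests per minute: the mean of this history would badly under-provision for the bursts.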

The third change was optimizing our inference layer. We built a custom request batching system that groups similar queries together, maximizing GPU utilization without sacrificing latency. We also implemented speculative execution for multi-step agent workflows — starting likely next steps before the current step completes.
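
To make the batching idea concrete, here is a minimal sketch that groups requests by a similarity key and releases a batch when it is full or when its oldest request has waited too long. The `Batcher` class, the choice of key, and the size and wait limits are hypothetical; the real system is not described in enough detail here to reproduce.

```python
from collections import defaultdict


class Batcher:
    """Group requests by key; flush on batch size or on a wait deadline."""

    def __init__(self, max_batch_size: int = 4, max_wait_ms: float = 5.0):
        self.max_batch_size = max_batch_size
        self.max_wait_ms = max_wait_ms
        self._pending = defaultdict(list)  # key -> queued requests
        self._first_seen = {}              # key -> arrival time of oldest request

    def add(self, key, request, now_ms: float):
        """Queue a request; return the batch if this request fills it."""
        bucket = self._pending[key]
        if not bucket:
            self._first_seen[key] = now_ms
        bucket.append(request)
        if len(bucket) >= self.max_batch_size:
            return self._flush(key)
        return None

    def flush_expired(self, now_ms: float):
        """Return batches whose oldest request has waited max_wait_ms or more."""
        expired = [k for k, t in self._first_seen.items()
                   if now_ms - t >= self.max_wait_ms]
        return [self._flush(k) for k in expired]

    def _flush(self, key):
        self._first_seen.pop(key, None)
        return self._pending.pop(key)


batcher = Batcher(max_batch_size=2, max_wait_ms=5.0)
batcher.add("model-a", "q1", now_ms=0.0)
batcher.add("model-b", "q2", now_ms=1.0)
print(batcher.add("model-a", "q3", now_ms=2.0))  # ['q1', 'q3']
print(batcher.flush_expired(now_ms=6.0))         # [['q2']]
```

The wait deadline is what keeps batching from hurting latency: a lone request is never held longer than `max_wait_ms`, even if no similar request arrives to fill its batch.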

The results speak for themselves. Our infrastructure now handles 50M+ daily runs with 99.99% uptime. Median latency is 42ms. And we're doing it at a cost per request that's one-third of what it was when we started.

We'll be open-sourcing parts of this infrastructure in the coming months. If you're interested in early access, drop us a line.

FAQ

Frequently asked questions

Everything you need to know to get started.

What's the difference between an agent and a chatbot?

An AI agent goes beyond simple Q&A. It can understand goals, make decisions, use tools, and take actions autonomously — like resolving tickets or running workflows.

Do I need coding experience to get started?

Not at all. Our visual canvas lets you build agents with drag-and-drop. For more control, our SDK and API offer full flexibility.

Which LLM providers do you support?

All major providers — OpenAI, Anthropic, Google Gemini, Mistral, Llama, Cohere, and more. Bring your own keys or use ours.

Is there a free tier available?

Yes. We offer a generous free tier to get started — no credit card required. Paid plans scale with your usage.

Can I self-host or deploy on-prem?

Yes. We offer cloud, hybrid, and on-prem deployment options to fit your security and compliance needs.

How long does it take to deploy an agent?

Minutes. Build your workflow, hit deploy, and you're live. No DevOps, no infrastructure setup — we handle all of it.

Can I connect my existing tools and APIs?

Yes. We integrate with 150+ tools out of the box — Slack, Notion, Salesforce, databases, and more. Custom APIs supported too.

What kind of support do you offer?

Free tier includes docs and community support. Paid plans get priority support, dedicated Slack channels, and onboarding help.

Ready to ship your first agent?

Get started free. No credit card required.

Book a Demo
