How we scaled our platform to 50 million agent runs daily

A deep dive into the infrastructure decisions that let us handle massive scale without breaking a sweat — or the bank.

Written by

David Park

Last month, we hit a milestone: 50 million agent runs processed in a single day. Zero downtime. Average latency under 100ms. And our infrastructure costs actually went down compared to the previous quarter.

This post is a behind-the-scenes look at how we got here.

When we started, like most startups, we ran everything on a single Kubernetes cluster. It worked fine for our first few customers. But as usage grew, we started hitting walls. Cold starts were killing our latency. Scaling was reactive, not predictive. And our AWS bill was becoming a recurring nightmare in our board meetings.

The first big change was moving to a multi-region architecture. We now run inference workloads across 12 regions globally, automatically routing requests to the nearest healthy cluster. This alone cut our p99 latency by 60%. Users in Singapore were no longer waiting for round-trips to us-east-1.
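As a rough illustration of "nearest healthy cluster" routing, here is a minimal sketch. It assumes each region reports a measured round-trip time and a health flag. The `Region` type, the `pick_region` function, and the region names in the usage note are hypothetical and are not taken from our production router.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    name: str        # hypothetical region identifier
    rtt_ms: float    # measured round-trip time from the caller, in milliseconds
    healthy: bool    # result of the most recent health check (assumed input)


def pick_region(regions: list[Region]) -> Optional[Region]:
    """Return the healthy region with the lowest round-trip time, or None."""
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        return None  # no healthy region: the caller decides how to fail
    return min(healthy, key=lambda r: r.rtt_ms)
```

With this sketch, a caller in Singapore whose closest region is failing its health check would be sent to the next-closest healthy region rather than all the way to us-east-1.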

The second change was rethinking how we handle bursty workloads. AI agents are inherently unpredictable. A single customer might go from 100 requests per minute to 10,000 in seconds. Traditional auto-scaling couldn't keep up — by the time new instances spun up, the burst was over.

Our solution was predictive scaling based on historical patterns combined with aggressive warm pooling. We maintain a reserve of pre-warmed instances that can absorb traffic spikes instantly. The system learns each customer's usage patterns and pre-provisions capacity before they need it. It sounds expensive, but it's actually cheaper than reactive scaling because we waste fewer resources on cold starts.
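To make the idea concrete, here is a simplified sketch of sizing a warm pool from recent traffic. It stands in for the real forecasting with an exponentially weighted moving average plus the recent peak, which is an assumption made for illustration. The `WarmPoolPlanner` class and its parameters (`per_instance_rpm`, `alpha`, `burst_factor`, `window`) are hypothetical names, not our actual system.

```python
import math
from collections import deque
from typing import Optional


class WarmPoolPlanner:
    """Sizes a pool of pre-warmed instances for one customer (illustrative only)."""

    def __init__(self, per_instance_rpm: int, alpha: float = 0.3,
                 burst_factor: float = 2.0, window: int = 60):
        self.per_instance_rpm = per_instance_rpm  # assumed capacity of one instance
        self.alpha = alpha                        # weight given to the newest sample
        self.burst_factor = burst_factor          # headroom multiplier over the trend
        self.recent = deque(maxlen=window)        # sliding window of recent samples
        self.ewma: Optional[float] = None         # smoothed requests per minute

    def observe(self, rpm: float) -> None:
        """Record one requests-per-minute sample."""
        self.recent.append(rpm)
        if self.ewma is None:
            self.ewma = rpm
        else:
            self.ewma = self.alpha * rpm + (1 - self.alpha) * self.ewma

    def target_instances(self) -> int:
        """Instances to keep warm: cover the trend with headroom or the recent peak."""
        if self.ewma is None:
            return 0  # no traffic observed yet
        forecast = max(self.ewma * self.burst_factor, max(self.recent))
        return math.ceil(forecast / self.per_instance_rpm)
```

A planner like this would be fed one sample per minute per customer, and the pool would be resized toward `target_instances()` ahead of demand rather than after a spike has already arrived.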

The third change was optimizing our inference layer. We built a custom request batching system that groups similar queries together, maximizing GPU utilization without sacrificing latency. We also implemented speculative execution for multi-step agent workflows — starting likely next steps before the current step completes.
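The batching idea can be sketched as follows. This is an offline simulation under stated assumptions: requests are grouped by a hypothetical `model` key, and a batch closes when it is full or when the next request for that key arrives more than `max_wait_ms` after the batch's first request. A live system would flush on a timer instead. The `Request` type and `build_batches` function are illustrative names, not our production code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    request_id: str
    model: str       # hypothetical grouping key: same-model requests batch together
    arrived_ms: int  # arrival time in milliseconds


def build_batches(requests: list[Request], max_batch_size: int = 8,
                  max_wait_ms: int = 5) -> list[list[Request]]:
    """Group requests by model, bounding both batch size and waiting time."""
    open_batches: dict[str, list[Request]] = {}
    closed: list[list[Request]] = []
    for req in sorted(requests, key=lambda r: r.arrived_ms):
        batch = open_batches.get(req.model)
        # Close a batch whose first request has waited longer than allowed.
        if batch and req.arrived_ms - batch[0].arrived_ms > max_wait_ms:
            closed.append(batch)
            batch = None
        if batch is None:
            batch = []
            open_batches[req.model] = batch
        batch.append(req)
        # Close a batch as soon as it reaches the size limit.
        if len(batch) >= max_batch_size:
            closed.append(batch)
            del open_batches[req.model]
    # Whatever is still open at the end is flushed as partial batches.
    closed.extend(open_batches.values())
    return closed
```

The two limits express the trade-off described above: a larger `max_batch_size` improves GPU utilization, while a small `max_wait_ms` caps the latency any single request pays for waiting on its batch.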

The results speak for themselves. Our infrastructure now handles 50M+ daily runs with 99.99% uptime. Median latency is 42ms. And we're doing it at a cost per request that's about one-third of what it was when we started.

We'll be open-sourcing parts of this infrastructure in the coming months. If you're interested in early access, drop us a line.

FAQ

Frequently asked questions

Everything you need to know to get started.

What's the difference between an agent and a chatbot?

An AI agent goes beyond simple Q&A. It can understand goals, make decisions, use tools, and take actions autonomously — like resolving tickets or running workflows.

Do I need coding experience to get started?

Not at all. Our visual canvas lets you build agents with drag-and-drop. For more control, our SDK and API offer full flexibility.

Which LLM providers do you support?

All major providers — OpenAI, Anthropic, Google Gemini, Mistral, Llama, Cohere, and more. Bring your own keys or use ours.

Is there a free tier available?

Yes. We offer a generous free tier to get started — no credit card required. Paid plans scale with your usage.

Can I self-host or deploy on-prem?

Yes. We offer cloud, hybrid, and on-prem deployment options to fit your security and compliance needs.

How long does it take to deploy an agent?

Minutes. Build your workflow, hit deploy, and you're live. No DevOps, no infrastructure setup — we handle all of it.

Can I connect my existing tools and APIs?

Yes. We integrate with 150+ tools out of the box — Slack, Notion, Salesforce, databases, and more. Custom APIs supported too.

What kind of support do you offer?

Free tier includes docs and community support. Paid plans get priority support, dedicated Slack channels, and onboarding help.

Ready to ship your first agent?

Get started free. No credit card required.
