Rate Limiting

Intent & Description

🎯 Intent

Cap the number of requests, tokens, or tool calls per user (or session) within a time window.

📋 Context

A team runs a multi-tenant agent product where many users share the same backend resources — token budgets with model providers, tool API quotas, compute capacity. Any one of those users can, accidentally or maliciously, send much more traffic than the operator priced for: a runaway script, a compromised account, or simply a single power user opening hundreds of concurrent sessions.

💡 Solution

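Define limits per identity (user or session) at several time horizons, for example per minute, per hour, and per day, so that both short bursts and slow, sustained overuse are caught. Track usage with token-bucket or sliding-window counters. Enforce the limits at the API gateway and again inside the agent loop, where token spend and tool calls actually occur. When a limit is hit, tell the user clearly which limit it was and when they can retry.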
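A token bucket is one of the two counter types named above. The following is a minimal sketch rather than a production implementation: the class name `TokenBucket`, the helper `allow_request`, the in-memory `buckets` dictionary, and the capacity and refill values are all assumptions chosen for illustration. A real multi-tenant deployment would most likely keep this state in a shared store instead of process memory.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Holds up to `capacity` tokens and refills at `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start full so a new identity is not throttled
        self.updated_at: Optional[float] = None

    def try_consume(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.updated_at is not None:
            # Refill in proportion to elapsed time, never above capacity.
            elapsed = max(0.0, now - self.updated_at)
            self.tokens = min(self.capacity,
                              self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False


# One bucket per identity (hypothetical in-memory store).
buckets: Dict[str, TokenBucket] = {}


def allow_request(user_id: str, cost: float = 1.0,
                  now: Optional[float] = None) -> bool:
    # Illustrative limits: bursts of 5, sustained rate of 1 per second.
    bucket = buckets.setdefault(user_id, TokenBucket(capacity=5, refill_per_sec=1.0))
    return bucket.try_consume(cost, now)
```

The `cost` parameter lets the same bucket meter requests (cost 1) or model tokens (cost equal to the token count), which matches the intent of capping requests, tokens, or tool calls with one mechanism.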

Real-world Use Case

Without per-identity limits, a single user or compromised account could bankrupt the product or starve other users of shared capacity.
Limits per identity are enforced both at the API gateway and inside the agent loop.
Limit hits are surfaced to users in a clear, actionable way.
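The points above (several horizons, enforcement inside the agent loop, actionable messages) can be sketched with a sliding-window log. Everything named here is hypothetical: `HORIZONS`, `Decision`, `SlidingWindowLimiter`, and the limit values are illustrative assumptions, not part of any particular gateway or agent framework.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional, Tuple

# Hypothetical limits: (name, window length in seconds, max calls per window).
HORIZONS: Tuple[Tuple[str, float, int], ...] = (
    ("minute", 60.0, 3),
    ("hour", 3600.0, 5),
)


@dataclass
class Decision:
    allowed: bool
    message: str = ""
    retry_after_sec: float = 0.0


class SlidingWindowLimiter:
    """Keeps a timestamp log per identity and checks every horizon."""

    def __init__(self) -> None:
        self._events: Dict[str, Deque[float]] = {}

    def check(self, user_id: str, now: Optional[float] = None) -> Decision:
        now = time.monotonic() if now is None else now
        events = self._events.setdefault(user_id, deque())
        # Drop timestamps older than the longest window.
        longest = max(window for _, window, _ in HORIZONS)
        while events and events[0] <= now - longest:
            events.popleft()
        for name, window, limit in HORIZONS:
            in_window = [t for t in events if t > now - window]
            if len(in_window) >= limit:
                # The call is allowed again once enough old entries expire.
                retry_after = in_window[-limit] + window - now
                message = (f"Limit of {limit} tool calls per {name} reached. "
                           f"Try again in {retry_after:.0f}s.")
                return Decision(False, message, retry_after)
        events.append(now)  # record only calls that were allowed
        return Decision(True)
```

Inside an agent loop, `check` would be called before each tool call; on a denied `Decision`, the loop can stop and return `message` to the user instead of failing silently or retrying in a tight loop.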
