NashTech Blog

Rate Limiting, Throttling & Circuit Breakers — Practical Techniques We Use in Enterprise Web Apps


Summary:
To keep enterprise systems stable, responsive, and secure, we must protect our APIs and web applications from overload. This includes controlling traffic spikes, preventing bad bots from burning CPU, and ensuring downstream services do not fail under pressure. In this article, we share the practical techniques we use—Rate Limiting, Throttling, Circuit Breakers, and Retry with Polly—and discuss both their benefits and limitations, plus how we monitor them in real environments.


1. Why Overload Protection Matters in Enterprise Systems

Modern enterprise applications face several risks:

  • Sudden spikes of traffic from integrations or automated tools
  • Bad bots crawling internal pages and causing high CPU
  • Users rapidly refreshing or resubmitting heavy operations
  • Downstream systems failing and causing chain reactions

Without the right protections, these situations can lead to:

  • High CPU and memory usage
  • API timeouts and long response times
  • Database overload and deadlocks
  • Complete system downtime

To avoid this, we apply several defensive patterns across the entire architecture: at the edge (Azure Front Door), inside the app (.NET middleware and Polly), and between services.


2. Rate Limiting — Controlling Traffic with Azure Front Door

Rate limiting restricts how many requests a client can send in a specific time window. We use Azure Front Door to apply rate limits and protect the application before requests reach the web server.

✔ Why We Use Azure Front Door

  • Blocks or slows down bad bots that crawl aggressively
  • Reduces CPU load on the application by stopping excess traffic at the edge
  • Works together with WAF to enforce security rules
  • Protects public endpoints from abuse or misbehaving clients

✔ Typical Rate-Limit Rules

  • Limit requests per IP (for example: 100 requests/minute)
  • Stricter limits on heavy endpoints (exports, reports)
  • Different rules for anonymous vs authenticated requests
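For comparison, the same fixed-window idea can also be enforced inside the application with .NET's built-in rate limiter (a sketch assuming .NET 7+; the policy name and numbers are ours, mirroring the 100 requests/minute edge rule):

```csharp
// Program.cs — assumes .NET 7+ and the built-in rate limiting middleware
builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // Fixed window: 100 requests per minute per policy
    options.AddFixedWindowLimiter("default", limiter =>
    {
        limiter.PermitLimit = 100;
        limiter.Window = TimeSpan.FromMinutes(1);
        limiter.QueueLimit = 0; // reject immediately instead of queueing
    });
});

var app = builder.Build();
app.UseRateLimiter();
```

Endpoints opt in with .RequireRateLimiting("default"); the Front Door rule remains the first line of defense, and the in-app limiter is a backstop.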

✔ Weaknesses of Rate Limiting & How We Mitigate Them

Rate limiting is powerful but not perfect. There are some real-world issues:

  • Good bots (search engines) can be accidentally slowed or blocked, impacting SEO.
  • Corporate networks behind a single NAT IP can hit the shared limit, blocking many legitimate internal users at once.
  • WAF false positives: some requests with valid cookies or parameters may be flagged as SQL injection or XSS and temporarily blocked.

To reduce these problems, we can:

  • Tune rate limits per route: apply strict limits to heavy POST/PUT endpoints, but keep more generous limits for read-only GET endpoints.
  • Use more than just IP: where possible, include user identity or API key as part of the limit key, not only IP.
  • Allowlist known good bots (carefully) by User-Agent or known IP ranges for important public endpoints, but still monitor them.
  • Monitor WAF logs to detect false-positive SQL injection / XSS rules and create exclusion rules for specific headers, cookies, or paths.
  • Log all 429 (Too Many Requests) and 403 (Forbidden) responses and review dashboards regularly to ensure we are not blocking valid users.

The key is to treat rate limiting and WAF rules as living configurations that must be monitored and tuned over time.
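For example, the "more than just IP" idea can be sketched with a partitioned limiter that keys on an API key when present and falls back to the client IP (the X-Api-Key header name is an assumption), inside the AddRateLimiter configuration:

```csharp
using System.Threading.RateLimiting;

// Global limiter partitioned per caller: API key if present, else client IP.
options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
{
    var key = ctx.Request.Headers["X-Api-Key"].FirstOrDefault()
              ?? ctx.Connection.RemoteIpAddress?.ToString()
              ?? "anonymous";

    return RateLimitPartition.GetFixedWindowLimiter(key, _ =>
        new FixedWindowRateLimiterOptions
        {
            PermitLimit = 100,
            Window = TimeSpan.FromMinutes(1)
        });
});
```

With this keying, users behind one NAT IP who present distinct API keys or identities no longer share a single bucket.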


3. Throttling — Preventing the Server from Overworking

While rate limiting protects at the edge, throttling is about what happens inside the application when too many operations are requested at once.

Examples of throttling strategies:

  • Limit how many heavy background jobs run in parallel
  • Use a semaphore or queue for expensive operations (e.g., big reports)
  • Reject or delay lower-priority operations when CPU or DB load is too high

✔ How Throttling Is Actually Applied

Common patterns include:

  • In-memory semaphores to allow only N concurrent operations
  • Queues (e.g., Azure Queue, Service Bus) where requests are processed by workers at a controlled rate
  • Per-endpoint throttling to ensure one heavy feature does not slow down all others

For example, a simple in-memory limit using a semaphore:

await _semaphoreSlim.WaitAsync();
try
{
    // Execute heavy operation
}
finally { _semaphoreSlim.Release(); } // always release the slot, even on failure
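The queue-based variant can be sketched in-process with a bounded channel; conceptually the same applies to Azure Queue or Service Bus with external workers (GenerateBigReportAsync is a hypothetical heavy operation):

```csharp
using System.Threading.Channels;

// Bounded queue: producers wait when it is full,
// a single worker drains it at a controlled rate.
var channel = Channel.CreateBounded<Func<Task>>(new BoundedChannelOptions(100)
{
    FullMode = BoundedChannelFullMode.Wait
});

// Worker loop: processes at most one heavy job at a time.
_ = Task.Run(async () =>
{
    await foreach (var job in channel.Reader.ReadAllAsync())
    {
        await job();
    }
});

// Enqueue a heavy operation instead of running it inline.
await channel.Writer.WriteAsync(() => GenerateBigReportAsync());
```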

✔ Limitations of Throttling

  • It may increase latency for users when many requests are queued.
  • If not tuned, it can create bottlenecks and timeouts.
  • In multi-instance environments, each instance enforces its own throttle; globally coordinated throttling needs shared state (e.g., Redis, queue length).

Therefore, throttling must be monitored and tuned, not just turned on and forgotten.


4. Retry with Polly — and Why Retry Alone Is Not Enough

Retries are useful when errors are transient (temporary): a network blip, intermittent timeout, or brief service unavailability. We use Polly in .NET to implement this pattern with context and logging.

✔ Our Retry Pattern with Polly

using Polly;
using Polly.Retry;

return await _retryPolicy.ExecuteAsync(
    action: async (context) =>
    {
        // Record when the first attempt started so retries can be correlated in logs
        context.TryAdd(RetryPolicyContextKeys.StartTime, DateTime.UtcNow);
        try
        {
            // next() invokes the actual request handler
            return await next();
        }
        catch
        {
            _logger.LogDebug(
                "Exception occurred processing {RequestType} with content: {@RequestContent}",
                typeof(TRequest).Name, request);
            throw; // rethrow so the retry policy can decide whether to retry
        }
    },
    contextData: new Context());

We usually combine this with exponential backoff, e.g. retry after 1s, 2s, 4s, etc., to reduce pressure on the server.
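That 1s/2s/4s schedule maps directly onto Polly's WaitAndRetryAsync; a sketch with illustrative retry count and logging:

```csharp
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(
        retryCount: 3,
        // 1s, 2s, 4s — exponential backoff
        sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt - 1)),
        onRetry: (ex, delay, attempt, context) =>
        {
            _logger.LogWarning("Retry {Attempt} after {Delay}s: {Message}",
                attempt, delay.TotalSeconds, ex.Message);
        });
```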

✔ Retry Pitfalls and How We Solve Them

Retries are not always safe. Some real issues:

  • A request may perform multiple actions: create a main object, then create related objects. If it fails halfway and we simply retry, the main object may be created again, producing duplicate data.
  • If the database or server is completely down, retrying immediately cannot fix it; it just adds more load once it starts to recover.

To mitigate this, we:

  • Use idempotency for create operations (idempotency keys or business keys) so repeated requests do not create duplicates.
  • Wrap multi-step operations in transactions where possible:
using var tx = await _db.Database.BeginTransactionAsync();
try
{
    // 1. Insert main object
    // 2. Insert related objects
    await tx.CommitAsync();
}
catch
{
    await tx.RollbackAsync();
    throw;
}
  • Only apply retry to operations known to be safe and idempotent (e.g., GET data, idempotent POST with keys).
  • Use max retry count and backoff to avoid hammering a failing service.
  • Combine retry with a circuit breaker so repeated failures stop quickly.
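The idempotency-key approach from the first bullet can be sketched like this (the Order entity and field names are hypothetical):

```csharp
public async Task<Order> CreateOrderAsync(CreateOrderRequest request, string idempotencyKey)
{
    // If this key was already processed, return the existing result
    // instead of creating a duplicate.
    var existing = await _db.Orders
        .FirstOrDefaultAsync(o => o.IdempotencyKey == idempotencyKey);
    if (existing != null)
        return existing;

    var order = new Order { IdempotencyKey = idempotencyKey /* ... */ };
    _db.Orders.Add(order);
    await _db.SaveChangesAsync();
    return order;
}
```

In practice a unique index on IdempotencyKey is also needed, so that two concurrent retries cannot both pass the check and insert twice.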

Retry is powerful, but without idempotency and transactions, it can create more problems than it solves.


5. Circuit Breaker — Stopping Calls to Unhealthy Dependencies

Circuit breakers prevent the application from repeatedly calling a service that is likely to fail. After enough failures, the circuit opens and short-circuits further calls for a while.

States:

  • Closed — normal operations
  • Open — calls are blocked immediately, fallback or error is returned
  • Half-Open — a few test calls are allowed to check if the dependency has recovered

✔ Circuit Breaker with Polly in .NET

At the application level, we can use Polly like this:

var circuitBreakerPolicy = Policy
    .Handle<Exception>()
    .CircuitBreakerAsync(
        exceptionsAllowedBeforeBreaking: 5,
        durationOfBreak: TimeSpan.FromSeconds(30),
        onBreak: (ex, breakDelay) => 
        {
            _logger.LogWarning("Circuit opened for {BreakDelay} due to {Message}", breakDelay, ex.Message);
        },
        onReset: () => 
        {
            _logger.LogInformation("Circuit reset");
        },
        onHalfOpen: () => 
        {
            _logger.LogInformation("Circuit in half-open state");
        });

This can wrap outbound HTTP calls, database calls, or other external dependencies.
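To combine it with the retry policy from section 4, the two can be wrapped so that once the circuit opens, remaining retries fail fast instead of hammering the dependency (the endpoint path is illustrative):

```csharp
// Retry is the outer policy: every attempt passes through the circuit breaker,
// so an open circuit short-circuits the remaining retries immediately.
var resiliencePolicy = Policy.WrapAsync(_retryPolicy, circuitBreakerPolicy);

var response = await resiliencePolicy.ExecuteAsync(
    () => _httpClient.GetAsync("/api/external-data"));
```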

✔ Circuit Breaker in Azure and Multi-Instance Scenarios

  • When we run multiple web instances, each instance running Polly keeps its own circuit state, so some instances might be open while others are closed.
  • For many cases, this is acceptable, because each instance independently detects failures.
  • If we want shared circuit state, we need a centralized store (e.g., Redis) or to use an API gateway pattern that implements circuit breaking at the gateway level.

Azure itself does not automatically apply circuit breaking for our app code, but:

  • Azure Front Door / API Management can be used to throttle or temporarily block calls to a failing backend.
  • Application-level circuit breakers using Polly are still the main mechanism inside .NET.

6. Combined Architecture Diagram (Text)

[Client Requests]
        ↓
Azure Front Door
  - Rate Limit
  - Bot Protection
  - WAF Rules
        ↓
Web Application / API (.NET)
  - Throttling / Semaphores
  - Structured Logging
        ↓
Polly Resilience Layer
  - Retry with Backoff
  - Circuit Breaker
        ↓
Downstream Services
  - Database
  - External APIs
  - Storage

Each layer has a job: edge protection, internal protection, and dependency protection.


7. Best Practices We Follow

  • Apply rate limiting and WAF at the edge using Azure Front Door.
  • Use throttling inside the app for heavy operations (reports, bulk updates).
  • Apply retry only to safe, idempotent operations with exponential backoff.
  • Combine retry with circuit breaker to avoid hitting unhealthy services repeatedly.
  • Log all 429, 503, and WAF block events and review them regularly.
  • Revisit limits and rules when the system or usage patterns change.

8. How We Monitor Whether These Protections Work

Implementing rate limits, throttling, retry, and circuit breakers is only half of the job. The other half is monitoring them so the maintenance team can confirm they are working and adjust when needed.

✔ 1. Monitor Edge Metrics (Azure Front Door / WAF)

  • Number of blocked requests by WAF rules
  • Number of requests hit by rate limiting (429 responses)
  • Top IPs, User-Agents, and paths causing blocks

We look for patterns such as:

  • Are we blocking too many legitimate users from the same corporate IP?
  • Are good bots (search engines) being blocked too aggressively?
  • Do some rules produce many false positives (e.g., SQL injection on valid cookies)?

If we see problems, we tune the rules, adjust thresholds, or add safe exceptions.


✔ 2. Monitor Application-Level Metrics

Inside the app (.NET), we send metrics and logs to a centralized platform (e.g., Application Insights, ELK, Splunk). Key things to track:

  • Count of retry attempts per dependency
  • How often circuit breaker opens and for how long
  • Number of requests rejected or delayed by throttling
  • Average and P95/P99 response times
  • Error rate per endpoint and per build version

Example of logging when the circuit opens or closes:

_logger.LogWarning("Circuit opened for dependency {Name}", "ExternalApiX");
_logger.LogInformation("Circuit reset for dependency {Name}", "ExternalApiX");
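If Application Insights is the sink, the same events can also be emitted as custom metrics for dashboards and alerts (metric names are ours):

```csharp
// TelemetryClient from the Application Insights SDK
_telemetryClient.GetMetric("CircuitBreakerOpened").TrackValue(1);
_telemetryClient.GetMetric("RetryAttempts").TrackValue(attemptCount);
```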

✔ 3. Dashboards & Alerts for Maintenance Team

We build dashboards that show:

  • Requests per minute vs 429 (rate-limited) responses
  • Errors and warnings trend over time
  • Circuit breaker open events per service
  • Retry count spikes for a specific dependency

Alert examples:

  • “WAF blocked > X requests in the last 10 minutes”
  • “Circuit breaker for Service A opened more than Y times in 1 hour”
  • “Retry attempts for DB increased 3x compared to yesterday”

These alerts give the maintenance/on-call team a chance to act before users complain.


✔ 4. Regular Review and Tuning

Overload protection and security are not “set and forget” features. The team should periodically:

  • Review WAF logs for false positives and tune rules
  • Adjust rate-limit thresholds as load grows
  • Refine retry and circuit breaker settings per dependency
  • Check if throttling limits still make sense as business usage changes

This continual improvement loop keeps protections effective and minimizes impact on legitimate users.


9. Conclusion

Protecting enterprise web applications from overload requires a combination of techniques, each with strengths and limitations. By using Azure Front Door for rate limiting and bot protection, throttling inside the app, Polly-based retry and circuit breakers for downstream services, and by monitoring all of these with good dashboards and alerts, we build systems that are stable, scalable, and easier to maintain.

These patterns not only protect infrastructure and performance but also improve the overall experience for users and support teams.


Dung Le

I am a Technical Lead with over eight years at the company, specializing in system migration, performance optimization, stabilizing legacy modules, and improving development processes. I enjoy solving complex problems, supporting team growth, and continuously learning new technologies to deliver better, more reliable solutions for the business.
