Your application works perfectly in development. In production, the database hiccups, the third-party API returns 503s, and the network drops packets.
Resilience patterns are the difference between a service that recovers gracefully and one that cascades into a complete outage.
Common questions this answers
- When should I use retry vs circuit breaker vs both?
- How do I configure exponential backoff with jitter?
- What are the circuit breaker states and how do I tune thresholds?
- How do I integrate Polly with HttpClientFactory?
- When should I NOT retry an operation?
Definition (what this means in practice)
Resilience patterns are strategies for handling transient failures in distributed systems. They enable applications to recover from temporary issues (retry), protect against cascading failures (circuit breaker), and prevent resource exhaustion (timeout, bulkhead).
In practice, this means configuring policies that determine how your application responds when external dependencies fail, slow down, or become unavailable.
Terms used
- Transient failure: a temporary failure that resolves itself (network blip, temporary overload).
- Circuit breaker: a pattern that stops calling a failing service, giving it time to recover.
- Exponential backoff: increasing wait times between retries (2s, 4s, 8s, 16s...).
- Jitter: randomness added to backoff to prevent thundering herd when many clients retry simultaneously.
- Bulkhead: isolating failures by limiting concurrent operations to a resource.
- Hedging: sending parallel requests to reduce latency by using the fastest response.
Reader contract
This article is for:
- Engineers building services that call external APIs or databases.
- Teams implementing fault tolerance in distributed systems.
You will leave with:
- A decision framework for choosing resilience patterns.
- Production-ready configurations for retry, circuit breaker, and timeout.
- HttpClientFactory integration patterns.
This is not for:
- Saga pattern or distributed transactions (different problem domain).
- Distributed circuit breakers across multiple service instances.
Quick start (10 minutes)
If you do nothing else, do this:
Verified on: ASP.NET Core (.NET 10).
The modern approach uses Microsoft.Extensions.Http.Resilience, which replaces the older Microsoft.Extensions.Http.Polly package.
- Add the resilience package:
dotnet add package Microsoft.Extensions.Http.Resilience
- Add the standard resilience handler to your HttpClient:
// Program.cs
builder.Services.AddHttpClient<IPaymentService, PaymentService>(client =>
{
client.BaseAddress = new Uri("https://api.payments.example.com");
})
.AddStandardResilienceHandler();
This single line adds:
- Rate limiter (1,000 permits)
- Total timeout (30 seconds)
- Retry (3 attempts, exponential backoff with jitter)
- Circuit breaker (10% failure ratio threshold)
- Per-attempt timeout (10 seconds)
For most HTTP client scenarios, the standard handler is production-ready out of the box.
When to use each pattern
| Pattern | Use When | Example |
|---|---|---|
| Retry | Transient failures likely to resolve | Network timeout, HTTP 503 |
| Circuit Breaker | Service may be down for extended period | Database outage, API rate limit |
| Timeout | Operation might hang indefinitely | Slow third-party API |
| Bulkhead | Need to isolate failures | Multiple downstream services |
| Fallback | Have alternative behavior | Return cached data on failure |
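Fallback is the one pattern in this table that the rest of this article does not configure, so here is a minimal sketch. It assumes a hypothetical GetCachedInventoryJson() helper standing in for your caching layer; FallbackStrategyOptions comes from the core Polly package (Polly.Fallback namespace).
// Minimal fallback sketch: serve cached data when the call still fails after retries.
// GetCachedInventoryJson() is a hypothetical helper - substitute your own cache.
builder.Services.AddHttpClient<IInventoryService, InventoryService>()
    .AddResilienceHandler("fallback-pipeline", pipeline =>
    {
        // Fallback is added first so it wraps (and catches failures from) the retry below
        pipeline.AddFallback(new FallbackStrategyOptions<HttpResponseMessage>
        {
            ShouldHandle = args => ValueTask.FromResult(
                args.Outcome.Exception is HttpRequestException ||
                args.Outcome.Result is { StatusCode: >= HttpStatusCode.InternalServerError }),
            FallbackAction = _ =>
            {
                // Serve stale data instead of failing the request
                var cached = new HttpResponseMessage(HttpStatusCode.OK)
                {
                    Content = new StringContent(GetCachedInventoryJson(), Encoding.UTF8, "application/json")
                };
                return Outcome.FromResultAsValueTask(cached);
            }
        });

        pipeline.AddRetry(new HttpRetryStrategyOptions { MaxRetryAttempts = 2, UseJitter = true });
    });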
Retry patterns
Retry handles transient failures by re-executing the operation after a delay.
Exponential backoff
Each retry waits longer than the previous one, giving the failing service time to recover:
builder.Services.AddHttpClient<IOrderService, OrderService>()
.AddResilienceHandler("retry-pipeline", builder =>
{
pipeline.AddRetry(new HttpRetryStrategyOptions
{
MaxRetryAttempts = 3,
Delay = TimeSpan.FromSeconds(2),
BackoffType = DelayBackoffType.Exponential,
UseJitter = true
});
});
With these settings:
- Attempt 1: immediate
- Retry 1: ~2 seconds (with jitter)
- Retry 2: ~4 seconds (with jitter)
- Retry 3: ~8 seconds (with jitter)
Why jitter matters
Without jitter, if 1,000 clients fail at the same moment, they all retry at exactly the same times. This thundering herd can overwhelm the recovering service.
Jitter adds randomness to the delay, spreading retries across time:
pipeline.AddRetry(new HttpRetryStrategyOptions
{
BackoffType = DelayBackoffType.Exponential,
UseJitter = true // Adds randomness to each delay so clients do not retry in lockstep
});
Always enable jitter in production.
Handle specific failures
By default, the HTTP retry strategy handles:
- HTTP 500+ status codes
- HTTP 408 (Request Timeout)
- HTTP 429 (Too Many Requests)
- HttpRequestException
- TimeoutRejectedException
To customize which failures trigger retries (the predicate below replaces the defaults entirely, so exceptions such as HttpRequestException are no longer retried unless you add them):
pipeline.AddRetry(new HttpRetryStrategyOptions
{
ShouldHandle = static args =>
{
return ValueTask.FromResult(args is
{
Outcome.Result.StatusCode:
HttpStatusCode.RequestTimeout or
HttpStatusCode.TooManyRequests or
HttpStatusCode.ServiceUnavailable
});
}
});
Disable retries for non-idempotent operations
POST, PUT, DELETE, and PATCH are not safe HTTP methods, and POST and PATCH are not idempotent either. If the server processed the request but the response was lost, retrying can create duplicates or apply the same change twice.
builder.Services.AddHttpClient<IOrderService, OrderService>()
.AddStandardResilienceHandler(options =>
{
// Only retry safe HTTP methods (GET, HEAD, OPTIONS)
options.Retry.DisableForUnsafeHttpMethods();
});
Or disable for specific methods:
options.Retry.DisableFor(HttpMethod.Post, HttpMethod.Delete);
Circuit breaker patterns
The circuit breaker prevents your application from repeatedly calling a failing service. It has three states:
Circuit breaker states
Closed (normal operation): Requests pass through. The breaker monitors failure rate.
Open (circuit broken): Requests fail immediately with BrokenCircuitException. No calls reach the downstream service.
Half-Open (testing recovery): A limited number of requests pass through. If they succeed, the circuit closes. If they fail, it opens again.
Configuration
builder.Services.AddHttpClient<IInventoryService, InventoryService>()
.AddResilienceHandler("circuit-breaker-pipeline", builder =>
{
pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
{
// Open circuit after 10% of requests fail
FailureRatio = 0.1,
// Minimum requests before failure ratio is calculated
MinimumThroughput = 100,
// Time window for calculating failure ratio
SamplingDuration = TimeSpan.FromSeconds(30),
// How long circuit stays open before testing
BreakDuration = TimeSpan.FromSeconds(30)
});
});
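The built-in telemetry already reports state transitions, but explicit log entries make the tuning advice below much easier to act on. As a sketch, the strategy exposes OnOpened, OnHalfOpened, and OnClosed callbacks, and the two-argument AddResilienceHandler overload gives you access to the DI container for resolving a logger:
builder.Services.AddHttpClient<IInventoryService, InventoryService>()
    .AddResilienceHandler("circuit-breaker-logging", (pipeline, context) =>
    {
        // Resolve a logger from the handler's service provider
        var logger = context.ServiceProvider.GetRequiredService<ILogger<InventoryService>>();

        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.1,
            MinimumThroughput = 100,
            OnOpened = args =>
            {
                logger.LogWarning("Inventory circuit opened for {BreakDuration}", args.BreakDuration);
                return ValueTask.CompletedTask;
            },
            OnHalfOpened = _ =>
            {
                logger.LogInformation("Inventory circuit half-open: probing the service");
                return ValueTask.CompletedTask;
            },
            OnClosed = _ =>
            {
                logger.LogInformation("Inventory circuit closed: service recovered");
                return ValueTask.CompletedTask;
            }
        });
    });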
Handle BrokenCircuitException
When the circuit is open, calls throw BrokenCircuitException. Handle this gracefully:
public async Task<IActionResult> GetInventory(int productId)
{
try
{
var inventory = await _inventoryService.GetAsync(productId);
return Ok(inventory);
}
catch (BrokenCircuitException)
{
// Circuit is open - service is known to be down
_logger.LogWarning("Inventory service circuit is open");
return StatusCode(503, new ProblemDetails
{
Title = "Service temporarily unavailable",
Detail = "Inventory service is currently unavailable. Please retry later."
});
}
}
Tune thresholds for your traffic
The default settings assume moderate traffic. Adjust based on your load:
| Traffic Level | MinimumThroughput | SamplingDuration | BreakDuration |
|---|---|---|---|
| Low (< 100 req/min) | 10 | 60s | 30s |
| Medium (100-1000 req/min) | 100 | 30s | 30s |
| High (> 1000 req/min) | 500 | 10s | 15s |
Low-traffic services need lower thresholds to detect failures. High-traffic services can use shorter sampling windows.
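As a sketch, the low-traffic row translates to the standard handler like this (ILowTrafficService is a hypothetical client; keep SamplingDuration comfortably larger than the per-attempt timeout so the window actually contains completed requests):
builder.Services.AddHttpClient<ILowTrafficService, LowTrafficService>()
    .AddStandardResilienceHandler(options =>
    {
        // Low traffic (< 100 req/min): fewer requests needed before the ratio is evaluated,
        // and a longer window so enough samples accumulate
        options.CircuitBreaker.MinimumThroughput = 10;
        options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(60);
        options.CircuitBreaker.BreakDuration = TimeSpan.FromSeconds(30);
    });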
Timeout patterns
Timeouts prevent operations from hanging indefinitely.
Total timeout vs attempt timeout
The standard resilience handler uses two timeouts:
builder.Services.AddHttpClient<ISearchService, SearchService>()
.AddStandardResilienceHandler(options =>
{
// Maximum time for entire operation including retries
options.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(30);
// Maximum time for each individual attempt
options.AttemptTimeout.Timeout = TimeSpan.FromSeconds(10);
});
With 3 retries and 10-second attempt timeout:
- Each attempt can take up to 10 seconds
- Total operation fails after 30 seconds regardless of retry state
Standalone timeout
For non-HTTP operations:
builder.Services.AddResiliencePipeline("database-timeout", pipeline =>
{
pipeline.AddTimeout(TimeSpan.FromSeconds(5));
});
// Usage
var pipeline = provider.GetRequiredService<ResiliencePipelineProvider<string>>()
.GetPipeline("database-timeout");
await pipeline.ExecuteAsync(async ct =>
{
await _database.QueryAsync(sql, ct);
});
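When the budget is exceeded, the pipeline throws TimeoutRejectedException (from the Polly.Timeout namespace). A minimal sketch of wrapping the same call:
try
{
    await pipeline.ExecuteAsync(async ct =>
    {
        await _database.QueryAsync(sql, ct);
    });
}
catch (TimeoutRejectedException)
{
    // The query exceeded the 5-second budget; log and surface the failure instead of hanging
    _logger.LogWarning("Database query timed out after 5 seconds");
    throw;
}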
Combining patterns
Real production systems combine multiple patterns. Order matters.
Standard resilience handler order
The standard handler applies strategies in this order:
1. Rate limiter: prevents overwhelming the service
2. Total timeout: caps entire operation duration
3. Retry: retries failed attempts
4. Circuit breaker: prevents calls when the service is down
5. Attempt timeout: caps individual attempt duration
This order ensures:
- Rate limiting happens before any work
- Total timeout captures the entire retry sequence
- Circuit breaker prevents retries when service is known to be down
- Each attempt has its own timeout
Custom pipeline
For fine-grained control:
builder.Services.AddHttpClient<ICriticalService, CriticalService>()
.AddResilienceHandler("critical-service", builder =>
{
// 1. Total timeout for entire operation
pipeline.AddTimeout(new TimeoutStrategyOptions
{
Timeout = TimeSpan.FromSeconds(60),
Name = "TotalTimeout"
});
// 2. Retry with exponential backoff
pipeline.AddRetry(new HttpRetryStrategyOptions
{
MaxRetryAttempts = 3,
BackoffType = DelayBackoffType.Exponential,
UseJitter = true,
Delay = TimeSpan.FromSeconds(1)
});
// 3. Circuit breaker
pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
{
FailureRatio = 0.1,
MinimumThroughput = 50,
SamplingDuration = TimeSpan.FromSeconds(30),
BreakDuration = TimeSpan.FromSeconds(30)
});
// 4. Per-attempt timeout
pipeline.AddTimeout(new TimeoutStrategyOptions
{
Timeout = TimeSpan.FromSeconds(10),
Name = "AttemptTimeout"
});
});
HttpClientFactory integration
All examples above use HttpClientFactory, which is the recommended pattern.
Named clients
builder.Services.AddHttpClient("PaymentApi", client =>
{
client.BaseAddress = new Uri("https://api.payments.example.com");
client.DefaultRequestHeaders.Add("Api-Key", configuration["PaymentApiKey"]);
})
.AddStandardResilienceHandler();
// Usage
public class PaymentService(IHttpClientFactory httpClientFactory)
{
public async Task<PaymentResult> ProcessAsync(Payment payment)
{
var client = httpClientFactory.CreateClient("PaymentApi");
var response = await client.PostAsJsonAsync("/v1/payments", payment);
return await response.Content.ReadFromJsonAsync<PaymentResult>();
}
}
Typed clients
builder.Services.AddHttpClient<IOrderService, OrderService>(client =>
{
client.BaseAddress = new Uri("https://api.orders.example.com");
})
.AddStandardResilienceHandler();
// Usage
public class OrderService(HttpClient httpClient) : IOrderService
{
public async Task<Order> GetAsync(int id)
{
return await httpClient.GetFromJsonAsync<Order>($"/orders/{id}");
}
}
Default resilience for all clients
Apply resilience to all HttpClients by default:
builder.Services.ConfigureHttpClientDefaults(clientBuilder =>
{
clientBuilder.AddStandardResilienceHandler();
});
// Individual clients can override
builder.Services.AddHttpClient("NoResilience")
.RemoveAllResilienceHandlers();
When NOT to retry
Retrying is not always the right answer.
Do not retry these
| Scenario | Why |
|---|---|
| HTTP 400 Bad Request | Client error, retrying won't help |
| HTTP 401/403 Unauthorized/Forbidden | Authentication or authorization issue, not transient |
| HTTP 404 Not Found | Resource doesn't exist |
| Non-idempotent POST without safeguards | May create duplicates |
| Business logic failures | Application error, not infrastructure |
Idempotency is required for safe retries
An operation is idempotent if calling it multiple times produces the same result as calling it once.
Safe to retry:
- GET /orders/123
- PUT /orders/123 (replaces entire resource)
- DELETE /orders/123 (deleting twice = still deleted)
Dangerous to retry without safeguards:
- POST /orders (creates new order each time)
- PATCH /orders/123 (may apply partial update twice)
If you must retry non-idempotent operations, implement idempotency keys:
public async Task<Order> CreateOrderAsync(CreateOrderRequest request)
{
// Client generates unique idempotency key
var idempotencyKey = request.IdempotencyKey ?? Guid.NewGuid().ToString();
var httpRequest = new HttpRequestMessage(HttpMethod.Post, "/orders")
{
Content = JsonContent.Create(request)
};
httpRequest.Headers.Add("Idempotency-Key", idempotencyKey);
var response = await _httpClient.SendAsync(httpRequest);
return await response.Content.ReadFromJsonAsync<Order>();
}
The server must check the idempotency key and return the same response for duplicate requests.
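What that check looks like depends on your stack. Here is a minimal sketch using a minimal-API endpoint and IMemoryCache; IOrderStore and the Order.Id property are hypothetical, and a real implementation would persist keys durably and reserve them atomically to handle concurrent duplicates.
// Sketch: server-side idempotency check (IOrderStore is hypothetical; IMemoryCache is
// not durable and not safe against two concurrent requests with the same key)
app.MapPost("/orders", async (CreateOrderRequest request, HttpRequest http,
    IMemoryCache cache, IOrderStore orders) =>
{
    var key = http.Headers["Idempotency-Key"].ToString();
    if (string.IsNullOrEmpty(key))
    {
        return Results.BadRequest("Idempotency-Key header is required");
    }

    // Duplicate request: return the original result instead of creating a second order
    if (cache.TryGetValue(key, out Order? existing) && existing is not null)
    {
        return Results.Ok(existing);
    }

    var order = await orders.CreateAsync(request);
    cache.Set(key, order, TimeSpan.FromHours(24));
    return Results.Created($"/orders/{order.Id}", order);
});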
Decision framework
Use this framework to choose patterns:
| Question | If Yes | If No |
|---|---|---|
| Is the failure transient? | Use Retry | Don't retry |
| Is the operation idempotent? | Retry all methods | Retry GET only, or implement idempotency keys |
| Could the service be down for a while? | Add Circuit Breaker | Retry alone may suffice |
| Could the operation hang? | Add Timeout | May not need timeout |
| Do you call multiple downstream services? | Consider Bulkhead isolation | Combined pipeline sufficient |
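Bulkhead isolation in Polly v8 is a concurrency limiter: give each downstream dependency its own permit pool so one slow service cannot exhaust the connections or threads the others need. A sketch with two hypothetical clients, using AddConcurrencyLimiter from Polly's rate-limiting extensions (pulled in by the resilience package):
// Each client gets its own permit pool; a slow Reports API cannot starve the Orders API
builder.Services.AddHttpClient<IOrdersClient, OrdersClient>()
    .AddResilienceHandler("orders-bulkhead", pipeline =>
    {
        // 50 concurrent requests allowed, 20 more may queue
        pipeline.AddConcurrencyLimiter(50, 20);
    });

builder.Services.AddHttpClient<IReportsClient, ReportsClient>()
    .AddResilienceHandler("reports-bulkhead", pipeline =>
    {
        // Lower limit for the less critical dependency, no queueing
        pipeline.AddConcurrencyLimiter(10, 0);
    });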
Copy/paste artifact: production resilience configuration
// Program.cs - Production resilience configuration
using Microsoft.Extensions.Http.Resilience;
using Polly;
// Standard resilience for most HTTP clients
builder.Services.AddHttpClient<IExternalApiClient, ExternalApiClient>(client =>
{
client.BaseAddress = new Uri(builder.Configuration["ExternalApi:BaseUrl"]!);
})
.AddStandardResilienceHandler(options =>
{
// Total timeout: 30 seconds for entire operation
options.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(30);
// Retry: 3 attempts with exponential backoff and jitter
options.Retry.MaxRetryAttempts = 3;
options.Retry.Delay = TimeSpan.FromSeconds(1);
options.Retry.BackoffType = DelayBackoffType.Exponential;
options.Retry.UseJitter = true;
// Disable retry for non-idempotent methods
options.Retry.DisableForUnsafeHttpMethods();
// Circuit breaker: open after 10% failures
options.CircuitBreaker.FailureRatio = 0.1;
options.CircuitBreaker.MinimumThroughput = 100;
options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(30);
options.CircuitBreaker.BreakDuration = TimeSpan.FromSeconds(30);
// Per-attempt timeout: 10 seconds
options.AttemptTimeout.Timeout = TimeSpan.FromSeconds(10);
});
Copy/paste artifact: resilience checklist
Resilience Configuration Checklist
1. Retry configuration
- [ ] Exponential backoff enabled
- [ ] Jitter enabled to prevent thundering herd
- [ ] Max retries appropriate for SLA (typically 2-5)
- [ ] Non-idempotent methods excluded or have idempotency keys
2. Circuit breaker configuration
- [ ] Failure ratio tuned for traffic level
- [ ] MinimumThroughput prevents false positives on low traffic
- [ ] BreakDuration gives downstream time to recover
- [ ] BrokenCircuitException handled gracefully in calling code
3. Timeout configuration
- [ ] Total timeout covers entire operation including retries
- [ ] Per-attempt timeout prevents single slow request consuming budget
- [ ] Timeouts aligned with SLA requirements
4. Error handling
- [ ] 4xx errors not retried (except 408, 429)
- [ ] BrokenCircuitException returns appropriate error to caller
- [ ] TimeoutRejectedException handled gracefully
5. Observability
- [ ] Retry attempts logged
- [ ] Circuit state changes logged
- [ ] Metrics exposed for monitoring
Common failure modes
- Retry storm: retrying without jitter causes all clients to retry simultaneously, overwhelming the recovering service.
- Circuit never opens: MinimumThroughput is too high for the actual traffic, so the failure ratio is never evaluated.
- Circuit never closes: BreakDuration too short, half-open tests fail, circuit reopens immediately.
- Timeout too aggressive: per-attempt timeout shorter than normal response time causes constant failures.
- Retrying non-idempotent operations: creates duplicate orders, payments, or other side effects.
Checklist
- Standard resilience handler added to HTTP clients.
- Jitter enabled on retry policies.
- Non-idempotent operations excluded from retry or use idempotency keys.
- Circuit breaker thresholds tuned for traffic level.
- BrokenCircuitException handled in calling code.
- Timeouts configured for both total operation and per-attempt.
FAQ
Should I use Microsoft.Extensions.Http.Polly or Microsoft.Extensions.Http.Resilience?
Use Microsoft.Extensions.Http.Resilience. It supersedes the older Microsoft.Extensions.Http.Polly package, is built on Polly v8, and integrates better with .NET's resilience infrastructure.
What is the difference between Polly v7 and v8?
Polly v8 introduced a new API based on ResiliencePipeline instead of Policy. The new API is more composable and integrates with Microsoft.Extensions.Resilience. The concepts (retry, circuit breaker, timeout) remain the same.
How do I know if my circuit breaker thresholds are correct?
Monitor your circuit breaker state transitions. If the circuit opens too frequently on minor issues, increase MinimumThroughput or FailureRatio. If it never opens during actual outages, decrease thresholds.
Should I retry database operations?
Yes, for transient failures like connection timeouts. EF Core has built-in retry with EnableRetryOnFailure(). For raw ADO.NET, wrap in a Polly retry policy. Ensure operations are idempotent or use transactions.
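For example, a sketch of EF Core's built-in strategy, assuming SQL Server and a DbContext named AppDbContext:
// EF Core execution strategy: retries transient SQL errors automatically
builder.Services.AddDbContext<AppDbContext>(options =>
    options.UseSqlServer(
        builder.Configuration.GetConnectionString("Default"),
        sqlOptions => sqlOptions.EnableRetryOnFailure(
            maxRetryCount: 3,
            maxRetryDelay: TimeSpan.FromSeconds(5),
            errorNumbersToAdd: null)));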
How do I test resilience patterns?
Use chaos engineering approaches. Inject failures in test environments using tools like Simmy (Polly's chaos engineering extension) or configure test doubles that fail intermittently. Verify that retries happen, circuits open, and timeouts trigger as expected.
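As a sketch, assuming Polly 8.3 or later (which ships the Simmy chaos strategies in the core package, Polly.Simmy namespace), fault injection can live in the same pipeline, gated so it never reaches production:
// Sketch: inject a fault into ~10% of calls so retry and circuit-breaker behavior can be observed
if (!builder.Environment.IsProduction())
{
    builder.Services.AddHttpClient<IInventoryService, InventoryService>()
        .AddResilienceHandler("chaos-test", pipeline =>
        {
            pipeline.AddRetry(new HttpRetryStrategyOptions { MaxRetryAttempts = 3, UseJitter = true });

            // Chaos is added last (innermost) so the strategies above must handle the injected fault
            pipeline.AddChaosFault(0.1, () => new HttpRequestException("Simmy: simulated transient failure"));
        });
}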
What about distributed circuit breakers?
The patterns in this article are per-instance. For distributed circuit breakers (shared state across multiple service instances), consider external state stores like Redis or purpose-built solutions. This adds complexity and is often unnecessary for most applications.
What to do next
Add Microsoft.Extensions.Http.Resilience to your project and apply .AddStandardResilienceHandler() to your HTTP clients. Review any existing retry logic for idempotency concerns.
For more on building production-quality ASP.NET Core applications, read Async/Await Pitfalls: The Deadlocks That Ship to Production.
If you want help implementing resilience patterns in your architecture, reach out via Contact.
References
- Build resilient HTTP apps
- Introduction to resilient app development
- Implement HTTP call retries with exponential backoff
- Implement the Circuit Breaker pattern
- Cloud-native resiliency patterns
- Polly documentation
Author notes
Decisions:
- Focus on Microsoft.Extensions.Http.Resilience over raw Polly. Rationale: it's the modern, supported approach with better HttpClientFactory integration.
- Recommend standard resilience handler as default. Rationale: production-ready defaults, less configuration burden.
- Emphasize idempotency for retry safety. Rationale: retrying non-idempotent operations is a common production bug.
Observations:
- Teams often add retry without circuit breaker, causing retry storms during outages.
- Circuit breaker thresholds copied from tutorials without adjusting for actual traffic patterns.
- Non-idempotent POST operations retried, causing duplicate records.
- Timeouts set without considering retry delays, causing unexpected total wait times.