Your container orchestrator needs to know two things: is your application ready to receive traffic, and is it still alive?
Get these wrong and you face cascading failures, dropped requests, and deployments that break production.
Common questions this answers
- What is the difference between readiness and liveness probes?
- How do I implement custom health checks for databases and external services?
- How do I configure Kubernetes probes for ASP.NET Core?
- How do I implement graceful shutdown with connection draining?
- Why does my application drop requests during deployment?
Definition (what this means in practice)
Health checks are endpoints that report application status to orchestrators, load balancers, and monitoring systems. Graceful shutdown is the process of stopping an application without dropping active requests.
In practice, this means configuring separate endpoints for readiness and liveness, implementing custom health checks for dependencies, and using IHostApplicationLifetime to handle shutdown signals properly.
Terms used
- Liveness probe: checks if the application is running and not deadlocked. Failure triggers a container restart.
- Readiness probe: checks if the application can handle traffic. Failure removes the pod from load balancer rotation.
- Startup probe: gives slow-starting applications time to initialize before liveness checks begin.
- Connection draining: allowing in-flight requests to complete before shutting down.
- IHostApplicationLifetime: .NET interface for responding to application lifecycle events.
Reader contract
This article is for:
- Engineers deploying ASP.NET Core to Kubernetes or container orchestrators.
- Teams experiencing dropped requests during deployments.
- Developers implementing health checks for microservices.
You will leave with:
- Clear understanding of readiness vs liveness vs startup probes.
- Copy-paste health check configurations.
- Kubernetes YAML for proper probe configuration.
- Graceful shutdown implementation with connection draining.
This is not for:
- Full Kubernetes deployment guides (separate topic).
- APM and distributed tracing setup.
Quick start (10 minutes)
If you do nothing else, do this:
Verified on: ASP.NET Core (.NET 10).
- Add health checks to your application:
// Program.cs
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHealthChecks();
var app = builder.Build();
// Liveness: is the app running?
app.MapHealthChecks("/healthz/live", new HealthCheckOptions
{
Predicate = _ => false // No checks, just confirms app responds
});
// Readiness: can the app handle traffic?
app.MapHealthChecks("/healthz/ready");
app.Run();
- Add Kubernetes probe configuration:
spec:
containers:
- name: myapp
livenessProbe:
httpGet:
path: /healthz/live
port: 8080
initialDelaySeconds: 10
periodSeconds: 10
failureThreshold: 3
readinessProbe:
httpGet:
path: /healthz/ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 3
Readiness vs liveness vs startup
These three probe types serve different purposes. Using them incorrectly causes production incidents.
Liveness probes
Purpose: Detect if the application is deadlocked or in an unrecoverable state.
Failure action: Kubernetes kills and restarts the container.
What to check: Only whether the process can respond. Do not check external dependencies.
// Liveness endpoint - minimal check
app.MapHealthChecks("/healthz/live", new HealthCheckOptions
{
Predicate = _ => false // No health checks, just HTTP 200
});
Why no health checks? If your liveness probe fails because a database is down, Kubernetes restarts your container. The database is still down. Kubernetes restarts again. You now have a restart loop that makes things worse.
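This failure mode is easy to spot from the cluster side (illustrative commands; assumes the Deployment labels pods with `app=myapp`):

```shell
# A liveness probe wired to a dependency shows up as climbing RESTARTS
# and eventually CrashLoopBackOff status
kubectl get pods -l app=myapp

# Probe failures are recorded as events on the pod
kubectl describe pod <pod-name> | grep -i liveness
```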
Readiness probes
Purpose: Determine if the application can handle traffic.
Failure action: Kubernetes removes the pod from Service endpoints. No traffic is routed to it.
What to check: External dependencies required to handle requests.
// Readiness endpoint - check dependencies
builder.Services.AddHealthChecks()
.AddSqlServer(connectionString, tags: new[] { "ready" })
.AddCheck<RedisHealthCheck>("redis", tags: new[] { "ready" });
app.MapHealthChecks("/healthz/ready", new HealthCheckOptions
{
Predicate = check => check.Tags.Contains("ready")
});
If the database is down, readiness fails. Traffic stops flowing to this pod. Other healthy pods handle requests. No restart loop.
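You can watch this happen from the cluster side (assumes a Service and label `myapp`):

```shell
# A pod failing readiness disappears from the Service's endpoint list
kubectl get endpoints myapp

# The pod itself stays Running - look for READY 0/1 rather than restarts
kubectl get pods -l app=myapp
```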
Startup probes
Purpose: Give slow-starting applications time to initialize.
Failure action: If the startup probe never succeeds within failureThreshold * periodSeconds, Kubernetes kills the container.
When to use: Applications with long initialization (cache warming, migrations).
startupProbe:
httpGet:
path: /healthz/ready
port: 8080
failureThreshold: 30
periodSeconds: 10
# Total startup time: 30 * 10 = 300 seconds (5 minutes)
While the startup probe is running, liveness and readiness probes do not run. This prevents premature restarts during initialization.
Decision matrix
| Scenario | Liveness | Readiness | Startup |
|---|---|---|---|
| App deadlocked | Fail (restart) | - | - |
| Database down | Pass | Fail (no traffic) | - |
| Cache warming | Pass | Fail (no traffic) | Pass after warm |
| Memory leak | Fail eventually | - | - |
| Temporary network blip | Pass | Fail briefly | - |
Custom health checks
Implementing IHealthCheck
public class DatabaseHealthCheck : IHealthCheck
{
    // DbConnection (not IDbConnection) exposes the async open/execute methods.
    // Register it as transient so each check gets a fresh connection.
    private readonly DbConnection _connection;
    public DatabaseHealthCheck(DbConnection connection)
    {
        _connection = connection;
    }
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            await _connection.OpenAsync(cancellationToken);
            await using var command = _connection.CreateCommand();
            command.CommandText = "SELECT 1";
            command.CommandTimeout = 5;
            await command.ExecuteScalarAsync(cancellationToken);
            return HealthCheckResult.Healthy();
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy(
                "Database connection failed",
                exception: ex);
        }
        finally
        {
            await _connection.CloseAsync();
        }
    }
}
Registration with tags
builder.Services.AddHealthChecks()
.AddCheck<DatabaseHealthCheck>(
"database",
failureStatus: HealthStatus.Unhealthy,
tags: new[] { "ready", "db" })
.AddCheck<RedisHealthCheck>(
"redis",
failureStatus: HealthStatus.Degraded,
tags: new[] { "ready", "cache" })
.AddCheck<ExternalApiHealthCheck>(
"payment-api",
failureStatus: HealthStatus.Degraded,
tags: new[] { "ready", "external" });
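Tags also let you expose narrower diagnostic endpoints alongside the readiness endpoint, for example a database-only view (the `/healthz/db` path is illustrative):

```csharp
// Runs only checks tagged "db" - useful for targeted diagnostics
// without hitting every dependency
app.MapHealthChecks("/healthz/db", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("db")
});
```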
Built-in health check packages
# SQL Server
dotnet add package AspNetCore.HealthChecks.SqlServer
# Entity Framework Core
dotnet add package Microsoft.Extensions.Diagnostics.HealthChecks.EntityFrameworkCore
# Redis
dotnet add package AspNetCore.HealthChecks.Redis
# RabbitMQ
dotnet add package AspNetCore.HealthChecks.Rabbitmq
builder.Services.AddHealthChecks()
.AddSqlServer(connectionString, tags: new[] { "ready" })
.AddDbContextCheck<AppDbContext>(tags: new[] { "ready" })
.AddRedis(redisConnectionString, tags: new[] { "ready" });
Health check with timeout
Long-running health checks can cause probe timeouts. Set explicit timeouts:
builder.Services.AddHealthChecks()
.AddAsyncCheck("slow-dependency", async cancellationToken =>
{
using var cts = CancellationTokenSource
.CreateLinkedTokenSource(cancellationToken);
cts.CancelAfter(TimeSpan.FromSeconds(3));
try
{
await CheckSlowDependencyAsync(cts.Token);
return HealthCheckResult.Healthy();
}
catch (OperationCanceledException)
{
return HealthCheckResult.Unhealthy("Health check timed out");
}
}, tags: new[] { "ready" });
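Many registration extensions also accept a `timeout` parameter that cancels the check's CancellationToken for you; where your package version exposes it, this is simpler than a linked token source (a sketch):

```csharp
builder.Services.AddHealthChecks()
    .AddSqlServer(
        connectionString,
        tags: new[] { "ready" },
        timeout: TimeSpan.FromSeconds(3)); // check is cancelled after 3 seconds
```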
Health check response customization
JSON response format
app.MapHealthChecks("/healthz/ready", new HealthCheckOptions
{
ResponseWriter = WriteJsonResponse
});
static Task WriteJsonResponse(HttpContext context, HealthReport report)
{
context.Response.ContentType = "application/json";
var response = new
{
status = report.Status.ToString(),
totalDuration = report.TotalDuration.TotalMilliseconds,
checks = report.Entries.Select(e => new
{
name = e.Key,
status = e.Value.Status.ToString(),
duration = e.Value.Duration.TotalMilliseconds,
description = e.Value.Description,
exception = e.Value.Exception?.Message
})
};
return context.Response.WriteAsJsonAsync(response);
}
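With that writer in place, a readiness response looks roughly like this (illustrative values):

```json
{
  "status": "Unhealthy",
  "totalDuration": 142.7,
  "checks": [
    { "name": "database", "status": "Healthy", "duration": 12.3, "description": null, "exception": null },
    { "name": "redis", "status": "Unhealthy", "duration": 130.1, "description": null, "exception": "Connection refused" }
  ]
}
```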
Custom HTTP status codes
app.MapHealthChecks("/healthz/ready", new HealthCheckOptions
{
ResultStatusCodes =
{
[HealthStatus.Healthy] = StatusCodes.Status200OK,
[HealthStatus.Degraded] = StatusCodes.Status200OK,
[HealthStatus.Unhealthy] = StatusCodes.Status503ServiceUnavailable
}
});
Degraded returning 200 keeps the pod in rotation but signals that something needs attention.
Graceful shutdown
When Kubernetes sends SIGTERM, your application must:
- Stop accepting new requests
- Complete in-flight requests
- Close connections cleanly
- Exit
IHostApplicationLifetime events
public class GracefulShutdownService : IHostedService
{
private readonly IHostApplicationLifetime _lifetime;
private readonly ILogger<GracefulShutdownService> _logger;
public GracefulShutdownService(
IHostApplicationLifetime lifetime,
ILogger<GracefulShutdownService> logger)
{
_lifetime = lifetime;
_logger = logger;
}
public Task StartAsync(CancellationToken cancellationToken)
{
_lifetime.ApplicationStarted.Register(OnStarted);
_lifetime.ApplicationStopping.Register(OnStopping);
_lifetime.ApplicationStopped.Register(OnStopped);
return Task.CompletedTask;
}
public Task StopAsync(CancellationToken cancellationToken)
{
return Task.CompletedTask;
}
private void OnStarted()
{
_logger.LogInformation("Application started");
}
private void OnStopping()
{
_logger.LogInformation("Application stopping, draining connections...");
}
private void OnStopped()
{
_logger.LogInformation("Application stopped");
}
}
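The hosted service above only runs if it is registered in Program.cs:

```csharp
// Without this registration, StartAsync is never called
// and the lifecycle callbacks never fire
builder.Services.AddHostedService<GracefulShutdownService>();
```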
Shutdown timeout configuration
The default shutdown timeout is 30 seconds. Configure based on your longest request:
builder.Host.ConfigureHostOptions(options =>
{
options.ShutdownTimeout = TimeSpan.FromSeconds(60);
});
In Kubernetes, ensure terminationGracePeriodSeconds exceeds this:
spec:
terminationGracePeriodSeconds: 90 # Must be > app shutdown timeout
containers:
- name: myapp
# ...
Connection draining pattern
The key is to fail readiness immediately on shutdown while allowing in-flight requests to complete:
public class ReadinessHealthCheck : IHealthCheck
{
private volatile bool _isReady = true;
public void SetNotReady() => _isReady = false;
public Task<HealthCheckResult> CheckHealthAsync(
HealthCheckContext context,
CancellationToken cancellationToken = default)
{
return Task.FromResult(_isReady
? HealthCheckResult.Healthy()
: HealthCheckResult.Unhealthy("Shutting down"));
}
}
public class ShutdownService : IHostedService
{
private readonly IHostApplicationLifetime _lifetime;
private readonly ReadinessHealthCheck _readiness;
public ShutdownService(
IHostApplicationLifetime lifetime,
ReadinessHealthCheck readiness)
{
_lifetime = lifetime;
_readiness = readiness;
}
public Task StartAsync(CancellationToken cancellationToken)
{
_lifetime.ApplicationStopping.Register(() =>
{
// Immediately fail readiness
_readiness.SetNotReady();
// Give load balancer time to remove us from rotation
Thread.Sleep(TimeSpan.FromSeconds(5));
});
return Task.CompletedTask;
}
public Task StopAsync(CancellationToken cancellationToken)
{
return Task.CompletedTask;
}
}
Register:
builder.Services.AddSingleton<ReadinessHealthCheck>();
builder.Services.AddHostedService<ShutdownService>();
builder.Services.AddHealthChecks()
.AddCheck<ReadinessHealthCheck>("shutdown", tags: new[] { "ready" });
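You can exercise the drain sequence locally before relying on Kubernetes to do it (a sketch; the project path and port 5000 are assumptions):

```shell
# Start the app in the background
dotnet run --project ./MyApp &
APP_PID=$!
sleep 5

# Send the same signal Kubernetes sends at pod termination
kill -TERM "$APP_PID"

# Readiness should now return 503 while in-flight requests finish
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:5000/healthz/ready

wait "$APP_PID"
```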
Kubernetes configuration
Complete probe configuration
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
replicas: 3
template:
spec:
terminationGracePeriodSeconds: 90
containers:
- name: myapp
image: myapp:latest
ports:
- name: http
containerPort: 8080
# Startup: wait for app to initialize
startupProbe:
httpGet:
path: /healthz/ready
port: http
failureThreshold: 30
periodSeconds: 10
# 5 minutes total startup time
# Liveness: restart if deadlocked
livenessProbe:
httpGet:
path: /healthz/live
port: http
initialDelaySeconds: 0
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
# Readiness: remove from LB if unhealthy
readinessProbe:
httpGet:
path: /healthz/ready
port: http
initialDelaySeconds: 0
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
successThreshold: 1
Probe parameters explained
| Parameter | Purpose | Recommendation |
|---|---|---|
| initialDelaySeconds | Wait before first probe | 0 if using startupProbe |
| periodSeconds | Time between probes | 5-10s for readiness, 10-30s for liveness |
| timeoutSeconds | Max wait for response | 3-5s, less than periodSeconds |
| failureThreshold | Failures before action | 3 for both, prevents flapping |
| successThreshold | Successes to recover | Must be 1 for liveness and startup; 1-2 for readiness |
Pre-stop hook for connection draining
An alternative to the in-app shutdown hook:
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- sleep 10
This delays SIGTERM by 10 seconds, giving the load balancer time to update.
Docker health check
For non-Kubernetes deployments:
FROM mcr.microsoft.com/dotnet/aspnet:10.0
WORKDIR /app
# The aspnet base image does not ship curl; install it for the health check
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
COPY --from=build /app/publish .
HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \
CMD curl --fail http://localhost:8080/healthz/live || exit 1
ENTRYPOINT ["dotnet", "MyApp.dll"]
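Once the container is running, Docker's view of the health check can be read back (the container name is a placeholder):

```shell
# Status is "starting" during --start-period, then "healthy" or "unhealthy"
docker inspect --format '{{.State.Health.Status}}' myapp-container

# Recent probe results, including the health command's output
docker inspect --format '{{json .State.Health}}' myapp-container
```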
Copy/paste artifact: production health check setup
// Program.cs
var builder = WebApplication.CreateBuilder(args);
// Configure shutdown timeout
builder.Host.ConfigureHostOptions(options =>
{
options.ShutdownTimeout = TimeSpan.FromSeconds(60);
});
// Register health checks
builder.Services.AddSingleton<StartupHealthCheck>();
builder.Services.AddHealthChecks()
.AddCheck<StartupHealthCheck>("startup", tags: new[] { "ready" })
.AddSqlServer(
builder.Configuration.GetConnectionString("Default")!,
tags: new[] { "ready", "db" });
// Register background service that completes startup tasks
builder.Services.AddHostedService<StartupBackgroundService>();
var app = builder.Build();
// Liveness: minimal check, no dependencies
app.MapHealthChecks("/healthz/live", new HealthCheckOptions
{
Predicate = _ => false
});
// Readiness: check all dependencies
app.MapHealthChecks("/healthz/ready", new HealthCheckOptions
{
Predicate = check => check.Tags.Contains("ready")
});
app.Run();
// Startup health check
public class StartupHealthCheck : IHealthCheck
{
private volatile bool _isReady;
public bool StartupCompleted
{
get => _isReady;
set => _isReady = value;
}
public Task<HealthCheckResult> CheckHealthAsync(
HealthCheckContext context,
CancellationToken cancellationToken = default)
{
return Task.FromResult(_isReady
? HealthCheckResult.Healthy()
: HealthCheckResult.Unhealthy("Startup not complete"));
}
}
// Background service to mark startup complete
public class StartupBackgroundService : BackgroundService
{
private readonly StartupHealthCheck _healthCheck;
private readonly ILogger<StartupBackgroundService> _logger;
public StartupBackgroundService(
StartupHealthCheck healthCheck,
ILogger<StartupBackgroundService> logger)
{
_healthCheck = healthCheck;
_logger = logger;
}
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
// Perform startup tasks (cache warming, etc.)
_logger.LogInformation("Performing startup tasks...");
await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken);
_healthCheck.StartupCompleted = true;
_logger.LogInformation("Startup complete, ready for traffic");
}
}
Copy/paste artifact: health check checklist
Health Check Checklist
1. Probe separation
- [ ] Liveness endpoint has no dependency checks
- [ ] Readiness endpoint checks all required dependencies
- [ ] Startup probe configured for slow-starting apps
2. Timeouts
- [ ] Health check timeouts < probe timeoutSeconds
- [ ] Probe timeoutSeconds < periodSeconds
- [ ] App shutdown timeout < terminationGracePeriodSeconds
3. Kubernetes configuration
- [ ] startupProbe failureThreshold * periodSeconds > max startup time
- [ ] livenessProbe failureThreshold >= 3 (prevents flapping)
- [ ] readinessProbe periodSeconds short enough for fast recovery
4. Graceful shutdown
- [ ] Readiness fails immediately on SIGTERM
- [ ] Delay before stopping to allow LB update
- [ ] In-flight requests complete before exit
- [ ] ShutdownTimeout configured appropriately
5. Health check implementation
- [ ] Database checks use simple queries (SELECT 1)
- [ ] External service checks have timeouts
- [ ] Health checks tagged appropriately
- [ ] Degraded status used for non-critical failures
Common failure modes
- Liveness checking dependencies: database down triggers restart loop. Fix by only checking process health in liveness.
- Probe timeout too short: health check takes 3 seconds, timeout is 1 second. Fix by increasing timeout or optimizing check.
- No startup probe: slow app killed before initialization. Fix by adding startup probe with adequate failureThreshold.
- Instant SIGTERM handling: requests dropped because LB still sends traffic. Fix by delaying shutdown after failing readiness.
- ShutdownTimeout too short: long requests killed mid-flight. Fix by increasing timeout to exceed longest request.
Checklist
- Liveness probe does not check external dependencies.
- Readiness probe checks all required dependencies.
- Startup probe configured for slow-starting applications.
- Graceful shutdown fails readiness immediately.
- Shutdown delay allows load balancer to update.
- terminationGracePeriodSeconds exceeds application shutdown timeout.
FAQ
Should liveness check the database?
No. If the database is down, restarting your application does not fix the database. Liveness should only detect conditions that a restart would fix, like deadlocks or memory leaks.
What is the difference between startup and initialDelaySeconds?
initialDelaySeconds delays a probe's first check by a fixed, per-probe amount of time. A startup probe instead runs repeatedly until it succeeds, then enables liveness and readiness. The startup probe is more flexible for variable initialization times.
How long should terminationGracePeriodSeconds be?
At least 10 seconds longer than your application's shutdown timeout, which should be longer than your longest expected request. For most applications: app timeout 60s, Kubernetes grace period 90s.
Should readiness return 200 when degraded?
Yes, if the degraded service is not required for handling requests. Use HealthStatus.Degraded with 200 OK to keep the pod in rotation while signaling that attention is needed.
How do I test health checks locally?
Use curl or your browser:
curl http://localhost:5000/healthz/live
curl http://localhost:5000/healthz/ready
For graceful shutdown testing, send SIGTERM:
kill -SIGTERM <pid>
What to do next
Add separate liveness and readiness endpoints to your application today. If you deploy to Kubernetes, add startup probes for applications that take more than a few seconds to initialize.
For more on building resilient applications, read Resilience Patterns with Polly: Circuit Breakers, Retries, and Timeouts.
If you want help configuring health checks for your infrastructure, reach out via Contact.
References
- Health checks in ASP.NET Core
- .NET Generic Host
- IHostApplicationLifetime
- Background tasks with hosted services in ASP.NET Core
- HealthCheckOptions
- Configure Liveness, Readiness and Startup Probes
- AspNetCore.Diagnostics.HealthChecks
Author notes
Decisions:
- Liveness should not check dependencies. Rationale: prevents restart loops when external services fail.
- Use startup probe instead of long initialDelaySeconds. Rationale: more flexible, adapts to variable startup times.
- Fail readiness immediately on shutdown signal. Rationale: enables connection draining by stopping new traffic.
Observations:
- Teams often put database checks in liveness, causing restart loops during outages.
- Missing startup probes cause slow-starting apps to be killed repeatedly.
- Shutdown without readiness delay drops requests because load balancer still routes traffic.
- Health check timeouts longer than Kubernetes probe timeouts cause false failures.