Rate Limiting in Minimal APIs

7 min read

By Ivan Gechev

In this article, we'll implement rate limiting in .NET Minimal APIs and see how to apply it per endpoint. You'll learn how to configure policies, handle rejection responses, and put together a fully working example. We'll keep it simple by building a cat-related API with a single GET endpoint.

Why We Need Rate Limiting

Rate limiting is a critical part of any production API. It helps protect our APIs from abuse, ensures fair usage among users, and manages system resources effectively. Without rate limiting, ill-intentioned users can swamp our API, degrading performance or even bringing down our service.

Typical use cases include:

  • Preventing denial-of-service attacks
  • Throttling access for anonymous users
  • Enforcing SLAs for paying clients
  • Fair sharing of backend resources

.NET offers built-in rate limiting middleware, which makes it incredibly easy to get started. Moreover, we get the option to choose between different rate limiting strategies, such as fixed window, sliding window, concurrency, and token bucket.

Fixed Window Rate Limiting Strategy in .NET

In our case, we will focus on the Fixed Window strategy. It divides time into concrete intervals and enforces a maximum number of requests within each interval:

Fixed Window Rate Limiting Strategy

Imagine we have a time window that lasts exactly 5 seconds. During that window, users are allowed to make up to 2 requests. We also let one extra request wait in line. If one of the original 2 requests finishes early, the queued request gets a chance to go through too. But if you try sending more than that within the same 5-second window, those extra requests will be rejected.

In other words, the third request isn't rejected immediately; it is queued. Any further requests are rejected until the next time window starts. This is a good strategy for APIs with predictable traffic patterns.
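We can observe this behavior in isolation with the FixedWindowRateLimiter class from the System.Threading.RateLimiting namespace, which is the limiter type behind the middleware we'll configure later. The following is a small console sketch using the same numbers as the example above; note that the synchronous AttemptAcquire() never waits in the queue, so only the two permitted requests succeed here:

```csharp
using System.Threading.RateLimiting;

var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
	Window = TimeSpan.FromSeconds(5),
	PermitLimit = 2,
	QueueLimit = 1,
	QueueProcessingOrder = QueueProcessingOrder.OldestFirst
});

// The first two requests inside the window acquire a permit immediately.
using var first = limiter.AttemptAcquire();
using var second = limiter.AttemptAcquire();

// A third synchronous attempt fails: only AcquireAsync() can wait in the queue.
using var third = limiter.AttemptAcquire();

Console.WriteLine(first.IsAcquired);   // True
Console.WriteLine(second.IsAcquired);  // True
Console.WriteLine(third.IsAcquired);   // False
```

Queueing only comes into play with the asynchronous AcquireAsync() method, which is what the middleware uses for incoming HTTP requests.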

Setting Up The Project And Implementing Rate Limiting

Let's start by creating a new .NET 9 Minimal API project:

Program.cs
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenApi();

var app = builder.Build();

if (app.Environment.IsDevelopment())
{
	app.MapOpenApi();
}

app.UseHttpsRedirection();

var cats = new List<Cat>
{
	new() { Id = 1, Name = "Whiskers", Breed = "Siamese", Age = 3 },
	new() { Id = 2, Name = "Mittens", Breed = "Persian", Age = 5 },
	new() { Id = 3, Name = "Felix", Breed = "Tabby", Age = 2 },
	new() { Id = 4, Name = "Shadow", Breed = "Maine Coon", Age = 4 },
	new() { Id = 5, Name = "Luna", Breed = "Ragdoll", Age = 1 }
};

app.MapGet("/api/cats", () =>
{
	return Results.Ok(cats);
});

app.Run();

public class Cat
{
	public int Id { get; set; }
	public required string Name { get; set; }
	public required string Breed { get; set; }
	public required int Age { get; set; }
}

Here, we create the Cat class, build an in-memory collection of cats to serve as our database, and add a GET endpoint that returns the list. This will be enough to showcase how rate limiting works.

Now, let's focus on implementing our fixed window strategy:

Program.cs
using Microsoft.AspNetCore.RateLimiting;
using System.Threading.RateLimiting;

builder.Services.AddRateLimiter(limiterOptions =>
{
	limiterOptions.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

	limiterOptions.AddFixedWindowLimiter("fixed-window", options =>
	{
		options.Window = TimeSpan.FromSeconds(5);
		options.PermitLimit = 2;
		options.QueueLimit = 1;
		options.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
	});
});

We use the AddRateLimiter() method to register the rate limiting services in our application; the options types it configures live in the Microsoft.AspNetCore.RateLimiting namespace.

Then, we set the RejectionStatusCode to 429, the standard HTTP status code for "Too Many Requests". By default, the status code is 503, which is not entirely accurate for rate limiting.
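Beyond the status code, the rejection response can be customized further through the OnRejected callback on the same options object. For example, when the limiter exposes retry metadata, we can surface it to clients as a Retry-After header. This is a sketch that would sit inside the AddRateLimiter() configuration:

```csharp
limiterOptions.OnRejected = (context, cancellationToken) =>
{
	// The fixed window limiter reports how long until the next window starts.
	if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
	{
		context.HttpContext.Response.Headers.RetryAfter =
			((int)retryAfter.TotalSeconds).ToString();
	}

	return ValueTask.CompletedTask;
};
```

Well-behaved clients can then read the header and back off until the window resets instead of retrying blindly.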

Next, we call the AddFixedWindowLimiter() method to add our fixed window rate limiting strategy. We set the Window to 5 seconds, the PermitLimit to 2, and the QueueLimit to 1. Finally, we set the QueueProcessingOrder to OldestFirst, which means that the oldest request in the queue will be processed first. QueueProcessingOrder is an enumeration located in the System.Threading.RateLimiting namespace.

Queued requests take longer to be processed. This might not be ideal, so setting the QueueLimit to 0 will reject any new requests as soon as the limit is hit. It's up to you to decide whether queueing is suitable for your users.
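For instance, a no-queue variant of the same policy could be registered alongside the original one. The policy name here is illustrative, not part of our example:

```csharp
limiterOptions.AddFixedWindowLimiter("fixed-window-strict", options =>
{
	options.Window = TimeSpan.FromSeconds(5);
	options.PermitLimit = 2;
	options.QueueLimit = 0; // no queue: excess requests are rejected immediately
});
```

Endpoints can then opt into whichever policy matches the latency expectations of their consumers.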

If you are interested in having a deep dive into rate limiting you can check my Minimal APIs in ASP.NET Core course here.

Our penultimate step is to register the rate limiting middleware in the pipeline:

Program.cs
app.UseRateLimiter();

We do this by using the UseRateLimiter() method. This middleware checks whether an incoming request exceeds the rate limit and either allows or rejects it based on the configured policies.

Finally, we need to apply the rate limiting policy to our GET endpoint:

Program.cs
app.MapGet("/api/cats", () =>
{
	return Results.Ok(cats);
})
.RequireRateLimiting("fixed-window");

We use the RequireRateLimiting() method to apply the rate limiting policy to our endpoint. The string we pass in is the name of the policy we defined earlier. Note that RequireRateLimiting() is an extension method that can be used on an individual endpoint as well as on an entire route group.
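As an illustration, a route group version could look like the following sketch. The group and the /{id} endpoint are hypothetical additions, not part of our example API:

```csharp
var catsGroup = app.MapGroup("/api/cats")
	.RequireRateLimiting("fixed-window"); // applies to every endpoint in the group

catsGroup.MapGet("/", () => Results.Ok(cats));

catsGroup.MapGet("/{id:int}", (int id) =>
	cats.FirstOrDefault(c => c.Id == id) is { } cat
		? Results.Ok(cat)
		: Results.NotFound());
```

Applying the policy at the group level keeps the configuration in one place as the API grows.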

Conclusion

Rate limiting is a powerful tool for maintaining our API's stability and ensuring fair usage. With the built-in middleware in .NET, we can easily set up global or per-endpoint rate limiting strategies. By integrating this into our Minimal APIs, we build APIs that are not only functional but also resilient and secure.