What Is Rate Limiting?

Rate limiting is a simple yet highly effective technique for protecting APIs from unintentional and malicious overuse. Without a rate limit, anyone can bombard a server with requests and cause spikes in traffic that eat up resources, "starve" other users, and make the service unresponsive.

This article is an intro to rate limiting and the importance of restricting the number of requests that reach APIs and services. We explain what rate limits are and how they work, plus cover the different types of rate limits and the algorithms you can use to adopt rate limiting for your use case.


In 2022, almost 95% of companies experienced an API-related security incident. Additionally, approximately 31% (around 5 billion) of all malicious transactions targeted APIs, which should place securing this attack vector at the top of an organization's to-do list.


Rate limiting is the practice of restricting the number of requests users can make to a specific API or service. You place a cap on how often users can repeat an action (e.g., logging into an account or sending a message) within a certain time frame. If someone reaches their limit, the server begins rejecting additional requests.

Rate limiting is both a cybersecurity precaution and a key part of software quality assurance (QA). Companies use rate limits to block traffic-based attacks, ensure consistent performance for legitimate users, and keep infrastructure costs under control.

Technically, rate limiting is a form of traffic shaping. The practice lets you control the flow and distribution of traffic to prevent infrastructure overload or failure.

Most systems with a rate limit have caps well above what even a high-volume legitimate user could realistically request. The most common example is social media messaging. All social media websites cap the number of direct messages you can send to other users. If someone decides to send a thousand messages to other profiles, rate limiting kicks in and stops the user from sending messages for a certain period.

Learn the most effective ways to prevent DDoS attacks and stay a step ahead of would-be hackers trying to overload your server with fake traffic.

Why Is Rate Limiting Important?

Rate limiting is an essential aspect of any healthy service for several reasons: it prevents traffic spikes from exhausting resources and starving legitimate users, blunts DoS/DDoS and brute-force attacks, keeps performance consistent, and helps keep infrastructure costs predictable.

Our comprehensive article on the different types of cyberattacks takes you through 16 kinds of attacks your team must be ready to face.


How Does Rate Limiting Work?

To set a rate limit, an admin places a cap on the number of requests users can make to a server or API within a certain time frame. Typically, the rate-limiting mechanism tracks two key factors: who is making the request (usually identified by IP address or API key) and how many requests that client has made within the current time window.

The main metric for rate limits is Transactions Per Second (TPS). If a single IP address makes too many requests within a certain period (i.e., goes over its TPS limit), rate limiting stops the server or API from responding. The user gets an error message (typically an HTTP 429 Too Many Requests response) and is unable to send further requests until the timer resets.
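To make this concrete, here is a minimal sketch of a per-IP fixed-window counter in Python. The class name, the cap of 3 requests per 60 seconds, and the IP address are all illustrative, not taken from any particular library:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client IP."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        # ip -> [window_start_time, request_count]
        self.counters = defaultdict(lambda: [0.0, 0])

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        start, count = self.counters[ip]
        if now - start >= self.window:
            # The window expired: start a new one with this request.
            self.counters[ip] = [now, 1]
            return True
        if count < self.limit:
            self.counters[ip][1] = count + 1
            return True
        return False  # over the cap: reject until the window resets

limiter = FixedWindowLimiter(limit=3, window=60.0)
results = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3)]
# The first three requests pass; the fourth is rejected: [True, True, True, False]
```

Passing `now` explicitly makes the behavior easy to test; a production limiter would use the real clock and shared storage (such as Redis) so every server instance sees the same counters.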

Rate limiting always relies on some form of throttling mechanism that slows down or blocks requests. Admins implement rate limiting on the server side (the API or service itself rejects excess requests) or on the client side (the application throttles its own outgoing requests), depending on which strategy better fits the use case.

Many admins also set rate limits based on usernames. This approach prevents brute force attackers from attempting to log in from multiple IP addresses.

Worried about bots brute forcing your usernames and passwords? Here are 8 simple yet highly effective strategies for preventing brute force attacks.

Types of Rate Limits

Let's look at the different types of rate limits you can use to control access to a server or API. Just remember that you can combine different types into a hybrid strategy. For example, you may limit the number of requests based on both IP addresses and certain time intervals.


Time-Based Rate Limits

Time-based rate limits operate on pre-defined time intervals. For example, a server may limit requests to a certain number per time period (such as 100 per minute).

Time-based rate limits typically apply to all users. You can set these limits to be either fixed (timers count down regardless of when and if users make requests) or sliding (the countdown starts whenever someone makes the first request).
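The sliding variant can be sketched as a log of recent request timestamps, where only requests inside the trailing window count toward the cap (the class name and numbers below are illustrative):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests in any trailing `window`-second span."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.log = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have slid out of the trailing window.
        while self.log and now - self.log[0] >= self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(limit=2, window=10.0)
results = [limiter.allow(now=t) for t in (0, 0, 5, 10)]
# Two requests at t=0 fill the window, so t=5 is rejected; by t=10 the
# old entries have slid out, so that request passes: [True, True, False, True]
```

Unlike a fixed window, this approach has no hard reset point an attacker can time requests around, at the cost of storing one timestamp per accepted request.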

Geographic Rate Limits

Geographic rate limits restrict the number of requests coming from certain regions. These caps are an excellent choice when running location-based campaigns. Admins get to limit the requests from outside the target audience and increase availability in target regions.

These rate limits are also good at preventing suspicious traffic. For example, you could predict that users in a certain region are less active between 11:00 PM and 8:00 AM. You set a lower rate limit for this time, which further constrains any attacker hoping to cause problems with malicious traffic.

User-Based Rate Limits

User-based rate limits control the number of actions individual users can take in a certain time frame. For example, a server may limit the number of login attempts each user can make to 100 per day.

User-based limits are the most common type of rate limiting. Most systems track the user's IP address or API key (or both). If the user exceeds the set rate limit, the app denies any further requests until the per-user counter resets.

Keep in mind that this type of rate limiting requires the system to maintain usage statistics for each user. Such a setup often leads to operational overhead and increases overall IT costs.

Concurrency Rate Limiting

Concurrency rate limits control the number of parallel sessions the system allows at any given moment. For example, an app might refuse to open more than 1,000 simultaneous sessions.
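A concurrency cap can be sketched with a semaphore that bounds the number of simultaneous sessions (the cap of 2 and the function names are illustrative):

```python
import threading

MAX_CONCURRENT = 2  # hypothetical cap on simultaneous sessions
slots = threading.BoundedSemaphore(MAX_CONCURRENT)

def try_open_session():
    """Claim a session slot; return False immediately if the cap is reached."""
    return slots.acquire(blocking=False)

def close_session():
    """Free a slot so the next session can start."""
    slots.release()

opened = [try_open_session() for _ in range(3)]
# With a cap of 2, the third attempt fails: [True, True, False]
close_session()  # ending one session frees a slot for the next caller
```

Because the limit applies to sessions that are open right now rather than to a time window, no counters need to reset; capacity frees up as soon as a session ends.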

Server Rate Limits

Server rate limiting helps admins share a workload among different servers. For example, if you run a distributed architecture with five servers, you could use a rate limit to place a cap on each device.

If one of the servers reaches its cap, the load balancer either routes the request to another server or drops it. Such a strategy is vital to achieving high availability and preventing DoS attacks that target a specific server.

API Endpoint-Based Rate Limiting

These rate limits are based on the specific API endpoints users are trying to access. For example, an admin may limit requests to a specific endpoint to 50 per minute, either due to security or overloading concerns.
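One way to sketch endpoint-based limits is a table that maps sensitive routes to stricter caps. The endpoints, caps, and one-minute fixed window below are all illustrative:

```python
from collections import defaultdict

# Hypothetical per-endpoint caps (requests per minute); sensitive routes get
# stricter limits, and everything else falls back to the default.
ENDPOINT_LIMITS = {"/login": 10, "/search": 50}
DEFAULT_LIMIT = 100

# (client, endpoint, minute) -> number of requests seen in that minute
counters = defaultdict(int)

def allow(client, endpoint, minute):
    """Fixed one-minute window, keyed by client and endpoint."""
    limit = ENDPOINT_LIMITS.get(endpoint, DEFAULT_LIMIT)
    key = (client, endpoint, minute)
    if counters[key] < limit:
        counters[key] += 1
        return True
    return False

attempts = [allow("203.0.113.7", "/login", minute=0) for _ in range(11)]
# /login allows 10 requests in minute 0; the 11th is rejected.
```

Keying the counter by both client and endpoint means exhausting the `/login` budget does not block the same client's `/search` requests.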

Learn about endpoint security and see what it takes to keep devices at the network's edge safe from malicious activity.

Rate Limiting Algorithms

The most common algorithms companies rely on to implement rate limiting are the fixed window, sliding window log, sliding window counter, token bucket, and leaky bucket algorithms.

The main factors to consider when choosing a rate-limiting algorithm are the unique needs of your API and the expected traffic volume. Your method of choice must prevent overload and stop malicious activity but also ensure legitimate users can use the service without interruptions.
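One widely used option is the token bucket: tokens refill at a steady rate up to a fixed capacity, and each request spends one, which allows short bursts while enforcing an average rate. Here is a minimal sketch (the class name and numbers are illustrative):

```python
import time

class TokenBucket:
    """Tokens refill at `rate` per second up to `capacity`; each request spends one."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.last = 0.0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=2)
burst = [bucket.allow(now=0) for _ in range(3)]  # burst of 3: [True, True, False]
later = bucket.allow(now=1)  # one token has refilled by t=1, so this passes
```

Unlike a fixed window, the bucket absorbs short bursts (up to its capacity) while still holding the long-run average to `rate` requests per second.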

How To Implement Rate Limiting?

Below is the general process for implementing rate limiting (although the exact steps depend on your specific tech stack): analyze your traffic patterns to understand normal usage, pick a rate-limiting algorithm that fits those patterns, define sensible caps, configure them on the server or client side, and monitor the results so you can fine-tune the limits over time.

Implementing rate limiting is a simple process for most use cases. For example, if you're using Nginx as a web server and wish to set a rate limit at the server level, you'll use the ngx_http_limit_req_module module. Simply add the following code to the Nginx configuration file to set up rate limits based on the user's IP address:

http {
    limit_req_zone $binary_remote_addr zone=one:10m rate=2r/s;
    ...

    server {
        ...
        location /promotion/ {
            limit_req zone=one burst=5;
        }
    }
}

The limit_req_zone directive above creates a 10 MB shared memory zone (named "one") that tracks requests by client IP address and allows no more than 2 requests per second on average. The burst parameter lets up to 5 excess requests queue before Nginx starts rejecting them (with HTTP status 503 by default).


A Simple, Yet Highly Effective Defensive Practice

Rate limiting is essential both for the security and quality of your APIs, apps, and websites. Failing to limit the number of requests leaves you open to traffic-based attacks and leads to poor performance (which causes higher bounce rates, problems with customer retention, etc.). Considering how easy it is to implement this precaution, setting a rate limit is a no-brainer decision for most use cases.