Rate Limiter

Sources

https://www.youtube.com/watch?v=FU4WlwfS3G0 chapter from SD interview insider’s guide https://medium.com/criteo-engineering/distributed-rate-limiting-algorithms-a35f7e24783 https://www.mailgun.com/blog/it-and-engineering/gubernator-cloud-native-distributed-rate-limiting-microservices/

todo: review video and discuss Rate Limiter with Vladimir write paper and place it to my site communicate with other people and get feedback about my composed solution

structure of the article

introduction

Which things should be in the article?

Understand the problem and establish the design scope

As usual, let’s start with the problem statement.

  • So, here is an example. We have launched a web application which has become highly popular. Suddenly, one or several clients started to send much more requests than they did previously. And because of this, other clients of our application begin to experience higher latency for their requests, or a higher rate of failed requests. These situations may lead to a so-called “noisy neighbor problem” when one client utilizes too many shared resources on a service host, like CPU, memory, disk, or network I/O.
  • Get another example. We may need cost reduction. Limiting excess requests means fewer servers and allocating more resources to high-priority APIs.

As a solution to these kinds of problems, we can use throttling.

Architectural characteristics:

  • low latency
  • accuracy
  • scalability
  • high availability
  • fault tolerance
  • integration easy

Possible requirement types for RL:

  • where to place: client-side or server-side
  • scale: startup or big company
  • what information give to clients of throttled requests
  • kind of throttling: soft or hard
  • throttling rules: user id, IP, other properties
  • level of RL
  • should RL be part of an application or done as a separate service

Further, we can list some patterns to use:

  • autoscaling capability of our service
  • load balancer
  • rate limiter <– select this option

Create a High-Level Design

https://youtu.be/FU4WlwfS3G0?t=559

Check design with envelope estimation

Desing Deep Dive

There are 3 possible ways: OOP https://youtu.be/FU4WlwfS3G0?t=1009 Going Distributed: We need to decide between availability and performance trade-offs.

Distributed algorithms

What we need is a centralized and synchronous storage system and an algorithm that can leverage it to compute the current rate for each client. An in-memory cache (like Memcached or Redis) is ideal. But not all rate-limiting algorithms can be implemented with every caching system. So let’s see what kind of algorithm exists.

Wrapping Up

Algorithms

Criteo has said that a rate limiter is a critical piece of software, and its very goal is to protect the rest of the system from a heavy load. But I’m afraid I have to disagree with this point because there is a case where we can use it to suppress requests from a client on a free plan in a scheme with paid/free contracts.

You can refill a bucket at a fixed rate and on a token retrieval call.

  • list definitions of the chosen architectural charateristics

Follow up: performance optimization

So we have two models: one metamodel that contains a broad problem analysis and the part containing things mostly belonging to the SD interview process.