Stopping API Abuse: Migrating from Fixed Window to Redis Sliding Window (Lua Implementation)

At 00:00:01, our database CPU spiked to 98%. We were supposed to have strict API rate limiting in place, capped at 1,000 requests per minute. Yet monitoring logs showed nearly 2,000 requests slipping through in a single 2-second window. The culprit wasn't a DDoS attack; it was a fundamental flaw in our initial "Fixed Window" implementation. The counter reset exactly at the minute mark, allowing savvy users (and bots) to exhaust their quota in the final second of one minute and immediately fire a fresh batch the moment the next one began. This created a "thundering herd" effect that our backend wasn't scaled to handle.

The Flaw in Fixed Window Counters

Our environment runs on a cluster of Node.js microservices backed by Redis Cluster, serving approximately 15,000 requests per second (RPS) at peak. The initial defense strategy was simple: efficient, low-latency traffic control using Redis string counters.

The logic was straightforward: INCR key:user_id:minute_timestamp, and reject if the result exceeds the limit. While this is O(1) and extremely memory-efficient, it ignores how requests are distributed in time. It treats 12:00:59 and 12:01:00 as completely separate buckets, effectively allowing a user to double their throughput by timing bursts across the boundary. In a high-stakes environment where backend security is paramount, that margin of error is unacceptable.

Production Log Analysis:
Timestamp 12:00:59.800 -> User X Requests: 998/1000 (Allowed)
Timestamp 12:01:00.100 -> User X Requests: 50/1000 (Allowed)
Result: 1048 requests in 300ms. Database connection pool exhausted.
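
For reference, here is a minimal sketch of the counter described above as one of our Node.js services might implement it, assuming ioredis as the client; the key format and the 1,000-per-minute cap mirror the description, and the boundary flaw is baked right into it.

import Redis from "ioredis";

const redis = new Redis(); // assumes a reachable Redis instance

// Fixed-window check: one integer counter per user per minute bucket.
async function isAllowedFixedWindow(userId: string, limit = 1000): Promise<boolean> {
    const minuteBucket = Math.floor(Date.now() / 60000);
    const key = `rate_limit:${userId}:${minuteBucket}`;

    const count = await redis.incr(key);
    if (count === 1) {
        // First request in this bucket: expire the counter once the minute is over.
        await redis.expire(key, 60);
    }
    return count <= limit;
}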

Why Simpler Fixes Didn't Work

My first attempt to fix this without changing the architecture was a "Token Bucket" implemented in application memory. However, in a distributed system with 20+ auto-scaling pods, local in-memory counters are useless: they don't share state, so a user could hit Pod A 500 times and Pod B 500 times. We needed a centralized, atomic solution. I also considered simply shrinking the window (per second instead of per minute), but that penalized legitimate users with short bursts of activity. The only robust option left was a sliding window algorithm.

The Solution: Redis Sorted Sets & Lua

To implement a true sliding window, we need to store the timestamp of each request, not just a count. Redis Sorted Sets (ZSETs) are perfect for this: the score is the request timestamp in milliseconds, and the member is a unique identifier for the request (a UUID).

However, running multiple Redis commands (ZREMRANGEBYSCORE, ZADD, ZCARD) from the application layer introduces network latency and race conditions. If two requests come in simultaneously, they might both read a count of 999 and both proceed. To guarantee atomicity, we must use Redis Lua scripting. This executes the entire logic server-side as a single atomic operation.

-- Sliding Window Rate Limiter in Lua
-- KEYS[1]: The rate limit key (e.g., "rate_limit:user:123")
-- ARGV[1]: Current timestamp in milliseconds
-- ARGV[2]: Window size in milliseconds (e.g., 60000 for 1 min)
-- ARGV[3]: Max allowed requests (e.g., 100)
-- ARGV[4]: Unique Request ID (UUID) for uniqueness in ZSET

local key = KEYS[1]
local current_time = tonumber(ARGV[1])
local window_size = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local unique_id = ARGV[4]

-- 1. Calculate the start of the sliding window
local window_start = current_time - window_size

-- 2. Remove requests that are older than the window
-- This is the "Sliding" part: cleaning up old data
redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)

-- 3. Count how many requests are currently in the active window
local request_count = redis.call('ZCARD', key)

-- 4. Decision logic
if request_count < limit then
    -- Allowed: Add the current request timestamp
    redis.call('ZADD', key, current_time, unique_id)
    -- Set a TTL on the whole key to clean up if the user stops active usage
    -- TTL should be slightly longer than the window size
    redis.call('PEXPIRE', key, window_size + 1000)
    return 1 -- Request Allowed
else
    return 0 -- Request Denied (Too Many Requests)
end

The logic inside the script is critical. First, we perform a ZREMRANGEBYSCORE to evict old entries, so the ZCARD count only reflects requests inside the (current_time - window_size, current_time] interval. Because the whole sequence runs as a single Lua script, no other request can interleave between the count check and the insertion of the new entry.
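
To show how this plugs into the application layer, here is a sketch of how one of our Node.js services could register and invoke the script with ioredis. The file name sliding_window.lua and the function names are illustrative; the arguments map one-to-one to the KEYS/ARGV described above.

import Redis from "ioredis";
import { readFileSync } from "node:fs";
import { randomUUID } from "node:crypto";

const redis = new Redis();

// The Lua script shown above, stored alongside the service code.
const SLIDING_WINDOW_SCRIPT = readFileSync("sliding_window.lua", "utf8");

// Register the script once; ioredis sends EVALSHA and falls back to EVAL when needed.
redis.defineCommand("slidingWindowLimit", {
    numberOfKeys: 1,
    lua: SLIDING_WINDOW_SCRIPT,
});

async function isAllowed(userId: string, limit = 1000, windowMs = 60000): Promise<boolean> {
    const result = await (redis as any).slidingWindowLimit(
        `rate_limit:user:${userId}`, // KEYS[1]
        Date.now(),                  // ARGV[1]: current timestamp in ms
        windowMs,                    // ARGV[2]: window size in ms
        limit,                       // ARGV[3]: max requests per window
        randomUUID(),                // ARGV[4]: unique request ID
    );
    return result === 1;
}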

Performance Verification

Moving from O(1) counters to O(log N) Sorted Sets does come with a cost. We ran benchmarks using wrk to ensure our Redis CPU wouldn't melt under the load of ZSET operations. The test scenario assumed a window size of 1 minute and a limit of 1,000 requests.
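
For context, the load profile was generated with an invocation along these lines; the thread count, connection count, and target URL here are illustrative rather than the exact values from our run.

wrk -t8 -c400 -d60s --latency https://api.example.com/v1/orders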

Metric                    | Fixed Window (INCR)         | Sliding Window (ZSET + Lua) | Impact Analysis
Accuracy                  | Low (±100% burst potential) | High (exact per ms)         | Critical for preventing DB overload.
Redis CPU usage (10k RPS) | 4%                          | 18%                         | Higher, but acceptable for the safety gained.
Memory per user           | ~50 bytes                   | ~120 KB (for 1k entries)    | See caveats below.
Average latency           | 0.4 ms                      | 0.9 ms                      | Negligible impact on user experience.

The increase in Redis CPU usage was noticeable but manageable. The trade-off is clear: we exchange some compute cycles and memory for mathematical precision in traffic shaping. The elimination of the boundary burst meant our downstream services saw a smooth, predictable load curve, even during flash sales.


Edge Cases & Memory Management

While this approach provides superior backend security, it is not a silver bullet. You must consider the memory footprint. With a fixed window, a user who makes 10,000 requests takes up the same space as a user who makes one request (a single integer). With a sliding window log built on ZSETs, every single request adds an entry to the set.

If you set a rate limit of 10,000 requests per minute, you are storing 10,000 UUIDs and timestamps per active user. For 1 million active users, this can explode your Redis memory usage.
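
A rough back-of-envelope using the ~120 bytes per entry implied by the benchmark table above:

10,000 entries/user × ~120 bytes ≈ 1.2 MB per active user
1,000,000 active users × 1.2 MB ≈ 1.2 TB of Redis memory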

Optimization Tip: If your rate limits are very high (e.g., > 2,000 RPM), consider a hybrid approach or the Generic Cell Rate Algorithm (GCRA), which calculates leak rates mathematically without storing individual timestamps. The Sliding Window Log is best for strict, lower-volume endpoints (e.g., Login, Payment, OTP generation).

Another potential issue is clock skew. If your application servers have drifting clocks, the current_time passed to Lua might be inconsistent. Ensure all your servers are synced via NTP, or better yet, use the redis.call('TIME') command inside the Lua script (though this has implications for script replication in older Redis versions).
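
If you go the TIME route, a minimal variant of the script above derives the timestamp server-side; the lines below would replace the local current_time = tonumber(ARGV[1]) assignment (TIME returns seconds and microseconds as a pair of strings).

-- Derive the timestamp inside Redis instead of trusting ARGV[1]
local t = redis.call('TIME') -- { seconds, microseconds }
local current_time = tonumber(t[1]) * 1000 + math.floor(tonumber(t[2]) / 1000)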

Conclusion

Switching to a sliding window algorithm built on Redis Lua scripts solved our "thundering herd" problem at the minute boundaries. It gave us precise control over API rate limiting, ensuring that a limit of 1,000 requests per minute holds over any sliding 60-second window, not just clock-aligned minutes. It requires more memory and CPU than simple counters, but the stability it brings to high-traffic production systems makes it the professional choice for robust backend architecture.
