The security audit report landed on my desk on a Friday afternoon, and it wasn't pretty. While our access token expiration was tight (15 minutes), our handling of refresh tokens created a massive vulnerability window. A penetration tester had successfully exfiltrated a valid refresh token via a minor XSS vulnerability. Because our system only checked for expiration, they could silently renew sessions for days without us knowing. Standard JWT security guidance often glosses over what happens after a token leaks. If you rely solely on expiration dates, a single leaked refresh token stays usable until it expires.
The Persistence Problem in Authentication Systems
In the project I was architecting, a fintech dashboard serving roughly 50,000 concurrent users, we faced a classic trade-off between user experience (keeping users logged in) and security. We used stateless JWTs for access, and the refresh tokens were stored in a secure HttpOnly cookie. The issue wasn't the storage; it was the lifecycle.
When a refresh token is static (valid for 30 days until expiry), a theft is catastrophic. The attacker has a duplicate key to the house. We initially thought about using a blacklist in Redis, but that requires maintaining state for every invalidated token, which bloats memory usage over time.
We needed a mechanism that changes the lock every time the key is used. This is where Refresh Token Rotation comes into play. It turns a static credential into a one-time-use artifact.
Why Simple Rotation Failed
My first attempt at rotation was naive. I simply wrote logic to issue a new refresh token and delete the old one upon use. However, I didn't account for network latency and race conditions. The client app (a React SPA) would sometimes fire two parallel requests (e.g., one for profile data, one for notifications) when the access token was expired. Both requests would send the same refresh token simultaneously.
The first request would succeed, rotate the token, and delete the old one. The second request, processed milliseconds later, would find the token missing and return a `403 Forbidden`, logging the user out immediately. This caused a spike in "random logouts" for users on slower connections.
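The failure is easy to reproduce with a toy in-memory store. The sketch below is not the production code: `store`, `naiveRotate`, and the token strings are hypothetical names, and the two calls run back to back to stand in for two parallel requests that carry the same refresh token.

```javascript
// Toy in-memory stand-in for the refresh token table (hypothetical).
const store = new Set(["token-A"]);
let counter = 0;

// Naive rotation: delete the old token on use, hand out a new one.
function naiveRotate(incomingToken) {
  if (!store.has(incomingToken)) {
    // The server cannot tell "never existed" from "rotated a moment ago".
    return { status: 403, token: null };
  }
  store.delete(incomingToken);
  const next = `token-${++counter}`;
  store.add(next);
  return { status: 200, token: next };
}

// Two requests from the same client carry the same refresh token.
const first = naiveRotate("token-A");
const second = naiveRotate("token-A");

console.log(first.status, second.status); // 200 403
```

The second caller is a legitimate client, yet it is indistinguishable from someone presenting a made-up token.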
The Solution: Reuse Detection & Token Families
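To close the theft window, we need "Reuse Detection," which builds on the refresh token rotation described in the OAuth 2.0 Threat Model (RFC 6819). It also lets the server tell a replayed token apart from an unknown one, although parallel refreshes from the same client still need handling on the frontend (covered under Edge Cases). The core concept is organizing tokens into "families."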
Here is the strategy:
- Rotation: Every time a refresh token is used, issue a new one and invalidate the parent.
- Family Trace: Link the new token to the "family" of the previous token.
- The Trap: If an *invalidated* token is used again (which implies either a race condition or theft), we assume the worst: theft. We then invalidate the entire family of tokens.
This means if an attacker steals Token A and the user rotates it to Token B, the system detects the reuse as soon as the attacker presents Token A, because Token A is already marked as used. It then nukes the whole chain, which also invalidates Token B and ends the legitimate user's session. The user is forced to log in again, but the attacker is locked out. The trap fires in the opposite order too: if the attacker rotates Token A first, the user's next refresh with Token A is the reuse, and the attacker's new token dies with the rest of the family.
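The Token A / Token B scenario can be traced with a small in-memory model. This is an illustrative sketch only: the `tokens` map, the `issue`, `revokeFamily`, and `refresh` helpers, and the `rt-N` token strings are made up for the example and stand in for the real database and random tokens.

```javascript
// In-memory model of token families (illustrative, not the real schema).
const tokens = new Map(); // token string -> { familyId, isUsed }
let seq = 0;

function issue(familyId) {
  const token = `rt-${++seq}`;
  tokens.set(token, { familyId, isUsed: false });
  return token;
}

function revokeFamily(familyId) {
  for (const [token, record] of tokens) {
    if (record.familyId === familyId) tokens.delete(token);
  }
}

function refresh(token) {
  const record = tokens.get(token);
  if (!record) return { ok: false, reason: "unknown" };
  if (record.isUsed) {
    revokeFamily(record.familyId); // the trap: reuse kills the whole chain
    return { ok: false, reason: "reuse" };
  }
  record.isUsed = true;
  return { ok: true, token: issue(record.familyId) };
}

const tokenA = issue("family-1");     // user logs in
const stolen = tokenA;                // attacker copies Token A
const tokenB = refresh(tokenA).token; // user rotates A -> B
const attack = refresh(stolen);       // attacker replays A
const userNext = refresh(tokenB);     // user's next refresh

console.log(attack.reason);   // "reuse"
console.log(userNext.reason); // "unknown" (Token B was revoked with the family)
```

Both parties end up without a valid token, and only the legitimate user can get a new one by logging in again.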
```javascript
// Data Model (Prisma/SQL Conceptual)
// model RefreshToken {
//   id          String   @id @default(uuid())
//   hashedToken String   @unique // findUnique() below needs a unique field
//   userId      String
//   familyId    String   // Critical for linking the chain
//   isUsed      Boolean  @default(false)
//   expiresAt   DateTime
// }

// Raised when an already-used token is presented again.
class SecurityException extends Error {}

/**
 * Rotates token and handles Reuse Detection.
 * Assumes `db` is a Prisma client and that `hash`, `generateSecureRandomString`,
 * and `getExpirationDate` are helpers defined elsewhere.
 * @param {string} incomingToken - The token sent by the client
 * @returns {Promise<string>} The new refresh token to send back to the client
 */
async function rotateRefreshToken(incomingToken) {
  const tokenRecord = await db.refreshToken.findUnique({
    where: { hashedToken: hash(incomingToken) }
  });

  if (!tokenRecord) throw new Error("Invalid Token");

  // REUSE DETECTION LOGIC
  if (tokenRecord.isUsed) {
    // SECURITY ALERT: Attempt to use an old token.
    // This implies theft or a serious race condition.
    // Nuke the entire family to stop the attacker.
    await db.refreshToken.deleteMany({
      where: { familyId: tokenRecord.familyId }
    });
    throw new SecurityException("Reuse detected! Session terminated for safety.");
  }

  // An unused token past its expiry is simply rejected.
  if (tokenRecord.expiresAt <= new Date()) throw new Error("Expired Token");

  // Happy Path: Rotate
  const newFamilyId = tokenRecord.familyId; // Keep the family
  const newTokenString = generateSecureRandomString();

  // Transactional Update: both writes commit together or not at all
  await db.$transaction([
    // 1. Mark current as used
    db.refreshToken.update({
      where: { id: tokenRecord.id },
      data: { isUsed: true }
    }),
    // 2. Issue new descendant
    db.refreshToken.create({
      data: {
        hashedToken: hash(newTokenString),
        userId: tokenRecord.userId,
        familyId: newFamilyId,
        isUsed: false,
        expiresAt: getExpirationDate()
      }
    })
  ]);

  return newTokenString;
}
```
The code above demonstrates the critical pivot point. The `isUsed` flag acts as a tripwire. Unlike my failed attempt, we don't delete the old token immediately; we mark it. This allows us to distinguish between "token not found" and "token reused."
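The rotation function leans on three helpers that the post does not show. One plausible way to write them with Node's built-in `crypto` module is sketched below. The 32-byte token size, SHA-256 as the hash, and the 30-day lifetime are assumptions for illustration, not details confirmed by the original system.

```javascript
const crypto = require("node:crypto");

// 32 random bytes, encoded URL-safe so the value can go straight into a cookie.
function generateSecureRandomString() {
  return crypto.randomBytes(32).toString("base64url");
}

// SHA-256 is deterministic, so the stored digest can be looked up by equality.
function hash(token) {
  return crypto.createHash("sha256").update(token).digest("hex");
}

// Assumed 30-day lifetime, matching the static token lifetime mentioned earlier.
function getExpirationDate(now = new Date()) {
  const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;
  return new Date(now.getTime() + THIRTY_DAYS_MS);
}

const sampleToken = generateSecureRandomString();
console.log(sampleToken.length);       // 43 characters for 32 bytes
console.log(hash(sampleToken).length); // 64 hex characters
```

A fast hash is a reasonable fit here because the token is already a high-entropy random value, and a deterministic digest keeps the lookup by `hashedToken` possible. Only the digest is stored, so a database leak does not hand out usable tokens.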
Performance Impact & Database Load
Implementing this in a high-throughput authentication system introduces a database write on every token refresh. Previously, validating a refresh token might have been purely CPU-bound (signature verification). Now it requires a database read and a write. We benchmarked this transition on AWS RDS (Postgres).
| Metric | Stateless Refresh (Legacy) | Rotation + Reuse Detection |
|---|---|---|
| Latency (P99) | 15ms | 45ms |
| DB Writes / Login | 1 | 1 + (N Refreshes) |
| Security Status | Vulnerable (Long window) | Secure (Self-healing) |
While the latency tripled, 45ms is still negligible for a token refresh operation that happens only once every 15-60 minutes per user. The security gain vastly outweighs the I/O cost. By indexing the `hashedToken` and `familyId` columns, we maintained consistent performance even with millions of rows.
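To put the extra I/O in perspective, a back-of-envelope estimate helps. The sketch below assumes every one of the roughly 50,000 concurrent users stays active and that refreshes are spread evenly over time. Both are simplifications, and `refreshesPerSecond` is a hypothetical helper, not a measured figure from the benchmark.

```javascript
// Rough estimate; assumes refreshes are spread evenly over time.
function refreshesPerSecond(activeUsers, accessTokenTtlMinutes) {
  return activeUsers / (accessTokenTtlMinutes * 60);
}

const users = 50_000; // concurrent users quoted for the dashboard
for (const ttl of [15, 60]) {
  const perSecond = refreshesPerSecond(users, ttl);
  // Each rotation marks one row as used and inserts one new row.
  const rowWrites = perSecond * 2;
  console.log(
    `${ttl} min TTL: ${perSecond.toFixed(1)} refreshes/s, ` +
      `${rowWrites.toFixed(1)} row writes/s`
  );
}
// 15 min TTL: 55.6 refreshes/s, 111.1 row writes/s
// 60 min TTL: 13.9 refreshes/s, 27.8 row writes/s
```

Real traffic is burstier than this (many sessions start at the same time of day), so treat the numbers as an order of magnitude rather than a capacity plan.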
Edge Cases & Concurrency Handling
Even with this robust logic, you must handle concurrency on the client side. If your frontend fires 5 requests at once, and all fail with `401 Unauthorized`, they might all try to refresh the token simultaneously.
This will trigger your "Reuse Detection" tripwire because the first request rotates the token, and the subsequent 4 requests send the *now-used* token. This results in the user being logged out. To fix this, you must implement a "Lock" or "Promise Singleton" pattern in your frontend Axios/Fetch interceptor. Check if a refresh is already in progress, and if so, queue the other requests to wait for the new token.
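A minimal sketch of that "Promise Singleton" idea follows. It is framework-agnostic rather than tied to Axios or Fetch: `createTokenRefresher` is a hypothetical helper, and `callRefreshEndpoint` stands in for whatever function actually calls your refresh endpoint.

```javascript
// Single-flight refresh: concurrent callers share one in-progress promise.
// `callRefreshEndpoint` is a hypothetical stand-in for the real HTTP call.
function createTokenRefresher(callRefreshEndpoint) {
  let inFlight = null;

  return function refreshAccessToken() {
    if (!inFlight) {
      inFlight = Promise.resolve()
        .then(callRefreshEndpoint)
        .finally(() => {
          inFlight = null; // allow the next refresh once this one settles
        });
    }
    return inFlight;
  };
}

// Usage: five requests hit a 401 at the same time.
let calls = 0;
const refreshAccessToken = createTokenRefresher(async () => {
  calls += 1;
  return `access-token-${calls}`;
});

const demo = Promise.all(
  Array.from({ length: 5 }, () => refreshAccessToken())
).then((results) => {
  console.log(calls);                 // 1
  console.log(new Set(results).size); // 1
  return results;
});
```

In an interceptor, each failed request would await `refreshAccessToken()` and then retry with the new access token, so the refresh token is sent to the server only once per expiry.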
Conclusion
Implementing Refresh Token Rotation with reuse detection transforms your authentication layer from a static target into a moving one. By accepting the slight overhead of database persistence, you effectively neutralize the threat of stolen long-lived tokens. Remember, in security, if you can't prevent the theft, you must ensure the stolen goods are worthless. This architecture ensures that any stolen token becomes a self-destruct button for the attacker's session.