How I Built a Production-Grade Auth Service From Scratch
A complete authentication microservice with JWT, MFA, OAuth, role-based access, and monitoring. Explained two ways.
What Is This?
Think about every app you use. Instagram, Gmail, Spotify, your banking app. Before you can scroll your feed, read an email, or play a song, you have to log in.
That login screen looks simple. Just an email and a password. But behind that little form, there's an entire system deciding if you're really you, keeping hackers out, remembering you so you don't have to log in every 5 minutes, and locking things down when something suspicious happens.
Someone had to build that system. This is what I built.
Think of it like a bouncer at a nightclub. He checks your ID at the door (authentication), checks if you're on the VIP list (authorization), gives you a wristband so you don't have to show your ID at every bar inside (tokens), and writes your name in the logbook so there's a record of everyone who came and went (audit trail).
Except this bouncer also:
- Lets you skip the line if you have a Google or GitHub badge, like "Sign in with Google"
- Sends a text to your phone to double-check it's really you. Even if someone stole your password, they still need your phone
- Emails you a secret link if you forget your password
- Locks the door after 5 wrong guesses so nobody can brute-force their way in
- Keeps a detailed security log of every login, every failure, every suspicious event
In concrete feature terms, that means:
- Sign up and log in with email/password or Google/GitHub OAuth
- Two-factor authentication (MFA) with 6-digit codes from an authenticator app + backup codes
- Password reset via secure email link, auto-logs out all devices
- Session management to see which devices you're logged in on and revoke any of them
- Admin dashboard to manage users, view audit logs, and 7-day stats
- Auto-monitoring with Prometheus metrics and Grafana dashboards out of the box
A production-grade authentication & authorization microservice built on Node.js/Express with TypeScript strict mode. It implements industry-standard security patterns including JWT RS256 asymmetric signing, Argon2id password hashing, TOTP-based MFA, refresh token rotation with family-based reuse detection, Redis-backed sliding window rate limiting, and a comprehensive audit trail.
- Dependency Injection: Services accept Prisma/Redis clients via constructor
- Modular Feature Organization: auth, mfa, oauth, user, session, admin modules
- Defense in Depth: 14 security layers from Helmet headers to audit logging
- Zero Trust Tokens: Asymmetric RS256, token blacklist, rotation, family tracking
- Operational Readiness: Health checks, Prometheus metrics, Grafana dashboards, structured logging
The Big Picture
Here's how the whole system works at a high level:
The "pass" is a JWT (JSON Web Token): a long string of characters that proves who you are. You get a short-lived one (15 minutes) and a long-lived one (7 days) that can fetch you a new short one.
The system uses a dual-token architecture with RS256 asymmetric JWTs: a short-lived access token paired with a long-lived, rotating refresh token.
Tech Stack
Here are the main tools I used and why:
| Tool | What It Does | Why I Chose It |
|---|---|---|
| Node.js + Express | Runs the server | Fast, huge ecosystem, everyone knows it |
| TypeScript | Adds type safety to JavaScript | Catches bugs before they happen |
| PostgreSQL | Stores users, tokens, logs | Rock-solid relational database |
| Redis | Fast in-memory cache | Rate limiting, token blacklist, session tracking |
| Prisma | Talks to the database for us | Auto-generates TypeScript types from schema |
| Docker | Packages everything into containers | One command to run the whole thing |
| Layer | Technology | Rationale |
|---|---|---|
| Runtime | Node.js 20 + TypeScript 5.6 (strict) | Strict mode with noImplicitAny, strictNullChecks, noUnusedLocals |
| Framework | Express 4.21 | Minimal overhead, composable middleware, massive ecosystem |
| ORM | Prisma 5.22 | Type-safe query builder, automatic migration SQL, introspection |
| Cache | ioredis 5.4 → Redis 7 | Pipelining, Lua scripting, sorted sets for sliding window |
| Auth | jsonwebtoken (RS256) + Argon2id | Asymmetric JWTs for microservice verification, memory-hard hashing |
| MFA | otplib (RFC 6238) + qrcode | TOTP standard, compatible with Google Authenticator / Authy |
| OAuth | Passport.js (Google + GitHub strategies) | Proven library, handles PKCE + state parameter automatically |
| Validation | Zod 3.23 | Runtime type validation, auto-strips unknown fields, composable |
| Logging | Pino 9.5 | JSON structured, 5x faster than winston, automatic redaction |
| Metrics | prom-client 15.1 | Prometheus-native, histograms for latency, counters for auth events |
| Security Headers | Helmet 8 | CSP, HSTS (1yr), X-Frame-Options DENY, nosniff, referrer-policy |
| Email | Nodemailer 8 | Ethereal auto-accounts in dev, configurable SMTP in prod |
| Testing | Jest 29 + Supertest 7 + ts-jest | Unit + integration + E2E, 80% coverage thresholds |
| Container | Docker multi-stage (Alpine) + docker-compose | 263MB production image, dumb-init for PID 1, non-root user |
Project Structure
The code is organized like a building where each floor has a specific purpose:
Each module follows the same pattern: routes (URLs) → controller (handles the request) → service (business logic) → database.
Follows a modular monolith pattern with feature-based organization. Each module has its own controller, service, routes, validation schemas, and tests.
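The routes → controller → service layering can be sketched in plain TypeScript. This is an illustrative sketch (the names and the in-memory `Map` standing in for Prisma are assumptions, not the repo's actual code), just to show where each responsibility lives:

```typescript
// Minimal sketch of the routes → controller → service pattern.
type User = { id: string; email: string };

// Service: business logic only — accepts its data source via constructor (DI).
class UserService {
  constructor(private db: Map<string, User>) {}
  getById(id: string): User {
    const user = this.db.get(id);
    if (!user) throw new Error("NotFound");
    return user;
  }
}

// Controller: translates HTTP-shaped input into service calls and envelopes.
class UserController {
  constructor(private service: UserService) {}
  getMe(req: { userId: string }): { success: true; data: User } {
    return { success: true, data: this.service.getById(req.userId) };
  }
}

// "Route" layer: wires a URL to the controller (an Express router in the real app).
const db = new Map<string, User>([["u1", { id: "u1", email: "a@b.c" }]]);
const controller = new UserController(new UserService(db));
```

Because the service receives its dependencies through the constructor, tests can inject a fake database without touching Express at all.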
How Authentication Works
Authentication is like a passport system. Here's the journey a user takes:
1. Register: user provides email + password. We hash the password (scramble it so no one can read it), save the user, and send a verification email.
2. Verify email: user clicks the link in the email. We mark their account as verified.
3. Log in: user provides email + password. We check the password against the scrambled version. If correct, we give them two tokens: a short pass (15 min) and a long pass (7 days).
4. Use the app: every request sends the short pass in the header. We verify it's real and not expired.
5. Refresh: when the short pass expires, the long pass gets a new short pass. The old long pass is destroyed.
6. Log out: the short pass gets blacklisted so no one can use it, even if they stole it.
Complete Authentication Lifecycle
Database Design
The database has 5 tables. Think of each table as a spreadsheet:
Entity Relationship
Key Design Decisions
- Token hashing: Never store raw tokens. All tokens (refresh, email verify, password reset) stored as SHA-256 hashes. If the DB is breached, tokens are useless.
- Refresh token families: Each login session creates a family UUID. All rotated tokens share it. If a revoked token is replayed, the entire family gets nuked.
- Soft deletes: Users have `deletedAt` instead of hard deletes. Preserves audit trail integrity.
- MFA secret encryption: TOTP secrets encrypted at rest with AES-256-GCM using a dedicated `MFA_ENCRYPTION_KEY`. Format: `iv:authTag:ciphertext` (all hex).
- Backup codes: 8 codes generated per MFA enrollment, each SHA-256 hashed. One-time use (`usedAt` timestamp marks consumption).
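The "never store raw tokens" rule from the list above boils down to a few lines of Node's built-in `crypto`. This is a sketch with illustrative helper names, not the repo's actual functions:

```typescript
import { createHash, randomBytes } from "node:crypto";

// The raw token is sent to the user; only its SHA-256 digest reaches the DB.
function generateToken(): string {
  return randomBytes(32).toString("hex"); // 64 hex chars, as in the email-verify flow
}

function hashToken(raw: string): string {
  return createHash("sha256").update(raw).digest("hex");
}

// Lookup hashes the presented token and compares against the stored digest,
// so a DB dump yields nothing usable.
function tokenMatches(presented: string, storedHash: string): boolean {
  return hashToken(presented) === storedHash;
}
```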
Security Deep Dive
Security is built in at every level. Here are the protections:
| Protection | What It Means |
|---|---|
| Password Hashing | Passwords are scrambled using Argon2id. Even if hackers steal the database, they can't recover passwords. |
| Token Rotation | Every time you refresh your login, you get a brand new token and the old one is destroyed. If a hacker steals the old one, the system detects it and locks down all your sessions. |
| Two-Factor Auth | Even if someone knows your password, they need a 6-digit code from your phone to get in. |
| Rate Limiting | Limits how fast someone can try passwords. After 5 wrong guesses, the account locks for 15 minutes. |
| Email Enumeration Prevention | The login page never tells you "this email doesn't exist" so hackers can't probe for valid emails. |
| Security Headers | Browser protections that prevent clickjacking, code injection, and other attacks. |
| Audit Logs | Every security event is recorded: logins, failures, password changes, role changes. |
1. Password Security: Argon2id
// OWASP 2023 recommended parameters
const ARGON2_OPTIONS = {
type: argon2.argon2id, // Hybrid: resists GPU + side-channel attacks
memoryCost: 65536, // 64 MiB per hash
timeCost: 3, // 3 iterations
parallelism: 4, // 4 threads
};
Each hash includes a random salt. The output is self-describing: $argon2id$v=19$m=65536,t=3,p=4$SALT$HASH. Verification is constant-time to prevent timing attacks.
2. JWT Architecture: RS256
// Asymmetric signing: private key signs, public key verifies
// This means microservices only need the PUBLIC key to verify tokens
const signOptions: SignOptions = {
algorithm: 'RS256',
expiresIn: '15m', // Short-lived access tokens
subject: payload.userId,
jwtid: crypto.randomUUID(), // Unique per token (prevents same-second collisions)
};
// Access token payload
{ userId, email, role, type: 'access', iat, exp, sub, jti }
// Refresh token payload (adds family for rotation tracking)
{ userId, email, role, type: 'refresh', family: 'uuid', iat, exp, sub, jti }
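The asymmetric property that RS256 buys — private key signs, public key verifies — can be demonstrated with Node's built-in `crypto` alone. This sketch is not the jsonwebtoken library's API; it only illustrates why other microservices need nothing but the public key:

```typescript
import { generateKeyPairSync, createSign, createVerify } from "node:crypto";

// RS256 = RSA signature over SHA-256. One keypair per deployment.
const { privateKey, publicKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

function sign(payload: object): { body: string; sig: string } {
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  const signer = createSign("RSA-SHA256");
  signer.update(body);
  // Only the holder of the PRIVATE key can produce this signature.
  return { body, sig: signer.sign(privateKey, "base64url") };
}

function verify(token: { body: string; sig: string }): boolean {
  const verifier = createVerify("RSA-SHA256");
  verifier.update(token.body);
  // Any service holding only the PUBLIC key can check it.
  return verifier.verify(publicKey, token.sig, "base64url");
}
```

Tampering with even one character of the payload invalidates the signature, which is what makes the token unforgeable without the private key.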
3. Sliding Window Rate Limiter
// Redis Sorted Set algorithm (per-IP, per-endpoint)
// Key: rl:auth:{ip}:{normalized_path}
pipeline.zremrangebyscore(key, '-inf', windowStart); // 1. Prune old entries
pipeline.zcard(key); // 2. Count remaining
pipeline.zadd(key, now, uuid); // 3. Add this request
pipeline.pexpire(key, windowMs); // 4. Auto-expire key
// Global: 100 req / 60s per IP
// Auth endpoints: 5 req / 60s per IP (brute-force protection)
// Path normalization: /users/abc-123 → /users/:id (prevents key explosion)
4. Token Blacklist (Logout)
// On logout, add access token to Redis with TTL = remaining lifetime
const remainingSeconds = payload.exp - Math.floor(Date.now() / 1000);
await redis.set(`bl:${accessToken}`, '1', 'EX', remainingSeconds);
// authenticate middleware checks EVERY request:
const isBlacklisted = await redis.get(`bl:${token}`);
if (isBlacklisted) throw new AuthenticationError('Token has been revoked');
5. MFA: TOTP + AES-256-GCM
// TOTP secret encrypted at rest
// encrypt(): iv (12 bytes) + AES-256-GCM + authTag (16 bytes)
// Storage format: "hex(iv):hex(authTag):hex(ciphertext)"
// Key: MFA_ENCRYPTION_KEY env var (256-bit / 64 hex chars)
// Verification allows ±30s window (standard TOTP tolerance)
authenticator.check(userCode, decryptedSecret); // true/false
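The `iv:authTag:ciphertext` format described above maps directly onto Node's AES-256-GCM API. A minimal sketch, assuming an in-process key (the real service reads `MFA_ENCRYPTION_KEY` from the environment; helper names here are illustrative):

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const key = randomBytes(32); // stands in for MFA_ENCRYPTION_KEY (256-bit)

function encryptSecret(plain: string): string {
  const iv = randomBytes(12); // 96-bit IV, the standard size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plain, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte tag: detects any tampering
  return [iv, tag, ct].map((b) => b.toString("hex")).join(":");
}

function decryptSecret(stored: string): string {
  const [iv, tag, ct] = stored.split(":").map((h) => Buffer.from(h, "hex"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if ciphertext or tag was altered
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}
```

GCM's auth tag is the reason for the middle field: unlike plain CBC, a flipped ciphertext bit fails decryption loudly instead of silently corrupting the TOTP secret.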
Middleware Pipeline
Every request passes through a series of checkpoints before reaching your code. Think of it as airport security:
1. Security Headers (Helmet): adds protective headers to every response. Think of it as putting on a seatbelt.
2. CORS: only allows requests from approved websites.
3. Request Logger: stamps each request with a unique ID and times how long it takes.
4. Rate Limiter: blocks you if you're making too many requests too fast.
5. Authenticate: verifies your token is real, not expired, and not blacklisted.
6. Authorize: for admin routes, checks you have the ADMIN role.
7. Validate: checks your data is in the right format before processing.
8. Error Handler: if anything goes wrong at any step, catches it and sends a clean error message.
Execution Order (app.ts)
// ─── Global (applied to ALL routes) ────────────────
app.use(securityHeaders); // 1. Helmet: CSP, HSTS, X-Frame-Options, nosniff
app.use(corsMiddleware); // 2. CORS whitelist from CORS_ORIGINS env
app.use(compression()); // 3. Gzip responses
app.use(express.json({limit: '10kb'})); // 4. Body parser (DoS protection)
app.use(cookieParser()); // 5. Parse Cookie header
app.use(requestLogger); // 6. X-Request-ID, duration, field redaction
app.use(metricsMiddleware); // 7. Prometheus counters & histograms
app.use(globalRateLimiter); // 8. 100 req/60s per IP (Redis sorted set)
app.use(passport.initialize()); // 9. OAuth strategy registration
// ─── Route-specific (applied per-route) ────────────
// authRateLimiter // 5 req/60s (on /login, /register, /forgot-password)
// authenticate // JWT verify + Redis blacklist check
// authorize('ADMIN') // Role-based access control
// validateRequest(schema) // Zod validation on body/params/query
// ─── Global catch-all ──────────────────────────────
app.use(errorHandler); // Normalizes ALL errors to standard envelope
Registration
When someone signs up, here's what happens behind the scenes:
- We check if the email is already taken
- We scramble the password with Argon2id, an algorithm that uses 64MB of memory per hash, making it extremely hard to crack
- We generate a random verification code and email it to them
- We create two tokens: a short-lived one for immediate use, and a long-lived one for refreshing
- We log the event for auditing
See the Authentication Lifecycle section for the full registration flow diagram. Key implementation details:
- Email verification token: `crypto.randomBytes(32).toString('hex')` → 64 hex chars
- Stored as SHA-256 hash (if the DB is breached, raw tokens are useless)
- Email sent via `void emailService.sendVerificationEmail()` (fire-and-forget; errors are logged but don't break registration)
- `ConflictError` (409) on duplicate email, not a generic 400
Login & MFA
Regular Login
You enter your email and password. If correct, you get your tokens and you're in.
What if someone guesses wrong?
After 5 wrong guesses, the account locks for 15 minutes. This prevents hackers from trying thousands of passwords.
Two-Factor Authentication (MFA)
If you've turned on MFA, logging in with your password isn't enough. You also need a 6-digit code from an app like Google Authenticator. This code changes every 30 seconds. Even if a hacker knows your password, they can't get in without your phone.
You also get 8 backup codes when you set up MFA. Each can be used once if you lose your phone.
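Backup-code issuance follows the same hash-at-rest rule as every other token. A sketch with illustrative names (the real enrollment code lives in the MFA service; the 4-byte code length is an assumption):

```typescript
import { createHash, randomBytes } from "node:crypto";

type StoredCode = { hash: string; usedAt: Date | null };

// 8 random codes: plaintext shown to the user once, only hashes persisted.
function generateBackupCodes(count = 8): { plain: string[]; stored: StoredCode[] } {
  const plain = Array.from({ length: count }, () => randomBytes(4).toString("hex"));
  const stored = plain.map((c) => ({
    hash: createHash("sha256").update(c).digest("hex"),
    usedAt: null,
  }));
  return { plain, stored };
}

// One-time use: a non-null usedAt means the code is burned.
function consumeBackupCode(code: string, stored: StoredCode[]): boolean {
  const hash = createHash("sha256").update(code).digest("hex");
  const match = stored.find((s) => s.hash === hash && s.usedAt === null);
  if (!match) return false;
  match.usedAt = new Date();
  return true;
}
```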
Account Lockout Implementation
// Redis-backed counter with TTL (survives server restarts)
const key = `failed_login:${userId}`;
const attempts = await redis.incr(key); // Atomic increment
await redis.expire(key, 900); // 15 min TTL
if (attempts >= 5) {
await prisma.user.update({
where: { id: user.id },
data: { lockedUntil: new Date(Date.now() + 15 * 60 * 1000) },
});
}
MFA Challenge Flow
When user.mfaEnabled === true, login returns a short-lived MFA challenge token (type: mfa_challenge, 15min expiry) instead of full tokens. The client must then call POST /mfa/verify-login with this token + a valid TOTP code to receive full access/refresh tokens.
TOTP secrets stored encrypted: AES-256-GCM(secret, MFA_ENCRYPTION_KEY) → iv:authTag:ciphertext. Decrypted only during verification.
Token System
- Access token: lives for 15 minutes. Sent with every request. If it gets stolen, it's only useful for 15 minutes. Signed with RSA cryptography so it can't be forged.
- Refresh token: lives for 7 days. Only used to get new access tokens. Every time it's used, it self-destructs and a new one is created. If someone tries to reuse an old one, all sessions get killed.
- Logout: your access token goes on a "banned list" in Redis. Any request with that token is immediately rejected.
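The rotation and family-nuking logic can be sketched with an in-memory `Map` standing in for the `RefreshToken` table (names are illustrative, not the service's actual code):

```typescript
// Each login creates a family; every rotated token inherits it.
type TokenRecord = { family: string; revoked: boolean };
const tokens = new Map<string, TokenRecord>();

function issue(token: string, family: string): void {
  tokens.set(token, { family, revoked: false });
}

// Rotation: revoke the old token, issue a new one in the SAME family.
// A replayed (already-revoked) token is treated as theft: the whole
// family — every descendant session — gets revoked.
function rotate(oldToken: string, newToken: string): boolean {
  const rec = tokens.get(oldToken);
  if (!rec || rec.revoked) {
    if (rec) revokeFamily(rec.family); // reuse detected → nuke the family
    return false;
  }
  rec.revoked = true;
  issue(newToken, rec.family);
  return true;
}

function revokeFamily(family: string): void {
  for (const rec of tokens.values()) if (rec.family === family) rec.revoked = true;
}
```

The key insight: an attacker replaying a stolen-but-already-rotated token can't tell it's stale, so the replay itself becomes the detection signal.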
Token Lifecycle State Machine
Token Cleanup Job
// Runs hourly via setInterval (unref'd so it won't prevent process exit)
await prisma.refreshToken.deleteMany({
where: { OR: [
{ expiresAt: { lt: now } }, // Expired tokens
{ revokedAt: { not: null, lt: revokedCutoff } }, // Revoked >24h ago
]},
});
Password Reset
- User clicks "Forgot Password" and enters their email
- We always say "check your email" even if the email doesn't exist, so hackers can't check which emails are registered
- We email a secret reset link (valid for 1 hour)
- User clicks the link, enters a new password
- We update the password and log out all devices (all refresh tokens get revoked)
Two-step process: POST /auth/forgot-password (generates token) → POST /auth/reset-password/:token (consumes token). The reset token is a 32-byte random hex string, stored as SHA-256. Expiry: 1 hour (PASSWORD_RESET_EXPIRY_MS = 60 * 60 * 1000). After reset, all user's refresh tokens are revoked with revokedAt = now() to force re-authentication on every device.
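The token half of that two-step flow fits in a few lines. A sketch with an in-memory store standing in for the DB (the expiry constant matches the description above; helper names are assumptions):

```typescript
import { createHash, randomBytes } from "node:crypto";

const PASSWORD_RESET_EXPIRY_MS = 60 * 60 * 1000; // 1 hour
const resetTokens = new Map<string, { userId: string; expiresAt: number }>();

// Step 1 (forgot-password): raw token is emailed; only its hash is stored.
function createResetToken(userId: string, now = Date.now()): string {
  const raw = randomBytes(32).toString("hex");
  const hash = createHash("sha256").update(raw).digest("hex");
  resetTokens.set(hash, { userId, expiresAt: now + PASSWORD_RESET_EXPIRY_MS });
  return raw;
}

// Step 2 (reset-password/:token): hash the presented token, check expiry,
// and delete it so it's single-use. The real flow then also revokes every
// refresh token to log out all devices.
function consumeResetToken(raw: string, now = Date.now()): string | null {
  const hash = createHash("sha256").update(raw).digest("hex");
  const rec = resetTokens.get(hash);
  if (!rec || rec.expiresAt < now) return null;
  resetTokens.delete(hash);
  return rec.userId;
}
```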
OAuth (Google & GitHub)
Users can sign in with their Google or GitHub accounts instead of creating a password. When they click "Login with Google":
- We redirect them to Google's login page
- Google verifies them and sends us back their email/name
- If they're new, we create an account. If they exist, we link the accounts
- They get their tokens and are logged in
This is optional. It only works if you configure Google/GitHub API keys in the environment variables.
Implemented via Passport.js strategies. Each strategy is conditionally loaded. If the client ID env var is not set, the strategy is skipped with a warning log. Account resolution logic: (1) find by OAuthAccount(provider, providerId), (2) find by email match, (3) create new user with emailVerified: true.
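The three-step account resolution reads naturally as a fall-through. This sketch uses in-memory maps in place of the Prisma queries (all names are illustrative):

```typescript
type User = { id: string; email: string; emailVerified: boolean };
const usersByEmail = new Map<string, User>();
const oauthAccounts = new Map<string, string>(); // "provider:providerId" → userId

function resolveOAuthUser(provider: string, providerId: string, email: string): User {
  const key = `${provider}:${providerId}`;
  // (1) Already linked via OAuthAccount(provider, providerId)? Use that user.
  const linkedId = oauthAccounts.get(key);
  if (linkedId) return [...usersByEmail.values()].find((u) => u.id === linkedId)!;
  // (2) Existing user with the same email? Link this provider to them.
  const byEmail = usersByEmail.get(email);
  if (byEmail) {
    oauthAccounts.set(key, byEmail.id);
    return byEmail;
  }
  // (3) Brand-new user — the provider already verified the email address.
  const user: User = { id: key, email, emailVerified: true };
  usersByEmail.set(email, user);
  oauthAccounts.set(key, user.id);
  return user;
}
```

Checking the provider link before the email match matters: it keeps a user bound to their original account even if their email at the provider later changes.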
Admin Panel
Admin users (role: ADMIN) get access to extra endpoints:
- User Management: list all users, search, filter, lock/unlock accounts, change roles
- Audit Logs: view every security event (who logged in, from where, when)
- Dashboard: 7-day statistics on signups, logins, and active users
Regular users get a "403 Forbidden" error if they try to access these.
All admin routes are guarded by authenticate + authorize('ADMIN') middleware. The authorize middleware checks req.userRole (set by authenticate). The admin service supports paginated queries with Zod-validated params: page, limit (max 100), role filter, status filter (active/locked/deleted), dateFrom/dateTo, search (email fuzzy match), sortBy/sortOrder.
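The pagination guardrails are worth seeing concretely. The real app expresses them as Zod schemas; this plain-TypeScript sketch shows the same clamping rules (the defaults are assumptions apart from the documented max of 100):

```typescript
type PageParams = { page: number; limit: number };

// Parse untrusted query strings into safe pagination values.
function parsePageParams(query: Record<string, string | undefined>): PageParams {
  // page: floor at 1; non-numeric input falls back to 1
  const page = Math.max(1, Number.parseInt(query.page ?? "1", 10) || 1);
  // limit: capped at 100 so one query can't dump the whole user table
  const limit = Math.min(100, Math.max(1, Number.parseInt(query.limit ?? "20", 10) || 20));
  return { page, limit };
}
```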
All API Endpoints
Authentication
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| POST | /api/v1/auth/register | Public | Register new account |
| POST | /api/v1/auth/login | Public | Login with email & password |
| POST | /api/v1/auth/refresh | Public | Exchange refresh token for new pair |
| POST | /api/v1/auth/logout | Auth | Blacklist current access token |
| POST | /api/v1/auth/verify-email/:token | Public | Confirm email address |
| POST | /api/v1/auth/forgot-password | Public | Initiate password reset |
| POST | /api/v1/auth/reset-password/:token | Public | Complete password reset |
Multi-Factor Authentication
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| POST | /api/v1/mfa/enable | Auth | Generate TOTP secret & backup codes |
| POST | /api/v1/mfa/verify | Auth | Activate MFA with TOTP code |
| POST | /api/v1/mfa/verify-login | Public | Complete MFA login challenge |
| POST | /api/v1/mfa/verify-backup-code | Public | Use backup code for MFA |
| POST | /api/v1/mfa/disable | Auth | Disable MFA (requires password + TOTP) |
User & Sessions
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /api/v1/users/me | Auth | Get current user profile |
| GET | /api/v1/sessions | Auth | List active sessions |
| DELETE | /api/v1/sessions/:id | Auth | Revoke a session |
Admin (RBAC: ADMIN role required)
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /api/v1/admin/users | Admin | List all users (paginated, filterable) |
| GET | /api/v1/admin/users/:id | Admin | Get user details |
| PATCH | /api/v1/admin/users/:id | Admin | Update user (role, lock, verify) |
| GET | /api/v1/admin/audit-logs | Admin | View audit trail |
| GET | /api/v1/admin/dashboard | Admin | Dashboard stats (7-day trends) |
System
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check (Docker liveness probe) |
| GET | /metrics | Prometheus metrics |
| GET | /api-docs | Swagger UI (interactive docs) |
| GET | /demo | Interactive demo page |
Error Handling
Every error returns a consistent format so your frontend always knows what to expect:
{
"success": false,
"error": {
"code": "AUTHENTICATION_ERROR",
"message": "Invalid email or password",
"statusCode": 401
}
}
Errors never expose sensitive information. In production, unexpected errors return a generic message while the full details are logged server-side.
Error Class Hierarchy
AppError (base)
├── AuthenticationError // 401 - invalid token, wrong password
├── AuthorizationError // 403 - insufficient role
├── ValidationError // 400 - Zod schema failures (includes field errors[])
├── NotFoundError // 404 - user/resource not found
├── ConflictError // 409 - duplicate email
├── RateLimitError // 429 - too many requests
└── InternalError // 500 - isOperational=false (masked in prod)
The global errorHandler middleware catches all errors, maps Prisma errors (P2002 → 409, P2025 → 404), Zod errors to ValidationError, and masks non-operational errors in production (returns generic "An unexpected error occurred" while logging full stack trace).
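The normalize-and-mask behaviour can be distilled into a small sketch. Class and field names mirror the hierarchy above but are illustrative, not copied from the repo:

```typescript
// Base error: carries envelope fields plus an "operational" flag.
class AppError extends Error {
  constructor(
    message: string,
    public statusCode: number,
    public code: string,
    public isOperational = true,
  ) {
    super(message);
  }
}

// Map ANY thrown value to the standard response envelope.
function toEnvelope(err: unknown, isProd: boolean) {
  if (err instanceof AppError && err.isOperational) {
    return {
      success: false,
      error: { code: err.code, message: err.message, statusCode: err.statusCode },
    };
  }
  // Unknown / non-operational errors: generic 500 in production,
  // full details only in server-side logs.
  return {
    success: false,
    error: {
      code: "INTERNAL_ERROR",
      message: isProd ? "An unexpected error occurred" : String(err),
      statusCode: 500,
    },
  };
}
```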
Docker & Deployment
The entire system runs with one command:
docker-compose up
This starts 5 services:
- App (port 3000): the auth service itself
- PostgreSQL (port 5432): the database
- Redis (port 6379): fast cache for sessions and rate-limiting
- Prometheus (port 9090): collects performance metrics
- Grafana (port 3001): dashboards to monitor everything
Multi-Stage Dockerfile
# Stage 1: Build (full Node, all devDeps)
FROM node:20-alpine AS builder
RUN npm ci # Deterministic install
RUN npx prisma generate # Generate typed client
RUN npm run build # tsc → dist/
# Stage 2: Production (minimal image)
FROM node:20-alpine
RUN apk add --no-cache dumb-init openssl
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/prisma ./prisma
COPY --from=builder /app/public ./public
USER node # Non-root user
CMD ["dumb-init", "node", "dist/server.js"] # PID 1 handler
docker-compose.yml
5 services on a shared bridge network (auth-network). App depends on postgres/redis health checks. Postgres uses pg_isready, Redis uses redis-cli ping. Data persisted via named volumes. Docker entrypoint generates RSA keys if missing, runs Prisma migrations, and seeds in dev.
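The health-check wiring described above looks roughly like this in compose syntax — an illustrative fragment, not the repo's actual file (service names, images, and intervals are assumptions):

```yaml
# Sketch: app waits for healthy postgres/redis before starting.
services:
  postgres:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      retries: 5
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      retries: 5
  app:
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
```

The `condition: service_healthy` form is what makes `depends_on` wait for the probe to pass rather than merely for the container to start.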
Monitoring
The service automatically tracks its own health:
- Structured Logs: every request is logged as JSON with a unique ID, timing, and user info
- Prometheus Metrics: request counts, response times, error rates, active sessions
- Grafana Dashboards: visual graphs showing traffic, latency, and auth success/failure rates
- Health Check: a `/health` endpoint that Docker uses to know if the service is alive
Prometheus Metrics Collected
| Metric | Type | Labels |
|---|---|---|
| `http_requests_total` | Counter | method, path, status_code |
| `http_request_duration_seconds` | Histogram | method, path |
| `auth_login_total` | Counter | result (success/failure/locked) |
| `auth_active_sessions` | Gauge | - |
| `auth_rate_limit_hits_total` | Counter | - |
| Default process metrics | Various | CPU, memory, event loop lag, GC |
Logging via Pino (JSON in production, pretty-printed in dev). Automatic field redaction for passwords, tokens, and secrets. Every log entry includes requestId for distributed tracing.
Testing
The entire system has 136 automated tests that verify everything works correctly:
| Type | Count | What It Tests |
|---|---|---|
| Unit Tests | 86 | Individual functions in isolation (password hashing, JWT, crypto, validation schemas) |
| Integration Tests | 40 | Full HTTP requests through the server (register, login, admin, MFA) |
| E2E Tests | 10 | Complete user journey: register → verify email → login → enable MFA → MFA login → refresh → sessions → logout |
Run them all with npm test. The project requires 80% code coverage to pass.
Test Architecture
// Coverage thresholds (jest.config.js)
coverageThreshold: {
global: { branches: 70, functions: 80, lines: 80, statements: 80 }
}
// Test commands
npm run test:unit // src/**/*.test.ts (86 tests)
npm run test:integration // tests/integration/ (40 tests, real DB)
npm run test:e2e // tests/e2e/ (10 steps, sequential flow)
Key Testing Patterns
- Unit tests: Pure function testing (argon2 roundtrip, JWT sign/verify, crypto encrypt/decrypt, Zod schema edge cases). No DB or network calls.
- Integration tests: Supertest against the real Express app with real Prisma (test DB) and real Redis. `afterEach` cleans all tables.
- E2E tests: Sequential 10-step journey using `setup-e2e.ts` (no `afterEach` cleanup, only `afterAll`) so state persists between steps.
- Test helpers: `factories.ts` provides `createTestUser()` and `createTestApp()` with a TestAgent for cookie/header persistence.
- TypeScript config: `tsconfig.test.json` extends the base config with `noUnusedLocals: false` to allow test-scoped variables.
Auth Microservice • 136 tests • 25 endpoints • 14 security layers
Built with Node.js, TypeScript, PostgreSQL, Redis, and Docker