URL shorteners seem simple on the surface, but designing one at scale reveals fascinating engineering challenges. This is a classic system design interview question - let's solve it step by step.
Step 1: Requirements Gathering
Functional Requirements:
- Given a long URL, generate a short unique URL
- When users visit the short URL, redirect to the original
- Links should expire after a configurable time
- Users can optionally create custom short URLs
Non-Functional Requirements:
- High availability (99.99% uptime)
- Low latency redirects (< 100ms)
- Short URLs should not be predictable
Scale Estimates:
Write: 100M new URLs per month
Read: 10B redirects per month (100:1 read-write ratio)
Storage: 100M * 12 months * 5 years * 500 bytes = ~300GB
QPS: ~4000 writes/sec, ~400K reads/secStep 2: High-Level Architecture
Client --> Load Balancer --> API Servers --> Cache (Redis)
--> Database (Cassandra)
Flow:
1. Client sends long URL
2. API server generates short code
3. Stores mapping in DB + cache
4. Returns short URL
Redirect:
1. Client visits short URL
2. Cache lookup (fast path)
3. If cache miss, DB lookup
4. Return 301/302 redirect
Step 3: URL Encoding Strategy
// Base62 encoding (a-z, A-Z, 0-9)
// 7 characters = 62^7 = 3.5 trillion unique URLs
function encode(id) {
const chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
let shortUrl = "";
while (id > 0) {
shortUrl = chars[id % 62] + shortUrl;
id = Math.floor(id / 62);
}
return shortUrl.padStart(7, "a");
}
// Example: encode(125) = "aaaaaCB"ID Generation Options:
| Approach | Pros | Cons |
|---|---|---|
| Auto-increment DB | Simple, unique | Single point of failure |
| UUID + Hash | No coordination | Collisions possible |
| Snowflake ID | Distributed, sorted | Complex setup |
| Range-based | Fast, no conflicts | Range management |
Step 4: Database Design
CREATE TABLE urls (
id BIGINT PRIMARY KEY,
short_code VARCHAR(10) UNIQUE NOT NULL,
original_url TEXT NOT NULL,
user_id BIGINT,
created_at TIMESTAMP DEFAULT NOW(),
expires_at TIMESTAMP,
click_count BIGINT DEFAULT 0,
INDEX idx_short_code (short_code),
INDEX idx_expires (expires_at)
);
-- Analytics table (separate for performance)
CREATE TABLE url_clicks (
id BIGINT PRIMARY KEY,
url_id BIGINT,
clicked_at TIMESTAMP,
ip_address VARCHAR(45),
user_agent TEXT,
referrer TEXT,
country VARCHAR(2)
);Step 5: Scaling Strategies
- Caching: Redis cache for hot URLs (80/20 rule - 20% of URLs get 80% of traffic)
- Database Sharding: Shard by short_code hash for write distribution
- Read Replicas: Multiple read replicas for redirect lookups
- CDN: Edge caching for popular redirects
- Rate Limiting: Prevent abuse of URL creation endpoint
Key Design Decisions
- 301 vs 302 redirect? Use 302 (temporary) if you want analytics; 301 (permanent) if you want browser caching
- Custom URLs? Check availability, store in same table with custom flag
- Expiration? Background job to clean up, lazy deletion on access
Interview Tip: Always start with requirements and back-of-envelope calculations. Show the interviewer you think about scale before jumping into design.
No comments yet. Be the first!