Performance Guide
Scale FraiseQL to handle millions of requests with horizontal scaling, smart caching, and database optimization.
FraiseQL scales in four phases:

1. Horizontal scaling: stateless instances behind a load balancer, with auto-scaling
2. Database scaling: connection pooling, read replicas, query optimization, and sharding
3. Caching: HTTP caching, result caching, and invalidation
4. Multi-region: global routing with per-region databases and caches
Simplest production setup: 3+ stateless FraiseQL instances behind a load balancer.
All major platforms support load balancing to multiple FraiseQL instances. Each instance is stateless and connects to the same shared database.
Health check configuration (all platforms):
Endpoint: /health/ready
Interval: 30 seconds
Timeout: 5 seconds
Unhealthy threshold: 3 failures
Healthy threshold: 2 successes

Assume:
- 1 instance = 1000 RPS capacity
- 3 instances = 3000 RPS capacity
Traffic growth:
- Month 1: 1000 RPS (1 instance)
- Month 2: 2000 RPS (2 instances)
- Month 3: 5000 RPS (5 instances)
- Month 6: 15000 RPS (15 instances)
- Month 12: 100000 RPS (100 instances)

Automatically scale instances based on demand.
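Under the 1,000 RPS-per-instance assumption above, the instance count for a given traffic target is simple ceiling division. A sketch (the capacity figure is illustrative, not a FraiseQL guarantee):

```python
import math

def instances_needed(target_rps: int, rps_per_instance: int = 1000,
                     min_instances: int = 3) -> int:
    """Instances required for a traffic target, never below the HA floor of 3."""
    return max(min_instances, math.ceil(target_rps / rps_per_instance))

print(instances_needed(5000))    # Month 3 target -> 5
print(instances_needed(100000))  # Month 12 target -> 100
```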
CPU Utilization (simplest, most common)
- Scale up when: Average CPU > 70%
- Scale down when: Average CPU < 30%
- Cooldown: 5 min up, 15 min down

Memory Utilization (for memory-intensive queries)
- Scale up when: Average Memory > 80%
- Scale down when: Average Memory < 50%

Request Count (most accurate for an API)
- Scale up when: Requests/sec > 5000
- Scale down when: Requests/sec < 2000

Custom Metrics (database queue depth, cache hit rate)
- Scale up when: Database pool > 80% utilized
- Scale down when: Database pool < 40% utilized

AWS (Auto Scaling Group):

```yaml
MinSize: 3
MaxSize: 100
DesiredCapacity: 3
TargetTrackingScalingPolicies:
  - TargetValue: 0.70  # Target 70% CPU
    PredefinedMetric: ASGAverageCPUUtilization
    ScaleOutCooldown: 60s
    ScaleInCooldown: 300s
```

Kubernetes (HorizontalPodAutoscaler):

```yaml
minReplicas: 3
maxReplicas: 100
targetCPUUtilizationPercentage: 70
targetMemoryUtilizationPercentage: 80
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
      - type: Percent
        value: 100  # Double replicas
        periodSeconds: 30
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
      - type: Percent
        value: 50  # Halve replicas
        periodSeconds: 60
```

Azure (autoscale rules):

```
# Scale out: +1 instance when CPU > 70%
# Scale in: -1 instance when CPU < 30%
# Min: 3 instances, Max: 100 instances
```

Google Cloud Run:

```
# Automatic scaling based on request concurrency
# Max concurrent requests per instance: 80 (default)
# Scale up: +50 instances if queue > 0
# Scale down: -1 instance per minute
# Max instances: 1000 (configurable)
```

Monitor scaling activity:

```bash
# AWS
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name fraiseql-asg

# Kubernetes
kubectl get hpa fraiseql --watch

# Azure
az monitor autoscale history list \
  --resource-group mygroup \
  --resource fraiseql
```

As traffic grows, the database becomes the bottleneck.
Limit connections to prevent database overload:
Without pooling, 1000 instances at 20 connections each would require 20,000 database connections — well beyond what most databases support (typically capped at ~5,000).
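The connection arithmetic is easy to sanity-check; both figures below come straight from the text:

```python
def total_db_connections(instances: int, conns_per_instance: int) -> int:
    """Connections the database must hold for a given fleet size."""
    return instances * conns_per_instance

# Direct connections: 1000 instances x 20 connections each
print(total_db_connections(1000, 20))  # 20000 - far beyond a ~5000 cap

# Through PgBouncer: 50 instances holding only their minimum pool of 5
print(total_db_connections(50, 5))     # 250
```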
With PgBouncer, 50 instances with a minimum pool of 5 results in only 250 connections to the database:
```
# PgBouncer for PostgreSQL
PGBOUNCER_MIN_POOL_SIZE=5
PGBOUNCER_MAX_POOL_SIZE=20  # Per database
PGBOUNCER_CONNECTION_TIMEOUT=30
PGBOUNCER_IDLE_IN_TRANSACTION_SESSION_TIMEOUT=600
```

Distribute read traffic across replicas. Write queries go to the primary; read queries are spread across replicas.
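The primary/replica split can be sketched in application code like this. This is a hypothetical helper with round-robin replica selection, not FraiseQL's actual routing API:

```python
import itertools

PRIMARY = "postgresql://user:pass@fraiseql-prod:5432/db"
REPLICAS = [f"postgresql://user:pass@fraiseql-read-{i}:5432/db" for i in (1, 2, 3)]

_replica_cycle = itertools.cycle(REPLICAS)

def pick_dsn(is_write: bool) -> str:
    """Writes always target the primary; reads rotate through the replicas."""
    return PRIMARY if is_write else next(_replica_cycle)

pick_dsn(is_write=True)   # always the primary
pick_dsn(is_write=False)  # read-1, then read-2, then read-3, then read-1 again
```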
Setup (AWS RDS):
```bash
# Create 3 read replicas
for i in {1..3}; do
  aws rds create-db-instance-read-replica \
    --db-instance-identifier fraiseql-read-$i \
    --source-db-instance-identifier fraiseql-prod
done
```
```bash
# Configure FraiseQL to read from replicas
DATABASE_URL_PRIMARY=postgresql://user:pass@fraiseql-prod:5432/db
DATABASE_URL_REPLICA=postgresql://user:pass@fraiseql-read-1:5432/db
```
In code, route read queries to the replica and writes to the primary.

Identify slow queries:
```sql
-- PostgreSQL: Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = 500;  -- Log queries > 500ms
SELECT pg_reload_conf();

-- View slow queries
SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```

Add indexes:
```sql
-- Find missing indexes
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'user@example.com';

-- If you see a sequential scan, add an index
CREATE INDEX idx_users_email ON users(email);

-- Composite index for common filters
CREATE INDEX idx_posts_user_published
ON posts(user_id, published)
WHERE published = true;
```

Optimize N+1 queries:
FraiseQL automatically batches queries. But verify with monitoring:
```sql
-- If you see 1000 queries per request, something is wrong
-- Use query profiling to identify N+1 problems

-- Before (N+1):
SELECT * FROM users LIMIT 10;                      -- 1 query
SELECT * FROM posts WHERE user_id = ?;             -- x10 = 10 queries
-- Total: 11 queries

-- After (batched):
SELECT * FROM users LIMIT 10;                      -- 1 query
SELECT * FROM posts WHERE user_id IN (?, ?, ...);  -- 1 query
-- Total: 2 queries
```
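The same collapse can be sketched outside SQL: gather the parent ids, build one IN query, and group the rows back per user. A minimal illustration, not FraiseQL's internal batching code:

```python
def batched_posts_sql(user_ids: list[int]) -> str:
    """One IN query replaces N per-user queries."""
    placeholders = ", ".join(["%s"] * len(user_ids))
    return f"SELECT * FROM posts WHERE user_id IN ({placeholders})"

def group_by_user(rows: list[dict], user_ids: list[int]) -> dict[int, list[dict]]:
    """Reassemble the flat result set into per-user lists."""
    grouped = {uid: [] for uid in user_ids}
    for row in rows:
        grouped[row["user_id"]].append(row)
    return grouped

print(batched_posts_sql([1, 2, 3]))
# SELECT * FROM posts WHERE user_id IN (%s, %s, %s)
```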
FraiseQL does this automatically.

For massive scale (millions of users), shard by a key such as user_id:
```
shard-1.db.example.com
shard-2.db.example.com
shard-3.db.example.com
```

Configure in FraiseQL:
```python
import zlib

@fraiseql.query
def get_user(user_id: ID) -> User:
    # Route the query to the correct shard. Use a stable hash:
    # Python's built-in hash() is salted per process, so separate
    # instances would disagree about shard placement.
    shard_num = zlib.crc32(str(user_id).encode()) % 3
    return query_shard(f"shard-{shard_num}", user_id)
```

Reduce database load with intelligent caching.
Use HTTP cache headers for static data:
```graphql
query GetUser($id: ID!) {
  user(id: $id) @cache(ttl: 3600) {  # Cache for 1 hour
    id
    name
    email
  }
}
```

Configure HTTP caching:
```
# In load balancer or reverse proxy
cache-control: public, max-age=3600
etag: "user-123-v1"
```

Cache expensive query results:
```python
import json
import redis
from functools import wraps

cache = redis.Redis(host='cache.example.com', port=6379)

def cached(ttl: int = 3600):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Create cache key
            key = f"{func.__name__}:{args}:{kwargs}"

            # Try cache
            cached_result = cache.get(key)
            if cached_result:
                return json.loads(cached_result)

            # Execute query
            result = func(*args, **kwargs)

            # Cache result
            cache.setex(key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator
```
```python
@fraiseql.query
@cached(ttl=3600)
def get_user(id: ID) -> User:
    # Expensive query executed only once per hour
    return ...
```

Common invalidation patterns:

```python
# TTL: cache expires after X seconds
@cached(ttl=3600)  # 1 hour
def get_user(id: ID) -> User:
    pass

# Explicit invalidation on write
@fraiseql.mutation
def update_user(id: ID, name: str) -> User:
    result = update_db(id, name)

    # Invalidate cache
    cache.delete(f"get_user:{id}")

    # Notify subscribers
    pubsub.publish(f"user:{id}:updated", result)

    return result

# Tag-based: cache depends on a tag
@cached(tags=["user:123"])
def get_user(id: ID) -> User:
    pass

# Invalidate all queries tagged "user:123"
cache.delete_by_tag("user:123")
```

Cache hit rate: hits / (hits + misses)
Target: > 80% for high-traffic endpoints
Example: 8000 hits, 200 misses ≈ 97.6% hit rate
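A quick check of the hit-rate arithmetic:

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Fraction of cache lookups served from cache."""
    return hits / (hits + misses)

print(f"{cache_hit_rate(8000, 200):.2%}")  # 97.56%
```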
Cache size: total data in cache
Target: < 80% of available memory

Serve global traffic with multiple regions.
Traffic is routed to the nearest region based on user location. Each region has its own database and cache, with data replication strategies:
AWS (Route 53):

```bash
# Route 53 weighted routing
# 50% traffic to us-east-1
# 50% traffic to eu-west-1

aws route53 change-resource-record-sets \
  --hosted-zone-id Z123 \
  --change-batch '{...}'
```

Kubernetes (KubeFed):

```bash
# Kubefed for multi-cluster orchestration
kubefedctl join cluster-eu --host-cluster-context=host
kubefedctl join cluster-asia --host-cluster-context=host

# Replicate service across clusters
kubectl apply -f - <<EOF
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: fraiseql
spec:
  template: ...
  placement:
    clusterNames:
      - cluster-eu
      - cluster-asia
EOF
```

GCP (Cloud Load Balancing):

```bash
# Cloud Load Balancing for global routing
gcloud compute backend-services create fraiseql-global \
  --global \
  --health-checks=health-check \
  --load-balancing-scheme=EXTERNAL

gcloud compute backend-services add-backend fraiseql-global \
  --instance-group=us-central1-ig \
  --instance-group-zone=us-central1-a \
  --global
```

Use Apache Bench, wrk, or k6 to load test:

```
# Start at low load and increase
# Load = 100 RPS, 500 RPS, 1000 RPS, ...
# Measure: response time, error rate, resource usage

# Scaling = good when:
# - Response time stays constant as load increases
# - Error rate stays < 0.1%
# - CPU/memory scale linearly with load
```

Load test with k6:
```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export let options = {
  vus: 100,        // 100 virtual users
  duration: '5m',  // 5 minute test
};

export default function () {
  // GraphQL expects a JSON body, so stringify and set the content type
  let res = http.post(
    'http://api.example.com/graphql',
    JSON.stringify({ query: 'query { users(limit: 50) { id name } }' }),
    { headers: { 'Content-Type': 'application/json' } }
  );

  check(res, {
    'is status 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });

  sleep(1);
}
```

Run:
```bash
k6 run load-test.js

# Output:
# ✓ is status 200
# ✓ response time < 500ms
# Average response time: 145ms
# 99th percentile: 280ms
```

Given:
- Current traffic: 5000 RPS
- Current response time: 100ms (acceptable)
- Target growth: 2x per year
- Max acceptable response time: 500ms
Calculate:
- Breaking point (where response time > 500ms): ~20,000 RPS
- Time until breaking point: ~2 years (at 2x growth per year)
- Required capacity: 30,000 RPS (1.5x breaking point)
- Instances needed: 30,000 RPS ÷ 1000 RPS/instance = 30 instances
- Cost: 30 instances × $100/month = $3000/month
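These numbers can be reproduced in a few lines; the ~20,000 RPS breaking point and 1,000 RPS/instance figures come from the measurements and assumptions above:

```python
import math

breaking_point_rps = 20_000                   # response time exceeds 500ms here
required_rps = int(breaking_point_rps * 1.5)  # 1.5x headroom
instances = math.ceil(required_rps / 1000)    # 1000 RPS per instance
monthly_cost = instances * 100                # $100 per instance per month

print(required_rps, instances, monthly_cost)  # 30000 30 3000
```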
Plan:
- Month 1-3: 10 instances ($1000/month)
- Month 4-6: 20 instances ($2000/month)
- Month 7-9: 30 instances ($3000/month)
- Add a monitoring alert at 80% capacity

Reserved Instances (1-3 year commitment):
Pay on-demand: 100 instances × $100/month = $10,000/month
Pay reserved (1yr): 100 instances × $50/month = $5,000/month
Annual savings: $60,000 (50% reduction)

Spot/Preemptible Instances (for fault-tolerant workloads):
On-demand: $100/month/instance
Spot (AWS): $30/month/instance (can be interrupted)
Preemptible (GCP): $25/month/instance (24 hour max)
Use mix: 70% spot + 30% on-demand
Average: (0.7 × $30) + (0.3 × $100) = $51/instance
Savings: 49% reduction

Read Replicas for analytics:
Without replicas:
- Primary: 10,000 RPS (expensive)
- Load: 7000 app reads + 3000 analytics reads

With replicas:
- Primary: 7,000 RPS (cheaper)
- Analytics replica: 3,000 RPS (cheaper)
- Total cost: 30-40% reduction

Storage tier optimization:
Hot data (last 30 days): SSD storage ($0.10/GB/month)
Warm data (30-90 days): HDD storage ($0.05/GB/month)
Cold data (>90 days): archive storage ($0.01/GB/month)

Cost reduction: 50-90% for rarely accessed data

Key metrics to track:
Availability
├── Uptime: Target 99.99% (4.3 min downtime/month)
├── Error rate: Target < 0.1%
└── Latency: p50 < 100ms, p99 < 500ms
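The downtime budget follows directly from the uptime target (assuming a 30-day month):

```python
def downtime_minutes_per_month(uptime: float, days: int = 30) -> float:
    """Minutes of allowed downtime per month for a given uptime target."""
    return (1 - uptime) * days * 24 * 60

print(round(downtime_minutes_per_month(0.9999), 1))  # 4.3
```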
Scaling
├── Auto-scale time: < 60 seconds to add a new instance
├── Scale-up efficiency: response time improves with added capacity
└── Scale-down safety: doesn't over-scale and waste money

Resource efficiency
├── CPU utilization: 60-70% (not too high, not too idle)
├── Memory utilization: 70-80%
├── Database connections: < 80% of pool size
└── Cache hit rate: > 80%

Cost
├── Cost per request: should decrease as you scale
├── Cost per RPS: should stabilize or decrease
└── ROI: revenue growth > cost growth

Set up alerts:
- Alert: Scale-out failure
  Condition: Desired capacity > actual capacity for 5 minutes
  Action: Page on-call engineer

- Alert: Auto-scaling thrashing
  Condition: Scale up then down more than 3× in 1 hour
  Action: Review auto-scale policies (cooldown might be too short)

- Alert: Cache degradation
  Condition: Cache hit rate < 70%
  Action: Increase cache size or adjust TTL

- Alert: Database overload
  Condition: Connection pool > 90% utilized
  Action: Add read replicas or optimize slow queries

Load Testing
Run load tests against your deployment to find breaking points before production. Performance Guide
AWS Auto-Scaling
Configure ECS service auto-scaling and Application Load Balancer on AWS. AWS Guide
Monitoring
Set up Prometheus metrics and alerting rules to monitor scaling behavior. Deployment Overview
Troubleshooting
Diagnose connection pool exhaustion and other scaling-related issues. Troubleshooting Guide