Scale FraiseQL to handle millions of requests with horizontal scaling, smart caching, and database optimization.
FraiseQL scales in four phases: horizontal scaling of stateless instances, database scaling, caching, and multi-region deployment.
Simplest production setup: 3+ stateless FraiseQL instances behind a load balancer.
All major platforms support load balancing to multiple FraiseQL instances. Each instance is stateless and connects to the same shared database.
All three transports — GraphQL, REST, and gRPC — scale identically because they are served by the same binary on port 8080. Adding a replica adds capacity for all transports simultaneously. The only infrastructure difference is that gRPC requires HTTP/2 between the client and the load balancer; see the Kubernetes, AWS, GCP, and Azure deployment guides for load-balancer-specific configuration.
Health check configuration (all platforms):
Endpoint: /health
Interval: 30 seconds
Timeout: 5 seconds
Unhealthy threshold: 3 failures
Healthy threshold: 2 successes

Assume:
- 1 instance = 1000 RPS capacity
- 3 instances = 3000 RPS capacity

Traffic growth:
- Month 1: 1000 RPS (1 instance)
- Month 2: 2000 RPS (2 instances)
- Month 3: 5000 RPS (5 instances)
- Month 6: 15000 RPS (15 instances)
- Month 12: 100000 RPS (100 instances)

Automatically scale instances based on demand.
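The growth table above maps RPS to instance counts by simple division. A quick planning sketch (the 1000 RPS-per-instance figure is this guide's working assumption; measure your own capacity with load tests):

```python
import math

RPS_PER_INSTANCE = 1000  # working assumption from this guide; verify via load testing

def instances_needed(expected_rps: int, headroom: float = 1.0) -> int:
    """Instances required to serve expected_rps, with an optional headroom factor."""
    return max(1, math.ceil(expected_rps * headroom / RPS_PER_INSTANCE))

for month, rps in [(1, 1000), (2, 2000), (3, 5000), (6, 15000), (12, 100000)]:
    print(f"Month {month}: {rps} RPS -> {instances_needed(rps)} instance(s)")
```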
CPU Utilization (simplest, most common)
- Scale up when: Average CPU > 70%
- Scale down when: Average CPU < 30%
- Cooldown: 5 min up, 15 min down

Memory Utilization (for memory-intensive queries)
- Scale up when: Average Memory > 80%
- Scale down when: Average Memory < 50%

Request Count (most accurate for API)
- Scale up when: Requests/sec > 5000
- Scale down when: Requests/sec < 2000

Custom Metrics (database queue depth, cache hit rate)
- Scale up when: Database pool > 80% utilized
- Scale down when: Database pool < 40% utilized

AWS Auto Scaling group:

    MinSize: 3
    MaxSize: 100
    DesiredCapacity: 3
    TargetTrackingScalingPolicies:
      - TargetValue: 0.70  # Target 70% CPU
        PredefinedMetric: ASGAverageCPUUtilization
        ScaleOutCooldown: 60s
        ScaleInCooldown: 300s

Kubernetes HorizontalPodAutoscaler:

    minReplicas: 3
    maxReplicas: 100
    targetCPUUtilizationPercentage: 70
    targetMemoryUtilizationPercentage: 80
    behavior:
      scaleUp:
        stabilizationWindowSeconds: 0
        policies:
          - type: Percent
            value: 100  # Double replicas
            periodSeconds: 30
      scaleDown:
        stabilizationWindowSeconds: 300
        policies:
          - type: Percent
            value: 50  # Half replicas
            periodSeconds: 60

Azure autoscale rules:

    # Scale out: +1 instance when CPU > 70%
    # Scale in: -1 instance when CPU < 30%
    # Min: 3 instances, Max: 100 instances

GCP Cloud Run:

    # Automatic scaling based on request concurrency
    # Max concurrent requests per instance: 80 (default)
    # Scale up: +50 instances if queue > 0
    # Scale down: -1 instance per minute
    # Max instances: 1000 (configurable)

Monitor scaling activity:

    aws autoscaling describe-scaling-activities \
      --auto-scaling-group-name fraiseql-asg

    kubectl get hpa fraiseql --watch

    az monitor autoscale history list \
      --resource-group mygroup \
      --resource fraiseql

As traffic grows, the database becomes the bottleneck.
Limit connections to prevent database overload:
Without pooling, 1000 instances at 20 connections each would require 20,000 database connections — well beyond what most databases support (typically capped at ~5,000).
With PgBouncer, 50 instances with a minimum pool of 5 results in only 250 connections to the database.
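The connection arithmetic above, made explicit (a simplified sketch; real PgBouncer sizing also depends on pool mode and per-database limits):

```python
def direct_connections(instances: int, pool_max: int) -> int:
    """Without a pooler: every instance may open up to pool_max connections."""
    return instances * pool_max

def pooled_connections(instances: int, pool_min: int) -> int:
    """With PgBouncer multiplexing: roughly pool_min server connections per instance."""
    return instances * pool_min

print(direct_connections(1000, 20))  # 20000, far beyond a typical ~5000 cap
print(pooled_connections(50, 5))     # 250 connections reach the database
```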
Configure FraiseQL’s own connection pool in fraiseql.toml:
[database]
url = "postgresql://user:pass@host:5432/dbname"
pool_min = 5   # Minimum connections per instance
pool_max = 20  # Maximum connections per instance

Distribute read traffic across replicas. Write queries go to the primary; read queries are spread across replicas.
Setup (AWS RDS):
# Create 3 read replicas
for i in {1..3}; do
  aws rds create-db-instance-read-replica \
    --db-instance-identifier fraiseql-read-$i \
    --source-db-instance-identifier fraiseql-prod
done
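One way to split traffic is a small router that sends writes to the primary and reads to a replica. This is a hypothetical helper for illustration, not a FraiseQL API; the URLs would typically come from the DATABASE_URL_PRIMARY and DATABASE_URL_REPLICA variables described in this section:

```python
WRITE_KEYWORDS = {"insert", "update", "delete", "merge", "create", "alter", "drop"}

def pick_database_url(sql: str, primary_url: str, replica_url: str) -> str:
    """Send read-only statements to a replica, everything else to the primary.

    Hypothetical routing helper; FraiseQL does not expose this hook.
    """
    words = sql.lstrip().lower().split(None, 1)
    first = words[0] if words else ""
    return primary_url if first in WRITE_KEYWORDS else replica_url
```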
# Configure FraiseQL to read from replicas
DATABASE_URL_PRIMARY=postgresql://user:pass@fraiseql-prod:5432/db
DATABASE_URL_REPLICA=postgresql://user:pass@fraiseql-read-1:5432/db

Identify slow queries:
-- PostgreSQL: Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = 500;  -- Log queries > 500ms
SELECT pg_reload_conf();

-- View slow queries (requires the pg_stat_statements extension)
SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

Add indexes:
-- Find missing indexes
EXPLAIN ANALYZE
SELECT * FROM tb_user WHERE identifier = 'user@example.com';

-- If the plan shows a sequential scan, add an index
CREATE INDEX idx_tb_user_identifier ON tb_user(identifier);

-- Partial composite index for common filters
CREATE INDEX idx_tb_post_published ON tb_post(fk_user, is_published)
WHERE is_published = true;

Optimize N+1 queries:
Use query profiling to identify N+1 problems:
-- If you see many queries per request, something may be wrong
-- Use pg_stat_statements to identify repeated patterns

-- Before (N+1):
SELECT id, data FROM v_user LIMIT 10;          -- 1 query
SELECT id, data FROM v_post WHERE fk_user = ?; -- repeated 10×
-- Total: 11 queries

-- After (batched):
SELECT id, data FROM v_user LIMIT 10;                     -- 1 query
SELECT id, data FROM v_post WHERE fk_user IN (?, ?, ...); -- 1 query
-- Total: 2 queries

FraiseQL's Rust engine operates against your PostgreSQL views, so ensure your views and indexes are designed to support set-based lookups.
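The batched pattern generalizes: collect the parent keys first, then issue a single IN query. A sketch of building the parameterized statement (placeholder style assumes a psycopg-like %s driver; names are illustrative):

```python
def batched_posts_query(user_ids: list) -> tuple:
    """Build one IN-list query instead of len(user_ids) separate lookups."""
    if not user_ids:
        # WHERE false keeps the statement valid when there are no keys to fetch
        return ("SELECT id, data FROM v_post WHERE false", [])
    placeholders = ", ".join(["%s"] * len(user_ids))
    sql = f"SELECT id, data FROM v_post WHERE fk_user IN ({placeholders})"
    return (sql, list(user_ids))
```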
For massive scale (millions of users), FraiseQL does not currently provide first-class sharding support. The recommended approach is to deploy separate FraiseQL instances each pointing to an independent database shard:
- shard-1.db.example.com → FraiseQL instance A
- shard-2.db.example.com → FraiseQL instance B
- shard-3.db.example.com → FraiseQL instance C

Route requests at the load balancer or API gateway layer based on the shard key.
Each shard has its own fraiseql.toml pointing to its own [database] URL.
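Gateway-side routing can be a stable hash of the shard key. A sketch (shard names follow the example above; md5 is used because Python's built-in hash() is randomized per process):

```python
import hashlib

SHARDS = [
    "shard-1.db.example.com",  # served by FraiseQL instance A
    "shard-2.db.example.com",  # served by FraiseQL instance B
    "shard-3.db.example.com",  # served by FraiseQL instance C
]

def shard_for(shard_key: str) -> str:
    """Stable hashing: the same key always routes to the same shard."""
    digest = hashlib.md5(shard_key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

Note that naive modulo hashing reshuffles most keys when a shard is added; consistent hashing avoids that if you expect the shard count to grow.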
Reduce database load with intelligent caching.
Use HTTP cache headers in your reverse proxy or load balancer for static data:
# In nginx or Caddy
Cache-Control: public, max-age=3600
ETag: "user-123-v1"

FraiseQL's caching is configured via fraiseql.toml. FraiseQL is a Rust binary, and Python is only used at compile time to define the schema, so there is no Python runtime in which to write cache logic. Enable the Redis backend in your TOML:
[caching]
enabled = true
backend = "redis"
redis_url = "redis://cache.example.com:6379"

Cache TTL is specified at the query level in your Python schema file (compile-time only):
@fraiseql.query
def get_user(id: ID) -> User | None:
    return fraiseql.config(sql_source="v_user", cache_ttl_seconds=3600)

Cache invalidation is handled through the observers system in fraiseql.toml. When a mutation runs, FraiseQL publishes events to the configured observer backend, which triggers cache invalidation for related queries:
[observers]
backend = "nats"
nats_url = "nats://nats-server:4222"

Cache hit rate: hits / (hits + misses)
Target: > 80% for high-traffic endpoints
Example: 8000 hits, 200 misses ≈ 97.6% hit rate
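The hit-rate formula above as a quick check:

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Fraction of cache lookups served without hitting the database."""
    total = hits + misses
    return hits / total if total else 0.0

print(f"{cache_hit_rate(8000, 200):.1%}")  # 97.6%, well above the 80% target
```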
Cache size: total data held in the cache
Target: < 80% of available memory

Serve global traffic with multiple regions.
Traffic is routed to the nearest region based on user location. Each region has its own database and cache, with data replication strategies:
# Route 53 weighted routing
# 50% traffic to us-east-1
# 50% traffic to eu-west-1
aws route53 change-resource-record-sets \
  --hosted-zone-id Z123 \
  --change-batch '{...}'

# Kubefed for multi-cluster orchestration
kubefedctl join cluster-eu --host-cluster-context=host
kubefedctl join cluster-asia --host-cluster-context=host
# Replicate service across clusters
kubectl apply -f - <<EOF
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: fraiseql
spec:
  template: ...
  placement:
    clusterNames:
      - cluster-eu
      - cluster-asia
EOF

# Cloud Load Balancing for global routing
gcloud compute backend-services create fraiseql-global \
  --global \
  --health-checks=health-check \
  --load-balancing-scheme=EXTERNAL

gcloud compute backend-services add-backend fraiseql-global \
  --instance-group=us-central1-ig \
  --instance-group-zone=us-central1-a \
  --global

# Use Apache Bench, wrk, or k6 to load test
# Start at low load and increase
# Load = 100 RPS, 500 RPS, 1000 RPS, ...
# Measure: response time, error rate, resource usage

# Scaling is healthy when:
# - Response time stays constant as load increases
# - Error rate stays < 0.1%
# - CPU/memory scale linearly with load

Load test with k6:
import http from 'k6/http';
import { check, sleep } from 'k6';

export let options = {
  vus: 100,       // 100 virtual users
  duration: '5m', // 5 minute test
};

export default function () {
  // GraphQL expects a JSON body, so stringify it and set the content type
  let res = http.post(
    'http://api.example.com/graphql',
    JSON.stringify({ query: 'query { users(limit: 50) { id name } }' }),
    { headers: { 'Content-Type': 'application/json' } }
  );

  check(res, {
    'is status 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });

  sleep(1);
}

Run:
k6 run load-test.js

# Output:
# ✓ is status 200
# ✓ response time < 500ms
# Average response time: 145ms
# 99th percentile: 280ms

Given:
- Current traffic: 5000 RPS
- Current response time: 100ms (acceptable)
- Target growth: 2x per year
- Max acceptable response time: 500ms
Calculate:
- Breaking point (where response time > 500ms): ~20,000 RPS
- Time until breaking point: ~2 years (at 2x annual growth, 5,000 RPS takes two doublings to reach 20,000 RPS)
- Required capacity: 30,000 RPS (1.5x breaking point)
- Instances needed: 30,000 RPS ÷ 1,000 RPS/instance = 30 instances
- Cost: 30 instances × $100/month = $3,000/month
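The timeline follows from the growth rate: at 2x per year, 5,000 RPS needs two doublings to reach the ~20,000 RPS breaking point. As a sketch:

```python
import math

def years_until(current_rps: float, breaking_rps: float, annual_growth: float = 2.0) -> float:
    """Years until traffic reaches breaking_rps under multiplicative annual growth."""
    return math.log(breaking_rps / current_rps, annual_growth)

print(years_until(5000, 20000))  # two doublings -> 2.0 years
print(math.ceil(30000 / 1000))   # 30 instances for the 1.5x-headroom capacity
```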
Plan:
- Months 1-3: 10 instances ($1,000/month)
- Months 4-6: 20 instances ($2,000/month)
- Months 7-9: 30 instances ($3,000/month)
- Add a monitoring alert at 80% capacity

Reserved Instances (1-3 year commitment):
On-demand: 100 instances × $100/month = $10,000/month
Reserved (1 yr): 100 instances × $50/month = $5,000/month
Annual savings: $60,000 (50% reduction)

Spot/Preemptible Instances (for fault-tolerant workloads):
On-demand: $100/month/instance
Spot (AWS): $30/month/instance (can be interrupted)
Preemptible (GCP): $25/month/instance (24 hour max)

Use a mix: 70% spot + 30% on-demand
Average: (0.7 × $30) + (0.3 × $100) = $51/instance
Savings: 49% reduction

Read Replicas for analytics:
Without replicas:
- Primary: 10,000 RPS (expensive)
- Load: 7,000 app reads + 3,000 analytics reads

With replicas:
- Primary: 7,000 RPS (cheaper)
- Analytics replica: 3,000 RPS (cheaper)
- Total cost: 30-40% reduction

Storage tier optimization:
Hot data (last 30 days): SSD storage ($0.10/GB/month)
Warm data (30-90 days): HDD storage ($0.05/GB/month)
Cold data (>90 days): Archive storage ($0.01/GB/month)

Cost reduction: 50-90% for rarely accessed data

Key metrics to track:
Availability
├── Uptime: Target 99.99% (4.3 min downtime/month)
├── Error rate: Target < 0.1%
└── Latency: p50 < 100ms, p99 < 500ms

Scaling
├── Auto-scale time: < 60 seconds to add a new instance
├── Scale-up efficiency: Response time improves with more capacity
└── Scale-down safety: Doesn't over-scale and waste money

Resource efficiency
├── CPU utilization: 60-70% (not too high, not too idle)
├── Memory utilization: 70-80%
├── Database connections: < 80% of pool size
└── Cache hit rate: > 80%

Cost
├── Cost per request: Should decrease as you scale
├── Cost per RPS: Should stabilize or decrease
└── ROI: Revenue growth > Cost growth

Set up alerts:
- Alert: Scale-out failure
  Condition: Desired capacity > actual capacity for 5 minutes
  Action: Page the on-call engineer

- Alert: Auto-scaling thrashing
  Condition: Scale up then down more than 3× in 1 hour
  Action: Review auto-scale policies (cooldown might be too short)

- Alert: Cache degradation
  Condition: Cache hit rate < 70%
  Action: Increase cache size or adjust TTL

- Alert: Database overload
  Condition: Connection pool > 90% utilized
  Action: Add read replicas or optimize slow queries

Load Testing
Run load tests against your deployment to find breaking points before production. Performance Guide
AWS Auto-Scaling
Configure ECS service auto-scaling and Application Load Balancer on AWS. AWS Guide
Monitoring
Set up Prometheus metrics and alerting rules to monitor scaling behavior. Deployment Overview
Troubleshooting
Diagnose connection pool exhaustion and other scaling-related issues. Troubleshooting Guide