Deployment — Production monitoring setup
FraiseQL provides comprehensive observability through metrics, distributed tracing, and structured logging.
FraiseQL exposes a /metrics endpoint in Prometheus format. The metric names and label schema are:
```bash
# No TOML config needed — metrics endpoint is always available at /metrics
curl http://localhost:8080/metrics
```

| Metric | Type | Description |
|---|---|---|
| fraiseql_http_requests_total | Counter | Total HTTP requests |
| fraiseql_http_responses_2xx | Counter | Total 2xx HTTP responses |
| fraiseql_http_responses_4xx | Counter | Total 4xx HTTP responses |
| fraiseql_http_responses_5xx | Counter | Total 5xx HTTP responses |
| Metric | Type | Description |
|---|---|---|
| fraiseql_graphql_queries_total | Counter | Total GraphQL queries executed |
| fraiseql_graphql_queries_success | Counter | Total successful GraphQL queries |
| fraiseql_graphql_queries_error | Counter | Total failed GraphQL queries |
| fraiseql_graphql_query_duration_ms | Gauge | Average query execution time in milliseconds |
| fraiseql_validation_errors_total | Counter | Total validation errors |
| fraiseql_parse_errors_total | Counter | Total parse errors |
| fraiseql_execution_errors_total | Counter | Total execution errors |
| Metric | Type | Description |
|---|---|---|
| fraiseql_database_queries_total | Counter | Total database queries executed |
| fraiseql_database_query_duration_ms | Gauge | Average database query time in milliseconds |
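Scraping /metrics yields standard Prometheus text exposition. As a quick sketch of what a consumer sees, here is a minimal parser over a sample in that format (the sample values are illustrative; production scrapers should use a real Prometheus client library):

```python
# Minimal sketch: parse Prometheus text exposition into {metric_name: value}.
# Sample input mirrors the /metrics output shown on this page.
sample = """\
# HELP fraiseql_http_requests_total Total HTTP requests
# TYPE fraiseql_http_requests_total counter
fraiseql_http_requests_total 42
# HELP fraiseql_cache_hit_ratio Cache hit ratio (0-1)
# TYPE fraiseql_cache_hit_ratio gauge
fraiseql_cache_hit_ratio 0.92
"""

def parse_metrics(text: str) -> dict[str, float]:
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comment lines and blanks
        # Last space-separated token is the value; everything before it
        # is the metric name (plus any label set).
        name, _, value = line.rpartition(" ")
        values[name] = float(value)
    return values

metrics = parse_metrics(sample)
```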
FraiseQL monitors connection pool pressure and emits scaling recommendations. The pool does not resize itself at runtime (the underlying library has no resize API) — use these metrics to decide when to raise pool_max in fraiseql.toml and restart.
| Metric | Type | Description |
|---|---|---|
| fraiseql_pool_tuning_size | Gauge | Current configured pool size |
| fraiseql_pool_tuning_queue_depth | Gauge | Pending connection requests in the pool queue |
| fraiseql_pool_tuning_adjustments_total | Counter | Scaling recommendations emitted (not actual resizes) |
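When fraiseql_pool_tuning_queue_depth stays above zero, the fix is to raise pool_max and restart. A sketch of the relevant fraiseql.toml fragment — pool_max under [database] follows the references elsewhere on this page; verify the exact key name against your configuration reference:

```toml
[database]
# Hypothetical value — size to your workload and database limits
pool_max = 50
```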
| Metric | Type | Description |
|---|---|---|
| fraiseql_cache_hits | Counter | Total cache hits |
| fraiseql_cache_misses | Counter | Total cache misses |
| fraiseql_cache_hit_ratio | Gauge | Cache hit ratio (0–1) |
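The hit-ratio gauge is equivalent to hits / (hits + misses). A small sketch of that relationship, useful if you ever recompute the ratio over a window from the raw counters instead of reading the gauge:

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Cache hit ratio in [0, 1]; defined as 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0

# Illustrative counter values (e.g. deltas of fraiseql_cache_hits/_misses)
ratio = hit_ratio(hits=920, misses=80)
```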
| Metric | Type | Description |
|---|---|---|
| fraiseql_apq_hits_total | Counter | Automatic Persisted Query cache hits |
| fraiseql_apq_misses_total | Counter | Automatic Persisted Query cache misses |
| fraiseql_apq_stored_total | Counter | Automatic Persisted Queries stored |
| fraiseql_apq_redis_errors_total | Counter | Redis errors in APQ store (fail-open; only present with redis-apq feature) |
| fraiseql_ws_connections_total | Counter | WebSocket connection attempts (labeled by result) |
| fraiseql_ws_subscriptions_total | Counter | WebSocket subscription attempts (labeled by result) |
| fraiseql_trusted_documents_hits_total | Counter | Trusted document cache hits |
| fraiseql_trusted_documents_misses_total | Counter | Trusted document cache misses |
| fraiseql_trusted_documents_rejected_total | Counter | Rejected untrusted documents |
| fraiseql_pkce_redis_errors_total | Counter | Redis errors in PKCE state store (fail-open; only present with redis-pkce feature) |
| fraiseql_rate_limit_redis_errors_total | Counter | Redis errors in rate limiter (fail-open; only present with redis-rate-limiting feature) |
| fraiseql_multi_root_queries_total | Counter | Multi-root GraphQL queries executed in parallel |
| fraiseql_observer_dlq_overflow_total | Counter | Observer DLQ entries dropped due to max_dlq_size cap |
| fraiseql_schema_reloads_total | Counter | Successful schema hot-reloads (via admin API or SIGUSR1) |
| fraiseql_schema_reload_errors_total | Counter | Failed schema reload attempts |
Transport-level metrics are emitted alongside per-query metrics:
- fraiseql_rest_requests_total — REST request count
- fraiseql_grpc_requests_total — gRPC request count
- fraiseql_query_* metrics include a transport label (graphql, rest, grpc, websocket)

Common labels across metrics:
| Label | Description |
|---|---|
| operation | GraphQL operation name |
| type | query, mutation, subscription |
| status | success, error |
| error_code | Error code if failed |
| transport | graphql, rest, grpc, websocket |
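These labels compose in PromQL selectors. For example, a sketch charting the error rate for REST-transported mutations only (verify the exact label set your deployment emits):

```promql
rate(fraiseql_graphql_queries_error{transport="rest", type="mutation"}[5m])
```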
FraiseQL ships a pre-built 12-panel Grafana 10+ dashboard covering latency percentiles, connection pool health, cache stats, and error rates. Fetch it at runtime:
```bash
curl http://localhost:8080/api/v1/admin/grafana-dashboard > fraiseql-dashboard.json
```

Import the JSON into Grafana (Dashboards → Import). The dashboard wires directly to your Prometheus datasource with no manual panel configuration.
Example PromQL queries for custom panels:
```promql
# Request rate
rate(fraiseql_http_requests_total[5m])

# Average query duration (milliseconds gauge)
fraiseql_graphql_query_duration_ms

# Error rate
rate(fraiseql_graphql_queries_error[5m]) / rate(fraiseql_graphql_queries_total[5m])

# Cache hit ratio (use the pre-computed gauge)
fraiseql_cache_hit_ratio

# 5xx server error rate
rate(fraiseql_http_responses_5xx[5m])

# Connection pool queue depth (alert when sustained > 0 → increase pool_max)
fraiseql_pool_tuning_queue_depth
```

FraiseQL exposes admin endpoints under /api/v1/admin/ for operational tooling. These require an authenticated request (admin role or equivalent policy configured via [security]).
POST /api/v1/admin/explain — runs EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) for a named query against the connected database and returns the full plan alongside the generated SQL.
Request body:
```json
{
  "query_name": "posts",
  "parameters": { "limit": 10, "status": "published" }
}
```

Response:

```json
{
  "query_name": "posts",
  "sql_source": "v_post",
  "generated_sql": "SELECT data FROM v_post WHERE ...",
  "parameters": ["published", 10],
  "explain_output": [ ... ]
}
```

Use this endpoint to understand query plans, verify index usage, and diagnose slow queries without needing direct database access.
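A minimal sketch of working with that payload shape in Python: building the request body and pulling the plan node out of a response. The explain_output contents below are illustrative (real output is PostgreSQL's EXPLAIN JSON), and in practice the response comes from an authenticated HTTP POST rather than a literal:

```python
import json

# Request body for POST /api/v1/admin/explain (matches the example above)
request_body = json.dumps(
    {"query_name": "posts", "parameters": {"limit": 10, "status": "published"}}
)

# Abridged sample response; explain_output contents are illustrative only.
response = json.loads("""
{
  "query_name": "posts",
  "sql_source": "v_post",
  "generated_sql": "SELECT data FROM v_post WHERE ...",
  "parameters": ["published", 10],
  "explain_output": [{"Plan": {"Node Type": "Seq Scan"}}]
}
""")

# EXPLAIN (FORMAT JSON) wraps the plan tree in a one-element array.
plan = response["explain_output"][0]["Plan"]
```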
GET /api/v1/admin/grafana-dashboard — returns the pre-built Grafana dashboard JSON described above.
OpenTelemetry tracing is compiled into the server by default. When no endpoint is configured, there is zero overhead — no gRPC connection attempt occurs.
Configure via [tracing] in fraiseql.toml or standard OpenTelemetry environment variables (env vars act as fallbacks when TOML fields are omitted):
```toml
[tracing]
service_name = "fraiseql-api"
otlp_endpoint = "http://otel-collector:4317"
otlp_export_timeout_secs = 10
```

```bash
OTEL_SERVICE_NAME=fraiseql-api
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc  # or http/protobuf
OTEL_TRACES_SAMPLER=traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1       # sample 10% of requests
```

FraiseQL creates spans for each request with attributes including:
| Attribute | Description |
|---|---|
| graphql.operation.name | Operation name |
| graphql.operation.type | query/mutation/subscription |
| graphql.document | Query document (if enabled) |
| db.system | Database type |
| db.statement | SQL query (if enabled) |
| db.operation | SELECT/INSERT/UPDATE/DELETE |
| user.id | Authenticated user ID |
| tenant.id | Tenant ID |
Spans include a transport attribute (graphql, rest, grpc, websocket) to distinguish traffic sources in your tracing backend.
FraiseQL propagates trace context via headers:
```
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: fraiseql=user:123
```

To export to an OpenTelemetry Collector:

```bash
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_SERVICE_NAME=fraiseql-api
```

Jaeger supports OTLP natively (v1.35+). Point the OTLP exporter at Jaeger’s OTLP receiver:

```bash
OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317
OTEL_SERVICE_NAME=fraiseql-api
```

For Zipkin, use the Zipkin exporter:

```bash
OTEL_TRACES_EXPORTER=zipkin
OTEL_EXPORTER_ZIPKIN_ENDPOINT=http://zipkin:9411/api/v2/spans
OTEL_SERVICE_NAME=fraiseql-api
```

Logging is configured via environment variables:
```bash
# Log level
RUST_LOG=info                 # error | warn | info | debug | trace
RUST_LOG=fraiseql=debug,info  # per-crate level (fraiseql at debug, everything else at info)

# Log format (JSON output for production log aggregators)
FRAISEQL_LOG_FORMAT=json      # json | pretty (default: pretty in dev, json in prod)
```

Example JSON log line:

```json
{
  "timestamp": "2024-01-15T10:30:00.123Z",
  "level": "INFO",
  "target": "fraiseql_server::graphql",
  "message": "Query executed",
  "span": { "request_id": "abc-123", "user_id": "user-456" },
  "fields": { "operation": "getUser", "duration_ms": 45, "cache_hit": true }
}
```

Use the standard RUST_LOG directive syntax for per-crate levels:

```bash
RUST_LOG=fraiseql_server=info,fraiseql_core::cache=debug,fraiseql_core::db=warn,tower_http=debug,sqlx=warn
```

FraiseQL exposes health endpoints automatically — no configuration required:
Basic health:
```bash
curl http://localhost:8080/health
# {"status": "ok"}
```

Detailed health:

```bash
curl http://localhost:8080/health/detailed
```

```json
{
  "status": "ok",
  "checks": {
    "database": { "status": "ok", "latency_ms": 2 },
    "cache": { "status": "ok", "size": 1500, "max_size": 10000 },
    "schema": { "status": "ok", "version": "1.0.0", "loaded_at": "2024-01-15T10:00:00Z" }
  },
  "version": "2.0.0",
  "uptime_seconds": 3600
}
```

Kubernetes probe configuration:

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health/detailed
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
```

Example Prometheus alerting rules:
```yaml
groups:
  - name: fraiseql
    rules:
      - alert: HighErrorRate
        expr: |
          rate(fraiseql_graphql_queries_error[5m])
            / rate(fraiseql_graphql_queries_total[5m]) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High GraphQL error rate"

      - alert: HighQueryDuration
        expr: |
          fraiseql_graphql_query_duration_ms > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Average query duration above 1 second"

      - alert: LowCacheHitRate
        expr: |
          fraiseql_cache_hit_ratio < 0.5
        for: 15m
        labels:
          severity: info
        annotations:
          summary: "Cache hit rate below 50%"

      - alert: High5xxRate
        expr: |
          rate(fraiseql_http_responses_5xx[5m]) > 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Elevated 5xx server error rate"
```

To control trace sampling volume:

```bash
OTEL_TRACES_SAMPLER=traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1  # sample 10% of requests
```

For error-focused sampling, configure your OTel collector or use a tail-based sampler (e.g., Grafana Tempo, OpenTelemetry Collector’s tailsampling processor) to keep 100% of error traces while downsampling success traces.
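One possible shape for that pattern, sketched against the opentelemetry-collector-contrib tail_sampling processor (keep every error trace, 10% of the rest — verify field names against the Collector version you run):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-error-traces
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-success-traces
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```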
Log rotation is handled by your container runtime or log aggregator (Fluent Bit, Logstash, Loki, etc.) — not by FraiseQL. Write logs to stdout and let your infrastructure handle rotation and retention.
FraiseQL does not include user_id or request_id as metric labels by default — these high-cardinality values are kept in structured logs and traces instead, where cardinality is not a concern.
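Because request_id and user_id live in log spans rather than metric labels, correlating a specific request means reading the JSON log stream. A sketch of extracting those fields from one log line (the line matches the structured-log example earlier on this page):

```python
import json

# One structured log line as emitted with FRAISEQL_LOG_FORMAT=json
line = (
    '{"timestamp": "2024-01-15T10:30:00.123Z", "level": "INFO", '
    '"target": "fraiseql_server::graphql", "message": "Query executed", '
    '"span": {"request_id": "abc-123", "user_id": "user-456"}, '
    '"fields": {"operation": "getUser", "duration_ms": 45, "cache_hit": true}}'
)

entry = json.loads(line)
request_id = entry["span"]["request_id"]      # high-cardinality: logs, not metrics
duration_ms = entry["fields"]["duration_ms"]  # per-request timing
```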
Before testing, verify:

- /metrics endpoint is reachable from your Prometheus instance
- FraiseQL started (fraiseql run) with no errors
- OTEL_EXPORTER_OTLP_ENDPOINT is set and reachable
- OTEL_TRACES_SAMPLER_ARG — a low value (e.g., 0.001) may drop traces in low-traffic tests
- RUST_LOG level per component (e.g., RUST_LOG=warn,fraiseql=info)

Start FraiseQL — metrics are always available, no config needed:
```bash
fraiseql run
```

Test metrics endpoint:

```bash
curl http://localhost:8080/metrics
```

Expected output (partial):

```
# HELP fraiseql_http_requests_total Total HTTP requests
# TYPE fraiseql_http_requests_total counter
fraiseql_http_requests_total 42

# HELP fraiseql_graphql_queries_total Total GraphQL queries executed
# TYPE fraiseql_graphql_queries_total counter
fraiseql_graphql_queries_total 38

# HELP fraiseql_cache_hit_ratio Cache hit ratio (0-1)
# TYPE fraiseql_cache_hit_ratio gauge
fraiseql_cache_hit_ratio 0.92
```

Execute some queries to generate metrics:

```bash
# Run a few queries
for i in {1..10}; do
  curl -s -X POST http://localhost:8080/graphql \
    -H "Content-Type: application/json" \
    -d '{"query": "{ __typename }"}' > /dev/null
done
```

Verify metrics updated:

```bash
curl http://localhost:8080/metrics | grep fraiseql_graphql_queries_total
```

Expected output:

```
fraiseql_graphql_queries_total 48
```

Test health endpoints:
```bash
# Basic health
curl http://localhost:8080/health
```

Expected output:

```json
{"status": "ok"}
```

```bash
# Detailed health
curl http://localhost:8080/health/detailed
```

Expected output:

```json
{
  "status": "ok",
  "checks": {
    "database": { "status": "ok", "latency_ms": 2 },
    "schema": { "status": "ok", "version": "1.0.0" }
  },
  "version": "2.0.0",
  "uptime_seconds": 3600
}
```

Verify structured logging:
```bash
# Make a request and check logs
curl -s -X POST http://localhost:8080/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ me { id } }"}'
```

Check FraiseQL stdout for JSON log lines:

```json
{
  "timestamp": "2024-01-15T10:30:00.123Z",
  "level": "INFO",
  "target": "fraiseql_server::graphql",
  "message": "Query executed",
  "span": { "request_id": "abc-123", "user_id": "user-456" },
  "fields": { "operation": "me", "duration_ms": 12, "cache_hit": false }
}
```

Test OpenTelemetry tracing (if enabled):
```bash
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
OTEL_SERVICE_NAME=fraiseql-api \
fraiseql run
```

After making requests, check your tracing backend (Jaeger, Zipkin, etc.) for spans.
If /metrics returns empty or connection refused:
Check FraiseQL started successfully:
```bash
fraiseql run 2>&1 | head -20
```

Verify the metrics endpoint is on port 8080 (same port as GraphQL):

```bash
curl http://localhost:8080/metrics
```

If database connection metrics are absent:
Verify database connectivity:
```bash
curl http://localhost:8080/health/detailed | jq .checks.database
```

Check [database] pool configuration in fraiseql.toml.
If traces don’t show up in your backend:
Verify OTLP endpoint is reachable from the FraiseQL process:
```bash
curl http://otel-collector:4317
```

Set OTEL_TRACES_SAMPLER_ARG=1.0 temporarily to sample 100% for testing.
Ensure trace context propagation:
```bash
curl -H "traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01" \
  http://localhost:8080/graphql
```
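The traceparent value above follows the W3C Trace Context format (version-traceid-spanid-flags); a quick sketch to decompose it when debugging propagation:

```python
# W3C traceparent: <version>-<trace-id>-<parent-span-id>-<trace-flags>
header = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"

version, trace_id, parent_span_id, flags = header.split("-")
# Hex-string lengths fixed by the spec: 32 for trace-id, 16 for span-id
assert len(trace_id) == 32 and len(parent_span_id) == 16
sampled = int(flags, 16) & 0x01 == 1  # bit 0 of trace-flags = sampled
```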