
Metrics reference

All services expose Prometheus metrics on a dedicated port configured via metrics.listen.
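To confirm that Prometheus is actually scraping each service's metrics port, you can use the built-in `up` series. Prometheus records it for every scrape target: 1 when the last scrape succeeded, 0 when it failed. The query below is a minimal sketch. It assumes each service is configured as an ordinary scrape target, and any `job` or `instance` filters you add are specific to your own Prometheus configuration.

```promql
# Scrape targets whose most recent scrape failed
up == 0
```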

Edge Gateway metrics

Request metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| eg_requests_total | counter | host, status, source | Total requests processed |
| eg_request_duration_seconds | histogram | host, source | Request processing time |
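Prometheus exposes a histogram such as `eg_request_duration_seconds` as `_bucket`, `_sum`, and `_count` series, so you can derive latency percentiles per host. The sketch below shows a per-host p95 and a per-host share of 5xx responses. Each expression is meant as a separate panel query. The second query assumes that the `status` label holds numeric HTTP status codes such as `503`. This page does not say what format the label uses.

```promql
# p95 request latency per host over the last 5 minutes
histogram_quantile(0.95, sum by (le, host) (rate(eg_request_duration_seconds_bucket[5m])))

# Fraction of requests per host answered with a 5xx status
# (assumes status holds HTTP codes like "503")
sum by (host) (rate(eg_requests_total{status=~"5.."}[5m]))
  / sum by (host) (rate(eg_requests_total[5m]))
```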

Cache metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| eg_cache_hits_total | counter | host, dimension | Cache hit count |
| eg_cache_misses_total | counter | host, dimension | Cache miss count |
| eg_cache_stale_total | counter | host, dimension | Stale cache served count |
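The hit and miss counters share the `host` and `dimension` labels, so you can break the hit ratio down along both. In this sketch, the numerator and each term of the denominator are aggregated with the same `by` clause so that the label sets line up when divided. A second query tracks how often stale entries are served.

```promql
# Cache hit ratio per host and dimension
sum by (host, dimension) (rate(eg_cache_hits_total[5m]))
  / (
      sum by (host, dimension) (rate(eg_cache_hits_total[5m]))
    + sum by (host, dimension) (rate(eg_cache_misses_total[5m]))
  )

# Stale responses served per second, per host
sum by (host) (rate(eg_cache_stale_total[5m]))
```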

Render metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| eg_render_requests_total | counter | host, status | Render requests sent |
| eg_render_duration_seconds | histogram | host | Render request duration |
| eg_render_errors_total | counter | host, error_type | Render errors |
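Two further views of these render metrics are sketched below. The first breaks errors down by type. The second computes the mean render duration from the histogram's `_sum` and `_count` series, which is cheaper than a percentile but hides tail latency.

```promql
# Render errors per second, by host and error type
sum by (host, error_type) (rate(eg_render_errors_total[5m]))

# Mean render request duration per host, in seconds
sum by (host) (rate(eg_render_duration_seconds_sum[5m]))
  / sum by (host) (rate(eg_render_duration_seconds_count[5m]))
```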

Bypass metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| eg_bypass_requests_total | counter | host, status | Bypass requests |
| eg_bypass_cache_hits_total | counter | host | Bypass cache hits |
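The sketch below shows what share of bypass traffic the bypass cache absorbs, per host. It assumes that every bypass cache hit is also counted in `eg_bypass_requests_total`. This page does not state that relationship, so check it before relying on the ratio.

```promql
# Share of bypass requests served from the bypass cache, per host
# (assumes bypass cache hits are a subset of eg_bypass_requests_total)
sum by (host) (rate(eg_bypass_cache_hits_total[5m]))
  / sum by (host) (rate(eg_bypass_requests_total[5m]))
```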

Render Service metrics

Chrome pool metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| rs_chrome_pool_size | gauge | | Configured pool size |
| rs_chrome_pool_active | gauge | | Currently active instances |
| rs_chrome_pool_available | gauge | | Available instances |
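Because the pool metrics are gauges, a single sample only shows the state at scrape time. Range functions over a window give a better picture of headroom. The sketch below reports the smallest number of idle instances and the peak number of busy ones over the last 15 minutes. Spikes shorter than the scrape interval can still go unseen.

```promql
# Fewest available Chrome instances observed in the last 15 minutes
min_over_time(rs_chrome_pool_available[15m])

# Most active Chrome instances observed in the last 15 minutes
max_over_time(rs_chrome_pool_active[15m])
```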

Render metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| rs_render_total | counter | status | Total renders |
| rs_render_duration_seconds | histogram | | Render duration |
| rs_render_errors_total | counter | error_type | Render errors |
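On the Render Service side, you can express errors as a fraction of all renders and break them down by type. The ratio in this sketch assumes that failed renders are also counted in `rs_render_total`. Both sides are summed to a single series because the two counters carry different labels (`status` and `error_type`).

```promql
# Fraction of renders that failed (assumes errors are included in rs_render_total)
sum(rate(rs_render_errors_total[5m])) / sum(rate(rs_render_total[5m]))

# Render errors per second, by error type
sum by (error_type) (rate(rs_render_errors_total[5m]))
```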

Instance lifecycle

| Metric | Type | Labels | Description |
|---|---|---|---|
| rs_chrome_restarts_total | counter | reason | Chrome instance restarts |
| rs_chrome_requests_per_instance | histogram | | Requests before restart |
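The lifecycle metrics help explain why instances are recycled and how long they last. This sketch counts restarts per reason over an hour and estimates the median number of requests an instance served before restarting. `increase()` extrapolates across the window, so the restart counts it returns may not be whole numbers.

```promql
# Chrome restarts in the last hour, by reason
sum by (reason) (increase(rs_chrome_restarts_total[1h]))

# Median requests served per instance before restart, over the last hour
histogram_quantile(0.5, sum by (le) (rate(rs_chrome_requests_per_instance_bucket[1h])))
```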

Cache Daemon metrics

Queue metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| cd_queue_size | gauge | host, priority | Queue size by priority |
| cd_queue_due_now | gauge | host, priority | Entries ready to process |
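Both queue gauges carry the same `host` and `priority` labels, so you can divide them directly to see how much of each queue is already overdue. This sketch assumes that due entries are a subset of the queue size. An empty queue yields `NaN` for that series.

```promql
# Entries due now, summed across hosts, per priority
sum by (priority) (cd_queue_due_now)

# Fraction of each host's queue that is already due, per priority
cd_queue_due_now / cd_queue_size
```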

Processing metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| cd_recache_total | counter | host, status | Recache operations |
| cd_recache_duration_seconds | histogram | host | Recache duration |
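For the Cache Daemon, the sketch below shows recache throughput split by outcome and a per-host p95 recache duration. It follows the same pattern as the Edge Gateway latency queries, aggregating histogram buckets by `le` before calling `histogram_quantile`.

```promql
# Recache operations per second, by host and status
sum by (host, status) (rate(cd_recache_total[5m]))

# p95 recache duration per host
histogram_quantile(0.95, sum by (le, host) (rate(cd_recache_duration_seconds_bucket[5m])))
```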

Grafana dashboard queries

Request rate

```promql
rate(eg_requests_total[5m])
```

Cache hit ratio

```promql
sum(rate(eg_cache_hits_total[5m])) / (sum(rate(eg_cache_hits_total[5m])) + sum(rate(eg_cache_misses_total[5m])))
```

Chrome pool utilization

```promql
rs_chrome_pool_active / rs_chrome_pool_size
```

P99 render latency

```promql
histogram_quantile(0.99, sum by (le) (rate(rs_render_duration_seconds_bucket[5m])))
```

Alert examples

High error rate

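```yaml
alert: HighRenderErrorRate
# Aggregate both sides by host: the counters carry different labels (error_type vs. status)
expr: sum by (host) (rate(eg_render_errors_total[5m])) / sum by (host) (rate(eg_render_requests_total[5m])) > 0.05
for: 5m
```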

Chrome pool exhaustion

```yaml
alert: ChromePoolExhausted
expr: rs_chrome_pool_available == 0
for: 1m
```

Low cache hit rate

```yaml
alert: LowCacheHitRate
expr: sum(rate(eg_cache_hits_total[5m])) / (sum(rate(eg_cache_hits_total[5m])) + sum(rate(eg_cache_misses_total[5m]))) < 0.5
for: 10m
```