Prometheus Metrics
Litestream exposes metrics in Prometheus format for monitoring replication health and performance. Metrics are disabled by default and require enabling the HTTP server.
Enabling Metrics
Add an addr field to your configuration file to enable the metrics endpoint:
addr: ":9090"
Metrics will be available at http://localhost:9090/metrics.
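To check the endpoint from a script, the response can be parsed as Prometheus text exposition format. The following is a minimal sketch, not part of Litestream: the helper names `parse_metrics` and `fetch_metrics` are hypothetical, the default URL assumes the `addr: ":9090"` setting shown above, and the parser handles only the simple `name{labels} value` lines shown on this page.

```python
import re
import urllib.request

# Matches sample lines such as: name{label="value",...} 123.4 (labels optional).
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# Matches one label pair; escaped characters in values are kept as-is, not unescaped.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse exposition text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, raw_value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


def fetch_metrics(url="http://localhost:9090/metrics"):
    """Fetch and parse the metrics endpoint (the URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running instance returns one tuple per sample, which can then be filtered by metric name or by the `db` label.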
Database Metrics
These metrics track the state and operations of each replicated database. All
database metrics include a db label containing the absolute path to the
database file.
litestream_db_size
Type: Gauge
The current size of the database file in bytes.
litestream_db_size{db="/var/lib/myapp.db"} 4194304
litestream_wal_size
Type: Gauge
The current size of the WAL (Write-Ahead Log) file in bytes.
litestream_wal_size{db="/var/lib/myapp.db"} 32768
litestream_total_wal_bytes
Type: Counter
Total number of bytes written to the shadow WAL since Litestream started. This metric is cumulative and only increases.
litestream_total_wal_bytes{db="/var/lib/myapp.db"} 1048576
litestream_txid
Type: Gauge
The current transaction ID (TXID) of the database. This value increases with each SQLite transaction.
litestream_txid{db="/var/lib/myapp.db"} 42
litestream_sync_count
Type: Counter
Number of sync operations performed. A sync operation copies WAL frames to the shadow WAL for replication.
litestream_sync_count{db="/var/lib/myapp.db"} 150
litestream_sync_error_count
Type: Counter
Number of sync errors that have occurred. Monitor this metric for replication health: any increase indicates sync failures.
litestream_sync_error_count{db="/var/lib/myapp.db"} 0
litestream_sync_seconds
Type: Counter
Cumulative time spent syncing the shadow WAL, in seconds. Divide by
litestream_sync_count to get the average sync duration.
litestream_sync_seconds{db="/var/lib/myapp.db"} 0.523
litestream_checkpoint_count
Type: Counter
Labels: db, mode
Number of checkpoint operations performed. The mode label indicates the
checkpoint type: passive or truncate.
litestream_checkpoint_count{db="/var/lib/myapp.db",mode="passive"} 60
litestream_checkpoint_count{db="/var/lib/myapp.db",mode="truncate"} 1
litestream_checkpoint_error_count
Type: Counter
Labels: db, mode
Number of checkpoint errors that have occurred, grouped by checkpoint mode.
litestream_checkpoint_error_count{db="/var/lib/myapp.db",mode="passive"} 0
litestream_checkpoint_error_count{db="/var/lib/myapp.db",mode="truncate"} 0
litestream_checkpoint_seconds
Type: Counter
Labels: db, mode
Cumulative time spent checkpointing, in seconds, grouped by checkpoint mode.
litestream_checkpoint_seconds{db="/var/lib/myapp.db",mode="passive"} 1.234
Replica Metrics
These metrics track operations performed against replicas (S3, GCS, Azure, etc.).
litestream_replica_operation_total
Type: Counter
Labels: replica_type, operation
The number of replica operations performed. The replica_type label indicates
the storage backend (e.g., s3, gcs, abs, file). The operation label
indicates the operation type.
litestream_replica_operation_total{replica_type="s3",operation="put"} 100
litestream_replica_operation_total{replica_type="s3",operation="get"} 25
litestream_replica_operation_total{replica_type="s3",operation="delete"} 10
litestream_replica_operation_bytes
Type: Counter
Labels: replica_type, operation
The number of bytes transferred by replica operations.
litestream_replica_operation_bytes{replica_type="s3",operation="put"} 52428800
litestream_replica_operation_bytes{replica_type="s3",operation="get"} 4194304
Example Prometheus Queries
Replication Lag
Calculate the average sync duration over the last 5 minutes, a rough proxy for replication lag:
rate(litestream_sync_seconds[5m]) / rate(litestream_sync_count[5m])
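Because both rates cover the same window, the ratio above reduces to the change in litestream_sync_seconds divided by the change in litestream_sync_count. A minimal sketch of that arithmetic from two scrapes follows; the function name and the tuple layout are assumptions made for illustration.

```python
def average_sync_seconds(prev, curr):
    """Average sync duration between two scrapes.

    prev and curr are (sync_seconds, sync_count) pairs read from
    litestream_sync_seconds and litestream_sync_count for the same db.
    Returns None when no syncs happened or a counter went backwards.
    """
    delta_seconds = curr[0] - prev[0]
    delta_count = curr[1] - prev[1]
    if delta_count <= 0 or delta_seconds < 0:
        return None  # idle window, or the process restarted between scrapes
    return delta_seconds / delta_count


# Hypothetical scrapes: 20 syncs took 0.1 s in total, about 5 ms each.
print(average_sync_seconds((0.523, 150), (0.623, 170)))
```

Returning None for an empty window avoids a division by zero, which PromQL would instead surface as NaN.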
Sync Error Rate
Calculate the per-second sync error rate over the last 5 minutes:
rate(litestream_sync_error_count[5m])
Data Transfer Rate
Calculate bytes uploaded per second across all replica types (add replica_type="s3" to the selector to restrict the result to S3):
rate(litestream_replica_operation_bytes{operation="put"}[5m])
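The counters reported here start again from zero when the process restarts, and `rate()` compensates for that. The sketch below is a simplified illustration of the idea, not Prometheus's actual algorithm (it omits the extrapolation to window boundaries that `rate()` performs). The function name and sample layout are hypothetical.

```python
def counter_rate(samples):
    """Per-second rate of a counter with reset handling.

    samples is a time-ordered list of (unix_seconds, value) pairs.
    Returns None when there are too few samples or no time elapsed.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # The value dropped, so the counter reset; count from zero.
            increase += curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```

For example, the series 100, 700, 200, 800 sampled every 60 seconds is treated as an increase of 600 + 200 + 600 bytes rather than a net gain of 700.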
Database Growth
Track database size growth:
litestream_db_size
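Plotting the gauge shows its size over time. To put a number on growth, one option is a least-squares slope over recent samples, similar in spirit to PromQL's `deriv()` function for gauges. The sketch below assumes samples have already been collected as (timestamp, bytes) pairs; the function name is hypothetical.

```python
def growth_bytes_per_second(samples):
    """Least-squares slope of database size over time.

    samples is a list of (unix_seconds, size_bytes) pairs taken from
    litestream_db_size. Returns None when a slope cannot be computed.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    # Slope = covariance(t, v) / variance(t).
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var if var > 0 else None
```

A regression slope is less sensitive to a single noisy sample than subtracting the first value from the last, at the cost of needing several samples.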
Alerting Examples
Example Prometheus alerting rules:
groups:
  - name: litestream
    rules:
      - alert: LitestreamSyncErrors
        expr: increase(litestream_sync_error_count[5m]) > 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Litestream sync errors detected"
          description: "Database {{ $labels.db }} has sync errors"
      - alert: LitestreamReplicationStopped
        expr: increase(litestream_sync_count[10m]) == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Litestream replication appears stopped"
          description: "No sync operations for database {{ $labels.db }}"
Grafana Dashboard
A basic Grafana dashboard can visualize these metrics. Key panels to include:
- Database size over time (litestream_db_size)
- Sync operations per second (rate(litestream_sync_count[1m]))
- Sync error rate (rate(litestream_sync_error_count[1m]))
- Replica bytes transferred (rate(litestream_replica_operation_bytes[1m]))
- WAL size (litestream_wal_size)