Troubleshooting
Configuration Issues
Invalid Configuration File
Error: yaml: unmarshal errors or cannot parse config
Solution: Validate your YAML syntax:
- Check indentation (use spaces, not tabs)
- Ensure proper nesting of configuration sections
- Validate string values are properly quoted when containing special characters
```yaml
# ❌ Invalid - mixed tabs and spaces
dbs:
	- path: /path/to/db.sqlite   # indented with a tab
    replica:                     # indented with spaces
      url: s3://bucket/path
```

```yaml
# ✅ Valid - consistent spacing
dbs:
  - path: /path/to/db.sqlite
    replica:
      url: s3://bucket/path
```
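You can catch stray tabs before Litestream ever parses the file. A minimal sketch; `CONFIG` is a placeholder path you should point at your own configuration file:

```sh
# Detect tab characters in a YAML config (YAML forbids tabs for indentation).
# CONFIG is an assumed path - point it at your own litestream.yml.
CONFIG=${CONFIG:-/etc/litestream.yml}

TAB=$(printf '\t')
if grep -n "$TAB" "$CONFIG"; then
    echo "ERROR: tab characters found in $CONFIG (lines above)" >&2
else
    echo "no tabs found in $CONFIG"
fi
```

This only checks for tabs; a full YAML parse (for example with `yq` or a YAML-aware linter, if available) will catch nesting errors as well.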
Managing Credentials Securely
Properly securing credentials is critical for Litestream deployments. This section covers best practices for credential management across different deployment scenarios.
Best Practices
- Never commit credentials to version control — use `.gitignore` to exclude configuration files containing sensitive data
- Prefer environment variables — Litestream supports environment variable expansion in configuration files
- Use secret management systems — For production, use Kubernetes Secrets, Docker Secrets, or HashiCorp Vault
- Minimize credential exposure — Provide only the permissions needed for your use case (principle of least privilege)
- Rotate credentials regularly — Update access keys and secrets periodically
- Audit access — Monitor credential usage through cloud provider logs
Environment Variable Expansion
Litestream automatically expands `$VAR` and `${VAR}` references in configuration files.
This is the simplest way to pass credentials without embedding them in files:
```yaml
dbs:
  - path: /var/lib/mydb.db
    replica:
      url: s3://mybucket/db
      access-key-id: ${AWS_ACCESS_KEY_ID}
      secret-access-key: ${AWS_SECRET_ACCESS_KEY}
```

```sh
# Set environment variables before running Litestream
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
litestream replicate
```
To disable environment variable expansion if it conflicts with your values:
```sh
litestream replicate -no-expand-env
```
Kubernetes Secrets
For Kubernetes deployments, mount credentials as environment variables from Secrets:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: litestream-aws-credentials
type: Opaque
stringData:
  access-key-id: AKIAIOSFODNN7EXAMPLE
  secret-access-key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
        - name: app
          image: myapp:latest
          env:
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: litestream-aws-credentials
                  key: access-key-id
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: litestream-aws-credentials
                  key: secret-access-key
          volumeMounts:
            - name: litestream-config
              mountPath: /etc/litestream
              readOnly: true
      volumes:
        - name: litestream-config
          configMap:
            name: litestream-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: litestream-config
data:
  litestream.yml: |
    dbs:
      - path: /data/myapp.db
        replica:
          url: s3://mybucket/myapp
          access-key-id: ${AWS_ACCESS_KEY_ID}
          secret-access-key: ${AWS_SECRET_ACCESS_KEY}
```
For GCS with workload identity (recommended for Kubernetes on GKE):
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: litestream-sa
  annotations:
    iam.gke.io/gcp-service-account: litestream@your-project.iam.gserviceaccount.com
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      serviceAccountName: litestream-sa
      containers:
        - name: app
          image: myapp:latest
          env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /var/run/secrets/cloud.google.com/service_account/key.json
```
Docker Secrets
For Docker Swarm deployments, use Docker Secrets:
```yaml
version: '3.8'
services:
  myapp:
    image: myapp:latest
    environment:
      AWS_ACCESS_KEY_ID_FILE: /run/secrets/aws_access_key_id
      AWS_SECRET_ACCESS_KEY_FILE: /run/secrets/aws_secret_access_key
    secrets:
      - aws_access_key_id
      - aws_secret_access_key
    configs:
      - source: litestream_config
        target: /etc/litestream.yml
configs:
  litestream_config:
    file: ./litestream.yml
secrets:
  aws_access_key_id:
    external: true
  aws_secret_access_key:
    external: true
```
Then read these in your startup script:
```sh
#!/bin/sh
export AWS_ACCESS_KEY_ID=$(cat /run/secrets/aws_access_key_id)
export AWS_SECRET_ACCESS_KEY=$(cat /run/secrets/aws_secret_access_key)
exec litestream replicate
```
Azure with Managed Identity
For Azure deployments, use managed identity instead of shared keys:
```yaml
# Pod with Azure managed identity
apiVersion: aad.banzaicloud.com/v1
kind: AzureIdentity
metadata:
  name: litestream-identity
spec:
  type: 0 # Managed Service Identity
  resourceID: /subscriptions/{subscription}/resourcegroups/{resourcegroup}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/litestream
  clientID: {client-id}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    metadata:
      labels:
        aadpodidbinding: litestream-identity
    spec:
      containers:
        - name: app
          image: myapp:latest
          volumeMounts:
            - name: litestream-config
              mountPath: /etc/litestream
              readOnly: true
      volumes:
        - name: litestream-config
          configMap:
            name: litestream-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: litestream-config
data:
  litestream.yml: |
    dbs:
      - path: /data/myapp.db
        replica:
          url: abs://account@myaccount.blob.core.windows.net/container/db
          # Managed identity authentication (no keys needed)
```
Credential Security Checklist
- ✅ Credentials stored in environment variables or secret management systems
- ✅ Configuration files never committed to version control with credentials
- ✅ Credentials have minimal required permissions
- ✅ Access is logged and auditable
- ✅ Credentials rotated on a regular schedule
- ✅ Development and production credentials are separate
- ✅ Database backup location is restricted to authorized users
- ✅ Network access to cloud storage is restricted to necessary services
Database Path Issues
Error: no such file or directory or database is locked
Solution:
- Ensure the database path exists and is accessible
- Check file permissions (Litestream needs read/write access)
- Verify the database isn’t being used by another process without a proper `busy_timeout`

```sql
-- Set busy timeout in your application
PRAGMA busy_timeout = 5000;
```
MCP Server Won’t Start
Error: bind: address already in use
Solution: Check if another process is using the MCP port:
```sh
# Check what's using port 3001
lsof -i :3001
```

```yaml
# Use a different port in configuration
mcp-addr: ":3002"
```
Replication Issues
S3 Connection Failures
Error: NoCredentialsProviders or access denied
Solution:
- Verify AWS credentials are properly configured:

  ```sh
  # Check AWS credentials
  aws configure list
  ```

- Ensure IAM permissions include required S3 actions:

  ```json
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "s3:GetObject",
          "s3:PutObject",
          "s3:DeleteObject",
          "s3:ListBucket"
        ],
        "Resource": [
          "arn:aws:s3:::your-bucket",
          "arn:aws:s3:::your-bucket/*"
        ]
      }
    ]
  }
  ```
NATS Connection Issues
Error: connection refused or authentication failed
Solution:
- Verify the NATS server is running and accessible:

  ```sh
  # Test NATS connectivity
  nats server check --server nats://localhost:4222
  ```

- Check authentication credentials:

  ```yaml
  dbs:
    - path: /path/to/db.sqlite
      replica:
        type: nats
        url: nats://localhost:4222/bucket
        # Use appropriate auth method
        username: user
        password: pass
  ```
Slow Replication
Symptoms: High lag between database changes and replica updates
Solution:
- Check sync intervals in configuration:

  ```yaml
  dbs:
    - path: /path/to/db.sqlite
      # Reduce intervals for faster sync
      monitor-interval: 1s
      checkpoint-interval: 1m
      replica:
        sync-interval: 1s
  ```

- Monitor system resources (CPU, memory, network)
- Consider using a local file replica to test performance
Database Issues
WAL Mode Problems
Error: database is not in WAL mode
Solution: Litestream requires WAL mode. Enable it in your application:
```sql
-- Enable WAL mode
PRAGMA journal_mode = WAL;
```
Or let Litestream enable it automatically by ensuring proper database permissions.
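To confirm the mode before starting replication, you can query and switch it with the `sqlite3` CLI. A small sketch; `DB` is a placeholder path:

```sh
# Check the journal mode and switch to WAL if necessary.
# DB is an assumed path - point it at your own database file.
DB=${DB:-/path/to/db.sqlite}

mode=$(sqlite3 "$DB" "PRAGMA journal_mode;")
if [ "$mode" = "wal" ]; then
    echo "$DB is already in WAL mode"
else
    echo "$DB is in '$mode' mode; switching to WAL"
    sqlite3 "$DB" "PRAGMA journal_mode=WAL;"
fi
```

Note that `PRAGMA journal_mode` is persistent for WAL, so this only needs to succeed once per database file.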
Database Locks
Error: database is locked or SQLITE_BUSY
Solution:
- Set busy timeout in your application (see above)
- Ensure no long-running transactions are blocking checkpoints
- Check for applications holding exclusive locks
WAL Growth and Checkpoint Blocking
Symptoms: WAL file growing excessively large or writes timing out
Solution:
- Check for long-lived read transactions preventing checkpoints
- Review checkpoint configuration in your config file
- Consider disabling `truncate-page-n` if you have long-running queries:

  ```yaml
  dbs:
    - path: /path/to/db.sqlite
      truncate-page-n: 0 # Disable blocking checkpoints
  ```

- Monitor WAL file size and disk space
For detailed guidance on checkpoint configuration and trade-offs, see the WAL Truncate Threshold Configuration guide.
Corruption Detection
Error: database disk image is malformed
Solution:
1. Stop Litestream replication
2. Run a SQLite integrity check:

   ```sh
   sqlite3 /path/to/db.sqlite "PRAGMA integrity_check;"
   ```

3. If corrupted, restore from the latest backup:

   ```sh
   litestream restore -o /path/to/recovered.db /path/to/db.sqlite
   ```
Performance Issues
High CPU Usage
Symptoms: Litestream consuming excessive CPU (100%+ sustained)
Common Causes:
- Unbounded WAL growth — Long-running read transactions blocking checkpoints
- State corruption — Tracking files mismatched with replica state
- Blocked checkpoints — Application holding read locks
Diagnosis:
```sh
# Check CPU usage over time
pidstat -p $(pgrep litestream) 1 5

# Check WAL file size (large WAL indicates checkpoint blocking)
ls -lh /path/to/db.sqlite-wal

# Check for blocking processes
sqlite3 /path/to/db.sqlite "PRAGMA wal_checkpoint(PASSIVE);"
# Result: status|log|checkpointed
# status=1 means checkpoint was blocked
```
Solutions:
1. Reduce monitoring frequency:

   ```yaml
   dbs:
     - path: /path/to/db.sqlite
       monitor-interval: 10s
       replica:
         # ... (other replica settings)
         sync-interval: 5m
   ```

2. Fix blocked checkpoints — kill long-running read connections in your application
3. Reset corrupted state — see Recovering from Corrupted Tracking State
Memory Issues
Symptoms: High memory usage or out-of-memory errors
Solution:
- Monitor snapshot sizes and retention policies
- Adjust retention settings:

  ```yaml
  snapshot:
    interval: 24h
    retention: 72h # Keep fewer snapshots
  ```
Network and Connectivity
Intermittent Network Failures
Error: connection reset by peer or timeout
Solution:
- Adjust the sync interval to reduce request frequency during outages:

  ```yaml
  dbs:
    - path: /path/to/db.sqlite
      replica:
        url: s3://bucket/path
        sync-interval: 10s
  ```

- Check network stability and firewall rules
- Consider using regional endpoints for cloud storage
- For production, use a configuration file to persist your settings (see Configuration Reference)
DNS Resolution Issues
Error: no such host or DNS timeouts
Solution:
- Test DNS resolution:

  ```sh
  nslookup s3.amazonaws.com
  ```

- Use IP addresses instead of hostnames if needed
- Check your `/etc/resolv.conf` configuration
Logging and Debugging
Enabling Debug Logging
Add debug logging to your configuration:
```yaml
logging:
  level: debug
  type: text
  stderr: true
```
Reading Logs
Common log locations:
- Linux systemd: `journalctl -u litestream`
- Docker: `docker logs container_name`
- Windows Service: Event Viewer → Application → Litestream
- Command line: stdout/stderr
Important Log Messages
Look for these key messages:
- `initialized db`: Database successfully loaded
- `replicating to`: Replica configuration loaded
- `sync error`: Replication issues
- `checkpoint completed`: Successful WAL checkpoint
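To see how often these messages occur, you can tally them from captured log output. A sketch that reads a saved log file; `LOG` is a placeholder, and you would typically feed it output from `journalctl -u litestream` or `docker logs`:

```sh
# Tally key Litestream log events from a captured log file.
# LOG is an assumed path - capture your own logs there first.
LOG=${LOG:-litestream.log}

grep -oE 'initialized db|replicating to|sync error|checkpoint completed' "$LOG" \
    | sort | uniq -c | sort -rn
```

A high `sync error` count relative to `checkpoint completed` usually points at connectivity or credential problems rather than database issues.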
Recovery and Restore
Point-in-Time Recovery
List available restore points:
```sh
litestream ltx /path/to/db.sqlite
```
Restore to specific time:
```sh
litestream restore -timestamp 2025-01-01T12:00:00Z -o restored.db /path/to/db.sqlite
```
Backup Validation
Verify backup integrity:
```sh
# Restore to temporary location
litestream restore -o /tmp/test.db /path/to/db.sqlite

# Run integrity check
sqlite3 /tmp/test.db "PRAGMA integrity_check;"
```
Operations That Invalidate Tracking State
Litestream maintains internal tracking state in hidden `-litestream` directories
(e.g., `.db.sqlite-litestream` for a database file named `db.sqlite`) so it can
replicate changes efficiently. Certain operations can corrupt or invalidate
this tracking, leading to high CPU usage, replication errors, or a state mismatch
between local tracking and remote replicas.
Operations to avoid
| Operation | Why It’s Problematic | Safe Alternative |
|---|---|---|
| In-place `VACUUM` | Rewrites entire database, invalidating page tracking | Use `VACUUM INTO 'new.db'` |
| Manual checkpoint while Litestream is stopped | Large WAL changes database state without tracking | Let Litestream manage checkpoints |
| Deleting the `.sqlite-litestream` directory | Creates local/remote state mismatch | Delete both local tracking AND remote replica |
| Restoring database while Litestream is running | Overwrites database without updating tracking | Stop Litestream before restore |
In-place VACUUM
The SQLite VACUUM command rewrites the entire database file. Litestream tracks
changes at the page level, so a full rewrite invalidates all tracking state.
```sql
-- Dangerous: Invalidates Litestream tracking
VACUUM;

-- Safe: Creates new file, preserves original
VACUUM INTO '/path/to/compacted.db';
```
If you must use in-place VACUUM:
1. Stop Litestream
2. Run `VACUUM`
3. Delete the `.sqlite-litestream` tracking directory
4. Delete the remote replica data (start fresh)
5. Restart Litestream
Symptoms of corrupted tracking state
- High CPU usage (100%+) even when the database is idle
- Repeated log messages with identical transaction IDs
- “timeout waiting for db initialization” warnings
- Missing LTX file errors:

  ```
  level=ERROR msg="monitor error" error="open .../ltx/0/0000000000000001.ltx: no such file or directory"
  ```

- Local/remote state mismatch:

  ```
  level=INFO msg="detected database behind replica" db_txid=0000000000000000 replica_txid=0000000000000001
  ```
Recovering from Corrupted Tracking State
When Litestream’s tracking state becomes corrupted, a complete state reset is required. This procedure removes all local tracking and remote replica data, forcing a fresh snapshot.
Recovery procedure
```sh
# 1. Stop Litestream
sudo systemctl stop litestream

# 2. Kill any processes holding database connections
#    (application-specific - check for zombie processes)
lsof /path/to/db.sqlite

# 3. Checkpoint the database to clear WAL
sqlite3 /path/to/db.sqlite "PRAGMA wal_checkpoint(TRUNCATE);"
# Verify: result should be "0|0|0" (success)

# 4. Remove local Litestream tracking
rm -rf /path/to/.db.sqlite-litestream

# 5. Remove remote replica data (start fresh)
# For S3:
aws s3 rm s3://bucket/path/db.sqlite --recursive
# For GCS:
gsutil rm -r gs://bucket/path/db.sqlite
# For Azure:
az storage blob delete-batch --source container --pattern "path/db.sqlite/*"

# 6. Restart Litestream
sudo systemctl start litestream
```
Verifying recovery
After restarting, verify Litestream has recovered:
```sh
# Check CPU usage is normal (should be near 0% when idle)
pidstat -p $(pgrep litestream) 1 5

# Check logs for successful snapshot
journalctl -u litestream -f
# Should see: "snapshot written" or similar
```
Preventing future issues
- Avoid in-place VACUUM — use `VACUUM INTO` instead
- Set busy timeout — prevent checkpoint blocking:

  ```sql
  PRAGMA busy_timeout = 5000;
  ```

- Monitor WAL size — alert if the WAL exceeds 50% of the database size
- Kill zombie connections — ensure application processes don’t hold long-lived read locks
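The WAL-size alert described above can be sketched as a small shell check. Paths are placeholders; `stat -c` is GNU coreutils, with a BSD `stat -f` fallback:

```sh
# Warn when the WAL file exceeds 50% of the main database file size.
# DB is an assumed path - point it at your own database file.
DB=${DB:-/path/to/db.sqlite}
WAL="$DB-wal"

size_of() { stat -c %s "$1" 2>/dev/null || stat -f %z "$1"; }

db_size=$(size_of "$DB")
wal_size=$(size_of "$WAL" 2>/dev/null || echo 0)

if [ "$wal_size" -gt $((db_size / 2)) ]; then
    echo "WARNING: WAL is ${wal_size} bytes, over 50% of the ${db_size}-byte database" >&2
else
    echo "WAL size OK (${wal_size} bytes)"
fi
```

Run this from cron or a monitoring agent; a persistently growing WAL is the earliest visible symptom of blocked checkpoints.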
Getting Help
Before Asking for Help
- Check the logs for error messages (use debug level)
- Test with minimal config to isolate the issue
- Verify versions: Ensure you’re using a compatible Litestream version
- Search existing issues on GitHub
Where to Get Help
- GitHub Issues: github.com/benbjohnson/litestream/issues
- GitHub Discussions: github.com/benbjohnson/litestream/discussions
- Slack Community: Join Litestream Slack
- Documentation: Review Configuration Reference
Reporting Issues
When reporting issues on GitHub, the bug report template will ask for:
- Bug Description: Clear description of the issue
- Environment: Litestream version, operating system, installation method, storage backend
- Steps to Reproduce: Numbered steps, expected vs actual behavior
- Configuration: Your `litestream.yml` file (remove sensitive data)
- Logs: Relevant log output with debug level enabled
- Additional Context: Recent changes, related issues, workarounds attempted
SQLite Driver Issues (v0.5.0+)
Litestream migrated from mattn/go-sqlite3 to modernc.org/sqlite. This section covers issues specific to this change.
PRAGMA Configuration Errors
Error: PRAGMAs not taking effect or unknown pragma errors
Solution: v0.5.0+ uses different PRAGMA syntax in connection strings:
```
# OLD (v0.3.x - mattn/go-sqlite3):
file:/path/to/db?_busy_timeout=5000

# NEW (v0.5.0+ - modernc.org/sqlite):
file:/path/to/db?_pragma=busy_timeout(5000)
```
See the SQLite Driver Migration guide for complete syntax.
Busy Timeout Not Working
Error: SQLITE_BUSY errors despite setting busy timeout
Solution: Verify you’re using the correct syntax for v0.5.0+:
```
# Correct v0.5.0+ syntax
?_pragma=busy_timeout(5000)

# Incorrect (v0.3.x syntax - won't work in v0.5.0+)
?_busy_timeout=5000
```
Build Errors with CGO
Error: CGO-related build errors when building Litestream
Solution: v0.5.0+ does not require cgo for the main binary:
```sh
# Explicitly disable cgo if you're seeing cgo errors
CGO_ENABLED=0 go build ./cmd/litestream
```
Performance Differences
Symptoms: Different performance characteristics after upgrading
Solution: While `modernc.org/sqlite` is highly optimized, keep the following in mind:
- Benchmark your specific workload if performance is critical
- The pure Go driver performs comparably for most use cases
- For VFS/experimental features, the cgo driver is still available
Common Error Reference
| Error Message | Common Cause | Solution |
|---|---|---|
database is locked |
No busy timeout set | Add PRAGMA busy_timeout = 5000; |
no such file or directory |
Incorrect database path | Verify path exists and permissions |
NoCredentialsProviders |
Missing AWS credentials | Configure AWS credentials |
connection refused |
Service not running | Check if target service is accessible |
yaml: unmarshal errors |
Invalid YAML syntax | Validate configuration file syntax |
bind: address already in use |
Port conflict | Change MCP port or stop conflicting service |
| PRAGMA not taking effect | Wrong syntax for v0.5.0+ | Use _pragma=name(value) syntax |