Prometheus Metrics
Marmot exposes Prometheus metrics for monitoring cluster health, replication performance, and query processing. All metrics use the marmot_v2 namespace and include a node_id label for multi-node visibility.
Enabling Metrics
```toml
[prometheus]
enabled = true  # Metrics served on gRPC port at /metrics endpoint
```

Accessing Metrics:

```bash
curl http://localhost:8080/metrics
```

Cluster Health Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| marmot_v2_cluster_nodes | Gauge | status | Number of nodes in cluster by status (ALIVE, SUSPECT, DEAD, JOINING, REMOVED) |
| marmot_v2_cluster_quorum_available | Gauge | - | Whether quorum is achievable (1=yes, 0=no) |
| marmot_v2_gossip_rounds_total | Counter | - | Total number of gossip rounds executed |
| marmot_v2_gossip_messages_total | Counter | direction | Total gossip messages by direction (sent, received) |
| marmot_v2_gossip_failures_total | Counter | - | Total failed gossip send attempts |
| marmot_v2_node_state_transitions_total | Counter | from, to | Node state transitions (e.g., ALIVE to SUSPECT) |
| marmot_v2_cluster_join_total | Counter | result | Cluster join attempts by result (success, failed) |
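
As an illustration of consuming the cluster health metrics above, the following sketch parses a Prometheus text-format payload and summarizes node counts and quorum state. The sample payload, its values, and the helper names `parse_metrics` and `cluster_summary` are assumptions made for this example; they are not part of Marmot. The parser handles only simple sample lines and is not a full implementation of the exposition format.

```python
import re

# Hypothetical /metrics payload; the values are made up for illustration.
SAMPLE = """\
# HELP marmot_v2_cluster_nodes Number of nodes in cluster by status
# TYPE marmot_v2_cluster_nodes gauge
marmot_v2_cluster_nodes{node_id="1",status="ALIVE"} 3
marmot_v2_cluster_nodes{node_id="1",status="SUSPECT"} 0
marmot_v2_cluster_nodes{node_id="1",status="DEAD"} 1
marmot_v2_cluster_quorum_available{node_id="1"} 1
marmot_v2_gossip_failures_total{node_id="1"} 7
"""

# name, optional {labels}, value, optional integer timestamp
LINE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse simple text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def cluster_summary(samples):
    """Return (node count per status, quorum flag) from parsed samples."""
    nodes = {}
    quorum = None
    for name, labels, value in samples:
        if name == "marmot_v2_cluster_nodes":
            nodes[labels.get("status")] = int(value)
        elif name == "marmot_v2_cluster_quorum_available":
            quorum = value == 1
    return nodes, quorum


nodes, quorum = cluster_summary(parse_metrics(SAMPLE))
print(nodes, quorum)
```

In practice the payload would come from the `/metrics` endpoint rather than a string literal, and an existing Prometheus client library parser may be a better choice than a hand-written regular expression.
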
Transaction Metrics (2PC)
| Metric | Type | Labels | Description |
|---|---|---|---|
| marmot_v2_txn_total | Counter | type, result | Total transactions by type (write, read) and result (success, failed, conflict) |
| marmot_v2_txn_duration_seconds | Histogram | type | Transaction duration in seconds |
| marmot_v2_twophase_prepare_seconds | Histogram | - | 2PC prepare phase duration in seconds |
| marmot_v2_twophase_commit_seconds | Histogram | - | 2PC commit phase duration in seconds |
| marmot_v2_twophase_quorum_acks | Histogram | phase | Number of quorum acknowledgments received per phase |
| marmot_v2_write_conflicts_total | Counter | type, path | Write conflicts by type (mvcc, intent) and detection path (fast, slow) |
| marmot_v2_intent_filter_checks_total | Counter | result | Intent filter checks by result (fast_path, slow_path_miss, slow_path_conflict) |
| marmot_v2_intent_filter_size | Gauge | - | Current number of entries in the Cuckoo filter |
| marmot_v2_intent_filter_false_positives_total | Counter | - | Intent filter false positives (slow path found no conflict) |
| marmot_v2_intent_filter_txn_count | Gauge | - | Number of transactions with active intents in filter |
| marmot_v2_replication_requests_total | Counter | phase, result | Replication requests by phase (prepare, commit, replay) and result |
| marmot_v2_active_transactions | Gauge | - | Number of currently active transactions |
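
Counters such as marmot_v2_txn_total only ever increase, so a ratio over a time window has to be derived from the increase between two scrapes rather than from the raw values. The sketch below shows one way to do that. It assumes the samples have already been collected into dictionaries keyed by `(type, result)`; the helper names and all numbers are hypothetical.

```python
def counter_delta(prev, curr):
    """Per-series increase between two scrapes of a counter."""
    delta = {}
    for key, value in curr.items():
        before = prev.get(key, 0.0)
        # A counter that went backwards was reset (for example by a process
        # restart), so the new value is itself the increase since the reset.
        delta[key] = value - before if value >= before else value
    return delta


def success_ratio(delta, txn_type):
    """Fraction of transactions of one type that succeeded, or None if idle."""
    total = sum(v for (t, _result), v in delta.items() if t == txn_type)
    if total == 0:
        return None
    return delta.get((txn_type, "success"), 0.0) / total


# Hypothetical marmot_v2_txn_total values from two consecutive scrapes.
previous = {("write", "success"): 100, ("write", "failed"): 5, ("write", "conflict"): 10}
current = {("write", "success"): 190, ("write", "failed"): 7, ("write", "conflict"): 18}

increase = counter_delta(previous, current)
print(success_ratio(increase, "write"))
```

A Prometheus server performs the equivalent calculation with `rate()` or `increase()`, so a helper like this is mainly useful for ad hoc scripts and tests.
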
Query Processing Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| marmot_v2_queries_total | Counter | type, result | Total queries by type (select, insert, update, delete, ddl) and result |
| marmot_v2_query_duration_seconds | Histogram | type | Query duration in seconds |
| marmot_v2_rows_affected | Histogram | - | Number of rows affected per write query |
| marmot_v2_rows_returned | Histogram | - | Number of rows returned per read query |
| marmot_v2_mysql_connections | Gauge | - | Number of active MySQL protocol connections |
| marmot_v2_ddl_operations_total | Counter | result | DDL operations by result (success, failed) |
| marmot_v2_ddl_lock_wait_seconds | Histogram | - | Time waiting for DDL lock in seconds |
Anti-Entropy Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| marmot_v2_antientropy_rounds_total | Counter | - | Total anti-entropy rounds executed |
| marmot_v2_antientropy_syncs_total | Counter | type, result | Anti-entropy syncs by type (delta, snapshot) and result |
| marmot_v2_antientropy_duration_seconds | Histogram | - | Anti-entropy round duration in seconds |
| marmot_v2_replication_lag_txns | Gauge | peer | Transaction lag behind peer |
| marmot_v2_delta_sync_txns_total | Counter | - | Total transactions applied via delta sync |
Histogram Buckets
Different metrics use histogram buckets optimized for their expected latency profiles:
Write Transaction Buckets (for distributed writes with network + consensus):
5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s

Read Transaction Buckets (for local SQLite reads):
0.1ms, 0.5ms, 1ms, 5ms, 10ms, 25ms, 50ms, 100ms, 250ms

2PC Phase Buckets (for prepare/commit latencies):
1ms, 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s

Sync Buckets (for anti-entropy and background sync):
100ms, 500ms, 1s, 2.5s, 5s, 10s, 30s, 60s

Prometheus Scrape Configuration
```yaml
scrape_configs:
  - job_name: 'marmot'
    static_configs:
      - targets: ['node1:8080', 'node2:8080', 'node3:8080']
    scrape_interval: 15s
```

Example Queries
Cluster health:
```promql
# Number of ALIVE nodes as seen by each reporting node
sum(marmot_v2_cluster_nodes{status="ALIVE"}) by (node_id)
# Quorum availability across cluster (0 if any node reports no quorum)
min(marmot_v2_cluster_quorum_available)
```

Transaction performance:
```promql
# Write transaction p99 latency
histogram_quantile(0.99, rate(marmot_v2_txn_duration_seconds_bucket{type="write"}[5m]))
# Transaction success rate
sum(rate(marmot_v2_txn_total{result="success"}[5m])) / sum(rate(marmot_v2_txn_total[5m]))
```

2PC performance:
```promql
# Prepare phase p95 latency
histogram_quantile(0.95, rate(marmot_v2_twophase_prepare_seconds_bucket[5m]))
# Commit phase p95 latency
histogram_quantile(0.95, rate(marmot_v2_twophase_commit_seconds_bucket[5m]))
```

Conflict detection:
```promql
# Write conflicts per minute by type (rate() returns a per-second value, so scale by 60)
sum(rate(marmot_v2_write_conflicts_total[1m])) by (type) * 60
```

Replication lag:
```promql
# Max replication lag across all peers, per node
max(marmot_v2_replication_lag_txns) by (node_id)
```
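
The `histogram_quantile` queries above estimate percentiles from cumulative bucket counts, which is why the bucket boundaries matter: the estimate can never be more precise than the bucket that contains the requested rank. The sketch below is a simplified version of that linear interpolation, applied to the write transaction bucket boundaries listed earlier. It is an approximation for illustration, not Prometheus's exact implementation, and the bucket counts are hypothetical.

```python
import math

# Write transaction bucket upper bounds in seconds, plus the implicit +Inf bucket.
WRITE_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, math.inf]


def estimate_quantile(q, upper_bounds, cumulative_counts):
    """Estimate the q-quantile from cumulative histogram bucket counts."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = cumulative_counts[-1]
    if total == 0:
        return math.nan  # no observations, nothing to estimate
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, c in enumerate(cumulative_counts) if c >= rank)
    bound = upper_bounds[index]
    if math.isinf(bound):
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return upper_bounds[-2]
    lower = upper_bounds[index - 1] if index > 0 else 0.0
    prev_count = cumulative_counts[index - 1] if index > 0 else 0
    in_bucket = cumulative_counts[index] - prev_count
    if in_bucket == 0:
        return lower
    # Assume observations are spread evenly inside the bucket.
    return lower + (bound - lower) * (rank - prev_count) / in_bucket


# Hypothetical cumulative counts, one per bucket in WRITE_BUCKETS.
counts = [0, 10, 40, 70, 90, 97, 99, 100, 100, 100, 100, 100]
print(estimate_quantile(0.99, WRITE_BUCKETS, counts))
```

With these counts the 99th-percentile rank lands in the bucket between 250ms and 500ms, so the estimate can only be placed somewhere within that range. Narrower buckets around the latencies of interest give tighter estimates.
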