Operations
Security
Marmot supports Pre-Shared Key (PSK) authentication for cluster communication. This is strongly recommended for production deployments.
[cluster]
# All nodes in the cluster must use the same secret
cluster_secret = "your-secret-key-here"
Environment Variable (Recommended):
For production, use the environment variable to avoid storing secrets in config files:
export MARMOT_CLUSTER_SECRET="your-secret-key-here"
./marmot
The environment variable takes precedence over the config file.
Generating a Secret:
# Generate a secure random secret
openssl rand -base64 32
Behavior:
- If `cluster_secret` is empty and `MARMOT_CLUSTER_SECRET` is not set, authentication is disabled
- A warning is logged at startup when authentication is disabled
- All gRPC endpoints (gossip, replication, snapshots) are protected when authentication is enabled
- Nodes with mismatched secrets will fail to communicate (connection rejected with "invalid cluster secret")
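The precedence rule above can be sketched as a small shell function. This is an illustration of the documented behavior, not Marmot's own code: `resolve_cluster_secret` is a hypothetical helper, its argument stands in for the `cluster_secret` value from the config file, and treating an empty environment variable the same as an unset one is an assumption.

```shell
# Hypothetical helper: prints the secret that would take effect.
# $1 stands in for cluster_secret from the config file (may be empty).
resolve_cluster_secret() {
    config_value="${1:-}"
    if [ -n "${MARMOT_CLUSTER_SECRET:-}" ]; then
        # Environment variable takes precedence over the config file.
        printf '%s\n' "$MARMOT_CLUSTER_SECRET"
    else
        # Falls back to the config value; empty output means
        # authentication would be disabled.
        printf '%s\n' "$config_value"
    fi
}

# Both sources set: the environment value is used.
( MARMOT_CLUSTER_SECRET="from-env"; resolve_cluster_secret "from-config" )
# prints: from-env
```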
Logging
[logging]
verbose = false # Enable verbose logging
format = "console" # Log format: console or json
Cluster Membership Management
Marmot provides an admin API for managing cluster membership. This allows operators to view cluster state, remove nodes, and control which nodes can rejoin the cluster.
Node Lifecycle
Auto-Join (Default Behavior):
- New nodes automatically join the cluster by contacting seed nodes
- Restarted nodes automatically rejoin via gossip protocol
- No manual intervention required for normal operations
Explicit Removal:
- Nodes marked as REMOVED via admin API are permanently excluded
- REMOVED nodes cannot auto-rejoin - they are rejected at the gossip layer
- Must use `/admin/cluster/allow/{node_id}` to permit rejoining
- This prevents decommissioned or compromised nodes from rejoining
Prerequisites:
- The `cluster_secret` must be configured (see Security section above)
- Admin endpoints are served on the same port as gRPC (default: 8080)
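Since every admin call carries the same header and port, a thin wrapper can cut down on repetition. The sketch below is not part of Marmot: `marmot_admin`, `MARMOT_ADDR`, and `CURL` are names introduced here for illustration, and it assumes the `X-Marmot-Secret` header carries the same value as the cluster secret.

```shell
# Hypothetical wrapper around curl for the admin endpoints.
#   $1 = HTTP method, $2 = path (for example /admin/cluster/members)
# MARMOT_ADDR (default localhost:8080) and CURL (default curl) are
# illustrative variables, not Marmot settings.
marmot_admin() {
    method="$1"
    path="$2"
    # Refuse to send a request without a secret.
    if [ -z "${MARMOT_CLUSTER_SECRET:-}" ]; then
        echo "MARMOT_CLUSTER_SECRET is not set" >&2
        return 1
    fi
    "${CURL:-curl}" -sS -X "$method" \
        -H "X-Marmot-Secret: ${MARMOT_CLUSTER_SECRET}" \
        "http://${MARMOT_ADDR:-localhost:8080}${path}"
}

# Usage (requires a running node):
#   marmot_admin GET  /admin/cluster/members
#   marmot_admin POST /admin/cluster/remove/2
```

Reading the secret from the environment keeps it out of shell history, in line with the recommendation in the Security section.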
View Cluster Members
curl -H "X-Marmot-Secret: your-secret" http://localhost:8080/admin/cluster/members
Response:
{
  "members": [
    {"NodeID": 1, "Address": "node1:8080", "Status": "ALIVE", "Incarnation": 5},
    {"NodeID": 2, "Address": "node2:8080", "Status": "ALIVE", "Incarnation": 3},
    {"NodeID": 3, "Address": "node3:8080", "Status": "SUSPECT", "Incarnation": 2}
  ],
  "total_membership": 3,
  "alive_count": 2,
  "quorum_size": 2,
  "local_node_id": 1
}
Node Status Values:
| Status | Description |
|---|---|
| ALIVE | Node is healthy and participating in replication |
| SUSPECT | Node missed recent gossip - may be failing |
| DEAD | Node failed health checks - excluded from replication |
| JOINING | Node is syncing data before becoming ALIVE |
| REMOVED | Node explicitly removed via admin API |
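For quick health checks it can help to count members in a given status. The following is a rough sketch, not a Marmot tool: `count_status` is a hypothetical helper, and its `grep` pattern assumes the `"Status": "..."` field layout shown in the example response above. A JSON-aware tool would be more robust for anything beyond ad hoc use.

```shell
# Hypothetical helper: reads a members response on stdin and prints how
# many entries have the status given as $1 (for example ALIVE).
count_status() {
    # grep -o emits one line per match; wc -l counts them.
    grep -o "\"Status\": *\"$1\"" | wc -l | tr -d ' '
}

# Example with an abbreviated response body:
response='{"members": [{"NodeID": 1, "Status": "ALIVE"}, {"NodeID": 3, "Status": "SUSPECT"}]}'
printf '%s\n' "$response" | count_status SUSPECT
```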
Remove a Node
Permanently remove a node from the cluster. The node will be excluded from quorum calculations and cannot rejoin until explicitly allowed.
curl -X POST -H "X-Marmot-Secret: your-secret" \
http://localhost:8080/admin/cluster/remove/2
Response:
{
  "success": true,
  "message": "node 2 marked as REMOVED",
  "total_membership": 2,
  "alive_count": 2,
  "quorum_size": 2
}
Behavior:
- REMOVED state propagates to all nodes via gossip protocol
- REMOVED nodes are excluded from quorum calculation (affects split-brain prevention)
- REMOVED nodes cannot rejoin via normal gossip - they will be rejected
- You cannot remove the local node (prevents self-removal)
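To see how removal changes quorum, the formula this document gives under Shrinking Cluster Size, `(total_membership / 2) + 1`, can be evaluated directly. In this sketch `quorum_size` is a hypothetical shell helper named after the response field. It assumes integer division, which matches the example values shown in this section.

```shell
# Hypothetical helper mirroring the documented formula:
#   quorum = (total_membership / 2) + 1, with integer division.
quorum_size() {
    total="$1"
    echo $(( total / 2 + 1 ))
}

# REMOVED nodes no longer count toward total_membership, so each
# removal may lower the quorum.
for n in 5 4 3 2; do
    echo "total_membership=$n quorum_size=$(quorum_size "$n")"
done
```

Note that a 2-member cluster has a quorum of 2, so it cannot tolerate the loss of either node.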
Allow Node to Rejoin
Allow a previously removed node to rejoin the cluster.
curl -X POST -H "X-Marmot-Secret: your-secret" \
http://localhost:8080/admin/cluster/allow/2
Response:
{
  "success": true,
  "message": "node 2 allowed to rejoin cluster"
}
After this, the node can restart and will go through the normal join process (JOINING → ALIVE).
Use Cases
Decommissioning a Node:
# 1. Remove node from cluster
curl -X POST -H "X-Marmot-Secret: $SECRET" http://node1:8080/admin/cluster/remove/3
# 2. Stop the node
ssh node3 'systemctl stop marmot'
# 3. Verify quorum is still achievable
curl -H "X-Marmot-Secret: $SECRET" http://node1:8080/admin/cluster/members
Replacing a Failed Node:
# 1. Remove the failed node
curl -X POST -H "X-Marmot-Secret: $SECRET" http://node1:8080/admin/cluster/remove/3
# 2. If the replacement reuses the same node_id, allow it to rejoin first
#    (skip this step when using a new node_id):
curl -X POST -H "X-Marmot-Secret: $SECRET" http://node1:8080/admin/cluster/allow/3
# 3. Start the replacement node
./marmot -config node3-config.toml
Shrinking Cluster Size:
# Remove nodes to reduce cluster size
# Quorum recalculates automatically: (total_membership / 2) + 1
curl -X POST -H "X-Marmot-Secret: $SECRET" http://node1:8080/admin/cluster/remove/4
curl -X POST -H "X-Marmot-Secret: $SECRET" http://node1:8080/admin/cluster/remove/5
# 5-node cluster → 3-node cluster, quorum: 3 → 2
Prometheus Metrics
[prometheus]
enabled = true # Metrics served on gRPC port at /metrics endpoint
Accessing Metrics:
# Metrics are multiplexed with gRPC on the same port
curl http://localhost:8080/metrics
# Prometheus scrape config
scrape_configs:
  - job_name: 'marmot'
    static_configs:
      - targets: ['node1:8080', 'node2:8080', 'node3:8080']