Replication

DDL Replication

Marmot v2 supports distributed DDL (Data Definition Language) replication without requiring master election:

How It Works

  1. Cluster-Wide Locking: Each DDL operation acquires a distributed lock per database (default: 30-second lease)

    • Prevents concurrent schema changes on the same database
    • Locks automatically expire if a node crashes
    • Different databases can have concurrent DDL operations
  2. Automatic Idempotency: DDL statements are automatically rewritten for safe replay (see the sketch after this list)

    -- Original
    CREATE TABLE users (id INT)
    -- Rewritten for safe replay
    CREATE TABLE IF NOT EXISTS users (id INT)

    -- Original
    DROP TABLE users
    -- Rewritten for safe replay
    DROP TABLE IF EXISTS users
  3. Schema Version Tracking: Each database maintains a schema version counter

    • Incremented on every DDL operation
    • Exchanged via gossip protocol for drift detection
    • Used by delta sync to validate transaction applicability
  4. Quorum-Based Replication: DDL replicates like DML through the same 2PC mechanism

    • No special master node needed
    • Works with existing consistency levels (QUORUM, ALL, etc.)
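To make the rewriting in step 2 concrete, here is a minimal Go sketch of a prefix-based rewriter. The function name and the statement coverage are assumptions for illustration, not Marmot's actual code.

package main

import (
	"fmt"
	"strings"
)

// rewriteIdempotent adds IF EXISTS / IF NOT EXISTS clauses when they are
// missing, so replaying the same DDL on a node that already applied it
// becomes a no-op instead of an error. (Illustrative sketch only.)
func rewriteIdempotent(ddl string) string {
	upper := strings.ToUpper(ddl)
	switch {
	case strings.HasPrefix(upper, "CREATE TABLE ") && !strings.Contains(upper, "IF NOT EXISTS"):
		return "CREATE TABLE IF NOT EXISTS " + ddl[len("CREATE TABLE "):]
	case strings.HasPrefix(upper, "DROP TABLE ") && !strings.Contains(upper, "IF EXISTS"):
		return "DROP TABLE IF EXISTS " + ddl[len("DROP TABLE "):]
	case strings.HasPrefix(upper, "CREATE INDEX ") && !strings.Contains(upper, "IF NOT EXISTS"):
		return "CREATE INDEX IF NOT EXISTS " + ddl[len("CREATE INDEX "):]
	default:
		return ddl
	}
}

func main() {
	fmt.Println(rewriteIdempotent("CREATE TABLE users (id INT)"))
	// CREATE TABLE IF NOT EXISTS users (id INT)
	fmt.Println(rewriteIdempotent("DROP TABLE users"))
	// DROP TABLE IF EXISTS users
}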

Configuration

[ddl]
# DDL lock lease duration (seconds)
lock_lease_seconds = 30
 
# Automatically rewrite DDL for idempotency
enable_idempotent = true

Best Practices

  • Do: Execute DDL from a single connection/node at a time
  • Do: Use qualified table names (mydb.users instead of users)
  • ⚠️ Caution: ALTER TABLE cannot always be rewritten idempotently - avoid replaying failed ALTER operations
  • Don't: Run concurrent DDL on the same database from multiple nodes

CDC-Based Replication

Marmot v2 uses Change Data Capture (CDC) for replication instead of SQL statement replay:

How It Works

  1. Row-Level Capture: Instead of replicating SQL statements, Marmot captures the actual row data changes (INSERT/UPDATE/DELETE)
  2. Binary Data Format: Row data is serialized as CDC messages with column values, ensuring consistent replication regardless of SQL dialect
  3. Deterministic Application: Row data is applied directly to the target database, avoiding parsing ambiguities
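As a rough illustration of what a row-level CDC message carries, a minimal Go sketch follows. The struct fields and encoding are assumptions for illustration; Marmot's actual wire format may differ.

package main

import "fmt"

type Op int

const (
	OpInsert Op = iota
	OpUpdate
	OpDelete
)

// CDCMessage describes one row change: which table was touched, the row key,
// and the new column values (empty for DELETE). Applying it on a replica is a
// deterministic write; no SQL text has to be re-parsed.
type CDCMessage struct {
	Database string
	Table    string
	Op       Op
	RowKey   []byte            // primary key (or ROWID) encoded as bytes
	Columns  map[string][]byte // column name -> serialized value
}

func main() {
	msg := CDCMessage{
		Database: "mydb",
		Table:    "users",
		Op:       OpInsert,
		RowKey:   []byte("42"),
		Columns:  map[string][]byte{"id": []byte("42"), "name": []byte("alice")},
	}
	fmt.Printf("%+v\n", msg)
}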

Benefits

  • Consistency: Same row data applied everywhere, no SQL parsing differences
  • Performance: Binary format is more efficient than SQL text
  • Reliability: No issues with SQL syntax variations between MySQL and SQLite

Row Key Extraction

For UPDATE and DELETE operations, Marmot automatically extracts row keys:

  • Uses PRIMARY KEY columns when available
  • Falls back to ROWID for tables without explicit primary key
  • Handles composite primary keys correctly
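A minimal Go sketch of these fallback rules; the types and function names are assumptions for illustration rather than Marmot's code.

package main

import "fmt"

type TableInfo struct {
	Name   string
	PKCols []string // empty if the table has no explicit primary key
}

// rowKey returns the ordered key values for a changed row.
func rowKey(t TableInfo, row map[string]any, rowid int64) []any {
	if len(t.PKCols) == 0 {
		// No explicit primary key: fall back to SQLite's ROWID.
		return []any{rowid}
	}
	// Composite primary keys yield one value per key column, in declaration order.
	key := make([]any, 0, len(t.PKCols))
	for _, col := range t.PKCols {
		key = append(key, row[col])
	}
	return key
}

func main() {
	users := TableInfo{Name: "users", PKCols: []string{"tenant_id", "id"}}
	fmt.Println(rowKey(users, map[string]any{"tenant_id": 7, "id": 42, "name": "alice"}, 0))
	// [7 42]

	logs := TableInfo{Name: "logs"} // no declared primary key
	fmt.Println(rowKey(logs, map[string]any{"msg": "hi"}, 1001))
	// [1001]
}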

Recovery Scenarios

Marmot handles various failure and recovery scenarios automatically:

Network Partition (Split-Brain)

Scenario             Behavior
Minority partition   Writes fail - cannot achieve quorum
Majority partition   Writes succeed - quorum achieved
Partition heals      Delta sync + LWW merges divergent data

How it works:

  1. During partition, only the majority side can commit writes (quorum enforcement)
  2. When partition heals, nodes exchange transaction logs via StreamChanges RPC
  3. Conflicts resolved using Last-Writer-Wins (LWW) with HLC timestamps
  4. Higher node ID breaks ties for simultaneous writes
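A minimal Go sketch of this LWW comparison, assuming an HLC split into a wall-clock and a logical component; the struct layout is illustrative, not Marmot's actual encoding.

package main

import "fmt"

// Version identifies a write: an HLC timestamp plus the originating node.
type Version struct {
	HLCWall    int64  // HLC physical component (e.g. unix millis)
	HLCLogical uint32 // HLC logical counter
	NodeID     uint64 // breaks ties between simultaneous writes
}

// newerThan reports whether a should win over b under Last-Writer-Wins.
func (a Version) newerThan(b Version) bool {
	if a.HLCWall != b.HLCWall {
		return a.HLCWall > b.HLCWall
	}
	if a.HLCLogical != b.HLCLogical {
		return a.HLCLogical > b.HLCLogical
	}
	return a.NodeID > b.NodeID // higher node ID wins on an exact tie
}

func main() {
	local := Version{HLCWall: 1700000000000, HLCLogical: 3, NodeID: 1}
	remote := Version{HLCWall: 1700000000000, HLCLogical: 3, NodeID: 2}
	fmt.Println(remote.newerThan(local)) // true: node 2 wins the tie
}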

Node Failure & Recovery

Scenario           Recovery Method
Brief outage       Delta sync - replay missed transactions
Extended outage    Snapshot transfer + delta sync
New node joining   Full snapshot from existing node

Anti-Entropy Background Process:

Marmot v2 includes an automatic anti-entropy system that continuously monitors and repairs replication lag across the cluster:

  1. Lag Detection: Every 60 seconds (configurable), each node queries peers for their replication state
  2. Smart Recovery Decision (sketched after this list):
    • Delta Sync if lag < 10,000 transactions AND < 1 hour: Streams missed transactions incrementally
    • Snapshot Transfer if lag exceeds thresholds: Full database file transfer for efficiency
  3. Gap Detection: Detects when transaction logs have been GC'd and automatically falls back to snapshot
  4. Multi-Database Support: Tracks and syncs each database independently
  5. GC Coordination: Garbage collection respects peer replication state - logs aren't deleted until all peers have applied them
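A minimal Go sketch of the recovery decision in step 2, using the thresholds quoted above; the names are assumptions for illustration.

package main

import (
	"fmt"
	"time"
)

const (
	deltaSyncMaxTxns = 10_000    // beyond this, streaming replay is too costly
	deltaSyncMaxAge  = time.Hour // beyond this, logs may already be GC'd
)

type recovery int

const (
	deltaSync recovery = iota
	snapshotTransfer
)

// chooseRecovery picks a repair strategy for one lagging peer/database pair.
func chooseRecovery(lagTxns int64, lagAge time.Duration) recovery {
	if lagTxns < deltaSyncMaxTxns && lagAge < deltaSyncMaxAge {
		return deltaSync
	}
	return snapshotTransfer
}

func main() {
	fmt.Println(chooseRecovery(250, 5*time.Minute) == deltaSync)            // true
	fmt.Println(chooseRecovery(50_000, 10*time.Minute) == snapshotTransfer) // true
}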

Delta Sync Process:

  1. Lagging node queries last_applied_txn_id for each peer/database
  2. Requests transactions since that ID via StreamChanges RPC
  3. Gap Detection: Checks if first received txn_id has a large gap from requested ID
    • If gap > delta_sync_threshold_txns, indicates missing (GC'd) transactions
    • Automatically falls back to snapshot transfer to prevent data loss
  4. Applies changes using LWW conflict resolution
  5. Updates replication state tracking (per-database)
  6. Progress logged every 100 transactions
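A minimal Go sketch of the gap check in step 3; the function and parameter names are assumptions for illustration.

package main

import "fmt"

// hasGap reports whether delta sync must abort in favour of a snapshot.
// requestedFrom is the txn ID we asked the peer to stream from;
// firstReceived is the oldest txn ID the peer actually still has.
func hasGap(requestedFrom, firstReceived, deltaSyncThresholdTxns int64) bool {
	return firstReceived-requestedFrom > deltaSyncThresholdTxns
}

func main() {
	// Peer still has everything we need: continue with delta sync.
	fmt.Println(hasGap(1_000, 1_001, 10_000)) // false

	// Peer's log starts 50k transactions later: logs were GC'd, fall back.
	fmt.Println(hasGap(1_000, 51_000, 10_000)) // true
}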

GC Coordination with Anti-Entropy:

  • Transaction logs are retained with a two-tier policy:
    • Min retention (2 hours): Must be >= delta sync threshold, respects peer lag
    • Max retention (24 hours): Force delete after this time to prevent unbounded growth
  • Config validation enforces: gc_min >= delta_threshold and gc_max >= 2x delta_threshold
  • Each database tracks replication progress per peer
  • GC queries minimum applied txn_id across all peers before cleanup
  • Gap detection prevents data loss if GC runs while nodes are offline
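
A minimal Go sketch of the two-tier retention rule and the config validation described above; the struct and field names are assumptions for illustration, not Marmot's config schema.

package main

import (
	"fmt"
	"time"
)

type GCConfig struct {
	MinRetention   time.Duration // keep logs at least this long (default 2h)
	MaxRetention   time.Duration // force-delete after this long (default 24h)
	DeltaThreshold time.Duration // how far back delta sync may reach (e.g. 1h)
}

// Validate enforces gc_min >= delta threshold and gc_max >= 2x delta threshold.
func (c GCConfig) Validate() error {
	if c.MinRetention < c.DeltaThreshold {
		return fmt.Errorf("gc min retention %v must be >= delta sync threshold %v",
			c.MinRetention, c.DeltaThreshold)
	}
	if c.MaxRetention < 2*c.DeltaThreshold {
		return fmt.Errorf("gc max retention %v must be >= 2x delta sync threshold %v",
			c.MaxRetention, c.DeltaThreshold)
	}
	return nil
}

// canDelete applies the two-tier policy to one log entry: never delete before
// MinRetention or while a peer still needs it, always delete after MaxRetention.
func (c GCConfig) canDelete(age time.Duration, appliedByAllPeers bool) bool {
	if age >= c.MaxRetention {
		return true // hard cap: prevent unbounded log growth
	}
	return age >= c.MinRetention && appliedByAllPeers
}

func main() {
	cfg := GCConfig{MinRetention: 2 * time.Hour, MaxRetention: 24 * time.Hour, DeltaThreshold: time.Hour}
	fmt.Println(cfg.Validate())                     // <nil>
	fmt.Println(cfg.canDelete(3*time.Hour, false))  // false: a peer still lags
	fmt.Println(cfg.canDelete(25*time.Hour, false)) // true: past the hard cap
}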