What & Why?
Marmot is a distributed SQLite replicator with leaderless, and eventual consistency. It allows you to build a robust replication between your nodes by building on top of fault-tolerant NATS JetStream.
So if you are running a read heavy website based on SQLite, you should be easily able to scale it out by adding more SQLite replicated nodes. SQLite is probably the most ubiquitous DB that exists almost everywhere, Marmot aims to make it even more ubiquitous for server side applications by building a replication layer on top.
Why?
SQLite is a probably the most ubiquitous DB that exists almost everywhere, this project aims to make it even more ubiquitous for server side applications by building a master-less replication layer on top. This means if you are running a read heavy website based on SQLite you should be easily able to scale it out by adding more nodes of your app with SQLite replicated nodes.
Quick Start
Download latest Marmot and extract package using:
tar vxzf marmot-v*.tar.gz
From extracted directory run examples/run-cluster.sh
. Make a change in /tmp/marmot-1.db
using:
bash > sqlite3 /tmp/marmot-1.db
sqlite3 > INSERT INTO Books (title, author, publication_year) VALUES ('Pride and Prejudice', 'Jane Austen', 1813);
Now observe changes getting propagated to other database /tmp/marmot-2.db
:
bash > sqlite3 /tmp/marmot-2.db
sqlite3 > SELECT * FROM Books;
You should be able to make changes interchangeably and see the changes getting propagated.
For more complicated demos, checkout following (older versions):
What is the difference from others?
Marmot is essentially a CDC (Change Data Capture) and replication pipeline running top of NATS. It can automatically configure appropriate JetStreams making sure those streams evenly distribute load over those shards, so scaling simply boils down to adding more nodes, and re-balancing those JetStreams (auto re-balancing not implemented yet).
There are a few solutions like rqlite, dqlite, and LiteFS etc. All of them either are layers on top of SQLite (e.g. rqlite, dqlite) that requires them to sit in the middle with network layer in order to provide replication; or intercept physical page level writes to stream them off to replicas. In both cases they require a single primary node where all the writes have to go, and then these changes are applied to multiple readonly replicas.
Marmot on the other hand is born different. It's born to act as a side-car to your existing processes:
- Instead of requiring single primary, there is no primary! Which means any node can make changes to its local DB. Marmot will use triggers to capture your changes (hence atomic records), and then stream them off to NATS.
- Instead of being strongly consistent, it's eventually consistent. Which means no locking, or blocking of nodes.
- It does not require any changes to your application logic for reading/writing.
Making these choices has multiple benefits:
- You can read and write to your SQLite database like you normally do. No extension, or VFS changes.
- You can write on any node! You don't have to go to single primary for writing your data.
- As long as you start with same copy of database, all the mutations will eventually converge (hence eventually consistent).
FAQ
What happens when there is a race condition?
In Marmot every row is uniquely mapped to a JetStream. This guarantees that for any node to publish changes for a row it has to go through same JetStream as everyone else. If two nodes perform a change to same row in parallel, both of the nodes will compete to publish their change to JetStream cluster. Due to RAFT quorum constraint only one of the writer will be able to get its changes published first. Now as these changes are applied (even the publisher applies its own changes to database) the last writer will always win. This means there is NO serializability guarantee of a transaction spanning multiple tables. This is a design choice, in order to avoid any sort of global locking, and performance.
Won't capturing changes with triggers use more disk space?
Yes it will require additional storage to old/new values from triggers. But right now that is the only way sqlite can and should allow one to capture changes. However, in a typical setting these captured changes will be picked up pretty quickly, and cleaned up as soon as they have been pushed to NATS. Disk space is usually the cheapest part of modern cloud, so it should not be a huge problem.
How do I cleanup my database?
Ask marmot to remove hooks and log tables by:
marmot -config /path/to/config.toml -cleanup
How many shards should I have?
Mostly you won't need more than 1. But it depends on your use-case, and what problem you are solving for. While read scaling won't be a problem, your write throughput will depend on your network and disk speeds (Network being the biggest culprit). Shards are there to alleviate the problem when you are writing fast enough to cause bottleneck by NATS JetStream (very unlikely with a commodity SQLite node).
Can I use Marmot as single primary and multiple replicas?
Yes you can. There are two flags in configuration that will allow you to do that. First flag publish
enables/disables publishing local changes to NATS, you should disable publish
(set it to false
)
on replicas. Second flag is replicate
that enables/disables replicating changes from NATS on to
local node. You should disable replicate
(set it to false
) on primary.