Vector Search
Marmot provides ANN vector search over float32 embeddings stored directly in user tables. The public surface stays small:
- `vec_match(col, ?, K)` in `WHERE`
- `vec_distance(col, ?)` in `ORDER BY`
- `CREATE VECTOR INDEX`
- `REINDEX VECTOR`
- `DROP VECTOR INDEX`
The exact embedding remains in the base table. The vector index builds local derived state for fast reads and never duplicates serving payload into SQLite side tables.
Design
- Base table is the truth: your embedding blob stays in the user table and is the only exact vector copy.
- Local file-backed serving state: Marmot stores centroids, stable cluster segments, row maps, manifests, and an overlay journal in a per-index `.vecseg` directory next to the database.
- No SQLite vector sidecar: candidate serving does not scan `__marmot_vec_*_members` or any equivalent payload table.
- Predicate-aware planning: Marmot chooses between exact pre-filtering and IVF post-filtering. Exact rerank always comes from the base table shortlist.
- Overlay-first bootstrap and maintenance: empty-table create starts with overlay visibility, then publishes the first clustered generation from a local overlay snapshot and maintains later generations incrementally.
- Crash-safe publish: new stable generations are written immutably and published by swapping `manifest/current`, Lucene-style.
Quick Start
CREATE TABLE docs (
    id INTEGER PRIMARY KEY,
    title TEXT,
    embed BLOB,
    status TEXT
);

INSERT INTO docs (id, title, embed, status)
VALUES (1, 'hello', ?, 'published');

CREATE VECTOR INDEX docs_embed_idx
ON docs(embed)
DIM 128
METRIC cosine;

SELECT id, title
FROM docs
WHERE vec_match(embed, ?, 10)
  AND status = 'published'
ORDER BY vec_distance(embed, ?)
LIMIT 10;

DDL
CREATE VECTOR INDEX
CREATE VECTOR INDEX <index_name>
ON <table>(<blob_column>)
DIM <n>
METRIC <l2 | cosine | dot>
[WITH (nlist = <n>, nprobe = <n>, max_norm = <f>)];

| Clause | Meaning |
|---|---|
| `DIM` | Vector dimensionality. Required. |
| `METRIC` | `l2`, `cosine`, or `dot`. Required. |
| `WITH (nlist = n)` | IVF cluster count. Optional; auto-tuned when omitted. |
| `WITH (nprobe = n)` | Probed clusters per query. Optional; auto-derived when omitted. |
| `WITH (max_norm = f)` | Required only for `METRIC dot`. |
Requirements:
- Base table must declare `INTEGER PRIMARY KEY`.
- Target column must be a packed little-endian `float32` `BLOB`.
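As an illustration of the column requirement, a client can produce the packed little-endian float32 layout with Python's standard `struct` module. This is a minimal sketch; `pack_embedding` and `unpack_embedding` are hypothetical helper names, not part of Marmot.

```python
import struct


def pack_embedding(values, dim):
    """Pack `dim` floats into a little-endian float32 BLOB (dim * 4 bytes)."""
    if len(values) != dim:
        raise ValueError(f"expected {dim} values, got {len(values)}")
    # "<" selects little-endian byte order, "f" is a 4-byte IEEE 754 float.
    return struct.pack(f"<{dim}f", *values)


def unpack_embedding(blob, dim):
    """Inverse of pack_embedding; rejects blobs of the wrong length."""
    if len(blob) != dim * 4:
        raise ValueError(f"expected {dim * 4} bytes, got {len(blob)}")
    return list(struct.unpack(f"<{dim}f", blob))


# A 4-dimensional example; a real index would use the DIM it was created with.
blob = pack_embedding([0.5, -1.0, 2.0, 0.0], dim=4)
```

The resulting `bytes` value can be bound to the `?` placeholder for the embedding column or for the query vector.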
If the table already has rows, CREATE VECTOR INDEX computes centroids, builds the first stable segment generation, and marks the index ready.
If the table is empty, the index is still created immediately. Marmot serves fresh rows from the local overlay right away, then starts a local bootstrap watcher. Once enough indexable rows arrive and the overlay snapshot stops growing, Marmot trains the first probe set from that snapshot and publishes the first clustered generation automatically. No manual REINDEX VECTOR is required just because the table was empty at create time.
REINDEX VECTOR
REINDEX VECTOR <index_name>;

Retrains centroids from the current base-table rows, rewrites a new immutable local generation, and atomically swaps it into service. Old generations remain valid for existing readers until the new one is published. Normal empty-table growth and steady-state overlay merges do not require REINDEX VECTOR; they use the local incremental maintenance path.
DROP VECTOR INDEX
DROP VECTOR INDEX <index_name>;

Drops the index metadata and local derived files. The base-table vectors are unaffected.
Query Syntax
vec_match
WHERE vec_match(<column>, <query_vec>, <k>)

Marks the query as vector-filtered. Exactly one vec_match is supported per SELECT.
vec_distance
ORDER BY vec_distance(<column>, <query_vec>)

Ranks the result set. The rewriter resolves this to the metric-specific distance function internally.
Metrics
| Metric | Distance |
|---|---|
| `l2` | Squared Euclidean distance |
| `cosine` | 1 - cosine_similarity |
| `dot` | Negative inner product |
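The three definitions in the table can be written out directly. The sketch below only illustrates the formulas; the function names are hypothetical and this is not Marmot's internal distance code. In all three cases a smaller value means a closer match, which is why the inner product is negated.

```python
import math


def l2_squared(a, b):
    """Squared Euclidean distance (no square root is taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine_similarity; undefined when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def dot_distance(a, b):
    """Negative inner product, so larger inner products sort first."""
    return -sum(x * y for x, y in zip(a, b))
```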
Planning
Marmot uses two execution shapes:
- Pre-filter: apply the scalar predicate first, then compute exact distance over the smaller filtered set.
- Post-filter: probe IVF centroids, scan stable segment clusters plus the overlay, then exact-rerank a shortlist from the base table.
The planner uses estimated predicate cardinality plus session overrides to choose between them. In post-filter mode, Marmot probes a prefix of the clusters nearest to the query; with default auto-tuning, the length of that prefix is driven by a scan budget rather than a hard-coded sqrt(nlist) probe count.
Session Variables
The current query and session controls are scoped to a single connection.
| Variable | Default | Meaning |
|---|---|---|
| `@@marmot_vec_nprobe` | index default | Probed clusters per query |
| `@@marmot_vec_force_plan` | `auto` | `auto`, `pre`, or `post` |
| `@@marmot_vec_prefilter_cap` | 5000 | Hard cap for pre-filter row count |
| `@@marmot_vec_fallback` | on | Allow exact fallback if post-filter returns too few rows |
| `@@marmot_vec_use_go_rank` | on | Use the Go-side ranking path |
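One way to picture how the cardinality estimate and the session overrides interact is a simple threshold rule. The sketch below is an assumption made for illustration: Marmot's actual cost model is not described here, `choose_plan` is a hypothetical name, and whether a forced plan bypasses the pre-filter cap is also assumed.

```python
def choose_plan(estimated_rows, force_plan="auto", prefilter_cap=5000):
    """Return "pre" or "post" for a vec_match query (illustrative rule only)."""
    if force_plan in ("pre", "post"):
        # Assumption: an explicit session override wins outright.
        return force_plan
    if force_plan != "auto":
        raise ValueError(f"unknown plan override: {force_plan!r}")
    # Assumption: pre-filter whenever the scalar predicate is estimated to
    # leave no more rows than the cap; otherwise probe the IVF index.
    return "pre" if estimated_rows <= prefilter_cap else "post"
```

Under this rule a selective predicate such as `status = 'draft'` matching a few hundred rows would be scanned exactly, while a predicate matching most of the table would go through the IVF path.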
Configuration
[vector_index]
enabled = true
data_dir = ""

`enabled` turns vector index support on. `data_dir` is reserved for a future explicit override of the local vector-state root. Today Marmot colocates `.vecseg` state next to the database file.
Under the Hood
Local Layout
For docs_embed_idx on docs(embed), Marmot stores local derived state under:
<db>.docs_embed_idx.vecseg/
  manifest/
    current
    gen-00000000000000000001.mf
  segments/
    gen-00000000000000000001.dat
  rowmap/
    gen-00000000000000000001.rmap
  overlay/
    current.log

What each file does:
- `manifest/current`: points to the active generation.
- `gen-*.mf`: manifest with index identity, probe/stable centroid epochs and blobs, checksums, and file names.
- `segments/gen-*.dat`: stable vector payload laid out by cluster.
- `rowmap/gen-*.rmap`: rowid-to-stable-location map for rebuild and compaction.
- `overlay/current.log`: append-only local overlay journal for recent committed mutations.
Read Path
For IVF post-filter queries:
- Load probe/stable centroid state from the active manifest into memory.
- Pick the nearest cluster prefix for this query, using either explicit `nprobe` or the auto-tuned scan budget.
- Read stable cluster payload from the segment file with `pread`.
- Merge the in-memory overlay snapshot.
- Fetch the shortlist vectors from the base table, materialize them in Go, and exact-rerank there.
The hot candidate scan is file-backed, not SQLite-row backed.
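The read path above can be modeled in a few lines of Python. This is a toy sketch under stated assumptions, not Marmot's code: clusters are in-memory lists instead of segment files read with `pread`, the overlay is a dict in which `None` marks a deleted row, the metric is squared L2, and `ivf_search` and its `shortlist` parameter are hypothetical names.

```python
import heapq


def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ivf_search(query, centroids, clusters, overlay, base_table, k, nprobe,
               shortlist=None):
    """Toy IVF post-filter search returning up to k rowids, nearest first."""
    shortlist = shortlist or 4 * k  # assumed shortlist size

    # 1. Rank centroids and keep the nearest `nprobe` clusters.
    order = sorted(range(len(centroids)),
                   key=lambda c: l2_squared(query, centroids[c]))
    probed = order[:nprobe]

    # 2. Collect candidates from the probed stable clusters.
    candidates = {}
    for c in probed:
        for rowid, vec in clusters[c]:
            candidates[rowid] = vec

    # 3. Merge the overlay: newer vectors win, None hides a deleted row.
    for rowid, vec in overlay.items():
        if vec is None:
            candidates.pop(rowid, None)
        else:
            candidates[rowid] = vec

    # 4. Keep a shortlist, then exact-rerank it against the base table.
    short = heapq.nsmallest(shortlist, candidates,
                            key=lambda r: l2_squared(query, candidates[r]))
    exact = sorted((l2_squared(query, base_table[r]), r)
                   for r in short if r in base_table)
    return [r for _, r in exact[:k]]


centroids = [[0.0, 0.0], [10.0, 10.0]]
clusters = [[(1, [0.0, 1.0]), (2, [1.0, 0.0])], [(3, [10.0, 9.0])]]
overlay = {4: [0.1, 0.1], 2: None}  # row 4 is new, row 2 was deleted
base_table = {1: [0.0, 1.0], 3: [10.0, 9.0], 4: [0.1, 0.1]}
result = ivf_search([0.0, 0.0], centroids, clusters, overlay, base_table,
                    k=2, nprobe=1)
```

In the example only cluster 0 is probed, the overlay contributes row 4 and hides row 2, and the final ordering comes from the base-table vectors.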
Write Path
Committed base-table inserts, updates, and deletes update the local overlay journal and overlay snapshot. Before the first clustered publish, queries read from the overlay only. After bootstrap, stable segment files remain immutable; Marmot rewrites touched clusters into a new generation, advances routing metadata, and publishes that generation atomically. Bounded cluster-count growth uses the same local generation machinery instead of a SQLite sidecar.
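The relationship between the append-only journal and the in-memory overlay snapshot can be sketched as a log that is replayed in order. The record format below (one JSON object per line) and the helper names are assumptions made for illustration; the real layout of `overlay/current.log` is not specified here.

```python
import json
import os
import tempfile


def append_mutation(path, op, rowid, vector=None):
    """Append one committed mutation ("upsert" or "delete") to the journal."""
    record = {"op": op, "rowid": rowid, "vector": vector}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())  # make the record durable before returning


def replay(path):
    """Rebuild the overlay snapshot; None marks a row deleted since publish."""
    snapshot = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["op"] == "delete":
                snapshot[record["rowid"]] = None
            else:
                snapshot[record["rowid"]] = record["vector"]
    return snapshot


journal = os.path.join(tempfile.mkdtemp(), "current.log")
append_mutation(journal, "upsert", 1, [0.1, 0.2])
append_mutation(journal, "upsert", 2, [0.3, 0.4])
append_mutation(journal, "delete", 1)
snapshot = replay(journal)
```

Keeping a tombstone for a deleted row, rather than dropping the key, lets a reader hide a copy of that row that still exists in an immutable stable segment.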
Crash Safety
Publish order is:
- write temp segment and rowmap files
- fsync temp files
- write temp manifest
- fsync manifest
- rename manifest into place
- swap `manifest/current`
- fsync manifest directory
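The publish order above follows the usual write, fsync, rename pattern. The following Python sketch shows that pattern on a POSIX filesystem; it is not Marmot's implementation. The `.tmp` suffix, the helper names, and the point at which payload files are renamed to their final names are assumptions.

```python
import os
import tempfile


def fsync_dir(path):
    """Flush directory metadata (renames) to disk; POSIX only."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def write_durable(path, data):
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def publish(index_dir, gen, segment, rowmap, manifest):
    name = f"gen-{gen:020d}"
    for sub in ("manifest", "segments", "rowmap"):
        os.makedirs(os.path.join(index_dir, sub), exist_ok=True)

    # Write and fsync temp payload files, then move them into place
    # (assumed to happen before the manifest that names them is written).
    for sub, ext, data in (("segments", ".dat", segment),
                           ("rowmap", ".rmap", rowmap)):
        final = os.path.join(index_dir, sub, name + ext)
        write_durable(final + ".tmp", data)
        os.replace(final + ".tmp", final)
        fsync_dir(os.path.dirname(final))

    # Write and fsync the temp manifest, then rename it into place.
    manifest_dir = os.path.join(index_dir, "manifest")
    manifest_path = os.path.join(manifest_dir, name + ".mf")
    write_durable(manifest_path + ".tmp", manifest)
    os.replace(manifest_path + ".tmp", manifest_path)

    # Swap manifest/current atomically, then fsync the manifest directory.
    current = os.path.join(manifest_dir, "current")
    write_durable(current + ".tmp", (name + ".mf").encode())
    os.replace(current + ".tmp", current)
    fsync_dir(manifest_dir)


root = tempfile.mkdtemp()
publish(root, 1, b"segment-1", b"rowmap-1", b"manifest-1")
publish(root, 2, b"segment-2", b"rowmap-2", b"manifest-2")
with open(os.path.join(root, "manifest", "current"), "rb") as f:
    active = f.read().decode()
```

Because `manifest/current` is replaced with a single rename, a reader sees either the old generation or the new one, never a partially written pointer, and files from the previous generation stay on disk for readers that still hold them.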
On reopen, Marmot validates:
- `manifest/current`
- manifest version and index identity
- stable file sizes and checksums
- rowmap sizes and checksums
If local derived state is invalid, Marmot rebuilds it from the base table.
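A reopen check along these lines might look like the sketch below. The manifest format used here (JSON with `version`, `index`, and per-file `size` and `sha256` entries) is purely hypothetical, as are the names `validate` and `SUPPORTED_VERSION`; the point is only that any failed check leads to a rebuild from the base table.

```python
import hashlib
import json
import os
import tempfile

SUPPORTED_VERSION = 1  # hypothetical manifest version


def validate(index_dir, expected_index):
    """Return True if local derived state looks intact, False to rebuild."""
    manifest_dir = os.path.join(index_dir, "manifest")
    try:
        with open(os.path.join(manifest_dir, "current"), encoding="utf-8") as f:
            manifest_name = f.read().strip()
        with open(os.path.join(manifest_dir, manifest_name), encoding="utf-8") as f:
            manifest = json.load(f)
        if manifest["version"] != SUPPORTED_VERSION:
            return False
        if manifest["index"] != expected_index:
            return False
        for rel_path, meta in manifest["files"].items():
            with open(os.path.join(index_dir, rel_path), "rb") as f:
                data = f.read()
            if len(data) != meta["size"]:
                return False
            if hashlib.sha256(data).hexdigest() != meta["sha256"]:
                return False
    except (OSError, ValueError, KeyError, TypeError):
        # Missing files or an unreadable manifest count as invalid state.
        return False
    return True


root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "manifest"))
os.makedirs(os.path.join(root, "segments"))
payload = b"cluster-payload"
segment_rel = "segments/gen-00000000000000000001.dat"
with open(os.path.join(root, segment_rel), "wb") as f:
    f.write(payload)
with open(os.path.join(root, "manifest", "gen-00000000000000000001.mf"), "w") as f:
    json.dump({"version": 1, "index": "docs_embed_idx",
               "files": {segment_rel: {"size": len(payload),
                                       "sha256": hashlib.sha256(payload).hexdigest()}}}, f)
with open(os.path.join(root, "manifest", "current"), "w") as f:
    f.write("gen-00000000000000000001.mf")

ok_before = validate(root, "docs_embed_idx")
with open(os.path.join(root, segment_rel), "wb") as f:
    f.write(b"truncated")  # simulate a damaged segment file
ok_after = validate(root, "docs_embed_idx")
```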
Benchmarking
Use cmd/vec-bench for local vector benchmarking. A meaningful run should always report the full bundle:
- create time
- first clustered publish time
- final settled time
- query-ready throughput
- write throughput
- read QPS
- recall
- p50 / p95 / p99 latency
- RSS after build, vector settle, warmup, and measurement
- CPU profiles when relevant
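For readers who want to cross-check reported numbers, recall and latency percentiles can be recomputed from raw results. The sketch below uses recall@k against an exact top-k list and the nearest-rank percentile definition; whether `vec-bench` uses these exact definitions is an assumption, and both function names are hypothetical.

```python
import math


def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k rowids that the ANN result also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact result list is empty")
    hits = sum(1 for rowid in approx_ids[:k] if rowid in truth)
    return hits / len(truth)


def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


latencies_ms = list(range(1, 101))  # synthetic latencies, 1 ms to 100 ms
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
```

Averaging `recall_at_k` over all measured queries gives a single recall figure to report alongside the latency percentiles.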
Example:
./vec-bench \
--data-dir=/tmp/marmot/benchdata/dbpedia-openai-1536 \
--db-dir=/tmp/marmot/bench-active \
--db-name=dbpedia \
--table=docs \
--column=embed \
--index=embed_idx \
--force-build \
--insert-n=100000 \
--warmup=200 \
--n-queries=500 \
--query-concurrency=1 \
--settle-timeout=10m \
--profile-dir=/tmp/marmot/bench-active/prof

Limitations
- One `vec_match` per `SELECT`.
- `LIMIT` is required with `vec_match`.
- Base table must have `INTEGER PRIMARY KEY`.
- Exact rerank still reads the shortlist vectors from the base table.
- Initial bootstrap, larger promotions, and `REINDEX VECTOR` are still CPU- and memory-heavy because centroid training happens in-process.
- Raw insert throughput is not the same thing as final query-ready throughput; for create-on-empty runs, judge the system by the reported clustered-publish and settled milestones, not by insert speed alone.