Vector Search

Marmot provides ANN vector search over float32 embeddings stored directly in user tables. The public surface stays small:

  • vec_match(col, ?, K) in WHERE
  • vec_distance(col, ?) in ORDER BY
  • CREATE VECTOR INDEX
  • REINDEX VECTOR
  • DROP VECTOR INDEX

The exact embedding remains in the base table. The vector index builds local derived state for fast reads and never duplicates serving payload into SQLite side tables.

Design

  • Base table is the truth: your embedding blob stays in the user table and is the only exact vector copy.
  • Local file-backed serving state: Marmot stores centroids, stable cluster segments, row maps, manifests, and an overlay journal in a per-index .vecseg directory next to the database.
  • No SQLite vector sidecar: candidate serving does not scan __marmot_vec_*_members or any equivalent payload table.
  • Predicate-aware planning: Marmot chooses between exact pre-filtering and IVF post-filtering. Exact rerank always comes from the base table shortlist.
  • Overlay-first bootstrap and maintenance: empty-table create starts with overlay visibility, then publishes the first clustered generation from a local overlay snapshot and maintains later generations incrementally.
  • Crash-safe publish: new stable generations are written immutably and published by swapping manifest/current, Lucene-style.

Quick Start

CREATE TABLE docs (
    id    INTEGER PRIMARY KEY,
    title TEXT,
    embed BLOB,
    status TEXT
);
 
INSERT INTO docs (id, title, embed, status)
VALUES (1, 'hello', ?, 'published');
 
CREATE VECTOR INDEX docs_embed_idx
    ON docs(embed)
    DIM 128
    METRIC cosine;
 
SELECT id, title
  FROM docs
 WHERE vec_match(embed, ?, 10)
   AND status = 'published'
 ORDER BY vec_distance(embed, ?)
 LIMIT 10;

DDL

CREATE VECTOR INDEX

CREATE VECTOR INDEX <index_name>
    ON <table>(<blob_column>)
    DIM <n>
    METRIC <l2 | cosine | dot>
    [WITH (nlist = <n>, nprobe = <n>, max_norm = <f>)];

Clause                Meaning
DIM                   Vector dimensionality. Required.
METRIC                l2, cosine, or dot. Required.
WITH (nlist = n)      IVF cluster count. Optional; auto-tuned when omitted.
WITH (nprobe = n)     Probed clusters per query. Optional; auto-derived when omitted.
WITH (max_norm = f)   Required only for METRIC dot.

Requirements:

  • Base table must declare INTEGER PRIMARY KEY.
  • Target column must be a packed little-endian float32 BLOB.
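A minimal Go sketch of producing a blob in that layout. It assumes "packed" means 4 bytes per component with no header or length prefix; the helper names encodeVector and decodeVector are illustrative and not part of Marmot.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeVector packs a float32 slice into a little-endian byte slice,
// 4 bytes per component, with no header or padding (assumed layout).
func encodeVector(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return buf
}

// decodeVector is the inverse of encodeVector. It rejects blobs whose
// length is not a multiple of 4 bytes.
func decodeVector(b []byte) ([]float32, error) {
	if len(b)%4 != 0 {
		return nil, fmt.Errorf("blob length %d is not a multiple of 4", len(b))
	}
	v := make([]float32, len(b)/4)
	for i := range v {
		v[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[4*i:]))
	}
	return v, nil
}

func main() {
	blob := encodeVector([]float32{1, 0.5, -2})
	fmt.Println(len(blob)) // 12: three components, 4 bytes each

	v, err := decodeVector(blob)
	if err != nil {
		panic(err)
	}
	fmt.Println(v)
}
```

With this layout, an index declared with DIM 128 would expect blobs of 128 × 4 = 512 bytes; the resulting byte slice is what gets bound to the ? placeholder.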

If the table already has rows, CREATE VECTOR INDEX computes centroids, builds the first stable segment generation, and marks the index ready.

If the table is empty, the index is still created immediately. Marmot serves fresh rows from the local overlay right away, then starts a local bootstrap watcher. Once enough indexable rows arrive and the overlay snapshot stops growing, Marmot trains the first probe set from that snapshot and publishes the first clustered generation automatically. No manual REINDEX VECTOR is required just because the table was empty at create time.

REINDEX VECTOR

REINDEX VECTOR <index_name>;

Retrains centroids from the current base-table rows, rewrites a new immutable local generation, and atomically swaps it into service. Old generations remain valid for existing readers until the new one is published. Normal empty-table growth and steady-state overlay merges do not require REINDEX VECTOR; they use the local incremental maintenance path.

DROP VECTOR INDEX

DROP VECTOR INDEX <index_name>;

Drops the index metadata and local derived files. The base-table vectors are unaffected.

Query Syntax

vec_match

WHERE vec_match(<column>, <query_vec>, <k>)

Marks the query as vector-filtered. Exactly one vec_match is supported per SELECT.

vec_distance

ORDER BY vec_distance(<column>, <query_vec>)

Ranks the result set. The rewriter resolves this to the metric-specific distance function internally.

Metrics

Metric   Distance
l2       Squared Euclidean distance
cosine   1 - cosine_similarity
dot      Negative inner product
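The three distances can be sketched in Go as follows. These are reference definitions matching the table, not Marmot's internal functions; the function names and the zero-vector handling in cosineDistance are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// dot returns the inner product of a and b (equal lengths assumed).
func dot(a, b []float32) float64 {
	var s float64
	for i := range a {
		s += float64(a[i]) * float64(b[i])
	}
	return s
}

// l2Squared returns the squared Euclidean distance (no square root).
func l2Squared(a, b []float32) float64 {
	var s float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		s += d * d
	}
	return s
}

// cosineDistance returns 1 - cosine_similarity.
func cosineDistance(a, b []float32) float64 {
	na, nb := math.Sqrt(dot(a, a)), math.Sqrt(dot(b, b))
	if na == 0 || nb == 0 {
		return 1 // assumption: a zero vector is treated as orthogonal
	}
	return 1 - dot(a, b)/(na*nb)
}

// negDot returns the negative inner product, so that a larger inner
// product sorts first under ascending ORDER BY.
func negDot(a, b []float32) float64 {
	return -dot(a, b)
}

func main() {
	a, b := []float32{1, 2}, []float32{3, 4}
	fmt.Println(l2Squared(a, b), cosineDistance(a, b), negDot(a, b))
}
```

All three are "smaller is closer", which is why a plain ascending ORDER BY vec_distance(...) works for every metric.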

Planning

Marmot uses two execution shapes:

  • Pre-filter: apply the scalar predicate first, then compute exact distance over the smaller filtered set.
  • Post-filter: probe IVF centroids, scan stable segment clusters plus the overlay, then exact-rerank a shortlist from the base table.

The planner chooses between them using the estimated predicate cardinality and any session overrides. In post-filter mode, Marmot probes a prefix of the nearest clusters; with default auto-tuning, the prefix length is driven by a scan budget rather than a fixed sqrt(nlist) probe count.
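As a rough illustration of that decision, the Go sketch below picks a plan from a forced setting, an estimated row count, and a pre-filter cap. The function choosePlan and its exact rule are hypothetical: the real planner's cost model is not described here, and how a forced plan interacts with the hard cap is an open assumption.

```go
package main

import "fmt"

// choosePlan is a hypothetical stand-in for the planner decision.
// forcePlan mirrors the documented values "auto", "pre", and "post".
// Assumption: in auto mode, pre-filter is used when the scalar predicate
// is estimated to match no more rows than the pre-filter cap.
func choosePlan(forcePlan string, estimatedRows, prefilterCap int) string {
	switch forcePlan {
	case "pre", "post":
		return forcePlan // explicit session override wins in this sketch
	}
	if estimatedRows <= prefilterCap {
		return "pre" // small filtered set: exact distance over it is cheap
	}
	return "post" // large filtered set: probe IVF, then exact-rerank
}

func main() {
	fmt.Println(choosePlan("auto", 1200, 5000))   // pre
	fmt.Println(choosePlan("auto", 250000, 5000)) // post
	fmt.Println(choosePlan("post", 10, 5000))     // post
}
```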

Session Variables

The current query and session controls are scoped per connection.

Variable                     Default         Meaning
@@marmot_vec_nprobe          index default   Probed clusters per query
@@marmot_vec_force_plan      auto            auto, pre, or post
@@marmot_vec_prefilter_cap   5000            Hard cap for pre-filter row count
@@marmot_vec_fallback        on              Allow exact fallback if post-filter returns too few rows
@@marmot_vec_use_go_rank     on              Use the Go-side ranking path

Configuration

[vector_index]
enabled = true
data_dir = ""

  • enabled turns vector index support on.
  • data_dir is reserved for a future explicit override of the local vector-state root. Today Marmot colocates .vecseg state next to the database file.

Under the Hood

Local Layout

For docs_embed_idx on docs(embed), Marmot stores local derived state under:

<db>.docs_embed_idx.vecseg/
  manifest/
    current
    gen-00000000000000000001.mf
  segments/
    gen-00000000000000000001.dat
  rowmap/
    gen-00000000000000000001.rmap
  overlay/
    current.log

What each file does:

  • manifest/current: points to the active generation.
  • gen-*.mf: manifest with index identity, probe/stable centroid epochs and blobs, checksums, and file names.
  • segments/gen-*.dat: stable vector payload laid out by cluster.
  • rowmap/gen-*.rmap: rowid-to-stable-location map for rebuild and compaction.
  • overlay/current.log: append-only local overlay journal for recent committed mutations.

Read Path

For IVF post-filter queries:

  1. Load probe/stable centroid state from the active manifest into memory.
  2. Pick the nearest cluster prefix for this query, using either explicit nprobe or the auto-tuned scan budget.
  3. Read stable cluster payload from the segment file with pread.
  4. Merge the in-memory overlay snapshot.
  5. Fetch the shortlist vectors from the base table, materialize them in Go, and exact-rerank there.

The hot candidate scan is file-backed, not SQLite-row backed.
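Steps 2 and 5 of the read path can be sketched in Go as below. This is a simplified in-memory illustration using squared L2 distance; the types and function names are hypothetical, and the real path reads cluster payload from segment files and shortlist vectors from the base table rather than from slices.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical shortlist entry: a rowid plus its exact vector.
type candidate struct {
	RowID int64
	Vec   []float32
}

func l2Squared(a, b []float32) float64 {
	var s float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		s += d * d
	}
	return s
}

// nearestClusters returns the indices of the nprobe centroids closest to q,
// nearest first (step 2 with an explicit nprobe).
func nearestClusters(q []float32, centroids [][]float32, nprobe int) []int {
	dist := make([]float64, len(centroids))
	idx := make([]int, len(centroids))
	for i, c := range centroids {
		dist[i] = l2Squared(q, c)
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return dist[idx[a]] < dist[idx[b]] })
	if nprobe < 0 {
		nprobe = 0
	}
	if nprobe > len(idx) {
		nprobe = len(idx)
	}
	return idx[:nprobe]
}

// exactTopK reranks the shortlist by exact distance and returns the k
// closest rowids (step 5).
func exactTopK(q []float32, shortlist []candidate, k int) []int64 {
	s := append([]candidate(nil), shortlist...) // do not reorder the caller's slice
	sort.SliceStable(s, func(a, b int) bool {
		return l2Squared(q, s[a].Vec) < l2Squared(q, s[b].Vec)
	})
	if k > len(s) {
		k = len(s)
	}
	ids := make([]int64, 0, k)
	for _, c := range s[:k] {
		ids = append(ids, c.RowID)
	}
	return ids
}

func main() {
	q := []float32{0.9, 0.9}
	centroids := [][]float32{{0, 0}, {10, 10}, {1, 1}}
	fmt.Println(nearestClusters(q, centroids, 2)) // [2 0]

	shortlist := []candidate{{7, []float32{5, 5}}, {8, []float32{1, 1}}, {9, []float32{0, 0}}}
	fmt.Println(exactTopK(q, shortlist, 2)) // [8 9]
}
```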

Write Path

Committed base-table inserts, updates, and deletes update the local overlay journal and overlay snapshot. Before the first clustered publish, queries read from the overlay only. After bootstrap, stable segment files remain immutable; Marmot rewrites touched clusters into a new generation, advances routing metadata, and publishes that generation atomically. Bounded cluster-count growth uses the same local generation machinery instead of a SQLite sidecar.

Crash Safety

Publish order is:

  1. write temp segment and rowmap files
  2. fsync temp files
  3. write temp manifest
  4. fsync manifest
  5. rename manifest into place
  6. swap manifest/current
  7. fsync manifest directory

On reopen, Marmot validates:

  • manifest/current
  • manifest version and index identity
  • stable file sizes and checksums
  • rowmap sizes and checksums

If local derived state is invalid, Marmot rebuilds it from the base table.
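The core of the publish sequence is the write-temp, fsync, rename, fsync-directory pattern. The Go sketch below shows that pattern for the pointer swap only. It assumes manifest/current is a small regular file holding the active manifest name, which the text does not confirm; the function names and the short generation names are illustrative.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileSync writes data to path and fsyncs it before closing.
func writeFileSync(path string, data []byte) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o644)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// syncDir fsyncs a directory so that a rename inside it is durable.
// This works on Linux and macOS; other platforms may differ.
func syncDir(dir string) error {
	d, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer d.Close()
	return d.Sync()
}

// publishPointer atomically replaces dir/current with the given manifest
// name. Readers see either the old pointer or the new one, never a
// partial write.
func publishPointer(dir, manifestName string) error {
	tmp := filepath.Join(dir, "current.tmp")
	if err := writeFileSync(tmp, []byte(manifestName)); err != nil {
		return err
	}
	if err := os.Rename(tmp, filepath.Join(dir, "current")); err != nil {
		return err
	}
	return syncDir(dir)
}

// demo publishes two generations into a temp directory and returns the
// pointer contents a reader would see afterwards.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "vecseg-manifest-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	for _, name := range []string{"gen-1.mf", "gen-2.mf"} { // illustrative names
		if err := publishPointer(dir, name); err != nil {
			return "", err
		}
	}
	b, err := os.ReadFile(filepath.Join(dir, "current"))
	return string(b), err
}

func main() {
	got, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // gen-2.mf
}
```

Because the segment, rowmap, and manifest files are durable before the pointer moves, a crash at any step leaves the previous generation intact and still referenced.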

Benchmarking

Use cmd/vec-bench for local vector benchmarking. A meaningful run should always report the full bundle:

  • create time
  • first clustered publish time
  • final settled time
  • query-ready throughput
  • write throughput
  • read QPS
  • recall
  • p50 / p95 / p99 latency
  • RSS after build, vector settle, warmup, and measurement
  • CPU profiles when relevant
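For reference, recall and latency percentiles are commonly computed as in the Go sketch below. How cmd/vec-bench defines them internally is not stated here, so the helper names, the recall@k definition (fraction of exact top-k rowids present in the ANN result), and the nearest-rank percentile method are assumptions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// recallAtK returns the fraction of exact top-k rowids that also appear
// in the approximate result. Rowids within each list are assumed unique.
func recallAtK(exact, approx []int64) float64 {
	if len(exact) == 0 {
		return 0
	}
	truth := make(map[int64]struct{}, len(exact))
	for _, id := range exact {
		truth[id] = struct{}{}
	}
	hits := 0
	for _, id := range approx {
		if _, ok := truth[id]; ok {
			hits++
		}
	}
	return float64(hits) / float64(len(exact))
}

// percentile returns the p-th percentile (0 < p <= 100) of the latencies
// using the nearest-rank method. The input slice is not modified.
func percentile(latenciesMs []float64, p float64) float64 {
	if len(latenciesMs) == 0 {
		return 0
	}
	s := append([]float64(nil), latenciesMs...)
	sort.Float64s(s)
	rank := int(math.Ceil(p / 100 * float64(len(s))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(s) {
		rank = len(s)
	}
	return s[rank-1]
}

func main() {
	fmt.Println(recallAtK([]int64{1, 2, 3, 4}, []int64{1, 2, 9, 4})) // 0.75

	lat := []float64{5, 1, 4, 2, 3}
	fmt.Println(percentile(lat, 50), percentile(lat, 95), percentile(lat, 99)) // 3 5 5
}
```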

Example:

./vec-bench \
  --data-dir=/tmp/marmot/benchdata/dbpedia-openai-1536 \
  --db-dir=/tmp/marmot/bench-active \
  --db-name=dbpedia \
  --table=docs \
  --column=embed \
  --index=embed_idx \
  --force-build \
  --insert-n=100000 \
  --warmup=200 \
  --n-queries=500 \
  --query-concurrency=1 \
  --settle-timeout=10m \
  --profile-dir=/tmp/marmot/bench-active/prof

Limitations

  • One vec_match per SELECT.
  • LIMIT is required with vec_match.
  • Base table must have INTEGER PRIMARY KEY.
  • Exact rerank still reads the shortlist vectors from the base table.
  • Initial bootstrap, larger promotions, and REINDEX VECTOR are still CPU- and memory-heavy because centroid training happens in-process.
  • Raw insert throughput is not the same thing as final query-ready throughput; for create-on-empty runs, judge the system by the reported clustered-publish and settled milestones, not by insert speed alone.