Merge Operators

MergeOperator lets writers record partial updates with Db::merge, Db::merge_with_options, and WriteBatch::merge. Those APIs append merge operands to the row history instead of forcing every writer to read the current value first. SlateDB resolves that history later with the operator you install.

Example

This example uses a merge operator that concatenates string fragments.

use bytes::Bytes;
use slatedb::{Db, Error, MergeOperator, MergeOperatorError};
use slatedb::object_store::{memory::InMemory, ObjectStore};
use std::sync::Arc;

struct StringConcatMergeOperator;

impl MergeOperator for StringConcatMergeOperator {
    fn merge(
        &self,
        _key: &Bytes,
        existing_value: Option<Bytes>,
        operand: Bytes,
    ) -> Result<Bytes, MergeOperatorError> {
        // Start from the accumulated value (empty if there is none), then append the operand.
        let mut result = existing_value.unwrap_or_default().as_ref().to_vec();
        result.extend_from_slice(&operand);
        Ok(Bytes::from(result))
    }
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    let object_store: Arc<dyn ObjectStore> = Arc::new(InMemory::new());
    let db = Db::builder("example", object_store)
        .with_merge_operator(Arc::new(StringConcatMergeOperator))
        .build()
        .await?;

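    // Each merge records an operand; the writer never reads the current value.
    db.merge(b"greeting", b"hello, ").await?;
    db.merge(b"greeting", b"world").await?;

    // The read resolves both operands through the installed operator.
    let value = db.get(b"greeting").await?.unwrap();
    assert_eq!(value.as_ref(), b"hello, world");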
    Ok(())
}

Resolution

For one key, SlateDB reads the newest visible row first and walks backward until it reaches a plain value, a tombstone, or the end of history. It then applies the newer operands from oldest to newest on top of that base. The cases below list each key's rows from oldest to newest.

Value -> Merge -> Merge reads as the base value plus both operands.
Tombstone -> Merge -> Merge ignores everything older than the delete and applies only the newer operands.
Merge -> Merge returns the merged operand bytes even though no base value exists yet.
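
The walk described above can be sketched in plain Rust without SlateDB. The Row enum and the resolve function are illustrative stand-ins, not SlateDB types, and the merge rule is assumed to be byte concatenation, as in the example operator. The history is passed newest first, which is the order the read walks it.

```rust
/// Simplified stand-in for a stored row (hypothetical, not SlateDB's type).
#[derive(Clone, Debug, PartialEq)]
enum Row {
    Value(Vec<u8>),
    Tombstone,
    Merge(Vec<u8>),
}

/// Resolve one key from its rows, given newest first.
/// Assumes a concatenating merge rule. Returns None for a deleted or absent key.
fn resolve(history_newest_first: &[Row]) -> Option<Vec<u8>> {
    let mut operands: Vec<&[u8]> = Vec::new(); // collected newest first
    let mut base: Option<Vec<u8>> = None;

    // Walk backward until a plain value, a tombstone, or the end of history.
    for row in history_newest_first {
        match row {
            Row::Merge(op) => operands.push(op.as_slice()),
            Row::Value(v) => {
                base = Some(v.clone());
                break;
            }
            Row::Tombstone => break, // everything older is ignored
        }
    }

    if operands.is_empty() {
        return base; // plain value, or None after a tombstone or empty history
    }

    // Apply the operands from oldest to newest on top of the base.
    let mut acc = base.unwrap_or_default();
    for op in operands.iter().rev() {
        acc.extend_from_slice(op);
    }
    Some(acc)
}
```

With this model, a history of Value("v"), Merge("a"), Merge("b") resolves to "vab", while the same operands after a tombstone resolve to "ab".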

When SlateDB flushes or compacts data, that last case may remain a single Merge row instead of turning into a plain value, because SlateDB still has merge operands but no base value for the key. If older SST levels contain an older value or tombstone for that key, a later read or compaction can still combine that older state with the newer merged operand.

Contract

The MergeOperator trait has two methods:

merge combines one operand with an optional accumulated value.
merge_batch combines an optional accumulated value with a slice of operands ordered from oldest to newest.

The default merge_batch implementation calls merge pairwise. Override it if you can process a batch more efficiently.
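
As a rough illustration of that relationship, the sketch below uses a local MergeModel trait that only mirrors the shape described here. It is an assumption for illustration, not the slatedb trait: it drops the key argument and error type, and its behavior for an empty batch is the model's own choice. PairwiseConcat relies on the default fold, while BatchedConcat overrides merge_batch to reserve capacity once.

```rust
/// Local stand-in trait (hypothetical); the real MergeOperator signature differs.
trait MergeModel {
    fn merge(&self, existing: Option<Vec<u8>>, operand: &[u8]) -> Vec<u8>;

    /// Default: fold the operands, oldest to newest, through `merge`.
    fn merge_batch(&self, existing: Option<Vec<u8>>, operands: &[Vec<u8>]) -> Vec<u8> {
        let mut acc = existing;
        for op in operands {
            acc = Some(self.merge(acc, op));
        }
        acc.unwrap_or_default()
    }
}

/// Shared concatenation rule used by both operators below.
fn concat(existing: Option<Vec<u8>>, operand: &[u8]) -> Vec<u8> {
    let mut out = existing.unwrap_or_default();
    out.extend_from_slice(operand);
    out
}

/// Uses the default pairwise `merge_batch`.
struct PairwiseConcat;

impl MergeModel for PairwiseConcat {
    fn merge(&self, existing: Option<Vec<u8>>, operand: &[u8]) -> Vec<u8> {
        concat(existing, operand)
    }
}

/// Overrides `merge_batch` to size the output buffer once for the whole batch.
struct BatchedConcat;

impl MergeModel for BatchedConcat {
    fn merge(&self, existing: Option<Vec<u8>>, operand: &[u8]) -> Vec<u8> {
        concat(existing, operand)
    }

    fn merge_batch(&self, existing: Option<Vec<u8>>, operands: &[Vec<u8>]) -> Vec<u8> {
        let extra: usize = operands.iter().map(|op| op.len()).sum();
        let mut out = existing.unwrap_or_default();
        out.reserve(extra);
        for op in operands {
            out.extend_from_slice(op);
        }
        out
    }
}
```

Both operators must return the same bytes for the same input; the override is only worthwhile when it changes the cost, not the result.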

The operator must be associative. SlateDB may regroup operands during reads, memtable flush, and compaction, so the result has to stay stable when those boundaries move. SlateDB installs one merge operator per database, but your implementation can dispatch by key prefix or value format if you need different merge rules.
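
One way to sanity-check a candidate rule is to compare the two ways of grouping three operands. The sketch below uses plain integers rather than SlateDB types; add, avg, and regroup_stable are hypothetical names chosen for illustration. Addition survives regrouping, while a pairwise average does not, so it would be unsafe as a merge rule.

```rust
/// Associative rule: integer addition.
fn add(a: i64, b: i64) -> i64 {
    a + b
}

/// Non-associative rule: pairwise average (integer division).
fn avg(a: i64, b: i64) -> i64 {
    (a + b) / 2
}

/// True when merging (a, b) first gives the same result as merging (b, c) first.
/// This is a spot check for one triple, not a proof of associativity.
fn regroup_stable(rule: fn(i64, i64) -> i64, a: i64, b: i64, c: i64) -> bool {
    rule(rule(a, b), c) == rule(a, rule(b, c))
}
```

For example, regroup_stable(add, 1, 2, 3) holds, while regroup_stable(avg, 0, 0, 8) fails because the two groupings yield 4 and 2.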

Where Merges Happen

Reads

Normal reads wrap the iterator stack with MergeOperatorIterator. Point lookups and scans can merge operands across the write batch, memtable, immutable memtables, L0 SSTs, and sorted runs.

The read path merges operands even when their expire_ts values differ. The returned row uses the minimum expire_ts across the merged operands and any base value. The Time documentation covers the TTL rules in more detail.
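
The minimum rule can be sketched as follows. This assumes each row carries an optional expiry timestamp and that rows without an expiry are skipped when taking the minimum; that handling of missing values is an assumption of this sketch, and merged_expire_ts is a hypothetical helper, not a SlateDB function.

```rust
/// Expiry of a merged row: the smallest expire_ts among the contributing rows.
/// Rows with no expiry (None) are ignored here, which is an assumption.
fn merged_expire_ts(rows: &[Option<i64>]) -> Option<i64> {
    rows.iter().flatten().copied().min()
}
```

Given a base value expiring at 300, one operand with no expiry, and one operand expiring at 120, this returns Some(120).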

If no merge operator is configured and a read encounters a Merge row, SlateDB returns MergeOperatorMissing.

Rewrites

Inside a committed WriteBatch, SlateDB reduces consecutive merge operations for the same key when the writer has a merge operator configured. If the batch also contains a value or tombstone base for that key, the reduced row becomes a plain value. If it has only operands, the reduced row stays a Merge.
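
The sketch below models how the kind of the reduced row follows from the operations a batch holds for one key. Op, Reduced, and reduce are illustrative names, not SlateDB types, and the merge rule is again assumed to be concatenation.

```rust
/// Operations a batch may hold for one key (hypothetical model).
#[derive(Clone, Debug, PartialEq)]
enum Op {
    Put(Vec<u8>),
    Delete,
    Merge(Vec<u8>),
}

/// The single row the batch reduces to for that key.
#[derive(Clone, Debug, PartialEq)]
enum Reduced {
    Value(Vec<u8>),
    Tombstone,
    Merge(Vec<u8>),
}

/// Reduce one key's operations, given in write order (oldest first).
fn reduce(ops_oldest_first: &[Op]) -> Option<Reduced> {
    let mut state: Option<Reduced> = None;
    for op in ops_oldest_first {
        state = Some(match (state, op) {
            // A put or delete replaces whatever came before it.
            (_, Op::Put(v)) => Reduced::Value(v.clone()),
            (_, Op::Delete) => Reduced::Tombstone,
            // No base in the batch yet: the result stays a Merge row.
            (None, Op::Merge(m)) => Reduced::Merge(m.clone()),
            (Some(Reduced::Merge(mut acc)), Op::Merge(m)) => {
                acc.extend_from_slice(m);
                Reduced::Merge(acc)
            }
            // A base exists in the batch: the result becomes a plain value.
            (Some(Reduced::Tombstone), Op::Merge(m)) => Reduced::Value(m.clone()),
            (Some(Reduced::Value(mut acc)), Op::Merge(m)) => {
                acc.extend_from_slice(m);
                Reduced::Value(acc)
            }
        });
    }
    state
}
```

A batch holding only Merge("a") and Merge("b") reduces to Merge("ab"), whereas Put("v") followed by Merge("a") reduces to Value("va").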

Memtable flush and compaction use the same merge logic before they write new SSTs. Those rewrite paths are stricter than reads. They keep rows with different expire_ts values in separate groups so they can expire independently, and they stop at snapshot and durability retention boundaries so active snapshots or remote readers can still see the version chain they need.
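
To make the expiry grouping concrete, here is a loose model that merges only neighboring operands that share an expire_ts. The real grouping logic may differ; treating groups as adjacent runs is an assumption of this sketch, compact_operands is a hypothetical name, and snapshot and durability boundaries are not modeled.

```rust
/// One merge operand with its optional expiry timestamp (hypothetical model).
type Operand = (Vec<u8>, Option<i64>);

/// Concatenate neighboring operands that share an expire_ts, keeping
/// operands with different expiries as separate rows.
fn compact_operands(operands_oldest_first: &[Operand]) -> Vec<Operand> {
    let mut out: Vec<Operand> = Vec::new();
    for (bytes, expire_ts) in operands_oldest_first {
        // Does the most recent output row have the same expiry?
        let same_group = matches!(out.last(), Some((_, last)) if last == expire_ts);
        if same_group {
            // Same expiry: fold into the previous row.
            out.last_mut().unwrap().0.extend_from_slice(bytes);
        } else {
            // Different expiry: start a new row so it can expire on its own.
            out.push((bytes.clone(), *expire_ts));
        }
    }
    out
}
```

Under this model, a read would still combine all of the resulting rows, but each row written to the new SST keeps its own expiry.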

Raw APIs

WalReader and WAL-based Change Data Capture do not apply the merge operator. They expose raw ValueDeletable::Merge rows. Downstream consumers need to materialize those operands themselves if they need final values.
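
A downstream consumer might materialize final values along these lines. RawRow and apply are hypothetical stand-ins for whatever the consumer decodes from the raw stream, not the WalReader API, and the merge rule is assumed to match the writer's concatenating operator. Rows must be applied in commit order.

```rust
use std::collections::HashMap;

/// A raw row as a consumer might decode it (hypothetical model).
enum RawRow {
    Value(Vec<u8>),
    Tombstone,
    Merge(Vec<u8>),
}

/// Apply one raw row to the consumer's materialized view.
fn apply(state: &mut HashMap<Vec<u8>, Vec<u8>>, key: &[u8], row: &RawRow) {
    match row {
        RawRow::Value(v) => {
            state.insert(key.to_vec(), v.clone());
        }
        RawRow::Tombstone => {
            state.remove(key);
        }
        // Apply the same rule the writer's operator would: append the operand.
        RawRow::Merge(op) => state
            .entry(key.to_vec())
            .or_default()
            .extend_from_slice(op),
    }
}
```

The consumer's merge rule has to stay compatible with the operator used by the database, otherwise the materialized values will diverge from what reads return.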

Configuration

Every process that reads or rewrites stored merge rows needs compatible merge logic.

Writers use DbBuilder::with_merge_operator.
Standalone compactors use CompactorBuilder::with_merge_operator.
Read-only handles use DbReaderBuilder::with_merge_operator.

If one process writes merge operands and another opens the same database without a compatible operator, the raw rows can still exist in storage, but later reads, memtable flushes, or compaction can fail when they reach them.