# Merge Operators

> How SlateDB stores merge operands and resolves them on reads and rewrites

[`MergeOperator`](https://docs.rs/slatedb/latest/slatedb/trait.MergeOperator.html) lets writers record partial updates with [`Db::merge`](https://docs.rs/slatedb/latest/slatedb/struct.Db.html#method.merge), [`Db::merge_with_options`](https://docs.rs/slatedb/latest/slatedb/struct.Db.html#method.merge_with_options), and [`WriteBatch::merge`](https://docs.rs/slatedb/latest/slatedb/struct.WriteBatch.html#method.merge). Those APIs append merge operands to the row history instead of forcing every writer to read the current value first. SlateDB resolves that history later with the operator you install.

## Example

This example uses a merge operator that concatenates string fragments.

```rust
use bytes::Bytes;
use slatedb::{Db, Error, MergeOperator, MergeOperatorError};
use slatedb::object_store::{memory::InMemory, ObjectStore};
use std::sync::Arc;

struct StringConcatMergeOperator;

impl MergeOperator for StringConcatMergeOperator {
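    // Appends `operand` to the accumulated value; a missing value is
    // treated as an empty byte string.
    fn merge(
        &self,
        _key: &Bytes,
        existing_value: Option<Bytes>,
        operand: Bytes,
    ) -> Result<Bytes, MergeOperatorError> {
        let mut result = existing_value.unwrap_or_default().to_vec();
        result.extend_from_slice(&operand);
        Ok(Bytes::from(result))
    }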
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    let object_store: Arc<dyn ObjectStore> = Arc::new(InMemory::new());
    let db = Db::builder("example", object_store)
        .with_merge_operator(Arc::new(StringConcatMergeOperator))
        .build()
        .await?;

    db.merge(b"greeting", b"hello, ").await?;
    db.merge(b"greeting", b"world").await?;

    let value = db.get(b"greeting").await?.unwrap();
    assert_eq!(value.as_ref(), b"hello, world");
    Ok(())
}
```

## Resolution

For one key, SlateDB reads the newest visible row first and walks backward until it reaches a plain value, a tombstone, or the end of history. It then applies the newer operands from oldest to newest on top of that base. The histories below are written oldest to newest:

- `Value -> Merge -> Merge` reads as the base value plus both operands.
- `Tombstone -> Merge -> Merge` ignores everything older than the delete and applies only the newer operands.
- `Merge -> Merge` returns the merged operand bytes even though no base value exists yet.

When SlateDB flushes or compacts data, that last case may remain a single `Merge` row instead of turning into a plain value, because SlateDB still has merge operands but no base value for the key. If older SST levels contain an older value or tombstone for that key, a later read or compaction can still combine that older state with the newer merged operand.
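The walk described in this section can be sketched with plain standard-library types. This is a minimal illustration, not SlateDB's implementation: the `Row` enum, the `concat` helper, and `resolve` are hypothetical names, and the history is passed newest first to mirror the direction of the walk.

```rust
/// Simplified stand-in for a stored row (hypothetical, not SlateDB's type).
#[derive(Clone, Debug, PartialEq)]
enum Row {
    Value(Vec<u8>),
    Tombstone,
    Merge(Vec<u8>),
}

/// Hypothetical concat operator: append `operand` to the accumulated bytes.
fn concat(existing: Option<Vec<u8>>, operand: &[u8]) -> Vec<u8> {
    let mut out = existing.unwrap_or_default();
    out.extend_from_slice(operand);
    out
}

/// Resolve one key's history, given newest first.
/// Returns `None` when the key resolves to "not found".
fn resolve(history_newest_first: &[Row]) -> Option<Vec<u8>> {
    let mut operands: Vec<&[u8]> = Vec::new(); // collected newest first
    let mut base: Option<Vec<u8>> = None;

    // Walk backward until a plain value, a tombstone, or the end of history.
    for row in history_newest_first {
        match row {
            Row::Merge(op) => operands.push(op.as_slice()),
            Row::Value(v) => {
                base = Some(v.clone());
                break;
            }
            Row::Tombstone => break, // everything older is ignored
        }
    }

    // Apply the collected operands from oldest to newest on top of the base.
    let mut acc = base;
    for op in operands.iter().rev() {
        acc = Some(concat(acc, op));
    }
    acc
}
```

With this sketch, a history of only `Merge` rows still yields the merged operand bytes, and a history that ends at a tombstone with no newer operands yields `None`.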

## Contract

The `MergeOperator` trait has two methods:

- [`merge`](https://docs.rs/slatedb/latest/slatedb/trait.MergeOperator.html#tymethod.merge) combines one operand with an optional accumulated value.
- [`merge_batch`](https://docs.rs/slatedb/latest/slatedb/trait.MergeOperator.html#method.merge_batch) combines an optional accumulated value with a slice of operands ordered from oldest to newest.

The default `merge_batch` implementation calls `merge` pairwise. Override it if you can process a batch more efficiently.
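To illustrate why an override can pay off, the sketch below uses a local stand-in trait, `BatchMerge`, that mirrors the shape of the two methods. It is not the real `MergeOperator` trait: it uses `Vec<u8>` instead of `Bytes`, drops the key parameter, and omits the `Result` return type. `PairwiseConcat` relies on the pairwise default, while `BatchedConcat` sizes its output buffer once.

```rust
/// Stand-in for the two-method contract (hypothetical, simplified).
trait BatchMerge {
    fn merge(&self, existing: Option<Vec<u8>>, operand: &[u8]) -> Vec<u8>;

    /// Default: fold the operands pairwise, oldest to newest.
    fn merge_batch(&self, existing: Option<Vec<u8>>, operands: &[Vec<u8>]) -> Vec<u8> {
        let mut acc = existing;
        for op in operands {
            acc = Some(self.merge(acc, op));
        }
        acc.unwrap_or_default()
    }
}

/// Shared concat step used by both operators.
fn append(existing: Option<Vec<u8>>, operand: &[u8]) -> Vec<u8> {
    let mut out = existing.unwrap_or_default();
    out.extend_from_slice(operand);
    out
}

/// Uses the pairwise default.
struct PairwiseConcat;

impl BatchMerge for PairwiseConcat {
    fn merge(&self, existing: Option<Vec<u8>>, operand: &[u8]) -> Vec<u8> {
        append(existing, operand)
    }
}

/// Overrides `merge_batch` to allocate the output buffer once.
struct BatchedConcat;

impl BatchMerge for BatchedConcat {
    fn merge(&self, existing: Option<Vec<u8>>, operand: &[u8]) -> Vec<u8> {
        append(existing, operand)
    }

    fn merge_batch(&self, existing: Option<Vec<u8>>, operands: &[Vec<u8>]) -> Vec<u8> {
        let base_len = existing.as_ref().map_or(0, |v| v.len());
        let total = base_len + operands.iter().map(|o| o.len()).sum::<usize>();
        let mut out = Vec::with_capacity(total);
        if let Some(v) = existing {
            out.extend_from_slice(&v);
        }
        for op in operands {
            out.extend_from_slice(op);
        }
        out
    }
}
```

Both operators must return the same bytes for the same input; the override only changes how the work is done.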

The operator must be associative: merging `a` with the result of merging `b` and `c` must produce the same bytes as merging the result of `a` and `b` with `c`. SlateDB may regroup operands during reads, memtable flush, and compaction, so the result has to stay stable when those boundaries move. SlateDB installs one merge operator per database, but your implementation can dispatch by key prefix or value format if you need different merge rules.
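A quick way to sanity-check associativity is to compare both groupings on sample inputs. The helper below is a hypothetical test utility, not part of SlateDB; it works on any binary function and is meant for unit tests of your own merge logic.

```rust
/// Returns true if `f` gives the same result for both groupings of `a`, `b`, `c`.
/// A passing check on samples does not prove associativity, but a failing one
/// disproves it.
fn is_associative<T: Clone + PartialEq>(
    f: impl Fn(&T, &T) -> T,
    a: &T,
    b: &T,
    c: &T,
) -> bool {
    let left = f(&f(a, b), c); // (a . b) . c
    let right = f(a, &f(b, c)); // a . (b . c)
    left == right
}

/// String concatenation: safe to regroup.
fn concat(x: &String, y: &String) -> String {
    format!("{x}{y}")
}

/// Integer subtraction: NOT safe to regroup, so unsuitable as a merge rule.
fn subtract(x: &i64, y: &i64) -> i64 {
    x - y
}
```

Concatenation and addition pass this check. Subtraction fails it, which means a subtraction-based operator could return different results depending on where a flush or compaction happened to group the operands.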

## Where Merges Happen

### Reads

Normal reads wrap the iterator stack with [`MergeOperatorIterator`](https://github.com/slatedb/slatedb/blob/main/slatedb/src/merge_operator.rs). Point lookups and scans can merge operands across the write batch, memtable, immutable memtables, L0 SSTs, and sorted runs.

The read path merges operands even when their `expire_ts` values differ. The returned row uses the minimum `expire_ts` across the merged operands and any base value. [Time](/docs/design/time) covers the TTL rules in more detail.
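The minimum rule can be sketched as a fold over optional timestamps. This sketch assumes that `expire_ts` is modeled as `Option<i64>` and that `None` means "never expires", so it never wins the minimum; both are assumptions for illustration, and the function names are hypothetical.

```rust
/// Combine two optional expiry timestamps.
/// Assumption: `None` means "never expires" and so never wins the minimum.
fn min_expire_ts(a: Option<i64>, b: Option<i64>) -> Option<i64> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x.min(y)),
        (Some(x), None) | (None, Some(x)) => Some(x),
        (None, None) => None,
    }
}

/// Expiry of a merged row, given the expiry of each operand and any base value.
fn merged_expire_ts(rows: &[Option<i64>]) -> Option<i64> {
    rows.iter().copied().fold(None, min_expire_ts)
}
```

Under these assumptions, merging rows with expiries `Some(300)`, `None`, and `Some(120)` produces a row that expires at `120`.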

If no merge operator is configured and a read encounters a `Merge` row, SlateDB returns `MergeOperatorMissing`.

### Rewrites

Inside a committed [`WriteBatch`](https://docs.rs/slatedb/latest/slatedb/struct.WriteBatch.html), SlateDB reduces consecutive merge operations for the same key when the writer has a merge operator configured. If the batch also contains a value or tombstone base for that key, the reduced row becomes a plain value. If it has only operands, the reduced row stays a `Merge`.

Memtable flush and compaction use the same merge logic before they write new SSTs. Those rewrite paths are stricter than reads. They keep rows with different `expire_ts` values in separate groups so they can expire independently, and they stop at snapshot and durability retention boundaries so active snapshots or remote readers can still see the version chain they need.

### Raw APIs

[`WalReader`](https://docs.rs/slatedb/latest/slatedb/struct.WalReader.html) and WAL-based [Change Data Capture](/docs/design/change-data-capture) do not apply the merge operator. They expose raw `ValueDeletable::Merge` rows. Downstream consumers need to materialize those operands themselves if they need final values.
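A downstream consumer that needs final values has to replay the raw rows through its own copy of the merge logic. The sketch below shows one way to do that with an in-memory map. The `Change` enum, `concat`, and `apply` are hypothetical stand-ins, not the `WalReader` or CDC API, and the merge rule is the concat operator from the example above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one raw change read from the log.
enum Change {
    Put(Vec<u8>),
    Delete,
    Merge(Vec<u8>),
}

/// The consumer's copy of the merge rule (concat, as an example).
fn concat(existing: Option<Vec<u8>>, operand: &[u8]) -> Vec<u8> {
    let mut out = existing.unwrap_or_default();
    out.extend_from_slice(operand);
    out
}

/// Apply one change, in log order, to the materialized state.
fn apply(state: &mut HashMap<Vec<u8>, Vec<u8>>, key: &[u8], change: Change) {
    match change {
        Change::Put(value) => {
            state.insert(key.to_vec(), value);
        }
        Change::Delete => {
            state.remove(key);
        }
        Change::Merge(operand) => {
            // A merge on a missing or deleted key starts from no base value.
            let existing = state.remove(key);
            state.insert(key.to_vec(), concat(existing, &operand));
        }
    }
}
```

Because changes are applied in log order, the consumer never has to walk history backward; it only needs the same merge rule the database uses.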

## Configuration

Every process that reads or rewrites stored merge rows needs compatible merge logic.

- Writers use [`DbBuilder::with_merge_operator`](https://docs.rs/slatedb/latest/slatedb/struct.DbBuilder.html#method.with_merge_operator).
- Standalone compactors use [`CompactorBuilder::with_merge_operator`](https://docs.rs/slatedb/latest/slatedb/struct.CompactorBuilder.html#method.with_merge_operator).
- Read-only handles use [`DbReaderBuilder::with_merge_operator`](https://docs.rs/slatedb/latest/slatedb/struct.DbReaderBuilder.html#method.with_merge_operator).

If one process writes merge operands and another opens the same database without a compatible operator, the raw rows can still exist in storage, but later reads, memtable flushes, or compaction can fail when they reach them.
