Custom Comparators

<!— Replace “RFC Title” with your RFC’s short, descriptive title. —>

Status: Rejected

Authors:

Almog Gavra

> [!CAUTION] > This RFC was rejected. See the Rejection Analysis for more details.

<!— TOC —><a name=“summary-motivation”></a>

Summary & Motivation

This RFC proposes a new API for customizing the ordering of keys in SlateDB. The ordering of keys in Slate is a critical component of the performance characteristics of the database and its ability to efficiently serve range queries.

The current implementation of SlateDB uses a lexicographic ordering of keys. This is a good default for many use cases, but it may not always be the best choice. Some examples of use cases where a different ordering is desirable include:

Correct semantics for non-string keys (e.g. sorting floating point numbers)
Sorting composite keys (e.g. a composite key of [user_id, timestamp] may sort ascending by user_id and descending by timestamp)
Domain-specific ordering (e.g. Z-ordering for geographic data)

<!— TOC —><a name=“goals”></a>

Goals

Provide a new API for customizing the ordering of keys in SlateDB
Maintain backwards compatibility by defaulting to the existing lexicographic ordering
Best-effort enforcing of the ordering invariant (e.g. a comparator doesn’t change after first deployment)

<!— TOC —><a name=“non-goals”></a>

Non-Goals

Providing default implementations for non-lexicographic use cases
Supporting changes to the ordering invariant after first deployment

<!— TOC —><a name=“design”></a>

Design

<!— A detailed description of the proposed change. Include diagrams, examples, schemas, and pseudo-code as appropriate. —>

<!— TOC —><a name=“api”></a>

API

We will introduce a new interface for customizing the ordering of keys in SlateDB, the KeyComparator trait:

use std::cmp::Ordering;
use std::sync::Arc;

/// Custom key comparison trait.
/// WARNING: Changing comparators on existing DB causes corruption.
pub trait KeyComparator: Send + Sync + std::fmt::Debug {
    /// Compare two keys.
    fn compare(&amp;self, a: &amp;[u8], b: &amp;[u8]) -&gt; Ordering;

    /// Unique identifier stored in manifest for validation.
    fn name(&amp;self) -&gt; &str;
}

pub type KeyComparatorType = Arc&lt;dyn KeyComparator&gt;;

This will be passed in when constructing a new Db instance with the DbBuilder:

let db = DbBuilder::new(path)
    // ...
    .with_key_comparator(key_comparator)
    .build();

<!— TOC —><a name=“manifest”></a>

Manifest

We will introduce a new field to the manifest format to store the name of the comparator:

table ManifestV1 {
    // ...
    key_comparator_name: string;
}

This field will be used to validate that the comparator is not changed after first deployment. if it is set in the manifest, a comparator with the same name must be used to create the database in runtime (no comparator will also cause a panic). If it is not set and the manifest already exists, the database will panic if a custom comparator is supplied at runtime.

<!— TOC —><a name=“implementation”></a>

Implementation

We’ll need to modify each of these places to use the new KeyComparator passed in from the DbBuilder. Note that because it’s a trait, we’ll need to use dynamic dispatch to call the comparator.

> [!NOTE] > Since comparisons happen on a hot path, we may want to avoid using the dyn KeyComparator > if not necessary to avoid the overhead of dynamic dispatch. Instead, we can choose to > use an Option<KeyComparatorType> to store the comparator and inline the default > lexicographic comparator if not set. > > The alternative would be to always use the dyn KeyComparator and just implement a default > implementation of it. This would be simpler, but it would add a small overhead to every > comparison.

The following table was generated via an agent, but it seems to be accurate as far as I can tell:

Location	Usage
`slatedb/src/bytes_range.rs:8`<br>`slatedb/src/comparable_range.rs:8`	`BytesRange` and its `ComparableRange` backing types rely on the default `Ord` implementation for `Bytes` when comparing start/end bounds (via `cmp_bound`). Their logic powers `range.contains`, `intersect`, and conversions that every range-aware component (memtables, SST views, sorted runs, iterator seek validation) uses.
`slatedb/src/mem_table.rs:36,189`<br>`slatedb/src/batch.rs:153,409`	`SequencedKey::cmp` keeps the memtable `SkipMap` and write-batch `BTreeMap` ordered by ascending user key and descending sequence number. `MemTableIterator::seek` and `WriteBatchIterator::seek` compare `RowEntry.key` / `key.user_key` to `next_key`, and `WriteBatch::remove_ops_by_key` issues `start..=end` range deletes that depend on this ordering.
`slatedb/src/merge_iterator.rs:21-119`	`MergeIterator` keeps a binary heap ordered by `RowEntry.key.cmp` (and reverse seq on ties) and short-circuits child seeks when `next_kv.key >= next_key`. This governs how memtable/WAL/SST iterators are merged for reads and compactions.
`slatedb/src/block_iterator.rs:73-135`<br>`slatedb/src/sst_iter.rs:93-552`<br>`slatedb/src/partitioned_keyspace.rs:13-80`	Block iterators skip entries with `kv.key < next_key`. Above the block layer, `SstView::contains`/`key_exceeds`, `InternalSstIterator::blocks_covering_view`, and `SstIterator::seek` compare requested keys to SST range bounds and block first-keys (via the partitioned keyspace helpers) to jump to the first block whose contents are `>= next_key`.
`slatedb/src/db_state.rs:137-303`<br>`slatedb/src/sorted_run_iterator.rs:176-188`	`SsTableHandle::intersects_range`, `SortedRun::find_sst_with_range_covering_key`, and `table_idx_covering_range` compare SST start keys and requested ranges to decide which files overlap a read/compaction. `SortedRunIterator::seek` then drops whole SSTs whose `first_key < next_key` before seeking inside the current SST iterator.
`slatedb/src/db_iter.rs:56-69`<br>`slatedb/src/db_iter.rs:341-357`	`DbIteratorRangeTracker` maintains the min/max key observed via `<`/`>` comparisons for SSI conflict detection, and `DbIterator::seek` enforces that callers only seek forward and stay inside the configured `BytesRange` by comparing `next_key` to `last_key` and the range bounds.
`slatedb/src/reader.rs:512-528`	The SST-building helper sorts buffered `RowEntry`s with `a.key.cmp(&b.key)` before emitting a table, modeling the requirement that compaction/flush writers provide keys in comparator order.

For use of the comparator in the memtable, we will need to modify the SequencedKey type to include a reference to the comparator and override the Ord implementation to use it. All other locations can use the comparator reference directly.

<!— TOC —><a name=“impact-analysis”></a>

Impact Analysis

SlateDB features and components that this RFC interacts with. Check all that apply.

<!— TOC —><a name=“core-api-query-semantics”></a>

Core API & Query Semantics

Basic KV API (get/put/delete)
Range queries, iterators, seek semantics
Range deletions
Error model, API errors

Range query semantics will be affected by this change since the comparator is used to determine the order of keys in the range.

<!— TOC —><a name=“consistency-isolation-and-multi-versioning”></a>

Consistency, Isolation, and Multi-Versioning

Transactions
Snapshots
Sequence numbers

DbIteratorRangeTracker maintains the min/max key observed via </> comparisons for SSI conflict detection. Since the comparator is used to determine the order of keys, the min/max key observed will be in the order of the comparator (this should be consistent with the query semantics so the change is mostly transparent to the user).

<!— TOC —><a name=“time-retention-and-derived-state”></a>

Time, Retention, and Derived State

N/A - the change should not affect time, retention, and derived state.

<!— TOC —><a name=“metadata-coordination-and-lifecycles”></a>

Metadata, Coordination, and Lifecycles

We will add a new field to the manifest format to store the name of the comparator to serve as a basic validation mechanism that ensures the comparator is not changed after first deployment (this is rudimentary and does not prevent changes if the comparator uses the same name).

<!— TOC —><a name=“compaction”></a>

Compaction

Compactions will need to use the comparator to sort the keys in the SSTs.

<!— TOC —><a name=“storage-engine-internals”></a>

Storage Engine Internals

The indexes and SSTs will be sorted by the comparator

<!— TOC —><a name=“ecosystem-operations”></a>

Ecosystem & Operations

CLI tools
Language bindings (Go/Python/etc)
Observability (metrics/logging/tracing)

<!— TOC —><a name=“operations”></a>

Operations

<!— TOC —><a name=“performance-cost”></a>

Performance & Cost

<!— Describe performance and cost implications of this change. —>

There will potentially be a performance impact in situations that perform many comparisons, such as compactions due to the cost of dynamic dispatch. As part of implementation we will benchmark the performance impact and decide whether or not to use the Option<KeyComparatorType> approach outlined in the implementation section above.

<!— TOC —><a name=“observability”></a>

Observability

<!— Describe any operational changes required to support this change. —>

This will require a new configuration option to set the key comparator.

<!— TOC —><a name=“compatibility”></a>

Compatibility

Existing deployments will set an empty string for the key comparator name in the manifest the first time it is deployed with a version including this RFC. If the field does not exist but the manifest already does, a custom comparator will not be allowed.

<!— TOC —><a name=“testing”></a>

Testing

We should have comprehensive tests to ensure we don’t break the ordering invariant going forward. All main code paths should be covered, in particular queriyng and compactions.

<!— TOC —><a name=“rollout”></a>

Rollout

<!— Describe the plan for rolling out this change to production. —>

Milestones / phases:
Feature flags / opt-in:
Docs updates:

<!— TOC —><a name=“alternatives”></a>

Alternatives

N/A - the only alternative is to not use a custom comparator.

<!— TOC —><a name=“open-questions”></a>

Open Questions

The biggest option question is whether to use the Option<KeyComparatorType> approach or the dyn KeyComparator approach (see implementation section above).

<!— TOC —><a name=“references”></a>

References

<!— Bullet list of related issues, PRs, RFCs, papers, docs, discord discussions, etc. —>

github.com/slatedb/slatedb/issues/663
github.com/crossbeam-rs/crossbeam/issues/1180

<!— TOC —><a name=“updates”></a>

Updates

<!— TOC —><a name=“rejection-analysis”></a>

Rejection Analysis

After discussion among maintainers, this RFC was rejected for the following reasons:

Complexity vs. benefit: Custom comparators add significant implementation complexity (dynamic dispatch on hot paths, validation logic, manifest changes) for marginal benefit (there are alternative ways to structure most keys without custom comparators).
Bug potential: RocksDB and Pebble have experienced bugs and regressions related to custom comparators. The additional code paths and invariants are difficult to test exhaustively.
Industry precedent: FoundationDB, DynamoDB, TiKV and CockroachDB intentionally do not expose custom comparators—bytewise comparison only, by design. Users encode keys to sort correctly using their well-documented tuple layer. This approach has proven successful at scale.
Use cases are solvable with encoding: All motivating use cases can be addressed through proper key encoding:
- Floating points: flip sign bit for positives, flip all bits for negatives
- Composite keys: use 0x00 delimiters or fixed-length components
- Descending order: XOR bytes with 0xFF or subtract from max value
- Geospatial: Z-order and Hilbert curves encode to sortable integers
Documentation over API complexity: Providing clear documentation on key encoding patterns (with links to libraries like storekey) is preferable to exposing a complex API that enables subtle data corruption if misused.

See the Data Modeling documentation for recommended key encoding patterns.

See the Github Discussion for more details.