# Custom Comparators Status: Rejected Authors: * [Almog Gavra](https://github.com/agavra) > [!CAUTION] > This RFC was rejected. See the [Rejection Analysis](#rejection-analysis) for more details. ## Summary & Motivation This RFC proposes a new API for customizing the ordering of keys in SlateDB. The ordering of keys in Slate is a critical component of the performance characteristics of the database and its ability to efficiently serve range queries. The current implementation of SlateDB uses a lexicographic ordering of keys. This is a good default for many use cases, but it may not always be the best choice. Some examples of use cases where a different ordering is desirable include: - Correct semantics for non-string keys (e.g. sorting floating point numbers) - Sorting composite keys (e.g. a composite key of `[user_id, timestamp]` may sort ascending by `user_id` and descending by `timestamp`) - Domain-specific ordering (e.g. Z-ordering for geographic data) ## Goals - Provide a new API for customizing the ordering of keys in SlateDB - Maintain backwards compatibility by defaulting to the existing lexicographic ordering - Best-effort enforcing of the ordering invariant (e.g. a comparator doesn't change after first deployment) ## Non-Goals - Providing default implementations for non-lexicographic use cases - Supporting changes to the ordering invariant after first deployment ## Design ### API We will introduce a new interface for customizing the ordering of keys in SlateDB, the `KeyComparator` trait: ```rust use std::cmp::Ordering; use std::sync::Arc; /// Custom key comparison trait. /// WARNING: Changing comparators on existing DB causes corruption. pub trait KeyComparator: Send + Sync + std::fmt::Debug { /// Compare two keys. fn compare(&self, a: &[u8], b: &[u8]) -> Ordering; /// Unique identifier stored in manifest for validation. fn name(&self) -> &str; } pub type KeyComparatorType = Arc; ``` This will be passed in when constructing a new `Db` instance with the `DbBuilder`: ```rust let db = DbBuilder::new(path) // ... .with_key_comparator(key_comparator) .build(); ``` ### Manifest We will introduce a new field to the manifest format to store the name of the comparator: ```proto table ManifestV1 { // ... key_comparator_name: string; } ``` This field will be used to validate that the comparator is not changed after first deployment. if it is set in the manifest, a comparator with the same name must be used to create the database in runtime (no comparator will also cause a panic). If it is not set and the manifest already exists, the database will panic if a custom comparator is supplied at runtime. ### Implementation We'll need to modify each of these places to use the new `KeyComparator` passed in from the `DbBuilder`. Note that because it's a trait, we'll need to use dynamic dispatch to call the comparator. > [!NOTE] > Since comparisons happen on a hot path, we may want to avoid using the `dyn KeyComparator` > if not necessary to avoid the overhead of dynamic dispatch. Instead, we can choose to > use an `Option` to store the comparator and inline the default > lexicographic comparator if not set. > > The alternative would be to always use the `dyn KeyComparator` and just implement a default > implementation of it. This would be simpler, but it would add a small overhead to every > comparison. The following table was generated via an agent, but it seems to be accurate as far as I can tell: | Location | Usage | | --- | --- | | `slatedb/src/bytes_range.rs:8`
`slatedb/src/comparable_range.rs:8` | `BytesRange` and its `ComparableRange` backing types rely on the default `Ord` implementation for `Bytes` when comparing start/end bounds (via `cmp_bound`). Their logic powers `range.contains`, `intersect`, and conversions that every range-aware component (memtables, SST views, sorted runs, iterator seek validation) uses. | | `slatedb/src/mem_table.rs:36,189`
`slatedb/src/batch.rs:153,409` | `SequencedKey::cmp` keeps the memtable `SkipMap` and write-batch `BTreeMap` ordered by ascending user key and descending sequence number. `MemTableIterator::seek` and `WriteBatchIterator::seek` compare `RowEntry.key` / `key.user_key` to `next_key`, and `WriteBatch::remove_ops_by_key` issues `start..=end` range deletes that depend on this ordering. | | `slatedb/src/merge_iterator.rs:21-119` | `MergeIterator` keeps a binary heap ordered by `RowEntry.key.cmp` (and reverse seq on ties) and short-circuits child seeks when `next_kv.key >= next_key`. This governs how memtable/WAL/SST iterators are merged for reads and compactions. | | `slatedb/src/block_iterator.rs:73-135`
`slatedb/src/sst_iter.rs:93-552`
`slatedb/src/partitioned_keyspace.rs:13-80` | Block iterators skip entries with `kv.key < next_key`. Above the block layer, `SstView::contains`/`key_exceeds`, `InternalSstIterator::blocks_covering_view`, and `SstIterator::seek` compare requested keys to SST range bounds and block first-keys (via the partitioned keyspace helpers) to jump to the first block whose contents are `>= next_key`. | | `slatedb/src/db_state.rs:137-303`
`slatedb/src/sorted_run_iterator.rs:176-188` | `SsTableHandle::intersects_range`, `SortedRun::find_sst_with_range_covering_key`, and `table_idx_covering_range` compare SST start keys and requested ranges to decide which files overlap a read/compaction. `SortedRunIterator::seek` then drops whole SSTs whose `first_key < next_key` before seeking inside the current SST iterator. | | `slatedb/src/db_iter.rs:56-69`
`slatedb/src/db_iter.rs:341-357` | `DbIteratorRangeTracker` maintains the min/max key observed via `<`/`>` comparisons for SSI conflict detection, and `DbIterator::seek` enforces that callers only seek forward and stay inside the configured `BytesRange` by comparing `next_key` to `last_key` and the range bounds. | | `slatedb/src/reader.rs:512-528` | The SST-building helper sorts buffered `RowEntry`s with `a.key.cmp(&b.key)` before emitting a table, modeling the requirement that compaction/flush writers provide keys in comparator order. | For use of the comparator in the memtable, we will need to modify the `SequencedKey` type to include a reference to the comparator and override the `Ord` implementation to use it. All other locations can use the comparator reference directly. ## Impact Analysis SlateDB features and components that this RFC interacts with. Check all that apply. ### Core API & Query Semantics - [ ] Basic KV API (`get`/`put`/`delete`) - [x] Range queries, iterators, seek semantics - [x] Range deletions - [ ] Error model, API errors Range query semantics will be affected by this change since the comparator is used to determine the order of keys in the range. ### Consistency, Isolation, and Multi-Versioning - [x] Transactions - [ ] Snapshots - [ ] Sequence numbers `DbIteratorRangeTracker` maintains the min/max key observed via comparisons for SSI conflict detection. Since the comparator is used to determine the order of keys, the min/max key observed will be in the order of the comparator (this should be consistent with the query semantics so the change is mostly transparent to the user). ### Time, Retention, and Derived State - [ ] Logical clocks - [ ] Time to live (TTL) - [ ] Compaction filters - [ ] Merge operator - [ ] Change Data Capture (CDC) N/A - the change should not affect time, retention, and derived state. ### Metadata, Coordination, and Lifecycles - [x] Manifest format - [ ] Checkpoints - [ ] Clones - [ ] Garbage collection - [ ] Database splitting and merging - [ ] Multi-writer We will add a new field to the manifest format to store the name of the comparator to serve as a basic validation mechanism that ensures the comparator is not changed after first deployment (this is rudimentary and does not prevent changes if the comparator uses the same name). ### Compaction - [x] Compaction state persistence - [ ] Compaction filters - [ ] Compaction strategies - [ ] Distributed compaction - [ ] Compactions format Compactions will need to use the comparator to sort the keys in the SSTs. ### Storage Engine Internals - [ ] Write-ahead log (WAL) - [ ] Block cache - [ ] Object store cache - [x] Indexing (bloom filters, metadata) - [x] SST format or block format The indexes and SSTs will be sorted by the comparator ### Ecosystem & Operations - [ ] CLI tools - [ ] Language bindings (Go/Python/etc) - [ ] Observability (metrics/logging/tracing) ## Operations ### Performance & Cost There will potentially be a performance impact in situations that perform many comparisons, such as compactions due to the cost of dynamic dispatch. As part of implementation we will benchmark the performance impact and decide whether or not to use the `Option` approach outlined in the implementation section above. ### Observability This will require a new configuration option to set the key comparator. ### Compatibility Existing deployments will set an empty string for the key comparator name in the manifest the first time it is deployed with a version including this RFC. If the field does not exist but the manifest already does, a custom comparator will not be allowed. ## Testing We should have comprehensive tests to ensure we don't break the ordering invariant going forward. All main code paths should be covered, in particular queriyng and compactions. ## Rollout - Milestones / phases: - Feature flags / opt-in: - Docs updates: ## Alternatives N/A - the only alternative is to not use a custom comparator. ## Open Questions The biggest option question is whether to use the `Option` approach or the `dyn KeyComparator` approach (see implementation section above). ## References - github.com/slatedb/slatedb/issues/663 - github.com/crossbeam-rs/crossbeam/issues/1180 ## Updates ## Rejection Analysis After discussion among maintainers, this RFC was rejected for the following reasons: 1. **Complexity vs. benefit**: Custom comparators add significant implementation complexity (dynamic dispatch on hot paths, validation logic, manifest changes) for marginal benefit (there are alternative ways to structure most keys without custom comparators). 2. **Bug potential**: RocksDB and Pebble have experienced bugs and regressions related to custom comparators. The additional code paths and invariants are difficult to test exhaustively. 3. **Industry precedent**: FoundationDB, DynamoDB, TiKV and CockroachDB intentionally do not expose custom comparators—bytewise comparison only, by design. Users encode keys to sort correctly using their well-documented tuple layer. This approach has proven successful at scale. 4. **Use cases are solvable with encoding**: All motivating use cases can be addressed through proper key encoding: - Floating points: flip sign bit for positives, flip all bits for negatives - Composite keys: use `0x00` delimiters or fixed-length components - Descending order: XOR bytes with `0xFF` or subtract from max value - Geospatial: Z-order and Hilbert curves encode to sortable integers 5. **Documentation over API complexity**: Providing clear documentation on key encoding patterns (with links to libraries like [storekey](https://github.com/surrealdb/storekey)) is preferable to exposing a complex API that enables subtle data corruption if misused. See the [Data Modeling documentation](https://slatedb.io/docs/operations/data-modeling) for recommended key encoding patterns. See the [Github Discussion](https://github.com/slatedb/slatedb/pull/1162) for more details.