Compression

SlateDB compresses SST data in two different ways. One is built into the row format and is always on. The other is an optional codec you choose for WAL SSTs and regular SSTs.

  • Prefix compression happens inside each SST data block. SstRowCodecV2 encodes each key relative to the previous key, with restart points that store full keys so readers can still seek within the block. This reduces repeated key bytes even when block compression is disabled. The Data Modeling page covers the effect on key layout.
  • Block compression is controlled by Settings::compression_codec and CompressionCodec. The chosen codec applies to every block type in an SST: data, index, filter, and stats blocks.
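To make the prefix compression bullet concrete, here is a minimal sketch of encoding sorted keys relative to the previous key, with periodic restart points. The names `PrefixEntry`, `encode_keys`, `suffix_bytes`, and the `restart_interval` parameter are hypothetical and chosen for illustration. This is not SstRowCodecV2's actual layout, and it ignores the length fields a real format also has to store.

```rust
/// One encoded key: how many leading bytes it shares with the previous key,
/// plus the remaining suffix bytes. Illustrative only, not SlateDB's wire format.
#[derive(Debug, PartialEq)]
struct PrefixEntry {
    shared: usize,
    suffix: Vec<u8>,
}

/// Length of the common prefix of two byte strings.
fn shared_prefix_len(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
}

/// Encode keys that are already sorted. Every `restart_interval`-th entry is a
/// restart point: it stores the full key (shared = 0) and its index is recorded.
fn encode_keys(keys: &[&[u8]], restart_interval: usize) -> (Vec<PrefixEntry>, Vec<usize>) {
    assert!(restart_interval > 0, "restart interval must be positive");
    let mut entries = Vec::with_capacity(keys.len());
    let mut restarts = Vec::new();
    let mut prev: &[u8] = &[];
    for (i, &key) in keys.iter().enumerate() {
        let shared = if i % restart_interval == 0 {
            restarts.push(i);
            0
        } else {
            shared_prefix_len(prev, key)
        };
        entries.push(PrefixEntry { shared, suffix: key[shared..].to_vec() });
        prev = key;
    }
    (entries, restarts)
}

/// Total key bytes actually stored after prefix compression.
fn suffix_bytes(entries: &[PrefixEntry]) -> usize {
    entries.iter().map(|e| e.suffix.len()).sum()
}
```

For the keys `user/0001/name`, `user/0001/zip`, and `user/0002/name` in a single restart region, this sketch stores 23 key bytes instead of 41. With a restart point every two entries, the third key is stored in full and the total rises to 31.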

The available codecs are a build-time choice. The slatedb crate exposes Snappy, Zlib, LZ4, and Zstd behind Cargo features. The compression feature enables Snappy. The zlib, lz4, and zstd features enable those codecs directly.

The selected codec is stored in each SST’s metadata. Changing compression_codec only affects new WAL and SST files. Existing files keep the codec they were written with, so a database can temporarily contain a mix of uncompressed files and files written with different codecs while flush and compaction rewrite older data.
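The behavior described above can be modeled in a few lines. This is a toy model, not SlateDB code: `Codec`, `SstFile`, `write_sst`, `codec_for_read`, and `codec_mix` are hypothetical names. It only illustrates that writers consult the current setting while readers consult per-file metadata.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the codecs named on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Codec {
    Snappy,
    Zlib,
    Lz4,
    Zstd,
}

/// Toy SST handle: the codec is recorded once, when the file is written.
/// `None` means the file was written uncompressed.
struct SstFile {
    codec: Option<Codec>,
}

/// New files take whatever the setting is at write time.
fn write_sst(current_setting: Option<Codec>) -> SstFile {
    SstFile { codec: current_setting }
}

/// Readers ignore the current setting and use the codec stored with the file.
fn codec_for_read(file: &SstFile) -> Option<Codec> {
    file.codec
}

/// Count files per codec, to see the mix while older data is being rewritten.
fn codec_mix(files: &[SstFile]) -> BTreeMap<Option<Codec>, usize> {
    let mut mix = BTreeMap::new();
    for f in files {
        *mix.entry(f.codec).or_insert(0) += 1;
    }
    mix
}
```

In this model, writing two files with no codec and then one file after switching the setting to `Some(Codec::Zstd)` leaves a mix of two uncompressed files and one Zstd file, and each one stays readable with the codec it records.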

Compression reduces object-store bytes, network transfer, and cache footprint. That usually helps storage cost and scan-heavy workloads. The cost is CPU time: blocks must be compressed on write and decompressed on read.

Block size changes the tradeoff. Larger blocks usually compress better. Smaller blocks reduce over-read on point lookups. The Tuning page covers the operational knobs.
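A rough way to reason about over-read is to compare the block size with the size of the entry a point lookup actually wants. The helper names below are hypothetical. The model assumes a lookup loads exactly one whole data block and that the entry is non-empty and no larger than the block, and it works in uncompressed sizes.

```rust
/// Bytes loaded beyond the requested entry when a point lookup has to read
/// one whole data block. Uncompressed sizes, illustrative model only.
fn over_read_bytes(block_size: usize, entry_size: usize) -> usize {
    block_size.saturating_sub(entry_size)
}

/// Ratio of bytes loaded to bytes actually wanted.
/// Assumes `entry_size` is greater than zero.
fn read_amplification(block_size: usize, entry_size: usize) -> f64 {
    block_size as f64 / entry_size as f64
}
```

Under these assumptions, a 128-byte entry in a 4096-byte block gives an amplification of 32, and the same entry in a 65536-byte block gives 512. Compression ratio gains from larger blocks have to be weighed against that growth.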

Prefix compression has its own tradeoff. It helps most when adjacent sorted keys share long prefixes, because that reduces repeated key bytes inside a block. The cost is extra decode work within each restart region: after seeking to a restart point, SlateDB might need to reconstruct several keys before it reaches the target entry.
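The decode cost can be seen in a small seek routine. This is a sketch under assumptions, not SlateDB's reader: `Entry`, `seek`, and the returned decode count are hypothetical. It assumes entries are sorted, that every index in `restarts` points at an entry with `shared == 0`, and that the first entry of the block is a restart point.

```rust
/// Entry in a prefix-compressed block. Illustrative layout, not SlateDB's format.
struct Entry {
    shared: usize,   // bytes reused from the previous key
    suffix: Vec<u8>, // remaining key bytes (the full key at a restart point)
}

/// Find the first key >= `target`.
/// Returns (index of that entry if any, number of entries decoded on the way).
fn seek(entries: &[Entry], restarts: &[usize], target: &[u8]) -> (Option<usize>, usize) {
    // Restart entries hold full keys, so they can be binary searched directly.
    // `pos` is the number of restart points whose key is <= target.
    let pos = restarts.partition_point(|&r| entries[r].suffix.as_slice() <= target);
    let start = if pos == 0 { 0 } else { restarts[pos - 1] };

    // From the chosen restart point, keys must be rebuilt one by one.
    let mut key: Vec<u8> = Vec::new();
    let mut decoded = 0;
    for (i, e) in entries.iter().enumerate().skip(start) {
        key.truncate(e.shared);
        key.extend_from_slice(&e.suffix);
        decoded += 1;
        if key.as_slice() >= target {
            return (Some(i), decoded);
        }
    }
    (None, decoded)
}
```

With keys `app/a` through `app/f` and a restart point every three entries, seeking `app/d` lands on a restart point and decodes one entry, while seeking `app/c` starts at `app/a` and decodes three. A longer restart interval saves more key bytes but lengthens that linear walk.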