Garbage Collection
SlateDB’s garbage collector runs as a background task in the client process, periodically checking for obsolete files in the database storage.
The garbage collector has a configurable minimum age (min_age) and interval for each file type: WAL SSTs, compacted SSTs, and manifests. Setting a file type's options to None disables garbage collection for that type. For each enabled file type, the collector runs once per interval and deletes files that are older than min_age and not referenced by any active manifest or checkpoint.
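The deletion rule described above can be sketched in a few lines of Rust. This is a minimal illustration, not SlateDB's actual API: the type names `FileTypeGcOptions` and `GcOptions`, their fields, and the `is_eligible` helper are all hypothetical, chosen only to mirror the min_age, interval, and None semantics from the text.

```rust
use std::time::Duration;

/// Hypothetical per-file-type settings (illustrative names, not SlateDB's API).
#[derive(Clone, Copy, Debug)]
pub struct FileTypeGcOptions {
    /// How often the GC task for this file type runs.
    pub interval: Duration,
    /// Files must be older than this before they can be deleted.
    pub min_age: Duration,
}

/// Hypothetical top-level settings: `None` disables GC for that file type.
#[derive(Clone, Copy, Debug, Default)]
pub struct GcOptions {
    pub wal: Option<FileTypeGcOptions>,
    pub compacted: Option<FileTypeGcOptions>,
    pub manifest: Option<FileTypeGcOptions>,
}

/// Returns true when a file may be deleted under the rule in the text:
/// GC is enabled for its type, the file is older than `min_age`, and no
/// active manifest or checkpoint references it.
pub fn is_eligible(
    opts: Option<&FileTypeGcOptions>,
    file_age: Duration,
    referenced: bool,
) -> bool {
    match opts {
        // GC disabled for this file type: never delete.
        None => false,
        Some(o) => !referenced && file_age > o.min_age,
    }
}
```

Modeling each file type as an `Option` keeps "disabled" distinct from "enabled with a very large min_age", which matches the None behavior described above.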
Below is a diagram illustrating the high-level flow of garbage collection in SlateDB:
flowchart TD
A["Start GC Cycle (interval timer)"] --> B[Remove Expired Checkpoints from Manifests]
B --> C[Run WAL SST GC Task]
C --> C1[List WAL SSTs older than last compacted ID]
C1 --> C2[Filter by min_age and active references]
C2 --> C3[Delete eligible WAL SSTs]
B --> D[Run Compacted SST GC Task]
D --> D1[List all compacted and L0 SSTs]
D1 --> D2[Gather active SST IDs from manifests]
D2 --> D3[Delete SSTs not referenced and older than min_age]
B --> E[Run Manifest GC Task]
E --> E1["List all manifests (exclude latest)"]
E1 --> E2[Gather active manifest IDs from checkpoints]
E2 --> E3[Delete manifests not referenced and older than min_age]
C3 & D3 & E3 --> G[Wait for Next Interval or Shutdown]
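As a rough sketch of the compacted SST branch of the diagram (D through D3), the following Rust function selects the SSTs that a single pass would delete. It works on in-memory stand-ins rather than object storage, and everything in it is an assumption made for illustration: the `SstFile` and `Manifest` types, the use of `u64` IDs, and the function name `collect_compacted_ssts` are hypothetical and do not come from SlateDB.

```rust
use std::collections::HashSet;
use std::time::Duration;

/// Hypothetical stand-in for an SST listed from object storage.
#[derive(Clone, Debug, PartialEq)]
pub struct SstFile {
    /// Simplified numeric ID (an assumption made for this sketch).
    pub id: u64,
    /// Time elapsed since the file was last modified.
    pub age: Duration,
}

/// Hypothetical stand-in for a manifest: only the SST IDs it references.
pub struct Manifest {
    pub sst_ids: Vec<u64>,
}

/// Returns the IDs of SSTs that one GC pass would delete: those not
/// referenced by any manifest and older than `min_age`.
pub fn collect_compacted_ssts(
    ssts: &[SstFile],
    manifests: &[Manifest],
    min_age: Duration,
) -> Vec<u64> {
    // D2: gather the active SST IDs from all manifests.
    let active: HashSet<u64> = manifests
        .iter()
        .flat_map(|m| m.sst_ids.iter().copied())
        .collect();

    // D3: keep only unreferenced SSTs that are older than min_age.
    ssts.iter()
        .filter(|sst| !active.contains(&sst.id) && sst.age > min_age)
        .map(|sst| sst.id)
        .collect()
}
```

The function returns the IDs instead of deleting anything, which keeps the selection logic easy to test separately from storage calls. The WAL SST and manifest branches follow the same shape with a different source of active IDs.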