Support read blob values in compressed format #14359

Draft
xingbowang wants to merge 3 commits into facebook:main from xingbowang:2026_01_28_blobdb_read_compressed_split

@xingbowang

Summary:
Add a new ReadOptions::read_blob_compressed option that returns blob values
in their raw compressed format, skipping decompression. This enables
efficient blob transfer for tiered storage, backup/restore, and replication
use cases where data can be forwarded in compressed form.

The compression type used for each blob is returned via:
- ReadOptions::blob_compression_types_out (for Get/MultiGet)
- Iterator::GetBlobCompressionType() (for iterator operations)

When enabled, blob cache lookup and insertion are skipped (compressed
caching will be added in a follow-up commit).
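
The read-path behavior described above can be sketched with stand-in types. This is only an illustration under stated assumptions: `MockBlobSource`, `MockReadOptions`, `StoredBlob`, `BlobReadResult`, and the toy run-length codec are hypothetical and are not RocksDB classes. Only the option name `read_blob_compressed` comes from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Stand-in types for illustration only; none of these are RocksDB classes.
enum class MockCompression : uint8_t { kNone = 0, kToyRle = 1 };

struct MockReadOptions {
  bool read_blob_compressed = false;  // mirrors the option named in this PR
};

struct StoredBlob {
  std::string bytes;                                   // bytes as stored on disk
  MockCompression compression = MockCompression::kNone;  // encoding of `bytes`
};

struct BlobReadResult {
  std::string value;
  MockCompression compression = MockCompression::kNone;  // encoding of `value`
};

// Toy run-length decoder: the input is a sequence of (count, byte) pairs.
inline std::string ToyRleDecode(const std::string& in) {
  std::string out;
  for (size_t i = 0; i + 1 < in.size(); i += 2) {
    const size_t count = static_cast<size_t>(static_cast<unsigned char>(in[i]));
    out.append(count, in[i + 1]);
  }
  return out;
}

class MockBlobSource {
 public:
  void Put(const std::string& key, StoredBlob blob) {
    file_[key] = std::move(blob);
  }

  // Returns true if the key exists and fills `out`.
  bool Read(const MockReadOptions& opts, const std::string& key,
            BlobReadResult* out) {
    auto it = file_.find(key);
    if (opts.read_blob_compressed) {
      // Compressed reads return the stored bytes as-is and report their
      // encoding; the cache of uncompressed values is neither read nor filled.
      if (it == file_.end()) return false;
      out->value = it->second.bytes;
      out->compression = it->second.compression;
      return true;
    }
    auto cached = cache_.find(key);
    if (cached != cache_.end()) {
      out->value = cached->second;
      out->compression = MockCompression::kNone;
      return true;
    }
    if (it == file_.end()) return false;
    out->value = (it->second.compression == MockCompression::kToyRle)
                     ? ToyRleDecode(it->second.bytes)
                     : it->second.bytes;
    out->compression = MockCompression::kNone;
    cache_[key] = out->value;  // only uncompressed values are cached here
    return true;
  }

  size_t CacheSize() const { return cache_.size(); }

 private:
  std::map<std::string, StoredBlob> file_;
  std::map<std::string, std::string> cache_;
};
```

In this sketch a caller that forwards blobs to another tier would keep the returned bytes and the reported compression type together, since the bytes cannot be interpreted without it.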

Includes C API, Java API, db_bench --read_blob_compressed flag, and
stress test coverage.

Test:
Unit test

xingbowang and others added 3 commits February 19, 2026 16:15
…sion

Ensure Get() initializes blob_compression_types_out to kNoCompression on
every call, so callers do not observe stale compression metadata on inline
or not-found reads. Add targeted BlobDB regression tests for the inline and
not-found cases. In the stress test, only toggle read_blob_compressed when
blobs are configured with kNoCompression, to avoid false mismatches when
validating against uncompressed expected values.
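
The stale-metadata fix can be illustrated with a small stand-in. This is a sketch, not the PR's code: `MockDb`, `MockEntry`, and `MockCompression` are hypothetical, and the real type of `blob_compression_types_out` is not shown in this conversation. The point is only that the output is reset before any early return.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Stand-in types for illustration only; none of these are RocksDB classes.
enum class MockCompression : uint8_t { kNone = 0, kToyRle = 1 };

struct MockEntry {
  std::string bytes;
  bool is_blob = false;  // false means the value is stored inline
  MockCompression blob_compression = MockCompression::kNone;
};

class MockDb {
 public:
  void Put(const std::string& key, MockEntry entry) {
    entries_[key] = std::move(entry);
  }

  // Returns true if the key was found. `compression_out` is overwritten on
  // every call, including the not-found and inline paths.
  bool Get(const std::string& key, std::string* value,
           MockCompression* compression_out) const {
    if (compression_out != nullptr) {
      *compression_out = MockCompression::kNone;  // reset before any return
    }
    auto it = entries_.find(key);
    if (it == entries_.end()) return false;
    *value = it->second.bytes;
    if (it->second.is_blob && compression_out != nullptr) {
      *compression_out = it->second.blob_compression;
    }
    return true;
  }

 private:
  std::map<std::string, MockEntry> entries_;
};
```

Without the reset at the top of `Get`, a caller reusing one output variable across calls would still see the previous blob's compression type after an inline or not-found read.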

Co-authored-by: Cursor <cursoragent@cursor.com>

Bug fixes:
- Fix silent data corruption when merge operator encounters compressed
  blob base values during iteration. MergeWithBlobBaseValue now
  temporarily disables read_blob_compressed so the merge operator
  always receives uncompressed data.
- Fix out-of-bounds access in C API Get copy-back that accessed
  blob_compression_types_storage[0] without checking the vector was
  non-empty.
- Add missing compression type copy-back in rocksdb_batched_multi_get_cf
  and rocksdb_batched_multi_get_cf_slice, which silently returned
  stale/zero compression type values.
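
One way to "temporarily disable" an option around a merge is a scoped guard that restores the previous value on exit. The sketch below assumes that approach; the PR text does not say how MergeWithBlobBaseValue does it. `ScopedDisableCompressedRead`, `MockReadOptions`, `FetchBlob`, and `MergeWithBlobBase` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

// Stand-in for the read options; only the flag name comes from this PR.
struct MockReadOptions {
  bool read_blob_compressed = false;
};

// Clears the flag for the lifetime of the guard and restores the previous
// value on destruction, including on early returns.
class ScopedDisableCompressedRead {
 public:
  explicit ScopedDisableCompressedRead(MockReadOptions* opts)
      : opts_(opts), saved_(opts->read_blob_compressed) {
    opts_->read_blob_compressed = false;
  }
  ~ScopedDisableCompressedRead() { opts_->read_blob_compressed = saved_; }

  ScopedDisableCompressedRead(const ScopedDisableCompressedRead&) = delete;
  ScopedDisableCompressedRead& operator=(const ScopedDisableCompressedRead&) =
      delete;

 private:
  MockReadOptions* opts_;
  bool saved_;
};

// Hypothetical blob fetch: returns a marker for compressed bytes when the
// flag is set, and the decoded value otherwise.
inline std::string FetchBlob(const MockReadOptions& opts) {
  return opts.read_blob_compressed ? "<compressed>" : "base";
}

// Toy merge that appends the operand to the base value. The guard ensures
// the merge logic never sees compressed bytes.
inline std::string MergeWithBlobBase(MockReadOptions* opts,
                                     const std::string& operand) {
  ScopedDisableCompressedRead guard(opts);
  return FetchBlob(*opts) + "," + operand;
}
```

The guard keeps the caller's setting intact, so reads issued after the merge still honor `read_blob_compressed`.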

API refinement:
- Extract CopyBlobCompressionTypesForGet/MultiGet helpers in C API,
  replacing 6 copies of duplicated copy-back code.
- Extract MaybeReadBlobCompressed helper in stress tests, replacing
  ~10 copies of duplicated gating logic.
- Document Java API limitation: compression type output for Get/MultiGet
  is not yet available; use Iterator.getBlobCompressionType() instead.
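
A bounds-checked copy-back helper along the lines of the C API refactoring might look like the sketch below. The helper names in the PR are CopyBlobCompressionTypesForGet/MultiGet, but their signatures are not shown here, so `CopyTypesForGet` and `CopyTypesForMultiGet` are hypothetical stand-ins, and `kMockNoCompression` is an assumed placeholder for the "no compression" value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed placeholder for the "no compression" value in this sketch.
constexpr uint8_t kMockNoCompression = 0;

// Copies the single compression type for a Get-style call. An empty storage
// vector yields the default instead of reading storage[0] out of bounds.
inline void CopyTypesForGet(const std::vector<uint8_t>& storage,
                            uint8_t* out) {
  if (out == nullptr) return;
  *out = storage.empty() ? kMockNoCompression : storage[0];
}

// Copies one compression type per key for a MultiGet-style call. Slots with
// no recorded type are set to the default so callers never see stale values.
inline void CopyTypesForMultiGet(const std::vector<uint8_t>& storage,
                                 uint8_t* out, size_t num_keys) {
  if (out == nullptr) return;
  for (size_t i = 0; i < num_keys; ++i) {
    out[i] = (i < storage.size()) ? storage[i] : kMockNoCompression;
  }
}
```

Centralizing the check in one helper means every Get and MultiGet entry point gets the same empty-vector handling, which is the kind of duplication the refactoring above removes.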

Test coverage:
- Add mixed blob/inline/missing MultiGet test with compression type
  output verification.

Co-authored-by: Cursor <cursoragent@cursor.com>
meta-cla bot added the CLA Signed label on Feb 20, 2026