Support read blob values in compressed format #14359
Draft
xingbowang wants to merge 3 commits into facebook:main
…sion

Add a new ReadOptions::read_blob_compressed option that returns blob values in their raw compressed format, skipping decompression. This enables efficient blob transfer for tiered storage, backup/restore, and replication use cases where data can be forwarded in compressed form.

The compression type used for each blob is returned via:
- ReadOptions::blob_compression_types_out (for Get/MultiGet)
- Iterator::GetBlobCompressionType() (for iterator operations)

When enabled, blob cache lookup and insertion are skipped (compressed caching will be added in a follow-up commit).

Includes C API, Java API, db_bench --read_blob_compressed flag, and stress test coverage.
Ensure Get() initializes blob_compression_types_out to kNoCompression on every call so callers do not observe stale compression metadata on inline or not-found reads.

Add targeted BlobDB regression tests for the inline and not-found cases, and gate stress read_blob_compressed toggles to kNoCompression blob configurations to avoid false mismatches when validating uncompressed expected values.

Co-authored-by: Cursor <cursoragent@cursor.com>
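The reset-on-every-call behavior described in this commit can be illustrated with a small standalone model. This is only a sketch and not RocksDB code: `MockReadOptions`, `MockGet`, `StoredValue`, and the two-value `CompressionType` enum are hypothetical stand-ins, and the output field is assumed here to be a pointer to a single caller-owned slot, which may not match the actual type used in the PR.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins; none of these are the actual RocksDB types.
enum class CompressionType { kNoCompression, kSnappyCompression };

struct StoredValue {
  std::string bytes;
  bool is_blob;                 // false means the value is stored inline
  CompressionType compression;  // only meaningful when is_blob is true
};

struct MockReadOptions {
  bool read_blob_compressed = false;
  // Assumed shape of the output: a caller-owned slot that may be null.
  CompressionType* blob_compression_types_out = nullptr;
};

// Looks up `key`; returns true and fills `*value` when the key exists.
bool MockGet(const std::map<std::string, StoredValue>& db,
             const MockReadOptions& opts, const std::string& key,
             std::string* value) {
  // Reset the output first so every exit path leaves a defined value
  // instead of whatever a previous call wrote.
  if (opts.blob_compression_types_out != nullptr) {
    *opts.blob_compression_types_out = CompressionType::kNoCompression;
  }
  auto it = db.find(key);
  if (it == db.end()) {
    return false;  // not found: output stays kNoCompression
  }
  const StoredValue& stored = it->second;
  // Only blob values read in compressed mode report a compression type;
  // inline values keep the kNoCompression default set above.
  if (stored.is_blob && opts.read_blob_compressed &&
      opts.blob_compression_types_out != nullptr) {
    *opts.blob_compression_types_out = stored.compression;
  }
  *value = stored.bytes;
  return true;
}
```

In this model, a caller that reuses one output slot across a blob read, an inline read, and a missing-key read sees the blob's compression type only after the blob read; the other two reads leave the slot at kNoCompression.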
Bug fixes:
- Fix silent data corruption when merge operator encounters compressed blob base values during iteration. MergeWithBlobBaseValue now temporarily disables read_blob_compressed so the merge operator always receives uncompressed data.
- Fix out-of-bounds access in C API Get copy-back that accessed blob_compression_types_storage[0] without checking the vector was non-empty.
- Add missing compression type copy-back in rocksdb_batched_multi_get_cf and rocksdb_batched_multi_get_cf_slice, which silently returned stale/zero compression type values.

API refinement:
- Extract CopyBlobCompressionTypesForGet/MultiGet helpers in C API, replacing 6 copies of duplicated copy-back code.
- Extract MaybeReadBlobCompressed helper in stress tests, replacing ~10 copies of duplicated gating logic.
- Document Java API limitation: compression type output for Get/MultiGet is not yet available; use Iterator.getBlobCompressionType() instead.

Test coverage:
- Add mixed blob/inline/missing MultiGet test with compression type output verification.

Co-authored-by: Cursor <cursoragent@cursor.com>
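One common way to "temporarily disable" a flag around a call is a scope guard that saves the flag, clears it, and restores it on exit. The sketch below shows that pattern in isolation. It is not taken from the PR, which does not say how MergeWithBlobBaseValue implements the toggle; `MockReadOptions`, `ScopedDisableCompressedRead`, `FetchBlob`, and `MergeWithBlobBase` are all hypothetical names, and the "compression" is just a string tag so the two read modes are distinguishable.

```cpp
#include <string>

// Hypothetical stand-in for the read options; not the RocksDB struct.
struct MockReadOptions {
  bool read_blob_compressed = false;
};

// Clears read_blob_compressed for the lifetime of the guard and restores
// the caller's setting on destruction, including on early return.
class ScopedDisableCompressedRead {
 public:
  explicit ScopedDisableCompressedRead(MockReadOptions& opts)
      : opts_(opts), saved_(opts.read_blob_compressed) {
    opts_.read_blob_compressed = false;
  }
  ~ScopedDisableCompressedRead() { opts_.read_blob_compressed = saved_; }

  ScopedDisableCompressedRead(const ScopedDisableCompressedRead&) = delete;
  ScopedDisableCompressedRead& operator=(const ScopedDisableCompressedRead&) =
      delete;

 private:
  MockReadOptions& opts_;
  bool saved_;
};

// Toy blob fetch: prefixes a tag when the compressed read mode is on.
std::string FetchBlob(const MockReadOptions& opts, const std::string& plain) {
  return opts.read_blob_compressed ? "<compressed>" + plain : plain;
}

// Toy merge step: joins the base value and the operand with a comma.
// The guard ensures the base is fetched in uncompressed form.
std::string MergeWithBlobBase(MockReadOptions& opts,
                              const std::string& base_plain,
                              const std::string& operand) {
  ScopedDisableCompressedRead guard(opts);
  return FetchBlob(opts, base_plain) + "," + operand;
}
```

With `read_blob_compressed` set to true, `MergeWithBlobBase(opts, "base", "op")` in this model merges against the untagged base, and the flag is true again once the call returns, so later plain reads still come back in compressed form.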
Summary:
Add a new ReadOptions::read_blob_compressed option that returns blob values
in their raw compressed format, skipping decompression. This enables
efficient blob transfer for tiered storage, backup/restore, and replication
use cases where data can be forwarded in compressed form.
Test:
Unit test