Cache only the encoding plan to avoid inconsistencies#193
Cache only the encoding plan to avoid inconsistencies#193MircoValentiniECMWF wants to merge 1 commit intodevelopfrom
Conversation
This is only a draft, documentation is not aligned and the code is just meant to do a performance evaluation in order to make informed decisions
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #193 +/- ##
===========================================
+ Coverage 61.93% 62.08% +0.15%
===========================================
Files 303 303
Lines 11674 11645 -29
Branches 1048 1038 -10
===========================================
Hits 7230 7230
+ Misses 4444 4415 -29 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR explores an alternative staged-encoding cache strategy in mars2grib: instead of caching a prepared GRIB sample, it caches only the specialized encoder (layout + execution plan) and reruns encoding during “finalisation” to avoid inconsistencies.
Changes:
- Stop constructing/caching a
preparedSample_insideCoreOperations::CacheEntry. - Change
CoreOperations::finaliseEncoding()to callSpecializedEncoder::encode()rather thanfinaliseEncoding(preparedSample, ...). - Comment out the
preparedSample_member (leaving the staged-cache API shape intact, but with different semantics).
| /// prepares the structure required for a future internal cache design. | ||
| /// | ||
| CacheEntry(Layout&& layout, const MarsDict_t& inputMars, const ParDict_t& inputMisc, const OptDict_t& options) : | ||
| encoder_{std::move(layout)}, preparedSample_{encoder_.prepare(inputMars, inputMisc, options)} {}; | ||
| encoder_{std::move(layout)} {}; |
There was a problem hiding this comment.
CacheEntry no longer prepares/caches an immutable sample, but the constructor’s Doxygen comment still says it “prepares an immutable reusable sample from the active metadata” and the type-level docs describe a two-phase (prepare/finalise) pipeline. Please update the documentation (and potentially the constructor signature/call sites) to reflect that this cache entry now only caches the resolved layout/encoding plan.
| /// during the preparation phase. It is treated as read-only input | ||
| /// during `finaliseEncoding()`, which derives a fresh output handle | ||
| /// from it. | ||
| /// | ||
| const std::unique_ptr<const OutDict_t> preparedSample_; | ||
| // const std::unique_ptr<const OutDict_t> preparedSample_; |
There was a problem hiding this comment.
The preparedSample_ member is commented out, but the surrounding Doxygen still documents it as a required immutable template reused by finaliseEncoding(). This is now misleading; either remove the prepared-sample documentation block entirely or reintroduce a real prepared-sample mechanism consistent with the staged API contract.
| auto [activeMars, activeMisc] = | ||
| normalize_if_enabled(inputMars, inputMisc, options, language, scratchMars, scratchMisc); | ||
|
|
||
| auto gribHeader = | ||
| cacheEntry.encoder_.finaliseEncoding(*(cacheEntry.preparedSample_), activeMars, activeMisc, options); | ||
| cacheEntry.encoder_.encode( activeMars, activeMisc, options); | ||
|
|
There was a problem hiding this comment.
finaliseEncoding() now calls cacheEntry.encoder_.encode(...), which reruns the full encoding pipeline rather than “finalising” from a prepared sample. This changes the staged API semantics (e.g., what work happens in prepare(), when errors surface, and what is actually cached). Please update the Doxygen for prepare()/finaliseEncoding() (and the public API docs in api/Mars2Grib.{h,cc} that describe prepared-sample reuse) to match the new behavior, or consider renaming/reshaping the staged API to reflect “cached plan + full encode”.
| auto gribHeader = | ||
| cacheEntry.encoder_.finaliseEncoding(*(cacheEntry.preparedSample_), activeMars, activeMisc, options); | ||
| cacheEntry.encoder_.encode( activeMars, activeMisc, options); | ||
|
|
||
| return encodeValues(values, activeMisc, options, std::move(gribHeader)); |
There was a problem hiding this comment.
There are no tests covering the staged cache path (prepare() + finaliseEncoding()), and this change materially alters that behavior. Please add regression coverage (e.g., asserting that finaliseEncoding(prepare(mars,misc), values, mars, misc) produces the same header/handle keys as encode(values, mars, misc) for representative inputs, and that calling finaliseEncoding() multiple times with the same cache is stable).
There was a problem hiding this comment.
This cache is still quite slow for AIFS.
This is only a draft, documentation is not aligned and the code is just meant to do a performance evaluation in order to make informed decisions
Description
Contributor Declaration
By opening this pull request, I affirm the following: