Cache only the encoding plan to avoid inconsistencies by MircoValentiniECMWF · Pull Request #193 · ecmwf/metkit

MircoValentiniECMWF · 2026-04-08T14:01:18Z

This is only a draft, documentation is not aligned and the code is just meant to do a performance evaluation in order to make informed decisions

Description

Contributor Declaration

By opening this pull request, I affirm the following:

All authors agree to the Contributor License Agreement.
The code follows the project's coding standards.
I have performed self-review and added comments where needed.
I have added or updated tests to verify that my changes are effective and functional.
I have run all existing tests and confirmed they pass.

This is only a draft, documentation is not aligned and the code is just meant to do a performance evaluation in order to make informed decisions

codecov-commenter · 2026-04-08T14:06:04Z

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 62.08%. Comparing base (8f01f7e) to head (5d41932).

Files with missing lines	Patch %	Lines
src/metkit/mars2grib/CoreOperations.h	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #193      +/-   ##
===========================================
+ Coverage    61.93%   62.08%   +0.15%     
===========================================
  Files          303      303              
  Lines        11674    11645      -29     
  Branches      1048     1038      -10     
===========================================
  Hits          7230     7230              
+ Misses        4444     4415      -29

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot

Pull request overview

This PR explores an alternative staged-encoding cache strategy in mars2grib: instead of caching a prepared GRIB sample, it caches only the specialized encoder (layout + execution plan) and reruns encoding during “finalisation” to avoid inconsistencies.

Changes:

Stop constructing/caching a preparedSample_ inside CoreOperations::CacheEntry.
Change CoreOperations::finaliseEncoding() to call SpecializedEncoder::encode() rather than finaliseEncoding(preparedSample, ...).
Comment out the preparedSample_ member (leaving the staged-cache API shape intact, but with different semantics).

Copilot · 2026-04-08T14:08:34Z

src/metkit/mars2grib/CoreOperations.h

        /// prepares the structure required for a future internal cache design.
        ///
        CacheEntry(Layout&& layout, const MarsDict_t& inputMars, const ParDict_t& inputMisc, const OptDict_t& options) :
-            encoder_{std::move(layout)}, preparedSample_{encoder_.prepare(inputMars, inputMisc, options)} {};
+            encoder_{std::move(layout)} {};


CacheEntry no longer prepares/caches an immutable sample, but the constructor’s Doxygen comment still says it “prepares an immutable reusable sample from the active metadata” and the type-level docs describe a two-phase (prepare/finalise) pipeline. Please update the documentation (and potentially the constructor signature/call sites) to reflect that this cache entry now only caches the resolved layout/encoding plan.

Copilot · 2026-04-08T14:08:35Z

src/metkit/mars2grib/CoreOperations.h

        /// during the preparation phase. It is treated as read-only input
        /// during `finaliseEncoding()`, which derives a fresh output handle
        /// from it.
        ///
-        const std::unique_ptr<const OutDict_t> preparedSample_;
+        // const std::unique_ptr<const OutDict_t> preparedSample_;


The preparedSample_ member is commented out, but the surrounding Doxygen still documents it as a required immutable template reused by finaliseEncoding(). This is now misleading; either remove the prepared-sample documentation block entirely or reintroduce a real prepared-sample mechanism consistent with the staged API contract.

Copilot · 2026-04-08T14:08:35Z

src/metkit/mars2grib/CoreOperations.h

            auto [activeMars, activeMisc] =
                normalize_if_enabled(inputMars, inputMisc, options, language, scratchMars, scratchMisc);

            auto gribHeader =
-                cacheEntry.encoder_.finaliseEncoding(*(cacheEntry.preparedSample_), activeMars, activeMisc, options);
+                cacheEntry.encoder_.encode( activeMars, activeMisc, options);



finaliseEncoding() now calls cacheEntry.encoder_.encode(...), which reruns the full encoding pipeline rather than “finalising” from a prepared sample. This changes the staged API semantics (e.g., what work happens in prepare(), when errors surface, and what is actually cached). Please update the Doxygen for prepare()/finaliseEncoding() (and the public API docs in api/Mars2Grib.{h,cc} that describe prepared-sample reuse) to match the new behavior, or consider renaming/reshaping the staged API to reflect “cached plan + full encode”.

Copilot · 2026-04-08T14:08:35Z

src/metkit/mars2grib/CoreOperations.h

            auto gribHeader =
-                cacheEntry.encoder_.finaliseEncoding(*(cacheEntry.preparedSample_), activeMars, activeMisc, options);
+                cacheEntry.encoder_.encode( activeMars, activeMisc, options);

            return encodeValues(values, activeMisc, options, std::move(gribHeader));


There are no tests covering the staged cache path (prepare() + finaliseEncoding()), and this change materially alters that behavior. Please add regression coverage (e.g., asserting that finaliseEncoding(prepare(mars,misc), values, mars, misc) produces the same header/handle keys as encode(values, mars, misc) for representative inputs, and that calling finaliseEncoding() multiple times with the same cache is stable).

This cache is still quite slow for AIFS.

Cache only the encoding plan to avoid inconsistencies

5d41932

This is only a draft, documentation is not aligned and the code is just meant to do a performance evaluation in order to make informed decisions

MircoValentiniECMWF requested review from Copilot, dsarmany, pgeier and tweska April 8, 2026 14:02

Copilot started reviewing on behalf of MircoValentiniECMWF April 8, 2026 14:03 View session

Copilot AI reviewed Apr 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache only the encoding plan to avoid inconsistencies#193

Cache only the encoding plan to avoid inconsistencies#193
MircoValentiniECMWF wants to merge 1 commit intodevelopfrom
feature/safe-cache

MircoValentiniECMWF commented Apr 8, 2026

Uh oh!

codecov-commenter commented Apr 8, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

dsarmany Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

MircoValentiniECMWF commented Apr 8, 2026

Description

Contributor Declaration

Uh oh!

codecov-commenter commented Apr 8, 2026

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

dsarmany Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants