Skip to content

Uses std::optional<CompressionAlgo> enum in NarInfo#15238

Open
Hrushi20 wants to merge 4 commits intoNixOS:masterfrom
Hrushi20:narInfo-use-compression-algo
Open

Uses std::optional<CompressionAlgo> enum in NarInfo#15238
Hrushi20 wants to merge 4 commits intoNixOS:masterfrom
Hrushi20:narInfo-use-compression-algo

Conversation

@Hrushi20
Copy link

Replaces std::string in NarInfo to use std::optional

Closes #15022

Signed-off-by: Hrushi20 <hrushikesh20thegreat@gmail.com>
@github-actions github-actions bot added the store Issues and pull requests concerning the Nix store label Feb 14, 2026
namespace nix {

// Do we want to add Identity to the list???
#define NIX_FOR_EACH_COMPRESSION_ALGO(MACRO) \
Copy link
Author

@Hrushi20 Hrushi20 Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add Identity as part of CompressionAlgo enum?

Content-Encoding might contain identity as part of http header.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thing is that Content-Encoding should be parsed by a very separate function. It's specified by HTTP spec (or rather refers to another RFC), is case-insensitive and has a much more limited range of supported values + some legacy ones like x-gzip. I was going to rewrite this code to better handle Content-Encodinf in a more compliant way. Better to not use the parsing logic here for now. I suppose I can put up a PR to address that soon and we can rebase your one on top of that?

Or could you hold off on replacing strings with enums in the filetransfer code for now?

Copy link
Author

@Hrushi20 Hrushi20 Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be a major change or is it something I can look into and implement? If you guide me on any resources, I can take a shot at it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be pretty small - it would just take some of your changes from this PR for the decompression sink.

// Empty-method decompression used e.g. by S3 store
// (Content-Encoding == "").
auto method = "";
auto method = CompressionAlgo::none; // Do we handle this in S3 store???
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about cases where we have Content-Encoding as Empty string? Do we want to map it to CompressionAlgo::none?

@xokdvium xokdvium enabled auto-merge February 14, 2026 20:12
@xokdvium xokdvium disabled auto-merge February 14, 2026 20:12
url = value;
else if (name == "Compression")
compression = value;
compression = parseCompressionAlgo(value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this is a behaviour change technically. What if narinfo had an empty string as the value. I find it unlikely since to_string has an assertion, but maybe we can stick with this sneaky behaviour for now (empty string -> std::nullopt -> bzip2 by default)? I lack the historical context to judge whether this would have a potential to break stuff or not.


if (auto * compression = get(obj, "compression"))
res.compression = getString(*compression);
res.compression = parseCompressionAlgo(getString(*compression));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, I guess the no-op change would be to treat empty strings as std::nullopt here? It's highly dubious behaviour though

Comment on lines 64 to 70
auto * ar = archive_write_new();
auto cleanup = Finally{[&ar]() { checkLibArchive(ar, archive_write_close(ar), "failed to close archive: %s"); }};
auto err = archive_write_add_filter_by_name(ar, method.c_str());
auto err = archive_write_add_filter_by_name(
ar, showCompressionAlgo(method.value()).c_str()); // method.value_or(CompressionAlgo::none)
checkLibArchive(ar, err, "failed to get libarchive filter by name: %s");
auto code = archive_filter_code(ar, 0);
return code;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, not this whole function should be possible to delete. It was more of a hack anyway.

Copy link
Contributor

@xokdvium xokdvium Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By this I mean that we can replace this with direct calls to archive_read_support_filter_ calls. And get rid of the stringly typed code completely.

Signed-off-by: Hrushi20 <hrushikesh20thegreat@gmail.com>
Signed-off-by: Hrushi20 <hrushikesh20thegreat@gmail.com>

else if (name == "content-encoding")
encoding = trim(line.substr(i + 1));
encoding = parseCompressionAlgo(trim(line.substr(i + 1)));
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xokdvium, I'm planning to create a function here that parses the content-encoding and returns the CompressionAlgo enum.

Is there an exhaustive list of all the content-encoding? I only see 3 tokens being used in the RFC spec

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's all quite messy, and we are certainly not spec compliant here – we'd want to tighten up our handling here. Good news is: there are not so many non-standard encodings that we'd want to continue supporting.

Basically (all case insensitive parsing): gzip, x-gzip, compress, x-compress (those are in the spec), deflate (doesn't seem like we can support this via libarchive), br, zstd. As for non-standard ones, we can probably stick with supporting bzip2, since it seemingly did appear in practice. xz and everything should probably error out (but we do for whatever reason send it in Accept-Encoding - that was clearly a mistake). If we want to be extra cautious, we can probably keep accepting xz and issue a warning.

Everything else also mostly doesnt work, because we currently error out if libarchive tries to decompress/compress something that hasnt been linked with appropriate dependencies (it tries to fall back to an external program, issues a warning and we treat it as an error).

Stacked encodings when multiple filters are applied (like Content-Encoding: gzip, deflate) should also error out, but can think about supporting those in the future. That will require having multiple chained decompression sinks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also identity. none and empty strings should be rejected - those are the compression algo encodings used by the NAR compression algorithms.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I’ll open another PR to parse Content-Encoding (case-insensitive) and support:
gzip, x-gzip, compress, x-compress, br, zstd, bzip2, and identity.

I’ll error out on unknown encodings and reject stacked encodings for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That you so much! This has been on my radar for a while! Thanks for tackling it ❤️

cache.storeDir, StorePath(hashPart + "-" + namePart), Hash::parseAnyPrefixed(queryNAR.getStr(6)));
narInfo->url = queryNAR.getStr(2);
narInfo->compression = queryNAR.getStr(3);
narInfo->compression = parseCompressionAlgo(queryNAR.getStr(3));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of caution, shouldn't we continue treating empty strings as none. I'm pretty sure the current version of the sqlite db could never have empty strings there, but can't be too careful with these things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe drop a FIXME so that we could remove that later?

Signed-off-by: Hrushi20 <hrushikesh20thegreat@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

store Issues and pull requests concerning the Nix store

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make NarInfo use CompressionAlgo

2 participants

Comments