Uses std::optional<CompressionAlgo> enum in NarInfo#15238
Uses std::optional<CompressionAlgo> enum in NarInfo#15238Hrushi20 wants to merge 4 commits intoNixOS:masterfrom
Conversation
Signed-off-by: Hrushi20 <hrushikesh20thegreat@gmail.com>
| namespace nix { | ||
|
|
||
| // Do we want to add Identity to the list??? | ||
| #define NIX_FOR_EACH_COMPRESSION_ALGO(MACRO) \ |
There was a problem hiding this comment.
Do we want to add Identity as part of CompressionAlgo enum?
Content-Encoding might contain identity as part of http header.
There was a problem hiding this comment.
Thing is that Content-Encoding should be parsed by a very separate function. It's specified by HTTP spec (or rather refers to another RFC), is case-insensitive and has a much more limited range of supported values + some legacy ones like x-gzip. I was going to rewrite this code to better handle Content-Encodinf in a more compliant way. Better to not use the parsing logic here for now. I suppose I can put up a PR to address that soon and we can rebase your one on top of that?
Or could you hold off on replacing strings with enums in the filetransfer code for now?
There was a problem hiding this comment.
Would this be a major change or is it something I can look into and implement? If you guide me on any resources, I can take a shot at it.
There was a problem hiding this comment.
I think it would be pretty small - it would just take some of your changes from this PR for the decompression sink.
| // Empty-method decompression used e.g. by S3 store | ||
| // (Content-Encoding == ""). | ||
| auto method = ""; | ||
| auto method = CompressionAlgo::none; // Do we handle this in S3 store??? |
There was a problem hiding this comment.
What about cases where we have Content-Encoding as Empty string? Do we want to map it to CompressionAlgo::none?
src/libstore/nar-info.cc
Outdated
| url = value; | ||
| else if (name == "Compression") | ||
| compression = value; | ||
| compression = parseCompressionAlgo(value); |
There was a problem hiding this comment.
Hm, this is a behaviour change technically. What if narinfo had an empty string as the value. I find it unlikely since to_string has an assertion, but maybe we can stick with this sneaky behaviour for now (empty string -> std::nullopt -> bzip2 by default)? I lack the historical context to judge whether this would have a potential to break stuff or not.
src/libstore/nar-info.cc
Outdated
|
|
||
| if (auto * compression = get(obj, "compression")) | ||
| res.compression = getString(*compression); | ||
| res.compression = parseCompressionAlgo(getString(*compression)); |
There was a problem hiding this comment.
ditto, I guess the no-op change would be to treat empty strings as std::nullopt here? It's highly dubious behaviour though
| auto * ar = archive_write_new(); | ||
| auto cleanup = Finally{[&ar]() { checkLibArchive(ar, archive_write_close(ar), "failed to close archive: %s"); }}; | ||
| auto err = archive_write_add_filter_by_name(ar, method.c_str()); | ||
| auto err = archive_write_add_filter_by_name( | ||
| ar, showCompressionAlgo(method.value()).c_str()); // method.value_or(CompressionAlgo::none) | ||
| checkLibArchive(ar, err, "failed to get libarchive filter by name: %s"); | ||
| auto code = archive_filter_code(ar, 0); | ||
| return code; |
There was a problem hiding this comment.
Hm, not this whole function should be possible to delete. It was more of a hack anyway.
There was a problem hiding this comment.
By this I mean that we can replace this with direct calls to archive_read_support_filter_ calls. And get rid of the stringly typed code completely.
Signed-off-by: Hrushi20 <hrushikesh20thegreat@gmail.com>
Signed-off-by: Hrushi20 <hrushikesh20thegreat@gmail.com>
|
|
||
| else if (name == "content-encoding") | ||
| encoding = trim(line.substr(i + 1)); | ||
| encoding = parseCompressionAlgo(trim(line.substr(i + 1))); |
There was a problem hiding this comment.
It's all quite messy, and we are certainly not spec compliant here – we'd want to tighten up our handling here. Good news is: there are not so many non-standard encodings that we'd want to continue supporting.
Basically (all case insensitive parsing): gzip, x-gzip, compress, x-compress (those are in the spec), deflate (doesn't seem like we can support this via libarchive), br, zstd. As for non-standard ones, we can probably stick with supporting bzip2, since it seemingly did appear in practice. xz and everything should probably error out (but we do for whatever reason send it in Accept-Encoding - that was clearly a mistake). If we want to be extra cautious, we can probably keep accepting xz and issue a warning.
Everything else also mostly doesnt work, because we currently error out if libarchive tries to decompress/compress something that hasnt been linked with appropriate dependencies (it tries to fall back to an external program, issues a warning and we treat it as an error).
Stacked encodings when multiple filters are applied (like Content-Encoding: gzip, deflate) should also error out, but can think about supporting those in the future. That will require having multiple chained decompression sinks.
There was a problem hiding this comment.
Also identity. none and empty strings should be rejected - those are the compression algo encodings used by the NAR compression algorithms.
There was a problem hiding this comment.
Makes sense. I’ll open another PR to parse Content-Encoding (case-insensitive) and support:
gzip, x-gzip, compress, x-compress, br, zstd, bzip2, and identity.
I’ll error out on unknown encodings and reject stacked encodings for now.
There was a problem hiding this comment.
That you so much! This has been on my radar for a while! Thanks for tackling it ❤️
| cache.storeDir, StorePath(hashPart + "-" + namePart), Hash::parseAnyPrefixed(queryNAR.getStr(6))); | ||
| narInfo->url = queryNAR.getStr(2); | ||
| narInfo->compression = queryNAR.getStr(3); | ||
| narInfo->compression = parseCompressionAlgo(queryNAR.getStr(3)); |
There was a problem hiding this comment.
Out of caution, shouldn't we continue treating empty strings as none. I'm pretty sure the current version of the sqlite db could never have empty strings there, but can't be too careful with these things.
There was a problem hiding this comment.
Maybe drop a FIXME so that we could remove that later?
Signed-off-by: Hrushi20 <hrushikesh20thegreat@gmail.com>
Replaces std::string in NarInfo to use std::optional
Closes #15022