Skip to content

Exclude development scripts from published package#137

Merged
Manishearth merged 3 commits intorust-fuzz:mainfrom
GiGainfosystems:exclude_scripts
Feb 3, 2026
Merged

Exclude development scripts from published package#137
Manishearth merged 3 commits intorust-fuzz:mainfrom
GiGainfosystems:exclude_scripts

Conversation

@weiznich
Copy link
Contributor

During a dependency review we noticed that the libfuzzer crate includes various development scripts. These development scripts shouldn't be there as they might, at some point become problematic. As of now they prevent any downstream user from enabling the [bans.build.interpreted] option of cargo deny.

I opted for using an explicit include list instead of an exclude list to prevent these files from being included in the published packages to make sure that everything that's included is an conscious choice.

During a dependency review we noticed that the libfuzzer crate includes various development scripts. These development scripts shouldn't be there as they might, at some point become problematic. As of now they prevent any downstream user from enabling the `[bans.build.interpreted]` option of cargo deny.

I opted for using an explicit include list instead of an exclude list to prevent these files from being included in the published packages to make sure that everything that's included is an conscious choice.
@Manishearth
Copy link
Member

@weiznich
Copy link
Contributor Author

weiznich commented Dec 16, 2025

Would it help you to add an CI job that verifies that everything is included by executing cargo package --all-features?

To be transparent for the reason why I'm so keen to rather use include than exclude: I'm doing that dependency review thing now for a while and for crates that use exclude I notice that they regress relatively often by including another unwanted or even unexpected file. You prevent that by using include.

@Manishearth
Copy link
Member

Would it help you to add an CI job that verifies that everything is included by executing cargo package --all-features?

Not in this case, because I'm not sure about the cross platform build. I might just defer to a different reviewer for this.

To be transparent for the reason why I'm so keen to rather use include than exclude: I'm doing that dependency review thing now for a while and for crates that use exclude I notice that they regress relatively often by including another unwanted or even unexpected file. You prevent that by using include.

That's reasonable, but then it would be much nicer for the reviewers to list exactly what gets excluded in the PR body.

@weiznich
Copy link
Contributor Author

Not in this case, because I'm not sure about the cross platform build. I might just defer to a different reviewer for this.

I added some CI jobs to verify this for all platforms supported by github actions. I made sure that these jobs pass in my fork. I would also like to add that I maintain several sys crates with bundling support myself (pq-src, mysqlclient-src), I have created others (like gdal-src) and contributes to even more of them (e.g. openssl-src), so I'm generally aware of the problems there. That doesn't mean that I'm right here, that's just for context.

If you feel that the target selection is not sufficient I'm happy to add more targets. I think anything from openssl-src shouldn't be too hard to add.

On a conceptual level: The build script adds all cpp files in libfuzzer to the cc call. So these need to be there.

libfuzzer/build.rs

Lines 26 to 29 in 217dc97

.expect("listable source directory")
.map(|de| de.expect("file in directory").path())
.filter(|p| p.extension().map(|ext| ext == "cpp") == Some(true))
.collect::<Vec<_>>();

I also included the h files and the def files as they seemed to be required as well. That shows up as build error during local testing. With that out of the way there is still the possibility that libfuzzer does something unexpected on some random niche platform, I personally wouldn't expect that.

That's reasonable, but then it would be much nicer for the reviewers to list exactly what gets excluded in the PR body.

I'm sorry to not have done that in the first place. I've opened around 50 similar PR's today, so I went with the bare minimum to keep it somewhat manageable on my side. My expectation there was that most people know what's in their repos

The scripts reported by cargo deny are:

  • ci/script.sh
  • libfuzzer/build.sh
  • libfuzzer/scripts/unbalanced_allocs.py
  • update-libfuzzer.sh

For reference that's the output of a diff between what's in the package before and after the change. (Got the files via cargo package --list)

diff /tmp/old /tmp/new                                         
2,3d1
< .github/workflows/rust.yml
< .gitignore
12,13d9
< ci/script.sh
< libfuzzer/CMakeLists.txt
63d58
< libfuzzer/README.txt
65d59
< libfuzzer/build.sh
69,71d62
< libfuzzer/scripts/unbalanced_allocs.py
< libfuzzer/standalone/StandaloneFuzzTargetMain.c
< libfuzzer/tests/CMakeLists.txt
74d64
< rust-toolchain
76d65
< update-libfuzzer.sh

As far as I can tell this only includes unused files, althoug libfuzzer/README.txt might be relevant to include?

@Manishearth
Copy link
Member

I also included the h files and the def files as they seemed to be required as well. That shows up as build error during local testing. With that out of the way there is still the possibility that libfuzzer does something unexpected on some random niche platform, I personally wouldn't expect that.

Yeah, as far as I can tell the build isn't invoking any makefiles (there aren't any), and I can't see the cc crate calling .sh files unless they're some autoconf thing (which these aren't).

So that seems fine!

I'm sorry to not have done that in the first place. I've opened around 50 similar PR's today, so I went with the bare minimum to keep it somewhat manageable on my side. My expectation there was that most people know what's in their repos

Reasonable. On the receiving end it's a bunch of work if you maintain a lot of repositories, especially ones you're not always looking at, and this is stuff that IME is rather easy to accidentally break and not notice until people come complaining to you after a release.

As far as I can tell this only includes unused files, althoug libfuzzer/README.txt might be relevant to include?

Yes, please.

I think for this crate in particular I would prefer an explicit exclude, since it vendors another package that we do not maintain. If more things get included in the future I would actually like that to require a cargo deny user to come ask us to fix that, because it might end up in a different call than we made here.

(A way to automate that would be to run cargo deny on the cargo package output)

@weiznich
Copy link
Contributor Author

On the receiving end it's a bunch of work if you maintain a lot of repositories, especially ones you're not always looking at, and this is stuff that IME is rather easy to accidentally break and not notice until people come complaining to you after a release.

No worries, I always expected that some of the cases might need more discussion than others. So it's really fine to clarify that and find the best solution for an particular case. It's not like there is only one possible way to resolve this.

I think for this crate in particular I would prefer an explicit exclude

I pushed another update that switches to exclude with all (beside the README.txt) files listed in the diff above. I nevertheless would let the CI jobs there so that you can be a bit more sure everything works in the future?


As final semi-related note: Given that you are part of the relevant rust project teams: It might be good to have an official documented position about whether to include tests or not from from the published package. More general speaking it might also desirable to switch to in include based approach (with a reasonable default) in a future rust edition to make it explicit what's in the published crate. Obviously the default won't work for all crates, but it might be better than the current status. You just find too much things in published packages that shouldn't be there. (If necessary I can certainly provide examples for that)

@Manishearth
Copy link
Member

Manishearth commented Dec 16, 2025

Note: I don't actually think I'm a part of any of the relevant Rust project teams here.

It might be good to have an official documented position about whether to include tests or not from from the published package

Maybe, but honestly there are too many disparate pressures in opposite directions for there to be a good consensus on that. And were we to have that discussion I strongly suspect it would fall on the side of "include as much as you can".

Similarly for the include-based default, I think the exclude-based default is deliberate because by and large the cost of accidentally forgetting to include the right file is worse.

The cost of not including the right file can be broken builds (yes, even though cargo publish checks the build), since suddenly your CI'd configuration is not what is actually published. Seen this happen a bunch of times and it's always frustrating.

The cost of not excluding the right file can be

  • Licensing issues for the package (should be manually resolved anyway!)
  • Packages that are overly large (not usually a blocker, but fixable)
  • Issues with tools like cargo-deny. People using those tools can ask their dependencies to exclude problematic files

I've been in enough situations where I've had to ask upstream to exclude files for those reasons, but I think it does make sense that it's an explicit ask rather than an implicit thing managed by tooling because it has almsot never been a situation where I 100% know for sure what the maintainer's preference will be.

The core problem is that there isn't and won't ever really be a consensus on what constitutes a "problematic file". Large files are often problematic but sometimes necessary for tests. Same goes for test files under a different license. Additional dev scripts are usually unnecessary but some packagers like having them.

There's a lot of people packaging crates.io packages into other places (distros, or vendored sources), where they actually do want various things like tests and dev scripts. It's always a tradeoff, and I'm not really convinced there's a one size fits all answer there.

@weiznich
Copy link
Contributor Author

weiznich commented Feb 3, 2026

@Manishearth We still wait for the opinions of the other maintainers here? Who would be the right person to gently ping to get a second opinion?

@Manishearth
Copy link
Member

@nagisa , but I'm fine merging this.

@Manishearth Manishearth merged commit 72a4b1b into rust-fuzz:main Feb 3, 2026
8 checks passed
@nagisa
Copy link
Member

nagisa commented Feb 3, 2026

I feel like I've written my thoughts on this before (here). I find it somewhat unfortunate that the concerns I've listed there didn't really transfer to the rest of similar such contributions.

I believe not including the full repository contents in the distributed crates is in direct conflict of vendoring crates to e.g. enable development on an airplane; or as a plain archival tool to enable recovery after github gets sold to broadcom (and we all get rug pulled from under us).

At the same time, eh, I don't really care about those issues badly enough to do anything more about them than write a comment or two with vague concerns 🤷 So in that sense I think its fine to keep this.

@Manishearth
Copy link
Member

Yeah, overall I think that that cargo-deny check is not a good one and moving the ecosystem towards less vendoring is harmful to it. I'm not a fan of trying to work around the choices made by other tools by making real tradeoffs. But ultimately, I think it's fine here, this is still buildable in a pretty straightforward way, and the update script shouldn't matter.

@weiznich
Copy link
Contributor Author

weiznich commented Feb 4, 2026

I would like to share my view on this again. Maybe that helps you to understand why I'm doing this.

First of all I would like clarify that obviously that cargo-deny check is nothing that can tell you if something is malicious or not. I never wrote this, I just pointed out that these scripts cause issues if that check is enabled. We don't use this check as tool to classify crates as malicious or not. It's only one tiny input in our audit process. That written I also would like to point out that scripts in general could be used for malicious actions. They are obviously not the only thing that could be misused, but they are a relevant factor. So you need to audit the scripts to be sure that they are fine. Given that removing them from the published code removes the burden to review them this is a rather important point for us. Much more than satisfying the cargo-deny check as you can whitelist certain packages there.

That written I would like to clarify that my motivation here is not to move the ecosystem towards less vendoring. I honestly don't even see a connection there, as even removing tests wouldn't stop someone from vendoring the package and continue development in any of the mentioned scenarios. Now I can see that tests would be helpful for such cases, I several times indicated already that I'm happy to change such PR to include the tests if you express that you would like to have them in. I personally started these PR's with tests removed because that's the pattern followed by several larger projects (reqwest, rustls, diesel, …). To a lesser extend this also applies to benchmarks.

I also wonder how common vendoring actually is and if there are tool restrictions that make it harder than necessary. For example, technically speaking cargo vendor could also pull information from the linked github repository instead of only from crates.io, or crates.io could allow providing multiple package variants for different purposes (minimal download for dependencies, full snapshot for vendoring, etc). That obviously doesn't mean that anything of that should exist or even that these are good ideas. I just want to point out that "more vendoring" also could mean: Make the tooling better so that's easier and therefore more common to vendor your dependencies.

As for the development related scripts + vendoring. I can see why you as a maintainer feel that they would be helpful for continued development, given that you use them "all the time". From my experience of moving around between quite a lot of repositories of other people they are often not that helpful for someone that's not used to this particular script setup. That's because each repository/maintainer tends to have a slightly different set of scripts that perform slightly different specific actions. You need to dig through that all to utilize them, if they even fit your workflow or environment. If you would like to make the development workflow easier to pick up for others the best thing you could do in my opinion is to completely drop the scripts and make sure that a single cargo test in the package root does the right thing (whatever right means for this particular repo). Having a unified workflow helps much more than having random scripts. In my opinion the change that got merged to libloading is a good example of making things actually easier for vendoring by excluding "magic" blobs in favor of small helpers that can be reproduced outside of the maintainers system.

Finally I would like to address the point of having raised concerns on other repos already. While I can totally understand that it is frustrating to have several of this PR's opened in a short time frame and that it feels like having the same argument again and again I also would like to point out that it is often hard to impossible to even find out who's the person that is responsible for a specific repo. That's especially hard for older "basic infrastructure" repos like this one or libloading that only require minor maintenance over the years. For me that means: I just don't know if there is a preference on what to include or not as I've seen both variants already (maintainer requests to remove tests and maintainer requests to include tests). That unfortunately means for you you need to at least indicate: Please do it as discussed there. You don't need to repeat the same argument.

@Manishearth
Copy link
Member

First of all I would like clarify

Yes, I understand all of this. I still think it is a poor bandaid that doesn't actually fix the problem. I do not intend to discuss that here further.

That written I would like to clarify that my motivation here is not to move the ecosystem towards less vendoring

Let me be more precise: moving the ecosystem towards publishing fewer things in their crates. I used an unclear shorthand there: I was saying "vendoring fewer things" because of the context and then dropped the "fewer things" making it more confusing.

I also wonder how common vendoring actually is and if there are tool restrictions that make it harder than necessary.

cargo vendor isn't the only thing that vendors. People have all kinds of vendoring setups. It's pretty common in large multilanguage projects; it's also common in distro package management, which isn't exactly "normal" vendoring but has the same characteristics.

For example, technically speaking cargo vendor could also pull information from the linked github repository instead of only from crates.io

Then you need to consume cargo_vcs_info and hope for a truthful and non-dirty publish commit. This is a whole can of worms.

At least one codebase I've worked with uses cargo_vcs_info to fetch fuzzers from a git repo, but would not rely on anything more important than that.

As for the development related scripts + vendoring. I can see why you as a maintainer feel that they would be helpful for continued development, given that you use them "all the time". From my experience of moving around between quite a lot of repositories of other people they are often not that helpful for someone that's not used to this particular script setup.

No, I am saying this as a consumer of third party crates in vendoring setups. Crates I do not directly maintain, but need to "maintain" in the context of the vendoring setup. Similar to how Debian package "maintainers" are usually not actually necessarily OSS maintainers of the actual code.

I, personally, don't actually ever need to care about the libfuzzer scripts being in the package. I'm not running them from the package, I'm running them from the git repo. Other clients of libfuzzer may care, though, and I have been in equivalent situations around other packages where include rules have led to a long cycle of "ask upstream to include a bunch of critical build files, wait for a publish, revendor" cycle. I have also been the maintainer of projects receiving such requests. I would strongly prefer to avoid making deliberate choices that end up in us in that situation.

@weiznich
Copy link
Contributor Author

weiznich commented Feb 4, 2026

Well given that you have strong opinions about that: What's your suggestion to move forward. How would you address my problem? How exactly do you want to guard about potential malicious scripts other than reviewing them or excluding them, with the later variant clearly being the superior solution to that specific problem? I'm honestly open to suggestions there but up until now that only sounds like something where you never can satisfy everyone. In such cases it might be worth to consider that the majority of users don't vendor dependencies and optimize for their use-case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants