Proposal: Add a fast url_large_download() helper for large file downloads (multipart + threading) #6236
Replies: 3 comments 2 replies
-
+1. I was facing similar issues: large guest image bootstrapping was failing in our lab due to urlopen limitations. I was able to fix that issue via #6237, which replaces urllib with requests for streaming downloads. That is more reliable, but it is an external dependency.
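For context, a requests-based streaming download along the lines described here might look like the following. This is a minimal sketch, not the actual code from PR #6237; the function name and parameters are illustrative:

```python
# Hypothetical sketch of a chunked streaming download using the
# third-party "requests" library (names are illustrative, not the
# actual PR #6237 implementation).
import requests


def stream_download(url, dest, chunk_size=1024 * 1024):
    """Download `url` to `dest` in chunks instead of one big read."""
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as out:
            # iter_content yields the body incrementally, so multi-GB
            # files never have to fit in memory at once.
            for chunk in resp.iter_content(chunk_size=chunk_size):
                out.write(chunk)
    return dest
```

The key difference from a plain `urlopen().read()` is that the body is consumed incrementally, which avoids holding multi-GB images in memory and tends to cope better with slow or flaky connections.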
-
Hi @bssrikanth, Thank you for your insights on the download improvements and for highlighting the urlopen limitations. I’ve been experimenting with another direction: a multi-threaded multipart downloader using standard-library HTTP Range requests. It splits the file into segments and downloads them in parallel, which gives a noticeable performance boost for large files. Since this method stays within the Python stdlib, it doesn’t add new dependencies and remains consistent with Avocado’s current urllib usage. If you think this approach is valuable, I would be glad to co-author your PR. I would appreciate your thoughts on whether this direction aligns with the project’s expectations. I tried it out with this server … urllib has its own limitations, like …
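To make the idea concrete, here is a minimal stdlib-only sketch of the segmented approach. This is illustrative only, not the actual patch: the function name, the fixed part count, and the whole-segment-in-memory reads are simplifications.

```python
# Sketch: download a file by fetching byte ranges in parallel threads
# using only the Python standard library (urllib + HTTP Range headers).
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def multipart_download(url, dest, parts=4):
    """Download `url` to `dest` in `parts` parallel byte-range fetches."""
    # Probe total size and Range support with a HEAD request.
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        size = int(resp.headers["Content-Length"])
        if resp.headers.get("Accept-Ranges") != "bytes":
            raise RuntimeError("server does not support Range requests")

    chunk = size // parts

    def fetch(index):
        # The last segment absorbs the remainder of the division.
        start = index * chunk
        end = size - 1 if index == parts - 1 else start + chunk - 1
        req = urllib.request.Request(
            url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            return start, resp.read()

    # Worker threads fetch the segments; the main thread merges them
    # into the pre-sized destination file at their byte offsets.
    with open(dest, "wb") as out:
        out.truncate(size)
        with ThreadPoolExecutor(max_workers=parts) as pool:
            for start, data in pool.map(fetch, range(parts)):
                out.seek(start)
                out.write(data)
    return size
```

A production version would stream each segment to disk instead of buffering it (so memory use stays bounded), and retry individual segments on failure rather than restarting the whole download.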
-
Hi Sooraj, Thank you for the patch and the detailed implementation! I tested it with my original use case (bootstrapping large guest image files, often several GB in size), and it works really well – no more intermittent connection resets or aborts that I was seeing with the plain urlopen() approach. A couple of observations from the testing/user-experience side:
I agree that your approach solves the original problem while staying within the standard library. My only concern is the amount of new code we’d be adding and the long-term maintenance burden. Unless Avocado has a strict policy against third-party dependencies, a much smaller and still-reliable solution could be achieved with the requests library, as shown in PR #6237. That said, if the project prefers to avoid any new dependencies, I have no problem dropping my PR. I’d like to hear what the core team thinks. Thanks again.
-
Hello guys,
I’ve been exploring Avocado’s download utilities and noticed that while url_download() works well for small and medium files, there isn’t a built-in option optimized for downloading large files efficiently.
Right now, downloads are handled using a single HTTP stream via urllib, which becomes noticeably slow for multi-GB artifacts. For larger workloads, a multi-segment or multi-connection approach can significantly improve performance.
I’d like to contribute a new optional helper, something like url_large_download(), which would:
- perform multipart HTTP Range downloads
- use multiple threads to download file segments in parallel
- merge the segments into the final file
- fall back to wget or the existing method if the server does not support Range requests
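The fallback step above could be sketched like this. Everything here is hypothetical: `url_large_download()` does not exist yet, and a real patch would reuse Avocado's existing download helper for the fallback path rather than the inline single-stream copy shown:

```python
# Hypothetical control flow for the proposed helper; all names are
# illustrative. A real implementation would delegate the fallback to
# Avocado's existing url_download() utility.
import shutil
import urllib.request


def _range_supported(url):
    """HEAD the URL; return (size, True) if byte ranges are advertised."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        size = int(resp.headers.get("Content-Length") or 0)
        ok = resp.headers.get("Accept-Ranges") == "bytes" and size > 0
    return size, ok


def url_large_download(url, dest):
    size, ok = _range_supported(url)
    if not ok:
        # Server cannot serve partial content: fall back to a plain
        # single-stream download (the existing behavior).
        with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)
        return dest
    # ... otherwise split [0, size) into `parts` byte ranges, download
    # them in parallel threads, and merge the segments into `dest`.
    raise NotImplementedError("multipart path omitted from this sketch")
```

Probing `Accept-Ranges` up front keeps the helper safe to use against any server: callers get the fast path when the server cooperates and the current behavior when it does not.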
This would not modify or break existing behavior — it would simply provide a faster option for workloads involving large files.
Before opening a PR, I wanted to ask:
Would you be open to adding such a helper to Avocado’s utilities?
If so, I can prepare an implementation, tests, and documentation aligned with the project’s style.
Additionally, I think aria2c is also a good option; what do you all think about it?
Thanks, and happy to discuss the design details!