is it possible to output regular files instead of warc? #228
i only want files, not warc.
can grab-site output regular files (like html and images) for me like wget can? (links must be converted to relative links)
side question: has anyone here actually had good results with getting files back out of warc? this wouldn't be such a big deal if that were possible. i've never seen a util that can extract files from warcs with a 100% success rate (and the ones i've tried are usually insanely slow).
i've tried:
- jwat-tools: seemed the best coded of the bunch but gave me nonsensical filenames like extracted.001, and idk how to get past that
- warcat: slow and fails on many warcs
- warc-extractor: the easiest to use of the bunch (it can hit a bunch of warcs in a single dir), but it's insanely slow, and it also fails on many warcs
- the unarchiver: fails on some warcs
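for reference, a minimal sketch of the kind of extraction being asked about, assuming the third-party warcio python library (not one of the tools listed above). the `url_to_path` naming scheme and the `extracted` output dir are hypothetical choices for illustration, not anything grab-site does. it only writes response bodies to disk under url-derived names; it does not convert links to relative links.

```python
import os
from urllib.parse import urlsplit, unquote


def url_to_path(url, root="extracted"):
    """Map a URL to a local file path under root (hypothetical naming scheme)."""
    parts = urlsplit(url)
    path = unquote(parts.path)
    # directory-style URLs (or an empty path) get an index.html filename
    if not path or path.endswith("/"):
        path += "index.html"
    # drop empty, "." and ".." segments so nothing can escape root
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    return os.path.join(root, parts.netloc, *segments)


def extract_warc(warc_path, root="extracted"):
    """Write the body of every response record in a WARC to a regular file."""
    # imported here so url_to_path works without warcio installed
    from warcio.archiveiterator import ArchiveIterator  # pip install warcio

    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue  # skip request, metadata, warcinfo records
            url = record.rec_headers.get_header("WARC-Target-URI")
            if not url:
                continue
            dest = url_to_path(url, root)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with open(dest, "wb") as out:
                # content_stream() yields the payload with HTTP encodings decoded
                out.write(record.content_stream().read())
```

this ignores query strings, so urls that differ only by query overwrite each other, and it will raise an error if one url needs a file where another needs a directory (e.g. `/a` and `/a/b`). a real tool would have to handle both cases.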