Describe the proposed feature
Analogous to pdftk, it would be nice if the syntax `--pages 3-end` were allowed, for cases where you know that the first 2 pages are not needed.
As far as I can tell, the changes would need to be made in
OCRmyPDF/src/ocrmypdf/_validation.py
Lines 155 to 191 in 8930efe

    def _pages_from_ranges(ranges: str) -> set[int]:
        pages: list[int] = []
        page_groups = ranges.replace(' ', '').split(',')
        for group in page_groups:
            if not group:
                continue
            try:
                start, end = group.split('-')
            except ValueError:
                pages.append(int(group) - 1)
            else:
                try:
                    new_pages = list(range(int(start) - 1, int(end)))
                    if not new_pages:
                        raise BadArgsError(
                            f"invalid page subrange '{start}-{end}'"
                        ) from None
                    pages.extend(new_pages)
                except ValueError:
                    raise BadArgsError(f"invalid page subrange '{group}'") from None
        if not pages:
            raise BadArgsError(
                f"The string of page ranges '{ranges}' did not contain any recognizable "
                f"page ranges."
            )
        if not monotonic(pages):
            log.warning(
                "List of pages to process contains duplicate pages, or pages that are "
                "out of order"
            )
        if any(page < 0 for page in pages):
            raise BadArgsError("pages refers to a page number less than 1")
        log.debug("OCRing only these pages: %s", pages)
        return set(pages)

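To make the idea concrete, here is a rough sketch of how an `end` keyword might be resolved. It is not a patch against the real function: `pages_from_ranges_with_end` and its `page_count` parameter are hypothetical names, and the sketch assumes the page count is already known when the ranges are parsed, which may not be the case where `_pages_from_ranges` is called today. `BadArgsError` is redefined locally only so the snippet runs on its own, and the logging and monotonic check are left out for brevity.

```python
class BadArgsError(Exception):
    """Local stand-in for ocrmypdf's BadArgsError so the sketch is self-contained."""


def pages_from_ranges_with_end(ranges: str, page_count: int) -> set[int]:
    """Hypothetical variant of _pages_from_ranges that accepts 'end'.

    page_count is an assumed extra parameter; the existing function
    does not receive the number of pages in the document.
    """

    def to_number(token: str) -> int:
        # 'end' (case-insensitive) resolves to the last page, 1-based.
        if token.lower() == 'end':
            return page_count
        return int(token)  # raises ValueError for anything non-numeric

    pages: list[int] = []
    for group in ranges.replace(' ', '').split(','):
        if not group:
            continue
        try:
            if '-' in group:
                # Unpacking fails with ValueError on e.g. '1-2-3'.
                start, end = group.split('-')
                new_pages = list(range(to_number(start) - 1, to_number(end)))
            else:
                new_pages = [to_number(group) - 1]
        except ValueError:
            raise BadArgsError(f"invalid page subrange '{group}'") from None
        if not new_pages:
            # Empty range, e.g. '5-3'.
            raise BadArgsError(f"invalid page subrange '{group}'")
        pages.extend(new_pages)
    if not pages:
        raise BadArgsError(f"no recognizable page ranges in '{ranges}'")
    if any(page < 0 for page in pages):
        raise BadArgsError("pages refers to a page number less than 1")
    return set(pages)


# Zero-based page indices, as in the existing function.
print(pages_from_ranges_with_end('3-end', page_count=5))
```

With a 5-page document, `3-end` would select the zero-based pages 2, 3 and 4. The open question is where the page count should come from, since argument validation may run before the input PDF is opened; an alternative would be to return an open-ended marker and resolve it later.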
Ideas, thoughts on this topic?