Support remote images #52

@bertsky

We frequently have the use case where some (or even all) of the file references have not been downloaded yet.

But these URL references for images make OcrdBrowser stumble:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/ocrd_browser/ui/window.py", line 92, in _open
    self.page_list.set_document(self.document)
  File "/usr/local/lib/python3.7/site-packages/ocrd_browser/ui/page_browser.py", line 39, in set_document
    self.model = PageListStore(self.document)
  File "/usr/local/lib/python3.7/site-packages/ocrd_browser/ui/page_store.py", line 57, in __init__
    file_lookup = document.get_image_paths(self.file_group)
  File "/usr/local/lib/python3.7/site-packages/ocrd_browser/model/document.py", line 275, in get_image_paths
    image_paths[page_id] = self.path(images[0])
  File "/usr/local/lib/python3.7/site-packages/ocrd_browser/model/document.py", line 169, in path
    return self.directory.joinpath(other.local_filename)
  File "/usr/local/lib/python3.7/pathlib.py", line 922, in joinpath
    return self._make_child(args)
  File "/usr/local/lib/python3.7/pathlib.py", line 704, in _make_child
    drv, root, parts = self._parse_args(args)
  File "/usr/local/lib/python3.7/pathlib.py", line 658, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

That's because in …

if isinstance(other, OcrdFile):
    return self.directory.joinpath(other.local_filename)

… we do not differentiate between an OcrdFile's .local_filename (which may be empty) and its .url. The latter could still be downloaded into the document.directory under some name and returned here.
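
To illustrate the fallback, here is a rough sketch, not the actual ocrd_browser code. FileRef is a hypothetical stand-in for OcrdFile reduced to the two attributes in question, and resolve_path, default_fetch and the rule of naming the download after the last URL path segment are all assumptions made for the example:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional
from urllib.parse import urlparse
from urllib.request import urlretrieve


@dataclass
class FileRef:
    """Hypothetical stand-in for OcrdFile (only the two attributes discussed here)."""
    local_filename: Optional[str] = None
    url: Optional[str] = None


def default_fetch(url: str, target: Path) -> None:
    """Download url to target using only the standard library."""
    target.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(target))


def resolve_path(directory: Path, file: FileRef,
                 fetch: Callable[[str, Path], None] = default_fetch) -> Path:
    """Return a local path for file, downloading from file.url if needed."""
    if file.local_filename:
        # Already local: same behaviour as the current joinpath() call.
        return directory / file.local_filename
    if not file.url:
        raise ValueError("file reference has neither local_filename nor url")
    # Assumption: name the download after the last segment of the URL path.
    name = Path(urlparse(file.url).path).name or "download"
    target = directory / name
    if not target.exists():
        fetch(file.url, target)
    # Remember the local name so later calls take the first branch.
    file.local_filename = name
    return target
```

The fetch parameter is only there so the download step can be swapped out (for tests, or for whatever download helper the workspace already provides).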

Or perhaps one could somehow make this downloading a lazy operation only to be triggered when actually needed.
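
The lazy variant could look roughly like the following. This is only a sketch: LazyPaths is a hypothetical name, and the resolve callable stands for whatever function turns a file reference into a local path (downloading if necessary). The idea is that a lookup table like the one built in get_image_paths would hold unresolved references and only resolve a page when it is actually accessed:

```python
from collections.abc import Mapping
from pathlib import Path
from typing import Any, Callable, Dict, Iterator


class LazyPaths(Mapping):
    """Map page IDs to local paths, resolving each entry on first access."""

    def __init__(self, files: Dict[str, Any],
                 resolve: Callable[[Any], Path]) -> None:
        self._files = files          # page_id -> unresolved file reference
        self._resolve = resolve      # may download; called at most once per page
        self._cache: Dict[str, Path] = {}

    def __getitem__(self, page_id: str) -> Path:
        if page_id not in self._cache:
            self._cache[page_id] = self._resolve(self._files[page_id])
        return self._cache[page_id]

    def __contains__(self, page_id: object) -> bool:
        # Overridden so that membership tests do not trigger a download.
        return page_id in self._files

    def __iter__(self) -> Iterator[str]:
        return iter(self._files)

    def __len__(self) -> int:
        return len(self._files)
```

Iterating over the keys or asking for the length stays cheap, while values() and items() would still resolve every entry, so the page list would need to avoid those until a thumbnail is really required.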
