Open
Labels
feature request
Description
PDS-H benchmarks with higher scaling factors usually store hundreds of files under a data directory (e.g. directory lineitem) rather than in a single large data file (e.g. file lineitem.parquet). While cuDF can handle directories on local file systems via Python, it cannot do so for remote directories. The current workaround is for users to enumerate every remote file in the benchmark by hand, which is cumbersome.
We need to investigate what types of endpoints (S3, S3 presigned, WebHDFS) in KvikIO can support file listing and how to do that via libcurl. Ideally, we want to have an interface similar to:
// Can invoke a namesake polymorphic function in the endpoint
std::vector<std::string> kvikio::RemoteHandle::listdir(std::string const& url, bool recursive = true/false);

In passing, another necessary feature is to check whether a given URL points to a remote file or directory.
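To make the intended semantics concrete, here is a minimal sketch of how a polymorphic endpoint could expose both listing and the file-or-directory check. Everything in it is hypothetical: `ListableEndpoint`, `FlatKeyEndpoint`, and `is_directory` are illustrative names, not existing KvikIO API, and the in-memory key list stands in for what a real endpoint would fetch over libcurl. It assumes a flat key namespace (as in object stores), where a "directory" is just a key prefix ending in `/`.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface; NOT part of the current KvikIO API.
class ListableEndpoint {
 public:
  virtual ~ListableEndpoint() = default;
  // Returns the entries under `url`. Non-recursive listings report
  // immediate sub-directories once, with a trailing '/'.
  virtual std::vector<std::string> listdir(std::string const& url, bool recursive) const = 0;
  // True if at least one entry lives under `url` treated as a prefix.
  virtual bool is_directory(std::string const& url) const = 0;
};

// In-memory stand-in for a remote store with a flat key namespace.
// A real endpoint would obtain the keys from the server instead.
class FlatKeyEndpoint : public ListableEndpoint {
 public:
  explicit FlatKeyEndpoint(std::vector<std::string> keys) : keys_(std::move(keys)) {}

  std::vector<std::string> listdir(std::string const& url, bool recursive) const override
  {
    std::string const prefix = with_trailing_slash(url);
    std::set<std::string> entries;  // keeps results sorted and de-duplicated
    for (auto const& key : keys_) {
      if (!is_under(key, prefix)) { continue; }
      auto const slash = key.find('/', prefix.size());
      if (recursive || slash == std::string::npos) {
        entries.insert(key);  // a file, reported by its full key
      } else {
        entries.insert(key.substr(0, slash + 1));  // collapse to the sub-directory
      }
    }
    return std::vector<std::string>(entries.begin(), entries.end());
  }

  bool is_directory(std::string const& url) const override
  {
    std::string const prefix = with_trailing_slash(url);
    return std::any_of(keys_.begin(), keys_.end(), [&](std::string const& key) {
      return is_under(key, prefix);
    });
  }

 private:
  static std::string with_trailing_slash(std::string url)
  {
    if (url.empty() || url.back() != '/') { url.push_back('/'); }
    return url;
  }

  // True if `key` starts with `prefix` and is strictly longer than it.
  static bool is_under(std::string const& key, std::string const& prefix)
  {
    return key.size() > prefix.size() && key.compare(0, prefix.size(), prefix) == 0;
  }

  std::vector<std::string> keys_;
};
```

With keys such as `bench/lineitem/part-0.parquet` and `bench/lineitem/2024/part-2.parquet`, a non-recursive `listdir("bench/lineitem", false)` would return the file plus the collapsed entry `bench/lineitem/2024/`, while the recursive form returns every file key. This mirrors the prefix and delimiter behavior of the S3 ListObjectsV2 request; WebHDFS exposes comparable information through its `LISTSTATUS` and `GETFILESTATUS` operations. How these map onto KvikIO's endpoints is exactly what the investigation above would need to settle.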