feat(jobs): add volume mounting support for buckets and repos #3936

Draft

XciD wants to merge 7 commits into main from feat/job-volumes

Conversation


@XciD XciD commented Mar 16, 2026

Summary

Add support for mounting HuggingFace Buckets and Repos (models, datasets, spaces) as volumes in Job containers.

Python API

from huggingface_hub import run_job, JobVolume

job = run_job(
    image="python:3.12",
    command=["python", "-c", "import os; print(os.listdir('/data'))"],
    volumes=[
        JobVolume(type="dataset", source="username/my-dataset", mount_path="/data"),
        JobVolume(type="bucket", source="username/my-bucket", mount_path="/output"),
    ],
)

CLI

hf jobs run -v datasets/username/my-dataset:/data -v buckets/username/my-bucket:/output python:3.12 python script.py

Changes

  • _jobs_api.py: new JobVolume dataclass and JobVolumeType enum, volumes field added to JobInfo/JobSpec/_create_job_spec
  • hf_api.py: volumes parameter added to run_job, run_uv_job, create_scheduled_job, create_scheduled_uv_job
  • cli/jobs.py: --volume/-v CLI option with Docker-like syntax (TYPE/SOURCE:/MOUNT_PATH[:ro])
  • __init__.py: export JobVolume, JobVolumeType
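The Docker-like `-v` syntax could be parsed roughly as sketched below. This is an illustration only, not the PR's actual implementation: the plural-prefix-to-singular-type mapping and the returned field names are assumptions based on the examples above.

```python
# Illustrative parser for "-v TYPE/SOURCE:/MOUNT_PATH[:ro]" volume specs.
# Assumption: plural CLI prefixes (datasets/, buckets/, ...) map to singular types.
_TYPE_PREFIXES = {"models": "model", "datasets": "dataset", "spaces": "space", "buckets": "bucket"}


def parse_volume(spec: str) -> dict:
    source_part, sep, rest = spec.partition(":/")
    if not sep:
        raise ValueError(f"Invalid volume spec: {spec!r}")
    mount_path = "/" + rest
    read_only = mount_path.endswith(":ro")
    if read_only:
        mount_path = mount_path[: -len(":ro")]
    prefix, slash, remainder = source_part.partition("/")
    if slash and prefix in _TYPE_PREFIXES:
        vol_type, source = _TYPE_PREFIXES[prefix], remainder
    else:
        # Bare sources like "gpt2" or "org/my-model" default to model repos
        vol_type, source = "model", source_part
    return {"type": vol_type, "source": source, "mount_path": mount_path, "read_only": read_only}
```

For example, `parse_volume("buckets/org/bucket:/mnt:ro")` would yield a read-only bucket mount at `/mnt`.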

Add `volumes` parameter to `run_job`, `create_scheduled_job`,
`run_uv_job`, and `create_scheduled_uv_job` to mount HuggingFace
Buckets and Repos (models, datasets, spaces) as volumes in job
containers.

- Add `JobVolume` dataclass and `JobVolumeType` enum
- Add `volumes` field to `JobInfo` and `JobSpec` responses
- Add `-v/--volume` CLI option with Docker-like syntax
  (e.g. `-v models/gpt2:/data` or `-v buckets/org/bucket:/mnt:ro`)
- Serialize volumes to camelCase for the Hub API
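The camelCase serialization mentioned in the last bullet could look like the sketch below. The field names are assumed from the PR description; the real `JobVolume` dataclass and wire format live in `_jobs_api.py` and may differ.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class JobVolume:
    # Field names assumed from the PR description
    type: str
    source: str
    mount_path: str
    read_only: bool = False
    revision: Optional[str] = None


def _to_camel(name: str) -> str:
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def serialize_volume(volume: JobVolume) -> dict:
    # Drop unset optional fields and convert snake_case keys to camelCase
    return {_to_camel(key): value for key, value in asdict(volume).items() if value is not None}
```

With this, `serialize_volume(JobVolume(type="dataset", source="user/ds", mount_path="/data"))` produces `mountPath`/`readOnly` keys for the Hub API payload.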
@bot-ci-comment

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

XciD added 4 commits March 16, 2026 20:41
- Remove dead isinstance check in _create_job_spec serialization
- Add volumes field to JobInfo docstring
- Preserve original input in _parse_volumes error messages
- Restructure tests: parametrize, merge into existing classes, top-level imports
Comment on lines +1183 to +1205
# Parse type from source_part (first segment before /)
slash_idx = source_part.find("/")
if slash_idx == -1:
    # No slash: bare source like "gpt2:/data" -> model type
    vol_type_str = JobVolumeType.MODEL.value
    source = source_part
else:
    vol_type_str = source_part[:slash_idx]
    source = source_part[slash_idx + 1 :]
    # If the first segment isn't a known type, treat the whole thing as a model source
    # e.g. "org/my-model:/data" -> type=model, source="org/my-model"
    if vol_type_str not in _VOLUME_TYPES:
        vol_type_str = JobVolumeType.MODEL.value
        source = source_part

result.append(
    JobVolume(
        type=vol_type_str,
        source=source,
        mount_path=mount_path,
        read_only=read_only,
    )
)

@lhoestq lhoestq Mar 16, 2026


with this change you can support revisions (including special refs) and paths in repo/bucket:

Suggested change
# --- current ---
# Parse type from source_part (first segment before /)
slash_idx = source_part.find("/")
if slash_idx == -1:
    # No slash: bare source like "gpt2:/data" -> model type
    vol_type_str = JobVolumeType.MODEL.value
    source = source_part
else:
    vol_type_str = source_part[:slash_idx]
    source = source_part[slash_idx + 1 :]
    # If the first segment isn't a known type, treat the whole thing as a model source
    # e.g. "org/my-model:/data" -> type=model, source="org/my-model"
    if vol_type_str not in _VOLUME_TYPES:
        vol_type_str = JobVolumeType.MODEL.value
        source = source_part
result.append(
    JobVolume(
        type=vol_type_str,
        source=source,
        mount_path=mount_path,
        read_only=read_only,
    )
)

# --- suggested ---
resolved_path = hffs.resolve_path(source_part)
if isinstance(resolved_path, HfFileSystemResolvedRepositoryPath):
    result.append(
        JobVolume(
            type=resolved_path.repo_type,
            source=resolved_path.repo_id,
            mount_path=mount_path,
            revision=resolved_path.revision,
            read_only=read_only,
            path=resolved_path.path_in_repo,
        )
    )
else:
    result.append(
        JobVolume(
            type=JobVolumeType.BUCKET.value,
            source=resolved_path.bucket_id,
            mount_path=mount_path,
            read_only=read_only,
            path=resolved_path.path,
        )
    )

for example here are supported paths:

# buckets
"hf://buckets/username/bucket"
"hf://buckets/username/bucket/path"
# repos
"hf://gpt2"
"hf://user/model"
"hf://datasets/user/dataset"
"hf://user/model/path/in/repo"
"hf://user/model@revision"
"hf://user/model@refs/pr/1"

(it works with and without the hf:// prefix)

you will need these imports:

from huggingface_hub import hffs
from huggingface_hub.hf_file_system import HfFileSystemResolvedBucketPath, HfFileSystemResolvedRepositoryPath

it will also raise an error if the repo / bucket doesn't exist
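The path grammar in the comment above can be illustrated with a simplified resolver. This is a sketch only: the real logic is `HfFileSystem.resolve_path`, which also validates that the repo or bucket exists; here, as a simplification, everything after `@` is treated as the revision.

```python
def resolve_source(source: str) -> dict:
    # Simplified illustration of hf:// path resolution; not the HfFileSystem implementation.
    s = source[len("hf://"):] if source.startswith("hf://") else source
    revision = None
    if "@" in s:
        # Simplification: treat the whole suffix as the revision (covers refs/pr/1)
        s, revision = s.split("@", 1)
    parts = s.split("/")
    if parts[0] == "buckets":
        return {"type": "bucket", "id": "/".join(parts[1:3]), "revision": revision, "path": "/".join(parts[3:])}
    repo_type = "model"
    if parts[0] in ("models", "datasets", "spaces"):
        repo_type = parts[0][:-1]
        parts = parts[1:]
    if len(parts) == 1:
        # Canonical single-segment repo like "gpt2"
        return {"type": repo_type, "id": parts[0], "revision": revision, "path": ""}
    return {"type": repo_type, "id": "/".join(parts[:2]), "revision": revision, "path": "/".join(parts[2:])}
```

Running it against the listed examples, `resolve_source("hf://user/model@refs/pr/1")` yields a model volume pinned to the PR ref, and `resolve_source("buckets/username/bucket/path")` a bucket volume with a sub-path.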


lhoestq commented Mar 16, 2026

love it! Quick question for the CLI: should we require the hf:// prefix for the source path? To make sure it doesn't look like a local path (and in case we want to support local paths at some point)


davanstrien commented Mar 17, 2026

Quick question for the CLI: should we require the hf:// prefix for the source path? To make sure it doesn't look like a local path (and in case we want to support local paths at some point)

Think this makes sense IMO. For Jobs I have quite a lot of use cases in mind where you do something like

hf jobs uv run whisper-transcribe.py some-local-dir/audiofiles.mp3
