refactor: deprecate hf_models_to_cache by deanq · Pull Request #32 · runpod-workers/flash

deanq · 2025-10-06T02:22:34Z

#31 and runpod/flash#95 are prerequisites

Add core download acceleration modules with aria2c integration:
- download_accelerator.py: Main acceleration classes with multi-connection downloads
- huggingface_accelerator.py: Specialized HF model acceleration
- constants.py: Download acceleration configuration constants
- __init__.py: Package structure for src module

Enhanced dependency installation with intelligent acceleration:
- Auto-detects large packages for acceleration (torch, transformers, etc.)
- Integrates with remote executor for acceleration control
- Maintains backward compatibility with existing workflows
- Provides graceful fallback when aria2c unavailable

Enhanced workspace manager with HuggingFace model pre-caching:
- Pre-cache specified HF models before function execution
- Integrates with volume-aware caching system
- Optimizes cold start times for ML workloads

Comprehensive test suite for download acceleration:
- Integration tests for aria2 detection and fallback behavior
- HF model acceleration testing with authentication
- Volume-aware acceleration scenarios
- Error handling and performance validation

- Update test files moved to src/ directory
- Enhanced test coverage for acceleration features
- Updated dependencies and documentation
- Submodule updates for tetra-rp

- Added nala accelerated installation for large system packages
- Enhanced DependencyInstaller with automatic nala fallback to apt-get
- Updated Docker images to include nala package manager
- Added comprehensive system package acceleration tests
- Improved acceleration logging with system package status
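The nala-to-apt-get fallback described above could look roughly like the sketch below. It is not the PR's `DependencyInstaller` code: the function name `build_system_install_command` and the injectable `which` parameter are hypothetical, chosen only to show the selection logic.

```python
import shutil
from typing import Callable, Optional, Sequence


def build_system_install_command(
    packages: Sequence[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Build an install command that prefers nala and falls back to apt-get.

    Hypothetical helper for illustration; the real installer may differ.
    """
    if not packages:
        raise ValueError("no packages requested")
    # Use nala only when it is found on PATH; otherwise use apt-get.
    manager = "nala" if which("nala") else "apt-get"
    return [manager, "install", "-y", *packages]
```

Passing `which` as a parameter keeps the selection testable on machines where neither tool is installed.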

Simplify dependency installation by removing aria2c acceleration for Python packages. UV's built-in parallel downloading and caching is superior and eliminates the need for additional complexity.

Changes:
- Remove LARGE_PACKAGE_PATTERNS from constants.py
- Simplify DependencyInstaller.install_dependencies() to single parameter
- Remove Python package acceleration logic and related methods
- Update RemoteExecutor to use simplified API
- Update tests to match new simplified interface

System package acceleration (nala) and HuggingFace model acceleration remain intact as they provide meaningful performance benefits over standard tools.

Core functionality verified:
- All handler tests pass (8/8)
- All unit tests pass (98/98)
- Code quality checks pass (format, lint, typecheck)

Add conditional acceleration logic - passes accelerate_downloads to installers, HF model caching only when accelerated + models specified

…bled Implement _install_with_pip() method and route between UV (accelerated) vs pip (standard) based on accelerate_downloads parameter

Add HfXetDownloader for subsequent downloads, implement smart strategy: hf_xet for cached files → hf_transfer for fresh downloads → fallback

Add tests for both acceleration enabled/disabled scenarios, verify UV vs pip routing, update existing test assertions

Update test expectations to handle accelerate_downloads parameter in integration scenarios

Update build files and dependency locks to support new acceleration functionality

Always use UV for Python package installation regardless of acceleration setting. The _install_with_pip method has been removed as UV provides more reliable virtual environment handling and package management.
- Remove _install_with_pip() method (70 lines)
- Simplify install_dependencies() to always use UV
- Maintain differential installation when acceleration is enabled
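As a rough illustration of the UV-only path, the sketch below builds a `uv pip install` command and runs it through an injectable runner. The names `build_uv_install_command` and `install_with_uv` are hypothetical, and the sketch assumes `uv` is on PATH with a usable environment; it does not model the differential installation mentioned above.

```python
import subprocess
from typing import Callable, Sequence


def build_uv_install_command(packages: Sequence[str]) -> list[str]:
    """Return the argv for installing packages through uv's pip interface."""
    return ["uv", "pip", "install", *packages]


def install_with_uv(
    packages: Sequence[str],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Install packages with uv; report success as a boolean.

    Hypothetical helper: the same path is taken whether or not
    acceleration is requested, since there is no pip branch.
    """
    if not packages:
        return True  # nothing to install counts as success
    result = runner(build_uv_install_command(packages), capture_output=True, text=True)
    return result.returncode == 0
```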

Update dependency installer tests to reflect the removal of pip support:
- Fix test_install_dependencies_with_acceleration_disabled to expect UV
- Rename test_install_dependencies_pip_failure to test_install_dependencies_uv_failure
- Update assertions to check for "uv pip" commands
- Update test descriptions and expected error messages

All tests now correctly validate UV-only package installation behavior.

Rename test_pip_no_acceleration.json to test_uv_no_acceleration.json and update content to reflect UV-only package installation:
- Update function name from test_pip_installation_without_acceleration to test_uv_installation_without_acceleration
- Update success message to reference UV instead of pip
- Maintain same test logic for package import validation

This test validates that packages installed with accelerate_downloads=False are properly available using UV package manager.

Add parallel installation of dependencies when acceleration is enabled:
- Add async wrappers for dependency and model download methods
- Implement _install_dependencies_parallel() using asyncio.gather()
- Add _install_dependencies_sequential() for non-accelerated path
- Add _process_parallel_results() for error handling
- Route between parallel/sequential execution based on accelerate_downloads flag

When accelerate_downloads=True, system packages, Python packages, and HF model downloads execute concurrently for improved performance.
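The routing between parallel and sequential execution can be sketched as follows. This is a minimal illustration, not the PR's implementation: `run_parallel`, `run_sequential`, and `install_all` are hypothetical stand-ins for the methods named above, and each task is assumed to be a zero-argument coroutine function returning a status string.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Task = Callable[[], Awaitable[str]]


async def run_parallel(tasks: Dict[str, Task]) -> Dict[str, str]:
    """Run all tasks concurrently and collect failures instead of stopping early."""
    names = list(tasks)
    results = await asyncio.gather(
        *(tasks[name]() for name in names), return_exceptions=True
    )
    failures = {n: r for n, r in zip(names, results) if isinstance(r, BaseException)}
    if failures:
        details = "; ".join(f"{n}: {e}" for n, e in failures.items())
        raise RuntimeError(f"installation failed: {details}")
    return dict(zip(names, results))


async def run_sequential(tasks: Dict[str, Task]) -> Dict[str, str]:
    """Run tasks one after another; the first exception propagates."""
    return {name: await task() for name, task in tasks.items()}


async def install_all(tasks: Dict[str, Task], accelerate_downloads: bool) -> Dict[str, str]:
    """Route to the parallel or sequential path based on the flag."""
    if accelerate_downloads:
        return await run_parallel(tasks)
    return await run_sequential(tasks)
```

Using `return_exceptions=True` lets every task finish before errors are reported, so one failing download does not hide the status of the others.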

Add accelerate_model_download_async() method to WorkspaceManager to support parallel execution of model downloads when acceleration is enabled. This async wrapper allows HF model downloads to run concurrently with dependency installations for improved performance.
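One common way to write such an async wrapper around a blocking download is `asyncio.to_thread` (Python 3.9+). Whether the PR uses this exact mechanism is not stated, so treat the sketch below as an assumption; `download_model_blocking` is a hypothetical placeholder for the real synchronous download.

```python
import asyncio
import time


def download_model_blocking(model_id: str) -> str:
    """Placeholder for a blocking model download (hypothetical)."""
    time.sleep(0.01)  # simulate I/O-bound work
    return f"downloaded {model_id}"


async def download_model_async(model_id: str) -> str:
    """Run the blocking download in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(download_model_blocking, model_id)
```

A wrapper like this can then be awaited alongside other coroutines, for example inside `asyncio.gather()`.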

Update test mocks and expectations for parallel execution implementation:
- Fix AsyncMock setup for async dependency installation methods
- Update test_dependency_management.py for async method calls
- Update test_download_acceleration_integration.py for parallel execution
- Update test_remote_executor.py with proper AsyncMock usage

All tests now properly mock async methods and validate parallel execution behavior when acceleration is enabled.

- Remove 4 obsolete test files (debug logging, subprocess debug, vLLM symlink, redundant HF)
- Add 6 new comprehensive test files covering advanced functionality:
  * test_system_dependencies.json - System package installation
  * test_class_persistence.json - Instance reuse with instance_id
  * test_function_args.json - Serialized arguments/kwargs testing
  * test_mixed_dependencies.json - Combined system + Python dependencies
  * test_class_custom_method.json - Custom method execution
  * test_error_scenarios.json - Error handling and edge cases
- Update CLAUDE.md to fix test file location references

Total test coverage: 11 files (was 5) covering all handler functionality

- Remove custom HfXetDownloader class (~160 lines) - now redundant
- Update huggingface_hub requirement to >=0.32.0 for automatic hf_xet
- Leverage HF Hub's native snapshot_download() with transparent acceleration
- Simplify HuggingFaceAccelerator to use HF's built-in caching and Xet support
- Update workspace_manager to trust HF's cache hierarchy (HF_HOME only)
- Remove manual Xet detection and file-by-file download logic
- Update tests to reflect native HF Hub integration approach
- Add documentation for automatic HF acceleration features

Benefits:
- Automatic chunk-level deduplication via native hf_xet integration
- Simplified codebase with 332 fewer lines of redundant code
- Better performance using HF's battle-tested acceleration
- Future-proof - automatically works with new Xet-enabled repos
- Transparent operation - no code changes needed for acceleration

- Add strategy pattern for HF model downloads with tetra and native implementations
- Implement model pattern matching for selective acceleration
- Add comprehensive test coverage for download strategies
- Integrate with existing workspace and cache management systems
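A strategy pattern with pattern-based selection might be shaped like the sketch below. All class and function names here (`DownloadStrategy`, `TetraStrategy`, `NativeStrategy`, `select_strategy`) are hypothetical, and glob matching via `fnmatchcase` is an assumption about how "model pattern matching" could work; the download bodies are stubs.

```python
from fnmatch import fnmatchcase
from typing import Protocol, Sequence


class DownloadStrategy(Protocol):
    """Common interface shared by all download strategies."""

    name: str

    def download(self, model_id: str) -> str: ...


class NativeStrategy:
    """Stub for the default path that defers to the hub's own downloader."""

    name = "native"

    def download(self, model_id: str) -> str:
        return f"native:{model_id}"


class TetraStrategy:
    """Stub for a custom accelerated path (hypothetical)."""

    name = "tetra"

    def download(self, model_id: str) -> str:
        return f"tetra:{model_id}"


def select_strategy(model_id: str, accelerated_patterns: Sequence[str]) -> DownloadStrategy:
    """Pick the accelerated strategy only for models matching a glob pattern."""
    if any(fnmatchcase(model_id, pattern) for pattern in accelerated_patterns):
        return TetraStrategy()
    return NativeStrategy()
```

Callers depend only on the `download()` interface, so a new strategy can be added without touching the selection call sites.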

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull Request Overview

This PR deprecates the HuggingFace model cache-ahead functionality by removing the hf_models_to_cache parameter and associated infrastructure. The refactor simplifies the codebase by eliminating HuggingFace-specific pre-caching mechanisms while maintaining the core remote execution capabilities.

Removed HuggingFace cache-ahead system entirely, including the huggingface_cache.py module
Eliminated hf_models_to_cache parameter handling from remote executor logic
Cleaned up test files and documentation references to HuggingFace caching features

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/unit/test_remote_executor.py | Removed test case for HuggingFace model hydration functionality |
| tests/unit/test_huggingface_cache.py | Deleted entire test file for HuggingFace cache component |
| tests/integration/test_handler_integration.py | Removed integration tests for HuggingFace cache-ahead scenarios |
| src/tests/test_hf_accelerated_input.json | Deleted test input file for HuggingFace acceleration |
| src/remote_executor.py | Removed HuggingFace cache integration and `hf_models_to_cache` parameter handling |
| src/huggingface_cache.py | Deleted entire HuggingFace cache-ahead implementation |
| docs/Endpoint Persistence.md | Updated documentation to remove HuggingFace references |
| CLAUDE.md | Updated architecture documentation to reflect removal of HuggingFace features |

deanq added 30 commits August 15, 2025 17:05

feat: add workspace acceleration support

046eb58

Enhanced workspace manager with HuggingFace model pre-caching:
- Pre-cache specified HF models before function execution
- Integrates with volume-aware caching system
- Optimizes cold start times for ML workloads

chore: moved test-handler files to src/

ce51390

feat: runtime uses aria2 for accelerated parallel downloads

6c04de1

chore: update project structure and dependencies

66eb286

- Update test files moved to src/ directory
- Enhanced test coverage for acceleration features
- Updated dependencies and documentation
- Submodule updates for tetra-rp

chore: updated tetra-rp

1930b4b

build: local-execution-test use make test-handler

731fd56

chore: update CLAUDE.md

e829140

chore: move these values to constants.py for maintainability

104b2da

test: uv is no longer part of download accelerator

d7c996d

feat: implement accelerate_downloads parameter logic in RemoteExecutor

2ab93e3

Add conditional acceleration logic - passes accelerate_downloads to installers, HF model caching only when accelerated + models specified

feat: add pip fallback for Python dependencies when acceleration disa…

b50a7bf

…bled Implement _install_with_pip() method and route between UV (accelerated) vs pip (standard) based on accelerate_downloads parameter

feat: enhance HF model caching with hf_transfer/hf_xet strategy

440d00d

Add HfXetDownloader for subsequent downloads, implement smart strategy: hf_xet for cached files → hf_transfer for fresh downloads → fallback

test: add comprehensive coverage for accelerate_downloads parameter

0320e4d

Add tests for both acceleration enabled/disabled scenarios, verify UV vs pip routing, update existing test assertions

test: update integration tests for new acceleration parameter

034f770

Update test expectations to handle accelerate_downloads parameter in integration scenarios

chore: update dependencies and constants for download acceleration

9531079

Update build files and dependency locks to support new acceleration functionality

chore: memory correction

e1db417

deanq changed the base branch from deanq/ae-1092-tetra-volume-warm-cache to deanq/ae-1092-volume-cache-sync October 6, 2025 02:23

This was referenced Oct 6, 2025

refactor: deprecate hf_models_to_cache runpod/flash#95

Merged

feat: Endpoint Persistence using Network Volume (phase 1) #25

Merged

fix: deleted duplicate code block

6a5d87d

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

deanq requested review from Copilot, jhcipar and pandyamarut October 6, 2025 02:31

Copilot AI reviewed Oct 6, 2025

deanq and others added 12 commits October 5, 2025 19:39

chore: simplified happy path return

c191b02

Merge branch 'deanq/ae-1092-tetra-volume-warm-cache' into deanq/ae-10…

9eb3875

…92-volume-cache-sync

docs: updated the docstring to reflect function's intent

7e26412

fix: result.error not .stdout

a74fa09

Merge branch 'deanq/ae-1092-tetra-volume-warm-cache' into deanq/ae-12…

3bfbbdf

…68-deprecate-hf_models_to_cache

Merge branch 'deanq/ae-1092-volume-cache-sync' into deanq/ae-1268-dep…

8086436

…recate-hf_models_to_cache

Merge branch 'main' into deanq/ae-1092-volume-cache-sync

e2d2608

fix: bad merge

239d96b

refactor: HuggingFace cache location set outside /root/.cache to ex…

580b9f3

…clude from volume sync

Merge branch 'deanq/ae-1092-volume-cache-sync' into deanq/ae-1268-dep…

a26e228

…recate-hf_models_to_cache

build: tetra-rp submodule should be pinned

18e3a53

Merge branch 'deanq/ae-1092-volume-cache-sync' into deanq/ae-1268-dep…

85b5e4a

…recate-hf_models_to_cache

jhcipar approved these changes Oct 9, 2025

Base automatically changed from deanq/ae-1092-volume-cache-sync to main October 10, 2025 17:12

deanq added 2 commits October 10, 2025 11:08

Merge branch 'main' into deanq/ae-1268-deprecate-hf_models_to_cache

f80f8cd

chore: make update to update protocols

57d73e9

deanq marked this pull request as ready for review October 10, 2025 18:58

deanq added 2 commits October 10, 2025 13:57

Merge branch 'main' into deanq/ae-1268-deprecate-hf_models_to_cache

01d4372

build: make update

14b3455

deanq merged commit 2774335 into main Oct 10, 2025
12 checks passed

deanq deleted the deanq/ae-1268-deprecate-hf_models_to_cache branch October 10, 2025 21:10

runpod-workers-release-please-bot bot mentioned this pull request Oct 14, 2025

chore(main): release 0.7.1 #38

Merged

deanq merged 112 commits into main from deanq/ae-1268-deprecate-hf_models_to_cache

3 participants
