runpod-workers · deanq · Oct 10, 2025 · Aug 16, 2025 · Aug 16, 2025 · Aug 16, 2025
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -17,56 +17,49 @@ This is `worker-tetra`, a RunPod Serverless worker template that provides dynami
 - **Function Executor** (`src/function_executor.py:12`): Handles individual function execution with full output capture (stdout, stderr, logs)
 - **Class Executor** (`src/class_executor.py:14`): Manages class instantiation and method execution with instance persistence and metadata tracking
 
-### 2. HuggingFace Model Cache-Ahead (`src/huggingface_cache.py`)
-- **Model Pre-Caching**: Downloads HuggingFace models before user code execution
-- **Cache Validation**: Checks if models are already cached to avoid redundant downloads
-- **Authentication**: Supports HF_TOKEN for private/gated model access
-- **Transfer Acceleration**: Uses hf_transfer when HF_HUB_ENABLE_HF_TRANSFER=1 is set
-- **Transparent Caching**: User code references models without knowing they're pre-cached
-
-### 3. Dependency Management System (`src/dependency_installer.py:14`)
+### 2. Dependency Management System (`src/dependency_installer.py:14`)
 - **Python Package Installation**: UV-based package management with environment-aware configuration (Docker vs local)
 - **System Package Installation**: APT/Nala-based system dependency handling with acceleration support
 - **Differential Installation**: Optimized package installation that skips already-installed packages
 - **Environment Detection**: Automatic Docker vs local environment detection for appropriate installation methods
 - **System Package Filtering**: Intelligent detection of system-available packages to avoid redundant installation
 - **Universal Subprocess Integration**: All subprocess operations use centralized logging utility
 
-### 4. Universal Subprocess Utility (`src/subprocess_utils.py`)
+### 3. Universal Subprocess Utility (`src/subprocess_utils.py`)
 - **Centralized Subprocess Operations**: All subprocess calls use `run_logged_subprocess` for consistency
 - **Automatic Logging Integration**: All subprocess output flows through log streamer at DEBUG level
 - **Environment-Aware Execution**: Handles Docker vs local environment differences automatically
 - **Standardized Error Handling**: Consistent FunctionResponse pattern for all subprocess operations
 - **Timeout Management**: Configurable timeouts with proper cleanup on timeout/cancellation
 
-### 5. Serialization & Protocol Management
+### 4. Serialization & Protocol Management
 - **Protocol Definitions** (`src/remote_execution.py:13`): Pydantic models for request/response with validation
 - **Serialization Utils** (`src/serialization_utils.py`): CloudPickle-based data serialization for function arguments and results
 - **Base Executor** (`src/base_executor.py`): Common execution interface and environment setup
 
-### 6. Tetra SDK Integration (`tetra-rp/` submodule)
+### 5. Tetra SDK Integration (`tetra-rp/` submodule)
 - **Client Interface**: `@remote` decorator for marking functions for remote execution
 - **Resource Management**: GPU/CPU configuration and provisioning through LiveServerless objects
 - **Live Serverless**: Dynamic infrastructure provisioning with auto-scaling
 - **Protocol Buffers**: Communication protocol definitions for distributed execution
 
-### 7. Testing Infrastructure (`tests/`)
+### 6. Testing Infrastructure (`tests/`)
 - **Unit Tests** (`tests/unit/`): Component-level testing for individual modules with mocking
 - **Integration Tests** (`tests/integration/`): End-to-end workflow testing with real execution
 - **Test Fixtures** (`tests/conftest.py:1`): Shared test data, mock objects, and utility functions
 - **Handler Testing**: Local execution validation with JSON test files (`src/tests/`)
   - **Full Coverage**: All handler tests pass with environment-aware dependency installation
   - **Cross-Platform**: Works correctly in both Docker containers and local macOS/Linux environments
 
-### 8. Build & Deployment Pipeline
+### 7. Build & Deployment Pipeline
 - **Docker Containerization**: GPU (`Dockerfile`) and CPU (`Dockerfile-cpu`) image builds
 - **CI/CD Pipeline**: Automated testing, linting, and releases (`.github/workflows/`)
 - **Quality Gates** (`Makefile:104`): Format checking, type checking, test coverage requirements
 - **Release Management**: Automated semantic versioning and Docker Hub deployment
 
-### 9. Configuration & Constants
+### 8. Configuration & Constants
 - **Constants** (`src/constants.py`): System-wide configuration values (NAMESPACE, LARGE_SYSTEM_PACKAGES)
-- **Environment Configuration**: RunPod API integration and HuggingFace cache settings
+- **Environment Configuration**: RunPod API integration
 
 ## Architecture
 
@@ -90,13 +83,12 @@ This is `worker-tetra`, a RunPod Serverless worker template that provides dynami
 ### Key Patterns
 
 1. **Remote Function Execution**: Functions decorated with `@remote` are automatically executed on RunPod GPU workers
-2. **Composition Pattern**: RemoteExecutor uses specialized components (DependencyInstaller, HuggingFaceCacheAhead, Executors)
+2. **Composition Pattern**: RemoteExecutor uses specialized components (DependencyInstaller, Executors)
 3. **Dynamic Dependency Management**: Dependencies specified in decorators are installed at runtime with differential updates
-4. **HuggingFace Cache-Ahead**: Models specified in `hf_models_to_cache` are pre-downloaded before execution
-5. **Universal Subprocess Operations**: All subprocess calls use centralized `run_logged_subprocess` for consistent logging and error handling
-6. **Environment-Aware Configuration**: Automatic Docker vs local environment detection for appropriate installation methods
-7. **Serialization**: Uses cloudpickle + base64 encoding for function arguments and results
-8. **Resource Configuration**: `LiveServerless` objects define GPU requirements, scaling, and worker configuration
+4. **Universal Subprocess Operations**: All subprocess calls use centralized `run_logged_subprocess` for consistent logging and error handling
+5. **Environment-Aware Configuration**: Automatic Docker vs local environment detection for appropriate installation methods
+6. **Serialization**: Uses cloudpickle + base64 encoding for function arguments and results
+7. **Resource Configuration**: `LiveServerless` objects define GPU requirements, scaling, and worker configuration
 
 ## Development Commands
 
@@ -148,8 +140,6 @@ git submodule update --remote --rebase    # Update tetra-rp to latest
 - `HF_TOKEN`: Optional authentication token for private/gated HuggingFace models
 - `HF_HOME=/hf-cache`: HuggingFace cache location, set outside `/root/.cache` to exclude from volume sync
 - `DEBIAN_FRONTEND=noninteractive`: Set during system package installation
-- `UV_CACHE_DIR`: Package cache configuration
-- `VIRTUAL_ENV`: Virtual environment path configuration
 
 ### Resource Configuration
 Configure GPU resources using `LiveServerless` objects:
@@ -205,7 +195,6 @@ gpu_config = LiveServerless(
 │   ├── function_executor.py  # Function execution with output capture
 │   ├── class_executor.py     # Class execution with persistence
 │   ├── dependency_installer.py # Python and system dependency management
-│   ├── huggingface_cache.py  # HuggingFace model cache-ahead system
 │   ├── serialization_utils.py # CloudPickle serialization utilities
 │   ├── base_executor.py      # Common execution interface
 │   ├── constants.py          # System-wide configuration constants
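
Reviewer note on the Serialization pattern kept in the Key Patterns list above (cloudpickle + base64 for function arguments and results): the round trip can be illustrated with a short sketch. The helper names `encode_arg` and `decode_arg` are hypothetical and are not taken from `src/serialization_utils.py`. The fallback to the standard-library `pickle` is an assumption added only so the snippet runs where `cloudpickle` is not installed.

```python
import base64
import pickle

try:
    import cloudpickle  # third-party; preferred because it can serialize lambdas and closures
except ImportError:  # assumption: fall back to stdlib pickle so the sketch still runs
    cloudpickle = None


def encode_arg(value) -> str:
    """Serialize a value and wrap it in base64 so it can travel inside JSON (hypothetical helper)."""
    dumps = cloudpickle.dumps if cloudpickle is not None else pickle.dumps
    return base64.b64encode(dumps(value)).decode("utf-8")


def decode_arg(payload: str):
    """Reverse of encode_arg: base64-decode, then unpickle (hypothetical helper)."""
    # Data produced by cloudpickle.dumps can be loaded with the standard pickle.loads.
    return pickle.loads(base64.b64decode(payload))


if __name__ == "__main__":
    payload = encode_arg({"x": [1, 2, 3]})
    print(decode_arg(payload))
```

Base64 keeps the pickled bytes safe to embed in a JSON request body, at the cost of roughly a one-third size increase.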

diff --git a/Makefile b/Makefile
@@ -24,6 +24,7 @@ update: # Upgrade all dependencies
 	uv sync --upgrade --all-groups
 	uv lock --upgrade
 	git submodule update --remote
+	$(MAKE) protocols
 
 clean: # Remove build artifacts and cache files
 	rm -rf dist build *.egg-info
@@ -32,7 +33,12 @@ clean: # Remove build artifacts and cache files
 	find . -type f -name "*.pkl" -delete
 
 setup: dev # Initialize project, sync deps, update submodules
-	git submodule update --init --recursive
+	@if [ ! -f "tetra-rp/.git" ]; then \
+		git submodule update --init --recursive; \
+	fi
+	$(MAKE) protocols
+
+protocols: # Copy remote_execution protocol from submodule
 	cp tetra-rp/src/tetra_rp/protos/remote_execution.py src/
 
 build: # Build both GPU and CPU Docker images

diff --git a/docs/Endpoint Persistence.md b/docs/Endpoint Persistence.md
@@ -14,7 +14,7 @@
 
 - First container boots, and checks for volume presence and endpoint workspace. Create if not found.
 
-   1. Container will proceed to download any system, python or HF pre-cache instructed from the remote decorator.
+   1. Container then downloads any system or Python dependencies in parallel, as instructed by the remote decorator.
 
    2. Container runs its job.
 
@@ -45,7 +45,7 @@ graph TD
       H -->|No| J[Launch CDR Daemon<br/>Hydrate /app ← Workspace<br/>Then Monitor /app →
   Workspace]
 
-      G --> K[Download Dependencies<br/>System + Python + HF]
+      G --> K[Download Dependencies<br/>System + Python]
       I --> K
       J --> L[Skip Downloads<br/>Use Cached Data]
 

diff --git a/src/huggingface_cache.py b/src/huggingface_cache.py
diff --git a/src/remote_executor.py b/src/remote_executor.py
@@ -1,7 +1,6 @@
 import logging
 import asyncio
 from typing import List, Any
-from huggingface_cache import HuggingFaceCacheAhead
 from remote_execution import FunctionRequest, FunctionResponse, RemoteExecutorStub
 from dependency_installer import DependencyInstaller
 from function_executor import FunctionExecutor
@@ -25,7 +24,6 @@ def __init__(self):
         self.dependency_installer = DependencyInstaller()
         self.function_executor = FunctionExecutor()
         self.class_executor = ClassExecutor()
-        self.hf_cache = HuggingFaceCacheAhead()
         self.cache_sync = CacheSyncManager()
 
     async def ExecuteFunction(self, request: FunctionRequest) -> FunctionResponse:
@@ -54,11 +52,7 @@ async def ExecuteFunction(self, request: FunctionRequest) -> FunctionResponse:
 
         try:
             # Hydrate cache from volume if needed (before any installations)
-            has_installations = (
-                request.dependencies
-                or request.system_dependencies
-                or request.hf_models_to_cache
-            )
+            has_installations = request.dependencies or request.system_dependencies
             if has_installations:
                 await self.cache_sync.hydrate_from_volume()
 
@@ -148,13 +142,6 @@ async def _install_dependencies_parallel(
             tasks.append(task)
             task_names.append("python_dependencies")
 
-        # Add HF model cache-ahead tasks
-        if request.hf_models_to_cache:
-            for model_id in request.hf_models_to_cache:
-                task = self.hf_cache.cache_model_download_async(model_id)
-                tasks.append(task)
-                task_names.append(f"hf_model_{model_id}")
-
         if not tasks:
             return FunctionResponse(success=True, stdout="No dependencies to install")
 
@@ -189,15 +176,6 @@ async def _install_dependencies_sequential(
                 return sys_installed
             self.logger.info(sys_installed.stdout)
 
-        # Cache-ahead HuggingFace models if requested (should not happen when acceleration disabled)
-        if request.accelerate_downloads and request.hf_models_to_cache:
-            for model_id in request.hf_models_to_cache:
-                cache_result = self.hf_cache.cache_model_download(model_id)
-                if cache_result.success:
-                    self.logger.info(cache_result.stdout)
-                else:
-                    self.logger.warning(cache_result.error)
-
         # Install Python dependencies next
         if request.dependencies:
             py_installed = self.dependency_installer.install_dependencies(
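
Reviewer note: with the HuggingFace tasks removed, the parallel path only collects system and Python installation tasks. The shape of that flow can be sketched as below. This is a minimal illustration, not the code in `src/remote_executor.py`: `Result`, `install_system`, `install_python`, and `install_parallel` are hypothetical stand-ins for `FunctionResponse` and the `DependencyInstaller` calls, and awaiting the tasks with `asyncio.gather` is an assumption, since that part of the function lies outside the hunks shown.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    """Hypothetical stand-in for FunctionResponse."""
    success: bool
    stdout: str = ""
    error: Optional[str] = None


async def install_system(packages: List[str]) -> Result:
    # Placeholder for the real system package installation.
    await asyncio.sleep(0)
    return Result(success=True, stdout=f"system: {', '.join(packages)}")


async def install_python(packages: List[str]) -> Result:
    # Placeholder for the real Python package installation.
    await asyncio.sleep(0)
    return Result(success=True, stdout=f"python: {', '.join(packages)}")


async def install_parallel(dependencies: List[str], system_dependencies: List[str]) -> Result:
    tasks = []
    task_names = []

    if system_dependencies:
        tasks.append(install_system(system_dependencies))
        task_names.append("system_dependencies")
    if dependencies:
        tasks.append(install_python(dependencies))
        task_names.append("python_dependencies")

    # Nothing requested: mirror the early return visible in the diff.
    if not tasks:
        return Result(success=True, stdout="No dependencies to install")

    # Run all installers concurrently; exceptions are returned, not raised.
    results = await asyncio.gather(*tasks, return_exceptions=True)

    failures = []
    for name, res in zip(task_names, results):
        if isinstance(res, Exception):
            failures.append(f"{name}: {res}")
        elif not res.success:
            failures.append(f"{name}: {res.error}")
    if failures:
        return Result(success=False, error="; ".join(failures))

    return Result(success=True, stdout="\n".join(res.stdout for res in results))


if __name__ == "__main__":
    print(asyncio.run(install_parallel(["numpy"], ["curl"])).stdout)
```

Keeping `task_names` aligned with `tasks` lets a failure be attributed to the specific installer that produced it, which is the role the two parallel lists play in the diff.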

diff --git a/src/tests/test_hf_accelerated_input.json b/src/tests/test_hf_accelerated_input.json