feat: complete @remote support for LoadBalancer endpoints #131

Merged
deanq merged 46 commits into main from
deanq/ae-1102-load-balancer-sls-resource
Jan 8, 2026

Conversation


@deanq deanq commented Jan 4, 2026

Prerequisites: #130, #129
Related: runpod-workers/flash#45

Summary

Completes @remote decorator support for HTTP-based load-balanced endpoints with proper security boundaries between local development and production.

See docs for details:
Load_Balancer_Endpoints.md
LoadBalancer_Runtime_Architecture.md
Using_Remote_With_LoadBalancer.md

What's New

Handler Generation

  • Conditional /execute endpoint registration based on resource type
  • LiveLoadBalancer: Includes /execute for local development with function serialization
  • LoadBalancerSlsResource: Excludes /execute for security in deployed environments
  • Proper validation of HTTP routing for both resource types

Scanner Fix

  • Scanner now discovers both LiveLoadBalancer and LoadBalancerSlsResource
  • Previously only found "Serverless" in class names
  • Now checks for both "Serverless" and "LoadBalancer" patterns

Testing

  • Integration test for LiveLoadBalancer handler generation with /execute
  • Integration test for deployed endpoint handler generation without /execute
  • Scanner discovery test verifying both resource types are found

Architecture

  • LiveLoadBalancer (local): Uses /execute endpoint with function serialization
  • LoadBalancerSlsResource (deployed): Uses user-defined HTTP routes
  • Stub routing: Auto-detects resource type and routes accordingly
  • No code changes needed: Works transparently with @remote decorator

Security

  • Deployed endpoints don't expose /execute (prevents arbitrary code execution)
  • /execute only available for local development (LiveLoadBalancer)
  • User-defined routes are the interface for deployed endpoints

Usage

This enables:

  • @remote with LiveLoadBalancer for local testing
  • @remote with LoadBalancerSlsResource for deployed endpoints
  • flash build to generate handlers with correct endpoint configuration
  • Secure deployment to production

deanq added 25 commits January 3, 2026 01:22
Implement a factory function that creates RunPod serverless handlers,
eliminating code duplication across generated handler files.

The generic_handler module provides:
- create_handler(function_registry) factory that accepts a dict of
  function/class objects and returns a RunPod-compatible handler
- Automatic serialization/deserialization using cloudpickle + base64
- Support for both function execution and class instantiation + method calls
- Structured error responses with full tracebacks for debugging
- Load manifest for cross-endpoint function discovery

This design centralizes all handler logic in one place, making it easy to:
- Fix bugs once, benefit all handlers
- Add new features without regenerating projects
- Keep deployment packages small (handler files are ~23 lines each)

Implementation:
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Handles function vs. class execution
- load_manifest(): Loads flash_manifest.json for service discovery
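The factory described above can be sketched as follows. This is an illustrative reconstruction, not the actual tetra_rp source: standard-library pickle stands in for cloudpickle so the sketch is self-contained, and the job field names ("input", "function_name", "serialized_arguments") are assumptions.

```python
import base64
import pickle  # stand-in for cloudpickle so the sketch is dependency-free
import traceback


def deserialize_arguments(payload: str):
    """Base64 + pickle decoding of {"args": [...], "kwargs": {...}}."""
    data = pickle.loads(base64.b64decode(payload))
    return data.get("args", ()), data.get("kwargs", {})


def serialize_result(result) -> str:
    """Pickle + base64 encoding for safe JSON transmission."""
    return base64.b64encode(pickle.dumps(result)).decode("utf-8")


def create_handler(function_registry):
    """Factory: a dict of callables becomes one RunPod-style handler."""
    def handler(job):
        inp = job["input"]
        fn = function_registry.get(inp["function_name"])
        if fn is None:
            return {"success": False,
                    "error": f"Unknown function: {inp['function_name']}"}
        try:
            args, kwargs = deserialize_arguments(inp["serialized_arguments"])
            return {"success": True,
                    "result": serialize_result(fn(*args, **kwargs))}
        except Exception as exc:  # structured error response with traceback
            return {"success": False, "error": str(exc),
                    "traceback": traceback.format_exc()}
    return handler
```

Because the factory closes over the registry, each generated handler file only needs to build its `FUNCTION_REGISTRY` dict and call `create_handler()`, which is what keeps the generated files small.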
feat(cli): Add handler generator, manifest builder, and scanner for build process

Implement the build pipeline components that work together to generate
serverless handlers from @Remote decorated functions.

Three core components:

1. RemoteDecoratorScanner (scanner.py)
   - Uses Python AST to discover all @Remote decorated functions
   - Extracts function metadata: name, module, async status, is_class
   - Groups functions by resource_config for handler generation
   - Handles edge cases like decorated classes and async functions

2. ManifestBuilder (manifest.py)
   - Groups functions by their resource_config
   - Creates flash_manifest.json structure for service discovery
   - Maps functions to their modules and handler files
   - Enables cross-endpoint function routing at runtime

3. HandlerGenerator (handler_generator.py)
   - Creates lightweight handler_*.py files for each resource config
   - Each handler imports functions and registers them in FUNCTION_REGISTRY
   - Handler delegates to create_handler() factory from generic_handler
   - Generated handlers are ~23 lines (vs ~98 with duplication)

Build Pipeline Flow:
1. Scanner discovers @Remote functions
2. ManifestBuilder groups them by resource_config
3. HandlerGenerator creates handler_*.py for each group
4. All files + manifest bundled into archive.tar.gz

This eliminates ~95% duplication across handlers by using the factory pattern
instead of template-based generation.
Implement 19 unit tests covering all major paths through the generic_handler
factory and its helper functions.

Test Coverage:

Serialization/Deserialization (7 tests):
- serialize_result() with simple values, dicts, lists
- deserialize_arguments() with empty, args-only, kwargs-only, mixed inputs
- Round-trip encoding/decoding of cloudpickle + base64

Function Execution (4 tests):
- Simple function execution with positional and keyword arguments
- Keyword argument handling
- Class instantiation and method calls
- Argument passing to instance methods

Handler Factory (8 tests):
- create_handler() returns callable RunPod handler
- Handler with simple function registry
- Missing function error handling (returns error response, not exception)
- Function exceptions caught with traceback included
- Multiple functions in single registry
- Complex Python objects (classes, lambdas, closures)
- Empty registry edge case
- Default execution_type parameter
- None return values
- Correct RunPod response format (success, result/error, traceback)

Test Strategy:
- Arrange-Act-Assert pattern for clarity
- Isolated unit tests (no external dependencies)
- Tests verify behavior, not implementation
- Error cases tested for proper error handling
- All serialization tested for round-trip correctness

All tests passing, 83% coverage on generic_handler.py
…canning

Implement integration tests validating the build pipeline components work
correctly together.

Test Coverage:

HandlerGenerator Tests:
- Handler files created with correct names (handler_<resource_name>.py)
- Generated files import required functions from workers
- FUNCTION_REGISTRY properly formatted
- create_handler() imported from generic_handler
- Handler creation via factory
- RunPod start call present and correct
- Multiple handlers generated for multiple resource configs

ManifestBuilder Tests:
- Manifest structure with correct version and metadata
- Resources grouped by resource_config
- Handler file paths correct
- Function metadata preserved (name, module, is_async, is_class)
- Function registry mapping complete

Scanner Tests:
- @Remote decorated functions discovered via AST
- Function metadata extracted correctly
- Module paths resolved properly
- Async functions detected
- Class methods detected
- Edge cases handled (multiple decorators, nested classes)

Test Strategy:
- Integration tests verify components work together
- Tests verify generated files are syntactically correct
- Tests validate data structures match expected schemas
- No external dependencies in build process

Validates that the entire build pipeline:
1. Discovers functions correctly
2. Groups them appropriately
3. Generates valid Python handler files
4. Creates correct manifest structure
Add comprehensive architecture documentation explaining why the factory
pattern was chosen and how it works.

Documentation includes:

Overview & Context:
- Problem statement: Handler files had 95% duplication
- Design decision: Use factory function instead of templates
- Benefits: Single source of truth, easier maintenance, consistency

Architecture Diagrams (MermaidJS):
- High-level flow: @Remote functions → Scanner → Manifest → Handlers → Factory
- Component relationships: HandlerGenerator, GeneratedHandler, generic_handler
- Function registry pattern: Discovery → Grouping → Registration → Factory

Implementation Details:
- create_handler(function_registry) signature and behavior
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Function vs. class execution
- load_manifest(): Service discovery via flash_manifest.json

Design Decisions (with rationale):
- Factory Pattern over Inheritance: Simpler, less coupling, easier to test
- CloudPickle + Base64: Handles arbitrary objects, safe JSON transmission
- Manifest in Generic Handler: Runtime service discovery requirement
- Structured Error Responses: Debugging aid, functional error handling
- Both Execution Types: Supports stateful classes and pure functions

Usage Examples:
- Simple function handler
- Class execution with methods
- Multiple functions in one handler

Build Process Integration:
- 4-phase pipeline: Scanner → Grouping → Generation → Packaging
- Manifest structure and contents
- Generated handler structure (~23 lines)

Testing Strategy:
- 19 unit tests covering all major paths
- 7 integration tests verifying handler generation
- Manual testing with example applications

Performance:
- Zero runtime penalty (factory called once at startup)
- No additional indirection in request path
Document the flash build command and update CLI README to include it.

New Documentation:

flash-build.md includes:

Usage & Options:
- Command syntax: flash build [OPTIONS]
- --no-deps: Skip transitive dependencies (faster, smaller archives)
- --keep-build: Keep build directory for inspection/debugging
- --output, -o: Custom archive name (default: archive.tar.gz)

What It Does (5-step process):
1. Discovery: Scan for @Remote decorated functions
2. Grouping: Group functions by resource_config
3. Handler Generation: Create lightweight handler files
4. Manifest Creation: Generate flash_manifest.json
5. Packaging: Create archive.tar.gz for deployment

Build Artifacts:
- .flash/archive.tar.gz: Deployment package (ready for RunPod)
- .flash/flash_manifest.json: Service discovery configuration
- .flash/.build/: Temporary build directory

Handler Generation:
- Explains factory pattern and minimal handler files
- Links to Runtime_Generic_Handler.md for details

Dependency Management:
- Default behavior: Install all dependencies including transitive
- --no-deps: Only direct dependencies (when base image has transitive)
- Trade-offs explained

Cross-Endpoint Function Calls:
- Example showing GPU and CPU endpoints
- Manifest enables routing automatically

Output & Troubleshooting:
- Sample build output with progress indicators
- Common failure scenarios and solutions
- How to debug with --keep-build

Next Steps:
- Test locally with flash run
- Deploy to RunPod
- Monitor with flash undeploy list

Updated CLI README.md:
- Added flash build to command list in sequence
- Links to full flash-build.md documentation
Add a new section explaining how the build system works and why the
factory pattern reduces code duplication.

New Section: Build Process and Handler Generation

Explains:

How Flash Builds Your Application (5-step pipeline):
1. Discovery: Scans code for @Remote decorated functions
2. Grouping: Groups functions by resource_config
3. Handler Generation: Creates lightweight handler files
4. Manifest Creation: Generates flash_manifest.json for service discovery
5. Packaging: Bundles everything into archive.tar.gz

Handler Architecture (with code example):
- Shows generated handler using factory pattern
- Single source of truth: All handler logic in one place
- Easier maintenance: Bug fixes don't require rebuilding projects

Cross-Endpoint Function Calls:
- Example of GPU and CPU endpoints calling each other
- Manifest and runtime wrapper handle service discovery

Build Artifacts:
- .flash/.build/: Temporary build directory
- .flash/archive.tar.gz: Deployment package
- .flash/flash_manifest.json: Service configuration

Links to detailed documentation:
- docs/Runtime_Generic_Handler.md for architecture details
- src/tetra_rp/cli/docs/flash-build.md for CLI reference

This section bridges the main README and the detailed documentation,
providing an entry point for new users discovering the build system.
Wire up the handler generator, manifest builder, and scanner into the
actual flash build command implementation.

Changes to build.py:

1. Integration:
   - Import RemoteDecoratorScanner for function discovery
   - Import ManifestBuilder for manifest creation
   - Import HandlerGenerator for handler file creation
   - Call these in sequence during the build process

2. Build Pipeline:
   - After copying project files, scan for @Remote functions
   - Build manifest from discovered functions
   - Generate handler files for each resource config
   - Write manifest to build directory
   - Progress indicators show what's being generated

3. Fixes:
   - Change .tetra directory references to .flash
   - Uncomment actual build logic (was showing "Coming Soon" message)
   - Fix progress messages to show actual file counts

4. Error Handling:
   - Try/catch around handler generation
   - Warning shown if generation fails but build continues
   - User can debug with --keep-build flag

Build Flow Now:
1. Load ignore patterns
2. Collect project files
3. Create build directory
4. Copy files to build directory
5. [NEW] Scan for @Remote functions
6. [NEW] Build and write manifest
7. [NEW] Generate handler files
8. Install dependencies
9. Create archive
10. Clean up build directory (unless --keep-build)

Dependencies:
- Updated uv.lock with all required dependencies
…handling

**Critical Fixes:**
- Remove "Coming Soon" message blocking build command execution
- Fix build directory to use .flash/.build/ directly (no app_name subdirectory)
- Fix tarball to extract with flat structure using arcname="."
- Fix cleanup to remove correct build directory

**Error Handling & Validation:**
- Add specific exception handling (ImportError, SyntaxError, ValueError)
- Add import validation to generated handlers
- Add duplicate function name detection across resources
- Add proper error logging throughout build process

**Resource Type Tracking:**
- Add resource_type field to RemoteFunctionMetadata
- Track actual resource types (LiveServerless, CpuLiveServerless)
- Use actual types in manifest instead of hardcoding

**Robustness Improvements:**
- Add handler import validation post-generation
- Add manifest path fallback search (cwd, module dir, legacy location)
- Add resource name sanitization for safe filenames
- Add specific exception logging in scanner (UnicodeDecodeError, SyntaxError)

**User Experience:**
- Add troubleshooting section to README
- Update manifest path documentation in docs
- Change "Zero Runtime Penalty" to "Minimal Runtime Overhead"
- Mark future enhancements as "Not Yet Implemented"
- Improve build success message with next steps

Fixes all 20 issues identified in code review (issues #1-13, #19-22)
Implement LoadBalancerSlsResource class for provisioning RunPod load-balanced
serverless endpoints. Load-balanced endpoints expose HTTP servers directly to
clients without queue-based processing, enabling REST APIs, webhooks, and
real-time communication patterns.

Key features:
- Type enforcement (always LB, never QB)
- Scaler validation (REQUEST_COUNT required, not QUEUE_DELAY)
- Health check polling via /ping endpoint (200/204 = healthy)
- Post-deployment verification with configurable retries
- Async and sync health check methods
- Comprehensive unit tests
- Full documentation with architecture diagrams and examples

Architecture:
- Extends ServerlessResource with LB-specific behavior
- Validates configuration before deployment
- Polls /ping endpoint until healthy (10 retries × 5s = 50s timeout)
- Raises TimeoutError if endpoint fails to become healthy

This forms the foundation for Mothership architecture where a load-balanced
endpoint serves as a directory server for child endpoints.
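The health-check polling described above can be sketched as a simple loop. The function name and signature here are hypothetical, not the actual LoadBalancerSlsResource API; the defaults mirror the 10 retries x 5 s window from the commit message, and 200/204 are treated as healthy per the /ping contract.

```python
import time


def wait_until_healthy(ping, retries: int = 10, interval: float = 5.0) -> int:
    """Poll a /ping check until healthy (200/204) or raise TimeoutError.

    `ping` is any callable returning an HTTP status code. Returns the
    number of checks it took to observe a healthy response.
    """
    for attempt in range(1, retries + 1):
        if ping() in (200, 204):
            return attempt
        time.sleep(interval)
    raise TimeoutError(
        f"Endpoint failed health check after {retries * interval:.0f}s"
    )
```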
Import ServerlessResource directly and use patch.object on the imported class
instead of string-based patches. This ensures the mocks properly intercept the
parent class's _do_deploy method when called via super(). Simplifies mock
configuration and removes an unused variable assertion.

Fixes the three failing deployment tests that were making real GraphQL API calls.
All tests now pass: 418 passed, 1 skipped.
…oints

Implement core infrastructure for enabling @Remote decorator on
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Changes:
- Create LoadBalancerSlsStub: HTTP-based stub for direct endpoint execution
  (src/tetra_rp/stubs/load_balancer_sls.py, 170 lines)
  - Serializes functions and arguments using cloudpickle + base64
  - Direct HTTP POST to /execute endpoint (no queue polling)
  - Proper error handling and deserialization

- Register stub with singledispatch (src/tetra_rp/stubs/registry.py)
  - Enables @Remote to dispatch to LoadBalancerSlsStub for LB resources

- Extend @Remote decorator with HTTP routing parameters (src/tetra_rp/client.py)
  - Add 'method' parameter: GET, POST, PUT, DELETE, PATCH
  - Add 'path' parameter: /api/endpoint routes
  - Validate method/path required for LoadBalancerSlsResource
  - Store routing metadata on decorated functions/classes
  - Warn if routing params used with non-LB resources

Foundation for Phase 2 (Build system integration) and Phase 3 (Local dev).
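The stub's request preparation can be sketched as below. Standard-library pickle stands in for cloudpickle so the example is self-contained, and the payload field names are illustrative assumptions, not LoadBalancerSlsStub's actual wire format.

```python
import base64
import pickle  # stand-in for cloudpickle so the sketch runs anywhere


def prepare_execute_payload(fn, args=(), kwargs=None) -> dict:
    """Serialize a function and its arguments for a POST to /execute (sketch)."""
    kwargs = kwargs or {}

    def encode(obj) -> str:
        # Pickle the object, then base64-encode for safe JSON transport
        return base64.b64encode(pickle.dumps(obj)).decode("utf-8")

    return {
        "serialized_function": encode(fn),
        "serialized_arguments": encode({"args": args, "kwargs": kwargs}),
    }
```

On the server side, the /execute endpoint reverses the encoding, calls the function, and returns the result serialized the same way.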
Update RemoteDecoratorScanner to extract HTTP method and path from
@Remote decorator for LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to RemoteFunctionMetadata
- Add _extract_http_routing() method to parse decorator keywords
- Extract method (GET, POST, PUT, DELETE, PATCH) from decorator
- Extract path (/api/process) from decorator
- Store routing metadata for manifest generation

Foundation for Phase 2.2 (Manifest updates) and Phase 2.3 (Handler generation).
Enhance ManifestBuilder to support HTTP method/path routing for
LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to ManifestFunction
- Validate LB endpoints have both method and path
- Detect and prevent route conflicts (same method + path)
- Prevent use of reserved paths (/execute, /ping)
- Add 'routes' section to manifest for LB endpoints
- Conditional inclusion of routing fields (only for LB)

Manifest structure for LB endpoints now includes:
{
  "resources": {
    "api_service": {
      "resource_type": "LoadBalancerSlsResource",
      "functions": [
        {
          "name": "process_data",
          "http_method": "POST",
          "http_path": "/api/process"
        }
      ]
    }
  },
  "routes": {
    "api_service": {
      "POST /api/process": "process_data"
    }
  }
}
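The validation rules above (both fields required, no route conflicts, no reserved paths) can be sketched as a single pass over the discovered functions. The function name `build_route_table` is hypothetical; the reserved paths come from the commit message.

```python
RESERVED_PATHS = ("/execute", "/ping")  # framework endpoints users may not claim


def build_route_table(functions):
    """Validate LB functions and build a 'METHOD /path' -> name table (sketch)."""
    routes = {}
    for fn in functions:
        method, path = fn.get("http_method"), fn.get("http_path")
        if not method or not path:
            raise ValueError(
                f"{fn['name']}: LB endpoints need both method and path"
            )
        if path in RESERVED_PATHS:
            raise ValueError(
                f"{path} is reserved (reserved paths: {list(RESERVED_PATHS)})"
            )
        key = f"{method} {path}"
        if key in routes:
            raise ValueError(
                f"Route conflict: {key} maps to both {routes[key]} and {fn['name']}"
            )
        routes[key] = fn["name"]
    return routes
```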
Implement LBHandlerGenerator to create FastAPI applications for
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Key features:
- Generates FastAPI apps with explicit route registry
- Creates (method, path) -> function mappings from manifest
- Validates route conflicts and reserved paths
- Imports user functions and creates dynamic routes
- Includes required /ping health check endpoint
- Validates generated handler Python syntax via import

Generated handler structure enables:
- Direct HTTP routing to user functions via FastAPI
- Framework /execute endpoint for @Remote stub execution
- Local development with uvicorn
Create create_lb_handler() factory function that dynamically builds FastAPI
applications from route registries for LoadBalancerSlsResource endpoints.

Key features:
- Accepts route_registry: a {(method, path): handler_function} mapping
- Registers all user-defined routes from registry to FastAPI app
- Provides /execute endpoint for @Remote stub function execution
- Handles async function execution automatically
- Serializes results with cloudpickle + base64 encoding
- Comprehensive error handling with detailed logging

The /execute endpoint enables:
- Remote function code execution via @Remote decorator
- Automatic argument deserialization from cloudpickle/base64
- Result serialization for transmission back to client
- Support for both sync and async functions
Update build command to use appropriate handler generators based on
resource type. Separates LoadBalancerSlsResource endpoints (using FastAPI)
from queue-based endpoints (using generic handler).

Changes:
- Import LBHandlerGenerator alongside HandlerGenerator
- Inspect manifest resources and separate by type
- Generate LB handlers via LBHandlerGenerator
- Generate QB handlers via HandlerGenerator
- Combine all generated handler paths for summary

Enables users to mix LB and QB endpoints in the same project, with correct
code generation for each resource type.
Implement LiveLoadBalancer resource following the LiveServerless pattern
for local development and testing of load-balanced endpoints.

Changes:
- Add TETRA_LB_IMAGE constant for load-balanced Tetra image
- Create LiveLoadBalancer class extending LoadBalancerSlsResource
- Uses LiveServerlessMixin to lock imageName to Tetra LB image
- Register LiveLoadBalancer with LoadBalancerSlsStub in singledispatch
- Export LiveLoadBalancer from core.resources and top-level __init__

This enables users to test LB-based functions locally before deploying,
using the same pattern as LiveServerless for queue-based endpoints.

Users can now write:
  from tetra_rp import LiveLoadBalancer, remote

  api = LiveLoadBalancer(name="test-api")

  @remote(api, method="POST", path="/api/process")
  async def process_data(x, y):
      return {"result": x + y}

  result = await process_data(5, 3)  # Local execution
Implement unit tests for LoadBalancerSlsStub covering:
- Request preparation with arguments and dependencies
- Response handling for success and error cases
- Error handling for invalid responses
- Base64 encoding/decoding of serialized data
- Endpoint URL validation
- Timeout and HTTP error handling

Test coverage:
- _prepare_request: 4 tests
- _handle_response: 5 tests
- _execute_function: 3 error case tests
- __call__: 2 integration tests

Tests verify proper function serialization, argument handling,
error propagation, and response deserialization.
Fix test_load_balancer_vs_queue_based_endpoints by updating the @Remote
decorator to use method='POST' and path='/api/echo' to match the test
assertions. This was a test-level bug where the decorator definition
didn't match what was being asserted.
…ndpoints

- Using_Remote_With_LoadBalancer.md: User guide for HTTP routing, local development, building and deploying
- LoadBalancer_Runtime_Architecture.md: Technical details on deployment, request flows, security, and performance
- Updated README.md with LoadBalancer section and code example
- Updated Load_Balancer_Endpoints.md with cross-references to new guides
Split @Remote execution behavior between local and deployed:
- LiveLoadBalancer (local): Uses /execute endpoint for function serialization
- LoadBalancerSlsResource (deployed): Uses user-defined routes with HTTP param mapping

Changes:
1. LoadBalancerSlsStub routing detection:
   - _should_use_execute_endpoint() determines execution path
   - _execute_via_user_route() maps args to JSON and POSTs to user routes
   - Auto-detects resource type and routing metadata

2. Conditional /execute registration:
   - create_lb_handler() now accepts include_execute parameter
   - Generated handlers default to include_execute=False (security)
   - LiveLoadBalancer can enable /execute if needed

3. Updated handler generator:
   - Added clarity comments on /execute exclusion for deployed endpoints

4. Comprehensive test coverage:
   - 8 new tests for routing detection and execution paths
   - All 31 tests passing (22 unit + 9 integration)

5. Documentation updates:
   - Using_Remote_With_LoadBalancer.md: clarified /execute scope
   - Added 'Local vs Deployed Execution' section explaining differences
   - LoadBalancer_Runtime_Architecture.md: updated execution model
   - Added troubleshooting for deployed endpoint scenarios

Security improvement:
- Deployed endpoints only expose user-defined routes
- /execute endpoint removed from production (prevents arbitrary code execution)
- Lower attack surface for deployed endpoints
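The conditional /execute registration described above can be sketched framework-agnostically. The real create_lb_handler builds a FastAPI app; this dispatch-table stand-in only illustrates the include_execute switch and the always-present /ping route.

```python
def create_lb_handler(route_registry, include_execute=False):
    """Build a (method, path) -> callable dispatch table (sketch).

    Deployed endpoints (LoadBalancerSlsResource) use the secure default
    include_execute=False; LiveLoadBalancer opts in for local development.
    """
    routes = dict(route_registry)
    # The /ping health check is always registered.
    routes[("GET", "/ping")] = lambda: {"status": "healthy"}
    if include_execute:
        # Local development only: accept and run serialized functions.
        routes[("POST", "/execute")] = lambda payload: {
            "note": "deserialize and run the serialized function here"
        }
    return routes
```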
…lude /execute endpoint

- Modified manifest.py to validate LiveLoadBalancer endpoints like LoadBalancerSlsResource
- Updated lb_handler_generator to:
  - Include LiveLoadBalancer in handler generation filter
  - Pass include_execute=True for LiveLoadBalancer (local dev)
  - Pass include_execute=False for LoadBalancerSlsResource (deployed)
- Added integration tests:
  - Verify LiveLoadBalancer handlers include /execute endpoint
  - Verify deployed handlers exclude /execute endpoint
- Fixes critical bug: LiveLoadBalancer now gets /execute endpoint in generated handlers
…ss resources

- Updated scanner to extract LiveLoadBalancer and LoadBalancerSlsResource resources
- Previously only looked for 'Serverless' in class name, missing LoadBalancer endpoints
- Now checks for both 'Serverless' and 'LoadBalancer' in resource type names
- Added integration test to verify scanner discovers both resource types
- Fixes critical bug that prevented flash build from finding LoadBalancer endpoints
- Wrap long lines in manifest.py, lb_handler.py, and load_balancer_sls.py
- Remove unused httpx import in test_load_balancer_sls_stub.py
- Apply consistent formatting across codebase
@deanq deanq requested a review from Copilot January 4, 2026 05:19

Copilot AI left a comment


Pull request overview

This PR completes @Remote decorator support for LoadBalancer endpoints by implementing proper security boundaries and handler generation for both local development (LiveLoadBalancer) and production (LoadBalancerSlsResource).

Key changes:

  • Conditional /execute endpoint registration based on resource type (enabled for LiveLoadBalancer, disabled for LoadBalancerSlsResource)
  • Scanner enhancement to discover both LiveLoadBalancer and LoadBalancerSlsResource classes
  • Comprehensive test coverage for stub routing, handler generation, and scanner discovery

Reviewed changes

Copilot reviewed 33 out of 34 changed files in this pull request and generated 6 comments.

Files changed:

  • src/tetra_rp/client.py: Added HTTP routing parameters (method, path) to @Remote decorator with validation for LoadBalancerSlsResource
  • src/tetra_rp/stubs/load_balancer_sls.py: Implemented LoadBalancerSlsStub with dual routing (execute endpoint vs user routes) based on resource type
  • src/tetra_rp/stubs/registry.py: Registered stubs for LoadBalancerSlsResource and LiveLoadBalancer
  • src/tetra_rp/runtime/lb_handler.py: Created FastAPI handler factory with conditional /execute endpoint inclusion
  • src/tetra_rp/runtime/generic_handler.py: Implemented generic handler factory for queue-based endpoints
  • src/tetra_rp/core/resources/load_balancer_sls_resource.py: Added LoadBalancerSlsResource class with LB-specific validation and health checks
  • src/tetra_rp/core/resources/live_serverless.py: Added LiveLoadBalancer class for local development
  • src/tetra_rp/cli/commands/build.py: Enhanced build command with handler generation and manifest creation
  • src/tetra_rp/cli/commands/build_utils/scanner.py: Implemented AST-based scanner to discover @Remote functions with LoadBalancer support
  • src/tetra_rp/cli/commands/build_utils/lb_handler_generator.py: Created generator for FastAPI handlers with conditional /execute endpoint
  • tests/integration/test_lb_remote_execution.py: Added integration tests for LiveLoadBalancer and LoadBalancerSlsResource handler generation
  • tests/unit/test_load_balancer_sls_stub.py: Comprehensive unit tests for LoadBalancerSlsStub routing and execution
  • docs/*.md: Added comprehensive documentation for LoadBalancer endpoints and runtime architecture


deanq added 2 commits January 3, 2026 21:24
- Scanner: Use exact type name matching instead of substring matching
  - Whitelist specific resource types to avoid false positives
  - Prevents matching classes like 'MyServerlessHelper' or 'LoadBalancerUtils'

- Type hints: Use Optional[str] for nullable fields in manifest
  - ManifestFunction.http_method and http_path now properly typed

- Timeout: Make HTTP client timeout configurable
  - Added LoadBalancerSlsStub.DEFAULT_TIMEOUT class attribute
  - Added timeout parameter to __init__
  - Updated both _execute_function and _execute_via_user_route to use self.timeout

- Deprecated datetime: Replace datetime.utcnow() with datetime.now(timezone.utc)
  - Updated manifest.py and test_lb_remote_execution.py
  - Ensures Python 3.12+ compatibility
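The exact-matching fix above can be sketched as a whitelist check; substring matching would wrongly accept classes like "MyServerlessHelper". The whitelist members are drawn from resource types named in this PR, but the exact set used by the scanner is an assumption.

```python
# Exact-name whitelist; substring matching on "Serverless"/"LoadBalancer"
# would produce false positives such as "MyServerlessHelper".
KNOWN_RESOURCE_TYPES = frozenset({
    "LiveServerless",
    "CpuLiveServerless",
    "LoadBalancerSlsResource",
    "LiveLoadBalancer",
})


def is_remote_resource_type(type_name: str) -> bool:
    """Return True only for exact matches against the known resource types."""
    return type_name in KNOWN_RESOURCE_TYPES
```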
@deanq deanq changed the base branch from main to deanq/ae-1251-handler-mapper January 4, 2026 05:37
Addresses three feedback items from code review:

1. Fix breaking parameter order change in update_system_dependencies()
   - Restored original parameter order: template_id, token, system_dependencies, base_entry_cmd
   - Maintains backward compatibility with existing callers
   - Token parameter now optional (default None)

2. Add proper deprecation warning for token parameter
   - Issues DeprecationWarning when token parameter is used
   - Clearly communicates migration to RUNPOD_API_KEY environment variable
   - Follows Python deprecation best practices (warnings.warn with stacklevel=2)

3. Standardize test mocking approach across all health check tests
   - All tests now use consistent 'tetra_rp.core.utils.http.httpx.AsyncClient' patching
   - Removed inconsistent 'side_effect=lambda' pattern
   - Improved test maintainability by using same strategy everywhere

All 503 tests pass with consistent, clean implementation.
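The deprecation pattern from item 2 above can be sketched as follows; the signature reflects the restored parameter order from item 1, and the function body is elided since only the warning pattern is being illustrated.

```python
import warnings


def update_system_dependencies(template_id, token=None,
                               system_dependencies=None, base_entry_cmd=None):
    """Illustrative signature showing the restored parameter order."""
    if token is not None:
        warnings.warn(
            "The 'token' parameter is deprecated; set the RUNPOD_API_KEY "
            "environment variable instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's call site
        )
    # ... actual template update logic elided ...
```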
@deanq deanq changed the base branch from deanq/ae-1251-handler-mapper to main January 5, 2026 01:28
@deanq deanq changed the base branch from main to deanq/ae-1251-handler-mapper January 5, 2026 02:35
- Fix health check timeout: Add clarification that timeout is 15 seconds per check
- Add HTTP authentication details explaining RUNPOD_API_KEY header injection
- Document stub decision logic for incomplete routing metadata (fallback behavior)
- Clarify function signature inspection with concrete example showing parameter mapping
- Expand /execute security explanation with explicit threats and best practices
- Add detailed parameter type constraints for deployed endpoints (supported vs unsupported)
- Add troubleshooting guide for missing routing metadata (404 errors)
- Strengthen security warnings about never exposing /execute in production

All documentation now matches actual implementation verified through codebase analysis.
deanq added 3 commits January 5, 2026 22:22
…improvements

- Add 6 new scanner tests for directory filtering (.venv, .flash, .runpod exclusion)
- Add test for resource type validation to prevent false positives
- Add test for fallback behavior when resource name extraction fails
- Add test for handling resource names with special characters
- Update existing tests to reflect new dynamic import format and resource name extraction

These tests guarantee that improvements to the scanner (resource type validation,
directory filtering, fallback behavior) and handler generator (dynamic imports for
invalid Python identifiers) won't regress in future changes.
The scanner now extracts resource names from the name= parameter rather
than using variable names. Update test assertions to expect the actual
resource names ('test-api', 'deployed-api') instead of variable names.
@deanq deanq changed the title feat: complete @remote support for LoadBalancer endpoints feat: complete @remote support for LoadBalancer endpoints Jan 6, 2026
Base automatically changed from deanq/ae-1251-handler-mapper to main January 8, 2026 01:40
@deanq deanq changed the base branch from main to deanq/ae-1348-cross-endpoint-routing January 8, 2026 01:54
@deanq deanq changed the base branch from deanq/ae-1348-cross-endpoint-routing to main January 8, 2026 01:54
- Fix timeout parameter type hint to Optional[float] in LoadBalancerSlsStub
- Update timeout error messages to use actual timeout value instead of hardcoded "30s"
- Extract reserved paths ["/execute", "/ping"] to RESERVED_PATHS constant in manifest builder
- Improve error message to dynamically list reserved paths
@deanq deanq merged commit f2f34c0 into main Jan 8, 2026
7 checks passed
@deanq deanq deleted the deanq/ae-1102-load-balancer-sls-resource branch January 8, 2026 19:15
deanq added a commit that referenced this pull request Jan 12, 2026
…positives (#132)

* feat(runtime): Add generic handler factory for serverless execution

Implement a factory function that creates RunPod serverless handlers,
eliminating code duplication across generated handler files.

The generic_handler module provides:
- create_handler(function_registry) factory that accepts a dict of
  function/class objects and returns a RunPod-compatible handler
- Automatic serialization/deserialization using cloudpickle + base64
- Support for both function execution and class instantiation + method calls
- Structured error responses with full tracebacks for debugging
- Load manifest for cross-endpoint function discovery

This design centralizes all handler logic in one place, making it easy to:
- Fix bugs once, benefit all handlers
- Add new features without regenerating projects
- Keep deployment packages small (handler files are ~23 lines each)

Implementation:
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Handles function vs. class execution
- load_manifest(): Loads flash_manifest.json for service discovery
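The serialization helpers described above can be sketched as follows. This is a minimal illustration of the base64 + pickle round trip; the actual module uses cloudpickle (which also handles lambdas and closures), and the payload key names here are assumptions for illustration:

```python
import base64
import pickle  # stand-in for cloudpickle, which also handles lambdas/closures


def serialize_result(result):
    """Pickle a Python object and wrap it in base64 for safe JSON transport."""
    return base64.b64encode(pickle.dumps(result)).decode("utf-8")


def deserialize_arguments(payload):
    """Decode base64 + pickled args/kwargs from a request payload."""
    args = pickle.loads(base64.b64decode(payload["args"])) if payload.get("args") else ()
    kwargs = pickle.loads(base64.b64decode(payload["kwargs"])) if payload.get("kwargs") else {}
    return args, kwargs
```

Base64 wrapping matters because raw pickle bytes are not valid JSON string content, and all traffic between stub and handler travels as JSON.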

* feat(cli): Add handler generator, manifest builder, and scanner for build process

Implement the build pipeline components that work together to generate
serverless handlers from @Remote decorated functions.

Three core components:

1. RemoteDecoratorScanner (scanner.py)
   - Uses Python AST to discover all @Remote decorated functions
   - Extracts function metadata: name, module, async status, is_class
   - Groups functions by resource_config for handler generation
   - Handles edge cases like decorated classes and async functions

2. ManifestBuilder (manifest.py)
   - Groups functions by their resource_config
   - Creates flash_manifest.json structure for service discovery
   - Maps functions to their modules and handler files
   - Enables cross-endpoint function routing at runtime

3. HandlerGenerator (handler_generator.py)
   - Creates lightweight handler_*.py files for each resource config
   - Each handler imports functions and registers them in FUNCTION_REGISTRY
   - Handler delegates to create_handler() factory from generic_handler
   - Generated handlers are ~23 lines (vs ~98 with duplication)

Build Pipeline Flow:
1. Scanner discovers @Remote functions
2. ManifestBuilder groups them by resource_config
3. HandlerGenerator creates handler_*.py for each group
4. All files + manifest bundled into archive.tar.gz

This eliminates ~95% of the duplication across handlers by using the factory pattern
instead of template-based generation.
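The factory pattern at the heart of this pipeline can be sketched as below. This is a simplified model (no serialization, no class instantiation); the job/response field names are assumptions, not the exact RunPod schema:

```python
import traceback


def create_handler(function_registry):
    """Return a handler closed over a registry of callables (simplified sketch)."""
    def handler(job):
        payload = job.get("input", {})
        name = payload.get("function_name")
        func = function_registry.get(name)
        if func is None:
            return {"success": False, "error": f"Unknown function: {name}"}
        try:
            result = func(*payload.get("args", ()), **payload.get("kwargs", {}))
            return {"success": True, "result": result}
        except Exception as exc:
            # structured error response with full traceback for debugging
            return {"success": False, "error": str(exc), "traceback": traceback.format_exc()}
    return handler
```

Each generated handler file then only needs to build its FUNCTION_REGISTRY dict and call the shared factory, which is why the files stay ~23 lines.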

* test(runtime): Add comprehensive tests for generic handler

Implement 19 unit tests covering all major paths through the generic_handler
factory and its helper functions.

Test Coverage:

Serialization/Deserialization (7 tests):
- serialize_result() with simple values, dicts, lists
- deserialize_arguments() with empty, args-only, kwargs-only, mixed inputs
- Round-trip encoding/decoding of cloudpickle + base64

Function Execution (4 tests):
- Simple function execution with positional and keyword arguments
- Keyword argument handling
- Class instantiation and method calls
- Argument passing to instance methods

Handler Factory (8 tests):
- create_handler() returns callable RunPod handler
- Handler with simple function registry
- Missing function error handling (returns error response, not exception)
- Function exceptions caught with traceback included
- Multiple functions in single registry
- Complex Python objects (classes, lambdas, closures)
- Empty registry edge case
- Default execution_type parameter
- None return values
- Correct RunPod response format (success, result/error, traceback)

Test Strategy:
- Arrange-Act-Assert pattern for clarity
- Isolated unit tests (no external dependencies)
- Tests verify behavior, not implementation
- Error cases tested for proper error handling
- All serialization tested for round-trip correctness

All tests passing, 83% coverage on generic_handler.py

* test(cli): Add tests for handler generation, manifest building, and scanning

Implement integration tests validating the build pipeline components work
correctly together.

Test Coverage:

HandlerGenerator Tests:
- Handler files created with correct names (handler_<resource_name>.py)
- Generated files import required functions from workers
- FUNCTION_REGISTRY properly formatted
- create_handler() imported from generic_handler
- Handler creation via factory
- RunPod start call present and correct
- Multiple handlers generated for multiple resource configs

ManifestBuilder Tests:
- Manifest structure with correct version and metadata
- Resources grouped by resource_config
- Handler file paths correct
- Function metadata preserved (name, module, is_async, is_class)
- Function registry mapping complete

Scanner Tests:
- @Remote decorated functions discovered via AST
- Function metadata extracted correctly
- Module paths resolved properly
- Async functions detected
- Class methods detected
- Edge cases handled (multiple decorators, nested classes)

Test Strategy:
- Integration tests verify components work together
- Tests verify generated files are syntactically correct
- Tests validate data structures match expected schemas
- No external dependencies in build process

Validates that the entire build pipeline:
1. Discovers functions correctly
2. Groups them appropriately
3. Generates valid Python handler files
4. Creates correct manifest structure

* docs(runtime): Document generic handler factory architecture

Add comprehensive architecture documentation explaining why the factory
pattern was chosen and how it works.

Documentation includes:

Overview & Context:
- Problem statement: Handler files had 95% duplication
- Design decision: Use factory function instead of templates
- Benefits: Single source of truth, easier maintenance, consistency

Architecture Diagrams (MermaidJS):
- High-level flow: @Remote functions → Scanner → Manifest → Handlers → Factory
- Component relationships: HandlerGenerator, GeneratedHandler, generic_handler
- Function registry pattern: Discovery → Grouping → Registration → Factory

Implementation Details:
- create_handler(function_registry) signature and behavior
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Function vs. class execution
- load_manifest(): Service discovery via flash_manifest.json

Design Decisions (with rationale):
- Factory Pattern over Inheritance: Simpler, less coupling, easier to test
- CloudPickle + Base64: Handles arbitrary objects, safe JSON transmission
- Manifest in Generic Handler: Runtime service discovery requirement
- Structured Error Responses: Debugging aid, functional error handling
- Both Execution Types: Supports stateful classes and pure functions

Usage Examples:
- Simple function handler
- Class execution with methods
- Multiple functions in one handler

Build Process Integration:
- 4-phase pipeline: Scanner → Grouping → Generation → Packaging
- Manifest structure and contents
- Generated handler structure (~23 lines)

Testing Strategy:
- 19 unit tests covering all major paths
- 7 integration tests verifying handler generation
- Manual testing with example applications

Performance:
- Zero runtime penalty (factory called once at startup)
- No additional indirection in request path

* docs(cli): Add flash build command documentation

Document the flash build command and update CLI README to include it.

New Documentation:

flash-build.md includes:

Usage & Options:
- Command syntax: flash build [OPTIONS]
- --no-deps: Skip transitive dependencies (faster, smaller archives)
- --keep-build: Keep build directory for inspection/debugging
- --output, -o: Custom archive name (default: archive.tar.gz)

What It Does (5-step process):
1. Discovery: Scan for @Remote decorated functions
2. Grouping: Group functions by resource_config
3. Handler Generation: Create lightweight handler files
4. Manifest Creation: Generate flash_manifest.json
5. Packaging: Create archive.tar.gz for deployment

Build Artifacts:
- .flash/archive.tar.gz: Deployment package (ready for RunPod)
- .flash/flash_manifest.json: Service discovery configuration
- .flash/.build/: Temporary build directory

Handler Generation:
- Explains factory pattern and minimal handler files
- Links to Runtime_Generic_Handler.md for details

Dependency Management:
- Default behavior: Install all dependencies including transitive
- --no-deps: Only direct dependencies (when base image has transitive)
- Trade-offs explained

Cross-Endpoint Function Calls:
- Example showing GPU and CPU endpoints
- Manifest enables routing automatically

Output & Troubleshooting:
- Sample build output with progress indicators
- Common failure scenarios and solutions
- How to debug with --keep-build

Next Steps:
- Test locally with flash run
- Deploy to RunPod
- Monitor with flash undeploy list

Updated CLI README.md:
- Added flash build to command list in sequence
- Links to full flash-build.md documentation

* docs: Add build process and handler generation section to README

Add a new section explaining how the build system works and why the
factory pattern reduces code duplication.

New Section: Build Process and Handler Generation

Explains:

How Flash Builds Your Application (5-step pipeline):
1. Discovery: Scans code for @Remote decorated functions
2. Grouping: Groups functions by resource_config
3. Handler Generation: Creates lightweight handler files
4. Manifest Creation: Generates flash_manifest.json for service discovery
5. Packaging: Bundles everything into archive.tar.gz

Handler Architecture (with code example):
- Shows generated handler using factory pattern
- Single source of truth: All handler logic in one place
- Easier maintenance: Bug fixes don't require rebuilding projects

Cross-Endpoint Function Calls:
- Example of GPU and CPU endpoints calling each other
- Manifest and runtime wrapper handle service discovery

Build Artifacts:
- .flash/.build/: Temporary build directory
- .flash/archive.tar.gz: Deployment package
- .flash/flash_manifest.json: Service configuration

Links to detailed documentation:
- docs/Runtime_Generic_Handler.md for architecture details
- src/tetra_rp/cli/docs/flash-build.md for CLI reference

This section bridges the main README and detailed documentation,
providing an entry point for new users discovering the build system.

* feat(cli): Integrate build utilities into flash build command

Wire up the handler generator, manifest builder, and scanner into the
actual flash build command implementation.

Changes to build.py:

1. Integration:
   - Import RemoteDecoratorScanner for function discovery
   - Import ManifestBuilder for manifest creation
   - Import HandlerGenerator for handler file creation
   - Call these in sequence during the build process

2. Build Pipeline:
   - After copying project files, scan for @Remote functions
   - Build manifest from discovered functions
   - Generate handler files for each resource config
   - Write manifest to build directory
   - Progress indicators show what's being generated

3. Fixes:
   - Change .tetra directory references to .flash
   - Uncomment actual build logic (was showing "Coming Soon" message)
   - Fix progress messages to show actual file counts

4. Error Handling:
   - Try/catch around handler generation
   - Warning shown if generation fails but build continues
   - User can debug with --keep-build flag

Build Flow Now:
1. Load ignore patterns
2. Collect project files
3. Create build directory
4. Copy files to build directory
5. [NEW] Scan for @Remote functions
6. [NEW] Build and write manifest
7. [NEW] Generate handler files
8. Install dependencies
9. Create archive
10. Clean up build directory (unless --keep-build)

Dependencies:
- Updated uv.lock with all required dependencies

* refactor(build): Fix directory structure and add comprehensive error handling

**Critical Fixes:**
- Remove "Coming Soon" message blocking build command execution
- Fix build directory to use .flash/.build/ directly (no app_name subdirectory)
- Fix tarball to extract with flat structure using arcname="."
- Fix cleanup to remove correct build directory

**Error Handling & Validation:**
- Add specific exception handling (ImportError, SyntaxError, ValueError)
- Add import validation to generated handlers
- Add duplicate function name detection across resources
- Add proper error logging throughout build process

**Resource Type Tracking:**
- Add resource_type field to RemoteFunctionMetadata
- Track actual resource types (LiveServerless, CpuLiveServerless)
- Use actual types in manifest instead of hardcoding

**Robustness Improvements:**
- Add handler import validation post-generation
- Add manifest path fallback search (cwd, module dir, legacy location)
- Add resource name sanitization for safe filenames
- Add specific exception logging in scanner (UnicodeDecodeError, SyntaxError)

**User Experience:**
- Add troubleshooting section to README
- Update manifest path documentation in docs
- Change "Zero Runtime Penalty" to "Minimal Runtime Overhead"
- Mark future enhancements as "Not Yet Implemented"
- Improve build success message with next steps

Fixes all 20 issues identified in code review (issues #1-13, #19-22)

* feat(resources): Add LoadBalancerSlsResource for LB endpoints

Implement LoadBalancerSlsResource class for provisioning RunPod load-balanced
serverless endpoints. Load-balanced endpoints expose HTTP servers directly to
clients without queue-based processing, enabling REST APIs, webhooks, and
real-time communication patterns.

Key features:
- Type enforcement (always LB, never QB)
- Scaler validation (REQUEST_COUNT required, not QUEUE_DELAY)
- Health check polling via /ping endpoint (200/204 = healthy)
- Post-deployment verification with configurable retries
- Async and sync health check methods
- Comprehensive unit tests
- Full documentation with architecture diagrams and examples

Architecture:
- Extends ServerlessResource with LB-specific behavior
- Validates configuration before deployment
- Polls /ping endpoint until healthy (10 retries × 5s = 50s timeout)
- Raises TimeoutError if endpoint fails to become healthy

This forms the foundation for Mothership architecture where a load-balanced
endpoint serves as a directory server for child endpoints.
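The health-check polling described above (10 retries × 5s) can be sketched as a simple loop. The function and parameter names here are illustrative, not the actual LoadBalancerSlsResource API:

```python
import time


def wait_until_healthy(check_ping, retries=10, delay=5.0, sleep=time.sleep):
    """Poll a /ping check until it reports healthy, or raise TimeoutError.

    check_ping is any callable returning an HTTP status code; per the
    resource's contract, 200 or 204 counts as healthy.
    """
    for _attempt in range(retries):
        if check_ping() in (200, 204):
            return True
        sleep(delay)
    raise TimeoutError(f"Endpoint not healthy after {retries} checks ({retries * delay:.0f}s)")
```

Injecting `sleep` keeps the loop testable without real waiting; the production code performs the same bounded retry before raising.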

* fix(test): Fix LoadBalancerSlsResource deployment test mocks

Import ServerlessResource directly and use patch.object on the imported class
instead of string-based patches. This ensures the mocks properly intercept the
parent class's _do_deploy method when called via super(). Simplifies mock
configuration and removes an unused variable assertion.

Fixes the three failing deployment tests that were making real GraphQL API calls.
All tests now pass: 418 passed, 1 skipped.

* feat(resources): Phase 1 - Core infrastructure for @Remote on LB endpoints

Implement core infrastructure for enabling @Remote decorator on
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Changes:
- Create LoadBalancerSlsStub: HTTP-based stub for direct endpoint execution
  (src/tetra_rp/stubs/load_balancer_sls.py, 170 lines)
  - Serializes functions and arguments using cloudpickle + base64
  - Direct HTTP POST to /execute endpoint (no queue polling)
  - Proper error handling and deserialization

- Register stub with singledispatch (src/tetra_rp/stubs/registry.py)
  - Enables @Remote to dispatch to LoadBalancerSlsStub for LB resources

- Extend @Remote decorator with HTTP routing parameters (src/tetra_rp/client.py)
  - Add 'method' parameter: GET, POST, PUT, DELETE, PATCH
  - Add 'path' parameter: /api/endpoint routes
  - Validate method/path required for LoadBalancerSlsResource
  - Store routing metadata on decorated functions/classes
  - Warn if routing params used with non-LB resources

Foundation for Phase 2 (Build system integration) and Phase 3 (Local dev).

* feat(build): Phase 2.1 - Enhanced scanner for HTTP routing extraction

Update RemoteDecoratorScanner to extract HTTP method and path from
@Remote decorator for LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to RemoteFunctionMetadata
- Add _extract_http_routing() method to parse decorator keywords
- Extract method (GET, POST, PUT, DELETE, PATCH) from decorator
- Extract path (/api/process) from decorator
- Store routing metadata for manifest generation

Foundation for Phase 2.2 (Manifest updates) and Phase 2.3 (Handler generation).

* feat(build): Phase 2.2 - Updated manifest schema for HTTP routing

Enhance ManifestBuilder to support HTTP method/path routing for
LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to ManifestFunction
- Validate LB endpoints have both method and path
- Detect and prevent route conflicts (same method + path)
- Prevent use of reserved paths (/execute, /ping)
- Add 'routes' section to manifest for LB endpoints
- Conditional inclusion of routing fields (only for LB)

Manifest structure for LB endpoints now includes:
{
  "resources": {
    "api_service": {
      "resource_type": "LoadBalancerSlsResource",
      "functions": [
        {
          "name": "process_data",
          "http_method": "POST",
          "http_path": "/api/process"
        }
      ]
    }
  },
  "routes": {
    "api_service": {
      "POST /api/process": "process_data"
    }
  }
}
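The route validation rules above (conflict detection plus reserved-path rejection) can be sketched like this; the helper name and input shape are assumptions for illustration:

```python
RESERVED_PATHS = ["/execute", "/ping"]  # framework-owned routes


def validate_routes(functions):
    """Build a {"METHOD /path": function_name} map from function metadata,
    rejecting reserved paths and duplicate (method, path) pairs."""
    routes = {}
    for fn in functions:
        method, path = fn["http_method"], fn["http_path"]
        if path in RESERVED_PATHS:
            raise ValueError(f"{path} is reserved (reserved paths: {RESERVED_PATHS})")
        key = f"{method} {path}"
        if key in routes:
            raise ValueError(f"Route conflict: {key} already maps to {routes[key]}")
        routes[key] = fn["name"]
    return routes
```

The resulting mapping is what lands in the manifest's 'routes' section for each LB resource.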

* feat(cli): Add LB handler generator for FastAPI app creation

Implement LBHandlerGenerator to create FastAPI applications for
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Key features:
- Generates FastAPI apps with explicit route registry
- Creates (method, path) -> function mappings from manifest
- Validates route conflicts and reserved paths
- Imports user functions and creates dynamic routes
- Includes required /ping health check endpoint
- Validates generated handler Python syntax via import

Generated handler structure enables:
- Direct HTTP routing to user functions via FastAPI
- Framework /execute endpoint for @Remote stub execution
- Local development with uvicorn

* feat(runtime): Implement LB handler factory for FastAPI app creation

Create create_lb_handler() factory function that dynamically builds FastAPI
applications from route registries for LoadBalancerSlsResource endpoints.

Key features:
- Accepts route_registry: Dict[(method, path)] -> handler_function mapping
- Registers all user-defined routes from registry to FastAPI app
- Provides /execute endpoint for @Remote stub function execution
- Handles async function execution automatically
- Serializes results with cloudpickle + base64 encoding
- Comprehensive error handling with detailed logging

The /execute endpoint enables:
- Remote function code execution via @Remote decorator
- Automatic argument deserialization from cloudpickle/base64
- Result serialization for transmission back to client
- Support for both sync and async functions
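The route-registry idea behind create_lb_handler() can be modeled without FastAPI as a plain dispatcher; this sketch only shows the registry shape and the conditional /execute registration, with all names and payloads assumed for illustration:

```python
def create_lb_app(route_registry, include_execute=False):
    """Build a minimal dispatcher from a {(method, path): handler} registry.
    The real factory registers the same mapping onto a FastAPI app."""
    routes = dict(route_registry)
    routes[("GET", "/ping")] = lambda payload: {"status": "healthy"}
    if include_execute:
        # Only enabled for local development (LiveLoadBalancer); deployed
        # handlers omit this to prevent arbitrary code execution.
        routes[("POST", "/execute")] = lambda payload: {"note": "serialized-execution stub"}

    def dispatch(method, path, payload=None):
        handler = routes.get((method, path))
        if handler is None:
            return {"status": 404, "error": f"No route for {method} {path}"}
        return handler(payload)

    return dispatch
```

With `include_execute=False` (the deployed default), a request to /execute simply 404s while user-defined routes keep working.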

* feat(cli): Route build command to separate handlers for LB endpoints

Update build command to use appropriate handler generators based on
resource type. Separates LoadBalancerSlsResource endpoints (using FastAPI)
from queue-based endpoints (using generic handler).

Changes:
- Import LBHandlerGenerator alongside HandlerGenerator
- Inspect manifest resources and separate by type
- Generate LB handlers via LBHandlerGenerator
- Generate QB handlers via HandlerGenerator
- Combine all generated handler paths for summary

Enables users to mix LB and QB endpoints in same project with correct
code generation for each resource type.

* feat(resources): Add LiveLoadBalancer for local LB endpoint testing

Implement LiveLoadBalancer resource following the LiveServerless pattern
for local development and testing of load-balanced endpoints.

Changes:
- Add TETRA_LB_IMAGE constant for load-balanced Tetra image
- Create LiveLoadBalancer class extending LoadBalancerSlsResource
- Uses LiveServerlessMixin to lock imageName to Tetra LB image
- Register LiveLoadBalancer with LoadBalancerSlsStub in singledispatch
- Export LiveLoadBalancer from core.resources and top-level __init__

This enables users to test LB-based functions locally before deploying,
using the same pattern as LiveServerless for queue-based endpoints.

Users can now write:
  from tetra_rp import LiveLoadBalancer, remote

  api = LiveLoadBalancer(name="test-api")

  @remote(api, method="POST", path="/api/process")
  async def process_data(x, y):
      return {"result": x + y}

  result = await process_data(5, 3)  # Local execution

* test(stubs): Add comprehensive unit tests for LoadBalancerSlsStub

Implement unit tests for LoadBalancerSlsStub covering:
- Request preparation with arguments and dependencies
- Response handling for success and error cases
- Error handling for invalid responses
- Base64 encoding/decoding of serialized data
- Endpoint URL validation
- Timeout and HTTP error handling

Test coverage:
- _prepare_request: 4 tests
- _handle_response: 5 tests
- _execute_function: 3 error case tests
- __call__: 2 integration tests

Tests verify proper function serialization, argument handling,
error propagation, and response deserialization.

* fix(test): Correct LB endpoint test decorator to match assertions

Fix test_load_balancer_vs_queue_based_endpoints by updating the @Remote
decorator to use method='POST' and path='/api/echo' to match the test
assertions. This was a test-level bug where the decorator definition
didn't match what was being asserted.

* docs: Add comprehensive documentation for @Remote with LoadBalancer endpoints

- Using_Remote_With_LoadBalancer.md: User guide for HTTP routing, local development, building and deploying
- LoadBalancer_Runtime_Architecture.md: Technical details on deployment, request flows, security, and performance
- Updated README.md with LoadBalancer section and code example
- Updated Load_Balancer_Endpoints.md with cross-references to new guides

* security: Remove /execute from deployed LoadBalancer endpoints

Split @Remote execution behavior between local and deployed:
- LiveLoadBalancer (local): Uses /execute endpoint for function serialization
- LoadBalancerSlsResource (deployed): Uses user-defined routes with HTTP param mapping

Changes:
1. LoadBalancerSlsStub routing detection:
   - _should_use_execute_endpoint() determines execution path
   - _execute_via_user_route() maps args to JSON and POSTs to user routes
   - Auto-detects resource type and routing metadata

2. Conditional /execute registration:
   - create_lb_handler() now accepts include_execute parameter
   - Generated handlers default to include_execute=False (security)
   - LiveLoadBalancer can enable /execute if needed

3. Updated handler generator:
   - Added clarity comments on /execute exclusion for deployed endpoints

4. Comprehensive test coverage:
   - 8 new tests for routing detection and execution paths
   - All 31 tests passing (22 unit + 9 integration)

5. Documentation updates:
   - Using_Remote_With_LoadBalancer.md: clarified /execute scope
   - Added 'Local vs Deployed Execution' section explaining differences
   - LoadBalancer_Runtime_Architecture.md: updated execution model
   - Added troubleshooting for deployed endpoint scenarios

Security improvement:
- Deployed endpoints only expose user-defined routes
- /execute endpoint removed from production (prevents arbitrary code execution)
- Lower attack surface for deployed endpoints
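The stub's routing decision described in point 1 can be sketched as a small predicate; the function name echoes the commit but the signature here is an assumption:

```python
def should_use_execute_endpoint(resource_type, http_method=None, http_path=None):
    """Decide how the stub reaches the endpoint: local LiveLoadBalancer
    resources use the framework /execute route; deployed resources must
    carry complete user-defined routing metadata."""
    if resource_type == "LiveLoadBalancer":
        return True  # local dev: serialized function execution via /execute
    if http_method and http_path:
        return False  # deployed: map args to JSON and call the user route
    raise ValueError("Deployed LB endpoint is missing method/path routing metadata")
```

Raising on incomplete metadata (rather than silently falling back to /execute) is what keeps the deployed path from ever depending on the framework endpoint.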

* feat(build): Phase 4 - Fix LiveLoadBalancer handler generation to include /execute endpoint

- Modified manifest.py to validate LiveLoadBalancer endpoints like LoadBalancerSlsResource
- Updated lb_handler_generator to:
  - Include LiveLoadBalancer in handler generation filter
  - Pass include_execute=True for LiveLoadBalancer (local dev)
  - Pass include_execute=False for LoadBalancerSlsResource (deployed)
- Added integration tests:
  - Verify LiveLoadBalancer handlers include /execute endpoint
  - Verify deployed handlers exclude /execute endpoint
- Fixes critical bug: LiveLoadBalancer now gets /execute endpoint in generated handlers

* fix(scanner): Discover LoadBalancer resources in addition to Serverless resources

- Updated scanner to extract LiveLoadBalancer and LoadBalancerSlsResource resources
- Previously only looked for 'Serverless' in class name, missing LoadBalancer endpoints
- Now checks for both 'Serverless' and 'LoadBalancer' in resource type names
- Added integration test to verify scanner discovers both resource types
- Fixes critical bug that prevented flash build from finding LoadBalancer endpoints

* chore: Format code for line length and remove unused imports

- Wrap long lines in manifest.py, lb_handler.py, and load_balancer_sls.py
- Remove unused httpx import in test_load_balancer_sls_stub.py
- Apply consistent formatting across codebase

* fix: Address PR #131 review feedback

- Scanner: Use exact type name matching instead of substring matching
  - Whitelist specific resource types to avoid false positives
  - Prevents matching classes like 'MyServerlessHelper' or 'LoadBalancerUtils'

- Type hints: Use Optional[str] for nullable fields in manifest
  - ManifestFunction.http_method and http_path now properly typed

- Timeout: Make HTTP client timeout configurable
  - Added LoadBalancerSlsStub.DEFAULT_TIMEOUT class attribute
  - Added timeout parameter to __init__
  - Updated both _execute_function and _execute_via_user_route to use self.timeout

- Deprecated datetime: Replace datetime.utcnow() with datetime.now(timezone.utc)
  - Updated manifest.py and test_lb_remote_execution.py
  - Ensures Python 3.12+ compatibility

* style: Format datetime chaining for line length

* fix: LiveLoadBalancer template not serialized to RunPod GraphQL

The set_serverless_template model_validator was being overwritten by sync_input_fields
(both had mode="after"). In Pydantic v2, when two validators with the same mode are
defined in a class, only one is registered.

This caused templates to never be created from imageName, resulting in:
  "GraphQL errors: One of templateId, template is required to create an endpoint"

Solution:
- Move set_serverless_template validator from ServerlessResource base class to subclasses
  (ServerlessEndpoint and LoadBalancerSlsResource) where the validation is actually needed
- Keep helper methods (_create_new_template, _configure_existing_template) in base class
  for reuse
- Add comprehensive tests for LiveLoadBalancer template serialization

This allows:
1. Base ServerlessResource to be instantiated freely for testing/configuration
2. Subclasses (ServerlessEndpoint, LoadBalancerSlsResource) to enforce template
   requirements during deployment
3. Proper template serialization in GraphQL payload for RunPod API

Fixes: One of templateId, template is required to create an endpoint error when
deploying LiveLoadBalancer with custom image tags like runpod/tetra-rp-lb:local

* fix: LoadBalancer endpoint URL and add CPU support

- Fix: Use correct endpoint URL format for load-balanced endpoints
  (https://{id}.api.runpod.ai instead of https://api.runpod.ai/v2/{id})
  This fixes 404 errors on /ping health check endpoints

- Feature: Add CPU LoadBalancer support
  * Create CpuLoadBalancerSlsResource for CPU-based load-balanced endpoints
  * Create CpuLiveLoadBalancer for local CPU LB development
  * Add TETRA_CPU_LB_IMAGE constant for CPU LB Docker image
  * Update example code to use CpuLiveLoadBalancer for CPU worker
  * Add 8 comprehensive tests for CPU LoadBalancer functionality

- Tests: Add 2 tests for endpoint URL format validation
- All 474 tests passing, 64% code coverage
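The URL-format fix can be sketched as a small helper; the function name is illustrative, but the two URL shapes are taken directly from the commit:

```python
def endpoint_base_url(endpoint_id, load_balanced):
    """Load-balanced endpoints get their own subdomain; queue-based
    endpoints go through the /v2 API path."""
    if load_balanced:
        return f"https://{endpoint_id}.api.runpod.ai"
    return f"https://api.runpod.ai/v2/{endpoint_id}"
```

Using the queue-based format for an LB endpoint is exactly what produced the 404s on /ping that this commit fixes.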

* fix: Export CpuLiveLoadBalancer and CpuLoadBalancerSlsResource from tetra_rp package

LoadBalancer resources were not being discovered by ResourceDiscovery because
the new CPU variants (CpuLiveLoadBalancer, CpuLoadBalancerSlsResource) were
not exported from the main tetra_rp package. This prevented undeploy from
picking up these resources.

Added exports to:
- TYPE_CHECKING imports for type hints
- __getattr__ function for lazy loading
- __all__ list for public API

This fixes the issue where 'flash undeploy list' could not find LoadBalancer
resources that were deployed with 'flash run --auto-provision'.

* fix: Add API key authentication to LoadBalancer health check

The /ping endpoint for RunPod load-balanced endpoints requires the
RUNPOD_API_KEY header for authentication. Without it, the health check
fails with 401 Unauthorized, causing provisioning to timeout.

This fix adds the Authorization header to the health check request if
the RUNPOD_API_KEY environment variable is available, allowing the
endpoint health check to succeed during provisioning.

Fixes issue where 'flash run --auto-provision' would fail even though
the endpoint was successfully created on RunPod.
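The conditional header injection can be sketched as below. The Bearer scheme is an assumption; the commit only says the Authorization header is added when RUNPOD_API_KEY is set:

```python
import os


def health_check_headers(env=os.environ):
    """Attach the RunPod API key when available; without it, the /ping
    check on a deployed LB endpoint fails with 401 Unauthorized."""
    headers = {}
    api_key = env.get("RUNPOD_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"  # scheme assumed
    return headers
```

Returning an empty dict when the key is absent keeps local, unauthenticated development working unchanged.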

* fix(lb): Exclude flashboot from CpuLoadBalancerSlsResource GraphQL payload

CpuLoadBalancerSlsResource was overriding _input_only without including flashboot,
causing it to be sent to the RunPod GraphQL API which doesn't accept this field.
This caused deployment to fail with: Field "flashboot" is not defined by type "EndpointInput".

* fix(lb): Expand CpuInstanceType.ANY to all CPU flavors in CpuLoadBalancerSlsResource

Add field_validator to expand [CpuInstanceType.ANY] to all available CPU instance
types (cpu3g, cpu3c, cpu5c variants). This matches the behavior in CpuServerlessEndpoint
and prevents deployment errors like 'instanceId must be in the format of flavorId-vcpu-ram'.

* refactor(cpu): Move instanceIds validator to CpuEndpointMixin

Move the instanceIds field_validator from CpuServerlessEndpoint to CpuEndpointMixin
so both CpuServerlessEndpoint and CpuLoadBalancerSlsResource share the same validator
that expands [CpuInstanceType.ANY] to all available CPU flavors. This eliminates
code duplication and ensures consistent behavior across all CPU endpoint types.
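The shared validator's expansion behavior can be sketched as a plain function; the sentinel spelling and flavor identifiers below are illustrative stand-ins for CpuInstanceType values, not the real enum:

```python
# Illustrative flavor list; the real set comes from CpuInstanceType.
ALL_CPU_INSTANCE_TYPES = ["cpu3g-2-8", "cpu3c-2-4", "cpu5c-2-4"]


def expand_instance_ids(instance_ids):
    """Expand the ANY sentinel to every known CPU flavor so the API
    receives concrete 'flavorId-vcpu-ram' identifiers."""
    if instance_ids == ["ANY"]:
        return list(ALL_CPU_INSTANCE_TYPES)
    return instance_ids
```

Expanding before the request is built avoids the 'instanceId must be in the format of flavorId-vcpu-ram' deployment error described above.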

* test: Update CPU instance test to reflect validator expansion

Update test_cpu_live_load_balancer_defaults_to_cpu_any to verify that
[CpuInstanceType.ANY] is correctly expanded to all available CPU instance types
by the field_validator in CpuEndpointMixin.

* fix(lb): Increase health check timeout from 5s to 15s

Load-balanced workers need more time to respond during cold starts and initialization.
RunPod docs recommend at least 10-15 second timeouts for health checks. Workers may
return 204 during initialization, which is normal and expected.

* fix(lb): Fix CPU load balancer template deployment error

Fixes two bugs in CpuLoadBalancerSlsResource that prevented CPU load balancers from deploying:

1. Added gpuCount and allowedCudaVersions to _input_only exclusion set to prevent
   GPU-specific fields from being sent to RunPod API
2. Overrode set_serverless_template() to call _sync_cpu_fields() first, ensuring
   GPU defaults are overridden to CPU-appropriate values (gpuCount=0)

The RunPod API was rejecting CPU load balancer templates because GPU-specific
fields were being included in the GraphQL payload. These changes align
CpuLoadBalancerSlsResource behavior with CpuServerlessEndpoint.

Also added comprehensive test coverage (30+ tests) to verify:
- GPU fields are correctly overridden to CPU defaults
- GPU fields are excluded from API payloads
- CPU-specific fields are properly included
- Consistency with CpuServerlessEndpoint behavior

* fix(drift): Exclude runtime fields from config hash to prevent false positives

Fixes false positive configuration drift detection by separating concerns:

1. Update ServerlessResource.config_hash to exclude runtime fields
   - Fields like template, templateId, aiKey, userId are API-assigned
   - Prevents false drift when same config is redeployed across processes
   - Now only hashes user-specified configuration

2. Add config_hash override to CpuLoadBalancerSlsResource
   - CPU load balancers hash only CPU-relevant fields
   - Excludes GPU-specific fields and runtime fields
   - Follows same pattern as CpuServerlessEndpoint

3. Fix _has_structural_changes to exclude template/templateId
   - CRITICAL: These runtime fields were causing false structural changes
   - Was forcing unnecessary redeployments despite update() being available
   - Now system correctly uses update() instead of undeploy+deploy

4. Make field serializers robust to handle string/enum values
   - Prevents serialization errors when fields are pre-converted to strings

5. Add comprehensive drift detection tests (16 tests)
   - Test hash stability with runtime field changes
   - Test exclusion of env, template, templateId, and other runtime fields
   - Test that actual config changes (image, flashboot) are detected
   - Test structural change detection behavior
   - Test real-world deployment scenarios

Results:
- Same config deployed multiple times: no false drift
- Different env vars with same config: no false drift
- Template/templateId changes: no false drift
- API-assigned fields: no false drift
- User config changes (image, flashboot): drift detected correctly
- All 512 unit tests pass
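A minimal sketch of the hashing scheme described above, assuming a dict-shaped config for illustration (field names are taken from the commit message; the real implementation lives in `ServerlessResource.config_hash`):

```python
import hashlib
import json

# Runtime, API-assigned fields that must not influence the hash.
EXCLUDED_HASH_FIELDS = {"template", "templateId", "aiKey", "userId"}


def config_hash(config: dict) -> str:
    # Hash only user-specified fields, in a stable key order.
    user_fields = {k: v for k, v in config.items() if k not in EXCLUDED_HASH_FIELDS}
    payload = json.dumps(user_fields, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


a = config_hash({"imageName": "repo/app:1", "templateId": "tpl-a"})
b = config_hash({"imageName": "repo/app:1", "templateId": "tpl-b"})
c = config_hash({"imageName": "repo/app:2", "templateId": "tpl-a"})
assert a == b  # runtime field changed: no false drift
assert a != c  # user config changed: drift detected
```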

* fix(http): Standardize RunPod HTTP client authentication across codebase

Centralizes HTTP client creation for RunPod load-balanced endpoints to prevent
manual Authorization header code duplication and ensure consistent authentication:

1. Create centralized HTTP utility function (src/tetra_rp/core/utils/http.py)
   - New function: get_authenticated_httpx_client()
   - Automatically adds Bearer token Authorization header if RUNPOD_API_KEY set
   - Provides consistent timeout handling (default 30s, customizable)
   - Follows existing GraphQL/REST client authentication pattern

2. Fix critical authentication bug in LoadBalancerSlsStub._execute_via_user_route()
   - Previously: Missing Authorization header (401 errors on user routes)
   - Now: Uses centralized utility for proper authentication
   - Enables direct HTTP calls to user-defined routes with auth

3. Refactor two methods to use centralized utility
   - LoadBalancerSlsStub._execute_function() - removes 7+ lines of manual auth code
   - LoadBalancerSlsResource._check_ping_endpoint() - simplifies auth setup

4. Add comprehensive unit tests (tests/unit/core/utils/test_http.py)
   - Tests API key presence/absence handling
   - Tests custom and default timeout configuration
   - Tests edge cases (empty key, zero timeout)
   - All 7 tests pass with 100% coverage

Results:
- Single source of truth for HTTP authentication (centralized utility)
- Fixes 401 Unauthorized errors on load-balanced endpoints
- Eliminates repetitive manual auth code across 3+ locations
- Easier to maintain and update authentication patterns in future
- All 499 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

* feat(http): Extend HTTP utilities to cover both sync and async authentication

Extends the centralized HTTP authentication pattern to all RunPod API calls:

1. Add get_authenticated_requests_session() for synchronous requests
   - Creates requests.Session with automatic Bearer token Authorization header
   - Follows same pattern as async get_authenticated_httpx_client()
   - Single source of truth for sync HTTP authentication

2. Refactor template.py to use centralized utility
   - Removes manual Authorization header setup (line 86)
   - Now uses get_authenticated_requests_session() for all template updates
   - Improves error handling with raise_for_status()
   - Token parameter marked deprecated; uses RUNPOD_API_KEY env var

3. Add comprehensive tests for sync utility (4 tests)
   - Tests API key presence/absence handling
   - Tests empty API key edge case
   - Tests Session object validation
   - All tests pass with proper cleanup

Benefits:
- True single source of truth for all RunPod HTTP authentication (sync + async)
- Consistent patterns across entire codebase
- Easier future auth changes across all HTTP client types
- Eliminates manual auth header code in template.py
- All 503 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

Note: requests.Session doesn't support default timeouts; timeout should be
specified per request (e.g., session.post(url, json=data, timeout=30.0))
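The sync counterpart described above could look like this sketch (the function name follows the commit message; the body is an assumption), including the per-request timeout noted above:

```python
import os

import requests


def get_authenticated_requests_session() -> requests.Session:
    """Sketch of the synchronous factory (internals assumed)."""
    session = requests.Session()
    api_key = os.environ.get("RUNPOD_API_KEY")
    if api_key:
        session.headers["Authorization"] = f"Bearer {api_key}"
    return session


os.environ["RUNPOD_API_KEY"] = "example-key"
session = get_authenticated_requests_session()
print(session.headers["Authorization"])
# No session-level default timeout exists in requests; pass it per call:
# session.post(url, json=data, timeout=30.0)
```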

* fix: Address PR feedback on HTTP utilities implementation

Addresses three feedback items from code review:

1. Fix breaking parameter order change in update_system_dependencies()
   - Restored original parameter order: template_id, token, system_dependencies, base_entry_cmd
   - Maintains backward compatibility with existing callers
   - Token parameter now optional (default None)

2. Add proper deprecation warning for token parameter
   - Issues DeprecationWarning when token parameter is used
   - Clearly communicates migration to RUNPOD_API_KEY environment variable
   - Follows Python deprecation best practices (warnings.warn with stacklevel=2)

3. Standardize test mocking approach across all health check tests
   - All tests now use consistent 'tetra_rp.core.utils.http.httpx.AsyncClient' patching
   - Removed inconsistent 'side_effect=lambda' pattern
   - Improved test maintainability by using same strategy everywhere

All 503 tests pass with consistent, clean implementation.
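The deprecation pattern from item 2 can be sketched as below. The signature mirrors the restored parameter order described above; the body is an illustrative shell, not the real implementation:

```python
import os
import warnings


def update_system_dependencies(template_id, token=None,
                               system_dependencies=None, base_entry_cmd=None):
    """Illustrative shell; only the deprecation path is shown."""
    if token is not None:
        warnings.warn(
            "'token' is deprecated; set the RUNPOD_API_KEY environment "
            "variable instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
    return token or os.environ.get("RUNPOD_API_KEY")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    update_system_dependencies("tpl-123", token="legacy-key")

assert caught and issubclass(caught[0].category, DeprecationWarning)
```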

* refactor(drift): Extract runtime field constants and improve maintainability

- Extract RUNTIME_FIELDS and EXCLUDED_HASH_FIELDS as ClassVar constants in ServerlessResource for centralized field list management
- Add clarifying comments to enum serializers explaining defensive isinstance() checks for nested model serialization
- Document CPU load balancer field list coupling in docstring with maintenance guidelines
- Add TestSerializerDefensiveBehavior class with 4 tests verifying pre-stringified enum value handling
- Use ClassVar annotation to satisfy Pydantic v2 model field requirements

This reduces maintenance burden by centralizing field definitions and improves code clarity without changing functionality.

* docs: Improve LoadBalancer documentation accuracy and completeness

- Fix health check timeout: Add clarification that timeout is 15 seconds per check
- Add HTTP authentication details explaining RUNPOD_API_KEY header injection
- Document stub decision logic for incomplete routing metadata (fallback behavior)
- Clarify function signature inspection with concrete example showing parameter mapping
- Expand /execute security explanation with explicit threats and best practices
- Add detailed parameter type constraints for deployed endpoints (supported vs unsupported)
- Add troubleshooting guide for missing routing metadata (404 errors)
- Strengthen security warnings about never exposing /execute in production

All documentation now matches actual implementation verified through codebase analysis.

* docs: add resource config drift detection documentation

- comprehensive guide on drift detection implementation
- covers hash computation, field exclusion, and cpu-specific behavior
- includes testing patterns and troubleshooting guide
- documents all fields that trigger drift vs those ignored

* docs: proper name for the file

* test(build): Add comprehensive test coverage for scanner and handler improvements

- Add 6 new scanner tests for directory filtering (.venv, .flash, .runpod exclusion)
- Add test for resource type validation to prevent false positives
- Add test for fallback behavior when resource name extraction fails
- Add test for handling resource names with special characters
- Update existing tests to reflect new dynamic import format and resource name extraction

These tests guard the scanner improvements (resource type validation,
directory filtering, fallback behavior) and the handler-generator improvements
(dynamic imports for invalid Python identifiers) against regressions in future changes.

* test(scanner): Fix resource type assertions to match scanner behavior

The scanner now extracts resource names from the name= parameter rather
than using variable names. Update test assertions to expect the actual
resource names ('test-api', 'deployed-api') instead of variable names.

* chore: merge correction

* fix(drift): Remove manual undeploy/deploy from update() method

Use saveEndpoint mutation for all changes instead of manual lifecycle
management. Server-side automatically detects version-triggering fields
(GPU, template, volumes) and increments endpoint version accordingly.

Keep _has_structural_changes() as informational for logging purposes only.
This aligns with RunPod API's version-based deployment model.

* docs(drift): Clarify _has_structural_changes detects version-triggering changes

Update docstring to reflect that this method identifies changes that
trigger server-side version increment and worker recreation, not manual
redeploy cycles. Explain which changes are version-triggering vs rolling
updates, and note that the method is now informational for logging only.

* feat(drift): Enable environment variable drift detection

Remove env from EXCLUDED_HASH_FIELDS so changes to environment variables
trigger drift detection and endpoint updates. Environment changes are
non-version-triggering (rolling updates), so server will apply them via
saveEndpoint without recreating workers.

Add env to CPU LoadBalancer config_hash for consistent behavior across
all resource types. Update comments to reflect that env is user-specified
configuration, not dynamically computed.

* test(drift): Update tests for environment variable drift detection

- test_lb_config_hash_excludes_env_variables → test_lb_config_hash_detects_env_changes
- test_env_var_changes_no_drift → test_env_var_changes_trigger_drift
- test_config_hash_excludes_env_from_drift → test_config_hash_detects_env_from_drift

Update assertions to expect different hashes when env changes, matching
new behavior where environment variable changes trigger drift and updates.

* fix: Address Copilot review feedback on type hints and documentation

- Fix type annotation for timeout parameter in LoadBalancerSlsStub (Optional[float])
- Replace hardcoded "30s" with actual self.timeout in error messages (2 locations)
- Update Resource_Config_Drift_Detection.md to reflect actual EXCLUDED_HASH_FIELDS
- Remove duplicate Load-Balanced Endpoints section from README.md

Addresses Copilot review comments (PR #132, review 3642596664)
jhcipar pushed a commit that referenced this pull request on Jan 12, 2026: …positives (#132)

* feat(runtime): Add generic handler factory for serverless execution

Implement a factory function that creates RunPod serverless handlers,
eliminating code duplication across generated handler files.

The generic_handler module provides:
- create_handler(function_registry) factory that accepts a dict of
  function/class objects and returns a RunPod-compatible handler
- Automatic serialization/deserialization using cloudpickle + base64
- Support for both function execution and class instantiation + method calls
- Structured error responses with full tracebacks for debugging
- Load manifest for cross-endpoint function discovery

This design centralizes all handler logic in one place, making it easy to:
- Fix bugs once, benefit all handlers
- Add new features without regenerating projects
- Keep deployment packages small (handler files are ~23 lines each)

Implementation:
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Handles function vs. class execution
- load_manifest(): Loads flash_manifest.json for service discovery
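The factory and its serialization helpers can be sketched as below, using stdlib `pickle` in place of cloudpickle to stay self-contained. The job payload keys (`function_name`, `arguments`) and response shape are assumptions for illustration:

```python
import base64
import pickle  # the project uses cloudpickle; stdlib pickle suffices here
import traceback


def serialize_result(obj) -> str:
    return base64.b64encode(pickle.dumps(obj)).decode()


def deserialize_arguments(encoded: str):
    args, kwargs = pickle.loads(base64.b64decode(encoded))
    return args, kwargs


def create_handler(function_registry):
    """Factory returning a RunPod-style handler closed over the registry."""
    def handler(job):
        try:
            payload = job["input"]
            fn = function_registry[payload["function_name"]]
            args, kwargs = deserialize_arguments(payload["arguments"])
            return {"success": True, "result": serialize_result(fn(*args, **kwargs))}
        except Exception:
            # Structured error response with full traceback for debugging.
            return {"success": False, "error": traceback.format_exc()}
    return handler


handler = create_handler({"add": lambda a, b: a + b})
encoded_args = base64.b64encode(pickle.dumps(((2, 3), {}))).decode()
response = handler({"input": {"function_name": "add", "arguments": encoded_args}})
assert response["success"]
assert pickle.loads(base64.b64decode(response["result"])) == 5
```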

* feat(cli): Add handler generator, manifest builder, and scanner for build process

Implement the build pipeline components that work together to generate
serverless handlers from @Remote decorated functions.

Three core components:

1. RemoteDecoratorScanner (scanner.py)
   - Uses Python AST to discover all @Remote decorated functions
   - Extracts function metadata: name, module, async status, is_class
   - Groups functions by resource_config for handler generation
   - Handles edge cases like decorated classes and async functions

2. ManifestBuilder (manifest.py)
   - Groups functions by their resource_config
   - Creates flash_manifest.json structure for service discovery
   - Maps functions to their modules and handler files
   - Enables cross-endpoint function routing at runtime

3. HandlerGenerator (handler_generator.py)
   - Creates lightweight handler_*.py files for each resource config
   - Each handler imports functions and registers them in FUNCTION_REGISTRY
   - Handler delegates to create_handler() factory from generic_handler
   - Generated handlers are ~23 lines (vs ~98 with duplication)

Build Pipeline Flow:
1. Scanner discovers @Remote functions
2. ManifestBuilder groups them by resource_config
3. HandlerGenerator creates handler_*.py for each group
4. All files + manifest bundled into archive.tar.gz

This eliminates ~95% duplication across handlers by using the factory pattern
instead of template-based generation.
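The AST-based discovery in step 1 can be sketched as follows; the metadata fields shown are a subset of what the real scanner extracts:

```python
import ast

SOURCE = '''
@remote(api, method="POST", path="/api/process")
async def process_data(x, y):
    return {"result": x + y}

def helper():
    pass
'''


def scan_remote_functions(source: str):
    """Minimal sketch: walk the AST and collect @remote-decorated functions."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for dec in node.decorator_list:
                # @remote(...) is an ast.Call; a bare @remote is an ast.Name.
                target = dec.func if isinstance(dec, ast.Call) else dec
                if isinstance(target, ast.Name) and target.id == "remote":
                    found.append({"name": node.name,
                                  "is_async": isinstance(node, ast.AsyncFunctionDef)})
    return found


print(scan_remote_functions(SOURCE))  # → [{'name': 'process_data', 'is_async': True}]
```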

* test(runtime): Add comprehensive tests for generic handler

Implement 19 unit tests covering all major paths through the generic_handler
factory and its helper functions.

Test Coverage:

Serialization/Deserialization (7 tests):
- serialize_result() with simple values, dicts, lists
- deserialize_arguments() with empty, args-only, kwargs-only, mixed inputs
- Round-trip encoding/decoding of cloudpickle + base64

Function Execution (4 tests):
- Simple function execution with positional and keyword arguments
- Keyword argument handling
- Class instantiation and method calls
- Argument passing to instance methods

Handler Factory (8 tests):
- create_handler() returns callable RunPod handler
- Handler with simple function registry
- Missing function error handling (returns error response, not exception)
- Function exceptions caught with traceback included
- Multiple functions in single registry
- Complex Python objects (classes, lambdas, closures)
- Empty registry edge case
- Default execution_type parameter
- None return values
- Correct RunPod response format (success, result/error, traceback)

Test Strategy:
- Arrange-Act-Assert pattern for clarity
- Isolated unit tests (no external dependencies)
- Tests verify behavior, not implementation
- Error cases tested for proper error handling
- All serialization tested for round-trip correctness

All tests passing, 83% coverage on generic_handler.py

* test(cli): Add tests for handler generation, manifest building, and scanning

Implement integration tests validating the build pipeline components work
correctly together.

Test Coverage:

HandlerGenerator Tests:
- Handler files created with correct names (handler_<resource_name>.py)
- Generated files import required functions from workers
- FUNCTION_REGISTRY properly formatted
- create_handler() imported from generic_handler
- Handler creation via factory
- RunPod start call present and correct
- Multiple handlers generated for multiple resource configs

ManifestBuilder Tests:
- Manifest structure with correct version and metadata
- Resources grouped by resource_config
- Handler file paths correct
- Function metadata preserved (name, module, is_async, is_class)
- Function registry mapping complete

Scanner Tests:
- @Remote decorated functions discovered via AST
- Function metadata extracted correctly
- Module paths resolved properly
- Async functions detected
- Class methods detected
- Edge cases handled (multiple decorators, nested classes)

Test Strategy:
- Integration tests verify components work together
- Tests verify generated files are syntactically correct
- Tests validate data structures match expected schemas
- No external dependencies in build process

Validates that the entire build pipeline:
1. Discovers functions correctly
2. Groups them appropriately
3. Generates valid Python handler files
4. Creates correct manifest structure

* docs(runtime): Document generic handler factory architecture

Add comprehensive architecture documentation explaining why the factory
pattern was chosen and how it works.

Documentation includes:

Overview & Context:
- Problem statement: Handler files had 95% duplication
- Design decision: Use factory function instead of templates
- Benefits: Single source of truth, easier maintenance, consistency

Architecture Diagrams (MermaidJS):
- High-level flow: @Remote functions → Scanner → Manifest → Handlers → Factory
- Component relationships: HandlerGenerator, GeneratedHandler, generic_handler
- Function registry pattern: Discovery → Grouping → Registration → Factory

Implementation Details:
- create_handler(function_registry) signature and behavior
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Function vs. class execution
- load_manifest(): Service discovery via flash_manifest.json

Design Decisions (with rationale):
- Factory Pattern over Inheritance: Simpler, less coupling, easier to test
- CloudPickle + Base64: Handles arbitrary objects, safe JSON transmission
- Manifest in Generic Handler: Runtime service discovery requirement
- Structured Error Responses: Debugging aid, functional error handling
- Both Execution Types: Supports stateful classes and pure functions

Usage Examples:
- Simple function handler
- Class execution with methods
- Multiple functions in one handler

Build Process Integration:
- 4-phase pipeline: Scanner → Grouping → Generation → Packaging
- Manifest structure and contents
- Generated handler structure (~23 lines)

Testing Strategy:
- 19 unit tests covering all major paths
- 7 integration tests verifying handler generation
- Manual testing with example applications

Performance:
- Zero runtime penalty (factory called once at startup)
- No additional indirection in request path

* docs(cli): Add flash build command documentation

Document the flash build command and update CLI README to include it.

New Documentation:

flash-build.md includes:

Usage & Options:
- Command syntax: flash build [OPTIONS]
- --no-deps: Skip transitive dependencies (faster, smaller archives)
- --keep-build: Keep build directory for inspection/debugging
- --output, -o: Custom archive name (default: archive.tar.gz)

What It Does (5-step process):
1. Discovery: Scan for @Remote decorated functions
2. Grouping: Group functions by resource_config
3. Handler Generation: Create lightweight handler files
4. Manifest Creation: Generate flash_manifest.json
5. Packaging: Create archive.tar.gz for deployment

Build Artifacts:
- .flash/archive.tar.gz: Deployment package (ready for RunPod)
- .flash/flash_manifest.json: Service discovery configuration
- .flash/.build/: Temporary build directory

Handler Generation:
- Explains factory pattern and minimal handler files
- Links to Runtime_Generic_Handler.md for details

Dependency Management:
- Default behavior: Install all dependencies including transitive
- --no-deps: Only direct dependencies (when base image has transitive)
- Trade-offs explained

Cross-Endpoint Function Calls:
- Example showing GPU and CPU endpoints
- Manifest enables routing automatically

Output & Troubleshooting:
- Sample build output with progress indicators
- Common failure scenarios and solutions
- How to debug with --keep-build

Next Steps:
- Test locally with flash run
- Deploy to RunPod
- Monitor with flash undeploy list

Updated CLI README.md:
- Added flash build to command list in sequence
- Links to full flash-build.md documentation

* docs: Add build process and handler generation section to README

Add a new section explaining how the build system works and why the
factory pattern reduces code duplication.

New Section: Build Process and Handler Generation

Explains:

How Flash Builds Your Application (5-step pipeline):
1. Discovery: Scans code for @Remote decorated functions
2. Grouping: Groups functions by resource_config
3. Handler Generation: Creates lightweight handler files
4. Manifest Creation: Generates flash_manifest.json for service discovery
5. Packaging: Bundles everything into archive.tar.gz

Handler Architecture (with code example):
- Shows generated handler using factory pattern
- Single source of truth: All handler logic in one place
- Easier maintenance: Bug fixes don't require rebuilding projects

Cross-Endpoint Function Calls:
- Example of GPU and CPU endpoints calling each other
- Manifest and runtime wrapper handle service discovery

Build Artifacts:
- .flash/.build/: Temporary build directory
- .flash/archive.tar.gz: Deployment package
- .flash/flash_manifest.json: Service configuration

Links to detailed documentation:
- docs/Runtime_Generic_Handler.md for architecture details
- src/tetra_rp/cli/docs/flash-build.md for CLI reference

This section bridges the main README and the detailed documentation,
providing an entry point for new users discovering the build system.

* feat(cli): Integrate build utilities into flash build command

Wire up the handler generator, manifest builder, and scanner into the
actual flash build command implementation.

Changes to build.py:

1. Integration:
   - Import RemoteDecoratorScanner for function discovery
   - Import ManifestBuilder for manifest creation
   - Import HandlerGenerator for handler file creation
   - Call these in sequence during the build process

2. Build Pipeline:
   - After copying project files, scan for @Remote functions
   - Build manifest from discovered functions
   - Generate handler files for each resource config
   - Write manifest to build directory
   - Progress indicators show what's being generated

3. Fixes:
   - Change .tetra directory references to .flash
   - Uncomment actual build logic (was showing "Coming Soon" message)
   - Fix progress messages to show actual file counts

4. Error Handling:
   - Try/catch around handler generation
   - Warning shown if generation fails but build continues
   - User can debug with --keep-build flag

Build Flow Now:
1. Load ignore patterns
2. Collect project files
3. Create build directory
4. Copy files to build directory
5. [NEW] Scan for @Remote functions
6. [NEW] Build and write manifest
7. [NEW] Generate handler files
8. Install dependencies
9. Create archive
10. Clean up build directory (unless --keep-build)

Dependencies:
- Updated uv.lock with all required dependencies

* refactor(build): Fix directory structure and add comprehensive error handling

**Critical Fixes:**
- Remove "Coming Soon" message blocking build command execution
- Fix build directory to use .flash/.build/ directly (no app_name subdirectory)
- Fix tarball to extract with flat structure using arcname="."
- Fix cleanup to remove correct build directory

**Error Handling & Validation:**
- Add specific exception handling (ImportError, SyntaxError, ValueError)
- Add import validation to generated handlers
- Add duplicate function name detection across resources
- Add proper error logging throughout build process

**Resource Type Tracking:**
- Add resource_type field to RemoteFunctionMetadata
- Track actual resource types (LiveServerless, CpuLiveServerless)
- Use actual types in manifest instead of hardcoding

**Robustness Improvements:**
- Add handler import validation post-generation
- Add manifest path fallback search (cwd, module dir, legacy location)
- Add resource name sanitization for safe filenames
- Add specific exception logging in scanner (UnicodeDecodeError, SyntaxError)

**User Experience:**
- Add troubleshooting section to README
- Update manifest path documentation in docs
- Change "Zero Runtime Penalty" to "Minimal Runtime Overhead"
- Mark future enhancements as "Not Yet Implemented"
- Improve build success message with next steps

Fixes all 20 issues identified in code review (issues #1-13, #19-22)

* feat(resources): Add LoadBalancerSlsResource for LB endpoints

Implement LoadBalancerSlsResource class for provisioning RunPod load-balanced
serverless endpoints. Load-balanced endpoints expose HTTP servers directly to
clients without queue-based processing, enabling REST APIs, webhooks, and
real-time communication patterns.

Key features:
- Type enforcement (always LB, never QB)
- Scaler validation (REQUEST_COUNT required, not QUEUE_DELAY)
- Health check polling via /ping endpoint (200/204 = healthy)
- Post-deployment verification with configurable retries
- Async and sync health check methods
- Comprehensive unit tests
- Full documentation with architecture diagrams and examples

Architecture:
- Extends ServerlessResource with LB-specific behavior
- Validates configuration before deployment
- Polls /ping endpoint until healthy (10 retries × 5s = 50s timeout)
- Raises TimeoutError if endpoint fails to become healthy

This forms the foundation for Mothership architecture where a load-balanced
endpoint serves as a directory server for child endpoints.
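The /ping polling loop described above can be sketched as follows. The `probe` and `sleep` callables are injected here so the sketch stays self-contained; the real implementation issues HTTP requests:

```python
import time


def wait_until_healthy(probe, retries=10, delay=5.0, sleep=time.sleep):
    """Sketch of the health-check loop: any callable returning an HTTP
    status code stands in for the real /ping request."""
    for _ in range(retries):
        if probe() in (200, 204):  # 204 during initialization is healthy
            return True
        sleep(delay)
    raise TimeoutError(f"endpoint not healthy after {retries} checks "
                       f"({retries * delay:.0f}s)")


# Simulated endpoint that becomes healthy on the third check.
codes = iter([503, 503, 204])
assert wait_until_healthy(lambda: next(codes), sleep=lambda _: None)
```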

* fix(test): Fix LoadBalancerSlsResource deployment test mocks

Import ServerlessResource directly and use patch.object on the imported class
instead of string-based patches. This ensures the mocks properly intercept the
parent class's _do_deploy method when called via super(). Simplifies mock
configuration and removes an unused variable assertion.

Fixes the three failing deployment tests that were making real GraphQL API calls.
All tests now pass: 418 passed, 1 skipped.

* feat(resources): Phase 1 - Core infrastructure for @Remote on LB endpoints

Implement core infrastructure for enabling @Remote decorator on
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Changes:
- Create LoadBalancerSlsStub: HTTP-based stub for direct endpoint execution
  (src/tetra_rp/stubs/load_balancer_sls.py, 170 lines)
  - Serializes functions and arguments using cloudpickle + base64
  - Direct HTTP POST to /execute endpoint (no queue polling)
  - Proper error handling and deserialization

- Register stub with singledispatch (src/tetra_rp/stubs/registry.py)
  - Enables @Remote to dispatch to LoadBalancerSlsStub for LB resources

- Extend @Remote decorator with HTTP routing parameters (src/tetra_rp/client.py)
  - Add 'method' parameter: GET, POST, PUT, DELETE, PATCH
  - Add 'path' parameter: /api/endpoint routes
  - Validate method/path required for LoadBalancerSlsResource
  - Store routing metadata on decorated functions/classes
  - Warn if routing params used with non-LB resources

Foundation for Phase 2 (Build system integration) and Phase 3 (Local dev).

* feat(build): Phase 2.1 - Enhanced scanner for HTTP routing extraction

Update RemoteDecoratorScanner to extract HTTP method and path from
@Remote decorator for LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to RemoteFunctionMetadata
- Add _extract_http_routing() method to parse decorator keywords
- Extract method (GET, POST, PUT, DELETE, PATCH) from decorator
- Extract path (/api/process) from decorator
- Store routing metadata for manifest generation

Foundation for Phase 2.2 (Manifest updates) and Phase 2.3 (Handler generation).

* feat(build): Phase 2.2 - Updated manifest schema for HTTP routing

Enhance ManifestBuilder to support HTTP method/path routing for
LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to ManifestFunction
- Validate LB endpoints have both method and path
- Detect and prevent route conflicts (same method + path)
- Prevent use of reserved paths (/execute, /ping)
- Add 'routes' section to manifest for LB endpoints
- Conditional inclusion of routing fields (only for LB)

Manifest structure for LB endpoints now includes:
{
  "resources": {
    "api_service": {
      "resource_type": "LoadBalancerSlsResource",
      "functions": [
        {
          "name": "process_data",
          "http_method": "POST",
          "http_path": "/api/process"
        }
      ]
    }
  },
  "routes": {
    "api_service": {
      "POST /api/process": "process_data"
    }
  }
}

* feat(cli): Add LB handler generator for FastAPI app creation

Implement LBHandlerGenerator to create FastAPI applications for
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Key features:
- Generates FastAPI apps with explicit route registry
- Creates (method, path) -> function mappings from manifest
- Validates route conflicts and reserved paths
- Imports user functions and creates dynamic routes
- Includes required /ping health check endpoint
- Validates generated handler Python syntax via import

Generated handler structure enables:
- Direct HTTP routing to user functions via FastAPI
- Framework /execute endpoint for @Remote stub execution
- Local development with uvicorn

* feat(runtime): Implement LB handler factory for FastAPI app creation

Create create_lb_handler() factory function that dynamically builds FastAPI
applications from route registries for LoadBalancerSlsResource endpoints.

Key features:
- Accepts route_registry: Dict[(method, path)] -> handler_function mapping
- Registers all user-defined routes from registry to FastAPI app
- Provides /execute endpoint for @Remote stub function execution
- Handles async function execution automatically
- Serializes results with cloudpickle + base64 encoding
- Comprehensive error handling with detailed logging

The /execute endpoint enables:
- Remote function code execution via @Remote decorator
- Automatic argument deserialization from cloudpickle/base64
- Result serialization for transmission back to client
- Support for both sync and async functions
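The routing core of the factory can be sketched as below. `dispatch` is a hypothetical stand-in for FastAPI's request handling, and `asyncio.run` is used only because this sketch runs outside an event loop; the `route_registry` shape matches the description above:

```python
import asyncio
import inspect


async def process_data(x, y):
    return {"result": x + y}


# Registry shape consumed by create_lb_handler(): (method, path) -> function.
route_registry = {("POST", "/api/process"): process_data}


def dispatch(method: str, path: str, payload: dict):
    """Sketch of the routing core: look up the registered function and
    await it automatically when it returns a coroutine."""
    fn = route_registry.get((method.upper(), path))
    if fn is None:
        return {"status": 404, "error": f"no route for {method} {path}"}
    result = fn(**payload)
    if inspect.isawaitable(result):
        result = asyncio.run(result)
    return {"status": 200, "body": result}


print(dispatch("POST", "/api/process", {"x": 5, "y": 3}))
```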

* feat(cli): Route build command to separate handlers for LB endpoints

Update build command to use appropriate handler generators based on
resource type. Separates LoadBalancerSlsResource endpoints (using FastAPI)
from queue-based endpoints (using generic handler).

Changes:
- Import LBHandlerGenerator alongside HandlerGenerator
- Inspect manifest resources and separate by type
- Generate LB handlers via LBHandlerGenerator
- Generate QB handlers via HandlerGenerator
- Combine all generated handler paths for summary

Enables users to mix LB and QB endpoints in same project with correct
code generation for each resource type.

* feat(resources): Add LiveLoadBalancer for local LB endpoint testing

Implement LiveLoadBalancer resource following the LiveServerless pattern
for local development and testing of load-balanced endpoints.

Changes:
- Add TETRA_LB_IMAGE constant for load-balanced Tetra image
- Create LiveLoadBalancer class extending LoadBalancerSlsResource
- Uses LiveServerlessMixin to lock imageName to Tetra LB image
- Register LiveLoadBalancer with LoadBalancerSlsStub in singledispatch
- Export LiveLoadBalancer from core.resources and top-level __init__

This enables users to test LB-based functions locally before deploying,
using the same pattern as LiveServerless for queue-based endpoints.

Users can now write:
  from tetra_rp import LiveLoadBalancer, remote

  api = LiveLoadBalancer(name="test-api")

  @remote(api, method="POST", path="/api/process")
  async def process_data(x, y):
      return {"result": x + y}

  result = await process_data(5, 3)  # Local execution

* test(stubs): Add comprehensive unit tests for LoadBalancerSlsStub

Implement unit tests for LoadBalancerSlsStub covering:
- Request preparation with arguments and dependencies
- Response handling for success and error cases
- Error handling for invalid responses
- Base64 encoding/decoding of serialized data
- Endpoint URL validation
- Timeout and HTTP error handling

Test coverage:
- _prepare_request: 4 tests
- _handle_response: 5 tests
- _execute_function: 3 error case tests
- __call__: 2 integration tests

Tests verify proper function serialization, argument handling,
error propagation, and response deserialization.

* fix(test): Correct LB endpoint test decorator to match assertions

Fix test_load_balancer_vs_queue_based_endpoints by updating the @Remote
decorator to use method='POST' and path='/api/echo' to match the test
assertions. This was a test-level bug where the decorator definition
didn't match what was being asserted.

* docs: Add comprehensive documentation for @Remote with LoadBalancer endpoints

- Using_Remote_With_LoadBalancer.md: User guide for HTTP routing, local development, building and deploying
- LoadBalancer_Runtime_Architecture.md: Technical details on deployment, request flows, security, and performance
- Updated README.md with LoadBalancer section and code example
- Updated Load_Balancer_Endpoints.md with cross-references to new guides

* security: Remove /execute from deployed LoadBalancer endpoints

Split @Remote execution behavior between local and deployed:
- LiveLoadBalancer (local): Uses /execute endpoint for function serialization
- LoadBalancerSlsResource (deployed): Uses user-defined routes with HTTP param mapping

Changes:
1. LoadBalancerSlsStub routing detection:
   - _should_use_execute_endpoint() determines execution path
   - _execute_via_user_route() maps args to JSON and POSTs to user routes
   - Auto-detects resource type and routing metadata

2. Conditional /execute registration:
   - create_lb_handler() now accepts include_execute parameter
   - Generated handlers default to include_execute=False (security)
   - LiveLoadBalancer can enable /execute if needed

3. Updated handler generator:
   - Added clarity comments on /execute exclusion for deployed endpoints

4. Comprehensive test coverage:
   - 8 new tests for routing detection and execution paths
   - All 31 tests passing (22 unit + 9 integration)

5. Documentation updates:
   - Using_Remote_With_LoadBalancer.md: clarified /execute scope
   - Added 'Local vs Deployed Execution' section explaining differences
   - LoadBalancer_Runtime_Architecture.md: updated execution model
   - Added troubleshooting for deployed endpoint scenarios

Security improvement:
- Deployed endpoints only expose user-defined routes
- /execute endpoint removed from production (prevents arbitrary code execution)
- Lower attack surface for deployed endpoints
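The split can be pictured with a toy version of the stub's routing check (class names follow the commit text; the function name and body are assumptions, not the actual implementation):

```python
class LoadBalancerSlsResource:
    """Deployed LB endpoint: only user-defined HTTP routes are exposed."""

class LiveLoadBalancer(LoadBalancerSlsResource):
    """Local-development LB endpoint: may expose /execute."""

def should_use_execute_endpoint(resource) -> bool:
    # LiveLoadBalancer subclasses LoadBalancerSlsResource, so the more
    # specific type must be checked; only local development gets /execute.
    return isinstance(resource, LiveLoadBalancer)
```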

* feat(build): Phase 4 - Fix LiveLoadBalancer handler generation to include /execute endpoint

- Modified manifest.py to validate LiveLoadBalancer endpoints like LoadBalancerSlsResource
- Updated lb_handler_generator to:
  - Include LiveLoadBalancer in handler generation filter
  - Pass include_execute=True for LiveLoadBalancer (local dev)
  - Pass include_execute=False for LoadBalancerSlsResource (deployed)
- Added integration tests:
  - Verify LiveLoadBalancer handlers include /execute endpoint
  - Verify deployed handlers exclude /execute endpoint
- Fixes critical bug: LiveLoadBalancer now gets /execute endpoint in generated handlers

* fix(scanner): Discover LoadBalancer resources in addition to Serverless resources

- Updated scanner to extract LiveLoadBalancer and LoadBalancerSlsResource resources
- Previously only looked for 'Serverless' in the class name, missing LoadBalancer endpoints
- Now checks for both 'Serverless' and 'LoadBalancer' in resource type names
- Added integration test to verify scanner discovers both resource types
- Fixes critical bug that prevented flash build from finding LoadBalancer endpoints

* chore: Format code for line length and remove unused imports

- Wrap long lines in manifest.py, lb_handler.py, and load_balancer_sls.py
- Remove unused httpx import in test_load_balancer_sls_stub.py
- Apply consistent formatting across codebase

* fix: Address PR #131 review feedback

- Scanner: Use exact type name matching instead of substring matching
  - Whitelist specific resource types to avoid false positives
  - Prevents matching classes like 'MyServerlessHelper' or 'LoadBalancerUtils'

- Type hints: Use Optional[str] for nullable fields in manifest
  - ManifestFunction.http_method and http_path now properly typed

- Timeout: Make HTTP client timeout configurable
  - Added LoadBalancerSlsStub.DEFAULT_TIMEOUT class attribute
  - Added timeout parameter to __init__
  - Updated both _execute_function and _execute_via_user_route to use self.timeout

- Deprecated datetime: Replace datetime.utcnow() with datetime.now(timezone.utc)
  - Updated manifest.py and test_lb_remote_execution.py
  - Ensures Python 3.12+ compatibility
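The replacement is mechanical; for example:

```python
from datetime import datetime, timezone

# datetime.utcnow() is deprecated in Python 3.12 and returns a naive datetime.
# The timezone-aware equivalent:
ts = datetime.now(timezone.utc)
```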

* style: Format datetime chaining for line length

* fix: LiveLoadBalancer template not serialized to RunPod GraphQL

The set_serverless_template model_validator was being overwritten by sync_input_fields
(both had mode="after"). In Pydantic v2, when two validators with the same mode are
defined in a class, only one is registered.

This caused templates to never be created from imageName, resulting in:
  "GraphQL errors: One of templateId, template is required to create an endpoint"

Solution:
- Move set_serverless_template validator from ServerlessResource base class to subclasses
  (ServerlessEndpoint and LoadBalancerSlsResource) where the validation is actually needed
- Keep helper methods (_create_new_template, _configure_existing_template) in base class
  for reuse
- Add comprehensive tests for LiveLoadBalancer template serialization

This allows:
1. Base ServerlessResource to be instantiated freely for testing/configuration
2. Subclasses (ServerlessEndpoint, LoadBalancerSlsResource) to enforce template
   requirements during deployment
3. Proper template serialization in GraphQL payload for RunPod API

Fixes: One of templateId, template is required to create an endpoint error when
deploying LiveLoadBalancer with custom image tags like runpod/tetra-rp-lb:local

* fix: LoadBalancer endpoint URL and add CPU support

- Fix: Use correct endpoint URL format for load-balanced endpoints
  (https://{id}.api.runpod.ai instead of https://api.runpod.ai/v2/{id})
  This fixes 404 errors on /ping health check endpoints

- Feature: Add CPU LoadBalancer support
  * Create CpuLoadBalancerSlsResource for CPU-based load-balanced endpoints
  * Create CpuLiveLoadBalancer for local CPU LB development
  * Add TETRA_CPU_LB_IMAGE constant for CPU LB Docker image
  * Update example code to use CpuLiveLoadBalancer for CPU worker
  * Add 8 comprehensive tests for CPU LoadBalancer functionality

- Tests: Add 2 tests for endpoint URL format validation
- All 474 tests passing, 64% code coverage
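The two URL shapes, side by side (endpoint IDs are placeholders; the helper names are illustrative):

```python
def queue_based_url(endpoint_id: str) -> str:
    # Queue-based serverless endpoints go through the shared API host
    return f"https://api.runpod.ai/v2/{endpoint_id}"

def load_balanced_url(endpoint_id: str) -> str:
    # Load-balanced endpoints get a per-endpoint subdomain
    return f"https://{endpoint_id}.api.runpod.ai"
```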

* fix: Export CpuLiveLoadBalancer and CpuLoadBalancerSlsResource from tetra_rp package

LoadBalancer resources were not being discovered by ResourceDiscovery because
the new CPU variants (CpuLiveLoadBalancer, CpuLoadBalancerSlsResource) were
not exported from the main tetra_rp package. This prevented undeploy from
picking up these resources.

Added exports to:
- TYPE_CHECKING imports for type hints
- __getattr__ function for lazy loading
- __all__ list for public API

This fixes the issue where 'flash undeploy list' could not find LoadBalancer
resources that were deployed with 'flash run --auto-provision'.

* fix: Add API key authentication to LoadBalancer health check

The /ping endpoint for RunPod load-balanced endpoints requires the
RUNPOD_API_KEY header for authentication. Without it, the health check
fails with 401 Unauthorized, causing provisioning to timeout.

This fix adds the Authorization header to the health check request if
the RUNPOD_API_KEY environment variable is available, allowing the
endpoint health check to succeed during provisioning.

Fixes issue where 'flash run --auto-provision' would fail even though
the endpoint was successfully created on RunPod.
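The header logic amounts to something like the following sketch (the helper name is illustrative; only the Bearer-header behavior comes from the commit):

```python
import os

def health_check_headers() -> dict:
    """Build headers for the /ping check; auth is added only when the key is set."""
    headers = {}
    api_key = os.environ.get("RUNPOD_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers

os.environ["RUNPOD_API_KEY"] = "example-key"  # illustration only
headers = health_check_headers()
```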

* fix(lb): Exclude flashboot from CpuLoadBalancerSlsResource GraphQL payload

CpuLoadBalancerSlsResource was overriding _input_only without including flashboot,
causing it to be sent to the RunPod GraphQL API which doesn't accept this field.
This caused deployment to fail with: Field "flashboot" is not defined by type "EndpointInput".

* fix(lb): Expand CpuInstanceType.ANY to all CPU flavors in CpuLoadBalancerSlsResource

Add field_validator to expand [CpuInstanceType.ANY] to all available CPU instance
types (cpu3g, cpu3c, cpu5c variants). This matches the behavior in CpuServerlessEndpoint
and prevents deployment errors like 'instanceId must be in the format of flavorId-vcpu-ram'.
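The expansion can be sketched as a plain function (the enum members here are assumptions for illustration; real instance IDs follow the "flavorId-vcpu-ram" format mentioned above):

```python
from enum import Enum

class CpuInstanceType(str, Enum):
    # Illustrative members only; real IDs look like "cpu3g-2-8"
    ANY = "any"
    CPU3G = "cpu3g-2-8"
    CPU3C = "cpu3c-2-4"
    CPU5C = "cpu5c-2-4"

def expand_instance_ids(values):
    # Sketch of the validator: [ANY] expands to every concrete flavor
    if values == [CpuInstanceType.ANY]:
        return [t for t in CpuInstanceType if t is not CpuInstanceType.ANY]
    return values
```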

* refactor(cpu): Move instanceIds validator to CpuEndpointMixin

Move the instanceIds field_validator from CpuServerlessEndpoint to CpuEndpointMixin
so both CpuServerlessEndpoint and CpuLoadBalancerSlsResource share the same validator
that expands [CpuInstanceType.ANY] to all available CPU flavors. This eliminates
code duplication and ensures consistent behavior across all CPU endpoint types.

* test: Update CPU instance test to reflect validator expansion

Update test_cpu_live_load_balancer_defaults_to_cpu_any to verify that
[CpuInstanceType.ANY] is correctly expanded to all available CPU instance types
by the field_validator in CpuEndpointMixin.

* fix(lb): Increase health check timeout from 5s to 15s

Load-balanced workers need more time to respond during cold starts and initialization.
RunPod docs recommend at least 10-15 second timeouts for health checks. Workers may
return 204 during initialization, which is normal and expected.

* fix(lb): Fix CPU load balancer template deployment error

Fixes two bugs in CpuLoadBalancerSlsResource that prevented CPU load balancers from deploying:

1. Added gpuCount and allowedCudaVersions to _input_only exclusion set to prevent
   GPU-specific fields from being sent to RunPod API
2. Overrode set_serverless_template() to call _sync_cpu_fields() first, ensuring
   GPU defaults are overridden to CPU-appropriate values (gpuCount=0)

The RunPod API was rejecting CPU load balancer templates because GPU-specific
fields were being included in the GraphQL payload. These changes align
CpuLoadBalancerSlsResource behavior with CpuServerlessEndpoint.

Also added comprehensive test coverage (30+ tests) to verify:
- GPU fields are correctly overridden to CPU defaults
- GPU fields are excluded from API payloads
- CPU-specific fields are properly included
- Consistency with CpuServerlessEndpoint behavior

* fix(drift): Exclude runtime fields from config hash to prevent false positives

Fixes false positive configuration drift detection by separating concerns:

1. Update ServerlessResource.config_hash to exclude runtime fields
   - Fields like template, templateId, aiKey, userId are API-assigned
   - Prevents false drift when same config is redeployed across processes
   - Now only hashes user-specified configuration

2. Add config_hash override to CpuLoadBalancerSlsResource
   - CPU load balancers hash only CPU-relevant fields
   - Excludes GPU-specific fields and runtime fields
   - Follows same pattern as CpuServerlessEndpoint

3. Fix _has_structural_changes to exclude template/templateId
   - CRITICAL: These runtime fields were causing false structural changes
   - Was forcing unnecessary redeployments despite update() being available
   - Now system correctly uses update() instead of undeploy+deploy

4. Make field serializers robust to handle string/enum values
   - Prevents serialization errors when fields are pre-converted to strings

5. Add comprehensive drift detection tests (16 tests)
   - Test hash stability with runtime field changes
   - Test exclusion of env, template, templateId, and other runtime fields
   - Test that actual config changes (image, flashboot) are detected
   - Test structural change detection behavior
   - Test real-world deployment scenarios

Results:
- Same config deployed multiple times: no false drift
- Different env vars with same config: no false drift
- Template/templateId changes: no false drift
- API-assigned fields: no false drift
- User config changes (image, flashboot): drift detected correctly
- All 512 unit tests pass
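The hashing idea can be sketched as follows (the excluded field names come from the commit; the hashing scheme itself is an assumption):

```python
import hashlib
import json

# API-assigned fields that must not influence the config hash
RUNTIME_FIELDS = {"template", "templateId", "aiKey", "userId"}

def config_hash(config: dict) -> str:
    # Hash only user-specified configuration so identical redeploys compare equal
    user_config = {k: v for k, v in config.items() if k not in RUNTIME_FIELDS}
    blob = json.dumps(user_config, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

same_a = config_hash({"imageName": "app:1.0", "templateId": "t-111"})
same_b = config_hash({"imageName": "app:1.0", "templateId": "t-222"})
changed = config_hash({"imageName": "app:2.0", "templateId": "t-111"})
```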

* fix(http): Standardize RunPod HTTP client authentication across codebase

Centralizes HTTP client creation for RunPod load-balanced endpoints to prevent
manual Authorization header code duplication and ensure consistent authentication:

1. Create centralized HTTP utility function (src/tetra_rp/core/utils/http.py)
   - New function: get_authenticated_httpx_client()
   - Automatically adds Bearer token Authorization header if RUNPOD_API_KEY set
   - Provides consistent timeout handling (default 30s, customizable)
   - Follows existing GraphQL/REST client authentication pattern

2. Fix critical authentication bug in LoadBalancerSlsStub._execute_via_user_route()
   - Previously: Missing Authorization header (401 errors on user routes)
   - Now: Uses centralized utility for proper authentication
   - Enables direct HTTP calls to user-defined routes with auth

3. Refactor two methods to use centralized utility
   - LoadBalancerSlsStub._execute_function() - removes 7+ lines of manual auth code
   - LoadBalancerSlsResource._check_ping_endpoint() - simplifies auth setup

4. Add comprehensive unit tests (tests/unit/core/utils/test_http.py)
   - Tests API key presence/absence handling
   - Tests custom and default timeout configuration
   - Tests edge cases (empty key, zero timeout)
   - All 7 tests pass with 100% coverage

Results:
- Single source of truth for HTTP authentication (centralized utility)
- Fixes 401 Unauthorized errors on load-balanced endpoints
- Eliminates repetitive manual auth code across 3+ locations
- Easier to maintain and update authentication patterns in future
- All 499 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

* feat(http): Extend HTTP utilities to cover both sync and async authentication

Extends the centralized HTTP authentication pattern to all RunPod API calls:

1. Add get_authenticated_requests_session() for synchronous requests
   - Creates requests.Session with automatic Bearer token Authorization header
   - Follows same pattern as async get_authenticated_httpx_client()
   - Single source of truth for sync HTTP authentication

2. Refactor template.py to use centralized utility
   - Removes manual Authorization header setup (line 86)
   - Now uses get_authenticated_requests_session() for all template updates
   - Improves error handling with raise_for_status()
   - Token parameter marked deprecated; uses RUNPOD_API_KEY env var

3. Add comprehensive tests for sync utility (4 tests)
   - Tests API key presence/absence handling
   - Tests empty API key edge case
   - Tests Session object validation
   - All tests pass with proper cleanup

Benefits:
- True single source of truth for all RunPod HTTP authentication (sync + async)
- Consistent patterns across entire codebase
- Easier future auth changes across all HTTP client types
- Eliminates manual auth header code in template.py
- All 503 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

Note: requests.Session doesn't support default timeouts; timeout should be
specified per request (e.g., session.post(url, json=data, timeout=30.0))

* fix: Address PR feedback on HTTP utilities implementation

Addresses three feedback items from code review:

1. Fix breaking parameter order change in update_system_dependencies()
   - Restored original parameter order: template_id, token, system_dependencies, base_entry_cmd
   - Maintains backward compatibility with existing callers
   - Token parameter now optional (default None)

2. Add proper deprecation warning for token parameter
   - Issues DeprecationWarning when token parameter is used
   - Clearly communicates migration to RUNPOD_API_KEY environment variable
   - Follows Python deprecation best practices (warnings.warn with stacklevel=2)

3. Standardize test mocking approach across all health check tests
   - All tests now use consistent 'tetra_rp.core.utils.http.httpx.AsyncClient' patching
   - Removed inconsistent 'side_effect=lambda' pattern
   - Improved test maintainability by using same strategy everywhere

All 503 tests pass with consistent, clean implementation.
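The deprecation pattern in item 2 can be sketched like this (the signature follows the commit text; the body is illustrative):

```python
import warnings

def update_system_dependencies(template_id, token=None, system_dependencies=None):
    """Illustrative body: only the deprecation pattern is the point here."""
    if token is not None:
        warnings.warn(
            "The 'token' parameter is deprecated; set the RUNPOD_API_KEY "
            "environment variable instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
    return template_id

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    update_system_dependencies("tmpl-1", token="old-key")
```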

* refactor(drift): Extract runtime field constants and improve maintainability

- Extract RUNTIME_FIELDS and EXCLUDED_HASH_FIELDS as ClassVar constants in ServerlessResource for centralized field list management
- Add clarifying comments to enum serializers explaining defensive isinstance() checks for nested model serialization
- Document CPU load balancer field list coupling in docstring with maintenance guidelines
- Add TestSerializerDefensiveBehavior class with 4 tests verifying pre-stringified enum value handling
- Use ClassVar annotation to satisfy Pydantic v2 model field requirements

This reduces maintenance burden by centralizing field definitions and improves code clarity without changing functionality.

* docs: Improve LoadBalancer documentation accuracy and completeness

- Fix health check timeout: Add clarification that timeout is 15 seconds per check
- Add HTTP authentication details explaining RUNPOD_API_KEY header injection
- Document stub decision logic for incomplete routing metadata (fallback behavior)
- Clarify function signature inspection with concrete example showing parameter mapping
- Expand /execute security explanation with explicit threats and best practices
- Add detailed parameter type constraints for deployed endpoints (supported vs unsupported)
- Add troubleshooting guide for missing routing metadata (404 errors)
- Strengthen security warnings about never exposing /execute in production

All documentation now matches actual implementation verified through codebase analysis.

* docs: add resource config drift detection documentation

- comprehensive guide on drift detection implementation
- covers hash computation, field exclusion, and cpu-specific behavior
- includes testing patterns and troubleshooting guide
- documents all fields that trigger drift vs those ignored

* docs: proper name for the file

* test(build): Add comprehensive test coverage for scanner and handler improvements

- Add 6 new scanner tests for directory filtering (.venv, .flash, .runpod exclusion)
- Add test for resource type validation to prevent false positives
- Add test for fallback behavior when resource name extraction fails
- Add test for handling resource names with special characters
- Update existing tests to reflect new dynamic import format and resource name extraction

These tests guarantee that improvements to the scanner (resource type validation,
directory filtering, fallback behavior) and handler generator (dynamic imports for
invalid Python identifiers) won't regress in future changes.

* test(scanner): Fix resource type assertions to match scanner behavior

The scanner now extracts resource names from the name= parameter rather
than using variable names. Update test assertions to expect the actual
resource names ('test-api', 'deployed-api') instead of variable names.

* chore: merge correction

* fix(drift): Remove manual undeploy/deploy from update() method

Use saveEndpoint mutation for all changes instead of manual lifecycle
management. Server-side automatically detects version-triggering fields
(GPU, template, volumes) and increments endpoint version accordingly.

Keep _has_structural_changes() as informational for logging purposes only.
This aligns with RunPod API's version-based deployment model.

* docs(drift): Clarify _has_structural_changes detects version-triggering changes

Update docstring to reflect that this method identifies changes that
trigger server-side version increment and worker recreation, not manual
redeploy cycles. Explain which changes are version-triggering vs rolling
updates, and note that the method is now informational for logging only.

* feat(drift): Enable environment variable drift detection

Remove env from EXCLUDED_HASH_FIELDS so changes to environment variables
trigger drift detection and endpoint updates. Environment changes are
non-version-triggering (rolling updates), so server will apply them via
saveEndpoint without recreating workers.

Add env to CPU LoadBalancer config_hash for consistent behavior across
all resource types. Update comments to reflect that env is user-specified
configuration, not dynamically computed.

* test(drift): Update tests for environment variable drift detection

- test_lb_config_hash_excludes_env_variables → test_lb_config_hash_detects_env_changes
- test_env_var_changes_no_drift → test_env_var_changes_trigger_drift
- test_config_hash_excludes_env_from_drift → test_config_hash_detects_env_from_drift

Update assertions to expect different hashes when env changes, matching
new behavior where environment variable changes trigger drift and updates.

* fix: Address Copilot review feedback on type hints and documentation

- Fix type annotation for timeout parameter in LoadBalancerSlsStub (Optional[float])
- Replace hardcoded "30s" with actual self.timeout in error messages (2 locations)
- Update Resource_Config_Drift_Detection.md to reflect actual EXCLUDED_HASH_FIELDS
- Remove duplicate Load-Balanced Endpoints section from README.md

Addresses Copilot review comments (PR #132, review 3642596664)
deanq added a commit that referenced this pull request Jan 14, 2026
* feat(runtime): Add generic handler factory for serverless execution

Implement a factory function that creates RunPod serverless handlers,
eliminating code duplication across generated handler files.

The generic_handler module provides:
- create_handler(function_registry) factory that accepts a dict of
  function/class objects and returns a RunPod-compatible handler
- Automatic serialization/deserialization using cloudpickle + base64
- Support for both function execution and class instantiation + method calls
- Structured error responses with full tracebacks for debugging
- Load manifest for cross-endpoint function discovery

This design centralizes all handler logic in one place, making it easy to:
- Fix bugs once, benefit all handlers
- Add new features without regenerating projects
- Keep deployment packages small (handler files are ~23 lines each)

Implementation:
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Handles function vs. class execution
- load_manifest(): Loads flash_manifest.json for service discovery
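Its shape is roughly the following (stdlib `pickle` stands in for cloudpickle, and the job/response field names are a sketch of the described behavior, not the actual implementation):

```python
import base64
import pickle
import traceback

def create_handler(function_registry):
    """Return a RunPod-style handler closed over a registry of callables (sketch)."""
    def handler(job):
        try:
            name = job["input"]["function_name"]
            fn = function_registry.get(name)
            if fn is None:
                return {"success": False, "error": f"Unknown function: {name}"}
            payload = pickle.loads(base64.b64decode(job["input"]["payload"]))
            result = fn(*payload.get("args", ()), **payload.get("kwargs", {}))
            return {"success": True,
                    "result": base64.b64encode(pickle.dumps(result)).decode()}
        except Exception as exc:
            # Structured error response with a full traceback for debugging
            return {"success": False, "error": str(exc),
                    "traceback": traceback.format_exc()}
    return handler

handler = create_handler({"add": lambda x, y: x + y})
payload = base64.b64encode(pickle.dumps({"args": (5, 3)})).decode()
resp = handler({"input": {"function_name": "add", "payload": payload}})
```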

* feat(cli): Add handler generator, manifest builder, and scanner for build process

Implement the build pipeline components that work together to generate
serverless handlers from @remote decorated functions.

Three core components:

1. RemoteDecoratorScanner (scanner.py)
   - Uses Python AST to discover all @remote decorated functions
   - Extracts function metadata: name, module, async status, is_class
   - Groups functions by resource_config for handler generation
   - Handles edge cases like decorated classes and async functions

2. ManifestBuilder (manifest.py)
   - Groups functions by their resource_config
   - Creates flash_manifest.json structure for service discovery
   - Maps functions to their modules and handler files
   - Enables cross-endpoint function routing at runtime

3. HandlerGenerator (handler_generator.py)
   - Creates lightweight handler_*.py files for each resource config
   - Each handler imports functions and registers them in FUNCTION_REGISTRY
   - Handler delegates to create_handler() factory from generic_handler
   - Generated handlers are ~23 lines (vs ~98 with duplication)
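The AST-based discovery in component 1 can be sketched as follows (a toy scanner under the assumption that decorators are written `@remote(...)`; the real scanner extracts more metadata):

```python
import ast

SOURCE = '''
@remote(gpu_api, method="POST", path="/api/process")
async def process_data(x, y):
    return {"result": x + y}

def helper():
    pass
'''

def find_remote_functions(source: str):
    """Sketch: collect functions carrying a @remote decorator via the AST."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for dec in node.decorator_list:
                # @remote(...) parses as a Call whose func is the Name "remote"
                target = dec.func if isinstance(dec, ast.Call) else dec
                if isinstance(target, ast.Name) and target.id == "remote":
                    found.append({"name": node.name,
                                  "is_async": isinstance(node, ast.AsyncFunctionDef)})
    return found

functions = find_remote_functions(SOURCE)
```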

Build Pipeline Flow:
1. Scanner discovers @remote functions
2. ManifestBuilder groups them by resource_config
3. HandlerGenerator creates handler_*.py for each group
4. All files + manifest bundled into archive.tar.gz

This eliminates ~95% of the duplication across handlers by using the factory
pattern instead of template-based generation.
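A generated handler might look roughly like this (module paths and function names are hypothetical, not actual generated output; held as a string so the sketch stays self-contained):

```python
# Hypothetical contents of a generated handler_<resource>.py file
GENERATED_HANDLER = '''
from workers.gpu import process_image, embed_text  # user functions for this resource
from tetra_rp.runtime.generic_handler import create_handler

import runpod

FUNCTION_REGISTRY = {
    "process_image": process_image,
    "embed_text": embed_text,
}

# All real logic lives in the factory; the generated file only wires it up.
handler = create_handler(FUNCTION_REGISTRY)

runpod.serverless.start({"handler": handler})
'''

# The sketch should at least be syntactically valid Python
code = compile(GENERATED_HANDLER, "handler_example.py", "exec")
```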

* test(runtime): Add comprehensive tests for generic handler

Implement 19 unit tests covering all major paths through the generic_handler
factory and its helper functions.

Test Coverage:

Serialization/Deserialization (7 tests):
- serialize_result() with simple values, dicts, lists
- deserialize_arguments() with empty, args-only, kwargs-only, mixed inputs
- Round-trip encoding/decoding of cloudpickle + base64

Function Execution (4 tests):
- Simple function execution with positional and keyword arguments
- Keyword argument handling
- Class instantiation and method calls
- Argument passing to instance methods

Handler Factory (8 tests):
- create_handler() returns callable RunPod handler
- Handler with simple function registry
- Missing function error handling (returns error response, not exception)
- Function exceptions caught with traceback included
- Multiple functions in single registry
- Complex Python objects (classes, lambdas, closures)
- Empty registry edge case
- Default execution_type parameter
- None return values
- Correct RunPod response format (success, result/error, traceback)

Test Strategy:
- Arrange-Act-Assert pattern for clarity
- Isolated unit tests (no external dependencies)
- Tests verify behavior, not implementation
- Error cases tested for proper error handling
- All serialization tested for round-trip correctness

All tests passing, 83% coverage on generic_handler.py

* test(cli): Add tests for handler generation, manifest building, and scanning

Implement integration tests validating the build pipeline components work
correctly together.

Test Coverage:

HandlerGenerator Tests:
- Handler files created with correct names (handler_<resource_name>.py)
- Generated files import required functions from workers
- FUNCTION_REGISTRY properly formatted
- create_handler() imported from generic_handler
- Handler creation via factory
- RunPod start call present and correct
- Multiple handlers generated for multiple resource configs

ManifestBuilder Tests:
- Manifest structure with correct version and metadata
- Resources grouped by resource_config
- Handler file paths correct
- Function metadata preserved (name, module, is_async, is_class)
- Function registry mapping complete

Scanner Tests:
- @remote decorated functions discovered via AST
- Function metadata extracted correctly
- Module paths resolved properly
- Async functions detected
- Class methods detected
- Edge cases handled (multiple decorators, nested classes)

Test Strategy:
- Integration tests verify components work together
- Tests verify generated files are syntactically correct
- Tests validate data structures match expected schemas
- No external dependencies in build process

Validates that the entire build pipeline:
1. Discovers functions correctly
2. Groups them appropriately
3. Generates valid Python handler files
4. Creates correct manifest structure

* docs(runtime): Document generic handler factory architecture

Add comprehensive architecture documentation explaining why the factory
pattern was chosen and how it works.

Documentation includes:

Overview & Context:
- Problem statement: Handler files had 95% duplication
- Design decision: Use factory function instead of templates
- Benefits: Single source of truth, easier maintenance, consistency

Architecture Diagrams (MermaidJS):
- High-level flow: @remote functions → Scanner → Manifest → Handlers → Factory
- Component relationships: HandlerGenerator, GeneratedHandler, generic_handler
- Function registry pattern: Discovery → Grouping → Registration → Factory

Implementation Details:
- create_handler(function_registry) signature and behavior
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Function vs. class execution
- load_manifest(): Service discovery via flash_manifest.json

Design Decisions (with rationale):
- Factory Pattern over Inheritance: Simpler, less coupling, easier to test
- CloudPickle + Base64: Handles arbitrary objects, safe JSON transmission
- Manifest in Generic Handler: Runtime service discovery requirement
- Structured Error Responses: Debugging aid, functional error handling
- Both Execution Types: Supports stateful classes and pure functions

Usage Examples:
- Simple function handler
- Class execution with methods
- Multiple functions in one handler

Build Process Integration:
- 4-phase pipeline: Scanner → Grouping → Generation → Packaging
- Manifest structure and contents
- Generated handler structure (~23 lines)

Testing Strategy:
- 19 unit tests covering all major paths
- 7 integration tests verifying handler generation
- Manual testing with example applications

Performance:
- Zero runtime penalty (factory called once at startup)
- No additional indirection in request path

* docs(cli): Add flash build command documentation

Document the flash build command and update CLI README to include it.

New Documentation:

flash-build.md includes:

Usage & Options:
- Command syntax: flash build [OPTIONS]
- --no-deps: Skip transitive dependencies (faster, smaller archives)
- --keep-build: Keep build directory for inspection/debugging
- --output, -o: Custom archive name (default: archive.tar.gz)

What It Does (5-step process):
1. Discovery: Scan for @remote decorated functions
2. Grouping: Group functions by resource_config
3. Handler Generation: Create lightweight handler files
4. Manifest Creation: Generate flash_manifest.json
5. Packaging: Create archive.tar.gz for deployment

Build Artifacts:
- .flash/archive.tar.gz: Deployment package (ready for RunPod)
- .flash/flash_manifest.json: Service discovery configuration
- .flash/.build/: Temporary build directory

Handler Generation:
- Explains factory pattern and minimal handler files
- Links to Runtime_Generic_Handler.md for details

Dependency Management:
- Default behavior: Install all dependencies including transitive
- --no-deps: Only direct dependencies (when base image has transitive)
- Trade-offs explained

Cross-Endpoint Function Calls:
- Example showing GPU and CPU endpoints
- Manifest enables routing automatically

Output & Troubleshooting:
- Sample build output with progress indicators
- Common failure scenarios and solutions
- How to debug with --keep-build

Next Steps:
- Test locally with flash run
- Deploy to RunPod
- Monitor with flash undeploy list

Updated CLI README.md:
- Added flash build to command list in sequence
- Links to full flash-build.md documentation

* docs: Add build process and handler generation section to README

Add a new section explaining how the build system works and why the
factory pattern reduces code duplication.

New Section: Build Process and Handler Generation

Explains:

How Flash Builds Your Application (5-step pipeline):
1. Discovery: Scans code for @remote decorated functions
2. Grouping: Groups functions by resource_config
3. Handler Generation: Creates lightweight handler files
4. Manifest Creation: Generates flash_manifest.json for service discovery
5. Packaging: Bundles everything into archive.tar.gz

Handler Architecture (with code example):
- Shows generated handler using factory pattern
- Single source of truth: All handler logic in one place
- Easier maintenance: Bug fixes don't require rebuilding projects

Cross-Endpoint Function Calls:
- Example of GPU and CPU endpoints calling each other
- Manifest and runtime wrapper handle service discovery

Build Artifacts:
- .flash/.build/: Temporary build directory
- .flash/archive.tar.gz: Deployment package
- .flash/flash_manifest.json: Service configuration

Links to detailed documentation:
- docs/Runtime_Generic_Handler.md for architecture details
- src/tetra_rp/cli/docs/flash-build.md for CLI reference

This section bridges the main README and detailed documentation,
providing an entry point for new users discovering the build system.

* feat(cli): Integrate build utilities into flash build command

Wire up the handler generator, manifest builder, and scanner into the
actual flash build command implementation.

Changes to build.py:

1. Integration:
   - Import RemoteDecoratorScanner for function discovery
   - Import ManifestBuilder for manifest creation
   - Import HandlerGenerator for handler file creation
   - Call these in sequence during the build process

2. Build Pipeline:
   - After copying project files, scan for @remote functions
   - Build manifest from discovered functions
   - Generate handler files for each resource config
   - Write manifest to build directory
   - Progress indicators show what's being generated

3. Fixes:
   - Change .tetra directory references to .flash
   - Uncomment actual build logic (was showing "Coming Soon" message)
   - Fix progress messages to show actual file counts

4. Error Handling:
   - Try/except around handler generation
   - Warning shown if generation fails but build continues
   - User can debug with --keep-build flag

Build Flow Now:
1. Load ignore patterns
2. Collect project files
3. Create build directory
4. Copy files to build directory
5. [NEW] Scan for @remote functions
6. [NEW] Build and write manifest
7. [NEW] Generate handler files
8. Install dependencies
9. Create archive
10. Clean up build directory (unless --keep-build)

Dependencies:
- Updated uv.lock with all required dependencies

* refactor(build): Fix directory structure and add comprehensive error handling

**Critical Fixes:**
- Remove "Coming Soon" message blocking build command execution
- Fix build directory to use .flash/.build/ directly (no app_name subdirectory)
- Fix tarball to extract with flat structure using arcname="."
- Fix cleanup to remove correct build directory

**Error Handling & Validation:**
- Add specific exception handling (ImportError, SyntaxError, ValueError)
- Add import validation to generated handlers
- Add duplicate function name detection across resources
- Add proper error logging throughout build process

**Resource Type Tracking:**
- Add resource_type field to RemoteFunctionMetadata
- Track actual resource types (LiveServerless, CpuLiveServerless)
- Use actual types in manifest instead of hardcoding

**Robustness Improvements:**
- Add handler import validation post-generation
- Add manifest path fallback search (cwd, module dir, legacy location)
- Add resource name sanitization for safe filenames
- Add specific exception logging in scanner (UnicodeDecodeError, SyntaxError)

**User Experience:**
- Add troubleshooting section to README
- Update manifest path documentation in docs
- Change "Zero Runtime Penalty" to "Minimal Runtime Overhead"
- Mark future enhancements as "Not Yet Implemented"
- Improve build success message with next steps

Fixes all 20 issues identified in code review (issues #1-13, #19-22)

* feat(resources): Add LoadBalancerSlsResource for LB endpoints

Implement LoadBalancerSlsResource class for provisioning RunPod load-balanced
serverless endpoints. Load-balanced endpoints expose HTTP servers directly to
clients without queue-based processing, enabling REST APIs, webhooks, and
real-time communication patterns.

Key features:
- Type enforcement (always LB, never QB)
- Scaler validation (REQUEST_COUNT required, not QUEUE_DELAY)
- Health check polling via /ping endpoint (200/204 = healthy)
- Post-deployment verification with configurable retries
- Async and sync health check methods
- Comprehensive unit tests
- Full documentation with architecture diagrams and examples

Architecture:
- Extends ServerlessResource with LB-specific behavior
- Validates configuration before deployment
- Polls /ping endpoint until healthy (10 retries × 5s = 50s timeout)
- Raises TimeoutError if endpoint fails to become healthy

This forms the foundation for Mothership architecture where a load-balanced
endpoint serves as a directory server for child endpoints.
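The health check behavior described above (poll /ping until 200/204, give up after 10 retries × 5s) can be sketched like this; `ping` is a stand-in for the real HTTP GET, and the function name is illustrative:

```python
import time

def wait_until_healthy(ping, retries=10, delay=5.0):
    """Poll a /ping callable until it reports healthy (HTTP 200 or 204).

    Sketch of the polling loop described above; `ping` stands in for
    an HTTP GET against the endpoint's /ping route.
    """
    for attempt in range(retries):
        status = ping()
        if status in (200, 204):
            return attempt + 1  # number of polls it took
        time.sleep(delay)
    raise TimeoutError(f"endpoint unhealthy after {retries} x {delay}s")

# Simulated endpoint that becomes healthy on the third poll.
responses = iter([503, 503, 200])
polls = wait_until_healthy(lambda: next(responses), delay=0.0)
print(polls)  # 3
```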

* fix(test): Fix LoadBalancerSlsResource deployment test mocks

Import ServerlessResource directly and use patch.object on the imported class
instead of string-based patches. This ensures the mocks properly intercept the
parent class's _do_deploy method when called via super(). Simplifies mock
configuration and removes an unused variable assertion.

Fixes the three failing deployment tests that were making real GraphQL API calls.
All tests now pass: 418 passed, 1 skipped.

* feat(resources): Phase 1 - Core infrastructure for @remote on LB endpoints

Implement core infrastructure for enabling @remote decorator on
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Changes:
- Create LoadBalancerSlsStub: HTTP-based stub for direct endpoint execution
  (src/tetra_rp/stubs/load_balancer_sls.py, 170 lines)
  - Serializes functions and arguments using cloudpickle + base64
  - Direct HTTP POST to /execute endpoint (no queue polling)
  - Proper error handling and deserialization

- Register stub with singledispatch (src/tetra_rp/stubs/registry.py)
  - Enables @remote to dispatch to LoadBalancerSlsStub for LB resources

- Extend @remote decorator with HTTP routing parameters (src/tetra_rp/client.py)
  - Add 'method' parameter: GET, POST, PUT, DELETE, PATCH
  - Add 'path' parameter: /api/endpoint routes
  - Validate method/path required for LoadBalancerSlsResource
  - Store routing metadata on decorated functions/classes
  - Warn if routing params used with non-LB resources

Foundation for Phase 2 (Build system integration) and Phase 3 (Local dev).
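The validation rules above can be sketched as a small standalone check; this is an illustration of the decorator's behavior, not the actual `client.py` implementation:

```python
VALID_METHODS = {"GET", "POST", "PUT", "DELETE", "PATCH"}

def validate_routing(resource_type, method=None, path=None):
    """Sketch of the @remote routing validation described above
    (the real decorator also stores metadata on the wrapped function)."""
    is_lb = "LoadBalancer" in resource_type
    if is_lb:
        if method is None or path is None:
            raise ValueError("method and path are required for LB resources")
        if method not in VALID_METHODS:
            raise ValueError(f"unsupported HTTP method: {method}")
        if not path.startswith("/"):
            raise ValueError("path must start with '/'")
    elif method or path:
        # Routing params are ignored (with a warning) on non-LB resources.
        return "warn"
    return "ok"

print(validate_routing("LoadBalancerSlsResource", "POST", "/api/process"))  # ok
```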

* feat(build): Phase 2.1 - Enhanced scanner for HTTP routing extraction

Update RemoteDecoratorScanner to extract HTTP method and path from
@remote decorator for LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to RemoteFunctionMetadata
- Add _extract_http_routing() method to parse decorator keywords
- Extract method (GET, POST, PUT, DELETE, PATCH) from decorator
- Extract path (/api/process) from decorator
- Store routing metadata for manifest generation

Foundation for Phase 2.2 (Manifest updates) and Phase 2.3 (Handler generation).
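Extracting decorator keywords statically might look like the following sketch; the real scanner walks whole modules, so this single-function version is illustrative only:

```python
import ast

SOURCE = '''
@remote(api, method="POST", path="/api/process")
def process_data(x, y):
    return x + y
'''

def extract_http_routing(source):
    """Sketch: pull method/path keywords out of a @remote decorator
    using the ast module (simplified from the real scanner)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.FunctionDef):
            continue
        for dec in node.decorator_list:
            if isinstance(dec, ast.Call):
                kw = {k.arg: getattr(k.value, "value", None)
                      for k in dec.keywords}
                return kw.get("method"), kw.get("path")
    return None, None

print(extract_http_routing(SOURCE))  # ('POST', '/api/process')
```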

* feat(build): Phase 2.2 - Updated manifest schema for HTTP routing

Enhance ManifestBuilder to support HTTP method/path routing for
LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to ManifestFunction
- Validate LB endpoints have both method and path
- Detect and prevent route conflicts (same method + path)
- Prevent use of reserved paths (/execute, /ping)
- Add 'routes' section to manifest for LB endpoints
- Conditional inclusion of routing fields (only for LB)

Manifest structure for LB endpoints now includes:
{
  "resources": {
    "api_service": {
      "resource_type": "LoadBalancerSlsResource",
      "functions": [
        {
          "name": "process_data",
          "http_method": "POST",
          "http_path": "/api/process"
        }
      ]
    }
  },
  "routes": {
    "api_service": {
      "POST /api/process": "process_data"
    }
  }
}

* feat(cli): Add LB handler generator for FastAPI app creation

Implement LBHandlerGenerator to create FastAPI applications for
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Key features:
- Generates FastAPI apps with explicit route registry
- Creates (method, path) -> function mappings from manifest
- Validates route conflicts and reserved paths
- Imports user functions and creates dynamic routes
- Includes required /ping health check endpoint
- Validates generated handler Python syntax via import

Generated handler structure enables:
- Direct HTTP routing to user functions via FastAPI
- Framework /execute endpoint for @remote stub execution
- Local development with uvicorn

* feat(runtime): Implement LB handler factory for FastAPI app creation

Create create_lb_handler() factory function that dynamically builds FastAPI
applications from route registries for LoadBalancerSlsResource endpoints.

Key features:
- Accepts route_registry: Dict[(method, path)] -> handler_function mapping
- Registers all user-defined routes from registry to FastAPI app
- Provides /execute endpoint for @remote stub function execution
- Handles async function execution automatically
- Serializes results with cloudpickle + base64 encoding
- Comprehensive error handling with detailed logging

The /execute endpoint enables:
- Remote function code execution via @remote decorator
- Automatic argument deserialization from cloudpickle/base64
- Result serialization for transmission back to client
- Support for both sync and async functions
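The route-registry shape consumed by the factory can be illustrated without FastAPI; `dispatch` below is a minimal stand-in for the generated app's routing, not the real `create_lb_handler()`:

```python
# Keys are (method, path); values are the user's handler functions.
def process_data(x, y):
    return {"result": x + y}

route_registry = {("POST", "/api/process"): process_data}

def dispatch(method, path, payload):
    """Stand-in for the generated FastAPI routing: look up the handler
    registered for (method, path) and call it with the request payload."""
    handler = route_registry.get((method, path))
    if handler is None:
        return 404, None
    return 200, handler(**payload)

status, body = dispatch("POST", "/api/process", {"x": 5, "y": 3})
print(status, body)  # 200 {'result': 8}
```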

* feat(cli): Route build command to separate handlers for LB endpoints

Update build command to use appropriate handler generators based on
resource type. Separates LoadBalancerSlsResource endpoints (using FastAPI)
from queue-based endpoints (using generic handler).

Changes:
- Import LBHandlerGenerator alongside HandlerGenerator
- Inspect manifest resources and separate by type
- Generate LB handlers via LBHandlerGenerator
- Generate QB handlers via HandlerGenerator
- Combine all generated handler paths for summary

Enables users to mix LB and QB endpoints in same project with correct
code generation for each resource type.

* feat(resources): Add LiveLoadBalancer for local LB endpoint testing

Implement LiveLoadBalancer resource following the LiveServerless pattern
for local development and testing of load-balanced endpoints.

Changes:
- Add TETRA_LB_IMAGE constant for load-balanced Tetra image
- Create LiveLoadBalancer class extending LoadBalancerSlsResource
- Uses LiveServerlessMixin to lock imageName to Tetra LB image
- Register LiveLoadBalancer with LoadBalancerSlsStub in singledispatch
- Export LiveLoadBalancer from core.resources and top-level __init__

This enables users to test LB-based functions locally before deploying,
using the same pattern as LiveServerless for queue-based endpoints.

Users can now write:
  from tetra_rp import LiveLoadBalancer, remote

  api = LiveLoadBalancer(name="test-api")

  @remote(api, method="POST", path="/api/process")
  async def process_data(x, y):
      return {"result": x + y}

  result = await process_data(5, 3)  # Local execution

* test(stubs): Add comprehensive unit tests for LoadBalancerSlsStub

Implement unit tests for LoadBalancerSlsStub covering:
- Request preparation with arguments and dependencies
- Response handling for success and error cases
- Error handling for invalid responses
- Base64 encoding/decoding of serialized data
- Endpoint URL validation
- Timeout and HTTP error handling

Test coverage:
- _prepare_request: 4 tests
- _handle_response: 5 tests
- _execute_function: 3 error case tests
- __call__: 2 integration tests

Tests verify proper function serialization, argument handling,
error propagation, and response deserialization.

* fix(test): Correct LB endpoint test decorator to match assertions

Fix test_load_balancer_vs_queue_based_endpoints by updating the @remote
decorator to use method='POST' and path='/api/echo' to match the test
assertions. This was a test-level bug where the decorator definition
didn't match what was being asserted.

* docs: Add comprehensive documentation for @remote with LoadBalancer endpoints

- Using_Remote_With_LoadBalancer.md: User guide for HTTP routing, local development, building and deploying
- LoadBalancer_Runtime_Architecture.md: Technical details on deployment, request flows, security, and performance
- Updated README.md with LoadBalancer section and code example
- Updated Load_Balancer_Endpoints.md with cross-references to new guides

* security: Remove /execute from deployed LoadBalancer endpoints

Split @remote execution behavior between local and deployed:
- LiveLoadBalancer (local): Uses /execute endpoint for function serialization
- LoadBalancerSlsResource (deployed): Uses user-defined routes with HTTP param mapping

Changes:
1. LoadBalancerSlsStub routing detection:
   - _should_use_execute_endpoint() determines execution path
   - _execute_via_user_route() maps args to JSON and POSTs to user routes
   - Auto-detects resource type and routing metadata

2. Conditional /execute registration:
   - create_lb_handler() now accepts include_execute parameter
   - Generated handlers default to include_execute=False (security)
   - LiveLoadBalancer can enable /execute if needed

3. Updated handler generator:
   - Added clarity comments on /execute exclusion for deployed endpoints

4. Comprehensive test coverage:
   - 8 new tests for routing detection and execution paths
   - All 31 tests passing (22 unit + 9 integration)

5. Documentation updates:
   - Using_Remote_With_LoadBalancer.md: clarified /execute scope
   - Added 'Local vs Deployed Execution' section explaining differences
   - LoadBalancer_Runtime_Architecture.md: updated execution model
   - Added troubleshooting for deployed endpoint scenarios

Security improvement:
- Deployed endpoints only expose user-defined routes
- /execute endpoint removed from production (prevents arbitrary code execution)
- Lower attack surface for deployed endpoints
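The stub's decision logic splits roughly as sketched below; this is an illustration of the behavior described above, not the real `_should_use_execute_endpoint()`, and the fallback branch is hypothetical:

```python
def should_use_execute_endpoint(resource_type, http_method=None, http_path=None):
    """Sketch of the stub's routing decision between local and deployed."""
    if resource_type == "LiveLoadBalancer":
        # Local dev: serialize the function and POST it to /execute.
        return True
    if http_method and http_path:
        # Deployed: map args to JSON and POST to the user-defined route.
        return False
    # Incomplete routing metadata on a deployed resource: shown here as
    # an illustrative fallback only; see the linked docs for the actual
    # documented behavior.
    return True

print(should_use_execute_endpoint("LoadBalancerSlsResource", "POST", "/api/process"))  # False
```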

* feat(build): Phase 4 - Fix LiveLoadBalancer handler generation to include /execute endpoint

- Modified manifest.py to validate LiveLoadBalancer endpoints like LoadBalancerSlsResource
- Updated lb_handler_generator to:
  - Include LiveLoadBalancer in handler generation filter
  - Pass include_execute=True for LiveLoadBalancer (local dev)
  - Pass include_execute=False for LoadBalancerSlsResource (deployed)
- Added integration tests:
  - Verify LiveLoadBalancer handlers include /execute endpoint
  - Verify deployed handlers exclude /execute endpoint
- Fixes critical bug: LiveLoadBalancer now gets /execute endpoint in generated handlers

* fix(scanner): Discover LoadBalancer resources in addition to Serverless resources

- Updated scanner to extract LiveLoadBalancer and LoadBalancerSlsResource resources
- Previously only looked for 'Serverless' in class name, missing LoadBalancer endpoints
- Now checks for both 'Serverless' and 'LoadBalancer' in resource type names
- Added integration test to verify scanner discovers both resource types
- Fixes critical bug that prevented flash build from finding LoadBalancer endpoints

* chore: Format code for line length and remove unused imports

- Wrap long lines in manifest.py, lb_handler.py, and load_balancer_sls.py
- Remove unused httpx import in test_load_balancer_sls_stub.py
- Apply consistent formatting across codebase

* fix: Address PR #131 review feedback

- Scanner: Use exact type name matching instead of substring matching
  - Whitelist specific resource types to avoid false positives
  - Prevents matching classes like 'MyServerlessHelper' or 'LoadBalancerUtils'

- Type hints: Use Optional[str] for nullable fields in manifest
  - ManifestFunction.http_method and http_path now properly typed

- Timeout: Make HTTP client timeout configurable
  - Added LoadBalancerSlsStub.DEFAULT_TIMEOUT class attribute
  - Added timeout parameter to __init__
  - Updated both _execute_function and _execute_via_user_route to use self.timeout

- Deprecated datetime: Replace datetime.utcnow() with datetime.now(timezone.utc)
  - Updated manifest.py and test_lb_remote_execution.py
  - Ensures Python 3.12+ compatibility
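The datetime change is the standard replacement for the deprecated API:

```python
from datetime import datetime, timezone

# datetime.utcnow() is deprecated as of Python 3.12 and returns a naive
# datetime; datetime.now(timezone.utc) returns a timezone-aware one.
ts = datetime.now(timezone.utc)
print(ts.tzinfo)  # UTC
```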

* style: Format datetime chaining for line length

* fix: LiveLoadBalancer template not serialized to RunPod GraphQL

The set_serverless_template model_validator was being overwritten by sync_input_fields
(both defined with mode="after"). In this Pydantic v2 model hierarchy only one of the
two was registered, so set_serverless_template never ran.

This caused templates to never be created from imageName, resulting in:
  "GraphQL errors: One of templateId, template is required to create an endpoint"

Solution:
- Move set_serverless_template validator from ServerlessResource base class to subclasses
  (ServerlessEndpoint and LoadBalancerSlsResource) where the validation is actually needed
- Keep helper methods (_create_new_template, _configure_existing_template) in base class
  for reuse
- Add comprehensive tests for LiveLoadBalancer template serialization

This allows:
1. Base ServerlessResource to be instantiated freely for testing/configuration
2. Subclasses (ServerlessEndpoint, LoadBalancerSlsResource) to enforce template
   requirements during deployment
3. Proper template serialization in GraphQL payload for RunPod API

Fixes: One of templateId, template is required to create an endpoint error when
deploying LiveLoadBalancer with custom image tags like runpod/tetra-rp-lb:local

* fix: LoadBalancer endpoint URL and add CPU support

- Fix: Use correct endpoint URL format for load-balanced endpoints
  (https://{id}.api.runpod.ai instead of https://api.runpod.ai/v2/{id})
  This fixes 404 errors on /ping health check endpoints

- Feature: Add CPU LoadBalancer support
  * Create CpuLoadBalancerSlsResource for CPU-based load-balanced endpoints
  * Create CpuLiveLoadBalancer for local CPU LB development
  * Add TETRA_CPU_LB_IMAGE constant for CPU LB Docker image
  * Update example code to use CpuLiveLoadBalancer for CPU worker
  * Add 8 comprehensive tests for CPU LoadBalancer functionality

- Tests: Add 2 tests for endpoint URL format validation
- All 474 tests passing, 64% code coverage
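The URL fix amounts to the difference below (helper names are illustrative, not the actual code):

```python
def lb_endpoint_url(endpoint_id):
    """Load-balanced endpoints are addressed directly by subdomain;
    using the queue-based /v2 path 404s on /ping."""
    return f"https://{endpoint_id}.api.runpod.ai"

def qb_endpoint_url(endpoint_id):
    """Queue-based endpoints keep the /v2 API path."""
    return f"https://api.runpod.ai/v2/{endpoint_id}"

print(lb_endpoint_url("abc123"))  # https://abc123.api.runpod.ai
```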

* fix: Export CpuLiveLoadBalancer and CpuLoadBalancerSlsResource from tetra_rp package

LoadBalancer resources were not being discovered by ResourceDiscovery because
the new CPU variants (CpuLiveLoadBalancer, CpuLoadBalancerSlsResource) were
not exported from the main tetra_rp package. This prevented undeploy from
picking up these resources.

Added exports to:
- TYPE_CHECKING imports for type hints
- __getattr__ function for lazy loading
- __all__ list for public API

This fixes the issue where 'flash undeploy list' could not find LoadBalancer
resources that were deployed with 'flash run --auto-provision'.

* fix: Add API key authentication to LoadBalancer health check

The /ping endpoint for RunPod load-balanced endpoints requires the
RUNPOD_API_KEY header for authentication. Without it, the health check
fails with 401 Unauthorized, causing provisioning to timeout.

This fix adds the Authorization header to the health check request if
the RUNPOD_API_KEY environment variable is available, allowing the
endpoint health check to succeed during provisioning.

Fixes issue where 'flash run --auto-provision' would fail even though
the endpoint was successfully created on RunPod.

* fix(lb): Exclude flashboot from CpuLoadBalancerSlsResource GraphQL payload

CpuLoadBalancerSlsResource was overriding _input_only without including flashboot,
causing it to be sent to the RunPod GraphQL API which doesn't accept this field.
This caused deployment to fail with: Field "flashboot" is not defined by type "EndpointInput".

* fix(lb): Expand CpuInstanceType.ANY to all CPU flavors in CpuLoadBalancerSlsResource

Add field_validator to expand [CpuInstanceType.ANY] to all available CPU instance
types (cpu3g, cpu3c, cpu5c variants). This matches the behavior in CpuServerlessEndpoint
and prevents deployment errors like 'instanceId must be in the format of flavorId-vcpu-ram'.

* refactor(cpu): Move instanceIds validator to CpuEndpointMixin

Move the instanceIds field_validator from CpuServerlessEndpoint to CpuEndpointMixin
so both CpuServerlessEndpoint and CpuLoadBalancerSlsResource share the same validator
that expands [CpuInstanceType.ANY] to all available CPU flavors. This eliminates
code duplication and ensures consistent behavior across all CPU endpoint types.

* test: Update CPU instance test to reflect validator expansion

Update test_cpu_live_load_balancer_defaults_to_cpu_any to verify that
[CpuInstanceType.ANY] is correctly expanded to all available CPU instance types
by the field_validator in CpuEndpointMixin.

* fix(lb): Increase health check timeout from 5s to 15s

Load-balanced workers need more time to respond during cold starts and initialization.
RunPod docs recommend at least 10-15 second timeouts for health checks. Workers may
return 204 during initialization, which is normal and expected.

* fix(lb): Fix CPU load balancer template deployment error

Fixes two bugs in CpuLoadBalancerSlsResource that prevented CPU load balancers from deploying:

1. Added gpuCount and allowedCudaVersions to _input_only exclusion set to prevent
   GPU-specific fields from being sent to RunPod API
2. Overrode set_serverless_template() to call _sync_cpu_fields() first, ensuring
   GPU defaults are overridden to CPU-appropriate values (gpuCount=0)

The RunPod API was rejecting CPU load balancer templates because GPU-specific
fields were being included in the GraphQL payload. These changes align
CpuLoadBalancerSlsResource behavior with CpuServerlessEndpoint.

Also added comprehensive test coverage (30+ tests) to verify:
- GPU fields are correctly overridden to CPU defaults
- GPU fields are excluded from API payloads
- CPU-specific fields are properly included
- Consistency with CpuServerlessEndpoint behavior

* fix(drift): Exclude runtime fields from config hash to prevent false positives

Fixes false positive configuration drift detection by separating concerns:

1. Update ServerlessResource.config_hash to exclude runtime fields
   - Fields like template, templateId, aiKey, userId are API-assigned
   - Prevents false drift when same config is redeployed across processes
   - Now only hashes user-specified configuration

2. Add config_hash override to CpuLoadBalancerSlsResource
   - CPU load balancers hash only CPU-relevant fields
   - Excludes GPU-specific fields and runtime fields
   - Follows same pattern as CpuServerlessEndpoint

3. Fix _has_structural_changes to exclude template/templateId
   - CRITICAL: These runtime fields were causing false structural changes
   - Was forcing unnecessary redeployments despite update() being available
   - Now system correctly uses update() instead of undeploy+deploy

4. Make field serializers robust to handle string/enum values
   - Prevents serialization errors when fields are pre-converted to strings

5. Add comprehensive drift detection tests (16 tests)
   - Test hash stability with runtime field changes
   - Test exclusion of env, template, templateId, and other runtime fields
   - Test that actual config changes (image, flashboot) are detected
   - Test structural change detection behavior
   - Test real-world deployment scenarios

Results:
- Same config deployed multiple times: no false drift
- Different env vars with same config: no false drift
- Template/templateId changes: no false drift
- API-assigned fields: no false drift
- User config changes (image, flashboot): drift detected correctly
- All 512 unit tests pass
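The hash-exclusion idea can be sketched as follows; the field list and function shape here are illustrative, not the real `config_hash` implementation:

```python
import hashlib
import json

RUNTIME_FIELDS = {"template", "templateId", "aiKey", "userId"}  # illustrative

def config_hash(config):
    """Hash only user-specified configuration, skipping API-assigned
    runtime fields, so redeploying the same config yields the same hash."""
    stable = {k: v for k, v in sorted(config.items())
              if k not in RUNTIME_FIELDS}
    blob = json.dumps(stable, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

a = config_hash({"imageName": "app:v1", "templateId": "t-111"})
b = config_hash({"imageName": "app:v1", "templateId": "t-222"})  # runtime field differs
c = config_hash({"imageName": "app:v2", "templateId": "t-111"})  # user config differs
print(a == b, a == c)  # True False
```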

* fix(http): Standardize RunPod HTTP client authentication across codebase

Centralizes HTTP client creation for RunPod load-balanced endpoints to prevent
manual Authorization header code duplication and ensure consistent authentication:

1. Create centralized HTTP utility function (src/tetra_rp/core/utils/http.py)
   - New function: get_authenticated_httpx_client()
   - Automatically adds Bearer token Authorization header if RUNPOD_API_KEY set
   - Provides consistent timeout handling (default 30s, customizable)
   - Follows existing GraphQL/REST client authentication pattern

2. Fix critical authentication bug in LoadBalancerSlsStub._execute_via_user_route()
   - Previously: Missing Authorization header (401 errors on user routes)
   - Now: Uses centralized utility for proper authentication
   - Enables direct HTTP calls to user-defined routes with auth

3. Refactor two methods to use centralized utility
   - LoadBalancerSlsStub._execute_function() - removes 7+ lines of manual auth code
   - LoadBalancerSlsResource._check_ping_endpoint() - simplifies auth setup

4. Add comprehensive unit tests (tests/unit/core/utils/test_http.py)
   - Tests API key presence/absence handling
   - Tests custom and default timeout configuration
   - Tests edge cases (empty key, zero timeout)
   - All 7 tests pass with 100% coverage

Results:
- Single source of truth for HTTP authentication (centralized utility)
- Fixes 401 Unauthorized errors on load-balanced endpoints
- Eliminates repetitive manual auth code across 3+ locations
- Easier to maintain and update authentication patterns in future
- All 499 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)
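The header logic inside the utility can be sketched without httpx; the real `get_authenticated_httpx_client()` returns a configured client, whereas this hypothetical helper shows only the conditional Bearer-token behavior:

```python
import os

def auth_headers(default=None):
    """Add a Bearer token Authorization header only when RUNPOD_API_KEY
    is set (sketch of the centralized auth behavior described above)."""
    headers = dict(default or {})
    api_key = os.environ.get("RUNPOD_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers

os.environ["RUNPOD_API_KEY"] = "test-key"
print(auth_headers())  # {'Authorization': 'Bearer test-key'}
```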

* feat(http): Extend HTTP utilities to cover both sync and async authentication

Extends the centralized HTTP authentication pattern to all RunPod API calls:

1. Add get_authenticated_requests_session() for synchronous requests
   - Creates requests.Session with automatic Bearer token Authorization header
   - Follows same pattern as async get_authenticated_httpx_client()
   - Single source of truth for sync HTTP authentication

2. Refactor template.py to use centralized utility
   - Removes manual Authorization header setup (line 86)
   - Now uses get_authenticated_requests_session() for all template updates
   - Improves error handling with raise_for_status()
   - Token parameter marked deprecated; uses RUNPOD_API_KEY env var

3. Add comprehensive tests for sync utility (4 tests)
   - Tests API key presence/absence handling
   - Tests empty API key edge case
   - Tests Session object validation
   - All tests pass with proper cleanup

Benefits:
- True single source of truth for all RunPod HTTP authentication (sync + async)
- Consistent patterns across entire codebase
- Easier future auth changes across all HTTP client types
- Eliminates manual auth header code in template.py
- All 503 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

Note: requests.Session doesn't support default timeouts; timeout should be
specified per request (e.g., session.post(url, json=data, timeout=30.0))

* fix: Address PR feedback on HTTP utilities implementation

Addresses three feedback items from code review:

1. Fix breaking parameter order change in update_system_dependencies()
   - Restored original parameter order: template_id, token, system_dependencies, base_entry_cmd
   - Maintains backward compatibility with existing callers
   - Token parameter now optional (default None)

2. Add proper deprecation warning for token parameter
   - Issues DeprecationWarning when token parameter is used
   - Clearly communicates migration to RUNPOD_API_KEY environment variable
   - Follows Python deprecation best practices (warnings.warn with stacklevel=2)

3. Standardize test mocking approach across all health check tests
   - All tests now use consistent 'tetra_rp.core.utils.http.httpx.AsyncClient' patching
   - Removed inconsistent 'side_effect=lambda' pattern
   - Improved test maintainability by using same strategy everywhere

All 503 tests pass with consistent, clean implementation.
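The deprecation pattern from item 2 looks roughly like this; the function body is a stub, since the real `update_system_dependencies()` also performs the template update:

```python
import warnings

def update_system_dependencies(template_id, token=None, system_dependencies=None):
    """Sketch of the token-parameter deprecation described above."""
    if token is not None:
        warnings.warn(
            "the 'token' parameter is deprecated; set the RUNPOD_API_KEY "
            "environment variable instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
    return template_id

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    update_system_dependencies("t-123", token="old-secret")
print(caught[0].category.__name__)  # DeprecationWarning
```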

* refactor(drift): Extract runtime field constants and improve maintainability

- Extract RUNTIME_FIELDS and EXCLUDED_HASH_FIELDS as ClassVar constants in ServerlessResource for centralized field list management
- Add clarifying comments to enum serializers explaining defensive isinstance() checks for nested model serialization
- Document CPU load balancer field list coupling in docstring with maintenance guidelines
- Add TestSerializerDefensiveBehavior class with 4 tests verifying pre-stringified enum value handling
- Use ClassVar annotation to satisfy Pydantic v2 model field requirements

This reduces maintenance burden by centralizing field definitions and improves code clarity without changing functionality.

* docs: Improve LoadBalancer documentation accuracy and completeness

- Fix health check timeout: Add clarification that timeout is 15 seconds per check
- Add HTTP authentication details explaining RUNPOD_API_KEY header injection
- Document stub decision logic for incomplete routing metadata (fallback behavior)
- Clarify function signature inspection with concrete example showing parameter mapping
- Expand /execute security explanation with explicit threats and best practices
- Add detailed parameter type constraints for deployed endpoints (supported vs unsupported)
- Add troubleshooting guide for missing routing metadata (404 errors)
- Strengthen security warnings about never exposing /execute in production

All documentation now matches actual implementation verified through codebase analysis.

* docs: add resource config drift detection documentation

- comprehensive guide on drift detection implementation
- covers hash computation, field exclusion, and cpu-specific behavior
- includes testing patterns and troubleshooting guide
- documents all fields that trigger drift vs those ignored

* docs: proper name for the file

* test(build): Add comprehensive test coverage for scanner and handler improvements

- Add 6 new scanner tests for directory filtering (.venv, .flash, .runpod exclusion)
- Add test for resource type validation to prevent false positives
- Add test for fallback behavior when resource name extraction fails
- Add test for handling resource names with special characters
- Update existing tests to reflect new dynamic import format and resource name extraction

These tests guard the scanner improvements (resource type validation, directory
filtering, fallback behavior) and the handler generator improvements (dynamic
imports for invalid Python identifiers) against regressions in future changes.

* test(scanner): Fix resource type assertions to match scanner behavior

The scanner now extracts resource names from the name= parameter rather
than using variable names. Update test assertions to expect the actual
resource names ('test-api', 'deployed-api') instead of variable names.

* chore: merge correction

* fix(drift): Remove manual undeploy/deploy from update() method

Use saveEndpoint mutation for all changes instead of manual lifecycle
management. Server-side automatically detects version-triggering fields
(GPU, template, volumes) and increments endpoint version accordingly.

Keep _has_structural_changes() as informational for logging purposes only.
This aligns with RunPod API's version-based deployment model.

* docs(drift): Clarify _has_structural_changes detects version-triggering changes

Update docstring to reflect that this method identifies changes that
trigger server-side version increment and worker recreation, not manual
redeploy cycles. Explain which changes are version-triggering vs rolling
updates, and note that the method is now informational for logging only.

* feat(drift): Enable environment variable drift detection

Remove env from EXCLUDED_HASH_FIELDS so changes to environment variables
trigger drift detection and endpoint updates. Environment changes are
non-version-triggering (rolling updates), so server will apply them via
saveEndpoint without recreating workers.

Add env to CPU LoadBalancer config_hash for consistent behavior across
all resource types. Update comments to reflect that env is user-specified
configuration, not dynamically computed.

* test(drift): Update tests for environment variable drift detection

- test_lb_config_hash_excludes_env_variables → test_lb_config_hash_detects_env_changes
- test_env_var_changes_no_drift → test_env_var_changes_trigger_drift
- test_config_hash_excludes_env_from_drift → test_config_hash_detects_env_from_drift

Update assertions to expect different hashes when env changes, matching
new behavior where environment variable changes trigger drift and updates.

* fix: Address Copilot review feedback on type hints and documentation

- Fix type annotation for timeout parameter in LoadBalancerSlsStub (Optional[float])
- Replace hardcoded "30s" with actual self.timeout in error messages (2 locations)
- Update Resource_Config_Drift_Detection.md to reflect actual EXCLUDED_HASH_FIELDS
- Remove duplicate Load-Balanced Endpoints section from README.md

Addresses Copilot review comments (PR #132, review 3642596664)

* feat(mothership): implement auto-provisioning with State Manager reconciliation

Implement Linear ticket AE-1660: Mothership auto-provisioning from manifest.

Changes:
- Create StateManagerClient for persisting/querying manifests via HTTP
- Create MothershipProvisioner with manifest reconciliation logic
- Add lifespan context manager to LB handler for startup/shutdown hooks
- Implement /manifest endpoint for service discovery
- Set FLASH_IS_MOTHERSHIP env var on LoadBalancerSlsResource deployment
- Add 39 unit tests for mothership provisioner functions
- Add 7 integration tests for end-to-end provisioning flows
- Update documentation with auto-provisioning architecture and usage

Features:
- Automatic detection of new/changed/removed resources via config hashing
- Background provisioning (non-blocking) with asyncio.create_task()
- Idempotent deployments - unchanged resources skipped on subsequent boots
- State Manager integration for manifest persistence across reboots
- Graceful error handling - provisioning errors don't block mothership startup
- Automatic environment variable propagation (FLASH_MOTHERSHIP_URL)
- Reconciliation with delete support - removes resources no longer in manifest
- Fast startup - /manifest endpoint available immediately with partial results

Test Results:
- 651 tests passing (39 new unit + 7 new integration tests)
- 65.69% code coverage (exceeds 35% requirement)
- All quality checks pass (format, lint, type check, tests)
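The "background provisioning (non-blocking)" item above can be sketched as follows. This is a minimal illustration of the `asyncio.create_task()` pattern with an error callback so provisioning failures are logged rather than silently swallowed; the function name and logging details are assumptions, not the actual implementation.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

def start_background_provisioning(provision_coro_factory):
    """Kick off provisioning without blocking mothership startup.

    A done-callback is attached so exceptions raised inside the task are
    logged instead of disappearing (fire-and-forget tasks swallow errors
    unless someone inspects task.exception()).
    """
    task = asyncio.create_task(provision_coro_factory())

    def _log_error(t: asyncio.Task) -> None:
        if not t.cancelled() and t.exception() is not None:
            logger.error("Provisioning failed: %s", t.exception())

    task.add_done_callback(_log_error)
    return task
```

Because the task is created but not awaited at startup, the `/manifest` endpoint can come up immediately while provisioning continues in the background.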

* docs: fix Cross_Endpoint_Routing terminology (Directory → Manifest)

Update documentation to consistently use 'Manifest' instead of 'Directory':
- Replace DirectoryClient references with StateManagerClient (actual implementation)
- Update architecture diagram to reference /manifest endpoint instead of DirectoryClient
- Fix ServiceRegistry code examples to use /manifest endpoint
- Update extension point for custom directory backends
- Fix testing section to reference actual test files (MothershipProvisioner, StateManagerClient)
- Update debugging section with /manifest endpoint examples
- Clarify that directory is loaded from mothership /manifest endpoint

These changes ensure documentation matches the actual AE-1660 implementation.

* fix: correct endpoint and exception references (Directory → Manifest)

Critical fix: Update ManifestClient to query /manifest endpoint instead of /directory

Changes:
- Fix ManifestClient.get_directory() to query /manifest endpoint (not /directory)
- Update ManifestClient docstring: 'manifest directory service' → '/manifest endpoint'
- Fix DirectoryUnavailableError → ManifestServiceUnavailableError in docs
- Update example URLs from 'api.runpod.io' to actual LB endpoint format
- Clarify in docstrings that this queries the mothership's /manifest endpoint

This bug would have caused runtime failures when querying the mothership directory,
as the actual endpoint served by lb_handler_generator.py is /manifest, not /directory.

* feat(runtime): Migrate from URL to ID-based mothership identification

Changes FLASH_MOTHERSHIP_URL to FLASH_MOTHERSHIP_ID for cleaner
environment configuration. Child endpoints now use FLASH_RESOURCE_NAME
to identify which resource config they represent in the manifest.

Changes:
- ManifestClient: Construct URL from FLASH_MOTHERSHIP_ID instead of full URL
- ServiceRegistry: Use FLASH_RESOURCE_NAME with fallback to RUNPOD_ENDPOINT_ID
- Add tomli dependency for Python <3.11 pyproject.toml parsing (needed for build.py)

Benefits:
- Simpler environment configuration (ID instead of full URL)
- Clear distinction between mothership (RUNPOD_ENDPOINT_ID) and children (FLASH_RESOURCE_NAME)
- Consistent URL construction pattern

Files modified:
- src/tetra_rp/runtime/manifest_client.py
- src/tetra_rp/runtime/service_registry.py
- pyproject.toml
- uv.lock

* feat(provisioner): Support all resource types and add cache validation

Removes LoadBalancer resource filtering to enable multi-tier
architectures. Adds cache validation to prevent stale resources
from being deployed after codebase refactoring.

Provisioning Changes:
- Remove LoadBalancer filtering in reconcile_manifests()
- Support CpuLiveLoadBalancer, LiveLoadBalancer, LoadBalancerSlsResource
- Add filter_resources_by_manifest() to validate cached resources against manifest
- Add test-mothership mode with "tmp-" prefix for temporary test endpoints
- Change env vars: FLASH_MOTHERSHIP_URL -> FLASH_MOTHERSHIP_ID

Resource Manager Changes:
- Track all created resources (deployed = has ID) regardless of health status
- Cache resources even if deployment completes with errors
- Ensures cleanup capability for all created resources

Cache Validation:
- Prevents stale resources from old codebase versions being redeployed
- Validates: resource name exists in manifest + type matches
- Logs removed stale entries for visibility

Benefits:
- Multi-tier load balancing architectures now supported
- No orphaned resources from refactored code
- Better resource lifecycle management
- Reliable cleanup of all created resources

Files modified:
- src/tetra_rp/runtime/mothership_provisioner.py
- src/tetra_rp/core/resources/resource_manager.py
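The cache-validation rule above (name must exist in the manifest and type must match) can be sketched like this. The function name comes from the commit; the cached-entry and manifest shapes here are assumptions for illustration.

```python
import logging

logger = logging.getLogger(__name__)

def filter_resources_by_manifest(cached: dict, manifest: dict) -> dict:
    """Drop cached resources that are stale relative to the manifest.

    A cached entry survives only if its name still exists in the manifest
    and its resource type matches, so resources from an old codebase
    version are never redeployed.
    """
    valid = {}
    for name, entry in cached.items():
        expected = manifest.get("resources", {}).get(name)
        if expected is None:
            logger.info("Removing stale cache entry (not in manifest): %s", name)
            continue
        if entry.get("resource_type") != expected.get("resource_type"):
            logger.info("Removing stale cache entry (type changed): %s", name)
            continue
        valid[name] = entry
    return valid
```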

* feat(build): Add local tetra_rp bundling and manifest endpoint improvements

Enables bundling local tetra_rp source into builds for development and
testing. Updates LB handler to serve authoritative manifest from State Manager.

Build System Changes:
- Add _find_local_tetra_rp() to detect development installations
- Add _bundle_local_tetra_rp() to copy source into build directory
- Add _extract_tetra_rp_dependencies() to parse pyproject.toml for deps
- Add _remove_tetra_from_requirements() to clean up after bundling
- Skip bundling for PyPI installations (site-packages)

LB Handler Changes:
- Store StateManagerClient in module-level state for /manifest endpoint
- Update /manifest endpoint to fetch from State Manager (single source of truth)
- Add proper error handling for uninitialized state client
- Restrict /manifest endpoint to mothership only (403 for children)
- Improve provisioning startup logging for clarity

Benefits:
- Test-mothership can use local tetra_rp changes without publishing
- Manifest endpoint serves complete authoritative state
- Child endpoints get consistent configuration from single source
- Better development workflow for framework changes

Files modified:
- src/tetra_rp/cli/commands/build.py
- src/tetra_rp/cli/commands/build_utils/lb_handler_generator.py
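The "skip bundling for PyPI installations (site-packages)" check above can be approximated with a path inspection. This is a hedged sketch; the real `_find_local_tetra_rp()` detection logic may differ.

```python
from pathlib import Path

def is_development_install(module_file: str) -> bool:
    """Heuristic: PyPI installs live under site-packages; anything else
    is treated as a local development checkout eligible for bundling."""
    return "site-packages" not in Path(module_file).parts
```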

* feat(cli): Add undeploy force flag and improve discovery logging

Adds --force flag to undeploy for non-interactive cleanup (needed by
test-mothership). Improves resource discovery visibility with debug logging.

Undeploy Changes:
- Add --force/-f flag to skip confirmation prompts
- Update _undeploy_by_name(), _undeploy_all(), _interactive_undeploy() to support skip_confirm
- Enables automated cleanup in CI/CD and test-mothership shutdown

Discovery Changes:
- Add detailed logging at each discovery phase (entry point, static imports, directory scan)
- Log discovered resource names and types for debugging
- Exclude .flash/ directory from project scanning (build artifacts)

Run Command Changes:
- Add resource discovery debug output showing found resources
- Display resource names and types before server startup

CLI Main Changes:
- Register test-mothership command (note: implementation was in commit 1)

Benefits:
- Test-mothership can cleanup automatically without user interaction
- Better visibility into resource discovery process
- Easier debugging of discovery issues
- Clean separation of interactive vs automated workflows

Files modified:
- src/tetra_rp/cli/commands/undeploy.py
- src/tetra_rp/cli/commands/run.py
- src/tetra_rp/core/discovery.py
- src/tetra_rp/cli/main.py

* test: Update tests for new provisioning behavior and ID-based config

Updates all tests to reflect LoadBalancer provisioning, FLASH_RESOURCE_NAME
usage, and removal of obsolete test cases.

Mothership Provisioner Tests:
- Update tests to expect LoadBalancer resources in provisioning (not skipped)
- Fix create_resource_from_manifest tests to use RUNPOD_ENDPOINT_ID env var
- Update UnsupportedResourceType test (LoadBalancer now supported)
- Remove obsolete get_manifest_directory() tests (function removed)

Service Registry Tests:
- Update all tests to use FLASH_RESOURCE_NAME instead of RUNPOD_ENDPOINT_ID
- Add test for FLASH_RESOURCE_NAME priority with RUNPOD_ENDPOINT_ID fallback
- Update test names to reflect new behavior

Integration Tests:
- Update test_provision_children_skips_load_balancer_resources to
  test_provision_children_deploys_load_balancer_resources
- Fix assertions to expect 2 deployments (LoadBalancer + worker)
- Remove obsolete test_manifest_directory_endpoint_after_provisioning

Manifest Client Tests:
- Update initialization tests for FLASH_MOTHERSHIP_ID usage
- Update error message expectations

Test Rationale:
- LoadBalancer provisioning enables multi-tier architectures
- FLASH_RESOURCE_NAME provides clearer child endpoint identification
- Removed tests for deleted functionality (get_manifest_directory)

Files modified:
- tests/unit/runtime/test_mothership_provisioner.py
- tests/unit/runtime/test_service_registry.py
- tests/integration/test_mothership_provisioning.py
- tests/unit/runtime/test_manifest_client.py

* fix(build): Use importlib for LB handler imports to support numeric directories

Changes:
- Modified LBHandlerGenerator to use importlib pattern instead of from imports
- Aligns LB handlers with QB handler pattern for consistency
- Fixes SyntaxError when building projects with numeric directory names (e.g., 03_advanced_workers)
- Added boolean flags (is_load_balanced, is_live_resource) to replace string comparisons
- Added test coverage for numeric module paths

The bug occurred because Python identifiers cannot start with digits, but
importlib treats module paths as strings, allowing any valid filesystem path.
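The digit-prefix issue can be demonstrated directly: `from 03_advanced_workers.worker import fn` is a `SyntaxError` because Python identifiers cannot start with a digit, while `importlib.import_module` accepts the same path as a string. The snippet below builds a throwaway package with a numeric name to show the pattern; the directory name mirrors the example in the commit.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a temporary package whose name starts with a digit.
root = Path(tempfile.mkdtemp())
pkg = root / "03_advanced_workers"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "worker.py").write_text("def fn():\n    return 'ok'\n")

# A literal `from 03_advanced_workers.worker import fn` would not even
# compile, but importlib happily imports the same module path:
sys.path.insert(0, str(root))
importlib.invalidate_caches()
module = importlib.import_module("03_advanced_workers.worker")
assert module.fn() == "ok"
```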

* feat(build): Store config variable names in manifest for test-mothership

Changes:
- Scanner now tracks config variable names (e.g., "gpu_config") at scan time
- Manifest includes config_variable field for each resource and function
- test-mothership uses config_variable from manifest for reliable discovery
- Added backward compatibility fallback to old search logic

Fixes "No config variable found" warnings when resource names differ from
variable names (e.g., resource "03_05_load_balancer_gpu" with variable "gpu_config").

This enables test-mothership to correctly discover and provision all resources
including load balancer endpoints, resolving health check failures.

* fix: Address PR review comments for security and error handling

Changes:
- Replace MD5 with SHA-256 for config hash computation (security best practice)
- Add error callback to background provisioning task for proper exception handling
- Update tests to expect SHA-256 hash length (64 chars instead of 32)

Addresses Copilot review comments:
- mothership_provisioner.py:113 - Use SHA-256 instead of cryptographically broken MD5
- lb_handler_generator.py:81 - Track background task and add error callback
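The SHA-256 config hash can be sketched as below. Per the drift-detection change earlier in this PR, `env` is deliberately not excluded from hashing; the exact contents of `EXCLUDED_HASH_FIELDS` and the config shape here are assumptions for illustration.

```python
import hashlib
import json

# Illustrative exclusions only; note that "env" is intentionally absent
# so environment variable changes trigger drift detection.
EXCLUDED_HASH_FIELDS = {"id", "deployment_timestamp"}

def compute_config_hash(config: dict) -> str:
    """SHA-256 over the sorted, JSON-serialized config (64 hex chars)."""
    filtered = {k: v for k, v in config.items() if k not in EXCLUDED_HASH_FIELDS}
    payload = json.dumps(filtered, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()
```

Replacing MD5 with SHA-256 doubles the hex digest length (32 → 64 characters), which is why the tests above update their length expectations.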
* feat(runtime): Add generic handler factory for serverless execution

Implement a factory function that creates RunPod serverless handlers,
eliminating code duplication across generated handler files.

The generic_handler module provides:
- create_handler(function_registry) factory that accepts a dict of
  function/class objects and returns a RunPod-compatible handler
- Automatic serialization/deserialization using cloudpickle + base64
- Support for both function execution and class instantiation + method calls
- Structured error responses with full tracebacks for debugging
- Load manifest for cross-endpoint function discovery

This design centralizes all handler logic in one place, making it easy to:
- Fix bugs once, benefit all handlers
- Add new features without regenerating projects
- Keep deployment packages small (handler files are ~23 lines each)

Implementation:
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Handles function vs. class execution
- load_manifest(): Loads flash_manifest.json for service discovery
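A minimal sketch of the factory and its serialization helpers follows. It uses the standard library's `pickle` in place of `cloudpickle` so the example is self-contained; the job/response field names are assumptions, not the exact RunPod schema.

```python
import base64
import pickle  # production code uses cloudpickle to handle arbitrary objects
import traceback

def deserialize_arguments(encoded: str):
    """Base64 + pickle decoding of an (args, kwargs) pair."""
    return pickle.loads(base64.b64decode(encoded))

def serialize_result(result) -> str:
    """Pickle + base64 encoding, safe for JSON transmission."""
    return base64.b64encode(pickle.dumps(result)).decode()

def create_handler(function_registry: dict):
    """Return a RunPod-style handler bound to a registry of callables."""
    def handler(job: dict) -> dict:
        inp = job.get("input", {})
        name = inp.get("function_name")
        fn = function_registry.get(name)
        if fn is None:
            return {"success": False, "error": f"Function not found: {name}"}
        try:
            args, kwargs = deserialize_arguments(inp["arguments"])
            return {"success": True, "result": serialize_result(fn(*args, **kwargs))}
        except Exception as exc:
            # Structured error response with full traceback for debugging.
            return {"success": False, "error": str(exc),
                    "traceback": traceback.format_exc()}
    return handler
```

Because all logic lives in this one factory, a generated handler only needs to build `FUNCTION_REGISTRY` and call `create_handler(...)`.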

* feat(cli): Add handler generator, manifest builder, and scanner for build process

Implement the build pipeline components that work together to generate
serverless handlers from @Remote decorated functions.

Three core components:

1. RemoteDecoratorScanner (scanner.py)
   - Uses Python AST to discover all @Remote decorated functions
   - Extracts function metadata: name, module, async status, is_class
   - Groups functions by resource_config for handler generation
   - Handles edge cases like decorated classes and async functions

2. ManifestBuilder (manifest.py)
   - Groups functions by their resource_config
   - Creates flash_manifest.json structure for service discovery
   - Maps functions to their modules and handler files
   - Enables cross-endpoint function routing at runtime

3. HandlerGenerator (handler_generator.py)
   - Creates lightweight handler_*.py files for each resource config
   - Each handler imports functions and registers them in FUNCTION_REGISTRY
   - Handler delegates to create_handler() factory from generic_handler
   - Generated handlers are ~23 lines (vs ~98 with duplication)

Build Pipeline Flow:
1. Scanner discovers @Remote functions
2. ManifestBuilder groups them by resource_config
3. HandlerGenerator creates handler_*.py for each group
4. All files + manifest bundled into archive.tar.gz

This eliminates ~95% duplication across handlers by using the factory pattern
instead of template-based generation.
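The grouping step in the pipeline above can be sketched as follows. Field names and the manifest layout are simplified assumptions; the real `ManifestBuilder` emits a richer `flash_manifest.json`.

```python
from collections import defaultdict

def build_manifest(functions: list) -> dict:
    """Group discovered remote functions by resource_config and emit a
    minimal manifest structure (fields are illustrative)."""
    groups = defaultdict(list)
    for fn in functions:
        groups[fn["resource_config"]].append(
            {"name": fn["name"], "module": fn["module"],
             "is_async": fn.get("is_async", False)}
        )
    return {
        "version": "1",
        "resources": {
            cfg: {"handler_file": f"handler_{cfg}.py", "functions": fns}
            for cfg, fns in groups.items()
        },
    }
```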

* test(runtime): Add comprehensive tests for generic handler

Implement 19 unit tests covering all major paths through the generic_handler
factory and its helper functions.

Test Coverage:

Serialization/Deserialization (7 tests):
- serialize_result() with simple values, dicts, lists
- deserialize_arguments() with empty, args-only, kwargs-only, mixed inputs
- Round-trip encoding/decoding of cloudpickle + base64

Function Execution (4 tests):
- Simple function execution with positional and keyword arguments
- Keyword argument handling
- Class instantiation and method calls
- Argument passing to instance methods

Handler Factory (8 tests):
- create_handler() returns callable RunPod handler
- Handler with simple function registry
- Missing function error handling (returns error response, not exception)
- Function exceptions caught with traceback included
- Multiple functions in single registry
- Complex Python objects (classes, lambdas, closures)
- Empty registry edge case
- Default execution_type parameter
- None return values
- Correct RunPod response format (success, result/error, traceback)

Test Strategy:
- Arrange-Act-Assert pattern for clarity
- Isolated unit tests (no external dependencies)
- Tests verify behavior, not implementation
- Error cases tested for proper error handling
- All serialization tested for round-trip correctness

All tests passing, 83% coverage on generic_handler.py

* test(cli): Add tests for handler generation, manifest building, and scanning

Implement integration tests validating the build pipeline components work
correctly together.

Test Coverage:

HandlerGenerator Tests:
- Handler files created with correct names (handler_<resource_name>.py)
- Generated files import required functions from workers
- FUNCTION_REGISTRY properly formatted
- create_handler() imported from generic_handler
- Handler creation via factory
- RunPod start call present and correct
- Multiple handlers generated for multiple resource configs

ManifestBuilder Tests:
- Manifest structure with correct version and metadata
- Resources grouped by resource_config
- Handler file paths correct
- Function metadata preserved (name, module, is_async, is_class)
- Function registry mapping complete

Scanner Tests:

- @Remote decorated functions discovered via AST
- Function metadata extracted correctly
- Module paths resolved properly
- Async functions detected
- Class methods detected
- Edge cases handled (multiple decorators, nested classes)

Test Strategy:
- Integration tests verify components work together
- Tests verify generated files are syntactically correct
- Tests validate data structures match expected schemas
- No external dependencies in build process

Validates that the entire build pipeline:
1. Discovers functions correctly
2. Groups them appropriately
3. Generates valid Python handler files
4. Creates correct manifest structure

* docs(runtime): Document generic handler factory architecture

Add comprehensive architecture documentation explaining why the factory
pattern was chosen and how it works.

Documentation includes:

Overview & Context:
- Problem statement: Handler files had 95% duplication
- Design decision: Use factory function instead of templates
- Benefits: Single source of truth, easier maintenance, consistency

Architecture Diagrams (MermaidJS):
- High-level flow: @Remote functions → Scanner → Manifest → Handlers → Factory
- Component relationships: HandlerGenerator, GeneratedHandler, generic_handler
- Function registry pattern: Discovery → Grouping → Registration → Factory

Implementation Details:
- create_handler(function_registry) signature and behavior
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Function vs. class execution
- load_manifest(): Service discovery via flash_manifest.json

Design Decisions (with rationale):
- Factory Pattern over Inheritance: Simpler, less coupling, easier to test
- CloudPickle + Base64: Handles arbitrary objects, safe JSON transmission
- Manifest in Generic Handler: Runtime service discovery requirement
- Structured Error Responses: Debugging aid, functional error handling
- Both Execution Types: Supports stateful classes and pure functions

Usage Examples:
- Simple function handler
- Class execution with methods
- Multiple functions in one handler

Build Process Integration:
- 4-phase pipeline: Scanner → Grouping → Generation → Packaging
- Manifest structure and contents
- Generated handler structure (~23 lines)

Testing Strategy:
- 19 unit tests covering all major paths
- 7 integration tests verifying handler generation
- Manual testing with example applications

Performance:
- Zero runtime penalty (factory called once at startup)
- No additional indirection in request path

* docs(cli): Add flash build command documentation

Document the flash build command and update CLI README to include it.

New Documentation:

flash-build.md includes:

Usage & Options:
- Command syntax: flash build [OPTIONS]
- --no-deps: Skip transitive dependencies (faster, smaller archives)
- --keep-build: Keep build directory for inspection/debugging
- --output, -o: Custom archive name (default: archive.tar.gz)

What It Does (5-step process):
1. Discovery: Scan for @Remote decorated functions
2. Grouping: Group functions by resource_config
3. Handler Generation: Create lightweight handler files
4. Manifest Creation: Generate flash_manifest.json
5. Packaging: Create archive.tar.gz for deployment

Build Artifacts:
- .flash/archive.tar.gz: Deployment package (ready for RunPod)
- .flash/flash_manifest.json: Service discovery configuration
- .flash/.build/: Temporary build directory

Handler Generation:
- Explains factory pattern and minimal handler files
- Links to Runtime_Generic_Handler.md for details

Dependency Management:
- Default behavior: Install all dependencies including transitive
- --no-deps: Only direct dependencies (when base image has transitive)
- Trade-offs explained

Cross-Endpoint Function Calls:
- Example showing GPU and CPU endpoints
- Manifest enables routing automatically

Output & Troubleshooting:
- Sample build output with progress indicators
- Common failure scenarios and solutions
- How to debug with --keep-build

Next Steps:
- Test locally with flash run
- Deploy to RunPod
- Monitor with flash undeploy list

Updated CLI README.md:
- Added flash build to command list in sequence
- Links to full flash-build.md documentation

* docs: Add build process and handler generation section to README

Add a new section explaining how the build system works and why the
factory pattern reduces code duplication.

New Section: Build Process and Handler Generation

Explains:

How Flash Builds Your Application (5-step pipeline):
1. Discovery: Scans code for @Remote decorated functions
2. Grouping: Groups functions by resource_config
3. Handler Generation: Creates lightweight handler files
4. Manifest Creation: Generates flash_manifest.json for service discovery
5. Packaging: Bundles everything into archive.tar.gz

Handler Architecture (with code example):
- Shows generated handler using factory pattern
- Single source of truth: All handler logic in one place
- Easier maintenance: Bug fixes don't require rebuilding projects

Cross-Endpoint Function Calls:
- Example of GPU and CPU endpoints calling each other
- Manifest and runtime wrapper handle service discovery

Build Artifacts:
- .flash/.build/: Temporary build directory
- .flash/archive.tar.gz: Deployment package
- .flash/flash_manifest.json: Service configuration

Links to detailed documentation:
- docs/Runtime_Generic_Handler.md for architecture details
- src/tetra_rp/cli/docs/flash-build.md for CLI reference

This section bridges the main README and detailed documentation,
providing entry point for new users discovering the build system.

* feat(cli): Integrate build utilities into flash build command

Wire up the handler generator, manifest builder, and scanner into the
actual flash build command implementation.

Changes to build.py:

1. Integration:
   - Import RemoteDecoratorScanner for function discovery
   - Import ManifestBuilder for manifest creation
   - Import HandlerGenerator for handler file creation
   - Call these in sequence during the build process

2. Build Pipeline:
   - After copying project files, scan for @Remote functions
   - Build manifest from discovered functions
   - Generate handler files for each resource config
   - Write manifest to build directory
   - Progress indicators show what's being generated

3. Fixes:
   - Change .tetra directory references to .flash
   - Uncomment actual build logic (was showing "Coming Soon" message)
   - Fix progress messages to show actual file counts

4. Error Handling:
   - Try/catch around handler generation
   - Warning shown if generation fails but build continues
   - User can debug with --keep-build flag

Build Flow Now:
1. Load ignore patterns
2. Collect project files
3. Create build directory
4. Copy files to build directory
5. [NEW] Scan for @Remote functions
6. [NEW] Build and write manifest
7. [NEW] Generate handler files
8. Install dependencies
9. Create archive
10. Clean up build directory (unless --keep-build)

Dependencies:
- Updated uv.lock with all required dependencies

* refactor(build): Fix directory structure and add comprehensive error handling

**Critical Fixes:**
- Remove "Coming Soon" message blocking build command execution
- Fix build directory to use .flash/.build/ directly (no app_name subdirectory)
- Fix tarball to extract with flat structure using arcname="."
- Fix cleanup to remove correct build directory

**Error Handling & Validation:**
- Add specific exception handling (ImportError, SyntaxError, ValueError)
- Add import validation to generated handlers
- Add duplicate function name detection across resources
- Add proper error logging throughout build process

**Resource Type Tracking:**
- Add resource_type field to RemoteFunctionMetadata
- Track actual resource types (LiveServerless, CpuLiveServerless)
- Use actual types in manifest instead of hardcoding

**Robustness Improvements:**
- Add handler import validation post-generation
- Add manifest path fallback search (cwd, module dir, legacy location)
- Add resource name sanitization for safe filenames
- Add specific exception logging in scanner (UnicodeDecodeError, SyntaxError)

**User Experience:**
- Add troubleshooting section to README
- Update manifest path documentation in docs
- Change "Zero Runtime Penalty" to "Minimal Runtime Overhead"
- Mark future enhancements as "Not Yet Implemented"
- Improve build success message with next steps

Fixes all 20 issues identified in code review (issues #1-13, #19-22)

* feat(resources): Add LoadBalancerSlsResource for LB endpoints

Implement LoadBalancerSlsResource class for provisioning RunPod load-balanced
serverless endpoints. Load-balanced endpoints expose HTTP servers directly to
clients without queue-based processing, enabling REST APIs, webhooks, and
real-time communication patterns.

Key features:
- Type enforcement (always LB, never QB)
- Scaler validation (REQUEST_COUNT required, not QUEUE_DELAY)
- Health check polling via /ping endpoint (200/204 = healthy)
- Post-deployment verification with configurable retries
- Async and sync health check methods
- Comprehensive unit tests
- Full documentation with architecture diagrams and examples

Architecture:
- Extends ServerlessResource with LB-specific behavior
- Validates configuration before deployment
- Polls /ping endpoint until healthy (10 retries × 5s = 50s timeout)
- Raises TimeoutError if endpoint fails to become healthy

This forms the foundation for Mothership architecture where a load-balanced
endpoint serves as a directory server for child endpoints.
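The health-check polling described above (10 retries × 5s against `/ping`, with 200/204 meaning healthy) can be sketched generically. Here `ping` stands in for the real HTTP GET so the example is self-contained; the retry/delay defaults mirror the commit.

```python
import time

def wait_until_healthy(ping, retries: int = 10, delay: float = 5.0) -> None:
    """Poll until `ping()` returns a healthy status code (200 or 204).

    `ping` is any zero-argument callable returning an HTTP status code;
    the real resource issues a GET against the endpoint's /ping route.
    Raises TimeoutError if the endpoint never becomes healthy.
    """
    for attempt in range(retries):
        if ping() in (200, 204):
            return
        if attempt < retries - 1:
            time.sleep(delay)  # back off before the next poll
    raise TimeoutError(f"Endpoint not healthy after {retries} attempts")
```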

* fix(test): Fix LoadBalancerSlsResource deployment test mocks

Import ServerlessResource directly and use patch.object on the imported class
instead of string-based patches. This ensures the mocks properly intercept the
parent class's _do_deploy method when called via super(). Simplifies mock
configuration and removes an unused variable assertion.

Fixes the three failing deployment tests that were making real GraphQL API calls.
All tests now pass: 418 passed, 1 skipped.

* feat(resources): Phase 1 - Core infrastructure for @Remote on LB endpoints

Implement core infrastructure for enabling @Remote decorator on
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Changes:
- Create LoadBalancerSlsStub: HTTP-based stub for direct endpoint execution
  (src/tetra_rp/stubs/load_balancer_sls.py, 170 lines)
  - Serializes functions and arguments using cloudpickle + base64
  - Direct HTTP POST to /execute endpoint (no queue polling)
  - Proper error handling and deserialization

- Register stub with singledispatch (src/tetra_rp/stubs/registry.py)
  - Enables @Remote to dispatch to LoadBalancerSlsStub for LB resources

- Extend @Remote decorator with HTTP routing parameters (src/tetra_rp/client.py)
  - Add 'method' parameter: GET, POST, PUT, DELETE, PATCH
  - Add 'path' parameter: /api/endpoint routes
  - Validate method/path required for LoadBalancerSlsResource
  - Store routing metadata on decorated functions/classes
  - Warn if routing params used with non-LB resources

Foundation for Phase 2 (Build system integration) and Phase 3 (Local dev).

* feat(build): Phase 2.1 - Enhanced scanner for HTTP routing extraction

Update RemoteDecoratorScanner to extract HTTP method and path from
@Remote decorator for LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to RemoteFunctionMetadata
- Add _extract_http_routing() method to parse decorator keywords
- Extract method (GET, POST, PUT, DELETE, PATCH) from decorator
- Extract path (/api/process) from decorator
- Store routing metadata for manifest generation

Foundation for Phase 2.2 (Manifest updates) and Phase 2.3 (Handler generation).

* feat(build): Phase 2.2 - Updated manifest schema for HTTP routing

Enhance ManifestBuilder to support HTTP method/path routing for
LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to ManifestFunction
- Validate LB endpoints have both method and path
- Detect and prevent route conflicts (same method + path)
- Prevent use of reserved paths (/execute, /ping)
- Add 'routes' section to manifest for LB endpoints
- Conditional inclusion of routing fields (only for LB)

Manifest structure for LB endpoints now includes:
{
  "resources": {
    "api_service": {
      "resource_type": "LoadBalancerSlsResource",
      "functions": [
        {
          "name": "process_data",
          "http_method": "POST",
          "http_path": "/api/process"
        }
      ]
    }
  },
  "routes": {
    "api_service": {
      "POST /api/process": "process_data"
    }
  }
}
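The validation rules above (both method and path required, no duplicate `(method, path)` pairs, no reserved paths) can be sketched as a small check over the manifest's function entries. The reserved set comes from the commit; the function shape is an assumption.

```python
RESERVED_PATHS = {"/execute", "/ping"}  # claimed by the framework

def validate_routes(functions: list) -> dict:
    """Build a 'METHOD /path' -> function-name map, rejecting reserved
    paths and conflicting (method, path) pairs."""
    routes = {}
    for fn in functions:
        method, path = fn["http_method"], fn["http_path"]
        if path in RESERVED_PATHS:
            raise ValueError(f"{path} is reserved by the framework")
        key = f"{method} {path}"
        if key in routes:
            raise ValueError(f"Route conflict: {key} already maps to {routes[key]}")
        routes[key] = fn["name"]
    return routes
```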

* feat(cli): Add LB handler generator for FastAPI app creation

Implement LBHandlerGenerator to create FastAPI applications for
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Key features:
- Generates FastAPI apps with explicit route registry
- Creates (method, path) -> function mappings from manifest
- Validates route conflicts and reserved paths
- Imports user functions and creates dynamic routes
- Includes required /ping health check endpoint
- Validates generated handler Python syntax via import

Generated handler structure enables:
- Direct HTTP routing to user functions via FastAPI
- Framework /execute endpoint for @Remote stub execution
- Local development with uvicorn

* feat(runtime): Implement LB handler factory for FastAPI app creation

Create create_lb_handler() factory function that dynamically builds FastAPI
applications from route registries for LoadBalancerSlsResource endpoints.

Key features:
- Accepts route_registry: Dict[(method, path)] -> handler_function mapping
- Registers all user-defined routes from registry to FastAPI app
- Provides /execute endpoint for @Remote stub function execution
- Handles async function execution automatically
- Serializes results with cloudpickle + base64 encoding
- Comprehensive error handling with detailed logging

The /execute endpoint enables:
- Remote function code execution via @Remote decorator
- Automatic argument deserialization from cloudpickle/base64
- Result serialization for transmission back to client
- Support for both sync and async functions
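The registry-to-routes wiring can be shown framework-agnostically. The real factory registers each entry on a FastAPI app; this sketch keeps the same `(method, path) -> callable` contract and the `include_execute` security switch, with a hypothetical `_execute_stub` standing in for the real serialized-function execution path.

```python
import asyncio
import inspect

def _execute_stub(payload: dict) -> dict:
    # Placeholder for the real /execute logic, which deserializes a
    # cloudpickled function from the payload and runs it (local dev only).
    return {"executed": True}

def create_lb_handler(route_registry: dict, include_execute: bool = False):
    """Dispatcher over a (method, path) -> callable registry."""
    routes = dict(route_registry)
    if include_execute:  # security boundary: never wired on deployed endpoints
        routes[("POST", "/execute")] = _execute_stub
    routes[("GET", "/ping")] = lambda _p: {"status": "healthy"}

    def dispatch(method: str, path: str, payload: dict) -> dict:
        fn = routes.get((method.upper(), path))
        if fn is None:
            return {"status": 404}
        result = fn(payload)
        if inspect.iscoroutine(result):
            result = asyncio.run(result)  # run async handlers to completion
        return {"status": 200, "result": result}

    return dispatch
```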

* feat(cli): Route build command to separate handlers for LB endpoints

Update build command to use appropriate handler generators based on
resource type. Separates LoadBalancerSlsResource endpoints (using FastAPI)
from queue-based endpoints (using generic handler).

Changes:
- Import LBHandlerGenerator alongside HandlerGenerator
- Inspect manifest resources and separate by type
- Generate LB handlers via LBHandlerGenerator
- Generate QB handlers via HandlerGenerator
- Combine all generated handler paths for summary

Enables users to mix LB and QB endpoints in same project with correct
code generation for each resource type.

* feat(resources): Add LiveLoadBalancer for local LB endpoint testing

Implement LiveLoadBalancer resource following the LiveServerless pattern
for local development and testing of load-balanced endpoints.

Changes:
- Add TETRA_LB_IMAGE constant for load-balanced Tetra image
- Create LiveLoadBalancer class extending LoadBalancerSlsResource
- Uses LiveServerlessMixin to lock imageName to Tetra LB image
- Register LiveLoadBalancer with LoadBalancerSlsStub in singledispatch
- Export LiveLoadBalancer from core.resources and top-level __init__

This enables users to test LB-based functions locally before deploying,
using the same pattern as LiveServerless for queue-based endpoints.

Users can now write:
  from tetra_rp import LiveLoadBalancer, remote

  api = LiveLoadBalancer(name="test-api")

  @remote(api, method="POST", path="/api/process")
  async def process_data(x, y):
      return {"result": x + y}

  result = await process_data(5, 3)  # Local execution

* test(stubs): Add comprehensive unit tests for LoadBalancerSlsStub

Implement unit tests for LoadBalancerSlsStub covering:
- Request preparation with arguments and dependencies
- Response handling for success and error cases
- Error handling for invalid responses
- Base64 encoding/decoding of serialized data
- Endpoint URL validation
- Timeout and HTTP error handling

Test coverage:
- _prepare_request: 4 tests
- _handle_response: 5 tests
- _execute_function: 3 error case tests
- __call__: 2 integration tests

Tests verify proper function serialization, argument handling,
error propagation, and response deserialization.

* fix(test): Correct LB endpoint test decorator to match assertions

Fix test_load_balancer_vs_queue_based_endpoints by updating the @Remote
decorator to use method='POST' and path='/api/echo' to match the test
assertions. This was a test-level bug where the decorator definition
didn't match what was being asserted.

* docs: Add comprehensive documentation for @Remote with LoadBalancer endpoints

- Using_Remote_With_LoadBalancer.md: User guide for HTTP routing, local development, building and deploying
- LoadBalancer_Runtime_Architecture.md: Technical details on deployment, request flows, security, and performance
- Updated README.md with LoadBalancer section and code example
- Updated Load_Balancer_Endpoints.md with cross-references to new guides

* security: Remove /execute from deployed LoadBalancer endpoints

Split @Remote execution behavior between local and deployed:
- LiveLoadBalancer (local): Uses /execute endpoint for function serialization
- LoadBalancerSlsResource (deployed): Uses user-defined routes with HTTP param mapping

Changes:
1. LoadBalancerSlsStub routing detection:
   - _should_use_execute_endpoint() determines execution path
   - _execute_via_user_route() maps args to JSON and POSTs to user routes
   - Auto-detects resource type and routing metadata

2. Conditional /execute registration:
   - create_lb_handler() now accepts include_execute parameter
   - Generated handlers default to include_execute=False (security)
   - LiveLoadBalancer can enable /execute if needed

3. Updated handler generator:
   - Added clarifying comments on /execute exclusion for deployed endpoints

4. Comprehensive test coverage:
   - 8 new tests for routing detection and execution paths
   - All 31 tests passing (22 unit + 9 integration)

5. Documentation updates:
   - Using_Remote_With_LoadBalancer.md: clarified /execute scope
   - Added 'Local vs Deployed Execution' section explaining differences
   - LoadBalancer_Runtime_Architecture.md: updated execution model
   - Added troubleshooting for deployed endpoint scenarios

Security improvement:
- Deployed endpoints only expose user-defined routes
- /execute endpoint removed from production (prevents arbitrary code execution)
- Lower attack surface for deployed endpoints
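The routing split above can be sketched as a single decision function. The resource class names come from this PR; the function signature and exact checks are assumptions for illustration, not the actual stub implementation:

```python
# Live (local dev) resource types may serialize functions to /execute;
# deployed resources must carry user-defined routing metadata.
LIVE_TYPES = {"LiveLoadBalancer", "CpuLiveLoadBalancer"}


def should_use_execute_endpoint(resource, http_method, http_path) -> bool:
    """True  -> serialize the function and POST it to /execute (local dev only).
    False -> map arguments to JSON and POST to the user-defined route."""
    if type(resource).__name__ in LIVE_TYPES:
        return True  # /execute exists only in locally generated handlers
    if http_method and http_path:
        return False  # deployed: dispatch to the user-defined HTTP route
    # Deployed endpoints without complete routing metadata cannot be called.
    raise ValueError("Deployed LB endpoints require method= and path= metadata")


class LiveLoadBalancer:  # stand-ins for the real resource classes
    pass


class LoadBalancerSlsResource:
    pass
```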

* feat(build): Phase 4 - Fix LiveLoadBalancer handler generation to include /execute endpoint

- Modified manifest.py to validate LiveLoadBalancer endpoints like LoadBalancerSlsResource
- Updated lb_handler_generator to:
  - Include LiveLoadBalancer in handler generation filter
  - Pass include_execute=True for LiveLoadBalancer (local dev)
  - Pass include_execute=False for LoadBalancerSlsResource (deployed)
- Added integration tests:
  - Verify LiveLoadBalancer handlers include /execute endpoint
  - Verify deployed handlers exclude /execute endpoint
- Fixes critical bug: LiveLoadBalancer now gets /execute endpoint in generated handlers

* fix(scanner): Discover LoadBalancer resources in addition to Serverless resources

- Updated scanner to extract LiveLoadBalancer and LoadBalancerSlsResource resources
- Previously only looked for 'Serverless' in class name, missing LoadBalancer endpoints
- Now checks for both 'Serverless' and 'LoadBalancer' in resource type names
- Added integration test to verify scanner discovers both resource types
- Fixes critical bug that prevented flash build from finding LoadBalancer endpoints

* chore: Format code for line length and remove unused imports

- Wrap long lines in manifest.py, lb_handler.py, and load_balancer_sls.py
- Remove unused httpx import in test_load_balancer_sls_stub.py
- Apply consistent formatting across codebase

* fix: Address PR #131 review feedback

- Scanner: Use exact type name matching instead of substring matching
  - Whitelist specific resource types to avoid false positives
  - Prevents matching classes like 'MyServerlessHelper' or 'LoadBalancerUtils'

- Type hints: Use Optional[str] for nullable fields in manifest
  - ManifestFunction.http_method and http_path now properly typed

- Timeout: Make HTTP client timeout configurable
  - Added LoadBalancerSlsStub.DEFAULT_TIMEOUT class attribute
  - Added timeout parameter to __init__
  - Updated both _execute_function and _execute_via_user_route to use self.timeout

- Deprecated datetime: Replace datetime.utcnow() with datetime.now(timezone.utc)
  - Updated manifest.py and test_lb_remote_execution.py
  - Ensures Python 3.12+ compatibility
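The exact-name matching from the first item can be sketched as a set membership check; the whitelist below is a plausible subset (the full list in the scanner may include more resource types):

```python
# Exact type-name whitelist: substring matching on 'Serverless'/'LoadBalancer'
# would wrongly pick up user classes like MyServerlessHelper.
DISCOVERABLE_TYPES = {
    "LiveServerless",
    "ServerlessEndpoint",
    "LiveLoadBalancer",
    "LoadBalancerSlsResource",
}


def is_discoverable(type_name: str) -> bool:
    """Return True only for resource types the scanner should extract."""
    return type_name in DISCOVERABLE_TYPES
```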

* style: Format datetime chaining for line length

* fix: LiveLoadBalancer template not serialized to RunPod GraphQL

The set_serverless_template model_validator was being overwritten by sync_input_fields
(both had mode="after"). In Pydantic v2, when two validators with the same mode are
defined in a class, only one is registered.

This caused templates to never be created from imageName, resulting in:
  "GraphQL errors: One of templateId, template is required to create an endpoint"

Solution:
- Move set_serverless_template validator from ServerlessResource base class to subclasses
  (ServerlessEndpoint and LoadBalancerSlsResource) where the validation is actually needed
- Keep helper methods (_create_new_template, _configure_existing_template) in base class
  for reuse
- Add comprehensive tests for LiveLoadBalancer template serialization

This allows:
1. Base ServerlessResource to be instantiated freely for testing/configuration
2. Subclasses (ServerlessEndpoint, LoadBalancerSlsResource) to enforce template
   requirements during deployment
3. Proper template serialization in GraphQL payload for RunPod API

Fixes: One of templateId, template is required to create an endpoint error when
deploying LiveLoadBalancer with custom image tags like runpod/tetra-rp-lb:local
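The move-validator-to-subclass fix can be illustrated framework-agnostically. The real code uses Pydantic model_validators; this plain-Python sketch only shows the inheritance split (helpers in the base, enforcement in the deployable subclass) with illustrative payloads:

```python
class ServerlessResource:
    """Base class: no template enforcement, so it can be instantiated
    freely for testing and configuration."""

    def __init__(self, image_name=None, template=None):
        self.imageName = image_name
        self.template = template

    def _create_new_template(self):
        # Helper kept in the base class for reuse (placeholder payload).
        return {"imageName": self.imageName}


class LoadBalancerSlsResource(ServerlessResource):
    """Deployable subclass: enforces the template requirement."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.set_serverless_template()

    def set_serverless_template(self):
        if self.template is None:
            if not self.imageName:
                raise ValueError("One of templateId, template is required")
            self.template = self._create_new_template()
```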

* fix: LoadBalancer endpoint URL and add CPU support

- Fix: Use correct endpoint URL format for load-balanced endpoints
  (https://{id}.api.runpod.ai instead of https://api.runpod.ai/v2/{id})
  This fixes 404 errors on /ping health check endpoints

- Feature: Add CPU LoadBalancer support
  * Create CpuLoadBalancerSlsResource for CPU-based load-balanced endpoints
  * Create CpuLiveLoadBalancer for local CPU LB development
  * Add TETRA_CPU_LB_IMAGE constant for CPU LB Docker image
  * Update example code to use CpuLiveLoadBalancer for CPU worker
  * Add 8 comprehensive tests for CPU LoadBalancer functionality

- Tests: Add 2 tests for endpoint URL format validation
- All 474 tests passing, 64% code coverage
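The two URL shapes in the fix above (formats as given in the commit message; the helper names are illustrative):

```python
def queue_based_url(endpoint_id: str) -> str:
    # Queue-based serverless endpoints go through the /v2 API gateway.
    return f"https://api.runpod.ai/v2/{endpoint_id}"


def load_balanced_url(endpoint_id: str) -> str:
    # Load-balanced endpoints are addressed by subdomain; user routes and
    # the /ping health check are served directly under this host.
    return f"https://{endpoint_id}.api.runpod.ai"
```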

* fix: Export CpuLiveLoadBalancer and CpuLoadBalancerSlsResource from tetra_rp package

LoadBalancer resources were not being discovered by ResourceDiscovery because
the new CPU variants (CpuLiveLoadBalancer, CpuLoadBalancerSlsResource) were
not exported from the main tetra_rp package. This prevented undeploy from
picking up these resources.

Added exports to:
- TYPE_CHECKING imports for type hints
- __getattr__ function for lazy loading
- __all__ list for public API

This fixes the issue where 'flash undeploy list' could not find LoadBalancer
resources that were deployed with 'flash run --auto-provision'.

* fix: Add API key authentication to LoadBalancer health check

The /ping endpoint for RunPod load-balanced endpoints requires the
RUNPOD_API_KEY header for authentication. Without it, the health check
fails with 401 Unauthorized, causing provisioning to timeout.

This fix adds the Authorization header to the health check request if
the RUNPOD_API_KEY environment variable is available, allowing the
endpoint health check to succeed during provisioning.

Fixes issue where 'flash run --auto-provision' would fail even though
the endpoint was successfully created on RunPod.

* fix(lb): Exclude flashboot from CpuLoadBalancerSlsResource GraphQL payload

CpuLoadBalancerSlsResource was overriding _input_only without including flashboot,
causing it to be sent to the RunPod GraphQL API which doesn't accept this field.
This caused deployment to fail with: Field "flashboot" is not defined by type "EndpointInput".

* fix(lb): Expand CpuInstanceType.ANY to all CPU flavors in CpuLoadBalancerSlsResource

Add field_validator to expand [CpuInstanceType.ANY] to all available CPU instance
types (cpu3g, cpu3c, cpu5c variants). This matches the behavior in CpuServerlessEndpoint
and prevents deployment errors like 'instanceId must be in the format of flavorId-vcpu-ram'.
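The expansion can be sketched as below. The real code is a Pydantic field_validator over a CpuInstanceType enum; here a plain string sentinel stands in, and the flavor list is an assumed example (the commit names cpu3g, cpu3c, and cpu5c families, in the flavorId-vcpu-ram format the API expects):

```python
# Assumed example flavors; the real list covers all available CPU instance
# types across the cpu3g / cpu3c / cpu5c families.
ALL_CPU_INSTANCE_IDS = ["cpu3g-2-8", "cpu3c-2-4", "cpu5c-2-4"]


def expand_instance_ids(instance_ids):
    """Replace the ANY sentinel with the full list of concrete flavors,
    so the API never sees an instanceId that isn't flavorId-vcpu-ram."""
    if instance_ids == ["ANY"]:
        return list(ALL_CPU_INSTANCE_IDS)
    return instance_ids
```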

* refactor(cpu): Move instanceIds validator to CpuEndpointMixin

Move the instanceIds field_validator from CpuServerlessEndpoint to CpuEndpointMixin
so both CpuServerlessEndpoint and CpuLoadBalancerSlsResource share the same validator
that expands [CpuInstanceType.ANY] to all available CPU flavors. This eliminates
code duplication and ensures consistent behavior across all CPU endpoint types.

* test: Update CPU instance test to reflect validator expansion

Update test_cpu_live_load_balancer_defaults_to_cpu_any to verify that
[CpuInstanceType.ANY] is correctly expanded to all available CPU instance types
by the field_validator in CpuEndpointMixin.

* fix(lb): Increase health check timeout from 5s to 15s

Load-balanced workers need more time to respond during cold starts and initialization.
RunPod docs recommend at least 10-15 second timeouts for health checks. Workers may
return 204 during initialization, which is normal and expected.

* fix(lb): Fix CPU load balancer template deployment error

Fixes two bugs in CpuLoadBalancerSlsResource that prevented CPU load balancers from deploying:

1. Added gpuCount and allowedCudaVersions to _input_only exclusion set to prevent
   GPU-specific fields from being sent to RunPod API
2. Overrode set_serverless_template() to call _sync_cpu_fields() first, ensuring
   GPU defaults are overridden to CPU-appropriate values (gpuCount=0)

The RunPod API was rejecting CPU load balancer templates because GPU-specific
fields were being included in the GraphQL payload. These changes align
CpuLoadBalancerSlsResource behavior with CpuServerlessEndpoint.

Also added comprehensive test coverage (30+ tests) to verify:
- GPU fields are correctly overridden to CPU defaults
- GPU fields are excluded from API payloads
- CPU-specific fields are properly included
- Consistency with CpuServerlessEndpoint behavior

* fix(drift): Exclude runtime fields from config hash to prevent false positives

Fixes false positive configuration drift detection by separating concerns:

1. Update ServerlessResource.config_hash to exclude runtime fields
   - Fields like template, templateId, aiKey, userId are API-assigned
   - Prevents false drift when same config is redeployed across processes
   - Now only hashes user-specified configuration

2. Add config_hash override to CpuLoadBalancerSlsResource
   - CPU load balancers hash only CPU-relevant fields
   - Excludes GPU-specific fields and runtime fields
   - Follows same pattern as CpuServerlessEndpoint

3. Fix _has_structural_changes to exclude template/templateId
   - CRITICAL: These runtime fields were causing false structural changes
   - Was forcing unnecessary redeployments despite update() being available
   - Now system correctly uses update() instead of undeploy+deploy

4. Make field serializers robust to handle string/enum values
   - Prevents serialization errors when fields are pre-converted to strings

5. Add comprehensive drift detection tests (16 tests)
   - Test hash stability with runtime field changes
   - Test exclusion of env, template, templateId, and other runtime fields
   - Test that actual config changes (image, flashboot) are detected
   - Test structural change detection behavior
   - Test real-world deployment scenarios

Results:
- Same config deployed multiple times: no false drift
- Different env vars with same config: no false drift
- Template/templateId changes: no false drift
- API-assigned fields: no false drift
- User config changes (image, flashboot): drift detected correctly
- All 512 unit tests pass
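The runtime-field exclusion can be sketched as hashing only the user-specified portion of the config. The excluded field names come from the commit above; the hashing scheme itself is illustrative (env stays in the hash, matching the later change that makes env changes trigger drift):

```python
import hashlib
import json

# API-assigned runtime fields that must never cause drift.
RUNTIME_FIELDS = {"template", "templateId", "aiKey", "userId"}


def config_hash(config: dict) -> str:
    """Stable hash over user-specified configuration only."""
    user_config = {k: v for k, v in config.items() if k not in RUNTIME_FIELDS}
    payload = json.dumps(user_config, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()
```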

* fix(http): Standardize RunPod HTTP client authentication across codebase

Centralizes HTTP client creation for RunPod load-balanced endpoints to prevent
manual Authorization header code duplication and ensure consistent authentication:

1. Create centralized HTTP utility function (src/tetra_rp/core/utils/http.py)
   - New function: get_authenticated_httpx_client()
   - Automatically adds Bearer token Authorization header if RUNPOD_API_KEY set
   - Provides consistent timeout handling (default 30s, customizable)
   - Follows existing GraphQL/REST client authentication pattern

2. Fix critical authentication bug in LoadBalancerSlsStub._execute_via_user_route()
   - Previously: Missing Authorization header (401 errors on user routes)
   - Now: Uses centralized utility for proper authentication
   - Enables direct HTTP calls to user-defined routes with auth

3. Refactor two methods to use centralized utility
   - LoadBalancerSlsStub._execute_function() - removes 7+ lines of manual auth code
   - LoadBalancerSlsResource._check_ping_endpoint() - simplifies auth setup

4. Add comprehensive unit tests (tests/unit/core/utils/test_http.py)
   - Tests API key presence/absence handling
   - Tests custom and default timeout configuration
   - Tests edge cases (empty key, zero timeout)
   - All 7 tests pass with 100% coverage

Results:
- Single source of truth for HTTP authentication (centralized utility)
- Fixes 401 Unauthorized errors on load-balanced endpoints
- Eliminates repetitive manual auth code across 3+ locations
- Easier to maintain and update authentication patterns in future
- All 499 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)
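The centralized utility can be sketched as below. The function and env-var names follow the commit; the header-building logic is kept stdlib-only here, with httpx imported lazily, so only the wrapper at the bottom is httpx-specific:

```python
import os


def build_auth_headers() -> dict:
    """Bearer token header from RUNPOD_API_KEY, or empty dict if unset."""
    headers = {}
    api_key = os.environ.get("RUNPOD_API_KEY")
    if api_key:  # only attach the header when a key is configured
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


def get_authenticated_httpx_client(timeout: float = 30.0):
    import httpx  # deferred import; the header logic above is stdlib-only

    # Single source of truth: every LB HTTP call goes through this client.
    return httpx.AsyncClient(headers=build_auth_headers(), timeout=timeout)
```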

* feat(http): Extend HTTP utilities to cover both sync and async authentication

Extends the centralized HTTP authentication pattern to all RunPod API calls:

1. Add get_authenticated_requests_session() for synchronous requests
   - Creates requests.Session with automatic Bearer token Authorization header
   - Follows same pattern as async get_authenticated_httpx_client()
   - Single source of truth for sync HTTP authentication

2. Refactor template.py to use centralized utility
   - Removes manual Authorization header setup (line 86)
   - Now uses get_authenticated_requests_session() for all template updates
   - Improves error handling with raise_for_status()
   - Token parameter marked deprecated; uses RUNPOD_API_KEY env var

3. Add comprehensive tests for sync utility (4 tests)
   - Tests API key presence/absence handling
   - Tests empty API key edge case
   - Tests Session object validation
   - All tests pass with proper cleanup

Benefits:
- True single source of truth for all RunPod HTTP authentication (sync + async)
- Consistent patterns across entire codebase
- Easier future auth changes across all HTTP client types
- Eliminates manual auth header code in template.py
- All 503 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

Note: requests.Session doesn't support default timeouts; timeout should be
specified per request (e.g., session.post(url, json=data, timeout=30.0))

* fix: Address PR feedback on HTTP utilities implementation

Addresses three feedback items from code review:

1. Fix breaking parameter order change in update_system_dependencies()
   - Restored original parameter order: template_id, token, system_dependencies, base_entry_cmd
   - Maintains backward compatibility with existing callers
   - Token parameter now optional (default None)

2. Add proper deprecation warning for token parameter
   - Issues DeprecationWarning when token parameter is used
   - Clearly communicates migration to RUNPOD_API_KEY environment variable
   - Follows Python deprecation best practices (warnings.warn with stacklevel=2)

3. Standardize test mocking approach across all health check tests
   - All tests now use consistent 'tetra_rp.core.utils.http.httpx.AsyncClient' patching
   - Removed inconsistent 'side_effect=lambda' pattern
   - Improved test maintainability by using same strategy everywhere

All 503 tests pass with consistent, clean implementation.
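The deprecation pattern from items 1 and 2 can be sketched as follows (parameter order and the warnings.warn/stacklevel pattern are from the commit; the function body is a stub):

```python
import warnings


def update_system_dependencies(template_id, token=None,
                               system_dependencies=None, base_entry_cmd=None):
    """Restored parameter order; token is optional and deprecated."""
    if token is not None:
        warnings.warn(
            "The 'token' parameter is deprecated; set the RUNPOD_API_KEY "
            "environment variable instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this frame
        )
    # ... perform the template update via the authenticated session ...
```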

* refactor(drift): Extract runtime field constants and improve maintainability

- Extract RUNTIME_FIELDS and EXCLUDED_HASH_FIELDS as ClassVar constants in ServerlessResource for centralized field list management
- Add clarifying comments to enum serializers explaining defensive isinstance() checks for nested model serialization
- Document CPU load balancer field list coupling in docstring with maintenance guidelines
- Add TestSerializerDefensiveBehavior class with 4 tests verifying pre-stringified enum value handling
- Use ClassVar annotation to satisfy Pydantic v2 model field requirements

This reduces maintenance burden by centralizing field definitions and improves code clarity without changing functionality.

* docs: Improve LoadBalancer documentation accuracy and completeness

- Fix health check timeout: Add clarification that timeout is 15 seconds per check
- Add HTTP authentication details explaining RUNPOD_API_KEY header injection
- Document stub decision logic for incomplete routing metadata (fallback behavior)
- Clarify function signature inspection with concrete example showing parameter mapping
- Expand /execute security explanation with explicit threats and best practices
- Add detailed parameter type constraints for deployed endpoints (supported vs unsupported)
- Add troubleshooting guide for missing routing metadata (404 errors)
- Strengthen security warnings about never exposing /execute in production

All documentation now matches actual implementation verified through codebase analysis.

* docs: add resource config drift detection documentation

- comprehensive guide on drift detection implementation
- covers hash computation, field exclusion, and cpu-specific behavior
- includes testing patterns and troubleshooting guide
- documents all fields that trigger drift vs those ignored

* docs: proper name for the file

* test(build): Add comprehensive test coverage for scanner and handler improvements

- Add 6 new scanner tests for directory filtering (.venv, .flash, .runpod exclusion)
- Add test for resource type validation to prevent false positives
- Add test for fallback behavior when resource name extraction fails
- Add test for handling resource names with special characters
- Update existing tests to reflect new dynamic import format and resource name extraction

These tests guarantee that improvements to the scanner (resource type validation,
directory filtering, fallback behavior) and handler generator (dynamic imports for
invalid Python identifiers) won't regress in future changes.

* test(scanner): Fix resource type assertions to match scanner behavior

The scanner now extracts resource names from the name= parameter rather
than using variable names. Update test assertions to expect the actual
resource names ('test-api', 'deployed-api') instead of variable names.

* chore: merge correction

* fix(drift): Remove manual undeploy/deploy from update() method

Use saveEndpoint mutation for all changes instead of manual lifecycle
management. Server-side automatically detects version-triggering fields
(GPU, template, volumes) and increments endpoint version accordingly.

Keep _has_structural_changes() as informational for logging purposes only.
This aligns with RunPod API's version-based deployment model.

* docs(drift): Clarify _has_structural_changes detects version-triggering changes

Update docstring to reflect that this method identifies changes that
trigger server-side version increment and worker recreation, not manual
redeploy cycles. Explain which changes are version-triggering vs rolling
updates, and note that the method is now informational for logging only.

* feat(drift): Enable environment variable drift detection

Remove env from EXCLUDED_HASH_FIELDS so changes to environment variables
trigger drift detection and endpoint updates. Environment changes are
non-version-triggering (rolling updates), so server will apply them via
saveEndpoint without recreating workers.

Add env to CPU LoadBalancer config_hash for consistent behavior across
all resource types. Update comments to reflect that env is user-specified
configuration, not dynamically computed.

* test(drift): Update tests for environment variable drift detection

- test_lb_config_hash_excludes_env_variables → test_lb_config_hash_detects_env_changes
- test_env_var_changes_no_drift → test_env_var_changes_trigger_drift
- test_config_hash_excludes_env_from_drift → test_config_hash_detects_env_from_drift

Update assertions to expect different hashes when env changes, matching
new behavior where environment variable changes trigger drift and updates.

* fix: Address Copilot review feedback on type hints and documentation

- Fix type annotation for timeout parameter in LoadBalancerSlsStub (Optional[float])
- Replace hardcoded "30s" with actual self.timeout in error messages (2 locations)
- Update Resource_Config_Drift_Detection.md to reflect actual EXCLUDED_HASH_FIELDS
- Remove duplicate Load-Balanced Endpoints section from README.md

Addresses Copilot review comments (PR #132, review 3642596664)

* chore: Update Python version compatibility to 3.10-3.14

- Drop Python 3.9 support (EOL)
- Ensure support for Python 3.14
- Update requires-python in pyproject.toml from >=3.9,<3.14 to >=3.10,<3.15
- Update mypy python_version from 3.9 to 3.10
- Update CI matrix to test Python 3.10, 3.11, 3.12, 3.13, 3.14

* chore: Increase code coverage requirement to 65%

* refactor: remove dead code and add serialization tests

Remove unused functions and improve test coverage:
- Remove deprecated update_system_dependencies from template.py
- Remove unused utility functions from utils.py and json.py
- Add comprehensive test suite for serialization module (100% coverage)

Tests cover serialization/deserialization of args, kwargs, and error handling
for cloudpickle failures across Python 3.10-3.14.

deanq added a commit that referenced this pull request Jan 22, 2026

* feat(runtime): Add generic handler factory for serverless execution

Implement a factory function that creates RunPod serverless handlers,
eliminating code duplication across generated handler files.

The generic_handler module provides:
- create_handler(function_registry) factory that accepts a dict of
  function/class objects and returns a RunPod-compatible handler
- Automatic serialization/deserialization using cloudpickle + base64
- Support for both function execution and class instantiation + method calls
- Structured error responses with full tracebacks for debugging
- Load manifest for cross-endpoint function discovery

This design centralizes all handler logic in one place, making it easy to:
- Fix bugs once, benefit all handlers
- Add new features without regenerating projects
- Keep deployment packages small (handler files are ~23 lines each)

Implementation:
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Handles function vs. class execution
- load_manifest(): Loads flash_manifest.json for service discovery
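The factory and its serialization helpers can be sketched as below. The real module uses cloudpickle (which also handles lambdas and closures); stdlib pickle is a drop-in for this illustration, and the event/response shapes are assumptions based on the description above:

```python
import base64
import pickle
import traceback


def serialize_result(result) -> str:
    # Pickle + base64 so arbitrary objects survive JSON transport.
    return base64.b64encode(pickle.dumps(result)).decode("utf-8")


def deserialize_arguments(payload: dict):
    args = pickle.loads(base64.b64decode(payload["args"])) if payload.get("args") else ()
    kwargs = pickle.loads(base64.b64decode(payload["kwargs"])) if payload.get("kwargs") else {}
    return args, kwargs


def create_handler(function_registry: dict):
    """Factory returning a RunPod-style handler closed over the registry."""

    def handler(event: dict) -> dict:
        try:
            fn = function_registry[event["function_name"]]
            args, kwargs = deserialize_arguments(event.get("input", {}))
            return {"success": True, "result": serialize_result(fn(*args, **kwargs))}
        except Exception as exc:
            # Structured error response with full traceback, not a raised exception.
            return {"success": False, "error": str(exc),
                    "traceback": traceback.format_exc()}

    return handler
```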

* feat(cli): Add handler generator, manifest builder, and scanner for build process

Implement the build pipeline components that work together to generate
serverless handlers from @Remote decorated functions.

Three core components:

1. RemoteDecoratorScanner (scanner.py)
   - Uses Python AST to discover all @Remote decorated functions
   - Extracts function metadata: name, module, async status, is_class
   - Groups functions by resource_config for handler generation
   - Handles edge cases like decorated classes and async functions

2. ManifestBuilder (manifest.py)
   - Groups functions by their resource_config
   - Creates flash_manifest.json structure for service discovery
   - Maps functions to their modules and handler files
   - Enables cross-endpoint function routing at runtime

3. HandlerGenerator (handler_generator.py)
   - Creates lightweight handler_*.py files for each resource config
   - Each handler imports functions and registers them in FUNCTION_REGISTRY
   - Handler delegates to create_handler() factory from generic_handler
   - Generated handlers are ~23 lines (vs ~98 with duplication)

Build Pipeline Flow:
1. Scanner discovers @Remote functions
2. ManifestBuilder groups them by resource_config
3. HandlerGenerator creates handler_*.py for each group
4. All files + manifest bundled into archive.tar.gz

This eliminates ~95% duplication across handlers by using the factory pattern
instead of template-based generation.
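What HandlerGenerator emits for one resource config can be sketched as a thin rendered file: register functions, delegate to the factory. The import path, helper name, and module names below are illustrative assumptions, not the generator's actual template:

```python
HANDLER_TEMPLATE = '''\
"""Auto-generated handler for resource config: {resource_name}."""
import runpod
from tetra_rp.core.runtime.generic_handler import create_handler  # assumed path

{imports}

FUNCTION_REGISTRY = {{
{registry_entries}
}}

handler = create_handler(FUNCTION_REGISTRY)
runpod.serverless.start({{"handler": handler}})
'''


def render_handler(resource_name: str, functions: dict) -> str:
    """functions maps function name -> module path."""
    imports = "\n".join(f"from {mod} import {name}" for name, mod in functions.items())
    entries = "\n".join(f'    "{name}": {name},' for name in functions)
    return HANDLER_TEMPLATE.format(
        resource_name=resource_name, imports=imports, registry_entries=entries
    )
```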

* test(runtime): Add comprehensive tests for generic handler

Implement 19 unit tests covering all major paths through the generic_handler
factory and its helper functions.

Test Coverage:

Serialization/Deserialization (7 tests):
- serialize_result() with simple values, dicts, lists
- deserialize_arguments() with empty, args-only, kwargs-only, mixed inputs
- Round-trip encoding/decoding of cloudpickle + base64

Function Execution (4 tests):
- Simple function execution with positional and keyword arguments
- Keyword argument handling
- Class instantiation and method calls
- Argument passing to instance methods

Handler Factory (8 tests):
- create_handler() returns callable RunPod handler
- Handler with simple function registry
- Missing function error handling (returns error response, not exception)
- Function exceptions caught with traceback included
- Multiple functions in single registry
- Complex Python objects (classes, lambdas, closures)
- Empty registry edge case
- Default execution_type parameter
- None return values
- Correct RunPod response format (success, result/error, traceback)

Test Strategy:
- Arrange-Act-Assert pattern for clarity
- Isolated unit tests (no external dependencies)
- Tests verify behavior, not implementation
- Error cases tested for proper error handling
- All serialization tested for round-trip correctness

All tests passing, 83% coverage on generic_handler.py

* test(cli): Add tests for handler generation, manifest building, and scanning

Implement integration tests validating the build pipeline components work
correctly together.

Test Coverage:

HandlerGenerator Tests:
- Handler files created with correct names (handler_<resource_name>.py)
- Generated files import required functions from workers
- FUNCTION_REGISTRY properly formatted
- create_handler() imported from generic_handler
- Handler creation via factory
- RunPod start call present and correct
- Multiple handlers generated for multiple resource configs

ManifestBuilder Tests:
- Manifest structure with correct version and metadata
- Resources grouped by resource_config
- Handler file paths correct
- Function metadata preserved (name, module, is_async, is_class)
- Function registry mapping complete

Scanner Tests:
- @Remote decorated functions discovered via AST
- Function metadata extracted correctly
- Module paths resolved properly
- Async functions detected
- Class methods detected
- Edge cases handled (multiple decorators, nested classes)

Test Strategy:
- Integration tests verify components work together
- Tests verify generated files are syntactically correct
- Tests validate data structures match expected schemas
- No external dependencies in build process

Validates that the entire build pipeline:
1. Discovers functions correctly
2. Groups them appropriately
3. Generates valid Python handler files
4. Creates correct manifest structure

* docs(runtime): Document generic handler factory architecture

Add comprehensive architecture documentation explaining why the factory
pattern was chosen and how it works.

Documentation includes:

Overview & Context:
- Problem statement: Handler files had 95% duplication
- Design decision: Use factory function instead of templates
- Benefits: Single source of truth, easier maintenance, consistency

Architecture Diagrams (MermaidJS):
- High-level flow: @Remote functions → Scanner → Manifest → Handlers → Factory
- Component relationships: HandlerGenerator, GeneratedHandler, generic_handler
- Function registry pattern: Discovery → Grouping → Registration → Factory

Implementation Details:
- create_handler(function_registry) signature and behavior
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Function vs. class execution
- load_manifest(): Service discovery via flash_manifest.json

Design Decisions (with rationale):
- Factory Pattern over Inheritance: Simpler, less coupling, easier to test
- CloudPickle + Base64: Handles arbitrary objects, safe JSON transmission
- Manifest in Generic Handler: Runtime service discovery requirement
- Structured Error Responses: Debugging aid, functional error handling
- Both Execution Types: Supports stateful classes and pure functions

Usage Examples:
- Simple function handler
- Class execution with methods
- Multiple functions in one handler

Build Process Integration:
- 4-phase pipeline: Scanner → Grouping → Generation → Packaging
- Manifest structure and contents
- Generated handler structure (~23 lines)

Testing Strategy:
- 19 unit tests covering all major paths
- 7 integration tests verifying handler generation
- Manual testing with example applications

Performance:
- Zero runtime penalty (factory called once at startup)
- No additional indirection in request path

* docs(cli): Add flash build command documentation

Document the flash build command and update CLI README to include it.

New Documentation:

flash-build.md includes:

Usage & Options:
- Command syntax: flash build [OPTIONS]
- --no-deps: Skip transitive dependencies (faster, smaller archives)
- --keep-build: Keep build directory for inspection/debugging
- --output, -o: Custom archive name (default: archive.tar.gz)

What It Does (5-step process):
1. Discovery: Scan for @Remote decorated functions
2. Grouping: Group functions by resource_config
3. Handler Generation: Create lightweight handler files
4. Manifest Creation: Generate flash_manifest.json
5. Packaging: Create archive.tar.gz for deployment

Build Artifacts:
- .flash/archive.tar.gz: Deployment package (ready for RunPod)
- .flash/flash_manifest.json: Service discovery configuration
- .flash/.build/: Temporary build directory

Handler Generation:
- Explains factory pattern and minimal handler files
- Links to Runtime_Generic_Handler.md for details

Dependency Management:
- Default behavior: Install all dependencies including transitive
- --no-deps: Only direct dependencies (when base image has transitive)
- Trade-offs explained

Cross-Endpoint Function Calls:
- Example showing GPU and CPU endpoints
- Manifest enables routing automatically

Output & Troubleshooting:
- Sample build output with progress indicators
- Common failure scenarios and solutions
- How to debug with --keep-build

Next Steps:
- Test locally with flash run
- Deploy to RunPod
- Monitor with flash undeploy list

Updated CLI README.md:
- Added flash build to command list in sequence
- Links to full flash-build.md documentation

* docs: Add build process and handler generation section to README

Add a new section explaining how the build system works and why the
factory pattern reduces code duplication.

New Section: Build Process and Handler Generation

Explains:

How Flash Builds Your Application (5-step pipeline):
1. Discovery: Scans code for @Remote decorated functions
2. Grouping: Groups functions by resource_config
3. Handler Generation: Creates lightweight handler files
4. Manifest Creation: Generates flash_manifest.json for service discovery
5. Packaging: Bundles everything into archive.tar.gz

Handler Architecture (with code example):
- Shows generated handler using factory pattern
- Single source of truth: All handler logic in one place
- Easier maintenance: Bug fixes don't require rebuilding projects

Cross-Endpoint Function Calls:
- Example of GPU and CPU endpoints calling each other
- Manifest and runtime wrapper handle service discovery

Build Artifacts:
- .flash/.build/: Temporary build directory
- .flash/archive.tar.gz: Deployment package
- .flash/flash_manifest.json: Service configuration

Links to detailed documentation:
- docs/Runtime_Generic_Handler.md for architecture details
- src/tetra_rp/cli/docs/flash-build.md for CLI reference

This section bridges the main README and detailed documentation,
providing an entry point for new users discovering the build system.

* feat(cli): Integrate build utilities into flash build command

Wire up the handler generator, manifest builder, and scanner into the
actual flash build command implementation.

Changes to build.py:

1. Integration:
   - Import RemoteDecoratorScanner for function discovery
   - Import ManifestBuilder for manifest creation
   - Import HandlerGenerator for handler file creation
   - Call these in sequence during the build process

2. Build Pipeline:
   - After copying project files, scan for @Remote functions
   - Build manifest from discovered functions
   - Generate handler files for each resource config
   - Write manifest to build directory
   - Progress indicators show what's being generated

3. Fixes:
   - Change .tetra directory references to .flash
   - Uncomment actual build logic (was showing "Coming Soon" message)
   - Fix progress messages to show actual file counts

4. Error Handling:
   - Try/except around handler generation
   - Warning shown if generation fails but build continues
   - User can debug with --keep-build flag

Build Flow Now:
1. Load ignore patterns
2. Collect project files
3. Create build directory
4. Copy files to build directory
5. [NEW] Scan for @Remote functions
6. [NEW] Build and write manifest
7. [NEW] Generate handler files
8. Install dependencies
9. Create archive
10. Clean up build directory (unless --keep-build)

Dependencies:
- Updated uv.lock with all required dependencies

* refactor(build): Fix directory structure and add comprehensive error handling

**Critical Fixes:**
- Remove "Coming Soon" message blocking build command execution
- Fix build directory to use .flash/.build/ directly (no app_name subdirectory)
- Fix tarball to extract with flat structure using arcname="."
- Fix cleanup to remove correct build directory

**Error Handling & Validation:**
- Add specific exception handling (ImportError, SyntaxError, ValueError)
- Add import validation to generated handlers
- Add duplicate function name detection across resources
- Add proper error logging throughout build process

**Resource Type Tracking:**
- Add resource_type field to RemoteFunctionMetadata
- Track actual resource types (LiveServerless, CpuLiveServerless)
- Use actual types in manifest instead of hardcoding

**Robustness Improvements:**
- Add handler import validation post-generation
- Add manifest path fallback search (cwd, module dir, legacy location)
- Add resource name sanitization for safe filenames
- Add specific exception logging in scanner (UnicodeDecodeError, SyntaxError)

**User Experience:**
- Add troubleshooting section to README
- Update manifest path documentation in docs
- Change "Zero Runtime Penalty" to "Minimal Runtime Overhead"
- Mark future enhancements as "Not Yet Implemented"
- Improve build success message with next steps

Fixes the issues identified in code review (issues #1-13, #19-22)

* feat(resources): Add LoadBalancerSlsResource for LB endpoints

Implement LoadBalancerSlsResource class for provisioning RunPod load-balanced
serverless endpoints. Load-balanced endpoints expose HTTP servers directly to
clients without queue-based processing, enabling REST APIs, webhooks, and
real-time communication patterns.

Key features:
- Type enforcement (always LB, never QB)
- Scaler validation (REQUEST_COUNT required, not QUEUE_DELAY)
- Health check polling via /ping endpoint (200/204 = healthy)
- Post-deployment verification with configurable retries
- Async and sync health check methods
- Comprehensive unit tests
- Full documentation with architecture diagrams and examples

Architecture:
- Extends ServerlessResource with LB-specific behavior
- Validates configuration before deployment
- Polls /ping endpoint until healthy (10 retries × 5s = 50s timeout)
- Raises TimeoutError if endpoint fails to become healthy

This forms the foundation for Mothership architecture where a load-balanced
endpoint serves as a directory server for child endpoints.
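The polling behavior described above might look roughly like this sketch (function and parameter names are assumptions, not the real method names):

```python
import time

def wait_until_healthy(check_ping, retries=10, delay=5.0):
    """Poll a /ping-style health check until it returns 200 or 204.

    Defaults mirror the description above: 10 retries x 5s = 50s timeout.
    """
    for _ in range(retries):
        if check_ping() in (200, 204):
            return True
        time.sleep(delay)
    raise TimeoutError("endpoint failed to become healthy")
```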

* fix(test): Fix LoadBalancerSlsResource deployment test mocks

Import ServerlessResource directly and use patch.object on the imported class
instead of string-based patches. This ensures the mocks properly intercept the
parent class's _do_deploy method when called via super(). Simplifies mock
configuration and removes an unused variable assertion.

Fixes the three failing deployment tests that were making real GraphQL API calls.
All tests now pass: 418 passed, 1 skipped.

* feat(resources): Phase 1 - Core infrastructure for @Remote on LB endpoints

Implement core infrastructure for enabling @Remote decorator on
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Changes:
- Create LoadBalancerSlsStub: HTTP-based stub for direct endpoint execution
  (src/tetra_rp/stubs/load_balancer_sls.py, 170 lines)
  - Serializes functions and arguments using cloudpickle + base64
  - Direct HTTP POST to /execute endpoint (no queue polling)
  - Proper error handling and deserialization

- Register stub with singledispatch (src/tetra_rp/stubs/registry.py)
  - Enables @Remote to dispatch to LoadBalancerSlsStub for LB resources

- Extend @Remote decorator with HTTP routing parameters (src/tetra_rp/client.py)
  - Add 'method' parameter: GET, POST, PUT, DELETE, PATCH
  - Add 'path' parameter: /api/endpoint routes
  - Validate method/path required for LoadBalancerSlsResource
  - Store routing metadata on decorated functions/classes
  - Warn if routing params used with non-LB resources

Foundation for Phase 2 (Build system integration) and Phase 3 (Local dev).
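The serialization step the stub performs can be sketched as a round trip. Stdlib `pickle` stands in here for cloudpickle so the example is self-contained; the payload field names are illustrative:

```python
import base64
import pickle  # the real stub uses cloudpickle; stdlib pickle stands in here

def encode_payload(args, kwargs):
    """Serialize call arguments into base64 strings safe for a JSON body."""
    return {
        "args": base64.b64encode(pickle.dumps(args)).decode("ascii"),
        "kwargs": base64.b64encode(pickle.dumps(kwargs)).decode("ascii"),
    }

def decode_payload(payload):
    """Inverse of encode_payload, as the /execute handler would apply it."""
    return (
        pickle.loads(base64.b64decode(payload["args"])),
        pickle.loads(base64.b64decode(payload["kwargs"])),
    )
```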

* feat(build): Phase 2.1 - Enhanced scanner for HTTP routing extraction

Update RemoteDecoratorScanner to extract HTTP method and path from
@Remote decorator for LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to RemoteFunctionMetadata
- Add _extract_http_routing() method to parse decorator keywords
- Extract method (GET, POST, PUT, DELETE, PATCH) from decorator
- Extract path (/api/process) from decorator
- Store routing metadata for manifest generation

Foundation for Phase 2.2 (Manifest updates) and Phase 2.3 (Handler generation).

* feat(build): Phase 2.2 - Updated manifest schema for HTTP routing

Enhance ManifestBuilder to support HTTP method/path routing for
LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to ManifestFunction
- Validate LB endpoints have both method and path
- Detect and prevent route conflicts (same method + path)
- Prevent use of reserved paths (/execute, /ping)
- Add 'routes' section to manifest for LB endpoints
- Conditional inclusion of routing fields (only for LB)

Manifest structure for LB endpoints now includes:
{
  "resources": {
    "api_service": {
      "resource_type": "LoadBalancerSlsResource",
      "functions": [
        {
          "name": "process_data",
          "http_method": "POST",
          "http_path": "/api/process"
        }
      ]
    }
  },
  "routes": {
    "api_service": {
      "POST /api/process": "process_data"
    }
  }
}
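Given that structure, route resolution at request time is a dictionary lookup. A minimal sketch, assuming the manifest is loaded as a plain dict:

```python
manifest = {
    "routes": {
        "api_service": {"POST /api/process": "process_data"},
    },
}

def resolve_route(manifest, resource, method, path):
    """Map an incoming (method, path) pair back to the function name."""
    return manifest["routes"][resource].get(f"{method.upper()} {path}")
```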

* feat(cli): Add LB handler generator for FastAPI app creation

Implement LBHandlerGenerator to create FastAPI applications for
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Key features:
- Generates FastAPI apps with explicit route registry
- Creates (method, path) -> function mappings from manifest
- Validates route conflicts and reserved paths
- Imports user functions and creates dynamic routes
- Includes required /ping health check endpoint
- Validates generated handler Python syntax via import

Generated handler structure enables:
- Direct HTTP routing to user functions via FastAPI
- Framework /execute endpoint for @Remote stub execution
- Local development with uvicorn

* feat(runtime): Implement LB handler factory for FastAPI app creation

Create create_lb_handler() factory function that dynamically builds FastAPI
applications from route registries for LoadBalancerSlsResource endpoints.

Key features:
- Accepts route_registry: Dict[(method, path)] -> handler_function mapping
- Registers all user-defined routes from registry to FastAPI app
- Provides /execute endpoint for @Remote stub function execution
- Handles async function execution automatically
- Serializes results with cloudpickle + base64 encoding
- Comprehensive error handling with detailed logging

The /execute endpoint enables:
- Remote function code execution via @Remote decorator
- Automatic argument deserialization from cloudpickle/base64
- Result serialization for transmission back to client
- Support for both sync and async functions
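The route-registry idea can be shown framework-agnostically; the real factory registers these routes on a FastAPI app rather than dispatching from a plain dict, and the `include_execute` default shown is an assumption drawn from the description above:

```python
def build_dispatcher(route_registry, include_execute=False):
    """Sketch of a (method, path) -> handler registry with the framework routes."""
    routes = {(m.upper(), p): fn for (m, p), fn in route_registry.items()}
    routes[("GET", "/ping")] = lambda: {"status": "healthy"}  # required health check
    if include_execute:  # only enabled for local development (LiveLoadBalancer)
        routes[("POST", "/execute")] = lambda: {"status": "execute enabled"}

    def dispatch(method, path, *args, **kwargs):
        handler = routes.get((method.upper(), path))
        if handler is None:
            raise LookupError(f"no route for {method} {path}")
        return handler(*args, **kwargs)

    return dispatch
```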

* feat(cli): Route build command to separate handlers for LB endpoints

Update build command to use appropriate handler generators based on
resource type. Separates LoadBalancerSlsResource endpoints (using FastAPI)
from queue-based endpoints (using generic handler).

Changes:
- Import LBHandlerGenerator alongside HandlerGenerator
- Inspect manifest resources and separate by type
- Generate LB handlers via LBHandlerGenerator
- Generate QB handlers via HandlerGenerator
- Combine all generated handler paths for summary

Enables users to mix LB and QB endpoints in same project with correct
code generation for each resource type.

* feat(resources): Add LiveLoadBalancer for local LB endpoint testing

Implement LiveLoadBalancer resource following the LiveServerless pattern
for local development and testing of load-balanced endpoints.

Changes:
- Add TETRA_LB_IMAGE constant for load-balanced Tetra image
- Create LiveLoadBalancer class extending LoadBalancerSlsResource
- Uses LiveServerlessMixin to lock imageName to Tetra LB image
- Register LiveLoadBalancer with LoadBalancerSlsStub in singledispatch
- Export LiveLoadBalancer from core.resources and top-level __init__

This enables users to test LB-based functions locally before deploying,
using the same pattern as LiveServerless for queue-based endpoints.

Users can now write:
  from tetra_rp import LiveLoadBalancer, remote

  api = LiveLoadBalancer(name="test-api")

  @remote(api, method="POST", path="/api/process")
  async def process_data(x, y):
      return {"result": x + y}

  result = await process_data(5, 3)  # Local execution

* test(stubs): Add comprehensive unit tests for LoadBalancerSlsStub

Implement unit tests for LoadBalancerSlsStub covering:
- Request preparation with arguments and dependencies
- Response handling for success and error cases
- Error handling for invalid responses
- Base64 encoding/decoding of serialized data
- Endpoint URL validation
- Timeout and HTTP error handling

Test coverage:
- _prepare_request: 4 tests
- _handle_response: 5 tests
- _execute_function: 3 error case tests
- __call__: 2 integration tests

Tests verify proper function serialization, argument handling,
error propagation, and response deserialization.

* fix(test): Correct LB endpoint test decorator to match assertions

Fix test_load_balancer_vs_queue_based_endpoints by updating the @Remote
decorator to use method='POST' and path='/api/echo' to match the test
assertions. This was a test-level bug where the decorator definition
didn't match what was being asserted.

* docs: Add comprehensive documentation for @Remote with LoadBalancer endpoints

- Using_Remote_With_LoadBalancer.md: User guide for HTTP routing, local development, building and deploying
- LoadBalancer_Runtime_Architecture.md: Technical details on deployment, request flows, security, and performance
- Updated README.md with LoadBalancer section and code example
- Updated Load_Balancer_Endpoints.md with cross-references to new guides

* security: Remove /execute from deployed LoadBalancer endpoints

Split @Remote execution behavior between local and deployed:
- LiveLoadBalancer (local): Uses /execute endpoint for function serialization
- LoadBalancerSlsResource (deployed): Uses user-defined routes with HTTP param mapping

Changes:
1. LoadBalancerSlsStub routing detection:
   - _should_use_execute_endpoint() determines execution path
   - _execute_via_user_route() maps args to JSON and POSTs to user routes
   - Auto-detects resource type and routing metadata

2. Conditional /execute registration:
   - create_lb_handler() now accepts include_execute parameter
   - Generated handlers default to include_execute=False (security)
   - LiveLoadBalancer can enable /execute if needed

3. Updated handler generator:
   - Added clarity comments on /execute exclusion for deployed endpoints

4. Comprehensive test coverage:
   - 8 new tests for routing detection and execution paths
   - All 31 tests passing (22 unit + 9 integration)

5. Documentation updates:
   - Using_Remote_With_LoadBalancer.md: clarified /execute scope
   - Added 'Local vs Deployed Execution' section explaining differences
   - LoadBalancer_Runtime_Architecture.md: updated execution model
   - Added troubleshooting for deployed endpoint scenarios

Security improvement:
- Deployed endpoints only expose user-defined routes
- /execute endpoint removed from production (prevents arbitrary code execution)
- Lower attack surface for deployed endpoints
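The argument-to-JSON mapping that `_execute_via_user_route()` performs can be sketched with `inspect.signature`; the stub's actual mapping logic may differ, and `process_data` here is just an example function:

```python
import inspect

def map_args_to_json(fn, args, kwargs):
    """Bind positional and keyword arguments to parameter names so the call
    can be sent as a JSON body to a user-defined route."""
    bound = inspect.signature(fn).bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)

def process_data(x, y, scale=1):
    return {"result": (x + y) * scale}
```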

* feat(build): Phase 4 - Fix LiveLoadBalancer handler generation to include /execute endpoint

- Modified manifest.py to validate LiveLoadBalancer endpoints like LoadBalancerSlsResource
- Updated lb_handler_generator to:
  - Include LiveLoadBalancer in handler generation filter
  - Pass include_execute=True for LiveLoadBalancer (local dev)
  - Pass include_execute=False for LoadBalancerSlsResource (deployed)
- Added integration tests:
  - Verify LiveLoadBalancer handlers include /execute endpoint
  - Verify deployed handlers exclude /execute endpoint
- Fixes critical bug: LiveLoadBalancer now gets /execute endpoint in generated handlers

* fix(scanner): Discover LoadBalancer resources in addition to Serverless resources

- Updated scanner to extract LiveLoadBalancer and LoadBalancerSlsResource resources
- Previously only looked for 'Serverless' in class name, missing LoadBalancer endpoints
- Now checks for both 'Serverless' and 'LoadBalancer' in resource type names
- Added integration test to verify scanner discovers both resource types
- Fixes critical bug that prevented flash build from finding LoadBalancer endpoints

* chore: Format code for line length and remove unused imports

- Wrap long lines in manifest.py, lb_handler.py, and load_balancer_sls.py
- Remove unused httpx import in test_load_balancer_sls_stub.py
- Apply consistent formatting across codebase

* fix: Address PR #131 review feedback

- Scanner: Use exact type name matching instead of substring matching
  - Whitelist specific resource types to avoid false positives
  - Prevents matching classes like 'MyServerlessHelper' or 'LoadBalancerUtils'

- Type hints: Use Optional[str] for nullable fields in manifest
  - ManifestFunction.http_method and http_path now properly typed

- Timeout: Make HTTP client timeout configurable
  - Added LoadBalancerSlsStub.DEFAULT_TIMEOUT class attribute
  - Added timeout parameter to __init__
  - Updated both _execute_function and _execute_via_user_route to use self.timeout

- Deprecated datetime: Replace datetime.utcnow() with datetime.now(timezone.utc)
  - Updated manifest.py and test_lb_remote_execution.py
  - Ensures Python 3.12+ compatibility

* style: Format datetime chaining for line length

* fix: LiveLoadBalancer template not serialized to RunPod GraphQL

The set_serverless_template model_validator was being overwritten by sync_input_fields
(both had mode="after"). In Pydantic v2, when two validators with the same mode are
defined in a class, only one is registered.

This caused templates to never be created from imageName, resulting in:
  "GraphQL errors: One of templateId, template is required to create an endpoint"

Solution:
- Move set_serverless_template validator from ServerlessResource base class to subclasses
  (ServerlessEndpoint and LoadBalancerSlsResource) where the validation is actually needed
- Keep helper methods (_create_new_template, _configure_existing_template) in base class
  for reuse
- Add comprehensive tests for LiveLoadBalancer template serialization

This allows:
1. Base ServerlessResource to be instantiated freely for testing/configuration
2. Subclasses (ServerlessEndpoint, LoadBalancerSlsResource) to enforce template
   requirements during deployment
3. Proper template serialization in GraphQL payload for RunPod API

Fixes: One of templateId, template is required to create an endpoint error when
deploying LiveLoadBalancer with custom image tags like runpod/tetra-rp-lb:local

* fix: LoadBalancer endpoint URL and add CPU support

- Fix: Use correct endpoint URL format for load-balanced endpoints
  (https://{id}.api.runpod.ai instead of https://api.runpod.ai/v2/{id})
  This fixes 404 errors on /ping health check endpoints

- Feature: Add CPU LoadBalancer support
  * Create CpuLoadBalancerSlsResource for CPU-based load-balanced endpoints
  * Create CpuLiveLoadBalancer for local CPU LB development
  * Add TETRA_CPU_LB_IMAGE constant for CPU LB Docker image
  * Update example code to use CpuLiveLoadBalancer for CPU worker
  * Add 8 comprehensive tests for CPU LoadBalancer functionality

- Tests: Add 2 tests for endpoint URL format validation
- All 474 tests passing, 64% code coverage

* fix: Export CpuLiveLoadBalancer and CpuLoadBalancerSlsResource from tetra_rp package

LoadBalancer resources were not being discovered by ResourceDiscovery because
the new CPU variants (CpuLiveLoadBalancer, CpuLoadBalancerSlsResource) were
not exported from the main tetra_rp package. This prevented undeploy from
picking up these resources.

Added exports to:
- TYPE_CHECKING imports for type hints
- __getattr__ function for lazy loading
- __all__ list for public API

This fixes the issue where 'flash undeploy list' could not find LoadBalancer
resources that were deployed with 'flash run --auto-provision'.

* fix: Add API key authentication to LoadBalancer health check

The /ping endpoint for RunPod load-balanced endpoints requires the
RUNPOD_API_KEY header for authentication. Without it, the health check
fails with 401 Unauthorized, causing provisioning to timeout.

This fix adds the Authorization header to the health check request if
the RUNPOD_API_KEY environment variable is available, allowing the
endpoint health check to succeed during provisioning.

Fixes issue where 'flash run --auto-provision' would fail even though
the endpoint was successfully created on RunPod.

* fix(lb): Exclude flashboot from CpuLoadBalancerSlsResource GraphQL payload

CpuLoadBalancerSlsResource was overriding _input_only without including flashboot,
causing it to be sent to the RunPod GraphQL API which doesn't accept this field.
This caused deployment to fail with: Field "flashboot" is not defined by type "EndpointInput".

* fix(lb): Expand CpuInstanceType.ANY to all CPU flavors in CpuLoadBalancerSlsResource

Add field_validator to expand [CpuInstanceType.ANY] to all available CPU instance
types (cpu3g, cpu3c, cpu5c variants). This matches the behavior in CpuServerlessEndpoint
and prevents deployment errors like 'instanceId must be in the format of flavorId-vcpu-ram'.
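A plain-function sketch of the validator's effect; the real code is a Pydantic `field_validator` on `CpuEndpointMixin`, the string `"any"` stands in for `CpuInstanceType.ANY`, and the flavor list below is illustrative, not the full set:

```python
ALL_CPU_INSTANCE_IDS = ["cpu3g-2-8", "cpu3c-2-4", "cpu5c-2-4"]  # format: flavorId-vcpu-ram
ANY = "any"  # stand-in for CpuInstanceType.ANY

def expand_instance_ids(instance_ids):
    """Expand the ANY sentinel to every concrete CPU flavor; pass others through."""
    if instance_ids == [ANY]:
        return list(ALL_CPU_INSTANCE_IDS)
    return instance_ids
```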

* refactor(cpu): Move instanceIds validator to CpuEndpointMixin

Move the instanceIds field_validator from CpuServerlessEndpoint to CpuEndpointMixin
so both CpuServerlessEndpoint and CpuLoadBalancerSlsResource share the same validator
that expands [CpuInstanceType.ANY] to all available CPU flavors. This eliminates
code duplication and ensures consistent behavior across all CPU endpoint types.

* test: Update CPU instance test to reflect validator expansion

Update test_cpu_live_load_balancer_defaults_to_cpu_any to verify that
[CpuInstanceType.ANY] is correctly expanded to all available CPU instance types
by the field_validator in CpuEndpointMixin.

* fix(lb): Increase health check timeout from 5s to 15s

Load-balanced workers need more time to respond during cold starts and initialization.
RunPod docs recommend at least 10-15 second timeouts for health checks. Workers may
return 204 during initialization, which is normal and expected.

* fix(lb): Fix CPU load balancer template deployment error

Fixes two bugs in CpuLoadBalancerSlsResource that prevented CPU load balancers from deploying:

1. Added gpuCount and allowedCudaVersions to _input_only exclusion set to prevent
   GPU-specific fields from being sent to RunPod API
2. Overrode set_serverless_template() to call _sync_cpu_fields() first, ensuring
   GPU defaults are overridden to CPU-appropriate values (gpuCount=0)

The RunPod API was rejecting CPU load balancer templates because GPU-specific
fields were being included in the GraphQL payload. These changes align
CpuLoadBalancerSlsResource behavior with CpuServerlessEndpoint.

Also added comprehensive test coverage (30+ tests) to verify:
- GPU fields are correctly overridden to CPU defaults
- GPU fields are excluded from API payloads
- CPU-specific fields are properly included
- Consistency with CpuServerlessEndpoint behavior

* fix(drift): Exclude runtime fields from config hash to prevent false positives

Fixes false positive configuration drift detection by separating concerns:

1. Update ServerlessResource.config_hash to exclude runtime fields
   - Fields like template, templateId, aiKey, userId are API-assigned
   - Prevents false drift when same config is redeployed across processes
   - Now only hashes user-specified configuration

2. Add config_hash override to CpuLoadBalancerSlsResource
   - CPU load balancers hash only CPU-relevant fields
   - Excludes GPU-specific fields and runtime fields
   - Follows same pattern as CpuServerlessEndpoint

3. Fix _has_structural_changes to exclude template/templateId
   - CRITICAL: These runtime fields were causing false structural changes
   - Was forcing unnecessary redeployments despite update() being available
   - Now system correctly uses update() instead of undeploy+deploy

4. Make field serializers robust to handle string/enum values
   - Prevents serialization errors when fields are pre-converted to strings

5. Add comprehensive drift detection tests (16 tests)
   - Test hash stability with runtime field changes
   - Test exclusion of env, template, templateId, and other runtime fields
   - Test that actual config changes (image, flashboot) are detected
   - Test structural change detection behavior
   - Test real-world deployment scenarios

Results:
- Same config deployed multiple times: no false drift
- Different env vars with same config: no false drift
- Template/templateId changes: no false drift
- API-assigned fields: no false drift
- User config changes (image, flashboot): drift detected correctly
- All 512 unit tests pass
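The hashing idea can be sketched as follows. The excluded-field set here is an illustrative subset; the real `EXCLUDED_HASH_FIELDS` constant lives on `ServerlessResource`:

```python
import hashlib
import json

# Illustrative subset of API-assigned runtime fields excluded from hashing.
EXCLUDED_HASH_FIELDS = {"template", "templateId", "aiKey", "userId"}

def config_hash(config):
    """Hash only user-specified configuration so API-assigned runtime
    fields never register as drift."""
    stable = {k: v for k, v in config.items() if k not in EXCLUDED_HASH_FIELDS}
    payload = json.dumps(stable, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()
```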

* fix(http): Standardize RunPod HTTP client authentication across codebase

Centralizes HTTP client creation for RunPod load-balanced endpoints to prevent
manual Authorization header code duplication and ensure consistent authentication:

1. Create centralized HTTP utility function (src/tetra_rp/core/utils/http.py)
   - New function: get_authenticated_httpx_client()
   - Automatically adds Bearer token Authorization header if RUNPOD_API_KEY set
   - Provides consistent timeout handling (default 30s, customizable)
   - Follows existing GraphQL/REST client authentication pattern

2. Fix critical authentication bug in LoadBalancerSlsStub._execute_via_user_route()
   - Previously: Missing Authorization header (401 errors on user routes)
   - Now: Uses centralized utility for proper authentication
   - Enables direct HTTP calls to user-defined routes with auth

3. Refactor two methods to use centralized utility
   - LoadBalancerSlsStub._execute_function() - removes 7+ lines of manual auth code
   - LoadBalancerSlsResource._check_ping_endpoint() - simplifies auth setup

4. Add comprehensive unit tests (tests/unit/core/utils/test_http.py)
   - Tests API key presence/absence handling
   - Tests custom and default timeout configuration
   - Tests edge cases (empty key, zero timeout)
   - All 7 tests pass with 100% coverage

Results:
- Single source of truth for HTTP authentication (centralized utility)
- Fixes 401 Unauthorized errors on load-balanced endpoints
- Eliminates repetitive manual auth code across 3+ locations
- Easier to maintain and update authentication patterns in future
- All 499 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)
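The header logic described above might look roughly like this; the real utility, `get_authenticated_httpx_client()`, returns a configured httpx client rather than a headers dict:

```python
import os

def build_auth_headers():
    """Add a Bearer token Authorization header when RUNPOD_API_KEY is set."""
    headers = {}
    api_key = os.environ.get("RUNPOD_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```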

* feat(http): Extend HTTP utilities to cover both sync and async authentication

Extends the centralized HTTP authentication pattern to all RunPod API calls:

1. Add get_authenticated_requests_session() for synchronous requests
   - Creates requests.Session with automatic Bearer token Authorization header
   - Follows same pattern as async get_authenticated_httpx_client()
   - Single source of truth for sync HTTP authentication

2. Refactor template.py to use centralized utility
   - Removes manual Authorization header setup (line 86)
   - Now uses get_authenticated_requests_session() for all template updates
   - Improves error handling with raise_for_status()
   - Token parameter marked deprecated; uses RUNPOD_API_KEY env var

3. Add comprehensive tests for sync utility (4 tests)
   - Tests API key presence/absence handling
   - Tests empty API key edge case
   - Tests Session object validation
   - All tests pass with proper cleanup

Benefits:
- True single source of truth for all RunPod HTTP authentication (sync + async)
- Consistent patterns across entire codebase
- Easier future auth changes across all HTTP client types
- Eliminates manual auth header code in template.py
- All 503 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

Note: requests.Session doesn't support default timeouts; timeout should be
specified per request (e.g., session.post(url, json=data, timeout=30.0))

* fix: Address PR feedback on HTTP utilities implementation

Addresses three feedback items from code review:

1. Fix breaking parameter order change in update_system_dependencies()
   - Restored original parameter order: template_id, token, system_dependencies, base_entry_cmd
   - Maintains backward compatibility with existing callers
   - Token parameter now optional (default None)

2. Add proper deprecation warning for token parameter
   - Issues DeprecationWarning when token parameter is used
   - Clearly communicates migration to RUNPOD_API_KEY environment variable
   - Follows Python deprecation best practices (warnings.warn with stacklevel=2)

3. Standardize test mocking approach across all health check tests
   - All tests now use consistent 'tetra_rp.core.utils.http.httpx.AsyncClient' patching
   - Removed inconsistent 'side_effect=lambda' pattern
   - Improved test maintainability by using same strategy everywhere

All 503 tests pass with consistent, clean implementation.
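The deprecation pattern from item 2 can be sketched like this, with the signature approximated from the description above (the real function body is elided):

```python
import warnings

def update_system_dependencies(template_id, token=None, system_dependencies=None,
                               base_entry_cmd=None):
    """Sketch of the deprecation pattern for the legacy token parameter."""
    if token is not None:
        warnings.warn(
            "The 'token' parameter is deprecated; set the RUNPOD_API_KEY "
            "environment variable instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
    # ... perform the template update ...
```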

* refactor(drift): Extract runtime field constants and improve maintainability

- Extract RUNTIME_FIELDS and EXCLUDED_HASH_FIELDS as ClassVar constants in ServerlessResource for centralized field list management
- Add clarifying comments to enum serializers explaining defensive isinstance() checks for nested model serialization
- Document CPU load balancer field list coupling in docstring with maintenance guidelines
- Add TestSerializerDefensiveBehavior class with 4 tests verifying pre-stringified enum value handling
- Use ClassVar annotation to satisfy Pydantic v2 model field requirements

This reduces maintenance burden by centralizing field definitions and improves code clarity without changing functionality.

* docs: Improve LoadBalancer documentation accuracy and completeness

- Fix health check timeout: Add clarification that timeout is 15 seconds per check
- Add HTTP authentication details explaining RUNPOD_API_KEY header injection
- Document stub decision logic for incomplete routing metadata (fallback behavior)
- Clarify function signature inspection with concrete example showing parameter mapping
- Expand /execute security explanation with explicit threats and best practices
- Add detailed parameter type constraints for deployed endpoints (supported vs unsupported)
- Add troubleshooting guide for missing routing metadata (404 errors)
- Strengthen security warnings about never exposing /execute in production

All documentation now matches actual implementation verified through codebase analysis.

* docs: add resource config drift detection documentation

- comprehensive guide on drift detection implementation
- covers hash computation, field exclusion, and cpu-specific behavior
- includes testing patterns and troubleshooting guide
- documents all fields that trigger drift vs those ignored

* docs: proper name for the file

* test(build): Add comprehensive test coverage for scanner and handler improvements

- Add 6 new scanner tests for directory filtering (.venv, .flash, .runpod exclusion)
- Add test for resource type validation to prevent false positives
- Add test for fallback behavior when resource name extraction fails
- Add test for handling resource names with special characters
- Update existing tests to reflect new dynamic import format and resource name extraction

These tests guarantee that improvements to the scanner (resource type validation,
directory filtering, fallback behavior) and handler generator (dynamic imports for
invalid Python identifiers) won't regress in future changes.

* test(scanner): Fix resource type assertions to match scanner behavior

The scanner now extracts resource names from the name= parameter rather
than using variable names. Update test assertions to expect the actual
resource names ('test-api', 'deployed-api') instead of variable names.

* chore: merge correction

* fix(drift): Remove manual undeploy/deploy from update() method

Use saveEndpoint mutation for all changes instead of manual lifecycle
management. Server-side automatically detects version-triggering fields
(GPU, template, volumes) and increments endpoint version accordingly.

Keep _has_structural_changes() as informational for logging purposes only.
This aligns with RunPod API's version-based deployment model.

* docs(drift): Clarify _has_structural_changes detects version-triggering changes

Update docstring to reflect that this method identifies changes that
trigger server-side version increment and worker recreation, not manual
redeploy cycles. Explain which changes are version-triggering vs rolling
updates, and note that the method is now informational for logging only.

* feat(drift): Enable environment variable drift detection

Remove env from EXCLUDED_HASH_FIELDS so changes to environment variables
trigger drift detection and endpoint updates. Environment changes are
non-version-triggering (rolling updates), so server will apply them via
saveEndpoint without recreating workers.

Add env to CPU LoadBalancer config_hash for consistent behavior across
all resource types. Update comments to reflect that env is user-specified
configuration, not dynamically computed.

* test(drift): Update tests for environment variable drift detection

- test_lb_config_hash_excludes_env_variables → test_lb_config_hash_detects_env_changes
- test_env_var_changes_no_drift → test_env_var_changes_trigger_drift
- test_config_hash_excludes_env_from_drift → test_config_hash_detects_env_from_drift

Update assertions to expect different hashes when env changes, matching
new behavior where environment variable changes trigger drift and updates.

* fix: Address Copilot review feedback on type hints and documentation

- Fix type annotation for timeout parameter in LoadBalancerSlsStub (Optional[float])
- Replace hardcoded "30s" with actual self.timeout in error messages (2 locations)
- Update Resource_Config_Drift_Detection.md to reflect actual EXCLUDED_HASH_FIELDS
- Remove duplicate Load-Balanced Endpoints section from README.md

Addresses Copilot review comments (PR #132, review 3642596664)

* chore: Update Python version compatibility to 3.10-3.14

- Drop Python 3.9 support (EOL)
- Ensure support for Python 3.14
- Update requires-python in pyproject.toml from >=3.9,<3.14 to >=3.10,<3.15
- Update mypy python_version from 3.9 to 3.10
- Update CI matrix to test Python 3.10, 3.11, 3.12, 3.13, 3.14

* chore: Increase code coverage requirement to 65%

* perf(tests): make parallel test execution the default

Implement AE-1748 by making parallel test execution the default for all quality checks,
achieving a 4.6x speedup (from ~96s to ~20s on 12-core machines).

Changes:
- Configure pytest-xdist for parallel test execution
- Add worker isolation fixtures to prevent file system conflicts
- Mark concurrency tests (~26 tests) as serial to avoid race conditions
- Update Makefile to make parallel execution the default
- Provide serial execution commands for debugging (quality-check-serial)

Performance:
- make quality-check: 96s → 20s (4.6x faster)
- All 719 tests pass in both parallel and serial modes
- Coverage maintained at 64%+

Technical details:
- Worker-specific temp directories via worker_temp_dir fixture
- Module-level cache clearing in reset_singletons
- State file isolation per worker via isolate_resource_state_file
- Serial markers on threading-specific tests

Rollback: Use `make quality-check-serial` if parallel execution causes issues

* refactor: remove dead code and add serialization tests

Remove unused functions and improve test coverage:
- Remove deprecated update_system_dependencies from template.py
- Remove unused utility functions from utils.py and json.py
- Add comprehensive test suite for serialization module (100% coverage)

Tests cover serialization/deserialization of args, kwargs, and error handling
for cloudpickle failures across Python 3.10-3.14.
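The serialization contract tested above can be sketched as a simple round-trip; `pickle` stands in for cloudpickle here (cloudpickle additionally handles lambdas and closures), and the function names are illustrative, not the module's actual API:

```python
import base64
import pickle

def serialize_arguments(args, kwargs):
    # Encode the (args, kwargs) pair for safe JSON transmission.
    return base64.b64encode(pickle.dumps((args, kwargs))).decode("ascii")

def deserialize_arguments(payload):
    # Decode, surfacing any unpickling failure as a ValueError.
    try:
        args, kwargs = pickle.loads(base64.b64decode(payload))
    except Exception as exc:
        raise ValueError(f"could not deserialize arguments: {exc}") from exc
    return args, kwargs
```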

* fix: regenerate uv.lock with correct dependency versions

The previous uv.lock was corrupted with an incomplete pytest-xdist==3.8.0 entry
that referenced pytest==8.4.2 which wasn't locked. Regenerating the lock file
resolves the CI/CD dependency installation failures across all Python versions.

* fix: mark TestLoadBalancerSlsStubRouting as serial

The @Remote decorator used in TestLoadBalancerSlsStubRouting modifies module-level
state and can cause race conditions when run in parallel. Mark this test class as
serial to prevent flaky failures, particularly on Python 3.10.

* fix: simplify parallel test execution - remove unnecessary two-pass approach

All tests pass with xdist parallel execution without needing to filter serial
tests. pytest-xdist handles workers independently and coverage merges properly.
Simplified Makefile to use single -n auto command for all test runs.

* fix: re-add serial marker for TestLoadBalancerSlsStubRouting

The @Remote decorator modifies module-level state that isn't properly isolated
between parallel workers. Adding the serial marker prevents race conditions on
Python 3.12 and 3.14. pytest-xdist respects the serial marker automatically.

* fix: implement proper serial test handling with two-pass execution

Add pytest hook to mark serial tests with xdist_group so they run without
parallelization. Use two-pass test execution:
1. Parallel: Run all non-serial tests with -n auto
2. Serial: Run serial tests without parallelization, appending coverage

This ensures:
- No race conditions in serial tests (file locking, @Remote decorator)
- Coverage properly merged across both passes
- Maintains ~4.6x speedup for non-serial tests

* fix: implement proper serial test handling with two-pass execution

Add pytest hook to mark serial tests with xdist_group so they run without
parallelization. Use two-pass test execution:
1. Parallel: Run all non-serial tests with -n auto (--cov-fail-under=0)
2. Serial: Run serial tests without parallelization, appending coverage

This ensures:
- No race conditions in serial tests (file locking, @Remote decorator)
- Coverage properly merged across both passes
- Maintains ~4.6x speedup for non-serial tests
- Both passes complete even if first has < 65% coverage

* chore: consistent coverage failure point

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore: this is about reporting coverage (no need to fail)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore: don't know why it was 64

* chore: make test commands parallel by default with serial variants

- All test commands now run in parallel by default using pytest-xdist auto-detect
- Serial versions available with -serial suffix for debugging
- test-parallel, test-parallel-workers, test-unit-parallel removed in favor of cleaner naming
- test-workers added as shorthand for specifying worker count
- test-fast now includes parallel execution
- Quality check commands already use parallel-by-default test-coverage

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
deanq added a commit that referenced this pull request Jan 22, 2026
* feat(runtime): Add generic handler factory for serverless execution

Implement a factory function that creates RunPod serverless handlers,
eliminating code duplication across generated handler files.

The generic_handler module provides:
- create_handler(function_registry) factory that accepts a dict of
  function/class objects and returns a RunPod-compatible handler
- Automatic serialization/deserialization using cloudpickle + base64
- Support for both function execution and class instantiation + method calls
- Structured error responses with full tracebacks for debugging
- Load manifest for cross-endpoint function discovery

This design centralizes all handler logic in one place, making it easy to:
- Fix bugs once, benefit all handlers
- Add new features without regenerating projects
- Keep deployment packages small (handler files are ~23 lines each)

Implementation:
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Handles function vs. class execution
- load_manifest(): Loads flash_manifest.json for service discovery
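A minimal sketch of the factory shape described above, assuming a job payload with base64-encoded pickled arguments (pickle stands in for cloudpickle; field names are illustrative):

```python
import base64
import pickle
import traceback

def create_handler(function_registry):
    """Return a RunPod-style handler closed over a dict of callables."""
    def handler(job):
        inp = job.get("input", {})
        name = inp.get("function_name")
        fn = function_registry.get(name)
        if fn is None:
            return {"success": False, "error": f"Unknown function: {name}"}
        try:
            # Decode arguments, execute, and re-encode the result.
            args = pickle.loads(base64.b64decode(inp["args"])) if inp.get("args") else ()
            kwargs = pickle.loads(base64.b64decode(inp["kwargs"])) if inp.get("kwargs") else {}
            result = fn(*args, **kwargs)
            payload = base64.b64encode(pickle.dumps(result)).decode("ascii")
            return {"success": True, "result": payload}
        except Exception as exc:
            # Structured error response with full traceback for debugging.
            return {"success": False, "error": str(exc), "traceback": traceback.format_exc()}
    return handler
```

Because the factory is called once at startup, generated handler files only need to build the registry and call `create_handler()`.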

* feat(cli): Add handler generator, manifest builder, and scanner for build process

Implement the build pipeline components that work together to generate
serverless handlers from @Remote decorated functions.

Three core components:

1. RemoteDecoratorScanner (scanner.py)
   - Uses Python AST to discover all @Remote decorated functions
   - Extracts function metadata: name, module, async status, is_class
   - Groups functions by resource_config for handler generation
   - Handles edge cases like decorated classes and async functions

2. ManifestBuilder (manifest.py)
   - Groups functions by their resource_config
   - Creates flash_manifest.json structure for service discovery
   - Maps functions to their modules and handler files
   - Enables cross-endpoint function routing at runtime

3. HandlerGenerator (handler_generator.py)
   - Creates lightweight handler_*.py files for each resource config
   - Each handler imports functions and registers them in FUNCTION_REGISTRY
   - Handler delegates to create_handler() factory from generic_handler
   - Generated handlers are ~23 lines (vs ~98 with duplication)

Build Pipeline Flow:
1. Scanner discovers @Remote functions
2. ManifestBuilder groups them by resource_config
3. HandlerGenerator creates handler_*.py for each group
4. All files + manifest bundled into archive.tar.gz

This eliminates ~95% duplication across handlers by using the factory pattern
instead of template-based generation.

* test(runtime): Add comprehensive tests for generic handler

Implement 19 unit tests covering all major paths through the generic_handler
factory and its helper functions.

Test Coverage:

Serialization/Deserialization (7 tests):
- serialize_result() with simple values, dicts, lists
- deserialize_arguments() with empty, args-only, kwargs-only, mixed inputs
- Round-trip encoding/decoding of cloudpickle + base64

Function Execution (4 tests):
- Simple function execution with positional and keyword arguments
- Keyword argument handling
- Class instantiation and method calls
- Argument passing to instance methods

Handler Factory (8 tests):
- create_handler() returns callable RunPod handler
- Handler with simple function registry
- Missing function error handling (returns error response, not exception)
- Function exceptions caught with traceback included
- Multiple functions in single registry
- Complex Python objects (classes, lambdas, closures)
- Empty registry edge case
- Default execution_type parameter
- None return values
- Correct RunPod response format (success, result/error, traceback)

Test Strategy:
- Arrange-Act-Assert pattern for clarity
- Isolated unit tests (no external dependencies)
- Tests verify behavior, not implementation
- Error cases tested for proper error handling
- All serialization tested for round-trip correctness

All tests passing, 83% coverage on generic_handler.py

* test(cli): Add tests for handler generation, manifest building, and scanning

Implement integration tests validating the build pipeline components work
correctly together.

Test Coverage:

HandlerGenerator Tests:
- Handler files created with correct names (handler_<resource_name>.py)
- Generated files import required functions from workers
- FUNCTION_REGISTRY properly formatted
- create_handler() imported from generic_handler
- Handler creation via factory
- RunPod start call present and correct
- Multiple handlers generated for multiple resource configs

ManifestBuilder Tests:
- Manifest structure with correct version and metadata
- Resources grouped by resource_config
- Handler file paths correct
- Function metadata preserved (name, module, is_async, is_class)
- Function registry mapping complete

Scanner Tests:
- @Remote decorated functions discovered via AST
- Function metadata extracted correctly
- Module paths resolved properly
- Async functions detected
- Class methods detected
- Edge cases handled (multiple decorators, nested classes)

Test Strategy:
- Integration tests verify components work together
- Tests verify generated files are syntactically correct
- Tests validate data structures match expected schemas
- No external dependencies in build process

Validates that the entire build pipeline:
1. Discovers functions correctly
2. Groups them appropriately
3. Generates valid Python handler files
4. Creates correct manifest structure

* docs(runtime): Document generic handler factory architecture

Add comprehensive architecture documentation explaining why the factory
pattern was chosen and how it works.

Documentation includes:

Overview & Context:
- Problem statement: Handler files had 95% duplication
- Design decision: Use factory function instead of templates
- Benefits: Single source of truth, easier maintenance, consistency

Architecture Diagrams (MermaidJS):
- High-level flow: @Remote functions → Scanner → Manifest → Handlers → Factory
- Component relationships: HandlerGenerator, GeneratedHandler, generic_handler
- Function registry pattern: Discovery → Grouping → Registration → Factory

Implementation Details:
- create_handler(function_registry) signature and behavior
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Function vs. class execution
- load_manifest(): Service discovery via flash_manifest.json

Design Decisions (with rationale):
- Factory Pattern over Inheritance: Simpler, less coupling, easier to test
- CloudPickle + Base64: Handles arbitrary objects, safe JSON transmission
- Manifest in Generic Handler: Runtime service discovery requirement
- Structured Error Responses: Debugging aid, functional error handling
- Both Execution Types: Supports stateful classes and pure functions

Usage Examples:
- Simple function handler
- Class execution with methods
- Multiple functions in one handler

Build Process Integration:
- 4-phase pipeline: Scanner → Grouping → Generation → Packaging
- Manifest structure and contents
- Generated handler structure (~23 lines)

Testing Strategy:
- 19 unit tests covering all major paths
- 7 integration tests verifying handler generation
- Manual testing with example applications

Performance:
- Zero runtime penalty (factory called once at startup)
- No additional indirection in request path

* docs(cli): Add flash build command documentation

Document the flash build command and update CLI README to include it.

New Documentation:

flash-build.md includes:

Usage & Options:
- Command syntax: flash build [OPTIONS]
- --no-deps: Skip transitive dependencies (faster, smaller archives)
- --keep-build: Keep build directory for inspection/debugging
- --output, -o: Custom archive name (default: archive.tar.gz)

What It Does (5-step process):
1. Discovery: Scan for @Remote decorated functions
2. Grouping: Group functions by resource_config
3. Handler Generation: Create lightweight handler files
4. Manifest Creation: Generate flash_manifest.json
5. Packaging: Create archive.tar.gz for deployment

Build Artifacts:
- .flash/archive.tar.gz: Deployment package (ready for RunPod)
- .flash/flash_manifest.json: Service discovery configuration
- .flash/.build/: Temporary build directory

Handler Generation:
- Explains factory pattern and minimal handler files
- Links to Runtime_Generic_Handler.md for details

Dependency Management:
- Default behavior: Install all dependencies including transitive
- --no-deps: Only direct dependencies (when base image has transitive)
- Trade-offs explained

Cross-Endpoint Function Calls:
- Example showing GPU and CPU endpoints
- Manifest enables routing automatically

Output & Troubleshooting:
- Sample build output with progress indicators
- Common failure scenarios and solutions
- How to debug with --keep-build

Next Steps:
- Test locally with flash run
- Deploy to RunPod
- Monitor with flash undeploy list

Updated CLI README.md:
- Added flash build to command list in sequence
- Links to full flash-build.md documentation

* docs: Add build process and handler generation section to README

Add a new section explaining how the build system works and why the
factory pattern reduces code duplication.

New Section: Build Process and Handler Generation

Explains:

How Flash Builds Your Application (5-step pipeline):
1. Discovery: Scans code for @Remote decorated functions
2. Grouping: Groups functions by resource_config
3. Handler Generation: Creates lightweight handler files
4. Manifest Creation: Generates flash_manifest.json for service discovery
5. Packaging: Bundles everything into archive.tar.gz

Handler Architecture (with code example):
- Shows generated handler using factory pattern
- Single source of truth: All handler logic in one place
- Easier maintenance: Bug fixes don't require rebuilding projects

Cross-Endpoint Function Calls:
- Example of GPU and CPU endpoints calling each other
- Manifest and runtime wrapper handle service discovery

Build Artifacts:
- .flash/.build/: Temporary build directory
- .flash/archive.tar.gz: Deployment package
- .flash/flash_manifest.json: Service configuration

Links to detailed documentation:
- docs/Runtime_Generic_Handler.md for architecture details
- src/tetra_rp/cli/docs/flash-build.md for CLI reference

This section bridges the main README and the detailed documentation,
providing an entry point for new users discovering the build system.

* feat(cli): Integrate build utilities into flash build command

Wire up the handler generator, manifest builder, and scanner into the
actual flash build command implementation.

Changes to build.py:

1. Integration:
   - Import RemoteDecoratorScanner for function discovery
   - Import ManifestBuilder for manifest creation
   - Import HandlerGenerator for handler file creation
   - Call these in sequence during the build process

2. Build Pipeline:
   - After copying project files, scan for @Remote functions
   - Build manifest from discovered functions
   - Generate handler files for each resource config
   - Write manifest to build directory
   - Progress indicators show what's being generated

3. Fixes:
   - Change .tetra directory references to .flash
   - Uncomment actual build logic (was showing "Coming Soon" message)
   - Fix progress messages to show actual file counts

4. Error Handling:
   - Try/catch around handler generation
   - Warning shown if generation fails but build continues
   - User can debug with --keep-build flag

Build Flow Now:
1. Load ignore patterns
2. Collect project files
3. Create build directory
4. Copy files to build directory
5. [NEW] Scan for @Remote functions
6. [NEW] Build and write manifest
7. [NEW] Generate handler files
8. Install dependencies
9. Create archive
10. Clean up build directory (unless --keep-build)

Dependencies:
- Updated uv.lock with all required dependencies

* refactor(build): Fix directory structure and add comprehensive error handling

**Critical Fixes:**
- Remove "Coming Soon" message blocking build command execution
- Fix build directory to use .flash/.build/ directly (no app_name subdirectory)
- Fix tarball to extract with flat structure using arcname="."
- Fix cleanup to remove correct build directory

**Error Handling & Validation:**
- Add specific exception handling (ImportError, SyntaxError, ValueError)
- Add import validation to generated handlers
- Add duplicate function name detection across resources
- Add proper error logging throughout build process

**Resource Type Tracking:**
- Add resource_type field to RemoteFunctionMetadata
- Track actual resource types (LiveServerless, CpuLiveServerless)
- Use actual types in manifest instead of hardcoding

**Robustness Improvements:**
- Add handler import validation post-generation
- Add manifest path fallback search (cwd, module dir, legacy location)
- Add resource name sanitization for safe filenames
- Add specific exception logging in scanner (UnicodeDecodeError, SyntaxError)

**User Experience:**
- Add troubleshooting section to README
- Update manifest path documentation in docs
- Change "Zero Runtime Penalty" to "Minimal Runtime Overhead"
- Mark future enhancements as "Not Yet Implemented"
- Improve build success message with next steps

Fixes all 20 issues identified in code review (issues #1-13, #19-22)

* feat(resources): Add LoadBalancerSlsResource for LB endpoints

Implement LoadBalancerSlsResource class for provisioning RunPod load-balanced
serverless endpoints. Load-balanced endpoints expose HTTP servers directly to
clients without queue-based processing, enabling REST APIs, webhooks, and
real-time communication patterns.

Key features:
- Type enforcement (always LB, never QB)
- Scaler validation (REQUEST_COUNT required, not QUEUE_DELAY)
- Health check polling via /ping endpoint (200/204 = healthy)
- Post-deployment verification with configurable retries
- Async and sync health check methods
- Comprehensive unit tests
- Full documentation with architecture diagrams and examples

Architecture:
- Extends ServerlessResource with LB-specific behavior
- Validates configuration before deployment
- Polls /ping endpoint until healthy (10 retries × 5s = 50s timeout)
- Raises TimeoutError if endpoint fails to become healthy

This forms the foundation for Mothership architecture where a load-balanced
endpoint serves as a directory server for child endpoints.
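The polling behavior described above (10 retries × 5s against `/ping`, 200/204 = healthy) can be sketched generically; `check` is any callable returning an HTTP status code, and the function name is illustrative:

```python
import time

def wait_until_healthy(check, retries=10, delay=5.0):
    """Poll until the endpoint reports healthy, else raise TimeoutError."""
    for _ in range(retries):
        if check() in (200, 204):  # healthy per the health-check contract
            return True
        time.sleep(delay)
    raise TimeoutError(f"endpoint not healthy after {retries} x {delay:.0f}s")
```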

* fix(test): Fix LoadBalancerSlsResource deployment test mocks

Import ServerlessResource directly and use patch.object on the imported class
instead of string-based patches. This ensures the mocks properly intercept the
parent class's _do_deploy method when called via super(). Simplifies mock
configuration and removes an unused variable assertion.

Fixes the three failing deployment tests that were making real GraphQL API calls.
All tests now pass: 418 passed, 1 skipped.

* feat(resources): Phase 1 - Core infrastructure for @Remote on LB endpoints

Implement core infrastructure for enabling @Remote decorator on
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Changes:
- Create LoadBalancerSlsStub: HTTP-based stub for direct endpoint execution
  (src/tetra_rp/stubs/load_balancer_sls.py, 170 lines)
  - Serializes functions and arguments using cloudpickle + base64
  - Direct HTTP POST to /execute endpoint (no queue polling)
  - Proper error handling and deserialization

- Register stub with singledispatch (src/tetra_rp/stubs/registry.py)
  - Enables @Remote to dispatch to LoadBalancerSlsStub for LB resources

- Extend @Remote decorator with HTTP routing parameters (src/tetra_rp/client.py)
  - Add 'method' parameter: GET, POST, PUT, DELETE, PATCH
  - Add 'path' parameter: /api/endpoint routes
  - Validate method/path required for LoadBalancerSlsResource
  - Store routing metadata on decorated functions/classes
  - Warn if routing params used with non-LB resources

Foundation for Phase 2 (Build system integration) and Phase 3 (Local dev).
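The validation and metadata storage described above might look roughly like this; the attribute names (`_http_method`, `_http_path`) and the simplified signature are illustrative, not the library's actual implementation:

```python
VALID_METHODS = {"GET", "POST", "PUT", "DELETE", "PATCH"}

def remote(resource, method=None, path=None):
    """Decorator sketch: validate and attach HTTP routing metadata."""
    def wrapper(fn):
        if method is not None or path is not None:
            if method not in VALID_METHODS:
                raise ValueError(f"unsupported HTTP method: {method!r}")
            if not path or not path.startswith("/"):
                raise ValueError(f"path must start with '/': {path!r}")
        fn._http_method = method
        fn._http_path = path
        return fn
    return wrapper
```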

* feat(build): Phase 2.1 - Enhanced scanner for HTTP routing extraction

Update RemoteDecoratorScanner to extract HTTP method and path from
@Remote decorator for LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to RemoteFunctionMetadata
- Add _extract_http_routing() method to parse decorator keywords
- Extract method (GET, POST, PUT, DELETE, PATCH) from decorator
- Extract path (/api/process) from decorator
- Store routing metadata for manifest generation

Foundation for Phase 2.2 (Manifest updates) and Phase 2.3 (Handler generation).

* feat(build): Phase 2.2 - Updated manifest schema for HTTP routing

Enhance ManifestBuilder to support HTTP method/path routing for
LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to ManifestFunction
- Validate LB endpoints have both method and path
- Detect and prevent route conflicts (same method + path)
- Prevent use of reserved paths (/execute, /ping)
- Add 'routes' section to manifest for LB endpoints
- Conditional inclusion of routing fields (only for LB)

Manifest structure for LB endpoints now includes:
{
  "resources": {
    "api_service": {
      "resource_type": "LoadBalancerSlsResource",
      "functions": [
        {
          "name": "process_data",
          "http_method": "POST",
          "http_path": "/api/process"
        }
      ]
    }
  },
  "routes": {
    "api_service": {
      "POST /api/process": "process_data"
    }
  }
}
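The conflict and reserved-path checks can be sketched as a small validator over the per-function routing metadata (a simplified stand-in for the `ManifestBuilder` logic; names are illustrative):

```python
RESERVED_PATHS = {"/execute", "/ping"}

def build_route_table(functions):
    """Build '<METHOD> <path>' -> function-name mapping, rejecting conflicts."""
    routes = {}
    for fn in functions:  # each fn: {"name", "http_method", "http_path"}
        method, path = fn["http_method"], fn["http_path"]
        if not method or not path:
            raise ValueError(f"{fn['name']}: LB functions need both method and path")
        if path in RESERVED_PATHS:
            raise ValueError(f"{fn['name']}: {path} is a reserved path")
        key = f"{method} {path}"
        if key in routes:
            raise ValueError(f"route conflict: {key}")
        routes[key] = fn["name"]
    return routes
```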

* feat(cli): Add LB handler generator for FastAPI app creation

Implement LBHandlerGenerator to create FastAPI applications for
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Key features:
- Generates FastAPI apps with explicit route registry
- Creates (method, path) -> function mappings from manifest
- Validates route conflicts and reserved paths
- Imports user functions and creates dynamic routes
- Includes required /ping health check endpoint
- Validates generated handler Python syntax via import

Generated handler structure enables:
- Direct HTTP routing to user functions via FastAPI
- Framework /execute endpoint for @Remote stub execution
- Local development with uvicorn

* feat(runtime): Implement LB handler factory for FastAPI app creation

Create create_lb_handler() factory function that dynamically builds FastAPI
applications from route registries for LoadBalancerSlsResource endpoints.

Key features:
- Accepts route_registry: Dict[(method, path)] -> handler_function mapping
- Registers all user-defined routes from registry to FastAPI app
- Provides /execute endpoint for @Remote stub function execution
- Handles async function execution automatically
- Serializes results with cloudpickle + base64 encoding
- Comprehensive error handling with detailed logging

The /execute endpoint enables:
- Remote function code execution via @Remote decorator
- Automatic argument deserialization from cloudpickle/base64
- Result serialization for transmission back to client
- Support for both sync and async functions
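The route-registry idea behind `create_lb_handler()` can be illustrated without FastAPI; this framework-free dispatcher is a sketch only (the real factory registers routes on a FastAPI app and uses cloudpickle for `/execute` payloads):

```python
def create_lb_handler(route_registry, include_execute=False):
    """Return a dispatcher over (method, path) -> handler, plus /ping."""
    routes = {("GET", "/ping"): lambda body=None: {"status": "ok"}}
    routes.update(route_registry)
    if include_execute:
        # /execute runs an arbitrary serialized function; real handlers
        # deserialize fn/args with cloudpickle + base64 first.
        def execute(body):
            fn = body["fn"]
            return {"result": fn(*body.get("args", ()), **body.get("kwargs", {}))}
        routes[("POST", "/execute")] = execute

    def dispatch(method, path, body=None):
        handler = routes.get((method, path))
        return handler(body) if handler is not None else {"error": 404}
    return dispatch
```

With `include_execute=False` (the deployed default, per the security commit below in history), the `/execute` route simply does not exist.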

* feat(cli): Route build command to separate handlers for LB endpoints

Update build command to use appropriate handler generators based on
resource type. Separates LoadBalancerSlsResource endpoints (using FastAPI)
from queue-based endpoints (using generic handler).

Changes:
- Import LBHandlerGenerator alongside HandlerGenerator
- Inspect manifest resources and separate by type
- Generate LB handlers via LBHandlerGenerator
- Generate QB handlers via HandlerGenerator
- Combine all generated handler paths for summary

Enables users to mix LB and QB endpoints in same project with correct
code generation for each resource type.

* feat(resources): Add LiveLoadBalancer for local LB endpoint testing

Implement LiveLoadBalancer resource following the LiveServerless pattern
for local development and testing of load-balanced endpoints.

Changes:
- Add TETRA_LB_IMAGE constant for load-balanced Tetra image
- Create LiveLoadBalancer class extending LoadBalancerSlsResource
- Uses LiveServerlessMixin to lock imageName to Tetra LB image
- Register LiveLoadBalancer with LoadBalancerSlsStub in singledispatch
- Export LiveLoadBalancer from core.resources and top-level __init__

This enables users to test LB-based functions locally before deploying,
using the same pattern as LiveServerless for queue-based endpoints.

Users can now write:
  from tetra_rp import LiveLoadBalancer, remote

  api = LiveLoadBalancer(name="test-api")

  @remote(api, method="POST", path="/api/process")
  async def process_data(x, y):
      return {"result": x + y}

  result = await process_data(5, 3)  # Local execution

* test(stubs): Add comprehensive unit tests for LoadBalancerSlsStub

Implement unit tests for LoadBalancerSlsStub covering:
- Request preparation with arguments and dependencies
- Response handling for success and error cases
- Error handling for invalid responses
- Base64 encoding/decoding of serialized data
- Endpoint URL validation
- Timeout and HTTP error handling

Test coverage:
- _prepare_request: 4 tests
- _handle_response: 5 tests
- _execute_function: 3 error case tests
- __call__: 2 integration tests

Tests verify proper function serialization, argument handling,
error propagation, and response deserialization.

* fix(test): Correct LB endpoint test decorator to match assertions

Fix test_load_balancer_vs_queue_based_endpoints by updating the @Remote
decorator to use method='POST' and path='/api/echo' to match the test
assertions. This was a test-level bug where the decorator definition
didn't match what was being asserted.

* docs: Add comprehensive documentation for @Remote with LoadBalancer endpoints

- Using_Remote_With_LoadBalancer.md: User guide for HTTP routing, local development, building and deploying
- LoadBalancer_Runtime_Architecture.md: Technical details on deployment, request flows, security, and performance
- Updated README.md with LoadBalancer section and code example
- Updated Load_Balancer_Endpoints.md with cross-references to new guides

* security: Remove /execute from deployed LoadBalancer endpoints

Split @Remote execution behavior between local and deployed:
- LiveLoadBalancer (local): Uses /execute endpoint for function serialization
- LoadBalancerSlsResource (deployed): Uses user-defined routes with HTTP param mapping

Changes:
1. LoadBalancerSlsStub routing detection:
   - _should_use_execute_endpoint() determines execution path
   - _execute_via_user_route() maps args to JSON and POSTs to user routes
   - Auto-detects resource type and routing metadata

2. Conditional /execute registration:
   - create_lb_handler() now accepts include_execute parameter
   - Generated handlers default to include_execute=False (security)
   - LiveLoadBalancer can enable /execute if needed

3. Updated handler generator:
   - Added clarity comments on /execute exclusion for deployed endpoints

4. Comprehensive test coverage:
   - 8 new tests for routing detection and execution paths
   - All 31 tests passing (22 unit + 9 integration)

5. Documentation updates:
   - Using_Remote_With_LoadBalancer.md: clarified /execute scope
   - Added 'Local vs Deployed Execution' section explaining differences
   - LoadBalancer_Runtime_Architecture.md: updated execution model
   - Added troubleshooting for deployed endpoint scenarios

Security improvement:
- Deployed endpoints only expose user-defined routes
- /execute endpoint removed from production (prevents arbitrary code execution)
- Lower attack surface for deployed endpoints
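The client-side half of this split — mapping a decorated function's kwargs to a JSON POST against its user-defined route, as `_execute_via_user_route()` does — can be sketched like this (the helper name and `_http_*` attributes are illustrative; the actual HTTP call is omitted):

```python
def build_user_route_request(base_url, fn, kwargs):
    """Describe the HTTP request for a function's user-defined route."""
    method = getattr(fn, "_http_method", None)
    path = getattr(fn, "_http_path", None)
    if not (method and path):
        raise ValueError("function has no HTTP routing metadata")
    return {"method": method, "url": base_url.rstrip("/") + path, "json": kwargs}
```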

* feat(build): Phase 4 - Fix LiveLoadBalancer handler generation to include /execute endpoint

- Modified manifest.py to validate LiveLoadBalancer endpoints like LoadBalancerSlsResource
- Updated lb_handler_generator to:
  - Include LiveLoadBalancer in handler generation filter
  - Pass include_execute=True for LiveLoadBalancer (local dev)
  - Pass include_execute=False for LoadBalancerSlsResource (deployed)
- Added integration tests:
  - Verify LiveLoadBalancer handlers include /execute endpoint
  - Verify deployed handlers exclude /execute endpoint
- Fixes critical bug: LiveLoadBalancer now gets /execute endpoint in generated handlers

* fix(scanner): Discover LoadBalancer resources in addition to Serverless resources

- Updated scanner to extract LiveLoadBalancer and LoadBalancerSlsResource resources
- Previously only looked for 'Serverless' in class name, missing LoadBalancer endpoints
- Now checks for both 'Serverless' and 'LoadBalancer' in resource type names
- Added integration test to verify scanner discovers both resource types
- Fixes critical bug that prevented flash build from finding LoadBalancer endpoints

* chore: Format code for line length and remove unused imports

- Wrap long lines in manifest.py, lb_handler.py, and load_balancer_sls.py
- Remove unused httpx import in test_load_balancer_sls_stub.py
- Apply consistent formatting across codebase

* fix: Address PR #131 review feedback

- Scanner: Use exact type name matching instead of substring matching
  - Whitelist specific resource types to avoid false positives
  - Prevents matching classes like 'MyServerlessHelper' or 'LoadBalancerUtils'

- Type hints: Use Optional[str] for nullable fields in manifest
  - ManifestFunction.http_method and http_path now properly typed

- Timeout: Make HTTP client timeout configurable
  - Added LoadBalancerSlsStub.DEFAULT_TIMEOUT class attribute
  - Added timeout parameter to __init__
  - Updated both _execute_function and _execute_via_user_route to use self.timeout

- Deprecated datetime: Replace datetime.utcnow() with datetime.now(timezone.utc)
  - Updated manifest.py and test_lb_remote_execution.py
  - Ensures Python 3.12+ compatibility

* style: Format datetime chaining for line length

* fix: LiveLoadBalancer template not serialized to RunPod GraphQL

The set_serverless_template model_validator was being overwritten by sync_input_fields
(both had mode="after"). In Pydantic v2, when two validators with the same mode are
defined in a class, only one is registered.

This caused templates to never be created from imageName, resulting in:
  "GraphQL errors: One of templateId, template is required to create an endpoint"

Solution:
- Move set_serverless_template validator from ServerlessResource base class to subclasses
  (ServerlessEndpoint and LoadBalancerSlsResource) where the validation is actually needed
- Keep helper methods (_create_new_template, _configure_existing_template) in base class
  for reuse
- Add comprehensive tests for LiveLoadBalancer template serialization

This allows:
1. Base ServerlessResource to be instantiated freely for testing/configuration
2. Subclasses (ServerlessEndpoint, LoadBalancerSlsResource) to enforce template
   requirements during deployment
3. Proper template serialization in GraphQL payload for RunPod API

Fixes: One of templateId, template is required to create an endpoint error when
deploying LiveLoadBalancer with custom image tags like runpod/tetra-rp-lb:local
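
One plausible mechanism for this kind of validator collision is plain class-body name shadowing; a minimal pure-Python illustration (names are hypothetical, no Pydantic dependency — this is a sketch of the failure mode, not the actual code):

```python
# A class body is an ordinary dict, so a second definition bound to the
# same attribute name silently replaces the first. Names illustrative only.
class Resource:
    def _validate(self):            # stand-in for set_serverless_template
        return "set_serverless_template"

    def _validate(self):            # stand-in for sync_input_fields; wins
        return "sync_input_fields"

assert Resource()._validate() == "sync_input_fields"
```

Moving the validator into each subclass sidesteps the collision while keeping the shared helpers on the base class.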

* fix: LoadBalancer endpoint URL and add CPU support

- Fix: Use correct endpoint URL format for load-balanced endpoints
  (https://{id}.api.runpod.ai instead of https://api.runpod.ai/v2/{id})
  This fixes 404 errors on /ping health check endpoints

- Feature: Add CPU LoadBalancer support
  * Create CpuLoadBalancerSlsResource for CPU-based load-balanced endpoints
  * Create CpuLiveLoadBalancer for local CPU LB development
  * Add TETRA_CPU_LB_IMAGE constant for CPU LB Docker image
  * Update example code to use CpuLiveLoadBalancer for CPU worker
  * Add 8 comprehensive tests for CPU LoadBalancer functionality

- Tests: Add 2 tests for endpoint URL format validation
- All 474 tests passing, 64% code coverage
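
The two URL schemes contrasted above can be sketched as simple formatters (endpoint id illustrative):

```python
def queue_endpoint_url(endpoint_id: str) -> str:
    """Queue-based serverless endpoints live under the /v2 API path."""
    return f"https://api.runpod.ai/v2/{endpoint_id}"

def load_balancer_url(endpoint_id: str) -> str:
    """Load-balanced endpoints get their own subdomain, so /ping resolves."""
    return f"https://{endpoint_id}.api.runpod.ai"

assert load_balancer_url("abc123") == "https://abc123.api.runpod.ai"
```

Hitting `/ping` on the `/v2/{id}` form explains the 404s this commit fixes.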

* fix: Export CpuLiveLoadBalancer and CpuLoadBalancerSlsResource from tetra_rp package

LoadBalancer resources were not being discovered by ResourceDiscovery because
the new CPU variants (CpuLiveLoadBalancer, CpuLoadBalancerSlsResource) were
not exported from the main tetra_rp package. This prevented undeploy from
picking up these resources.

Added exports to:
- TYPE_CHECKING imports for type hints
- __getattr__ function for lazy loading
- __all__ list for public API

This fixes the issue where 'flash undeploy list' could not find LoadBalancer
resources that were deployed with 'flash run --auto-provision'.
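
The lazy-export pattern referenced here (PEP 562 module `__getattr__`) can be sketched with a synthetic module — names are stand-ins for the real deferred imports:

```python
import types

# Sketch of the PEP 562 lazy-export pattern used in package __init__.py
# files: heavy imports are deferred until the name is first accessed.
pkg = types.ModuleType("tetra_rp_sketch")
pkg.__all__ = ["CpuLiveLoadBalancer", "CpuLoadBalancerSlsResource"]

def _lazy(name):
    if name in pkg.__all__:
        cls = type(name, (), {})   # stand-in for the real deferred import
        setattr(pkg, name, cls)    # cache so __getattr__ runs only once
        return cls
    raise AttributeError(f"module {pkg.__name__!r} has no attribute {name!r}")

pkg.__getattr__ = _lazy
assert pkg.CpuLiveLoadBalancer.__name__ == "CpuLiveLoadBalancer"
```

Discovery tools that iterate `__all__` (like ResourceDiscovery here) only see what is exported, which is why the missing entries broke `flash undeploy list`.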

* fix: Add API key authentication to LoadBalancer health check


The /ping endpoint for RunPod load-balanced endpoints requires an
Authorization header (derived from RUNPOD_API_KEY) for authentication.
Without it, the health check fails with 401 Unauthorized, causing
provisioning to time out.

This fix adds the Authorization header to the health check request if
the RUNPOD_API_KEY environment variable is available, allowing the
endpoint health check to succeed during provisioning.

Fixes issue where 'flash run --auto-provision' would fail even though
the endpoint was successfully created on RunPod.

* fix(lb): Exclude flashboot from CpuLoadBalancerSlsResource GraphQL payload

CpuLoadBalancerSlsResource was overriding _input_only without including flashboot,
causing it to be sent to the RunPod GraphQL API which doesn't accept this field.
This caused deployment to fail with: Field "flashboot" is not defined by type "EndpointInput".

* fix(lb): Expand CpuInstanceType.ANY to all CPU flavors in CpuLoadBalancerSlsResource

Add field_validator to expand [CpuInstanceType.ANY] to all available CPU instance
types (cpu3g, cpu3c, cpu5c variants). This matches the behavior in CpuServerlessEndpoint
and prevents deployment errors like 'instanceId must be in the format of flavorId-vcpu-ram'.

* refactor(cpu): Move instanceIds validator to CpuEndpointMixin

Move the instanceIds field_validator from CpuServerlessEndpoint to CpuEndpointMixin
so both CpuServerlessEndpoint and CpuLoadBalancerSlsResource share the same validator
that expands [CpuInstanceType.ANY] to all available CPU flavors. This eliminates
code duplication and ensures consistent behavior across all CPU endpoint types.
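
The shared expansion logic described in these two commits can be sketched as follows — the enum values are hypothetical placeholders (real instanceIds follow the `flavorId-vcpu-ram` format mentioned in the deployment error):

```python
from enum import Enum

class CpuInstanceType(str, Enum):
    ANY = "any"
    CPU3G_2_8 = "cpu3g-2-8"
    CPU3C_2_4 = "cpu3c-2-4"
    CPU5C_2_8 = "cpu5c-2-8"

def expand_instance_ids(ids):
    """Sketch of the shared mixin validator: expand [ANY] to all flavors."""
    if ids == [CpuInstanceType.ANY]:
        return [t for t in CpuInstanceType if t is not CpuInstanceType.ANY]
    return ids

assert CpuInstanceType.ANY not in expand_instance_ids([CpuInstanceType.ANY])
```

In the real code this would run as a Pydantic `field_validator` on `instanceIds`; putting it on a mixin is what lets both endpoint classes share it.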

* test: Update CPU instance test to reflect validator expansion

Update test_cpu_live_load_balancer_defaults_to_cpu_any to verify that
[CpuInstanceType.ANY] is correctly expanded to all available CPU instance types
by the field_validator in CpuEndpointMixin.

* fix(lb): Increase health check timeout from 5s to 15s

Load-balanced workers need more time to respond during cold starts and initialization.
RunPod docs recommend at least 10-15 second timeouts for health checks. Workers may
return 204 during initialization, which is normal and expected.

* fix(lb): Fix CPU load balancer template deployment error

Fixes two bugs in CpuLoadBalancerSlsResource that prevented CPU load balancers from deploying:

1. Added gpuCount and allowedCudaVersions to _input_only exclusion set to prevent
   GPU-specific fields from being sent to RunPod API
2. Overrode set_serverless_template() to call _sync_cpu_fields() first, ensuring
   GPU defaults are overridden to CPU-appropriate values (gpuCount=0)

The RunPod API was rejecting CPU load balancer templates because GPU-specific
fields were being included in the GraphQL payload. These changes align
CpuLoadBalancerSlsResource behavior with CpuServerlessEndpoint.

Also added comprehensive test coverage (30+ tests) to verify:
- GPU fields are correctly overridden to CPU defaults
- GPU fields are excluded from API payloads
- CPU-specific fields are properly included
- Consistency with CpuServerlessEndpoint behavior

* fix(drift): Exclude runtime fields from config hash to prevent false positives

Fixes false positive configuration drift detection by separating concerns:

1. Update ServerlessResource.config_hash to exclude runtime fields
   - Fields like template, templateId, aiKey, userId are API-assigned
   - Prevents false drift when same config is redeployed across processes
   - Now only hashes user-specified configuration

2. Add config_hash override to CpuLoadBalancerSlsResource
   - CPU load balancers hash only CPU-relevant fields
   - Excludes GPU-specific fields and runtime fields
   - Follows same pattern as CpuServerlessEndpoint

3. Fix _has_structural_changes to exclude template/templateId
   - CRITICAL: These runtime fields were causing false structural changes
   - Was forcing unnecessary redeployments despite update() being available
   - Now system correctly uses update() instead of undeploy+deploy

4. Make field serializers robust to handle string/enum values
   - Prevents serialization errors when fields are pre-converted to strings

5. Add comprehensive drift detection tests (16 tests)
   - Test hash stability with runtime field changes
   - Test exclusion of env, template, templateId, and other runtime fields
   - Test that actual config changes (image, flashboot) are detected
   - Test structural change detection behavior
   - Test real-world deployment scenarios

Results:
- Same config deployed multiple times: no false drift
- Different env vars with same config: no false drift
- Template/templateId changes: no false drift
- API-assigned fields: no false drift
- User config changes (image, flashboot): drift detected correctly
- All 512 unit tests pass
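
The hashing idea behind this drift fix can be sketched like so (field names illustrative, not the actual implementation): API-assigned runtime fields are dropped before hashing, so redeploying the same user config never looks like drift.

```python
import hashlib
import json

EXCLUDED_HASH_FIELDS = {"template", "templateId", "aiKey", "userId"}

def config_hash(config: dict) -> str:
    """Hash only user-specified fields, with stable key ordering."""
    user_config = {k: v for k, v in config.items()
                   if k not in EXCLUDED_HASH_FIELDS}
    payload = json.dumps(user_config, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

a = config_hash({"image": "x:1", "templateId": "t-1"})
b = config_hash({"image": "x:1", "templateId": "t-2"})
assert a == b                                # runtime field change: no drift
assert a != config_hash({"image": "x:2"})    # real config change: drift
```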

* fix(http): Standardize RunPod HTTP client authentication across codebase

Centralizes HTTP client creation for RunPod load-balanced endpoints to prevent
manual Authorization header code duplication and ensure consistent authentication:

1. Create centralized HTTP utility function (src/tetra_rp/core/utils/http.py)
   - New function: get_authenticated_httpx_client()
   - Automatically adds Bearer token Authorization header if RUNPOD_API_KEY set
   - Provides consistent timeout handling (default 30s, customizable)
   - Follows existing GraphQL/REST client authentication pattern

2. Fix critical authentication bug in LoadBalancerSlsStub._execute_via_user_route()
   - Previously: Missing Authorization header (401 errors on user routes)
   - Now: Uses centralized utility for proper authentication
   - Enables direct HTTP calls to user-defined routes with auth

3. Refactor two methods to use centralized utility
   - LoadBalancerSlsStub._execute_function() - removes 7+ lines of manual auth code
   - LoadBalancerSlsResource._check_ping_endpoint() - simplifies auth setup

4. Add comprehensive unit tests (tests/unit/core/utils/test_http.py)
   - Tests API key presence/absence handling
   - Tests custom and default timeout configuration
   - Tests edge cases (empty key, zero timeout)
   - All 7 tests pass with 100% coverage

Results:
- Single source of truth for HTTP authentication (centralized utility)
- Fixes 401 Unauthorized errors on load-balanced endpoints
- Eliminates repetitive manual auth code across 3+ locations
- Easier to maintain and update authentication patterns in future
- All 499 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

* feat(http): Extend HTTP utilities to cover both sync and async authentication

Extends the centralized HTTP authentication pattern to all RunPod API calls:

1. Add get_authenticated_requests_session() for synchronous requests
   - Creates requests.Session with automatic Bearer token Authorization header
   - Follows same pattern as async get_authenticated_httpx_client()
   - Single source of truth for sync HTTP authentication

2. Refactor template.py to use centralized utility
   - Removes manual Authorization header setup (line 86)
   - Now uses get_authenticated_requests_session() for all template updates
   - Improves error handling with raise_for_status()
   - Token parameter marked deprecated; uses RUNPOD_API_KEY env var

3. Add comprehensive tests for sync utility (4 tests)
   - Tests API key presence/absence handling
   - Tests empty API key edge case
   - Tests Session object validation
   - All tests pass with proper cleanup

Benefits:
- True single source of truth for all RunPod HTTP authentication (sync + async)
- Consistent patterns across entire codebase
- Easier future auth changes across all HTTP client types
- Eliminates manual auth header code in template.py
- All 503 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

Note: requests.Session doesn't support default timeouts; timeout should be
specified per request (e.g., session.post(url, json=data, timeout=30.0))

* fix: Address PR feedback on HTTP utilities implementation

Addresses three feedback items from code review:

1. Fix breaking parameter order change in update_system_dependencies()
   - Restored original parameter order: template_id, token, system_dependencies, base_entry_cmd
   - Maintains backward compatibility with existing callers
   - Token parameter now optional (default None)

2. Add proper deprecation warning for token parameter
   - Issues DeprecationWarning when token parameter is used
   - Clearly communicates migration to RUNPOD_API_KEY environment variable
   - Follows Python deprecation best practices (warnings.warn with stacklevel=2)

3. Standardize test mocking approach across all health check tests
   - All tests now use consistent 'tetra_rp.core.utils.http.httpx.AsyncClient' patching
   - Removed inconsistent 'side_effect=lambda' pattern
   - Improved test maintainability by using same strategy everywhere

All 503 tests pass with consistent, clean implementation.
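
The deprecation pattern from item 2 can be sketched as follows (the signature is illustrative, matching the parameter order described above):

```python
import warnings

def update_system_dependencies(template_id, token=None,
                               system_dependencies=None, base_entry_cmd=None):
    """Hypothetical signature sketch showing the deprecation pattern."""
    if token is not None:
        warnings.warn(
            "'token' is deprecated; set the RUNPOD_API_KEY environment "
            "variable instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
    # ... rest of the update logic would go here ...
```

`stacklevel=2` is the key detail: the warning points at the deprecated call site rather than at the library internals.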

* refactor(drift): Extract runtime field constants and improve maintainability

- Extract RUNTIME_FIELDS and EXCLUDED_HASH_FIELDS as ClassVar constants in ServerlessResource for centralized field list management
- Add clarifying comments to enum serializers explaining defensive isinstance() checks for nested model serialization
- Document CPU load balancer field list coupling in docstring with maintenance guidelines
- Add TestSerializerDefensiveBehavior class with 4 tests verifying pre-stringified enum value handling
- Use ClassVar annotation to satisfy Pydantic v2 model field requirements

This reduces maintenance burden by centralizing field definitions and improves code clarity without changing functionality.

* docs: Improve LoadBalancer documentation accuracy and completeness

- Fix health check timeout: Add clarification that timeout is 15 seconds per check
- Add HTTP authentication details explaining RUNPOD_API_KEY header injection
- Document stub decision logic for incomplete routing metadata (fallback behavior)
- Clarify function signature inspection with concrete example showing parameter mapping
- Expand /execute security explanation with explicit threats and best practices
- Add detailed parameter type constraints for deployed endpoints (supported vs unsupported)
- Add troubleshooting guide for missing routing metadata (404 errors)
- Strengthen security warnings about never exposing /execute in production

All documentation now matches actual implementation verified through codebase analysis.

* docs: add resource config drift detection documentation

- comprehensive guide on drift detection implementation
- covers hash computation, field exclusion, and cpu-specific behavior
- includes testing patterns and troubleshooting guide
- documents all fields that trigger drift vs those ignored

* docs: proper name for the file

* test(build): Add comprehensive test coverage for scanner and handler improvements

- Add 6 new scanner tests for directory filtering (.venv, .flash, .runpod exclusion)
- Add test for resource type validation to prevent false positives
- Add test for fallback behavior when resource name extraction fails
- Add test for handling resource names with special characters
- Update existing tests to reflect new dynamic import format and resource name extraction

These tests guarantee that improvements to the scanner (resource type validation,
directory filtering, fallback behavior) and handler generator (dynamic imports for
invalid Python identifiers) won't regress in future changes.

* test(scanner): Fix resource type assertions to match scanner behavior

The scanner now extracts resource names from the name= parameter rather
than using variable names. Update test assertions to expect the actual
resource names ('test-api', 'deployed-api') instead of variable names.

* chore: merge correction

* fix(drift): Remove manual undeploy/deploy from update() method

Use saveEndpoint mutation for all changes instead of manual lifecycle
management. Server-side automatically detects version-triggering fields
(GPU, template, volumes) and increments endpoint version accordingly.

Keep _has_structural_changes() as informational for logging purposes only.
This aligns with RunPod API's version-based deployment model.

* docs(drift): Clarify _has_structural_changes detects version-triggering changes

Update docstring to reflect that this method identifies changes that
trigger server-side version increment and worker recreation, not manual
redeploy cycles. Explain which changes are version-triggering vs rolling
updates, and note that the method is now informational for logging only.

* feat(drift): Enable environment variable drift detection

Remove env from EXCLUDED_HASH_FIELDS so changes to environment variables
trigger drift detection and endpoint updates. Environment changes are
non-version-triggering (rolling updates), so server will apply them via
saveEndpoint without recreating workers.

Add env to CPU LoadBalancer config_hash for consistent behavior across
all resource types. Update comments to reflect that env is user-specified
configuration, not dynamically computed.

* test(drift): Update tests for environment variable drift detection

- test_lb_config_hash_excludes_env_variables → test_lb_config_hash_detects_env_changes
- test_env_var_changes_no_drift → test_env_var_changes_trigger_drift
- test_config_hash_excludes_env_from_drift → test_config_hash_detects_env_from_drift

Update assertions to expect different hashes when env changes, matching
new behavior where environment variable changes trigger drift and updates.

* fix: Address Copilot review feedback on type hints and documentation

- Fix type annotation for timeout parameter in LoadBalancerSlsStub (Optional[float])
- Replace hardcoded "30s" with actual self.timeout in error messages (2 locations)
- Update Resource_Config_Drift_Detection.md to reflect actual EXCLUDED_HASH_FIELDS
- Remove duplicate Load-Balanced Endpoints section from README.md

Addresses Copilot review comments (PR #132, review 3642596664)

* chore: Update Python version compatibility to 3.10-3.14

- Drop Python 3.9 support (EOL)
- Ensure support for Python 3.14
- Update requires-python in pyproject.toml from >=3.9,<3.14 to >=3.10,<3.15
- Update mypy python_version from 3.9 to 3.10
- Update CI matrix to test Python 3.10, 3.11, 3.12, 3.13, 3.14

* chore: Increase code coverage requirement to 65%

* perf(tests): make parallel test execution the default

Implement AE-1748 by making parallel test execution the default for all quality checks,
achieving a 4.6x speedup (from ~96s to ~20s on 12-core machines).

Changes:
- Configure pytest-xdist for parallel test execution
- Add worker isolation fixtures to prevent file system conflicts
- Mark concurrency tests (~26 tests) as serial to avoid race conditions
- Update Makefile to make parallel execution the default
- Provide serial execution commands for debugging (quality-check-serial)

Performance:
- make quality-check: 96s → 20s (4.6x faster)
- All 719 tests pass in both parallel and serial modes
- Coverage maintained at 64%+

Technical details:
- Worker-specific temp directories via worker_temp_dir fixture
- Module-level cache clearing in reset_singletons
- State file isolation per worker via isolate_resource_state_file
- Serial markers on threading-specific tests

Rollback: Use `make quality-check-serial` if parallel execution causes issues
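
The worker-isolation fixture mentioned above can be sketched from pytest-xdist's `PYTEST_XDIST_WORKER` environment variable (set to e.g. `gw0` on each worker, absent in serial runs); the directory naming here is illustrative:

```python
import os
import tempfile
from pathlib import Path

def worker_temp_dir() -> Path:
    """Per-xdist-worker temp dir; falls back to 'master' when not parallel."""
    worker = os.environ.get("PYTEST_XDIST_WORKER", "master")
    path = Path(tempfile.gettempdir()) / f"tetra_tests_{worker}"
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Giving each worker its own directory is what prevents the file-system conflicts that made parallel runs flaky.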

* refactor: remove dead code and add serialization tests

Remove unused functions and improve test coverage:
- Remove deprecated update_system_dependencies from template.py
- Remove unused utility functions from utils.py and json.py
- Add comprehensive test suite for serialization module (100% coverage)

Tests cover serialization/deserialization of args, kwargs, and error handling
for cloudpickle failures across Python 3.10-3.14.

* fix: regenerate uv.lock with correct dependency versions

The previous uv.lock was corrupted with an incomplete pytest-xdist==3.8.0 entry
that referenced pytest==8.4.2 which wasn't locked. Regenerating the lock file
resolves the CI/CD dependency installation failures across all Python versions.

* fix: mark TestLoadBalancerSlsStubRouting as serial

The @remote decorator used in TestLoadBalancerSlsStubRouting modifies module-level
state and can cause race conditions when run in parallel. Mark this test class as
serial to prevent flaky failures, particularly on Python 3.10.

* fix: simplify parallel test execution - remove unnecessary two-pass approach

All tests pass with xdist parallel execution without needing to filter serial
tests. pytest-xdist handles workers independently and coverage merges properly.
Simplified Makefile to use single -n auto command for all test runs.

* fix: re-add serial marker for TestLoadBalancerSlsStubRouting

The @remote decorator modifies module-level state that isn't properly isolated
between parallel workers. Adding the serial marker prevents race conditions on
Python 3.12 and 3.14. pytest-xdist respects the serial marker automatically.

* fix: implement proper serial test handling with two-pass execution

Add pytest hook to mark serial tests with xdist_group so they run without
parallelization. Use two-pass test execution:
1. Parallel: Run all non-serial tests with -n auto
2. Serial: Run serial tests without parallelization, appending coverage

This ensures:
- No race conditions in serial tests (file locking, @remote decorator)
- Coverage properly merged across both passes
- Maintains ~4.6x speedup for non-serial tests

* fix: implement proper serial test handling with two-pass execution

Add pytest hook to mark serial tests with xdist_group so they run without
parallelization. Use two-pass test execution:
1. Parallel: Run all non-serial tests with -n auto (--cov-fail-under=0)
2. Serial: Run serial tests without parallelization, appending coverage

This ensures:
- No race conditions in serial tests (file locking, @remote decorator)
- Coverage properly merged across both passes
- Maintains ~4.6x speedup for non-serial tests
- Both passes complete even if first has < 65% coverage

* chore: consistent coverage failure point

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore: this is about reporting coverage (no need to fail)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore: don't know why it was 64

* chore: make test commands parallel by default with serial variants

- All test commands now run in parallel by default using pytest-xdist auto-detect
- Serial versions available with -serial suffix for debugging
- test-parallel, test-parallel-workers, test-unit-parallel removed in favor of cleaner naming
- test-workers added as shorthand for specifying worker count
- test-fast now includes parallel execution
- Quality check commands already use parallel-by-default test-coverage

* test: add comprehensive test coverage for json, init, and resource modules

- Add 26 tests for json.py normalization utility (100% coverage)
- Add 16 tests for init.py CLI command (91% coverage)
- Add 19 tests for resource.py CLI command (85% coverage)

Total: 61 new tests covering JSON serialization, project initialization,
and resource status reporting. Increases project coverage from 64.72% to 66.85%.

* fix: add coverage configuration for parallel test execution

- Add [tool.coverage.run] with parallel mode enabled for pytest-xdist
- Add [tool.coverage.report] with proper exclude patterns
- Add [tool.coverage.paths] to handle different installation paths
- Implement normalize_for_json utility function for JSON serialization

This fixes the coverage discrepancy between parallel and serial test execution.
Parallel now reports 68.43%, matching serial execution at 68.59% (within 0.16%).
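
The pyproject.toml sections described above might look roughly like this (paths and patterns illustrative, not the actual file):

```toml
[tool.coverage.run]
parallel = true                # each xdist worker writes .coverage.<suffix>
source = ["src/tetra_rp"]      # illustrative source path

[tool.coverage.report]
exclude_lines = ["pragma: no cover", "if TYPE_CHECKING:"]

[tool.coverage.paths]
source = ["src/tetra_rp", "*/site-packages/tetra_rp"]
```

With `parallel = true`, coverage.py writes one data file per worker and merges them on `coverage combine`, which is what closes the gap between parallel and serial numbers.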

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>