feat: complete @remote support for LoadBalancer endpoints #131

Merged
deanq merged 46 commits into main from
deanq/ae-1102-load-balancer-sls-resource
Jan 8, 2026

Conversation


@deanq deanq commented Jan 4, 2026

Prerequisites: #130, #129
Related: runpod-workers/flash#45

Summary

Completes @remote decorator support for HTTP-based load-balanced endpoints with proper security boundaries between local development and production.

See docs for details:
Load_Balancer_Endpoints.md
LoadBalancer_Runtime_Architecture.md
Using_Remote_With_LoadBalancer.md

What's New

Handler Generation

  • Conditional /execute endpoint registration based on resource type
  • LiveLoadBalancer: Includes /execute for local development with function serialization
  • LoadBalancerSlsResource: Excludes /execute for security in deployed environments
  • Proper validation of HTTP routing for both resource types

Scanner Fix

  • Scanner now discovers both LiveLoadBalancer and LoadBalancerSlsResource
  • Previously only found "Serverless" in class names
  • Now checks for both "Serverless" and "LoadBalancer" patterns

Testing

  • Integration test for LiveLoadBalancer handler generation with /execute
  • Integration test for deployed endpoint handler generation without /execute
  • Scanner discovery test verifying both resource types are found

Architecture

  • LiveLoadBalancer (local): Uses /execute endpoint with function serialization
  • LoadBalancerSlsResource (deployed): Uses user-defined HTTP routes
  • Stub routing: Auto-detects resource type and routes accordingly
  • No code changes needed: Works transparently with @remote decorator

Security

  • Deployed endpoints don't expose /execute (prevents arbitrary code execution)
  • /execute only available for local development (LiveLoadBalancer)
  • User-defined routes are the interface for deployed endpoints

Usage

This enables:

  • @remote with LiveLoadBalancer for local testing
  • @remote with LoadBalancerSlsResource for deployed endpoints
  • flash build to generate handlers with correct endpoint configuration
  • Secure deployment to production

deanq added 25 commits January 3, 2026 01:22
Implement a factory function that creates RunPod serverless handlers,
eliminating code duplication across generated handler files.

The generic_handler module provides:
- create_handler(function_registry) factory that accepts a dict of
  function/class objects and returns a RunPod-compatible handler
- Automatic serialization/deserialization using cloudpickle + base64
- Support for both function execution and class instantiation + method calls
- Structured error responses with full tracebacks for debugging
- Load manifest for cross-endpoint function discovery

This design centralizes all handler logic in one place, making it easy to:
- Fix bugs once, benefit all handlers
- Add new features without regenerating projects
- Keep deployment packages small (handler files are ~23 lines each)

Implementation:
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Handles function vs. class execution
- load_manifest(): Loads flash_manifest.json for service discovery
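The factory described above can be sketched as follows. This is an illustrative reconstruction, not the actual tetra_rp source: standard-library pickle stands in for cloudpickle so the sketch is self-contained, and the job field names ("input", "function_name", "serialized_arguments") are assumptions.

```python
import base64
import pickle  # stand-in for cloudpickle so the sketch is dependency-free
import traceback


def deserialize_arguments(payload: str):
    """Base64 + pickle decoding of {"args": [...], "kwargs": {...}}."""
    data = pickle.loads(base64.b64decode(payload))
    return data.get("args", ()), data.get("kwargs", {})


def serialize_result(result) -> str:
    """Pickle + base64 encoding for safe JSON transmission."""
    return base64.b64encode(pickle.dumps(result)).decode("utf-8")


def create_handler(function_registry):
    """Factory: a dict of callables becomes one RunPod-style handler."""
    def handler(job):
        inp = job["input"]
        fn = function_registry.get(inp["function_name"])
        if fn is None:
            return {"success": False,
                    "error": f"Unknown function: {inp['function_name']}"}
        try:
            args, kwargs = deserialize_arguments(inp["serialized_arguments"])
            return {"success": True,
                    "result": serialize_result(fn(*args, **kwargs))}
        except Exception as exc:  # structured error response with traceback
            return {"success": False, "error": str(exc),
                    "traceback": traceback.format_exc()}
    return handler
```

Because the factory closes over the registry, each generated handler file only needs to build its `FUNCTION_REGISTRY` dict and call `create_handler()`, which is what keeps the generated files small.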
feat(cli): Add handler generator, manifest builder, and scanner for build process

Implement the build pipeline components that work together to generate
serverless handlers from @Remote decorated functions.

Three core components:

1. RemoteDecoratorScanner (scanner.py)
   - Uses Python AST to discover all @Remote decorated functions
   - Extracts function metadata: name, module, async status, is_class
   - Groups functions by resource_config for handler generation
   - Handles edge cases like decorated classes and async functions

2. ManifestBuilder (manifest.py)
   - Groups functions by their resource_config
   - Creates flash_manifest.json structure for service discovery
   - Maps functions to their modules and handler files
   - Enables cross-endpoint function routing at runtime

3. HandlerGenerator (handler_generator.py)
   - Creates lightweight handler_*.py files for each resource config
   - Each handler imports functions and registers them in FUNCTION_REGISTRY
   - Handler delegates to create_handler() factory from generic_handler
   - Generated handlers are ~23 lines (vs ~98 with duplication)

Build Pipeline Flow:
1. Scanner discovers @Remote functions
2. ManifestBuilder groups them by resource_config
3. HandlerGenerator creates handler_*.py for each group
4. All files + manifest bundled into archive.tar.gz

This eliminates ~95% duplication across handlers by using the factory pattern
instead of template-based generation.
Implement 19 unit tests covering all major paths through the generic_handler
factory and its helper functions.

Test Coverage:

Serialization/Deserialization (7 tests):
- serialize_result() with simple values, dicts, lists
- deserialize_arguments() with empty, args-only, kwargs-only, mixed inputs
- Round-trip encoding/decoding of cloudpickle + base64

Function Execution (4 tests):
- Simple function execution with positional and keyword arguments
- Keyword argument handling
- Class instantiation and method calls
- Argument passing to instance methods

Handler Factory (8 tests):
- create_handler() returns callable RunPod handler
- Handler with simple function registry
- Missing function error handling (returns error response, not exception)
- Function exceptions caught with traceback included
- Multiple functions in single registry
- Complex Python objects (classes, lambdas, closures)
- Empty registry edge case
- Default execution_type parameter
- None return values
- Correct RunPod response format (success, result/error, traceback)

Test Strategy:
- Arrange-Act-Assert pattern for clarity
- Isolated unit tests (no external dependencies)
- Tests verify behavior, not implementation
- Error cases tested for proper error handling
- All serialization tested for round-trip correctness

All tests passing, 83% coverage on generic_handler.py
…canning

Implement integration tests validating the build pipeline components work
correctly together.

Test Coverage:

HandlerGenerator Tests:
- Handler files created with correct names (handler_<resource_name>.py)
- Generated files import required functions from workers
- FUNCTION_REGISTRY properly formatted
- create_handler() imported from generic_handler
- Handler creation via factory
- RunPod start call present and correct
- Multiple handlers generated for multiple resource configs

ManifestBuilder Tests:
- Manifest structure with correct version and metadata
- Resources grouped by resource_config
- Handler file paths correct
- Function metadata preserved (name, module, is_async, is_class)
- Function registry mapping complete

Scanner Tests:
- @Remote decorated functions discovered via AST
- Function metadata extracted correctly
- Module paths resolved properly
- Async functions detected
- Class methods detected
- Edge cases handled (multiple decorators, nested classes)

Test Strategy:
- Integration tests verify components work together
- Tests verify generated files are syntactically correct
- Tests validate data structures match expected schemas
- No external dependencies in build process

Validates that the entire build pipeline:
1. Discovers functions correctly
2. Groups them appropriately
3. Generates valid Python handler files
4. Creates correct manifest structure
Add comprehensive architecture documentation explaining why the factory
pattern was chosen and how it works.

Documentation includes:

Overview & Context:
- Problem statement: Handler files had 95% duplication
- Design decision: Use factory function instead of templates
- Benefits: Single source of truth, easier maintenance, consistency

Architecture Diagrams (MermaidJS):
- High-level flow: @Remote functions → Scanner → Manifest → Handlers → Factory
- Component relationships: HandlerGenerator, GeneratedHandler, generic_handler
- Function registry pattern: Discovery → Grouping → Registration → Factory

Implementation Details:
- create_handler(function_registry) signature and behavior
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Function vs. class execution
- load_manifest(): Service discovery via flash_manifest.json

Design Decisions (with rationale):
- Factory Pattern over Inheritance: Simpler, less coupling, easier to test
- CloudPickle + Base64: Handles arbitrary objects, safe JSON transmission
- Manifest in Generic Handler: Runtime service discovery requirement
- Structured Error Responses: Debugging aid, functional error handling
- Both Execution Types: Supports stateful classes and pure functions

Usage Examples:
- Simple function handler
- Class execution with methods
- Multiple functions in one handler

Build Process Integration:
- 4-phase pipeline: Scanner → Grouping → Generation → Packaging
- Manifest structure and contents
- Generated handler structure (~23 lines)

Testing Strategy:
- 19 unit tests covering all major paths
- 7 integration tests verifying handler generation
- Manual testing with example applications

Performance:
- Zero runtime penalty (factory called once at startup)
- No additional indirection in request path
Document the flash build command and update CLI README to include it.

New Documentation:

flash-build.md includes:

Usage & Options:
- Command syntax: flash build [OPTIONS]
- --no-deps: Skip transitive dependencies (faster, smaller archives)
- --keep-build: Keep build directory for inspection/debugging
- --output, -o: Custom archive name (default: archive.tar.gz)

What It Does (5-step process):
1. Discovery: Scan for @Remote decorated functions
2. Grouping: Group functions by resource_config
3. Handler Generation: Create lightweight handler files
4. Manifest Creation: Generate flash_manifest.json
5. Packaging: Create archive.tar.gz for deployment

Build Artifacts:
- .flash/archive.tar.gz: Deployment package (ready for RunPod)
- .flash/flash_manifest.json: Service discovery configuration
- .flash/.build/: Temporary build directory

Handler Generation:
- Explains factory pattern and minimal handler files
- Links to Runtime_Generic_Handler.md for details

Dependency Management:
- Default behavior: Install all dependencies including transitive
- --no-deps: Only direct dependencies (when base image has transitive)
- Trade-offs explained

Cross-Endpoint Function Calls:
- Example showing GPU and CPU endpoints
- Manifest enables routing automatically

Output & Troubleshooting:
- Sample build output with progress indicators
- Common failure scenarios and solutions
- How to debug with --keep-build

Next Steps:
- Test locally with flash run
- Deploy to RunPod
- Monitor with flash undeploy list

Updated CLI README.md:
- Added flash build to command list in sequence
- Links to full flash-build.md documentation
Add a new section explaining how the build system works and why the
factory pattern reduces code duplication.

New Section: Build Process and Handler Generation

Explains:

How Flash Builds Your Application (5-step pipeline):
1. Discovery: Scans code for @Remote decorated functions
2. Grouping: Groups functions by resource_config
3. Handler Generation: Creates lightweight handler files
4. Manifest Creation: Generates flash_manifest.json for service discovery
5. Packaging: Bundles everything into archive.tar.gz

Handler Architecture (with code example):
- Shows generated handler using factory pattern
- Single source of truth: All handler logic in one place
- Easier maintenance: Bug fixes don't require rebuilding projects

Cross-Endpoint Function Calls:
- Example of GPU and CPU endpoints calling each other
- Manifest and runtime wrapper handle service discovery

Build Artifacts:
- .flash/.build/: Temporary build directory
- .flash/archive.tar.gz: Deployment package
- .flash/flash_manifest.json: Service configuration

Links to detailed documentation:
- docs/Runtime_Generic_Handler.md for architecture details
- src/tetra_rp/cli/docs/flash-build.md for CLI reference

This section bridges the main README and the detailed documentation,
providing an entry point for new users discovering the build system.
Wire up the handler generator, manifest builder, and scanner into the
actual flash build command implementation.

Changes to build.py:

1. Integration:
   - Import RemoteDecoratorScanner for function discovery
   - Import ManifestBuilder for manifest creation
   - Import HandlerGenerator for handler file creation
   - Call these in sequence during the build process

2. Build Pipeline:
   - After copying project files, scan for @Remote functions
   - Build manifest from discovered functions
   - Generate handler files for each resource config
   - Write manifest to build directory
   - Progress indicators show what's being generated

3. Fixes:
   - Change .tetra directory references to .flash
   - Uncomment actual build logic (was showing "Coming Soon" message)
   - Fix progress messages to show actual file counts

4. Error Handling:
   - Try/catch around handler generation
   - Warning shown if generation fails but build continues
   - User can debug with --keep-build flag

Build Flow Now:
1. Load ignore patterns
2. Collect project files
3. Create build directory
4. Copy files to build directory
5. [NEW] Scan for @Remote functions
6. [NEW] Build and write manifest
7. [NEW] Generate handler files
8. Install dependencies
9. Create archive
10. Clean up build directory (unless --keep-build)

Dependencies:
- Updated uv.lock with all required dependencies
…handling

**Critical Fixes:**
- Remove "Coming Soon" message blocking build command execution
- Fix build directory to use .flash/.build/ directly (no app_name subdirectory)
- Fix tarball to extract with flat structure using arcname="."
- Fix cleanup to remove correct build directory

**Error Handling & Validation:**
- Add specific exception handling (ImportError, SyntaxError, ValueError)
- Add import validation to generated handlers
- Add duplicate function name detection across resources
- Add proper error logging throughout build process

**Resource Type Tracking:**
- Add resource_type field to RemoteFunctionMetadata
- Track actual resource types (LiveServerless, CpuLiveServerless)
- Use actual types in manifest instead of hardcoding

**Robustness Improvements:**
- Add handler import validation post-generation
- Add manifest path fallback search (cwd, module dir, legacy location)
- Add resource name sanitization for safe filenames
- Add specific exception logging in scanner (UnicodeDecodeError, SyntaxError)

**User Experience:**
- Add troubleshooting section to README
- Update manifest path documentation in docs
- Change "Zero Runtime Penalty" to "Minimal Runtime Overhead"
- Mark future enhancements as "Not Yet Implemented"
- Improve build success message with next steps

Fixes all 20 issues identified in code review (issues #1-13, #19-22)
Implement LoadBalancerSlsResource class for provisioning RunPod load-balanced
serverless endpoints. Load-balanced endpoints expose HTTP servers directly to
clients without queue-based processing, enabling REST APIs, webhooks, and
real-time communication patterns.

Key features:
- Type enforcement (always LB, never QB)
- Scaler validation (REQUEST_COUNT required, not QUEUE_DELAY)
- Health check polling via /ping endpoint (200/204 = healthy)
- Post-deployment verification with configurable retries
- Async and sync health check methods
- Comprehensive unit tests
- Full documentation with architecture diagrams and examples

Architecture:
- Extends ServerlessResource with LB-specific behavior
- Validates configuration before deployment
- Polls /ping endpoint until healthy (10 retries × 5s = 50s timeout)
- Raises TimeoutError if endpoint fails to become healthy

This forms the foundation for Mothership architecture where a load-balanced
endpoint serves as a directory server for child endpoints.
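The health-check polling described above can be sketched as a simple loop. The function name and signature here are hypothetical, not the actual LoadBalancerSlsResource API; the defaults mirror the 10 retries x 5 s window from the commit message, and 200/204 are treated as healthy per the /ping contract.

```python
import time


def wait_until_healthy(ping, retries: int = 10, interval: float = 5.0) -> int:
    """Poll a /ping check until healthy (200/204) or raise TimeoutError.

    `ping` is any callable returning an HTTP status code. Returns the
    number of checks it took to observe a healthy response.
    """
    for attempt in range(1, retries + 1):
        if ping() in (200, 204):
            return attempt
        time.sleep(interval)
    raise TimeoutError(
        f"Endpoint failed health check after {retries * interval:.0f}s"
    )
```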
Import ServerlessResource directly and use patch.object on the imported class
instead of string-based patches. This ensures the mocks properly intercept the
parent class's _do_deploy method when called via super(). Simplifies mock
configuration and removes an unused variable assertion.

Fixes the three failing deployment tests that were making real GraphQL API calls.
All tests now pass: 418 passed, 1 skipped.
…oints

Implement core infrastructure for enabling @Remote decorator on
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Changes:
- Create LoadBalancerSlsStub: HTTP-based stub for direct endpoint execution
  (src/tetra_rp/stubs/load_balancer_sls.py, 170 lines)
  - Serializes functions and arguments using cloudpickle + base64
  - Direct HTTP POST to /execute endpoint (no queue polling)
  - Proper error handling and deserialization

- Register stub with singledispatch (src/tetra_rp/stubs/registry.py)
  - Enables @Remote to dispatch to LoadBalancerSlsStub for LB resources

- Extend @Remote decorator with HTTP routing parameters (src/tetra_rp/client.py)
  - Add 'method' parameter: GET, POST, PUT, DELETE, PATCH
  - Add 'path' parameter: /api/endpoint routes
  - Validate method/path required for LoadBalancerSlsResource
  - Store routing metadata on decorated functions/classes
  - Warn if routing params used with non-LB resources

Foundation for Phase 2 (Build system integration) and Phase 3 (Local dev).
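The stub's request preparation can be sketched as below. Standard-library pickle stands in for cloudpickle so the example is self-contained, and the payload field names are illustrative assumptions, not LoadBalancerSlsStub's actual wire format.

```python
import base64
import pickle  # stand-in for cloudpickle so the sketch runs anywhere


def prepare_execute_payload(fn, args=(), kwargs=None) -> dict:
    """Serialize a function and its arguments for a POST to /execute (sketch)."""
    kwargs = kwargs or {}

    def encode(obj) -> str:
        # Pickle the object, then base64-encode for safe JSON transport
        return base64.b64encode(pickle.dumps(obj)).decode("utf-8")

    return {
        "serialized_function": encode(fn),
        "serialized_arguments": encode({"args": args, "kwargs": kwargs}),
    }
```

On the server side, the /execute endpoint reverses the encoding, calls the function, and returns the result serialized the same way.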
Update RemoteDecoratorScanner to extract HTTP method and path from
@Remote decorator for LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to RemoteFunctionMetadata
- Add _extract_http_routing() method to parse decorator keywords
- Extract method (GET, POST, PUT, DELETE, PATCH) from decorator
- Extract path (/api/process) from decorator
- Store routing metadata for manifest generation

Foundation for Phase 2.2 (Manifest updates) and Phase 2.3 (Handler generation).
Enhance ManifestBuilder to support HTTP method/path routing for
LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to ManifestFunction
- Validate LB endpoints have both method and path
- Detect and prevent route conflicts (same method + path)
- Prevent use of reserved paths (/execute, /ping)
- Add 'routes' section to manifest for LB endpoints
- Conditional inclusion of routing fields (only for LB)

Manifest structure for LB endpoints now includes:
{
  "resources": {
    "api_service": {
      "resource_type": "LoadBalancerSlsResource",
      "functions": [
        {
          "name": "process_data",
          "http_method": "POST",
          "http_path": "/api/process"
        }
      ]
    }
  },
  "routes": {
    "api_service": {
      "POST /api/process": "process_data"
    }
  }
}
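The validation rules above (both fields required, no route conflicts, no reserved paths) can be sketched as a single pass over the discovered functions. The function name `build_route_table` is hypothetical; the reserved paths come from the commit message.

```python
RESERVED_PATHS = ("/execute", "/ping")  # framework endpoints users may not claim


def build_route_table(functions):
    """Validate LB functions and build a 'METHOD /path' -> name table (sketch)."""
    routes = {}
    for fn in functions:
        method, path = fn.get("http_method"), fn.get("http_path")
        if not method or not path:
            raise ValueError(
                f"{fn['name']}: LB endpoints need both method and path"
            )
        if path in RESERVED_PATHS:
            raise ValueError(
                f"{path} is reserved (reserved paths: {list(RESERVED_PATHS)})"
            )
        key = f"{method} {path}"
        if key in routes:
            raise ValueError(
                f"Route conflict: {key} maps to both {routes[key]} and {fn['name']}"
            )
        routes[key] = fn["name"]
    return routes
```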
Implement LBHandlerGenerator to create FastAPI applications for
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Key features:
- Generates FastAPI apps with explicit route registry
- Creates (method, path) -> function mappings from manifest
- Validates route conflicts and reserved paths
- Imports user functions and creates dynamic routes
- Includes required /ping health check endpoint
- Validates generated handler Python syntax via import

Generated handler structure enables:
- Direct HTTP routing to user functions via FastAPI
- Framework /execute endpoint for @Remote stub execution
- Local development with uvicorn
Create create_lb_handler() factory function that dynamically builds FastAPI
applications from route registries for LoadBalancerSlsResource endpoints.

Key features:
- Accepts route_registry: a {(method, path): handler_function} mapping
- Registers all user-defined routes from registry to FastAPI app
- Provides /execute endpoint for @Remote stub function execution
- Handles async function execution automatically
- Serializes results with cloudpickle + base64 encoding
- Comprehensive error handling with detailed logging

The /execute endpoint enables:
- Remote function code execution via @Remote decorator
- Automatic argument deserialization from cloudpickle/base64
- Result serialization for transmission back to client
- Support for both sync and async functions
Update build command to use appropriate handler generators based on
resource type. Separates LoadBalancerSlsResource endpoints (using FastAPI)
from queue-based endpoints (using generic handler).

Changes:
- Import LBHandlerGenerator alongside HandlerGenerator
- Inspect manifest resources and separate by type
- Generate LB handlers via LBHandlerGenerator
- Generate QB handlers via HandlerGenerator
- Combine all generated handler paths for summary

Enables users to mix LB and QB endpoints in the same project, with correct
code generation for each resource type.
Implement LiveLoadBalancer resource following the LiveServerless pattern
for local development and testing of load-balanced endpoints.

Changes:
- Add TETRA_LB_IMAGE constant for load-balanced Tetra image
- Create LiveLoadBalancer class extending LoadBalancerSlsResource
- Uses LiveServerlessMixin to lock imageName to Tetra LB image
- Register LiveLoadBalancer with LoadBalancerSlsStub in singledispatch
- Export LiveLoadBalancer from core.resources and top-level __init__

This enables users to test LB-based functions locally before deploying,
using the same pattern as LiveServerless for queue-based endpoints.

Users can now write:
  from tetra_rp import LiveLoadBalancer, remote

  api = LiveLoadBalancer(name="test-api")

  @remote(api, method="POST", path="/api/process")
  async def process_data(x, y):
      return {"result": x + y}

  result = await process_data(5, 3)  # Local execution
Implement unit tests for LoadBalancerSlsStub covering:
- Request preparation with arguments and dependencies
- Response handling for success and error cases
- Error handling for invalid responses
- Base64 encoding/decoding of serialized data
- Endpoint URL validation
- Timeout and HTTP error handling

Test coverage:
- _prepare_request: 4 tests
- _handle_response: 5 tests
- _execute_function: 3 error case tests
- __call__: 2 integration tests

Tests verify proper function serialization, argument handling,
error propagation, and response deserialization.
Fix test_load_balancer_vs_queue_based_endpoints by updating the @Remote
decorator to use method='POST' and path='/api/echo' to match the test
assertions. This was a test-level bug where the decorator definition
didn't match what was being asserted.
…ndpoints

- Using_Remote_With_LoadBalancer.md: User guide for HTTP routing, local development, building and deploying
- LoadBalancer_Runtime_Architecture.md: Technical details on deployment, request flows, security, and performance
- Updated README.md with LoadBalancer section and code example
- Updated Load_Balancer_Endpoints.md with cross-references to new guides
Split @Remote execution behavior between local and deployed:
- LiveLoadBalancer (local): Uses /execute endpoint for function serialization
- LoadBalancerSlsResource (deployed): Uses user-defined routes with HTTP param mapping

Changes:
1. LoadBalancerSlsStub routing detection:
   - _should_use_execute_endpoint() determines execution path
   - _execute_via_user_route() maps args to JSON and POSTs to user routes
   - Auto-detects resource type and routing metadata

2. Conditional /execute registration:
   - create_lb_handler() now accepts include_execute parameter
   - Generated handlers default to include_execute=False (security)
   - LiveLoadBalancer can enable /execute if needed

3. Updated handler generator:
   - Added clarity comments on /execute exclusion for deployed endpoints

4. Comprehensive test coverage:
   - 8 new tests for routing detection and execution paths
   - All 31 tests passing (22 unit + 9 integration)

5. Documentation updates:
   - Using_Remote_With_LoadBalancer.md: clarified /execute scope
   - Added 'Local vs Deployed Execution' section explaining differences
   - LoadBalancer_Runtime_Architecture.md: updated execution model
   - Added troubleshooting for deployed endpoint scenarios

Security improvement:
- Deployed endpoints only expose user-defined routes
- /execute endpoint removed from production (prevents arbitrary code execution)
- Lower attack surface for deployed endpoints
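The conditional /execute registration described above can be sketched framework-agnostically. The real create_lb_handler builds a FastAPI app; this dispatch-table stand-in only illustrates the include_execute switch and the always-present /ping route.

```python
def create_lb_handler(route_registry, include_execute=False):
    """Build a (method, path) -> callable dispatch table (sketch).

    Deployed endpoints (LoadBalancerSlsResource) use the secure default
    include_execute=False; LiveLoadBalancer opts in for local development.
    """
    routes = dict(route_registry)
    # The /ping health check is always registered.
    routes[("GET", "/ping")] = lambda: {"status": "healthy"}
    if include_execute:
        # Local development only: accept and run serialized functions.
        routes[("POST", "/execute")] = lambda payload: {
            "note": "deserialize and run the serialized function here"
        }
    return routes
```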
…lude /execute endpoint

- Modified manifest.py to validate LiveLoadBalancer endpoints like LoadBalancerSlsResource
- Updated lb_handler_generator to:
  - Include LiveLoadBalancer in handler generation filter
  - Pass include_execute=True for LiveLoadBalancer (local dev)
  - Pass include_execute=False for LoadBalancerSlsResource (deployed)
- Added integration tests:
  - Verify LiveLoadBalancer handlers include /execute endpoint
  - Verify deployed handlers exclude /execute endpoint
- Fixes critical bug: LiveLoadBalancer now gets /execute endpoint in generated handlers
…ss resources

- Updated scanner to extract LiveLoadBalancer and LoadBalancerSlsResource resources
- Previously only looked for 'Serverless' in class name, missing LoadBalancer endpoints
- Now checks for both 'Serverless' and 'LoadBalancer' in resource type names
- Added integration test to verify scanner discovers both resource types
- Fixes critical bug that prevented flash build from finding LoadBalancer endpoints
- Wrap long lines in manifest.py, lb_handler.py, and load_balancer_sls.py
- Remove unused httpx import in test_load_balancer_sls_stub.py
- Apply consistent formatting across codebase
@deanq deanq requested a review from Copilot January 4, 2026 05:19

Copilot AI left a comment


Pull request overview

This PR completes @Remote decorator support for LoadBalancer endpoints by implementing proper security boundaries and handler generation for both local development (LiveLoadBalancer) and production (LoadBalancerSlsResource).

Key changes:

  • Conditional /execute endpoint registration based on resource type (enabled for LiveLoadBalancer, disabled for LoadBalancerSlsResource)
  • Scanner enhancement to discover both LiveLoadBalancer and LoadBalancerSlsResource classes
  • Comprehensive test coverage for stub routing, handler generation, and scanner discovery

Reviewed changes

Copilot reviewed 33 out of 34 changed files in this pull request and generated 6 comments.

Files changed:

  • src/tetra_rp/client.py: Added HTTP routing parameters (method, path) to @Remote decorator with validation for LoadBalancerSlsResource
  • src/tetra_rp/stubs/load_balancer_sls.py: Implemented LoadBalancerSlsStub with dual routing (execute endpoint vs user routes) based on resource type
  • src/tetra_rp/stubs/registry.py: Registered stubs for LoadBalancerSlsResource and LiveLoadBalancer
  • src/tetra_rp/runtime/lb_handler.py: Created FastAPI handler factory with conditional /execute endpoint inclusion
  • src/tetra_rp/runtime/generic_handler.py: Implemented generic handler factory for queue-based endpoints
  • src/tetra_rp/core/resources/load_balancer_sls_resource.py: Added LoadBalancerSlsResource class with LB-specific validation and health checks
  • src/tetra_rp/core/resources/live_serverless.py: Added LiveLoadBalancer class for local development
  • src/tetra_rp/cli/commands/build.py: Enhanced build command with handler generation and manifest creation
  • src/tetra_rp/cli/commands/build_utils/scanner.py: Implemented AST-based scanner to discover @Remote functions with LoadBalancer support
  • src/tetra_rp/cli/commands/build_utils/lb_handler_generator.py: Created generator for FastAPI handlers with conditional /execute endpoint
  • tests/integration/test_lb_remote_execution.py: Added integration tests for LiveLoadBalancer and LoadBalancerSlsResource handler generation
  • tests/unit/test_load_balancer_sls_stub.py: Comprehensive unit tests for LoadBalancerSlsStub routing and execution
  • docs/*.md: Added comprehensive documentation for LoadBalancer endpoints and runtime architecture


deanq added 2 commits January 3, 2026 21:24
- Scanner: Use exact type name matching instead of substring matching
  - Whitelist specific resource types to avoid false positives
  - Prevents matching classes like 'MyServerlessHelper' or 'LoadBalancerUtils'

- Type hints: Use Optional[str] for nullable fields in manifest
  - ManifestFunction.http_method and http_path now properly typed

- Timeout: Make HTTP client timeout configurable
  - Added LoadBalancerSlsStub.DEFAULT_TIMEOUT class attribute
  - Added timeout parameter to __init__
  - Updated both _execute_function and _execute_via_user_route to use self.timeout

- Deprecated datetime: Replace datetime.utcnow() with datetime.now(timezone.utc)
  - Updated manifest.py and test_lb_remote_execution.py
  - Ensures Python 3.12+ compatibility
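The exact-matching fix above can be sketched as a whitelist check; substring matching would wrongly accept classes like "MyServerlessHelper". The whitelist members are drawn from resource types named in this PR, but the exact set used by the scanner is an assumption.

```python
# Exact-name whitelist; substring matching on "Serverless"/"LoadBalancer"
# would produce false positives such as "MyServerlessHelper".
KNOWN_RESOURCE_TYPES = frozenset({
    "LiveServerless",
    "CpuLiveServerless",
    "LoadBalancerSlsResource",
    "LiveLoadBalancer",
})


def is_remote_resource_type(type_name: str) -> bool:
    """Return True only for exact matches against the known resource types."""
    return type_name in KNOWN_RESOURCE_TYPES
```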
@deanq deanq changed the base branch from main to deanq/ae-1251-handler-mapper January 4, 2026 05:37
Addresses three feedback items from code review:

1. Fix breaking parameter order change in update_system_dependencies()
   - Restored original parameter order: template_id, token, system_dependencies, base_entry_cmd
   - Maintains backward compatibility with existing callers
   - Token parameter now optional (default None)

2. Add proper deprecation warning for token parameter
   - Issues DeprecationWarning when token parameter is used
   - Clearly communicates migration to RUNPOD_API_KEY environment variable
   - Follows Python deprecation best practices (warnings.warn with stacklevel=2)

3. Standardize test mocking approach across all health check tests
   - All tests now use consistent 'tetra_rp.core.utils.http.httpx.AsyncClient' patching
   - Removed inconsistent 'side_effect=lambda' pattern
   - Improved test maintainability by using same strategy everywhere

All 503 tests pass with consistent, clean implementation.
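The deprecation pattern from item 2 above can be sketched as follows; the signature reflects the restored parameter order from item 1, and the function body is elided since only the warning pattern is being illustrated.

```python
import warnings


def update_system_dependencies(template_id, token=None,
                               system_dependencies=None, base_entry_cmd=None):
    """Illustrative signature showing the restored parameter order."""
    if token is not None:
        warnings.warn(
            "The 'token' parameter is deprecated; set the RUNPOD_API_KEY "
            "environment variable instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's call site
        )
    # ... actual template update logic elided ...
```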
@deanq deanq changed the base branch from deanq/ae-1251-handler-mapper to main January 5, 2026 01:28
@deanq deanq changed the base branch from main to deanq/ae-1251-handler-mapper January 5, 2026 02:35
- Fix health check timeout: Add clarification that timeout is 15 seconds per check
- Add HTTP authentication details explaining RUNPOD_API_KEY header injection
- Document stub decision logic for incomplete routing metadata (fallback behavior)
- Clarify function signature inspection with concrete example showing parameter mapping
- Expand /execute security explanation with explicit threats and best practices
- Add detailed parameter type constraints for deployed endpoints (supported vs unsupported)
- Add troubleshooting guide for missing routing metadata (404 errors)
- Strengthen security warnings about never exposing /execute in production

All documentation now matches actual implementation verified through codebase analysis.
deanq added 3 commits January 5, 2026 22:22
…improvements

- Add 6 new scanner tests for directory filtering (.venv, .flash, .runpod exclusion)
- Add test for resource type validation to prevent false positives
- Add test for fallback behavior when resource name extraction fails
- Add test for handling resource names with special characters
- Update existing tests to reflect new dynamic import format and resource name extraction

These tests guarantee that improvements to the scanner (resource type validation,
directory filtering, fallback behavior) and handler generator (dynamic imports for
invalid Python identifiers) won't regress in future changes.
The scanner now extracts resource names from the name= parameter rather
than using variable names. Update test assertions to expect the actual
resource names ('test-api', 'deployed-api') instead of variable names.
@deanq deanq changed the title feat: complete @remote support for LoadBalancer endpoints feat: complete @remote support for LoadBalancer endpoints Jan 6, 2026
Base automatically changed from deanq/ae-1251-handler-mapper to main January 8, 2026 01:40
@deanq deanq changed the base branch from main to deanq/ae-1348-cross-endpoint-routing January 8, 2026 01:54
@deanq deanq changed the base branch from deanq/ae-1348-cross-endpoint-routing to main January 8, 2026 01:54
- Fix timeout parameter type hint to Optional[float] in LoadBalancerSlsStub
- Update timeout error messages to use actual timeout value instead of hardcoded "30s"
- Extract reserved paths ["/execute", "/ping"] to RESERVED_PATHS constant in manifest builder
- Improve error message to dynamically list reserved paths
@deanq deanq merged commit f2f34c0 into main Jan 8, 2026
7 checks passed
@deanq deanq deleted the deanq/ae-1102-load-balancer-sls-resource branch January 8, 2026 19:15
deanq added a commit that referenced this pull request Jan 12, 2026
…positives (#132)

* feat(runtime): Add generic handler factory for serverless execution

Implement a factory function that creates RunPod serverless handlers,
eliminating code duplication across generated handler files.

The generic_handler module provides:
- create_handler(function_registry) factory that accepts a dict of
  function/class objects and returns a RunPod-compatible handler
- Automatic serialization/deserialization using cloudpickle + base64
- Support for both function execution and class instantiation + method calls
- Structured error responses with full tracebacks for debugging
- Load manifest for cross-endpoint function discovery

This design centralizes all handler logic in one place, making it easy to:
- Fix bugs once, benefit all handlers
- Add new features without regenerating projects
- Keep deployment packages small (handler files are ~23 lines each)

Implementation:
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Handles function vs. class execution
- load_manifest(): Loads flash_manifest.json for service discovery
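The serialization helpers described above can be sketched as follows. This is a minimal illustration of the base64 + pickle round trip; the actual module uses cloudpickle (which also handles lambdas and closures), and the payload key names here are assumptions for illustration:

```python
import base64
import pickle  # stand-in for cloudpickle, which also handles lambdas/closures


def serialize_result(result):
    """Pickle a Python object and wrap it in base64 for safe JSON transport."""
    return base64.b64encode(pickle.dumps(result)).decode("utf-8")


def deserialize_arguments(payload):
    """Decode base64 + pickled args/kwargs from a request payload."""
    args = pickle.loads(base64.b64decode(payload["args"])) if payload.get("args") else ()
    kwargs = pickle.loads(base64.b64decode(payload["kwargs"])) if payload.get("kwargs") else {}
    return args, kwargs
```

Base64 wrapping matters because raw pickle bytes are not valid JSON string content, and all traffic between stub and handler travels as JSON.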

* feat(cli): Add handler generator, manifest builder, and scanner for build process

Implement the build pipeline components that work together to generate
serverless handlers from @Remote decorated functions.

Three core components:

1. RemoteDecoratorScanner (scanner.py)
   - Uses Python AST to discover all @Remote decorated functions
   - Extracts function metadata: name, module, async status, is_class
   - Groups functions by resource_config for handler generation
   - Handles edge cases like decorated classes and async functions

2. ManifestBuilder (manifest.py)
   - Groups functions by their resource_config
   - Creates flash_manifest.json structure for service discovery
   - Maps functions to their modules and handler files
   - Enables cross-endpoint function routing at runtime

3. HandlerGenerator (handler_generator.py)
   - Creates lightweight handler_*.py files for each resource config
   - Each handler imports functions and registers them in FUNCTION_REGISTRY
   - Handler delegates to create_handler() factory from generic_handler
   - Generated handlers are ~23 lines (vs ~98 with duplication)

Build Pipeline Flow:
1. Scanner discovers @Remote functions
2. ManifestBuilder groups them by resource_config
3. HandlerGenerator creates handler_*.py for each group
4. All files + manifest bundled into archive.tar.gz

This eliminates ~95% of the duplication across handlers by using the factory pattern
instead of template-based generation.
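The factory pattern at the heart of this pipeline can be sketched as below. This is a simplified model (no serialization, no class instantiation); the job/response field names are assumptions, not the exact RunPod schema:

```python
import traceback


def create_handler(function_registry):
    """Return a handler closed over a registry of callables (simplified sketch)."""
    def handler(job):
        payload = job.get("input", {})
        name = payload.get("function_name")
        func = function_registry.get(name)
        if func is None:
            return {"success": False, "error": f"Unknown function: {name}"}
        try:
            result = func(*payload.get("args", ()), **payload.get("kwargs", {}))
            return {"success": True, "result": result}
        except Exception as exc:
            # structured error response with full traceback for debugging
            return {"success": False, "error": str(exc), "traceback": traceback.format_exc()}
    return handler
```

Each generated handler file then only needs to build its FUNCTION_REGISTRY dict and call the shared factory, which is why the files stay ~23 lines.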

* test(runtime): Add comprehensive tests for generic handler

Implement 19 unit tests covering all major paths through the generic_handler
factory and its helper functions.

Test Coverage:

Serialization/Deserialization (7 tests):
- serialize_result() with simple values, dicts, lists
- deserialize_arguments() with empty, args-only, kwargs-only, mixed inputs
- Round-trip encoding/decoding of cloudpickle + base64

Function Execution (4 tests):
- Simple function execution with positional and keyword arguments
- Keyword argument handling
- Class instantiation and method calls
- Argument passing to instance methods

Handler Factory (8 tests):
- create_handler() returns callable RunPod handler
- Handler with simple function registry
- Missing function error handling (returns error response, not exception)
- Function exceptions caught with traceback included
- Multiple functions in single registry
- Complex Python objects (classes, lambdas, closures)
- Empty registry edge case
- Default execution_type parameter
- None return values
- Correct RunPod response format (success, result/error, traceback)

Test Strategy:
- Arrange-Act-Assert pattern for clarity
- Isolated unit tests (no external dependencies)
- Tests verify behavior, not implementation
- Error cases tested for proper error handling
- All serialization tested for round-trip correctness

All tests passing, 83% coverage on generic_handler.py

* test(cli): Add tests for handler generation, manifest building, and scanning

Implement integration tests validating the build pipeline components work
correctly together.

Test Coverage:

HandlerGenerator Tests:
- Handler files created with correct names (handler_<resource_name>.py)
- Generated files import required functions from workers
- FUNCTION_REGISTRY properly formatted
- create_handler() imported from generic_handler
- Handler creation via factory
- RunPod start call present and correct
- Multiple handlers generated for multiple resource configs

ManifestBuilder Tests:
- Manifest structure with correct version and metadata
- Resources grouped by resource_config
- Handler file paths correct
- Function metadata preserved (name, module, is_async, is_class)
- Function registry mapping complete

Scanner Tests:
- @Remote decorated functions discovered via AST
- Function metadata extracted correctly
- Module paths resolved properly
- Async functions detected
- Class methods detected
- Edge cases handled (multiple decorators, nested classes)

Test Strategy:
- Integration tests verify components work together
- Tests verify generated files are syntactically correct
- Tests validate data structures match expected schemas
- No external dependencies in build process

Validates that the entire build pipeline:
1. Discovers functions correctly
2. Groups them appropriately
3. Generates valid Python handler files
4. Creates correct manifest structure

* docs(runtime): Document generic handler factory architecture

Add comprehensive architecture documentation explaining why the factory
pattern was chosen and how it works.

Documentation includes:

Overview & Context:
- Problem statement: Handler files had 95% duplication
- Design decision: Use factory function instead of templates
- Benefits: Single source of truth, easier maintenance, consistency

Architecture Diagrams (MermaidJS):
- High-level flow: @Remote functions → Scanner → Manifest → Handlers → Factory
- Component relationships: HandlerGenerator, GeneratedHandler, generic_handler
- Function registry pattern: Discovery → Grouping → Registration → Factory

Implementation Details:
- create_handler(function_registry) signature and behavior
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Function vs. class execution
- load_manifest(): Service discovery via flash_manifest.json

Design Decisions (with rationale):
- Factory Pattern over Inheritance: Simpler, less coupling, easier to test
- CloudPickle + Base64: Handles arbitrary objects, safe JSON transmission
- Manifest in Generic Handler: Runtime service discovery requirement
- Structured Error Responses: Debugging aid, functional error handling
- Both Execution Types: Supports stateful classes and pure functions

Usage Examples:
- Simple function handler
- Class execution with methods
- Multiple functions in one handler

Build Process Integration:
- 4-phase pipeline: Scanner → Grouping → Generation → Packaging
- Manifest structure and contents
- Generated handler structure (~23 lines)

Testing Strategy:
- 19 unit tests covering all major paths
- 7 integration tests verifying handler generation
- Manual testing with example applications

Performance:
- Zero runtime penalty (factory called once at startup)
- No additional indirection in request path

* docs(cli): Add flash build command documentation

Document the flash build command and update CLI README to include it.

New Documentation:

flash-build.md includes:

Usage & Options:
- Command syntax: flash build [OPTIONS]
- --no-deps: Skip transitive dependencies (faster, smaller archives)
- --keep-build: Keep build directory for inspection/debugging
- --output, -o: Custom archive name (default: archive.tar.gz)

What It Does (5-step process):
1. Discovery: Scan for @Remote decorated functions
2. Grouping: Group functions by resource_config
3. Handler Generation: Create lightweight handler files
4. Manifest Creation: Generate flash_manifest.json
5. Packaging: Create archive.tar.gz for deployment

Build Artifacts:
- .flash/archive.tar.gz: Deployment package (ready for RunPod)
- .flash/flash_manifest.json: Service discovery configuration
- .flash/.build/: Temporary build directory

Handler Generation:
- Explains factory pattern and minimal handler files
- Links to Runtime_Generic_Handler.md for details

Dependency Management:
- Default behavior: Install all dependencies including transitive
- --no-deps: Only direct dependencies (when base image has transitive)
- Trade-offs explained

Cross-Endpoint Function Calls:
- Example showing GPU and CPU endpoints
- Manifest enables routing automatically

Output & Troubleshooting:
- Sample build output with progress indicators
- Common failure scenarios and solutions
- How to debug with --keep-build

Next Steps:
- Test locally with flash run
- Deploy to RunPod
- Monitor with flash undeploy list

Updated CLI README.md:
- Added flash build to command list in sequence
- Links to full flash-build.md documentation

* docs: Add build process and handler generation section to README

Add a new section explaining how the build system works and why the
factory pattern reduces code duplication.

New Section: Build Process and Handler Generation

Explains:

How Flash Builds Your Application (5-step pipeline):
1. Discovery: Scans code for @Remote decorated functions
2. Grouping: Groups functions by resource_config
3. Handler Generation: Creates lightweight handler files
4. Manifest Creation: Generates flash_manifest.json for service discovery
5. Packaging: Bundles everything into archive.tar.gz

Handler Architecture (with code example):
- Shows generated handler using factory pattern
- Single source of truth: All handler logic in one place
- Easier maintenance: Bug fixes don't require rebuilding projects

Cross-Endpoint Function Calls:
- Example of GPU and CPU endpoints calling each other
- Manifest and runtime wrapper handle service discovery

Build Artifacts:
- .flash/.build/: Temporary build directory
- .flash/archive.tar.gz: Deployment package
- .flash/flash_manifest.json: Service configuration

Links to detailed documentation:
- docs/Runtime_Generic_Handler.md for architecture details
- src/tetra_rp/cli/docs/flash-build.md for CLI reference

This section bridges the main README and detailed documentation,
providing an entry point for new users discovering the build system.

* feat(cli): Integrate build utilities into flash build command

Wire up the handler generator, manifest builder, and scanner into the
actual flash build command implementation.

Changes to build.py:

1. Integration:
   - Import RemoteDecoratorScanner for function discovery
   - Import ManifestBuilder for manifest creation
   - Import HandlerGenerator for handler file creation
   - Call these in sequence during the build process

2. Build Pipeline:
   - After copying project files, scan for @Remote functions
   - Build manifest from discovered functions
   - Generate handler files for each resource config
   - Write manifest to build directory
   - Progress indicators show what's being generated

3. Fixes:
   - Change .tetra directory references to .flash
   - Uncomment actual build logic (was showing "Coming Soon" message)
   - Fix progress messages to show actual file counts

4. Error Handling:
   - Try/catch around handler generation
   - Warning shown if generation fails but build continues
   - User can debug with --keep-build flag

Build Flow Now:
1. Load ignore patterns
2. Collect project files
3. Create build directory
4. Copy files to build directory
5. [NEW] Scan for @Remote functions
6. [NEW] Build and write manifest
7. [NEW] Generate handler files
8. Install dependencies
9. Create archive
10. Clean up build directory (unless --keep-build)

Dependencies:
- Updated uv.lock with all required dependencies

* refactor(build): Fix directory structure and add comprehensive error handling

**Critical Fixes:**
- Remove "Coming Soon" message blocking build command execution
- Fix build directory to use .flash/.build/ directly (no app_name subdirectory)
- Fix tarball to extract with flat structure using arcname="."
- Fix cleanup to remove correct build directory

**Error Handling & Validation:**
- Add specific exception handling (ImportError, SyntaxError, ValueError)
- Add import validation to generated handlers
- Add duplicate function name detection across resources
- Add proper error logging throughout build process

**Resource Type Tracking:**
- Add resource_type field to RemoteFunctionMetadata
- Track actual resource types (LiveServerless, CpuLiveServerless)
- Use actual types in manifest instead of hardcoding

**Robustness Improvements:**
- Add handler import validation post-generation
- Add manifest path fallback search (cwd, module dir, legacy location)
- Add resource name sanitization for safe filenames
- Add specific exception logging in scanner (UnicodeDecodeError, SyntaxError)

**User Experience:**
- Add troubleshooting section to README
- Update manifest path documentation in docs
- Change "Zero Runtime Penalty" to "Minimal Runtime Overhead"
- Mark future enhancements as "Not Yet Implemented"
- Improve build success message with next steps

Fixes all 20 issues identified in code review (issues #1-13, #19-22)

* feat(resources): Add LoadBalancerSlsResource for LB endpoints

Implement LoadBalancerSlsResource class for provisioning RunPod load-balanced
serverless endpoints. Load-balanced endpoints expose HTTP servers directly to
clients without queue-based processing, enabling REST APIs, webhooks, and
real-time communication patterns.

Key features:
- Type enforcement (always LB, never QB)
- Scaler validation (REQUEST_COUNT required, not QUEUE_DELAY)
- Health check polling via /ping endpoint (200/204 = healthy)
- Post-deployment verification with configurable retries
- Async and sync health check methods
- Comprehensive unit tests
- Full documentation with architecture diagrams and examples

Architecture:
- Extends ServerlessResource with LB-specific behavior
- Validates configuration before deployment
- Polls /ping endpoint until healthy (10 retries × 5s = 50s timeout)
- Raises TimeoutError if endpoint fails to become healthy

This forms the foundation for Mothership architecture where a load-balanced
endpoint serves as a directory server for child endpoints.
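The health-check polling described above (10 retries × 5s) can be sketched as a simple loop. The function and parameter names here are illustrative, not the actual LoadBalancerSlsResource API:

```python
import time


def wait_until_healthy(check_ping, retries=10, delay=5.0, sleep=time.sleep):
    """Poll a /ping check until it reports healthy, or raise TimeoutError.

    check_ping is any callable returning an HTTP status code; per the
    resource's contract, 200 or 204 counts as healthy.
    """
    for _attempt in range(retries):
        if check_ping() in (200, 204):
            return True
        sleep(delay)
    raise TimeoutError(f"Endpoint not healthy after {retries} checks ({retries * delay:.0f}s)")
```

Injecting `sleep` keeps the loop testable without real waiting; the production code performs the same bounded retry before raising.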

* fix(test): Fix LoadBalancerSlsResource deployment test mocks

Import ServerlessResource directly and use patch.object on the imported class
instead of string-based patches. This ensures the mocks properly intercept the
parent class's _do_deploy method when called via super(). Simplifies mock
configuration and removes an unused variable assertion.

Fixes the three failing deployment tests that were making real GraphQL API calls.
All tests now pass: 418 passed, 1 skipped.

* feat(resources): Phase 1 - Core infrastructure for @Remote on LB endpoints

Implement core infrastructure for enabling @Remote decorator on
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Changes:
- Create LoadBalancerSlsStub: HTTP-based stub for direct endpoint execution
  (src/tetra_rp/stubs/load_balancer_sls.py, 170 lines)
  - Serializes functions and arguments using cloudpickle + base64
  - Direct HTTP POST to /execute endpoint (no queue polling)
  - Proper error handling and deserialization

- Register stub with singledispatch (src/tetra_rp/stubs/registry.py)
  - Enables @Remote to dispatch to LoadBalancerSlsStub for LB resources

- Extend @Remote decorator with HTTP routing parameters (src/tetra_rp/client.py)
  - Add 'method' parameter: GET, POST, PUT, DELETE, PATCH
  - Add 'path' parameter: /api/endpoint routes
  - Validate method/path required for LoadBalancerSlsResource
  - Store routing metadata on decorated functions/classes
  - Warn if routing params used with non-LB resources

Foundation for Phase 2 (Build system integration) and Phase 3 (Local dev).

* feat(build): Phase 2.1 - Enhanced scanner for HTTP routing extraction

Update RemoteDecoratorScanner to extract HTTP method and path from
@Remote decorator for LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to RemoteFunctionMetadata
- Add _extract_http_routing() method to parse decorator keywords
- Extract method (GET, POST, PUT, DELETE, PATCH) from decorator
- Extract path (/api/process) from decorator
- Store routing metadata for manifest generation

Foundation for Phase 2.2 (Manifest updates) and Phase 2.3 (Handler generation).

* feat(build): Phase 2.2 - Updated manifest schema for HTTP routing

Enhance ManifestBuilder to support HTTP method/path routing for
LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to ManifestFunction
- Validate LB endpoints have both method and path
- Detect and prevent route conflicts (same method + path)
- Prevent use of reserved paths (/execute, /ping)
- Add 'routes' section to manifest for LB endpoints
- Conditional inclusion of routing fields (only for LB)

Manifest structure for LB endpoints now includes:
{
  "resources": {
    "api_service": {
      "resource_type": "LoadBalancerSlsResource",
      "functions": [
        {
          "name": "process_data",
          "http_method": "POST",
          "http_path": "/api/process"
        }
      ]
    }
  },
  "routes": {
    "api_service": {
      "POST /api/process": "process_data"
    }
  }
}
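The route validation rules above (conflict detection plus reserved-path rejection) can be sketched like this; the helper name and input shape are assumptions for illustration:

```python
RESERVED_PATHS = ["/execute", "/ping"]  # framework-owned routes


def validate_routes(functions):
    """Build a {"METHOD /path": function_name} map from function metadata,
    rejecting reserved paths and duplicate (method, path) pairs."""
    routes = {}
    for fn in functions:
        method, path = fn["http_method"], fn["http_path"]
        if path in RESERVED_PATHS:
            raise ValueError(f"{path} is reserved (reserved paths: {RESERVED_PATHS})")
        key = f"{method} {path}"
        if key in routes:
            raise ValueError(f"Route conflict: {key} already maps to {routes[key]}")
        routes[key] = fn["name"]
    return routes
```

The resulting mapping is what lands in the manifest's 'routes' section for each LB resource.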

* feat(cli): Add LB handler generator for FastAPI app creation

Implement LBHandlerGenerator to create FastAPI applications for
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Key features:
- Generates FastAPI apps with explicit route registry
- Creates (method, path) -> function mappings from manifest
- Validates route conflicts and reserved paths
- Imports user functions and creates dynamic routes
- Includes required /ping health check endpoint
- Validates generated handler Python syntax via import

Generated handler structure enables:
- Direct HTTP routing to user functions via FastAPI
- Framework /execute endpoint for @Remote stub execution
- Local development with uvicorn

* feat(runtime): Implement LB handler factory for FastAPI app creation

Create create_lb_handler() factory function that dynamically builds FastAPI
applications from route registries for LoadBalancerSlsResource endpoints.

Key features:
- Accepts route_registry: Dict[(method, path)] -> handler_function mapping
- Registers all user-defined routes from registry to FastAPI app
- Provides /execute endpoint for @Remote stub function execution
- Handles async function execution automatically
- Serializes results with cloudpickle + base64 encoding
- Comprehensive error handling with detailed logging

The /execute endpoint enables:
- Remote function code execution via @Remote decorator
- Automatic argument deserialization from cloudpickle/base64
- Result serialization for transmission back to client
- Support for both sync and async functions
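The route-registry idea behind create_lb_handler() can be modeled without FastAPI as a plain dispatcher; this sketch only shows the registry shape and the conditional /execute registration, with all names and payloads assumed for illustration:

```python
def create_lb_app(route_registry, include_execute=False):
    """Build a minimal dispatcher from a {(method, path): handler} registry.
    The real factory registers the same mapping onto a FastAPI app."""
    routes = dict(route_registry)
    routes[("GET", "/ping")] = lambda payload: {"status": "healthy"}
    if include_execute:
        # Only enabled for local development (LiveLoadBalancer); deployed
        # handlers omit this to prevent arbitrary code execution.
        routes[("POST", "/execute")] = lambda payload: {"note": "serialized-execution stub"}

    def dispatch(method, path, payload=None):
        handler = routes.get((method, path))
        if handler is None:
            return {"status": 404, "error": f"No route for {method} {path}"}
        return handler(payload)

    return dispatch
```

With `include_execute=False` (the deployed default), a request to /execute simply 404s while user-defined routes keep working.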

* feat(cli): Route build command to separate handlers for LB endpoints

Update build command to use appropriate handler generators based on
resource type. Separates LoadBalancerSlsResource endpoints (using FastAPI)
from queue-based endpoints (using generic handler).

Changes:
- Import LBHandlerGenerator alongside HandlerGenerator
- Inspect manifest resources and separate by type
- Generate LB handlers via LBHandlerGenerator
- Generate QB handlers via HandlerGenerator
- Combine all generated handler paths for summary

Enables users to mix LB and QB endpoints in same project with correct
code generation for each resource type.

* feat(resources): Add LiveLoadBalancer for local LB endpoint testing

Implement LiveLoadBalancer resource following the LiveServerless pattern
for local development and testing of load-balanced endpoints.

Changes:
- Add TETRA_LB_IMAGE constant for load-balanced Tetra image
- Create LiveLoadBalancer class extending LoadBalancerSlsResource
- Uses LiveServerlessMixin to lock imageName to Tetra LB image
- Register LiveLoadBalancer with LoadBalancerSlsStub in singledispatch
- Export LiveLoadBalancer from core.resources and top-level __init__

This enables users to test LB-based functions locally before deploying,
using the same pattern as LiveServerless for queue-based endpoints.

Users can now write:
  from tetra_rp import LiveLoadBalancer, remote

  api = LiveLoadBalancer(name="test-api")

  @remote(api, method="POST", path="/api/process")
  async def process_data(x, y):
      return {"result": x + y}

  result = await process_data(5, 3)  # Local execution

* test(stubs): Add comprehensive unit tests for LoadBalancerSlsStub

Implement unit tests for LoadBalancerSlsStub covering:
- Request preparation with arguments and dependencies
- Response handling for success and error cases
- Error handling for invalid responses
- Base64 encoding/decoding of serialized data
- Endpoint URL validation
- Timeout and HTTP error handling

Test coverage:
- _prepare_request: 4 tests
- _handle_response: 5 tests
- _execute_function: 3 error case tests
- __call__: 2 integration tests

Tests verify proper function serialization, argument handling,
error propagation, and response deserialization.

* fix(test): Correct LB endpoint test decorator to match assertions

Fix test_load_balancer_vs_queue_based_endpoints by updating the @Remote
decorator to use method='POST' and path='/api/echo' to match the test
assertions. This was a test-level bug where the decorator definition
didn't match what was being asserted.

* docs: Add comprehensive documentation for @Remote with LoadBalancer endpoints

- Using_Remote_With_LoadBalancer.md: User guide for HTTP routing, local development, building and deploying
- LoadBalancer_Runtime_Architecture.md: Technical details on deployment, request flows, security, and performance
- Updated README.md with LoadBalancer section and code example
- Updated Load_Balancer_Endpoints.md with cross-references to new guides

* security: Remove /execute from deployed LoadBalancer endpoints

Split @Remote execution behavior between local and deployed:
- LiveLoadBalancer (local): Uses /execute endpoint for function serialization
- LoadBalancerSlsResource (deployed): Uses user-defined routes with HTTP param mapping

Changes:
1. LoadBalancerSlsStub routing detection:
   - _should_use_execute_endpoint() determines execution path
   - _execute_via_user_route() maps args to JSON and POSTs to user routes
   - Auto-detects resource type and routing metadata

2. Conditional /execute registration:
   - create_lb_handler() now accepts include_execute parameter
   - Generated handlers default to include_execute=False (security)
   - LiveLoadBalancer can enable /execute if needed

3. Updated handler generator:
   - Added clarity comments on /execute exclusion for deployed endpoints

4. Comprehensive test coverage:
   - 8 new tests for routing detection and execution paths
   - All 31 tests passing (22 unit + 9 integration)

5. Documentation updates:
   - Using_Remote_With_LoadBalancer.md: clarified /execute scope
   - Added 'Local vs Deployed Execution' section explaining differences
   - LoadBalancer_Runtime_Architecture.md: updated execution model
   - Added troubleshooting for deployed endpoint scenarios

Security improvement:
- Deployed endpoints only expose user-defined routes
- /execute endpoint removed from production (prevents arbitrary code execution)
- Lower attack surface for deployed endpoints
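The stub's routing decision described in point 1 can be sketched as a small predicate; the function name echoes the commit but the signature here is an assumption:

```python
def should_use_execute_endpoint(resource_type, http_method=None, http_path=None):
    """Decide how the stub reaches the endpoint: local LiveLoadBalancer
    resources use the framework /execute route; deployed resources must
    carry complete user-defined routing metadata."""
    if resource_type == "LiveLoadBalancer":
        return True  # local dev: serialized function execution via /execute
    if http_method and http_path:
        return False  # deployed: map args to JSON and call the user route
    raise ValueError("Deployed LB endpoint is missing method/path routing metadata")
```

Raising on incomplete metadata (rather than silently falling back to /execute) is what keeps the deployed path from ever depending on the framework endpoint.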

* feat(build): Phase 4 - Fix LiveLoadBalancer handler generation to include /execute endpoint

- Modified manifest.py to validate LiveLoadBalancer endpoints like LoadBalancerSlsResource
- Updated lb_handler_generator to:
  - Include LiveLoadBalancer in handler generation filter
  - Pass include_execute=True for LiveLoadBalancer (local dev)
  - Pass include_execute=False for LoadBalancerSlsResource (deployed)
- Added integration tests:
  - Verify LiveLoadBalancer handlers include /execute endpoint
  - Verify deployed handlers exclude /execute endpoint
- Fixes critical bug: LiveLoadBalancer now gets /execute endpoint in generated handlers

* fix(scanner): Discover LoadBalancer resources in addition to Serverless resources

- Updated scanner to extract LiveLoadBalancer and LoadBalancerSlsResource resources
- Previously only looked for 'Serverless' in class name, missing LoadBalancer endpoints
- Now checks for both 'Serverless' and 'LoadBalancer' in resource type names
- Added integration test to verify scanner discovers both resource types
- Fixes critical bug that prevented flash build from finding LoadBalancer endpoints

* chore: Format code for line length and remove unused imports

- Wrap long lines in manifest.py, lb_handler.py, and load_balancer_sls.py
- Remove unused httpx import in test_load_balancer_sls_stub.py
- Apply consistent formatting across codebase

* fix: Address PR #131 review feedback

- Scanner: Use exact type name matching instead of substring matching
  - Whitelist specific resource types to avoid false positives
  - Prevents matching classes like 'MyServerlessHelper' or 'LoadBalancerUtils'

- Type hints: Use Optional[str] for nullable fields in manifest
  - ManifestFunction.http_method and http_path now properly typed

- Timeout: Make HTTP client timeout configurable
  - Added LoadBalancerSlsStub.DEFAULT_TIMEOUT class attribute
  - Added timeout parameter to __init__
  - Updated both _execute_function and _execute_via_user_route to use self.timeout

- Deprecated datetime: Replace datetime.utcnow() with datetime.now(timezone.utc)
  - Updated manifest.py and test_lb_remote_execution.py
  - Ensures Python 3.12+ compatibility

* style: Format datetime chaining for line length

* fix: LiveLoadBalancer template not serialized to RunPod GraphQL

The set_serverless_template model_validator was being overwritten by sync_input_fields
(both had mode="after"). In Pydantic v2, when two validators with the same mode are
defined in a class, only one is registered.

This caused templates to never be created from imageName, resulting in:
  "GraphQL errors: One of templateId, template is required to create an endpoint"

Solution:
- Move set_serverless_template validator from ServerlessResource base class to subclasses
  (ServerlessEndpoint and LoadBalancerSlsResource) where the validation is actually needed
- Keep helper methods (_create_new_template, _configure_existing_template) in base class
  for reuse
- Add comprehensive tests for LiveLoadBalancer template serialization

This allows:
1. Base ServerlessResource to be instantiated freely for testing/configuration
2. Subclasses (ServerlessEndpoint, LoadBalancerSlsResource) to enforce template
   requirements during deployment
3. Proper template serialization in GraphQL payload for RunPod API

Fixes: One of templateId, template is required to create an endpoint error when
deploying LiveLoadBalancer with custom image tags like runpod/tetra-rp-lb:local

* fix: LoadBalancer endpoint URL and add CPU support

- Fix: Use correct endpoint URL format for load-balanced endpoints
  (https://{id}.api.runpod.ai instead of https://api.runpod.ai/v2/{id})
  This fixes 404 errors on /ping health check endpoints

- Feature: Add CPU LoadBalancer support
  * Create CpuLoadBalancerSlsResource for CPU-based load-balanced endpoints
  * Create CpuLiveLoadBalancer for local CPU LB development
  * Add TETRA_CPU_LB_IMAGE constant for CPU LB Docker image
  * Update example code to use CpuLiveLoadBalancer for CPU worker
  * Add 8 comprehensive tests for CPU LoadBalancer functionality

- Tests: Add 2 tests for endpoint URL format validation
- All 474 tests passing, 64% code coverage
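The URL-format fix can be sketched as a small helper; the function name is illustrative, but the two URL shapes are taken directly from the commit:

```python
def endpoint_base_url(endpoint_id, load_balanced):
    """Load-balanced endpoints get their own subdomain; queue-based
    endpoints go through the /v2 API path."""
    if load_balanced:
        return f"https://{endpoint_id}.api.runpod.ai"
    return f"https://api.runpod.ai/v2/{endpoint_id}"
```

Using the queue-based format for an LB endpoint is exactly what produced the 404s on /ping that this commit fixes.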

* fix: Export CpuLiveLoadBalancer and CpuLoadBalancerSlsResource from tetra_rp package

LoadBalancer resources were not being discovered by ResourceDiscovery because
the new CPU variants (CpuLiveLoadBalancer, CpuLoadBalancerSlsResource) were
not exported from the main tetra_rp package. This prevented undeploy from
picking up these resources.

Added exports to:
- TYPE_CHECKING imports for type hints
- __getattr__ function for lazy loading
- __all__ list for public API

This fixes the issue where 'flash undeploy list' could not find LoadBalancer
resources that were deployed with 'flash run --auto-provision'.

* fix: Add API key authentication to LoadBalancer health check

The /ping endpoint for RunPod load-balanced endpoints requires the
RUNPOD_API_KEY header for authentication. Without it, the health check
fails with 401 Unauthorized, causing provisioning to timeout.

This fix adds the Authorization header to the health check request if
the RUNPOD_API_KEY environment variable is available, allowing the
endpoint health check to succeed during provisioning.

Fixes issue where 'flash run --auto-provision' would fail even though
the endpoint was successfully created on RunPod.
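The conditional header injection can be sketched as below. The Bearer scheme is an assumption; the commit only says the Authorization header is added when RUNPOD_API_KEY is set:

```python
import os


def health_check_headers(env=os.environ):
    """Attach the RunPod API key when available; without it, the /ping
    check on a deployed LB endpoint fails with 401 Unauthorized."""
    headers = {}
    api_key = env.get("RUNPOD_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"  # scheme assumed
    return headers
```

Returning an empty dict when the key is absent keeps local, unauthenticated development working unchanged.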

* fix(lb): Exclude flashboot from CpuLoadBalancerSlsResource GraphQL payload

CpuLoadBalancerSlsResource was overriding _input_only without including flashboot,
causing it to be sent to the RunPod GraphQL API which doesn't accept this field.
This caused deployment to fail with: Field "flashboot" is not defined by type "EndpointInput".

* fix(lb): Expand CpuInstanceType.ANY to all CPU flavors in CpuLoadBalancerSlsResource

Add field_validator to expand [CpuInstanceType.ANY] to all available CPU instance
types (cpu3g, cpu3c, cpu5c variants). This matches the behavior in CpuServerlessEndpoint
and prevents deployment errors like 'instanceId must be in the format of flavorId-vcpu-ram'.

* refactor(cpu): Move instanceIds validator to CpuEndpointMixin

Move the instanceIds field_validator from CpuServerlessEndpoint to CpuEndpointMixin
so both CpuServerlessEndpoint and CpuLoadBalancerSlsResource share the same validator
that expands [CpuInstanceType.ANY] to all available CPU flavors. This eliminates
code duplication and ensures consistent behavior across all CPU endpoint types.
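The shared validator's expansion behavior can be sketched as a plain function; the sentinel spelling and flavor identifiers below are illustrative stand-ins for CpuInstanceType values, not the real enum:

```python
# Illustrative flavor list; the real set comes from CpuInstanceType.
ALL_CPU_INSTANCE_TYPES = ["cpu3g-2-8", "cpu3c-2-4", "cpu5c-2-4"]


def expand_instance_ids(instance_ids):
    """Expand the ANY sentinel to every known CPU flavor so the API
    receives concrete 'flavorId-vcpu-ram' identifiers."""
    if instance_ids == ["ANY"]:
        return list(ALL_CPU_INSTANCE_TYPES)
    return instance_ids
```

Expanding before the request is built avoids the 'instanceId must be in the format of flavorId-vcpu-ram' deployment error described above.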

* test: Update CPU instance test to reflect validator expansion

Update test_cpu_live_load_balancer_defaults_to_cpu_any to verify that
[CpuInstanceType.ANY] is correctly expanded to all available CPU instance types
by the field_validator in CpuEndpointMixin.

* fix(lb): Increase health check timeout from 5s to 15s

Load-balanced workers need more time to respond during cold starts and initialization.
RunPod docs recommend at least 10-15 second timeouts for health checks. Workers may
return 204 during initialization, which is normal and expected.

* fix(lb): Fix CPU load balancer template deployment error

Fixes two bugs in CpuLoadBalancerSlsResource that prevented CPU load balancers from deploying:

1. Added gpuCount and allowedCudaVersions to _input_only exclusion set to prevent
   GPU-specific fields from being sent to RunPod API
2. Overrode set_serverless_template() to call _sync_cpu_fields() first, ensuring
   GPU defaults are overridden to CPU-appropriate values (gpuCount=0)

The RunPod API was rejecting CPU load balancer templates because GPU-specific
fields were being included in the GraphQL payload. These changes align
CpuLoadBalancerSlsResource behavior with CpuServerlessEndpoint.

Also added comprehensive test coverage (30+ tests) to verify:
- GPU fields are correctly overridden to CPU defaults
- GPU fields are excluded from API payloads
- CPU-specific fields are properly included
- Consistency with CpuServerlessEndpoint behavior

* fix(drift): Exclude runtime fields from config hash to prevent false positives

Fixes false positive configuration drift detection by separating concerns:

1. Update ServerlessResource.config_hash to exclude runtime fields
   - Fields like template, templateId, aiKey, userId are API-assigned
   - Prevents false drift when same config is redeployed across processes
   - Now only hashes user-specified configuration

2. Add config_hash override to CpuLoadBalancerSlsResource
   - CPU load balancers hash only CPU-relevant fields
   - Excludes GPU-specific fields and runtime fields
   - Follows same pattern as CpuServerlessEndpoint

3. Fix _has_structural_changes to exclude template/templateId
   - CRITICAL: These runtime fields were causing false structural changes
   - Was forcing unnecessary redeployments despite update() being available
   - Now system correctly uses update() instead of undeploy+deploy

4. Make field serializers robust to handle string/enum values
   - Prevents serialization errors when fields are pre-converted to strings

5. Add comprehensive drift detection tests (16 tests)
   - Test hash stability with runtime field changes
   - Test exclusion of env, template, templateId, and other runtime fields
   - Test that actual config changes (image, flashboot) are detected
   - Test structural change detection behavior
   - Test real-world deployment scenarios

Results:
- Same config deployed multiple times: no false drift
- Different env vars with same config: no false drift
- Template/templateId changes: no false drift
- API-assigned fields: no false drift
- User config changes (image, flashboot): drift detected correctly
- All 512 unit tests pass
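A minimal sketch of the hashing scheme described above, assuming a dict-shaped config for illustration (field names are taken from the commit message; the real implementation lives in `ServerlessResource.config_hash`):

```python
import hashlib
import json

# Runtime, API-assigned fields that must not influence the hash.
EXCLUDED_HASH_FIELDS = {"template", "templateId", "aiKey", "userId"}


def config_hash(config: dict) -> str:
    # Hash only user-specified fields, in a stable key order.
    user_fields = {k: v for k, v in config.items() if k not in EXCLUDED_HASH_FIELDS}
    payload = json.dumps(user_fields, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


a = config_hash({"imageName": "repo/app:1", "templateId": "tpl-a"})
b = config_hash({"imageName": "repo/app:1", "templateId": "tpl-b"})
c = config_hash({"imageName": "repo/app:2", "templateId": "tpl-a"})
assert a == b  # runtime field changed: no false drift
assert a != c  # user config changed: drift detected
```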

* fix(http): Standardize RunPod HTTP client authentication across codebase

Centralizes HTTP client creation for RunPod load-balanced endpoints to prevent
manual Authorization header code duplication and ensure consistent authentication:

1. Create centralized HTTP utility function (src/tetra_rp/core/utils/http.py)
   - New function: get_authenticated_httpx_client()
   - Automatically adds Bearer token Authorization header if RUNPOD_API_KEY set
   - Provides consistent timeout handling (default 30s, customizable)
   - Follows existing GraphQL/REST client authentication pattern

2. Fix critical authentication bug in LoadBalancerSlsStub._execute_via_user_route()
   - Previously: Missing Authorization header (401 errors on user routes)
   - Now: Uses centralized utility for proper authentication
   - Enables direct HTTP calls to user-defined routes with auth

3. Refactor two methods to use centralized utility
   - LoadBalancerSlsStub._execute_function() - removes 7+ lines of manual auth code
   - LoadBalancerSlsResource._check_ping_endpoint() - simplifies auth setup

4. Add comprehensive unit tests (tests/unit/core/utils/test_http.py)
   - Tests API key presence/absence handling
   - Tests custom and default timeout configuration
   - Tests edge cases (empty key, zero timeout)
   - All 7 tests pass with 100% coverage

Results:
- Single source of truth for HTTP authentication (centralized utility)
- Fixes 401 Unauthorized errors on load-balanced endpoints
- Eliminates repetitive manual auth code across 3+ locations
- Easier to maintain and update authentication patterns in future
- All 499 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

* feat(http): Extend HTTP utilities to cover both sync and async authentication

Extends the centralized HTTP authentication pattern to all RunPod API calls:

1. Add get_authenticated_requests_session() for synchronous requests
   - Creates requests.Session with automatic Bearer token Authorization header
   - Follows same pattern as async get_authenticated_httpx_client()
   - Single source of truth for sync HTTP authentication

2. Refactor template.py to use centralized utility
   - Removes manual Authorization header setup (line 86)
   - Now uses get_authenticated_requests_session() for all template updates
   - Improves error handling with raise_for_status()
   - Token parameter marked deprecated; uses RUNPOD_API_KEY env var

3. Add comprehensive tests for sync utility (4 tests)
   - Tests API key presence/absence handling
   - Tests empty API key edge case
   - Tests Session object validation
   - All tests pass with proper cleanup

Benefits:
- True single source of truth for all RunPod HTTP authentication (sync + async)
- Consistent patterns across entire codebase
- Easier future auth changes across all HTTP client types
- Eliminates manual auth header code in template.py
- All 503 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

Note: requests.Session doesn't support default timeouts; timeout should be
specified per request (e.g., session.post(url, json=data, timeout=30.0))
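The sync counterpart described above could look like this sketch (the function name follows the commit message; the body is an assumption), including the per-request timeout noted above:

```python
import os

import requests


def get_authenticated_requests_session() -> requests.Session:
    """Sketch of the synchronous factory (internals assumed)."""
    session = requests.Session()
    api_key = os.environ.get("RUNPOD_API_KEY")
    if api_key:
        session.headers["Authorization"] = f"Bearer {api_key}"
    return session


os.environ["RUNPOD_API_KEY"] = "example-key"
session = get_authenticated_requests_session()
print(session.headers["Authorization"])
# No session-level default timeout exists in requests; pass it per call:
# session.post(url, json=data, timeout=30.0)
```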

* fix: Address PR feedback on HTTP utilities implementation

Addresses three feedback items from code review:

1. Fix breaking parameter order change in update_system_dependencies()
   - Restored original parameter order: template_id, token, system_dependencies, base_entry_cmd
   - Maintains backward compatibility with existing callers
   - Token parameter now optional (default None)

2. Add proper deprecation warning for token parameter
   - Issues DeprecationWarning when token parameter is used
   - Clearly communicates migration to RUNPOD_API_KEY environment variable
   - Follows Python deprecation best practices (warnings.warn with stacklevel=2)

3. Standardize test mocking approach across all health check tests
   - All tests now use consistent 'tetra_rp.core.utils.http.httpx.AsyncClient' patching
   - Removed inconsistent 'side_effect=lambda' pattern
   - Improved test maintainability by using same strategy everywhere

All 503 tests pass with consistent, clean implementation.
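The deprecation pattern from item 2 can be sketched as below. The signature mirrors the restored parameter order described above; the body is an illustrative shell, not the real implementation:

```python
import os
import warnings


def update_system_dependencies(template_id, token=None,
                               system_dependencies=None, base_entry_cmd=None):
    """Illustrative shell; only the deprecation path is shown."""
    if token is not None:
        warnings.warn(
            "'token' is deprecated; set the RUNPOD_API_KEY environment "
            "variable instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
    return token or os.environ.get("RUNPOD_API_KEY")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    update_system_dependencies("tpl-123", token="legacy-key")

assert caught and issubclass(caught[0].category, DeprecationWarning)
```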

* refactor(drift): Extract runtime field constants and improve maintainability

- Extract RUNTIME_FIELDS and EXCLUDED_HASH_FIELDS as ClassVar constants in ServerlessResource for centralized field list management
- Add clarifying comments to enum serializers explaining defensive isinstance() checks for nested model serialization
- Document CPU load balancer field list coupling in docstring with maintenance guidelines
- Add TestSerializerDefensiveBehavior class with 4 tests verifying pre-stringified enum value handling
- Use ClassVar annotation to satisfy Pydantic v2 model field requirements

This reduces maintenance burden by centralizing field definitions and improves code clarity without changing functionality.

* docs: Improve LoadBalancer documentation accuracy and completeness

- Fix health check timeout: Add clarification that timeout is 15 seconds per check
- Add HTTP authentication details explaining RUNPOD_API_KEY header injection
- Document stub decision logic for incomplete routing metadata (fallback behavior)
- Clarify function signature inspection with concrete example showing parameter mapping
- Expand /execute security explanation with explicit threats and best practices
- Add detailed parameter type constraints for deployed endpoints (supported vs unsupported)
- Add troubleshooting guide for missing routing metadata (404 errors)
- Strengthen security warnings about never exposing /execute in production

All documentation now matches actual implementation verified through codebase analysis.

* docs: add resource config drift detection documentation

- comprehensive guide on drift detection implementation
- covers hash computation, field exclusion, and cpu-specific behavior
- includes testing patterns and troubleshooting guide
- documents all fields that trigger drift vs those ignored

* docs: proper name for the file

* test(build): Add comprehensive test coverage for scanner and handler improvements

- Add 6 new scanner tests for directory filtering (.venv, .flash, .runpod exclusion)
- Add test for resource type validation to prevent false positives
- Add test for fallback behavior when resource name extraction fails
- Add test for handling resource names with special characters
- Update existing tests to reflect new dynamic import format and resource name extraction

These tests guard the scanner improvements (resource type validation,
directory filtering, fallback behavior) and the handler-generator improvements
(dynamic imports for invalid Python identifiers) against regressions in future changes.

* test(scanner): Fix resource type assertions to match scanner behavior

The scanner now extracts resource names from the name= parameter rather
than using variable names. Update test assertions to expect the actual
resource names ('test-api', 'deployed-api') instead of variable names.

* chore: merge correction

* fix(drift): Remove manual undeploy/deploy from update() method

Use saveEndpoint mutation for all changes instead of manual lifecycle
management. Server-side automatically detects version-triggering fields
(GPU, template, volumes) and increments endpoint version accordingly.

Keep _has_structural_changes() as informational for logging purposes only.
This aligns with RunPod API's version-based deployment model.

* docs(drift): Clarify _has_structural_changes detects version-triggering changes

Update docstring to reflect that this method identifies changes that
trigger server-side version increment and worker recreation, not manual
redeploy cycles. Explain which changes are version-triggering vs rolling
updates, and note that the method is now informational for logging only.

* feat(drift): Enable environment variable drift detection

Remove env from EXCLUDED_HASH_FIELDS so changes to environment variables
trigger drift detection and endpoint updates. Environment changes are
non-version-triggering (rolling updates), so server will apply them via
saveEndpoint without recreating workers.

Add env to CPU LoadBalancer config_hash for consistent behavior across
all resource types. Update comments to reflect that env is user-specified
configuration, not dynamically computed.

* test(drift): Update tests for environment variable drift detection

- test_lb_config_hash_excludes_env_variables → test_lb_config_hash_detects_env_changes
- test_env_var_changes_no_drift → test_env_var_changes_trigger_drift
- test_config_hash_excludes_env_from_drift → test_config_hash_detects_env_from_drift

Update assertions to expect different hashes when env changes, matching
new behavior where environment variable changes trigger drift and updates.

* fix: Address Copilot review feedback on type hints and documentation

- Fix type annotation for timeout parameter in LoadBalancerSlsStub (Optional[float])
- Replace hardcoded "30s" with actual self.timeout in error messages (2 locations)
- Update Resource_Config_Drift_Detection.md to reflect actual EXCLUDED_HASH_FIELDS
- Remove duplicate Load-Balanced Endpoints section from README.md

Addresses Copilot review comments (PR #132, review 3642596664)
jhcipar pushed a commit that referenced this pull request on Jan 12, 2026: …positives (#132)

* feat(runtime): Add generic handler factory for serverless execution

Implement a factory function that creates RunPod serverless handlers,
eliminating code duplication across generated handler files.

The generic_handler module provides:
- create_handler(function_registry) factory that accepts a dict of
  function/class objects and returns a RunPod-compatible handler
- Automatic serialization/deserialization using cloudpickle + base64
- Support for both function execution and class instantiation + method calls
- Structured error responses with full tracebacks for debugging
- Load manifest for cross-endpoint function discovery

This design centralizes all handler logic in one place, making it easy to:
- Fix bugs once, benefit all handlers
- Add new features without regenerating projects
- Keep deployment packages small (handler files are ~23 lines each)

Implementation:
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Handles function vs. class execution
- load_manifest(): Loads flash_manifest.json for service discovery
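The factory and its serialization helpers can be sketched as below, using stdlib `pickle` in place of cloudpickle to stay self-contained. The job payload keys (`function_name`, `arguments`) and response shape are assumptions for illustration:

```python
import base64
import pickle  # the project uses cloudpickle; stdlib pickle suffices here
import traceback


def serialize_result(obj) -> str:
    return base64.b64encode(pickle.dumps(obj)).decode()


def deserialize_arguments(encoded: str):
    args, kwargs = pickle.loads(base64.b64decode(encoded))
    return args, kwargs


def create_handler(function_registry):
    """Factory returning a RunPod-style handler closed over the registry."""
    def handler(job):
        try:
            payload = job["input"]
            fn = function_registry[payload["function_name"]]
            args, kwargs = deserialize_arguments(payload["arguments"])
            return {"success": True, "result": serialize_result(fn(*args, **kwargs))}
        except Exception:
            # Structured error response with full traceback for debugging.
            return {"success": False, "error": traceback.format_exc()}
    return handler


handler = create_handler({"add": lambda a, b: a + b})
encoded_args = base64.b64encode(pickle.dumps(((2, 3), {}))).decode()
response = handler({"input": {"function_name": "add", "arguments": encoded_args}})
assert response["success"]
assert pickle.loads(base64.b64decode(response["result"])) == 5
```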

* feat(cli): Add handler generator, manifest builder, and scanner for build process

Implement the build pipeline components that work together to generate
serverless handlers from @Remote decorated functions.

Three core components:

1. RemoteDecoratorScanner (scanner.py)
   - Uses Python AST to discover all @Remote decorated functions
   - Extracts function metadata: name, module, async status, is_class
   - Groups functions by resource_config for handler generation
   - Handles edge cases like decorated classes and async functions

2. ManifestBuilder (manifest.py)
   - Groups functions by their resource_config
   - Creates flash_manifest.json structure for service discovery
   - Maps functions to their modules and handler files
   - Enables cross-endpoint function routing at runtime

3. HandlerGenerator (handler_generator.py)
   - Creates lightweight handler_*.py files for each resource config
   - Each handler imports functions and registers them in FUNCTION_REGISTRY
   - Handler delegates to create_handler() factory from generic_handler
   - Generated handlers are ~23 lines (vs ~98 with duplication)

Build Pipeline Flow:
1. Scanner discovers @Remote functions
2. ManifestBuilder groups them by resource_config
3. HandlerGenerator creates handler_*.py for each group
4. All files + manifest bundled into archive.tar.gz

This eliminates ~95% duplication across handlers by using the factory pattern
instead of template-based generation.
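The AST-based discovery in step 1 can be sketched as follows; the metadata fields shown are a subset of what the real scanner extracts:

```python
import ast

SOURCE = '''
@remote(api, method="POST", path="/api/process")
async def process_data(x, y):
    return {"result": x + y}

def helper():
    pass
'''


def scan_remote_functions(source: str):
    """Minimal sketch: walk the AST and collect @remote-decorated functions."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for dec in node.decorator_list:
                # @remote(...) is an ast.Call; a bare @remote is an ast.Name.
                target = dec.func if isinstance(dec, ast.Call) else dec
                if isinstance(target, ast.Name) and target.id == "remote":
                    found.append({"name": node.name,
                                  "is_async": isinstance(node, ast.AsyncFunctionDef)})
    return found


print(scan_remote_functions(SOURCE))  # → [{'name': 'process_data', 'is_async': True}]
```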

* test(runtime): Add comprehensive tests for generic handler

Implement 19 unit tests covering all major paths through the generic_handler
factory and its helper functions.

Test Coverage:

Serialization/Deserialization (7 tests):
- serialize_result() with simple values, dicts, lists
- deserialize_arguments() with empty, args-only, kwargs-only, mixed inputs
- Round-trip encoding/decoding of cloudpickle + base64

Function Execution (4 tests):
- Simple function execution with positional and keyword arguments
- Keyword argument handling
- Class instantiation and method calls
- Argument passing to instance methods

Handler Factory (8 tests):
- create_handler() returns callable RunPod handler
- Handler with simple function registry
- Missing function error handling (returns error response, not exception)
- Function exceptions caught with traceback included
- Multiple functions in single registry
- Complex Python objects (classes, lambdas, closures)
- Empty registry edge case
- Default execution_type parameter
- None return values
- Correct RunPod response format (success, result/error, traceback)

Test Strategy:
- Arrange-Act-Assert pattern for clarity
- Isolated unit tests (no external dependencies)
- Tests verify behavior, not implementation
- Error cases tested for proper error handling
- All serialization tested for round-trip correctness

All tests passing, 83% coverage on generic_handler.py

* test(cli): Add tests for handler generation, manifest building, and scanning

Implement integration tests validating the build pipeline components work
correctly together.

Test Coverage:

HandlerGenerator Tests:
- Handler files created with correct names (handler_<resource_name>.py)
- Generated files import required functions from workers
- FUNCTION_REGISTRY properly formatted
- create_handler() imported from generic_handler
- Handler creation via factory
- RunPod start call present and correct
- Multiple handlers generated for multiple resource configs

ManifestBuilder Tests:
- Manifest structure with correct version and metadata
- Resources grouped by resource_config
- Handler file paths correct
- Function metadata preserved (name, module, is_async, is_class)
- Function registry mapping complete

Scanner Tests:
- @Remote decorated functions discovered via AST
- Function metadata extracted correctly
- Module paths resolved properly
- Async functions detected
- Class methods detected
- Edge cases handled (multiple decorators, nested classes)

Test Strategy:
- Integration tests verify components work together
- Tests verify generated files are syntactically correct
- Tests validate data structures match expected schemas
- No external dependencies in build process

Validates that the entire build pipeline:
1. Discovers functions correctly
2. Groups them appropriately
3. Generates valid Python handler files
4. Creates correct manifest structure

* docs(runtime): Document generic handler factory architecture

Add comprehensive architecture documentation explaining why the factory
pattern was chosen and how it works.

Documentation includes:

Overview & Context:
- Problem statement: Handler files had 95% duplication
- Design decision: Use factory function instead of templates
- Benefits: Single source of truth, easier maintenance, consistency

Architecture Diagrams (MermaidJS):
- High-level flow: @Remote functions → Scanner → Manifest → Handlers → Factory
- Component relationships: HandlerGenerator, GeneratedHandler, generic_handler
- Function registry pattern: Discovery → Grouping → Registration → Factory

Implementation Details:
- create_handler(function_registry) signature and behavior
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Function vs. class execution
- load_manifest(): Service discovery via flash_manifest.json

Design Decisions (with rationale):
- Factory Pattern over Inheritance: Simpler, less coupling, easier to test
- CloudPickle + Base64: Handles arbitrary objects, safe JSON transmission
- Manifest in Generic Handler: Runtime service discovery requirement
- Structured Error Responses: Debugging aid, functional error handling
- Both Execution Types: Supports stateful classes and pure functions

Usage Examples:
- Simple function handler
- Class execution with methods
- Multiple functions in one handler

Build Process Integration:
- 4-phase pipeline: Scanner → Grouping → Generation → Packaging
- Manifest structure and contents
- Generated handler structure (~23 lines)

Testing Strategy:
- 19 unit tests covering all major paths
- 7 integration tests verifying handler generation
- Manual testing with example applications

Performance:
- Zero runtime penalty (factory called once at startup)
- No additional indirection in request path

* docs(cli): Add flash build command documentation

Document the flash build command and update CLI README to include it.

New Documentation:

flash-build.md includes:

Usage & Options:
- Command syntax: flash build [OPTIONS]
- --no-deps: Skip transitive dependencies (faster, smaller archives)
- --keep-build: Keep build directory for inspection/debugging
- --output, -o: Custom archive name (default: archive.tar.gz)

What It Does (5-step process):
1. Discovery: Scan for @Remote decorated functions
2. Grouping: Group functions by resource_config
3. Handler Generation: Create lightweight handler files
4. Manifest Creation: Generate flash_manifest.json
5. Packaging: Create archive.tar.gz for deployment

Build Artifacts:
- .flash/archive.tar.gz: Deployment package (ready for RunPod)
- .flash/flash_manifest.json: Service discovery configuration
- .flash/.build/: Temporary build directory

Handler Generation:
- Explains factory pattern and minimal handler files
- Links to Runtime_Generic_Handler.md for details

Dependency Management:
- Default behavior: Install all dependencies including transitive
- --no-deps: Only direct dependencies (when base image has transitive)
- Trade-offs explained

Cross-Endpoint Function Calls:
- Example showing GPU and CPU endpoints
- Manifest enables routing automatically

Output & Troubleshooting:
- Sample build output with progress indicators
- Common failure scenarios and solutions
- How to debug with --keep-build

Next Steps:
- Test locally with flash run
- Deploy to RunPod
- Monitor with flash undeploy list

Updated CLI README.md:
- Added flash build to command list in sequence
- Links to full flash-build.md documentation

* docs: Add build process and handler generation section to README

Add a new section explaining how the build system works and why the
factory pattern reduces code duplication.

New Section: Build Process and Handler Generation

Explains:

How Flash Builds Your Application (5-step pipeline):
1. Discovery: Scans code for @Remote decorated functions
2. Grouping: Groups functions by resource_config
3. Handler Generation: Creates lightweight handler files
4. Manifest Creation: Generates flash_manifest.json for service discovery
5. Packaging: Bundles everything into archive.tar.gz

Handler Architecture (with code example):
- Shows generated handler using factory pattern
- Single source of truth: All handler logic in one place
- Easier maintenance: Bug fixes don't require rebuilding projects

Cross-Endpoint Function Calls:
- Example of GPU and CPU endpoints calling each other
- Manifest and runtime wrapper handle service discovery

Build Artifacts:
- .flash/.build/: Temporary build directory
- .flash/archive.tar.gz: Deployment package
- .flash/flash_manifest.json: Service configuration

Links to detailed documentation:
- docs/Runtime_Generic_Handler.md for architecture details
- src/tetra_rp/cli/docs/flash-build.md for CLI reference

This section bridges the main README and the detailed documentation,
providing an entry point for new users discovering the build system.

* feat(cli): Integrate build utilities into flash build command

Wire up the handler generator, manifest builder, and scanner into the
actual flash build command implementation.

Changes to build.py:

1. Integration:
   - Import RemoteDecoratorScanner for function discovery
   - Import ManifestBuilder for manifest creation
   - Import HandlerGenerator for handler file creation
   - Call these in sequence during the build process

2. Build Pipeline:
   - After copying project files, scan for @Remote functions
   - Build manifest from discovered functions
   - Generate handler files for each resource config
   - Write manifest to build directory
   - Progress indicators show what's being generated

3. Fixes:
   - Change .tetra directory references to .flash
   - Uncomment actual build logic (was showing "Coming Soon" message)
   - Fix progress messages to show actual file counts

4. Error Handling:
   - Try/catch around handler generation
   - Warning shown if generation fails but build continues
   - User can debug with --keep-build flag

Build Flow Now:
1. Load ignore patterns
2. Collect project files
3. Create build directory
4. Copy files to build directory
5. [NEW] Scan for @Remote functions
6. [NEW] Build and write manifest
7. [NEW] Generate handler files
8. Install dependencies
9. Create archive
10. Clean up build directory (unless --keep-build)

Dependencies:
- Updated uv.lock with all required dependencies

* refactor(build): Fix directory structure and add comprehensive error handling

**Critical Fixes:**
- Remove "Coming Soon" message blocking build command execution
- Fix build directory to use .flash/.build/ directly (no app_name subdirectory)
- Fix tarball to extract with flat structure using arcname="."
- Fix cleanup to remove correct build directory

**Error Handling & Validation:**
- Add specific exception handling (ImportError, SyntaxError, ValueError)
- Add import validation to generated handlers
- Add duplicate function name detection across resources
- Add proper error logging throughout build process

**Resource Type Tracking:**
- Add resource_type field to RemoteFunctionMetadata
- Track actual resource types (LiveServerless, CpuLiveServerless)
- Use actual types in manifest instead of hardcoding

**Robustness Improvements:**
- Add handler import validation post-generation
- Add manifest path fallback search (cwd, module dir, legacy location)
- Add resource name sanitization for safe filenames
- Add specific exception logging in scanner (UnicodeDecodeError, SyntaxError)

**User Experience:**
- Add troubleshooting section to README
- Update manifest path documentation in docs
- Change "Zero Runtime Penalty" to "Minimal Runtime Overhead"
- Mark future enhancements as "Not Yet Implemented"
- Improve build success message with next steps

Fixes all 20 issues identified in code review (issues #1-13, #19-22)

* feat(resources): Add LoadBalancerSlsResource for LB endpoints

Implement LoadBalancerSlsResource class for provisioning RunPod load-balanced
serverless endpoints. Load-balanced endpoints expose HTTP servers directly to
clients without queue-based processing, enabling REST APIs, webhooks, and
real-time communication patterns.

Key features:
- Type enforcement (always LB, never QB)
- Scaler validation (REQUEST_COUNT required, not QUEUE_DELAY)
- Health check polling via /ping endpoint (200/204 = healthy)
- Post-deployment verification with configurable retries
- Async and sync health check methods
- Comprehensive unit tests
- Full documentation with architecture diagrams and examples

Architecture:
- Extends ServerlessResource with LB-specific behavior
- Validates configuration before deployment
- Polls /ping endpoint until healthy (10 retries × 5s = 50s timeout)
- Raises TimeoutError if endpoint fails to become healthy

This forms the foundation for Mothership architecture where a load-balanced
endpoint serves as a directory server for child endpoints.
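The /ping polling loop described above can be sketched as follows. The `probe` and `sleep` callables are injected here so the sketch stays self-contained; the real implementation issues HTTP requests:

```python
import time


def wait_until_healthy(probe, retries=10, delay=5.0, sleep=time.sleep):
    """Sketch of the health-check loop: any callable returning an HTTP
    status code stands in for the real /ping request."""
    for _ in range(retries):
        if probe() in (200, 204):  # 204 during initialization is healthy
            return True
        sleep(delay)
    raise TimeoutError(f"endpoint not healthy after {retries} checks "
                       f"({retries * delay:.0f}s)")


# Simulated endpoint that becomes healthy on the third check.
codes = iter([503, 503, 204])
assert wait_until_healthy(lambda: next(codes), sleep=lambda _: None)
```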

* fix(test): Fix LoadBalancerSlsResource deployment test mocks

Import ServerlessResource directly and use patch.object on the imported class
instead of string-based patches. This ensures the mocks properly intercept the
parent class's _do_deploy method when called via super(). Simplifies mock
configuration and removes an unused variable assertion.

Fixes the three failing deployment tests that were making real GraphQL API calls.
All tests now pass: 418 passed, 1 skipped.

* feat(resources): Phase 1 - Core infrastructure for @Remote on LB endpoints

Implement core infrastructure for enabling @Remote decorator on
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Changes:
- Create LoadBalancerSlsStub: HTTP-based stub for direct endpoint execution
  (src/tetra_rp/stubs/load_balancer_sls.py, 170 lines)
  - Serializes functions and arguments using cloudpickle + base64
  - Direct HTTP POST to /execute endpoint (no queue polling)
  - Proper error handling and deserialization

- Register stub with singledispatch (src/tetra_rp/stubs/registry.py)
  - Enables @Remote to dispatch to LoadBalancerSlsStub for LB resources

- Extend @Remote decorator with HTTP routing parameters (src/tetra_rp/client.py)
  - Add 'method' parameter: GET, POST, PUT, DELETE, PATCH
  - Add 'path' parameter: /api/endpoint routes
  - Validate method/path required for LoadBalancerSlsResource
  - Store routing metadata on decorated functions/classes
  - Warn if routing params used with non-LB resources

Foundation for Phase 2 (Build system integration) and Phase 3 (Local dev).

* feat(build): Phase 2.1 - Enhanced scanner for HTTP routing extraction

Update RemoteDecoratorScanner to extract HTTP method and path from
@Remote decorator for LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to RemoteFunctionMetadata
- Add _extract_http_routing() method to parse decorator keywords
- Extract method (GET, POST, PUT, DELETE, PATCH) from decorator
- Extract path (/api/process) from decorator
- Store routing metadata for manifest generation

Foundation for Phase 2.2 (Manifest updates) and Phase 2.3 (Handler generation).

* feat(build): Phase 2.2 - Updated manifest schema for HTTP routing

Enhance ManifestBuilder to support HTTP method/path routing for
LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to ManifestFunction
- Validate LB endpoints have both method and path
- Detect and prevent route conflicts (same method + path)
- Prevent use of reserved paths (/execute, /ping)
- Add 'routes' section to manifest for LB endpoints
- Conditional inclusion of routing fields (only for LB)

Manifest structure for LB endpoints now includes:
{
  "resources": {
    "api_service": {
      "resource_type": "LoadBalancerSlsResource",
      "functions": [
        {
          "name": "process_data",
          "http_method": "POST",
          "http_path": "/api/process"
        }
      ]
    }
  },
  "routes": {
    "api_service": {
      "POST /api/process": "process_data"
    }
  }
}

* feat(cli): Add LB handler generator for FastAPI app creation

Implement LBHandlerGenerator to create FastAPI applications for
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Key features:
- Generates FastAPI apps with explicit route registry
- Creates (method, path) -> function mappings from manifest
- Validates route conflicts and reserved paths
- Imports user functions and creates dynamic routes
- Includes required /ping health check endpoint
- Validates generated handler Python syntax via import

Generated handler structure enables:
- Direct HTTP routing to user functions via FastAPI
- Framework /execute endpoint for @Remote stub execution
- Local development with uvicorn

* feat(runtime): Implement LB handler factory for FastAPI app creation

Create create_lb_handler() factory function that dynamically builds FastAPI
applications from route registries for LoadBalancerSlsResource endpoints.

Key features:
- Accepts route_registry: Dict[(method, path)] -> handler_function mapping
- Registers all user-defined routes from registry to FastAPI app
- Provides /execute endpoint for @Remote stub function execution
- Handles async function execution automatically
- Serializes results with cloudpickle + base64 encoding
- Comprehensive error handling with detailed logging

The /execute endpoint enables:
- Remote function code execution via @Remote decorator
- Automatic argument deserialization from cloudpickle/base64
- Result serialization for transmission back to client
- Support for both sync and async functions
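The routing core of the factory can be sketched as below. `dispatch` is a hypothetical stand-in for FastAPI's request handling, and `asyncio.run` is used only because this sketch runs outside an event loop; the `route_registry` shape matches the description above:

```python
import asyncio
import inspect


async def process_data(x, y):
    return {"result": x + y}


# Registry shape consumed by create_lb_handler(): (method, path) -> function.
route_registry = {("POST", "/api/process"): process_data}


def dispatch(method: str, path: str, payload: dict):
    """Sketch of the routing core: look up the registered function and
    await it automatically when it returns a coroutine."""
    fn = route_registry.get((method.upper(), path))
    if fn is None:
        return {"status": 404, "error": f"no route for {method} {path}"}
    result = fn(**payload)
    if inspect.isawaitable(result):
        result = asyncio.run(result)
    return {"status": 200, "body": result}


print(dispatch("POST", "/api/process", {"x": 5, "y": 3}))
```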

* feat(cli): Route build command to separate handlers for LB endpoints

Update build command to use appropriate handler generators based on
resource type. Separates LoadBalancerSlsResource endpoints (using FastAPI)
from queue-based endpoints (using generic handler).

Changes:
- Import LBHandlerGenerator alongside HandlerGenerator
- Inspect manifest resources and separate by type
- Generate LB handlers via LBHandlerGenerator
- Generate QB handlers via HandlerGenerator
- Combine all generated handler paths for summary

Enables users to mix LB and QB endpoints in same project with correct
code generation for each resource type.

* feat(resources): Add LiveLoadBalancer for local LB endpoint testing

Implement LiveLoadBalancer resource following the LiveServerless pattern
for local development and testing of load-balanced endpoints.

Changes:
- Add TETRA_LB_IMAGE constant for load-balanced Tetra image
- Create LiveLoadBalancer class extending LoadBalancerSlsResource
- Uses LiveServerlessMixin to lock imageName to Tetra LB image
- Register LiveLoadBalancer with LoadBalancerSlsStub in singledispatch
- Export LiveLoadBalancer from core.resources and top-level __init__

This enables users to test LB-based functions locally before deploying,
using the same pattern as LiveServerless for queue-based endpoints.

Users can now write:
  from tetra_rp import LiveLoadBalancer, remote

  api = LiveLoadBalancer(name="test-api")

  @remote(api, method="POST", path="/api/process")
  async def process_data(x, y):
      return {"result": x + y}

  result = await process_data(5, 3)  # Local execution

* test(stubs): Add comprehensive unit tests for LoadBalancerSlsStub

Implement unit tests for LoadBalancerSlsStub covering:
- Request preparation with arguments and dependencies
- Response handling for success and error cases
- Error handling for invalid responses
- Base64 encoding/decoding of serialized data
- Endpoint URL validation
- Timeout and HTTP error handling

Test coverage:
- _prepare_request: 4 tests
- _handle_response: 5 tests
- _execute_function: 3 error case tests
- __call__: 2 integration tests

Tests verify proper function serialization, argument handling,
error propagation, and response deserialization.

* fix(test): Correct LB endpoint test decorator to match assertions

Fix test_load_balancer_vs_queue_based_endpoints by updating the @Remote
decorator to use method='POST' and path='/api/echo' to match the test
assertions. This was a test-level bug where the decorator definition
didn't match what was being asserted.

* docs: Add comprehensive documentation for @Remote with LoadBalancer endpoints

- Using_Remote_With_LoadBalancer.md: User guide for HTTP routing, local development, building and deploying
- LoadBalancer_Runtime_Architecture.md: Technical details on deployment, request flows, security, and performance
- Updated README.md with LoadBalancer section and code example
- Updated Load_Balancer_Endpoints.md with cross-references to new guides

* security: Remove /execute from deployed LoadBalancer endpoints

Split @Remote execution behavior between local and deployed:
- LiveLoadBalancer (local): Uses /execute endpoint for function serialization
- LoadBalancerSlsResource (deployed): Uses user-defined routes with HTTP param mapping

Changes:
1. LoadBalancerSlsStub routing detection:
   - _should_use_execute_endpoint() determines execution path
   - _execute_via_user_route() maps args to JSON and POSTs to user routes
   - Auto-detects resource type and routing metadata

2. Conditional /execute registration:
   - create_lb_handler() now accepts include_execute parameter
   - Generated handlers default to include_execute=False (security)
   - LiveLoadBalancer can enable /execute if needed

3. Updated handler generator:
   - Added clarity comments on /execute exclusion for deployed endpoints

4. Comprehensive test coverage:
   - 8 new tests for routing detection and execution paths
   - All 31 tests passing (22 unit + 9 integration)

5. Documentation updates:
   - Using_Remote_With_LoadBalancer.md: clarified /execute scope
   - Added 'Local vs Deployed Execution' section explaining differences
   - LoadBalancer_Runtime_Architecture.md: updated execution model
   - Added troubleshooting for deployed endpoint scenarios

Security improvement:
- Deployed endpoints only expose user-defined routes
- /execute endpoint removed from production (prevents arbitrary code execution)
- Lower attack surface for deployed endpoints
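The split can be pictured with a toy version of the stub's routing check (class names follow the commit text; the function name and body are assumptions, not the actual implementation):

```python
class LoadBalancerSlsResource:
    """Deployed LB endpoint: only user-defined HTTP routes are exposed."""

class LiveLoadBalancer(LoadBalancerSlsResource):
    """Local-development LB endpoint: may expose /execute."""

def should_use_execute_endpoint(resource) -> bool:
    # LiveLoadBalancer subclasses LoadBalancerSlsResource, so the more
    # specific type must be checked; only local development gets /execute.
    return isinstance(resource, LiveLoadBalancer)
```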

* feat(build): Phase 4 - Fix LiveLoadBalancer handler generation to include /execute endpoint

- Modified manifest.py to validate LiveLoadBalancer endpoints like LoadBalancerSlsResource
- Updated lb_handler_generator to:
  - Include LiveLoadBalancer in handler generation filter
  - Pass include_execute=True for LiveLoadBalancer (local dev)
  - Pass include_execute=False for LoadBalancerSlsResource (deployed)
- Added integration tests:
  - Verify LiveLoadBalancer handlers include /execute endpoint
  - Verify deployed handlers exclude /execute endpoint
- Fixes critical bug: LiveLoadBalancer now gets /execute endpoint in generated handlers

* fix(scanner): Discover LoadBalancer resources in addition to Serverless resources

- Updated scanner to extract LiveLoadBalancer and LoadBalancerSlsResource resources
- Previously only looked for 'Serverless' in the class name, missing LoadBalancer endpoints
- Now checks for both 'Serverless' and 'LoadBalancer' in resource type names
- Added integration test to verify scanner discovers both resource types
- Fixes critical bug that prevented flash build from finding LoadBalancer endpoints

* chore: Format code for line length and remove unused imports

- Wrap long lines in manifest.py, lb_handler.py, and load_balancer_sls.py
- Remove unused httpx import in test_load_balancer_sls_stub.py
- Apply consistent formatting across codebase

* fix: Address PR #131 review feedback

- Scanner: Use exact type name matching instead of substring matching
  - Whitelist specific resource types to avoid false positives
  - Prevents matching classes like 'MyServerlessHelper' or 'LoadBalancerUtils'

- Type hints: Use Optional[str] for nullable fields in manifest
  - ManifestFunction.http_method and http_path now properly typed

- Timeout: Make HTTP client timeout configurable
  - Added LoadBalancerSlsStub.DEFAULT_TIMEOUT class attribute
  - Added timeout parameter to __init__
  - Updated both _execute_function and _execute_via_user_route to use self.timeout

- Deprecated datetime: Replace datetime.utcnow() with datetime.now(timezone.utc)
  - Updated manifest.py and test_lb_remote_execution.py
  - Ensures Python 3.12+ compatibility
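The replacement is mechanical; for example:

```python
from datetime import datetime, timezone

# datetime.utcnow() is deprecated in Python 3.12 and returns a naive datetime.
# The timezone-aware equivalent:
ts = datetime.now(timezone.utc)
```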

* style: Format datetime chaining for line length

* fix: LiveLoadBalancer template not serialized to RunPod GraphQL

The set_serverless_template model_validator was being overwritten by sync_input_fields
(both had mode="after"). In Pydantic v2, when two validators with the same mode are
defined in a class, only one is registered.

This caused templates to never be created from imageName, resulting in:
  "GraphQL errors: One of templateId, template is required to create an endpoint"

Solution:
- Move set_serverless_template validator from ServerlessResource base class to subclasses
  (ServerlessEndpoint and LoadBalancerSlsResource) where the validation is actually needed
- Keep helper methods (_create_new_template, _configure_existing_template) in base class
  for reuse
- Add comprehensive tests for LiveLoadBalancer template serialization

This allows:
1. Base ServerlessResource to be instantiated freely for testing/configuration
2. Subclasses (ServerlessEndpoint, LoadBalancerSlsResource) to enforce template
   requirements during deployment
3. Proper template serialization in GraphQL payload for RunPod API

Fixes: One of templateId, template is required to create an endpoint error when
deploying LiveLoadBalancer with custom image tags like runpod/tetra-rp-lb:local

* fix: LoadBalancer endpoint URL and add CPU support

- Fix: Use correct endpoint URL format for load-balanced endpoints
  (https://{id}.api.runpod.ai instead of https://api.runpod.ai/v2/{id})
  This fixes 404 errors on /ping health check endpoints

- Feature: Add CPU LoadBalancer support
  * Create CpuLoadBalancerSlsResource for CPU-based load-balanced endpoints
  * Create CpuLiveLoadBalancer for local CPU LB development
  * Add TETRA_CPU_LB_IMAGE constant for CPU LB Docker image
  * Update example code to use CpuLiveLoadBalancer for CPU worker
  * Add 8 comprehensive tests for CPU LoadBalancer functionality

- Tests: Add 2 tests for endpoint URL format validation
- All 474 tests passing, 64% code coverage
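The two URL shapes, side by side (endpoint IDs are placeholders; the helper names are illustrative):

```python
def queue_based_url(endpoint_id: str) -> str:
    # Queue-based serverless endpoints go through the shared API host
    return f"https://api.runpod.ai/v2/{endpoint_id}"

def load_balanced_url(endpoint_id: str) -> str:
    # Load-balanced endpoints get a per-endpoint subdomain
    return f"https://{endpoint_id}.api.runpod.ai"
```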

* fix: Export CpuLiveLoadBalancer and CpuLoadBalancerSlsResource from tetra_rp package

LoadBalancer resources were not being discovered by ResourceDiscovery because
the new CPU variants (CpuLiveLoadBalancer, CpuLoadBalancerSlsResource) were
not exported from the main tetra_rp package. This prevented undeploy from
picking up these resources.

Added exports to:
- TYPE_CHECKING imports for type hints
- __getattr__ function for lazy loading
- __all__ list for public API

This fixes the issue where 'flash undeploy list' could not find LoadBalancer
resources that were deployed with 'flash run --auto-provision'.

* fix: Add API key authentication to LoadBalancer health check

The /ping endpoint for RunPod load-balanced endpoints requires the
RUNPOD_API_KEY header for authentication. Without it, the health check
fails with 401 Unauthorized, causing provisioning to timeout.

This fix adds the Authorization header to the health check request if
the RUNPOD_API_KEY environment variable is available, allowing the
endpoint health check to succeed during provisioning.

Fixes issue where 'flash run --auto-provision' would fail even though
the endpoint was successfully created on RunPod.
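The header logic amounts to something like the following sketch (the helper name is illustrative; only the Bearer-header behavior comes from the commit):

```python
import os

def health_check_headers() -> dict:
    """Build headers for the /ping check; auth is added only when the key is set."""
    headers = {}
    api_key = os.environ.get("RUNPOD_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers

os.environ["RUNPOD_API_KEY"] = "example-key"  # illustration only
headers = health_check_headers()
```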

* fix(lb): Exclude flashboot from CpuLoadBalancerSlsResource GraphQL payload

CpuLoadBalancerSlsResource was overriding _input_only without including flashboot,
causing it to be sent to the RunPod GraphQL API which doesn't accept this field.
This caused deployment to fail with: Field "flashboot" is not defined by type "EndpointInput".

* fix(lb): Expand CpuInstanceType.ANY to all CPU flavors in CpuLoadBalancerSlsResource

Add field_validator to expand [CpuInstanceType.ANY] to all available CPU instance
types (cpu3g, cpu3c, cpu5c variants). This matches the behavior in CpuServerlessEndpoint
and prevents deployment errors like 'instanceId must be in the format of flavorId-vcpu-ram'.
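The expansion can be sketched as a plain function (the enum members here are assumptions for illustration; real instance IDs follow the "flavorId-vcpu-ram" format mentioned above):

```python
from enum import Enum

class CpuInstanceType(str, Enum):
    # Illustrative members only; real IDs look like "cpu3g-2-8"
    ANY = "any"
    CPU3G = "cpu3g-2-8"
    CPU3C = "cpu3c-2-4"
    CPU5C = "cpu5c-2-4"

def expand_instance_ids(values):
    # Sketch of the validator: [ANY] expands to every concrete flavor
    if values == [CpuInstanceType.ANY]:
        return [t for t in CpuInstanceType if t is not CpuInstanceType.ANY]
    return values
```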

* refactor(cpu): Move instanceIds validator to CpuEndpointMixin

Move the instanceIds field_validator from CpuServerlessEndpoint to CpuEndpointMixin
so both CpuServerlessEndpoint and CpuLoadBalancerSlsResource share the same validator
that expands [CpuInstanceType.ANY] to all available CPU flavors. This eliminates
code duplication and ensures consistent behavior across all CPU endpoint types.

* test: Update CPU instance test to reflect validator expansion

Update test_cpu_live_load_balancer_defaults_to_cpu_any to verify that
[CpuInstanceType.ANY] is correctly expanded to all available CPU instance types
by the field_validator in CpuEndpointMixin.

* fix(lb): Increase health check timeout from 5s to 15s

Load-balanced workers need more time to respond during cold starts and initialization.
RunPod docs recommend at least 10-15 second timeouts for health checks. Workers may
return 204 during initialization, which is normal and expected.

* fix(lb): Fix CPU load balancer template deployment error

Fixes two bugs in CpuLoadBalancerSlsResource that prevented CPU load balancers from deploying:

1. Added gpuCount and allowedCudaVersions to _input_only exclusion set to prevent
   GPU-specific fields from being sent to RunPod API
2. Overrode set_serverless_template() to call _sync_cpu_fields() first, ensuring
   GPU defaults are overridden to CPU-appropriate values (gpuCount=0)

The RunPod API was rejecting CPU load balancer templates because GPU-specific
fields were being included in the GraphQL payload. These changes align
CpuLoadBalancerSlsResource behavior with CpuServerlessEndpoint.

Also added comprehensive test coverage (30+ tests) to verify:
- GPU fields are correctly overridden to CPU defaults
- GPU fields are excluded from API payloads
- CPU-specific fields are properly included
- Consistency with CpuServerlessEndpoint behavior

* fix(drift): Exclude runtime fields from config hash to prevent false positives

Fixes false positive configuration drift detection by separating concerns:

1. Update ServerlessResource.config_hash to exclude runtime fields
   - Fields like template, templateId, aiKey, userId are API-assigned
   - Prevents false drift when same config is redeployed across processes
   - Now only hashes user-specified configuration

2. Add config_hash override to CpuLoadBalancerSlsResource
   - CPU load balancers hash only CPU-relevant fields
   - Excludes GPU-specific fields and runtime fields
   - Follows same pattern as CpuServerlessEndpoint

3. Fix _has_structural_changes to exclude template/templateId
   - CRITICAL: These runtime fields were causing false structural changes
   - Was forcing unnecessary redeployments despite update() being available
   - Now system correctly uses update() instead of undeploy+deploy

4. Make field serializers robust to handle string/enum values
   - Prevents serialization errors when fields are pre-converted to strings

5. Add comprehensive drift detection tests (16 tests)
   - Test hash stability with runtime field changes
   - Test exclusion of env, template, templateId, and other runtime fields
   - Test that actual config changes (image, flashboot) are detected
   - Test structural change detection behavior
   - Test real-world deployment scenarios

Results:
- Same config deployed multiple times: no false drift
- Different env vars with same config: no false drift
- Template/templateId changes: no false drift
- API-assigned fields: no false drift
- User config changes (image, flashboot): drift detected correctly
- All 512 unit tests pass
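The hashing idea can be sketched as follows (the excluded field names come from the commit; the hashing scheme itself is an assumption):

```python
import hashlib
import json

# API-assigned fields that must not influence the config hash
RUNTIME_FIELDS = {"template", "templateId", "aiKey", "userId"}

def config_hash(config: dict) -> str:
    # Hash only user-specified configuration so identical redeploys compare equal
    user_config = {k: v for k, v in config.items() if k not in RUNTIME_FIELDS}
    blob = json.dumps(user_config, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

same_a = config_hash({"imageName": "app:1.0", "templateId": "t-111"})
same_b = config_hash({"imageName": "app:1.0", "templateId": "t-222"})
changed = config_hash({"imageName": "app:2.0", "templateId": "t-111"})
```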

* fix(http): Standardize RunPod HTTP client authentication across codebase

Centralizes HTTP client creation for RunPod load-balanced endpoints to prevent
manual Authorization header code duplication and ensure consistent authentication:

1. Create centralized HTTP utility function (src/tetra_rp/core/utils/http.py)
   - New function: get_authenticated_httpx_client()
   - Automatically adds Bearer token Authorization header if RUNPOD_API_KEY set
   - Provides consistent timeout handling (default 30s, customizable)
   - Follows existing GraphQL/REST client authentication pattern

2. Fix critical authentication bug in LoadBalancerSlsStub._execute_via_user_route()
   - Previously: Missing Authorization header (401 errors on user routes)
   - Now: Uses centralized utility for proper authentication
   - Enables direct HTTP calls to user-defined routes with auth

3. Refactor two methods to use centralized utility
   - LoadBalancerSlsStub._execute_function() - removes 7+ lines of manual auth code
   - LoadBalancerSlsResource._check_ping_endpoint() - simplifies auth setup

4. Add comprehensive unit tests (tests/unit/core/utils/test_http.py)
   - Tests API key presence/absence handling
   - Tests custom and default timeout configuration
   - Tests edge cases (empty key, zero timeout)
   - All 7 tests pass with 100% coverage

Results:
- Single source of truth for HTTP authentication (centralized utility)
- Fixes 401 Unauthorized errors on load-balanced endpoints
- Eliminates repetitive manual auth code across 3+ locations
- Easier to maintain and update authentication patterns in future
- All 499 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

* feat(http): Extend HTTP utilities to cover both sync and async authentication

Extends the centralized HTTP authentication pattern to all RunPod API calls:

1. Add get_authenticated_requests_session() for synchronous requests
   - Creates requests.Session with automatic Bearer token Authorization header
   - Follows same pattern as async get_authenticated_httpx_client()
   - Single source of truth for sync HTTP authentication

2. Refactor template.py to use centralized utility
   - Removes manual Authorization header setup (line 86)
   - Now uses get_authenticated_requests_session() for all template updates
   - Improves error handling with raise_for_status()
   - Token parameter marked deprecated; uses RUNPOD_API_KEY env var

3. Add comprehensive tests for sync utility (4 tests)
   - Tests API key presence/absence handling
   - Tests empty API key edge case
   - Tests Session object validation
   - All tests pass with proper cleanup

Benefits:
- True single source of truth for all RunPod HTTP authentication (sync + async)
- Consistent patterns across entire codebase
- Easier future auth changes across all HTTP client types
- Eliminates manual auth header code in template.py
- All 503 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

Note: requests.Session doesn't support default timeouts; timeout should be
specified per request (e.g., session.post(url, json=data, timeout=30.0))

* fix: Address PR feedback on HTTP utilities implementation

Addresses three feedback items from code review:

1. Fix breaking parameter order change in update_system_dependencies()
   - Restored original parameter order: template_id, token, system_dependencies, base_entry_cmd
   - Maintains backward compatibility with existing callers
   - Token parameter now optional (default None)

2. Add proper deprecation warning for token parameter
   - Issues DeprecationWarning when token parameter is used
   - Clearly communicates migration to RUNPOD_API_KEY environment variable
   - Follows Python deprecation best practices (warnings.warn with stacklevel=2)

3. Standardize test mocking approach across all health check tests
   - All tests now use consistent 'tetra_rp.core.utils.http.httpx.AsyncClient' patching
   - Removed inconsistent 'side_effect=lambda' pattern
   - Improved test maintainability by using same strategy everywhere

All 503 tests pass with consistent, clean implementation.
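The deprecation pattern in item 2 can be sketched like this (the signature follows the commit text; the body is illustrative):

```python
import warnings

def update_system_dependencies(template_id, token=None, system_dependencies=None):
    """Illustrative body: only the deprecation pattern is the point here."""
    if token is not None:
        warnings.warn(
            "The 'token' parameter is deprecated; set the RUNPOD_API_KEY "
            "environment variable instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
    return template_id

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    update_system_dependencies("tmpl-1", token="old-key")
```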

* refactor(drift): Extract runtime field constants and improve maintainability

- Extract RUNTIME_FIELDS and EXCLUDED_HASH_FIELDS as ClassVar constants in ServerlessResource for centralized field list management
- Add clarifying comments to enum serializers explaining defensive isinstance() checks for nested model serialization
- Document CPU load balancer field list coupling in docstring with maintenance guidelines
- Add TestSerializerDefensiveBehavior class with 4 tests verifying pre-stringified enum value handling
- Use ClassVar annotation to satisfy Pydantic v2 model field requirements

This reduces maintenance burden by centralizing field definitions and improves code clarity without changing functionality.

* docs: Improve LoadBalancer documentation accuracy and completeness

- Fix health check timeout: Add clarification that timeout is 15 seconds per check
- Add HTTP authentication details explaining RUNPOD_API_KEY header injection
- Document stub decision logic for incomplete routing metadata (fallback behavior)
- Clarify function signature inspection with concrete example showing parameter mapping
- Expand /execute security explanation with explicit threats and best practices
- Add detailed parameter type constraints for deployed endpoints (supported vs unsupported)
- Add troubleshooting guide for missing routing metadata (404 errors)
- Strengthen security warnings about never exposing /execute in production

All documentation now matches actual implementation verified through codebase analysis.

* docs: add resource config drift detection documentation

- comprehensive guide on drift detection implementation
- covers hash computation, field exclusion, and cpu-specific behavior
- includes testing patterns and troubleshooting guide
- documents all fields that trigger drift vs those ignored

* docs: proper name for the file

* test(build): Add comprehensive test coverage for scanner and handler improvements

- Add 6 new scanner tests for directory filtering (.venv, .flash, .runpod exclusion)
- Add test for resource type validation to prevent false positives
- Add test for fallback behavior when resource name extraction fails
- Add test for handling resource names with special characters
- Update existing tests to reflect new dynamic import format and resource name extraction

These tests guarantee that improvements to the scanner (resource type validation,
directory filtering, fallback behavior) and handler generator (dynamic imports for
invalid Python identifiers) won't regress in future changes.

* test(scanner): Fix resource type assertions to match scanner behavior

The scanner now extracts resource names from the name= parameter rather
than using variable names. Update test assertions to expect the actual
resource names ('test-api', 'deployed-api') instead of variable names.

* chore: merge correction

* fix(drift): Remove manual undeploy/deploy from update() method

Use saveEndpoint mutation for all changes instead of manual lifecycle
management. Server-side automatically detects version-triggering fields
(GPU, template, volumes) and increments endpoint version accordingly.

Keep _has_structural_changes() as informational for logging purposes only.
This aligns with RunPod API's version-based deployment model.

* docs(drift): Clarify _has_structural_changes detects version-triggering changes

Update docstring to reflect that this method identifies changes that
trigger server-side version increment and worker recreation, not manual
redeploy cycles. Explain which changes are version-triggering vs rolling
updates, and note that the method is now informational for logging only.

* feat(drift): Enable environment variable drift detection

Remove env from EXCLUDED_HASH_FIELDS so changes to environment variables
trigger drift detection and endpoint updates. Environment changes are
non-version-triggering (rolling updates), so server will apply them via
saveEndpoint without recreating workers.

Add env to CPU LoadBalancer config_hash for consistent behavior across
all resource types. Update comments to reflect that env is user-specified
configuration, not dynamically computed.

* test(drift): Update tests for environment variable drift detection

- test_lb_config_hash_excludes_env_variables → test_lb_config_hash_detects_env_changes
- test_env_var_changes_no_drift → test_env_var_changes_trigger_drift
- test_config_hash_excludes_env_from_drift → test_config_hash_detects_env_from_drift

Update assertions to expect different hashes when env changes, matching
new behavior where environment variable changes trigger drift and updates.

* fix: Address Copilot review feedback on type hints and documentation

- Fix type annotation for timeout parameter in LoadBalancerSlsStub (Optional[float])
- Replace hardcoded "30s" with actual self.timeout in error messages (2 locations)
- Update Resource_Config_Drift_Detection.md to reflect actual EXCLUDED_HASH_FIELDS
- Remove duplicate Load-Balanced Endpoints section from README.md

Addresses Copilot review comments (PR #132, review 3642596664)
deanq added a commit that referenced this pull request Jan 14, 2026
* feat(runtime): Add generic handler factory for serverless execution

Implement a factory function that creates RunPod serverless handlers,
eliminating code duplication across generated handler files.

The generic_handler module provides:
- create_handler(function_registry) factory that accepts a dict of
  function/class objects and returns a RunPod-compatible handler
- Automatic serialization/deserialization using cloudpickle + base64
- Support for both function execution and class instantiation + method calls
- Structured error responses with full tracebacks for debugging
- Load manifest for cross-endpoint function discovery

This design centralizes all handler logic in one place, making it easy to:
- Fix bugs once, benefit all handlers
- Add new features without regenerating projects
- Keep deployment packages small (handler files are ~23 lines each)

Implementation:
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Handles function vs. class execution
- load_manifest(): Loads flash_manifest.json for service discovery
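Its shape is roughly the following (stdlib `pickle` stands in for cloudpickle, and the job/response field names are a sketch of the described behavior, not the actual implementation):

```python
import base64
import pickle
import traceback

def create_handler(function_registry):
    """Return a RunPod-style handler closed over a registry of callables (sketch)."""
    def handler(job):
        try:
            name = job["input"]["function_name"]
            fn = function_registry.get(name)
            if fn is None:
                return {"success": False, "error": f"Unknown function: {name}"}
            payload = pickle.loads(base64.b64decode(job["input"]["payload"]))
            result = fn(*payload.get("args", ()), **payload.get("kwargs", {}))
            return {"success": True,
                    "result": base64.b64encode(pickle.dumps(result)).decode()}
        except Exception as exc:
            # Structured error response with a full traceback for debugging
            return {"success": False, "error": str(exc),
                    "traceback": traceback.format_exc()}
    return handler

handler = create_handler({"add": lambda x, y: x + y})
payload = base64.b64encode(pickle.dumps({"args": (5, 3)})).decode()
resp = handler({"input": {"function_name": "add", "payload": payload}})
```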

* feat(cli): Add handler generator, manifest builder, and scanner for build process

Implement the build pipeline components that work together to generate
serverless handlers from @remote decorated functions.

Three core components:

1. RemoteDecoratorScanner (scanner.py)
   - Uses Python AST to discover all @remote decorated functions
   - Extracts function metadata: name, module, async status, is_class
   - Groups functions by resource_config for handler generation
   - Handles edge cases like decorated classes and async functions

2. ManifestBuilder (manifest.py)
   - Groups functions by their resource_config
   - Creates flash_manifest.json structure for service discovery
   - Maps functions to their modules and handler files
   - Enables cross-endpoint function routing at runtime

3. HandlerGenerator (handler_generator.py)
   - Creates lightweight handler_*.py files for each resource config
   - Each handler imports functions and registers them in FUNCTION_REGISTRY
   - Handler delegates to create_handler() factory from generic_handler
   - Generated handlers are ~23 lines (vs ~98 with duplication)
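The AST-based discovery in component 1 can be sketched as follows (a toy scanner under the assumption that decorators are written `@remote(...)`; the real scanner extracts more metadata):

```python
import ast

SOURCE = '''
@remote(gpu_api, method="POST", path="/api/process")
async def process_data(x, y):
    return {"result": x + y}

def helper():
    pass
'''

def find_remote_functions(source: str):
    """Sketch: collect functions carrying a @remote decorator via the AST."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for dec in node.decorator_list:
                # @remote(...) parses as a Call whose func is the Name "remote"
                target = dec.func if isinstance(dec, ast.Call) else dec
                if isinstance(target, ast.Name) and target.id == "remote":
                    found.append({"name": node.name,
                                  "is_async": isinstance(node, ast.AsyncFunctionDef)})
    return found

functions = find_remote_functions(SOURCE)
```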

Build Pipeline Flow:
1. Scanner discovers @remote functions
2. ManifestBuilder groups them by resource_config
3. HandlerGenerator creates handler_*.py for each group
4. All files + manifest bundled into archive.tar.gz

This eliminates ~95% of the duplication across handlers by using the factory
pattern instead of template-based generation.
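A generated handler might look roughly like this (module paths and function names are hypothetical, not actual generated output; held as a string so the sketch stays self-contained):

```python
# Hypothetical contents of a generated handler_<resource>.py file
GENERATED_HANDLER = '''
from workers.gpu import process_image, embed_text  # user functions for this resource
from tetra_rp.runtime.generic_handler import create_handler

import runpod

FUNCTION_REGISTRY = {
    "process_image": process_image,
    "embed_text": embed_text,
}

# All real logic lives in the factory; the generated file only wires it up.
handler = create_handler(FUNCTION_REGISTRY)

runpod.serverless.start({"handler": handler})
'''

# The sketch should at least be syntactically valid Python
code = compile(GENERATED_HANDLER, "handler_example.py", "exec")
```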

* test(runtime): Add comprehensive tests for generic handler

Implement 19 unit tests covering all major paths through the generic_handler
factory and its helper functions.

Test Coverage:

Serialization/Deserialization (7 tests):
- serialize_result() with simple values, dicts, lists
- deserialize_arguments() with empty, args-only, kwargs-only, mixed inputs
- Round-trip encoding/decoding of cloudpickle + base64

Function Execution (4 tests):
- Simple function execution with positional and keyword arguments
- Keyword argument handling
- Class instantiation and method calls
- Argument passing to instance methods

Handler Factory (8 tests):
- create_handler() returns callable RunPod handler
- Handler with simple function registry
- Missing function error handling (returns error response, not exception)
- Function exceptions caught with traceback included
- Multiple functions in single registry
- Complex Python objects (classes, lambdas, closures)
- Empty registry edge case
- Default execution_type parameter
- None return values
- Correct RunPod response format (success, result/error, traceback)

Test Strategy:
- Arrange-Act-Assert pattern for clarity
- Isolated unit tests (no external dependencies)
- Tests verify behavior, not implementation
- Error cases tested for proper error handling
- All serialization tested for round-trip correctness

All tests passing, 83% coverage on generic_handler.py

* test(cli): Add tests for handler generation, manifest building, and scanning

Implement integration tests validating the build pipeline components work
correctly together.

Test Coverage:

HandlerGenerator Tests:
- Handler files created with correct names (handler_<resource_name>.py)
- Generated files import required functions from workers
- FUNCTION_REGISTRY properly formatted
- create_handler() imported from generic_handler
- Handler creation via factory
- RunPod start call present and correct
- Multiple handlers generated for multiple resource configs

ManifestBuilder Tests:
- Manifest structure with correct version and metadata
- Resources grouped by resource_config
- Handler file paths correct
- Function metadata preserved (name, module, is_async, is_class)
- Function registry mapping complete

Scanner Tests:
- @remote decorated functions discovered via AST
- Function metadata extracted correctly
- Module paths resolved properly
- Async functions detected
- Class methods detected
- Edge cases handled (multiple decorators, nested classes)

Test Strategy:
- Integration tests verify components work together
- Tests verify generated files are syntactically correct
- Tests validate data structures match expected schemas
- No external dependencies in build process

Validates that the entire build pipeline:
1. Discovers functions correctly
2. Groups them appropriately
3. Generates valid Python handler files
4. Creates correct manifest structure

* docs(runtime): Document generic handler factory architecture

Add comprehensive architecture documentation explaining why the factory
pattern was chosen and how it works.

Documentation includes:

Overview & Context:
- Problem statement: Handler files had 95% duplication
- Design decision: Use factory function instead of templates
- Benefits: Single source of truth, easier maintenance, consistency

Architecture Diagrams (MermaidJS):
- High-level flow: @remote functions → Scanner → Manifest → Handlers → Factory
- Component relationships: HandlerGenerator, GeneratedHandler, generic_handler
- Function registry pattern: Discovery → Grouping → Registration → Factory

Implementation Details:
- create_handler(function_registry) signature and behavior
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Function vs. class execution
- load_manifest(): Service discovery via flash_manifest.json

Design Decisions (with rationale):
- Factory Pattern over Inheritance: Simpler, less coupling, easier to test
- CloudPickle + Base64: Handles arbitrary objects, safe JSON transmission
- Manifest in Generic Handler: Runtime service discovery requirement
- Structured Error Responses: Debugging aid, functional error handling
- Both Execution Types: Supports stateful classes and pure functions

Usage Examples:
- Simple function handler
- Class execution with methods
- Multiple functions in one handler

Build Process Integration:
- 4-phase pipeline: Scanner → Grouping → Generation → Packaging
- Manifest structure and contents
- Generated handler structure (~23 lines)

Testing Strategy:
- 19 unit tests covering all major paths
- 7 integration tests verifying handler generation
- Manual testing with example applications

Performance:
- Zero runtime penalty (factory called once at startup)
- No additional indirection in request path

* docs(cli): Add flash build command documentation

Document the flash build command and update CLI README to include it.

New Documentation:

flash-build.md includes:

Usage & Options:
- Command syntax: flash build [OPTIONS]
- --no-deps: Skip transitive dependencies (faster, smaller archives)
- --keep-build: Keep build directory for inspection/debugging
- --output, -o: Custom archive name (default: archive.tar.gz)

What It Does (5-step process):
1. Discovery: Scan for @remote decorated functions
2. Grouping: Group functions by resource_config
3. Handler Generation: Create lightweight handler files
4. Manifest Creation: Generate flash_manifest.json
5. Packaging: Create archive.tar.gz for deployment

Build Artifacts:
- .flash/archive.tar.gz: Deployment package (ready for RunPod)
- .flash/flash_manifest.json: Service discovery configuration
- .flash/.build/: Temporary build directory

Handler Generation:
- Explains factory pattern and minimal handler files
- Links to Runtime_Generic_Handler.md for details

Dependency Management:
- Default behavior: Install all dependencies including transitive
- --no-deps: Only direct dependencies (when base image has transitive)
- Trade-offs explained

Cross-Endpoint Function Calls:
- Example showing GPU and CPU endpoints
- Manifest enables routing automatically

Output & Troubleshooting:
- Sample build output with progress indicators
- Common failure scenarios and solutions
- How to debug with --keep-build

Next Steps:
- Test locally with flash run
- Deploy to RunPod
- Monitor with flash undeploy list

Updated CLI README.md:
- Added flash build to command list in sequence
- Links to full flash-build.md documentation

* docs: Add build process and handler generation section to README

Add a new section explaining how the build system works and why the
factory pattern reduces code duplication.

New Section: Build Process and Handler Generation

Explains:

How Flash Builds Your Application (5-step pipeline):
1. Discovery: Scans code for @remote decorated functions
2. Grouping: Groups functions by resource_config
3. Handler Generation: Creates lightweight handler files
4. Manifest Creation: Generates flash_manifest.json for service discovery
5. Packaging: Bundles everything into archive.tar.gz

Handler Architecture (with code example):
- Shows generated handler using factory pattern
- Single source of truth: All handler logic in one place
- Easier maintenance: Bug fixes don't require rebuilding projects

Cross-Endpoint Function Calls:
- Example of GPU and CPU endpoints calling each other
- Manifest and runtime wrapper handle service discovery

Build Artifacts:
- .flash/.build/: Temporary build directory
- .flash/archive.tar.gz: Deployment package
- .flash/flash_manifest.json: Service configuration

Links to detailed documentation:
- docs/Runtime_Generic_Handler.md for architecture details
- src/tetra_rp/cli/docs/flash-build.md for CLI reference

This section bridges the main README and detailed documentation,
providing an entry point for new users discovering the build system.

* feat(cli): Integrate build utilities into flash build command

Wire up the handler generator, manifest builder, and scanner into the
actual flash build command implementation.

Changes to build.py:

1. Integration:
   - Import RemoteDecoratorScanner for function discovery
   - Import ManifestBuilder for manifest creation
   - Import HandlerGenerator for handler file creation
   - Call these in sequence during the build process

2. Build Pipeline:
   - After copying project files, scan for @remote functions
   - Build manifest from discovered functions
   - Generate handler files for each resource config
   - Write manifest to build directory
   - Progress indicators show what's being generated

3. Fixes:
   - Change .tetra directory references to .flash
   - Uncomment actual build logic (was showing "Coming Soon" message)
   - Fix progress messages to show actual file counts

4. Error Handling:
   - Try/except around handler generation
   - Warning shown if generation fails but build continues
   - User can debug with --keep-build flag

Build Flow Now:
1. Load ignore patterns
2. Collect project files
3. Create build directory
4. Copy files to build directory
5. [NEW] Scan for @remote functions
6. [NEW] Build and write manifest
7. [NEW] Generate handler files
8. Install dependencies
9. Create archive
10. Clean up build directory (unless --keep-build)

Dependencies:
- Updated uv.lock with all required dependencies

* refactor(build): Fix directory structure and add comprehensive error handling

**Critical Fixes:**
- Remove "Coming Soon" message blocking build command execution
- Fix build directory to use .flash/.build/ directly (no app_name subdirectory)
- Fix tarball to extract with flat structure using arcname="."
- Fix cleanup to remove correct build directory

**Error Handling & Validation:**
- Add specific exception handling (ImportError, SyntaxError, ValueError)
- Add import validation to generated handlers
- Add duplicate function name detection across resources
- Add proper error logging throughout build process

**Resource Type Tracking:**
- Add resource_type field to RemoteFunctionMetadata
- Track actual resource types (LiveServerless, CpuLiveServerless)
- Use actual types in manifest instead of hardcoding

**Robustness Improvements:**
- Add handler import validation post-generation
- Add manifest path fallback search (cwd, module dir, legacy location)
- Add resource name sanitization for safe filenames
- Add specific exception logging in scanner (UnicodeDecodeError, SyntaxError)

**User Experience:**
- Add troubleshooting section to README
- Update manifest path documentation in docs
- Change "Zero Runtime Penalty" to "Minimal Runtime Overhead"
- Mark future enhancements as "Not Yet Implemented"
- Improve build success message with next steps

Fixes all 20 issues identified in code review (issues #1-13, #19-22)

* feat(resources): Add LoadBalancerSlsResource for LB endpoints

Implement LoadBalancerSlsResource class for provisioning RunPod load-balanced
serverless endpoints. Load-balanced endpoints expose HTTP servers directly to
clients without queue-based processing, enabling REST APIs, webhooks, and
real-time communication patterns.

Key features:
- Type enforcement (always LB, never QB)
- Scaler validation (REQUEST_COUNT required, not QUEUE_DELAY)
- Health check polling via /ping endpoint (200/204 = healthy)
- Post-deployment verification with configurable retries
- Async and sync health check methods
- Comprehensive unit tests
- Full documentation with architecture diagrams and examples

Architecture:
- Extends ServerlessResource with LB-specific behavior
- Validates configuration before deployment
- Polls /ping endpoint until healthy (10 retries × 5s = 50s timeout)
- Raises TimeoutError if endpoint fails to become healthy

This forms the foundation for Mothership architecture where a load-balanced
endpoint serves as a directory server for child endpoints.
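The health check behavior described above (poll /ping until 200/204, give up after 10 retries × 5s) can be sketched like this; `ping` is a stand-in for the real HTTP GET, and the function name is illustrative:

```python
import time

def wait_until_healthy(ping, retries=10, delay=5.0):
    """Poll a /ping callable until it reports healthy (HTTP 200 or 204).

    Sketch of the polling loop described above; `ping` stands in for
    an HTTP GET against the endpoint's /ping route.
    """
    for attempt in range(retries):
        status = ping()
        if status in (200, 204):
            return attempt + 1  # number of polls it took
        time.sleep(delay)
    raise TimeoutError(f"endpoint unhealthy after {retries} x {delay}s")

# Simulated endpoint that becomes healthy on the third poll.
responses = iter([503, 503, 200])
polls = wait_until_healthy(lambda: next(responses), delay=0.0)
print(polls)  # 3
```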

* fix(test): Fix LoadBalancerSlsResource deployment test mocks

Import ServerlessResource directly and use patch.object on the imported class
instead of string-based patches. This ensures the mocks properly intercept the
parent class's _do_deploy method when called via super(). Simplifies mock
configuration and removes an unused variable assertion.

Fixes the three failing deployment tests that were making real GraphQL API calls.
All tests now pass: 418 passed, 1 skipped.

* feat(resources): Phase 1 - Core infrastructure for @remote on LB endpoints

Implement core infrastructure for enabling @remote decorator on
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Changes:
- Create LoadBalancerSlsStub: HTTP-based stub for direct endpoint execution
  (src/tetra_rp/stubs/load_balancer_sls.py, 170 lines)
  - Serializes functions and arguments using cloudpickle + base64
  - Direct HTTP POST to /execute endpoint (no queue polling)
  - Proper error handling and deserialization

- Register stub with singledispatch (src/tetra_rp/stubs/registry.py)
  - Enables @remote to dispatch to LoadBalancerSlsStub for LB resources

- Extend @remote decorator with HTTP routing parameters (src/tetra_rp/client.py)
  - Add 'method' parameter: GET, POST, PUT, DELETE, PATCH
  - Add 'path' parameter: /api/endpoint routes
  - Validate method/path required for LoadBalancerSlsResource
  - Store routing metadata on decorated functions/classes
  - Warn if routing params used with non-LB resources

Foundation for Phase 2 (Build system integration) and Phase 3 (Local dev).
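The validation rules above can be sketched as a small standalone check; this is an illustration of the decorator's behavior, not the actual `client.py` implementation:

```python
VALID_METHODS = {"GET", "POST", "PUT", "DELETE", "PATCH"}

def validate_routing(resource_type, method=None, path=None):
    """Sketch of the @remote routing validation described above
    (the real decorator also stores metadata on the wrapped function)."""
    is_lb = "LoadBalancer" in resource_type
    if is_lb:
        if method is None or path is None:
            raise ValueError("method and path are required for LB resources")
        if method not in VALID_METHODS:
            raise ValueError(f"unsupported HTTP method: {method}")
        if not path.startswith("/"):
            raise ValueError("path must start with '/'")
    elif method or path:
        # Routing params are ignored (with a warning) on non-LB resources.
        return "warn"
    return "ok"

print(validate_routing("LoadBalancerSlsResource", "POST", "/api/process"))  # ok
```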

* feat(build): Phase 2.1 - Enhanced scanner for HTTP routing extraction

Update RemoteDecoratorScanner to extract HTTP method and path from
@remote decorator for LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to RemoteFunctionMetadata
- Add _extract_http_routing() method to parse decorator keywords
- Extract method (GET, POST, PUT, DELETE, PATCH) from decorator
- Extract path (/api/process) from decorator
- Store routing metadata for manifest generation

Foundation for Phase 2.2 (Manifest updates) and Phase 2.3 (Handler generation).
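Extracting decorator keywords statically might look like the following sketch; the real scanner walks whole modules, so this single-function version is illustrative only:

```python
import ast

SOURCE = '''
@remote(api, method="POST", path="/api/process")
def process_data(x, y):
    return x + y
'''

def extract_http_routing(source):
    """Sketch: pull method/path keywords out of a @remote decorator
    using the ast module (simplified from the real scanner)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.FunctionDef):
            continue
        for dec in node.decorator_list:
            if isinstance(dec, ast.Call):
                kw = {k.arg: getattr(k.value, "value", None)
                      for k in dec.keywords}
                return kw.get("method"), kw.get("path")
    return None, None

print(extract_http_routing(SOURCE))  # ('POST', '/api/process')
```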

* feat(build): Phase 2.2 - Updated manifest schema for HTTP routing

Enhance ManifestBuilder to support HTTP method/path routing for
LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to ManifestFunction
- Validate LB endpoints have both method and path
- Detect and prevent route conflicts (same method + path)
- Prevent use of reserved paths (/execute, /ping)
- Add 'routes' section to manifest for LB endpoints
- Conditional inclusion of routing fields (only for LB)

Manifest structure for LB endpoints now includes:
{
  "resources": {
    "api_service": {
      "resource_type": "LoadBalancerSlsResource",
      "functions": [
        {
          "name": "process_data",
          "http_method": "POST",
          "http_path": "/api/process"
        }
      ]
    }
  },
  "routes": {
    "api_service": {
      "POST /api/process": "process_data"
    }
  }
}

* feat(cli): Add LB handler generator for FastAPI app creation

Implement LBHandlerGenerator to create FastAPI applications for
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Key features:
- Generates FastAPI apps with explicit route registry
- Creates (method, path) -> function mappings from manifest
- Validates route conflicts and reserved paths
- Imports user functions and creates dynamic routes
- Includes required /ping health check endpoint
- Validates generated handler Python syntax via import

Generated handler structure enables:
- Direct HTTP routing to user functions via FastAPI
- Framework /execute endpoint for @remote stub execution
- Local development with uvicorn

* feat(runtime): Implement LB handler factory for FastAPI app creation

Create create_lb_handler() factory function that dynamically builds FastAPI
applications from route registries for LoadBalancerSlsResource endpoints.

Key features:
- Accepts route_registry: Dict[(method, path)] -> handler_function mapping
- Registers all user-defined routes from registry to FastAPI app
- Provides /execute endpoint for @remote stub function execution
- Handles async function execution automatically
- Serializes results with cloudpickle + base64 encoding
- Comprehensive error handling with detailed logging

The /execute endpoint enables:
- Remote function code execution via @remote decorator
- Automatic argument deserialization from cloudpickle/base64
- Result serialization for transmission back to client
- Support for both sync and async functions
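The route-registry shape consumed by the factory can be illustrated without FastAPI; `dispatch` below is a minimal stand-in for the generated app's routing, not the real `create_lb_handler()`:

```python
# Keys are (method, path); values are the user's handler functions.
def process_data(x, y):
    return {"result": x + y}

route_registry = {("POST", "/api/process"): process_data}

def dispatch(method, path, payload):
    """Stand-in for the generated FastAPI routing: look up the handler
    registered for (method, path) and call it with the request payload."""
    handler = route_registry.get((method, path))
    if handler is None:
        return 404, None
    return 200, handler(**payload)

status, body = dispatch("POST", "/api/process", {"x": 5, "y": 3})
print(status, body)  # 200 {'result': 8}
```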

* feat(cli): Route build command to separate handlers for LB endpoints

Update build command to use appropriate handler generators based on
resource type. Separates LoadBalancerSlsResource endpoints (using FastAPI)
from queue-based endpoints (using generic handler).

Changes:
- Import LBHandlerGenerator alongside HandlerGenerator
- Inspect manifest resources and separate by type
- Generate LB handlers via LBHandlerGenerator
- Generate QB handlers via HandlerGenerator
- Combine all generated handler paths for summary

Enables users to mix LB and QB endpoints in same project with correct
code generation for each resource type.

* feat(resources): Add LiveLoadBalancer for local LB endpoint testing

Implement LiveLoadBalancer resource following the LiveServerless pattern
for local development and testing of load-balanced endpoints.

Changes:
- Add TETRA_LB_IMAGE constant for load-balanced Tetra image
- Create LiveLoadBalancer class extending LoadBalancerSlsResource
- Uses LiveServerlessMixin to lock imageName to Tetra LB image
- Register LiveLoadBalancer with LoadBalancerSlsStub in singledispatch
- Export LiveLoadBalancer from core.resources and top-level __init__

This enables users to test LB-based functions locally before deploying,
using the same pattern as LiveServerless for queue-based endpoints.

Users can now write:
  from tetra_rp import LiveLoadBalancer, remote

  api = LiveLoadBalancer(name="test-api")

  @remote(api, method="POST", path="/api/process")
  async def process_data(x, y):
      return {"result": x + y}

  result = await process_data(5, 3)  # Local execution

* test(stubs): Add comprehensive unit tests for LoadBalancerSlsStub

Implement unit tests for LoadBalancerSlsStub covering:
- Request preparation with arguments and dependencies
- Response handling for success and error cases
- Error handling for invalid responses
- Base64 encoding/decoding of serialized data
- Endpoint URL validation
- Timeout and HTTP error handling

Test coverage:
- _prepare_request: 4 tests
- _handle_response: 5 tests
- _execute_function: 3 error case tests
- __call__: 2 integration tests

Tests verify proper function serialization, argument handling,
error propagation, and response deserialization.

* fix(test): Correct LB endpoint test decorator to match assertions

Fix test_load_balancer_vs_queue_based_endpoints by updating the @remote
decorator to use method='POST' and path='/api/echo' to match the test
assertions. This was a test-level bug where the decorator definition
didn't match what was being asserted.

* docs: Add comprehensive documentation for @remote with LoadBalancer endpoints

- Using_Remote_With_LoadBalancer.md: User guide for HTTP routing, local development, building and deploying
- LoadBalancer_Runtime_Architecture.md: Technical details on deployment, request flows, security, and performance
- Updated README.md with LoadBalancer section and code example
- Updated Load_Balancer_Endpoints.md with cross-references to new guides

* security: Remove /execute from deployed LoadBalancer endpoints

Split @remote execution behavior between local and deployed:
- LiveLoadBalancer (local): Uses /execute endpoint for function serialization
- LoadBalancerSlsResource (deployed): Uses user-defined routes with HTTP param mapping

Changes:
1. LoadBalancerSlsStub routing detection:
   - _should_use_execute_endpoint() determines execution path
   - _execute_via_user_route() maps args to JSON and POSTs to user routes
   - Auto-detects resource type and routing metadata

2. Conditional /execute registration:
   - create_lb_handler() now accepts include_execute parameter
   - Generated handlers default to include_execute=False (security)
   - LiveLoadBalancer can enable /execute if needed

3. Updated handler generator:
   - Added clarity comments on /execute exclusion for deployed endpoints

4. Comprehensive test coverage:
   - 8 new tests for routing detection and execution paths
   - All 31 tests passing (22 unit + 9 integration)

5. Documentation updates:
   - Using_Remote_With_LoadBalancer.md: clarified /execute scope
   - Added 'Local vs Deployed Execution' section explaining differences
   - LoadBalancer_Runtime_Architecture.md: updated execution model
   - Added troubleshooting for deployed endpoint scenarios

Security improvement:
- Deployed endpoints only expose user-defined routes
- /execute endpoint removed from production (prevents arbitrary code execution)
- Lower attack surface for deployed endpoints
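The stub's decision logic splits roughly as sketched below; this is an illustration of the behavior described above, not the real `_should_use_execute_endpoint()`, and the fallback branch is hypothetical:

```python
def should_use_execute_endpoint(resource_type, http_method=None, http_path=None):
    """Sketch of the stub's routing decision between local and deployed."""
    if resource_type == "LiveLoadBalancer":
        # Local dev: serialize the function and POST it to /execute.
        return True
    if http_method and http_path:
        # Deployed: map args to JSON and POST to the user-defined route.
        return False
    # Incomplete routing metadata on a deployed resource: shown here as
    # an illustrative fallback only; see the linked docs for the actual
    # documented behavior.
    return True

print(should_use_execute_endpoint("LoadBalancerSlsResource", "POST", "/api/process"))  # False
```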

* feat(build): Phase 4 - Fix LiveLoadBalancer handler generation to include /execute endpoint

- Modified manifest.py to validate LiveLoadBalancer endpoints like LoadBalancerSlsResource
- Updated lb_handler_generator to:
  - Include LiveLoadBalancer in handler generation filter
  - Pass include_execute=True for LiveLoadBalancer (local dev)
  - Pass include_execute=False for LoadBalancerSlsResource (deployed)
- Added integration tests:
  - Verify LiveLoadBalancer handlers include /execute endpoint
  - Verify deployed handlers exclude /execute endpoint
- Fixes critical bug: LiveLoadBalancer now gets /execute endpoint in generated handlers

* fix(scanner): Discover LoadBalancer resources in addition to Serverless resources

- Updated scanner to extract LiveLoadBalancer and LoadBalancerSlsResource resources
- Previously only looked for 'Serverless' in class name, missing LoadBalancer endpoints
- Now checks for both 'Serverless' and 'LoadBalancer' in resource type names
- Added integration test to verify scanner discovers both resource types
- Fixes critical bug that prevented flash build from finding LoadBalancer endpoints

* chore: Format code for line length and remove unused imports

- Wrap long lines in manifest.py, lb_handler.py, and load_balancer_sls.py
- Remove unused httpx import in test_load_balancer_sls_stub.py
- Apply consistent formatting across codebase

* fix: Address PR #131 review feedback

- Scanner: Use exact type name matching instead of substring matching
  - Whitelist specific resource types to avoid false positives
  - Prevents matching classes like 'MyServerlessHelper' or 'LoadBalancerUtils'

- Type hints: Use Optional[str] for nullable fields in manifest
  - ManifestFunction.http_method and http_path now properly typed

- Timeout: Make HTTP client timeout configurable
  - Added LoadBalancerSlsStub.DEFAULT_TIMEOUT class attribute
  - Added timeout parameter to __init__
  - Updated both _execute_function and _execute_via_user_route to use self.timeout

- Deprecated datetime: Replace datetime.utcnow() with datetime.now(timezone.utc)
  - Updated manifest.py and test_lb_remote_execution.py
  - Ensures Python 3.12+ compatibility
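The datetime change is the standard replacement for the deprecated API:

```python
from datetime import datetime, timezone

# datetime.utcnow() is deprecated as of Python 3.12 and returns a naive
# datetime; datetime.now(timezone.utc) returns a timezone-aware one.
ts = datetime.now(timezone.utc)
print(ts.tzinfo)  # UTC
```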

* style: Format datetime chaining for line length

* fix: LiveLoadBalancer template not serialized to RunPod GraphQL

The set_serverless_template model_validator was being overwritten by sync_input_fields
(both defined with mode="after"). In this Pydantic v2 model hierarchy only one of the
two was registered, so set_serverless_template never ran.

This caused templates to never be created from imageName, resulting in:
  "GraphQL errors: One of templateId, template is required to create an endpoint"

Solution:
- Move set_serverless_template validator from ServerlessResource base class to subclasses
  (ServerlessEndpoint and LoadBalancerSlsResource) where the validation is actually needed
- Keep helper methods (_create_new_template, _configure_existing_template) in base class
  for reuse
- Add comprehensive tests for LiveLoadBalancer template serialization

This allows:
1. Base ServerlessResource to be instantiated freely for testing/configuration
2. Subclasses (ServerlessEndpoint, LoadBalancerSlsResource) to enforce template
   requirements during deployment
3. Proper template serialization in GraphQL payload for RunPod API

Fixes: One of templateId, template is required to create an endpoint error when
deploying LiveLoadBalancer with custom image tags like runpod/tetra-rp-lb:local

* fix: LoadBalancer endpoint URL and add CPU support

- Fix: Use correct endpoint URL format for load-balanced endpoints
  (https://{id}.api.runpod.ai instead of https://api.runpod.ai/v2/{id})
  This fixes 404 errors on /ping health check endpoints

- Feature: Add CPU LoadBalancer support
  * Create CpuLoadBalancerSlsResource for CPU-based load-balanced endpoints
  * Create CpuLiveLoadBalancer for local CPU LB development
  * Add TETRA_CPU_LB_IMAGE constant for CPU LB Docker image
  * Update example code to use CpuLiveLoadBalancer for CPU worker
  * Add 8 comprehensive tests for CPU LoadBalancer functionality

- Tests: Add 2 tests for endpoint URL format validation
- All 474 tests passing, 64% code coverage
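The URL fix amounts to the difference below (helper names are illustrative, not the actual code):

```python
def lb_endpoint_url(endpoint_id):
    """Load-balanced endpoints are addressed directly by subdomain;
    using the queue-based /v2 path 404s on /ping."""
    return f"https://{endpoint_id}.api.runpod.ai"

def qb_endpoint_url(endpoint_id):
    """Queue-based endpoints keep the /v2 API path."""
    return f"https://api.runpod.ai/v2/{endpoint_id}"

print(lb_endpoint_url("abc123"))  # https://abc123.api.runpod.ai
```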

* fix: Export CpuLiveLoadBalancer and CpuLoadBalancerSlsResource from tetra_rp package

LoadBalancer resources were not being discovered by ResourceDiscovery because
the new CPU variants (CpuLiveLoadBalancer, CpuLoadBalancerSlsResource) were
not exported from the main tetra_rp package. This prevented undeploy from
picking up these resources.

Added exports to:
- TYPE_CHECKING imports for type hints
- __getattr__ function for lazy loading
- __all__ list for public API

This fixes the issue where 'flash undeploy list' could not find LoadBalancer
resources that were deployed with 'flash run --auto-provision'.

* fix: Add API key authentication to LoadBalancer health check

The /ping endpoint for RunPod load-balanced endpoints requires the
RUNPOD_API_KEY header for authentication. Without it, the health check
fails with 401 Unauthorized, causing provisioning to timeout.

This fix adds the Authorization header to the health check request if
the RUNPOD_API_KEY environment variable is available, allowing the
endpoint health check to succeed during provisioning.

Fixes issue where 'flash run --auto-provision' would fail even though
the endpoint was successfully created on RunPod.

* fix(lb): Exclude flashboot from CpuLoadBalancerSlsResource GraphQL payload

CpuLoadBalancerSlsResource was overriding _input_only without including flashboot,
causing it to be sent to the RunPod GraphQL API which doesn't accept this field.
This caused deployment to fail with: Field "flashboot" is not defined by type "EndpointInput".

* fix(lb): Expand CpuInstanceType.ANY to all CPU flavors in CpuLoadBalancerSlsResource

Add field_validator to expand [CpuInstanceType.ANY] to all available CPU instance
types (cpu3g, cpu3c, cpu5c variants). This matches the behavior in CpuServerlessEndpoint
and prevents deployment errors like 'instanceId must be in the format of flavorId-vcpu-ram'.

* refactor(cpu): Move instanceIds validator to CpuEndpointMixin

Move the instanceIds field_validator from CpuServerlessEndpoint to CpuEndpointMixin
so both CpuServerlessEndpoint and CpuLoadBalancerSlsResource share the same validator
that expands [CpuInstanceType.ANY] to all available CPU flavors. This eliminates
code duplication and ensures consistent behavior across all CPU endpoint types.

* test: Update CPU instance test to reflect validator expansion

Update test_cpu_live_load_balancer_defaults_to_cpu_any to verify that
[CpuInstanceType.ANY] is correctly expanded to all available CPU instance types
by the field_validator in CpuEndpointMixin.

* fix(lb): Increase health check timeout from 5s to 15s

Load-balanced workers need more time to respond during cold starts and initialization.
RunPod docs recommend at least 10-15 second timeouts for health checks. Workers may
return 204 during initialization, which is normal and expected.

* fix(lb): Fix CPU load balancer template deployment error

Fixes two bugs in CpuLoadBalancerSlsResource that prevented CPU load balancers from deploying:

1. Added gpuCount and allowedCudaVersions to _input_only exclusion set to prevent
   GPU-specific fields from being sent to RunPod API
2. Overrode set_serverless_template() to call _sync_cpu_fields() first, ensuring
   GPU defaults are overridden to CPU-appropriate values (gpuCount=0)

The RunPod API was rejecting CPU load balancer templates because GPU-specific
fields were being included in the GraphQL payload. These changes align
CpuLoadBalancerSlsResource behavior with CpuServerlessEndpoint.

Also added comprehensive test coverage (30+ tests) to verify:
- GPU fields are correctly overridden to CPU defaults
- GPU fields are excluded from API payloads
- CPU-specific fields are properly included
- Consistency with CpuServerlessEndpoint behavior

* fix(drift): Exclude runtime fields from config hash to prevent false positives

Fixes false positive configuration drift detection by separating concerns:

1. Update ServerlessResource.config_hash to exclude runtime fields
   - Fields like template, templateId, aiKey, userId are API-assigned
   - Prevents false drift when same config is redeployed across processes
   - Now only hashes user-specified configuration

2. Add config_hash override to CpuLoadBalancerSlsResource
   - CPU load balancers hash only CPU-relevant fields
   - Excludes GPU-specific fields and runtime fields
   - Follows same pattern as CpuServerlessEndpoint

3. Fix _has_structural_changes to exclude template/templateId
   - CRITICAL: These runtime fields were causing false structural changes
   - Was forcing unnecessary redeployments despite update() being available
   - Now system correctly uses update() instead of undeploy+deploy

4. Make field serializers robust to handle string/enum values
   - Prevents serialization errors when fields are pre-converted to strings

5. Add comprehensive drift detection tests (16 tests)
   - Test hash stability with runtime field changes
   - Test exclusion of env, template, templateId, and other runtime fields
   - Test that actual config changes (image, flashboot) are detected
   - Test structural change detection behavior
   - Test real-world deployment scenarios

Results:
- Same config deployed multiple times: no false drift
- Different env vars with same config: no false drift
- Template/templateId changes: no false drift
- API-assigned fields: no false drift
- User config changes (image, flashboot): drift detected correctly
- All 512 unit tests pass
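The hash-exclusion idea can be sketched as follows; the field list and function shape here are illustrative, not the real `config_hash` implementation:

```python
import hashlib
import json

RUNTIME_FIELDS = {"template", "templateId", "aiKey", "userId"}  # illustrative

def config_hash(config):
    """Hash only user-specified configuration, skipping API-assigned
    runtime fields, so redeploying the same config yields the same hash."""
    stable = {k: v for k, v in sorted(config.items())
              if k not in RUNTIME_FIELDS}
    blob = json.dumps(stable, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

a = config_hash({"imageName": "app:v1", "templateId": "t-111"})
b = config_hash({"imageName": "app:v1", "templateId": "t-222"})  # runtime field differs
c = config_hash({"imageName": "app:v2", "templateId": "t-111"})  # user config differs
print(a == b, a == c)  # True False
```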

* fix(http): Standardize RunPod HTTP client authentication across codebase

Centralizes HTTP client creation for RunPod load-balanced endpoints to prevent
manual Authorization header code duplication and ensure consistent authentication:

1. Create centralized HTTP utility function (src/tetra_rp/core/utils/http.py)
   - New function: get_authenticated_httpx_client()
   - Automatically adds Bearer token Authorization header if RUNPOD_API_KEY set
   - Provides consistent timeout handling (default 30s, customizable)
   - Follows existing GraphQL/REST client authentication pattern

2. Fix critical authentication bug in LoadBalancerSlsStub._execute_via_user_route()
   - Previously: Missing Authorization header (401 errors on user routes)
   - Now: Uses centralized utility for proper authentication
   - Enables direct HTTP calls to user-defined routes with auth

3. Refactor two methods to use centralized utility
   - LoadBalancerSlsStub._execute_function() - removes 7+ lines of manual auth code
   - LoadBalancerSlsResource._check_ping_endpoint() - simplifies auth setup

4. Add comprehensive unit tests (tests/unit/core/utils/test_http.py)
   - Tests API key presence/absence handling
   - Tests custom and default timeout configuration
   - Tests edge cases (empty key, zero timeout)
   - All 7 tests pass with 100% coverage

Results:
- Single source of truth for HTTP authentication (centralized utility)
- Fixes 401 Unauthorized errors on load-balanced endpoints
- Eliminates repetitive manual auth code across 3+ locations
- Easier to maintain and update authentication patterns in future
- All 499 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)
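The header logic inside the utility can be sketched without httpx; the real `get_authenticated_httpx_client()` returns a configured client, whereas this hypothetical helper shows only the conditional Bearer-token behavior:

```python
import os

def auth_headers(default=None):
    """Add a Bearer token Authorization header only when RUNPOD_API_KEY
    is set (sketch of the centralized auth behavior described above)."""
    headers = dict(default or {})
    api_key = os.environ.get("RUNPOD_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers

os.environ["RUNPOD_API_KEY"] = "test-key"
print(auth_headers())  # {'Authorization': 'Bearer test-key'}
```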

* feat(http): Extend HTTP utilities to cover both sync and async authentication

Extends the centralized HTTP authentication pattern to all RunPod API calls:

1. Add get_authenticated_requests_session() for synchronous requests
   - Creates requests.Session with automatic Bearer token Authorization header
   - Follows same pattern as async get_authenticated_httpx_client()
   - Single source of truth for sync HTTP authentication

2. Refactor template.py to use centralized utility
   - Removes manual Authorization header setup (line 86)
   - Now uses get_authenticated_requests_session() for all template updates
   - Improves error handling with raise_for_status()
   - Token parameter marked deprecated; uses RUNPOD_API_KEY env var

3. Add comprehensive tests for sync utility (4 tests)
   - Tests API key presence/absence handling
   - Tests empty API key edge case
   - Tests Session object validation
   - All tests pass with proper cleanup

Benefits:
- True single source of truth for all RunPod HTTP authentication (sync + async)
- Consistent patterns across entire codebase
- Easier future auth changes across all HTTP client types
- Eliminates manual auth header code in template.py
- All 503 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

Note: requests.Session doesn't support default timeouts; timeout should be
specified per request (e.g., session.post(url, json=data, timeout=30.0))

* fix: Address PR feedback on HTTP utilities implementation

Addresses three feedback items from code review:

1. Fix breaking parameter order change in update_system_dependencies()
   - Restored original parameter order: template_id, token, system_dependencies, base_entry_cmd
   - Maintains backward compatibility with existing callers
   - Token parameter now optional (default None)

2. Add proper deprecation warning for token parameter
   - Issues DeprecationWarning when token parameter is used
   - Clearly communicates migration to RUNPOD_API_KEY environment variable
   - Follows Python deprecation best practices (warnings.warn with stacklevel=2)

3. Standardize test mocking approach across all health check tests
   - All tests now use consistent 'tetra_rp.core.utils.http.httpx.AsyncClient' patching
   - Removed inconsistent 'side_effect=lambda' pattern
   - Improved test maintainability by using same strategy everywhere

All 503 tests pass with consistent, clean implementation.
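The deprecation pattern from item 2 looks roughly like this; the function body is a stub, since the real `update_system_dependencies()` also performs the template update:

```python
import warnings

def update_system_dependencies(template_id, token=None, system_dependencies=None):
    """Sketch of the token-parameter deprecation described above."""
    if token is not None:
        warnings.warn(
            "the 'token' parameter is deprecated; set the RUNPOD_API_KEY "
            "environment variable instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
    return template_id

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    update_system_dependencies("t-123", token="old-secret")
print(caught[0].category.__name__)  # DeprecationWarning
```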

* refactor(drift): Extract runtime field constants and improve maintainability

- Extract RUNTIME_FIELDS and EXCLUDED_HASH_FIELDS as ClassVar constants in ServerlessResource for centralized field list management
- Add clarifying comments to enum serializers explaining defensive isinstance() checks for nested model serialization
- Document CPU load balancer field list coupling in docstring with maintenance guidelines
- Add TestSerializerDefensiveBehavior class with 4 tests verifying pre-stringified enum value handling
- Use ClassVar annotation to satisfy Pydantic v2 model field requirements

This reduces maintenance burden by centralizing field definitions and improves code clarity without changing functionality.

* docs: Improve LoadBalancer documentation accuracy and completeness

- Fix health check timeout: Add clarification that timeout is 15 seconds per check
- Add HTTP authentication details explaining RUNPOD_API_KEY header injection
- Document stub decision logic for incomplete routing metadata (fallback behavior)
- Clarify function signature inspection with concrete example showing parameter mapping
- Expand /execute security explanation with explicit threats and best practices
- Add detailed parameter type constraints for deployed endpoints (supported vs unsupported)
- Add troubleshooting guide for missing routing metadata (404 errors)
- Strengthen security warnings about never exposing /execute in production

All documentation now matches actual implementation verified through codebase analysis.

* docs: add resource config drift detection documentation

- comprehensive guide on drift detection implementation
- covers hash computation, field exclusion, and cpu-specific behavior
- includes testing patterns and troubleshooting guide
- documents all fields that trigger drift vs those ignored

* docs: proper name for the file

* test(build): Add comprehensive test coverage for scanner and handler improvements

- Add 6 new scanner tests for directory filtering (.venv, .flash, .runpod exclusion)
- Add test for resource type validation to prevent false positives
- Add test for fallback behavior when resource name extraction fails
- Add test for handling resource names with special characters
- Update existing tests to reflect new dynamic import format and resource name extraction

These tests guard the scanner improvements (resource type validation, directory
filtering, fallback behavior) and the handler generator improvements (dynamic
imports for invalid Python identifiers) against regressions in future changes.

* test(scanner): Fix resource type assertions to match scanner behavior

The scanner now extracts resource names from the name= parameter rather
than using variable names. Update test assertions to expect the actual
resource names ('test-api', 'deployed-api') instead of variable names.

* chore: merge correction

* fix(drift): Remove manual undeploy/deploy from update() method

Use saveEndpoint mutation for all changes instead of manual lifecycle
management. Server-side automatically detects version-triggering fields
(GPU, template, volumes) and increments endpoint version accordingly.

Keep _has_structural_changes() as informational for logging purposes only.
This aligns with RunPod API's version-based deployment model.

* docs(drift): Clarify _has_structural_changes detects version-triggering changes

Update docstring to reflect that this method identifies changes that
trigger server-side version increment and worker recreation, not manual
redeploy cycles. Explain which changes are version-triggering vs rolling
updates, and note that the method is now informational for logging only.

* feat(drift): Enable environment variable drift detection

Remove env from EXCLUDED_HASH_FIELDS so changes to environment variables
trigger drift detection and endpoint updates. Environment changes are
non-version-triggering (rolling updates), so server will apply them via
saveEndpoint without recreating workers.

Add env to CPU LoadBalancer config_hash for consistent behavior across
all resource types. Update comments to reflect that env is user-specified
configuration, not dynamically computed.

* test(drift): Update tests for environment variable drift detection

- test_lb_config_hash_excludes_env_variables → test_lb_config_hash_detects_env_changes
- test_env_var_changes_no_drift → test_env_var_changes_trigger_drift
- test_config_hash_excludes_env_from_drift → test_config_hash_detects_env_from_drift

Update assertions to expect different hashes when env changes, matching
new behavior where environment variable changes trigger drift and updates.

* fix: Address Copilot review feedback on type hints and documentation

- Fix type annotation for timeout parameter in LoadBalancerSlsStub (Optional[float])
- Replace hardcoded "30s" with actual self.timeout in error messages (2 locations)
- Update Resource_Config_Drift_Detection.md to reflect actual EXCLUDED_HASH_FIELDS
- Remove duplicate Load-Balanced Endpoints section from README.md

Addresses Copilot review comments (PR #132, review 3642596664)

* feat(mothership): implement auto-provisioning with State Manager reconciliation

Implement Linear ticket AE-1660: Mothership auto-provisioning from manifest.

Changes:
- Create StateManagerClient for persisting/querying manifests via HTTP
- Create MothershipProvisioner with manifest reconciliation logic
- Add lifespan context manager to LB handler for startup/shutdown hooks
- Implement /manifest endpoint for service discovery
- Set FLASH_IS_MOTHERSHIP env var on LoadBalancerSlsResource deployment
- Add 39 unit tests for mothership provisioner functions
- Add 7 integration tests for end-to-end provisioning flows
- Update documentation with auto-provisioning architecture and usage

Features:
- Automatic detection of new/changed/removed resources via config hashing
- Background provisioning (non-blocking) with asyncio.create_task()
- Idempotent deployments - unchanged resources skipped on subsequent boots
- State Manager integration for manifest persistence across reboots
- Graceful error handling - provisioning errors don't block mothership startup
- Automatic environment variable propagation (FLASH_MOTHERSHIP_URL)
- Reconciliation with delete support - removes resources no longer in manifest
- Fast startup - /manifest endpoint available immediately with partial results

Test Results:
- 651 tests passing (39 new unit + 7 new integration tests)
- 65.69% code coverage (exceeds 35% requirement)
- All quality checks pass (format, lint, type check, tests)
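The "background provisioning (non-blocking)" item above can be sketched as follows. This is a minimal illustration of the `asyncio.create_task()` pattern with an error callback so provisioning failures are logged rather than silently swallowed; the function name and logging details are assumptions, not the actual implementation.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

def start_background_provisioning(provision_coro_factory):
    """Kick off provisioning without blocking mothership startup.

    A done-callback is attached so exceptions raised inside the task are
    logged instead of disappearing (fire-and-forget tasks swallow errors
    unless someone inspects task.exception()).
    """
    task = asyncio.create_task(provision_coro_factory())

    def _log_error(t: asyncio.Task) -> None:
        if not t.cancelled() and t.exception() is not None:
            logger.error("Provisioning failed: %s", t.exception())

    task.add_done_callback(_log_error)
    return task
```

Because the task is created but not awaited at startup, the `/manifest` endpoint can come up immediately while provisioning continues in the background.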

* docs: fix Cross_Endpoint_Routing terminology (Directory → Manifest)

Update documentation to consistently use 'Manifest' instead of 'Directory':
- Replace DirectoryClient references with StateManagerClient (actual implementation)
- Update architecture diagram to reference /manifest endpoint instead of DirectoryClient
- Fix ServiceRegistry code examples to use /manifest endpoint
- Update extension point for custom directory backends
- Fix testing section to reference actual test files (MothershipProvisioner, StateManagerClient)
- Update debugging section with /manifest endpoint examples
- Clarify that directory is loaded from mothership /manifest endpoint

These changes ensure documentation matches the actual AE-1660 implementation.

* fix: correct endpoint and exception references (Directory → Manifest)

Critical fix: Update ManifestClient to query /manifest endpoint instead of /directory

Changes:
- Fix ManifestClient.get_directory() to query /manifest endpoint (not /directory)
- Update ManifestClient docstring: 'manifest directory service' → '/manifest endpoint'
- Fix DirectoryUnavailableError → ManifestServiceUnavailableError in docs
- Update example URLs from 'api.runpod.io' to actual LB endpoint format
- Clarify in docstrings that this queries the mothership's /manifest endpoint

This bug would have caused runtime failures when querying the mothership directory,
as the actual endpoint served by lb_handler_generator.py is /manifest, not /directory.

* feat(runtime): Migrate from URL to ID-based mothership identification

Changes FLASH_MOTHERSHIP_URL to FLASH_MOTHERSHIP_ID for cleaner
environment configuration. Child endpoints now use FLASH_RESOURCE_NAME
to identify which resource config they represent in the manifest.

Changes:
- ManifestClient: Construct URL from FLASH_MOTHERSHIP_ID instead of full URL
- ServiceRegistry: Use FLASH_RESOURCE_NAME with fallback to RUNPOD_ENDPOINT_ID
- Add tomli dependency for Python <3.11 pyproject.toml parsing (needed for build.py)

Benefits:
- Simpler environment configuration (ID instead of full URL)
- Clear distinction between mothership (RUNPOD_ENDPOINT_ID) and children (FLASH_RESOURCE_NAME)
- Consistent URL construction pattern

Files modified:
- src/tetra_rp/runtime/manifest_client.py
- src/tetra_rp/runtime/service_registry.py
- pyproject.toml
- uv.lock

* feat(provisioner): Support all resource types and add cache validation

Removes LoadBalancer resource filtering to enable multi-tier
architectures. Adds cache validation to prevent stale resources
from being deployed after codebase refactoring.

Provisioning Changes:
- Remove LoadBalancer filtering in reconcile_manifests()
- Support CpuLiveLoadBalancer, LiveLoadBalancer, LoadBalancerSlsResource
- Add filter_resources_by_manifest() to validate cached resources against manifest
- Add test-mothership mode with "tmp-" prefix for temporary test endpoints
- Change env vars: FLASH_MOTHERSHIP_URL -> FLASH_MOTHERSHIP_ID

Resource Manager Changes:
- Track all created resources (deployed = has ID) regardless of health status
- Cache resources even if deployment completes with errors
- Ensures cleanup capability for all created resources

Cache Validation:
- Prevents stale resources from old codebase versions being redeployed
- Validates: resource name exists in manifest + type matches
- Logs removed stale entries for visibility

Benefits:
- Multi-tier load balancing architectures now supported
- No orphaned resources from refactored code
- Better resource lifecycle management
- Reliable cleanup of all created resources

Files modified:
- src/tetra_rp/runtime/mothership_provisioner.py
- src/tetra_rp/core/resources/resource_manager.py
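The cache-validation rule above (name must exist in the manifest and type must match) can be sketched like this. The function name comes from the commit; the cached-entry and manifest shapes here are assumptions for illustration.

```python
import logging

logger = logging.getLogger(__name__)

def filter_resources_by_manifest(cached: dict, manifest: dict) -> dict:
    """Drop cached resources that are stale relative to the manifest.

    A cached entry survives only if its name still exists in the manifest
    and its resource type matches, so resources from an old codebase
    version are never redeployed.
    """
    valid = {}
    for name, entry in cached.items():
        expected = manifest.get("resources", {}).get(name)
        if expected is None:
            logger.info("Removing stale cache entry (not in manifest): %s", name)
            continue
        if entry.get("resource_type") != expected.get("resource_type"):
            logger.info("Removing stale cache entry (type changed): %s", name)
            continue
        valid[name] = entry
    return valid
```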

* feat(build): Add local tetra_rp bundling and manifest endpoint improvements

Enables bundling local tetra_rp source into builds for development and
testing. Updates LB handler to serve authoritative manifest from State Manager.

Build System Changes:
- Add _find_local_tetra_rp() to detect development installations
- Add _bundle_local_tetra_rp() to copy source into build directory
- Add _extract_tetra_rp_dependencies() to parse pyproject.toml for deps
- Add _remove_tetra_from_requirements() to clean up after bundling
- Skip bundling for PyPI installations (site-packages)

LB Handler Changes:
- Store StateManagerClient in module-level state for /manifest endpoint
- Update /manifest endpoint to fetch from State Manager (single source of truth)
- Add proper error handling for uninitialized state client
- Restrict /manifest endpoint to mothership only (403 for children)
- Improve provisioning startup logging for clarity

Benefits:
- Test-mothership can use local tetra_rp changes without publishing
- Manifest endpoint serves complete authoritative state
- Child endpoints get consistent configuration from single source
- Better development workflow for framework changes

Files modified:
- src/tetra_rp/cli/commands/build.py
- src/tetra_rp/cli/commands/build_utils/lb_handler_generator.py
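The "skip bundling for PyPI installations (site-packages)" check above can be approximated with a path inspection. This is a hedged sketch; the real `_find_local_tetra_rp()` detection logic may differ.

```python
from pathlib import Path

def is_development_install(module_file: str) -> bool:
    """Heuristic: PyPI installs live under site-packages; anything else
    is treated as a local development checkout eligible for bundling."""
    return "site-packages" not in Path(module_file).parts
```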

* feat(cli): Add undeploy force flag and improve discovery logging

Adds --force flag to undeploy for non-interactive cleanup (needed by
test-mothership). Improves resource discovery visibility with debug logging.

Undeploy Changes:
- Add --force/-f flag to skip confirmation prompts
- Update _undeploy_by_name(), _undeploy_all(), _interactive_undeploy() to support skip_confirm
- Enables automated cleanup in CI/CD and test-mothership shutdown

Discovery Changes:
- Add detailed logging at each discovery phase (entry point, static imports, directory scan)
- Log discovered resource names and types for debugging
- Exclude .flash/ directory from project scanning (build artifacts)

Run Command Changes:
- Add resource discovery debug output showing found resources
- Display resource names and types before server startup

CLI Main Changes:
- Register test-mothership command (note: implementation was in commit 1)

Benefits:
- Test-mothership can cleanup automatically without user interaction
- Better visibility into resource discovery process
- Easier debugging of discovery issues
- Clean separation of interactive vs automated workflows

Files modified:
- src/tetra_rp/cli/commands/undeploy.py
- src/tetra_rp/cli/commands/run.py
- src/tetra_rp/core/discovery.py
- src/tetra_rp/cli/main.py

* test: Update tests for new provisioning behavior and ID-based config

Updates all tests to reflect LoadBalancer provisioning, FLASH_RESOURCE_NAME
usage, and removal of obsolete test cases.

Mothership Provisioner Tests:
- Update tests to expect LoadBalancer resources in provisioning (not skipped)
- Fix create_resource_from_manifest tests to use RUNPOD_ENDPOINT_ID env var
- Update UnsupportedResourceType test (LoadBalancer now supported)
- Remove obsolete get_manifest_directory() tests (function removed)

Service Registry Tests:
- Update all tests to use FLASH_RESOURCE_NAME instead of RUNPOD_ENDPOINT_ID
- Add test for FLASH_RESOURCE_NAME priority with RUNPOD_ENDPOINT_ID fallback
- Update test names to reflect new behavior

Integration Tests:
- Update test_provision_children_skips_load_balancer_resources to
  test_provision_children_deploys_load_balancer_resources
- Fix assertions to expect 2 deployments (LoadBalancer + worker)
- Remove obsolete test_manifest_directory_endpoint_after_provisioning

Manifest Client Tests:
- Update initialization tests for FLASH_MOTHERSHIP_ID usage
- Update error message expectations

Test Rationale:
- LoadBalancer provisioning enables multi-tier architectures
- FLASH_RESOURCE_NAME provides clearer child endpoint identification
- Removed tests for deleted functionality (get_manifest_directory)

Files modified:
- tests/unit/runtime/test_mothership_provisioner.py
- tests/unit/runtime/test_service_registry.py
- tests/integration/test_mothership_provisioning.py
- tests/unit/runtime/test_manifest_client.py

* fix(build): Use importlib for LB handler imports to support numeric directories

Changes:
- Modified LBHandlerGenerator to use importlib pattern instead of from imports
- Aligns LB handlers with QB handler pattern for consistency
- Fixes SyntaxError when building projects with numeric directory names (e.g., 03_advanced_workers)
- Added boolean flags (is_load_balanced, is_live_resource) to replace string comparisons
- Added test coverage for numeric module paths

The bug occurred because Python identifiers cannot start with digits, but
importlib treats module paths as strings, allowing any valid filesystem path.
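The digit-prefix issue can be demonstrated directly: `from 03_advanced_workers.worker import fn` is a `SyntaxError` because Python identifiers cannot start with a digit, while `importlib.import_module` accepts the same path as a string. The snippet below builds a throwaway package with a numeric name to show the pattern; the directory name mirrors the example in the commit.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a temporary package whose name starts with a digit.
root = Path(tempfile.mkdtemp())
pkg = root / "03_advanced_workers"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "worker.py").write_text("def fn():\n    return 'ok'\n")

# A literal `from 03_advanced_workers.worker import fn` would not even
# compile, but importlib happily imports the same module path:
sys.path.insert(0, str(root))
importlib.invalidate_caches()
module = importlib.import_module("03_advanced_workers.worker")
assert module.fn() == "ok"
```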

* feat(build): Store config variable names in manifest for test-mothership

Changes:
- Scanner now tracks config variable names (e.g., "gpu_config") at scan time
- Manifest includes config_variable field for each resource and function
- test-mothership uses config_variable from manifest for reliable discovery
- Added backward compatibility fallback to old search logic

Fixes "No config variable found" warnings when resource names differ from
variable names (e.g., resource "03_05_load_balancer_gpu" with variable "gpu_config").

This enables test-mothership to correctly discover and provision all resources
including load balancer endpoints, resolving health check failures.

* fix: Address PR review comments for security and error handling

Changes:
- Replace MD5 with SHA-256 for config hash computation (security best practice)
- Add error callback to background provisioning task for proper exception handling
- Update tests to expect SHA-256 hash length (64 chars instead of 32)

Addresses Copilot review comments:
- mothership_provisioner.py:113 - Use SHA-256 instead of cryptographically broken MD5
- lb_handler_generator.py:81 - Track background task and add error callback
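The SHA-256 config hash can be sketched as below. Per the drift-detection change earlier in this PR, `env` is deliberately not excluded from hashing; the exact contents of `EXCLUDED_HASH_FIELDS` and the config shape here are assumptions for illustration.

```python
import hashlib
import json

# Illustrative exclusions only; note that "env" is intentionally absent
# so environment variable changes trigger drift detection.
EXCLUDED_HASH_FIELDS = {"id", "deployment_timestamp"}

def compute_config_hash(config: dict) -> str:
    """SHA-256 over the sorted, JSON-serialized config (64 hex chars)."""
    filtered = {k: v for k, v in config.items() if k not in EXCLUDED_HASH_FIELDS}
    payload = json.dumps(filtered, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()
```

Replacing MD5 with SHA-256 doubles the hex digest length (32 → 64 characters), which is why the tests above update their length expectations.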
* feat(runtime): Add generic handler factory for serverless execution

Implement a factory function that creates RunPod serverless handlers,
eliminating code duplication across generated handler files.

The generic_handler module provides:
- create_handler(function_registry) factory that accepts a dict of
  function/class objects and returns a RunPod-compatible handler
- Automatic serialization/deserialization using cloudpickle + base64
- Support for both function execution and class instantiation + method calls
- Structured error responses with full tracebacks for debugging
- Load manifest for cross-endpoint function discovery

This design centralizes all handler logic in one place, making it easy to:
- Fix bugs once, benefit all handlers
- Add new features without regenerating projects
- Keep deployment packages small (handler files are ~23 lines each)

Implementation:
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Handles function vs. class execution
- load_manifest(): Loads flash_manifest.json for service discovery
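A minimal sketch of the factory and its serialization helpers follows. It uses the standard library's `pickle` in place of `cloudpickle` so the example is self-contained; the job/response field names are assumptions, not the exact RunPod schema.

```python
import base64
import pickle  # production code uses cloudpickle to handle arbitrary objects
import traceback

def deserialize_arguments(encoded: str):
    """Base64 + pickle decoding of an (args, kwargs) pair."""
    return pickle.loads(base64.b64decode(encoded))

def serialize_result(result) -> str:
    """Pickle + base64 encoding, safe for JSON transmission."""
    return base64.b64encode(pickle.dumps(result)).decode()

def create_handler(function_registry: dict):
    """Return a RunPod-style handler bound to a registry of callables."""
    def handler(job: dict) -> dict:
        inp = job.get("input", {})
        name = inp.get("function_name")
        fn = function_registry.get(name)
        if fn is None:
            return {"success": False, "error": f"Function not found: {name}"}
        try:
            args, kwargs = deserialize_arguments(inp["arguments"])
            return {"success": True, "result": serialize_result(fn(*args, **kwargs))}
        except Exception as exc:
            # Structured error response with full traceback for debugging.
            return {"success": False, "error": str(exc),
                    "traceback": traceback.format_exc()}
    return handler
```

Because all logic lives in this one factory, a generated handler only needs to build `FUNCTION_REGISTRY` and call `create_handler(...)`.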

* feat(cli): Add handler generator, manifest builder, and scanner for build process

Implement the build pipeline components that work together to generate
serverless handlers from @Remote decorated functions.

Three core components:

1. RemoteDecoratorScanner (scanner.py)
   - Uses Python AST to discover all @Remote decorated functions
   - Extracts function metadata: name, module, async status, is_class
   - Groups functions by resource_config for handler generation
   - Handles edge cases like decorated classes and async functions

2. ManifestBuilder (manifest.py)
   - Groups functions by their resource_config
   - Creates flash_manifest.json structure for service discovery
   - Maps functions to their modules and handler files
   - Enables cross-endpoint function routing at runtime

3. HandlerGenerator (handler_generator.py)
   - Creates lightweight handler_*.py files for each resource config
   - Each handler imports functions and registers them in FUNCTION_REGISTRY
   - Handler delegates to create_handler() factory from generic_handler
   - Generated handlers are ~23 lines (vs ~98 with duplication)

Build Pipeline Flow:
1. Scanner discovers @Remote functions
2. ManifestBuilder groups them by resource_config
3. HandlerGenerator creates handler_*.py for each group
4. All files + manifest bundled into archive.tar.gz

This eliminates ~95% duplication across handlers by using the factory pattern
instead of template-based generation.
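The grouping step in the pipeline above can be sketched as follows. Field names and the manifest layout are simplified assumptions; the real `ManifestBuilder` emits a richer `flash_manifest.json`.

```python
from collections import defaultdict

def build_manifest(functions: list) -> dict:
    """Group discovered remote functions by resource_config and emit a
    minimal manifest structure (fields are illustrative)."""
    groups = defaultdict(list)
    for fn in functions:
        groups[fn["resource_config"]].append(
            {"name": fn["name"], "module": fn["module"],
             "is_async": fn.get("is_async", False)}
        )
    return {
        "version": "1",
        "resources": {
            cfg: {"handler_file": f"handler_{cfg}.py", "functions": fns}
            for cfg, fns in groups.items()
        },
    }
```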

* test(runtime): Add comprehensive tests for generic handler

Implement 19 unit tests covering all major paths through the generic_handler
factory and its helper functions.

Test Coverage:

Serialization/Deserialization (7 tests):
- serialize_result() with simple values, dicts, lists
- deserialize_arguments() with empty, args-only, kwargs-only, mixed inputs
- Round-trip encoding/decoding of cloudpickle + base64

Function Execution (4 tests):
- Simple function execution with positional and keyword arguments
- Keyword argument handling
- Class instantiation and method calls
- Argument passing to instance methods

Handler Factory (8 tests):
- create_handler() returns callable RunPod handler
- Handler with simple function registry
- Missing function error handling (returns error response, not exception)
- Function exceptions caught with traceback included
- Multiple functions in single registry
- Complex Python objects (classes, lambdas, closures)
- Empty registry edge case
- Default execution_type parameter
- None return values
- Correct RunPod response format (success, result/error, traceback)

Test Strategy:
- Arrange-Act-Assert pattern for clarity
- Isolated unit tests (no external dependencies)
- Tests verify behavior, not implementation
- Error cases tested for proper error handling
- All serialization tested for round-trip correctness

All tests passing, 83% coverage on generic_handler.py

* test(cli): Add tests for handler generation, manifest building, and scanning

Implement integration tests validating the build pipeline components work
correctly together.

Test Coverage:

HandlerGenerator Tests:
- Handler files created with correct names (handler_<resource_name>.py)
- Generated files import required functions from workers
- FUNCTION_REGISTRY properly formatted
- create_handler() imported from generic_handler
- Handler creation via factory
- RunPod start call present and correct
- Multiple handlers generated for multiple resource configs

ManifestBuilder Tests:
- Manifest structure with correct version and metadata
- Resources grouped by resource_config
- Handler file paths correct
- Function metadata preserved (name, module, is_async, is_class)
- Function registry mapping complete

Scanner Tests:

- @Remote decorated functions discovered via AST
- Function metadata extracted correctly
- Module paths resolved properly
- Async functions detected
- Class methods detected
- Edge cases handled (multiple decorators, nested classes)

Test Strategy:
- Integration tests verify components work together
- Tests verify generated files are syntactically correct
- Tests validate data structures match expected schemas
- No external dependencies in build process

Validates that the entire build pipeline:
1. Discovers functions correctly
2. Groups them appropriately
3. Generates valid Python handler files
4. Creates correct manifest structure

* docs(runtime): Document generic handler factory architecture

Add comprehensive architecture documentation explaining why the factory
pattern was chosen and how it works.

Documentation includes:

Overview & Context:
- Problem statement: Handler files had 95% duplication
- Design decision: Use factory function instead of templates
- Benefits: Single source of truth, easier maintenance, consistency

Architecture Diagrams (MermaidJS):
- High-level flow: @Remote functions → Scanner → Manifest → Handlers → Factory
- Component relationships: HandlerGenerator, GeneratedHandler, generic_handler
- Function registry pattern: Discovery → Grouping → Registration → Factory

Implementation Details:
- create_handler(function_registry) signature and behavior
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Function vs. class execution
- load_manifest(): Service discovery via flash_manifest.json

Design Decisions (with rationale):
- Factory Pattern over Inheritance: Simpler, less coupling, easier to test
- CloudPickle + Base64: Handles arbitrary objects, safe JSON transmission
- Manifest in Generic Handler: Runtime service discovery requirement
- Structured Error Responses: Debugging aid, functional error handling
- Both Execution Types: Supports stateful classes and pure functions

Usage Examples:
- Simple function handler
- Class execution with methods
- Multiple functions in one handler

Build Process Integration:
- 4-phase pipeline: Scanner → Grouping → Generation → Packaging
- Manifest structure and contents
- Generated handler structure (~23 lines)

Testing Strategy:
- 19 unit tests covering all major paths
- 7 integration tests verifying handler generation
- Manual testing with example applications

Performance:
- Zero runtime penalty (factory called once at startup)
- No additional indirection in request path

* docs(cli): Add flash build command documentation

Document the flash build command and update CLI README to include it.

New Documentation:

flash-build.md includes:

Usage & Options:
- Command syntax: flash build [OPTIONS]
- --no-deps: Skip transitive dependencies (faster, smaller archives)
- --keep-build: Keep build directory for inspection/debugging
- --output, -o: Custom archive name (default: archive.tar.gz)

What It Does (5-step process):
1. Discovery: Scan for @Remote decorated functions
2. Grouping: Group functions by resource_config
3. Handler Generation: Create lightweight handler files
4. Manifest Creation: Generate flash_manifest.json
5. Packaging: Create archive.tar.gz for deployment

Build Artifacts:
- .flash/archive.tar.gz: Deployment package (ready for RunPod)
- .flash/flash_manifest.json: Service discovery configuration
- .flash/.build/: Temporary build directory

Handler Generation:
- Explains factory pattern and minimal handler files
- Links to Runtime_Generic_Handler.md for details

Dependency Management:
- Default behavior: Install all dependencies including transitive
- --no-deps: Only direct dependencies (when base image has transitive)
- Trade-offs explained

Cross-Endpoint Function Calls:
- Example showing GPU and CPU endpoints
- Manifest enables routing automatically

Output & Troubleshooting:
- Sample build output with progress indicators
- Common failure scenarios and solutions
- How to debug with --keep-build

Next Steps:
- Test locally with flash run
- Deploy to RunPod
- Monitor with flash undeploy list

Updated CLI README.md:
- Added flash build to command list in sequence
- Links to full flash-build.md documentation

* docs: Add build process and handler generation section to README

Add a new section explaining how the build system works and why the
factory pattern reduces code duplication.

New Section: Build Process and Handler Generation

Explains:

How Flash Builds Your Application (5-step pipeline):
1. Discovery: Scans code for @Remote decorated functions
2. Grouping: Groups functions by resource_config
3. Handler Generation: Creates lightweight handler files
4. Manifest Creation: Generates flash_manifest.json for service discovery
5. Packaging: Bundles everything into archive.tar.gz

Handler Architecture (with code example):
- Shows generated handler using factory pattern
- Single source of truth: All handler logic in one place
- Easier maintenance: Bug fixes don't require rebuilding projects

Cross-Endpoint Function Calls:
- Example of GPU and CPU endpoints calling each other
- Manifest and runtime wrapper handle service discovery

Build Artifacts:
- .flash/.build/: Temporary build directory
- .flash/archive.tar.gz: Deployment package
- .flash/flash_manifest.json: Service configuration

Links to detailed documentation:
- docs/Runtime_Generic_Handler.md for architecture details
- src/tetra_rp/cli/docs/flash-build.md for CLI reference

This section bridges the main README and detailed documentation,
providing entry point for new users discovering the build system.

* feat(cli): Integrate build utilities into flash build command

Wire up the handler generator, manifest builder, and scanner into the
actual flash build command implementation.

Changes to build.py:

1. Integration:
   - Import RemoteDecoratorScanner for function discovery
   - Import ManifestBuilder for manifest creation
   - Import HandlerGenerator for handler file creation
   - Call these in sequence during the build process

2. Build Pipeline:
   - After copying project files, scan for @Remote functions
   - Build manifest from discovered functions
   - Generate handler files for each resource config
   - Write manifest to build directory
   - Progress indicators show what's being generated

3. Fixes:
   - Change .tetra directory references to .flash
   - Uncomment actual build logic (was showing "Coming Soon" message)
   - Fix progress messages to show actual file counts

4. Error Handling:
   - Try/catch around handler generation
   - Warning shown if generation fails but build continues
   - User can debug with --keep-build flag

Build Flow Now:
1. Load ignore patterns
2. Collect project files
3. Create build directory
4. Copy files to build directory
5. [NEW] Scan for @Remote functions
6. [NEW] Build and write manifest
7. [NEW] Generate handler files
8. Install dependencies
9. Create archive
10. Clean up build directory (unless --keep-build)

Dependencies:
- Updated uv.lock with all required dependencies

* refactor(build): Fix directory structure and add comprehensive error handling

**Critical Fixes:**
- Remove "Coming Soon" message blocking build command execution
- Fix build directory to use .flash/.build/ directly (no app_name subdirectory)
- Fix tarball to extract with flat structure using arcname="."
- Fix cleanup to remove correct build directory

**Error Handling & Validation:**
- Add specific exception handling (ImportError, SyntaxError, ValueError)
- Add import validation to generated handlers
- Add duplicate function name detection across resources
- Add proper error logging throughout build process

**Resource Type Tracking:**
- Add resource_type field to RemoteFunctionMetadata
- Track actual resource types (LiveServerless, CpuLiveServerless)
- Use actual types in manifest instead of hardcoding

**Robustness Improvements:**
- Add handler import validation post-generation
- Add manifest path fallback search (cwd, module dir, legacy location)
- Add resource name sanitization for safe filenames
- Add specific exception logging in scanner (UnicodeDecodeError, SyntaxError)

**User Experience:**
- Add troubleshooting section to README
- Update manifest path documentation in docs
- Change "Zero Runtime Penalty" to "Minimal Runtime Overhead"
- Mark future enhancements as "Not Yet Implemented"
- Improve build success message with next steps

Fixes all 20 issues identified in code review (issues #1-13, #19-22)

* feat(resources): Add LoadBalancerSlsResource for LB endpoints

Implement LoadBalancerSlsResource class for provisioning RunPod load-balanced
serverless endpoints. Load-balanced endpoints expose HTTP servers directly to
clients without queue-based processing, enabling REST APIs, webhooks, and
real-time communication patterns.

Key features:
- Type enforcement (always LB, never QB)
- Scaler validation (REQUEST_COUNT required, not QUEUE_DELAY)
- Health check polling via /ping endpoint (200/204 = healthy)
- Post-deployment verification with configurable retries
- Async and sync health check methods
- Comprehensive unit tests
- Full documentation with architecture diagrams and examples

Architecture:
- Extends ServerlessResource with LB-specific behavior
- Validates configuration before deployment
- Polls /ping endpoint until healthy (10 retries × 5s = 50s timeout)
- Raises TimeoutError if endpoint fails to become healthy

This forms the foundation for Mothership architecture where a load-balanced
endpoint serves as a directory server for child endpoints.
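The health-check polling described above (10 retries × 5s against `/ping`, with 200/204 meaning healthy) can be sketched generically. Here `ping` stands in for the real HTTP GET so the example is self-contained; the retry/delay defaults mirror the commit.

```python
import time

def wait_until_healthy(ping, retries: int = 10, delay: float = 5.0) -> None:
    """Poll until `ping()` returns a healthy status code (200 or 204).

    `ping` is any zero-argument callable returning an HTTP status code;
    the real resource issues a GET against the endpoint's /ping route.
    Raises TimeoutError if the endpoint never becomes healthy.
    """
    for attempt in range(retries):
        if ping() in (200, 204):
            return
        if attempt < retries - 1:
            time.sleep(delay)  # back off before the next poll
    raise TimeoutError(f"Endpoint not healthy after {retries} attempts")
```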

* fix(test): Fix LoadBalancerSlsResource deployment test mocks

Import ServerlessResource directly and use patch.object on the imported class
instead of string-based patches. This ensures the mocks properly intercept the
parent class's _do_deploy method when called via super(). Simplifies mock
configuration and removes an unused variable assertion.

Fixes the three failing deployment tests that were making real GraphQL API calls.
All tests now pass: 418 passed, 1 skipped.

* feat(resources): Phase 1 - Core infrastructure for @Remote on LB endpoints

Implement core infrastructure for enabling @Remote decorator on
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Changes:
- Create LoadBalancerSlsStub: HTTP-based stub for direct endpoint execution
  (src/tetra_rp/stubs/load_balancer_sls.py, 170 lines)
  - Serializes functions and arguments using cloudpickle + base64
  - Direct HTTP POST to /execute endpoint (no queue polling)
  - Proper error handling and deserialization

- Register stub with singledispatch (src/tetra_rp/stubs/registry.py)
  - Enables @Remote to dispatch to LoadBalancerSlsStub for LB resources

- Extend @Remote decorator with HTTP routing parameters (src/tetra_rp/client.py)
  - Add 'method' parameter: GET, POST, PUT, DELETE, PATCH
  - Add 'path' parameter: /api/endpoint routes
  - Validate method/path required for LoadBalancerSlsResource
  - Store routing metadata on decorated functions/classes
  - Warn if routing params used with non-LB resources

Foundation for Phase 2 (Build system integration) and Phase 3 (Local dev).

* feat(build): Phase 2.1 - Enhanced scanner for HTTP routing extraction

Update RemoteDecoratorScanner to extract HTTP method and path from
@Remote decorator for LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to RemoteFunctionMetadata
- Add _extract_http_routing() method to parse decorator keywords
- Extract method (GET, POST, PUT, DELETE, PATCH) from decorator
- Extract path (/api/process) from decorator
- Store routing metadata for manifest generation

Foundation for Phase 2.2 (Manifest updates) and Phase 2.3 (Handler generation).

* feat(build): Phase 2.2 - Updated manifest schema for HTTP routing

Enhance ManifestBuilder to support HTTP method/path routing for
LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to ManifestFunction
- Validate LB endpoints have both method and path
- Detect and prevent route conflicts (same method + path)
- Prevent use of reserved paths (/execute, /ping)
- Add 'routes' section to manifest for LB endpoints
- Conditional inclusion of routing fields (only for LB)

Manifest structure for LB endpoints now includes:
{
  "resources": {
    "api_service": {
      "resource_type": "LoadBalancerSlsResource",
      "functions": [
        {
          "name": "process_data",
          "http_method": "POST",
          "http_path": "/api/process"
        }
      ]
    }
  },
  "routes": {
    "api_service": {
      "POST /api/process": "process_data"
    }
  }
}
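The validation rules above (both method and path required, no duplicate `(method, path)` pairs, no reserved paths) can be sketched as a small check over the manifest's function entries. The reserved set comes from the commit; the function shape is an assumption.

```python
RESERVED_PATHS = {"/execute", "/ping"}  # claimed by the framework

def validate_routes(functions: list) -> dict:
    """Build a 'METHOD /path' -> function-name map, rejecting reserved
    paths and conflicting (method, path) pairs."""
    routes = {}
    for fn in functions:
        method, path = fn["http_method"], fn["http_path"]
        if path in RESERVED_PATHS:
            raise ValueError(f"{path} is reserved by the framework")
        key = f"{method} {path}"
        if key in routes:
            raise ValueError(f"Route conflict: {key} already maps to {routes[key]}")
        routes[key] = fn["name"]
    return routes
```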

* feat(cli): Add LB handler generator for FastAPI app creation

Implement LBHandlerGenerator to create FastAPI applications for
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Key features:
- Generates FastAPI apps with explicit route registry
- Creates (method, path) -> function mappings from manifest
- Validates route conflicts and reserved paths
- Imports user functions and creates dynamic routes
- Includes required /ping health check endpoint
- Validates generated handler Python syntax via import

Generated handler structure enables:
- Direct HTTP routing to user functions via FastAPI
- Framework /execute endpoint for @Remote stub execution
- Local development with uvicorn

* feat(runtime): Implement LB handler factory for FastAPI app creation

Create create_lb_handler() factory function that dynamically builds FastAPI
applications from route registries for LoadBalancerSlsResource endpoints.

Key features:
- Accepts route_registry: Dict[(method, path)] -> handler_function mapping
- Registers all user-defined routes from registry to FastAPI app
- Provides /execute endpoint for @Remote stub function execution
- Handles async function execution automatically
- Serializes results with cloudpickle + base64 encoding
- Comprehensive error handling with detailed logging

The /execute endpoint enables:
- Remote function code execution via @Remote decorator
- Automatic argument deserialization from cloudpickle/base64
- Result serialization for transmission back to client
- Support for both sync and async functions
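The registry-to-routes wiring can be shown framework-agnostically. The real factory registers each entry on a FastAPI app; this sketch keeps the same `(method, path) -> callable` contract and the `include_execute` security switch, with a hypothetical `_execute_stub` standing in for the real serialized-function execution path.

```python
import asyncio
import inspect

def _execute_stub(payload: dict) -> dict:
    # Placeholder for the real /execute logic, which deserializes a
    # cloudpickled function from the payload and runs it (local dev only).
    return {"executed": True}

def create_lb_handler(route_registry: dict, include_execute: bool = False):
    """Dispatcher over a (method, path) -> callable registry."""
    routes = dict(route_registry)
    if include_execute:  # security boundary: never wired on deployed endpoints
        routes[("POST", "/execute")] = _execute_stub
    routes[("GET", "/ping")] = lambda _p: {"status": "healthy"}

    def dispatch(method: str, path: str, payload: dict) -> dict:
        fn = routes.get((method.upper(), path))
        if fn is None:
            return {"status": 404}
        result = fn(payload)
        if inspect.iscoroutine(result):
            result = asyncio.run(result)  # run async handlers to completion
        return {"status": 200, "result": result}

    return dispatch
```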

* feat(cli): Route build command to separate handlers for LB endpoints

Update build command to use appropriate handler generators based on
resource type. Separates LoadBalancerSlsResource endpoints (using FastAPI)
from queue-based endpoints (using generic handler).

Changes:
- Import LBHandlerGenerator alongside HandlerGenerator
- Inspect manifest resources and separate by type
- Generate LB handlers via LBHandlerGenerator
- Generate QB handlers via HandlerGenerator
- Combine all generated handler paths for summary

Enables users to mix LB and QB endpoints in same project with correct
code generation for each resource type.

* feat(resources): Add LiveLoadBalancer for local LB endpoint testing

Implement LiveLoadBalancer resource following the LiveServerless pattern
for local development and testing of load-balanced endpoints.

Changes:
- Add TETRA_LB_IMAGE constant for load-balanced Tetra image
- Create LiveLoadBalancer class extending LoadBalancerSlsResource
- Uses LiveServerlessMixin to lock imageName to Tetra LB image
- Register LiveLoadBalancer with LoadBalancerSlsStub in singledispatch
- Export LiveLoadBalancer from core.resources and top-level __init__

This enables users to test LB-based functions locally before deploying,
using the same pattern as LiveServerless for queue-based endpoints.

Users can now write:
  from tetra_rp import LiveLoadBalancer, remote

  api = LiveLoadBalancer(name="test-api")

  @remote(api, method="POST", path="/api/process")
  async def process_data(x, y):
      return {"result": x + y}

  result = await process_data(5, 3)  # Local execution

* test(stubs): Add comprehensive unit tests for LoadBalancerSlsStub

Implement unit tests for LoadBalancerSlsStub covering:
- Request preparation with arguments and dependencies
- Response handling for success and error cases
- Error handling for invalid responses
- Base64 encoding/decoding of serialized data
- Endpoint URL validation
- Timeout and HTTP error handling

Test coverage:
- _prepare_request: 4 tests
- _handle_response: 5 tests
- _execute_function: 3 error case tests
- __call__: 2 integration tests

Tests verify proper function serialization, argument handling,
error propagation, and response deserialization.

* fix(test): Correct LB endpoint test decorator to match assertions

Fix test_load_balancer_vs_queue_based_endpoints by updating the @Remote
decorator to use method='POST' and path='/api/echo' to match the test
assertions. This was a test-level bug where the decorator definition
didn't match what was being asserted.

* docs: Add comprehensive documentation for @Remote with LoadBalancer endpoints

- Using_Remote_With_LoadBalancer.md: User guide for HTTP routing, local development, building and deploying
- LoadBalancer_Runtime_Architecture.md: Technical details on deployment, request flows, security, and performance
- Updated README.md with LoadBalancer section and code example
- Updated Load_Balancer_Endpoints.md with cross-references to new guides

* security: Remove /execute from deployed LoadBalancer endpoints

Split @Remote execution behavior between local and deployed:
- LiveLoadBalancer (local): Uses /execute endpoint for function serialization
- LoadBalancerSlsResource (deployed): Uses user-defined routes with HTTP param mapping

Changes:
1. LoadBalancerSlsStub routing detection:
   - _should_use_execute_endpoint() determines execution path
   - _execute_via_user_route() maps args to JSON and POSTs to user routes
   - Auto-detects resource type and routing metadata

2. Conditional /execute registration:
   - create_lb_handler() now accepts include_execute parameter
   - Generated handlers default to include_execute=False (security)
   - LiveLoadBalancer can enable /execute if needed

3. Updated handler generator:
   - Added clarifying comments on /execute exclusion for deployed endpoints

4. Comprehensive test coverage:
   - 8 new tests for routing detection and execution paths
   - All 31 tests passing (22 unit + 9 integration)

5. Documentation updates:
   - Using_Remote_With_LoadBalancer.md: clarified /execute scope
   - Added 'Local vs Deployed Execution' section explaining differences
   - LoadBalancer_Runtime_Architecture.md: updated execution model
   - Added troubleshooting for deployed endpoint scenarios

Security improvement:
- Deployed endpoints only expose user-defined routes
- /execute endpoint removed from production (prevents arbitrary code execution)
- Lower attack surface for deployed endpoints
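The routing split above can be sketched as a single decision function. The resource class names come from this PR; the function signature and exact checks are assumptions for illustration, not the actual stub implementation:

```python
# Live (local dev) resource types may serialize functions to /execute;
# deployed resources must carry user-defined routing metadata.
LIVE_TYPES = {"LiveLoadBalancer", "CpuLiveLoadBalancer"}


def should_use_execute_endpoint(resource, http_method, http_path) -> bool:
    """True  -> serialize the function and POST it to /execute (local dev only).
    False -> map arguments to JSON and POST to the user-defined route."""
    if type(resource).__name__ in LIVE_TYPES:
        return True  # /execute exists only in locally generated handlers
    if http_method and http_path:
        return False  # deployed: dispatch to the user-defined HTTP route
    # Deployed endpoints without complete routing metadata cannot be called.
    raise ValueError("Deployed LB endpoints require method= and path= metadata")


class LiveLoadBalancer:  # stand-ins for the real resource classes
    pass


class LoadBalancerSlsResource:
    pass
```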

* feat(build): Phase 4 - Fix LiveLoadBalancer handler generation to include /execute endpoint

- Modified manifest.py to validate LiveLoadBalancer endpoints like LoadBalancerSlsResource
- Updated lb_handler_generator to:
  - Include LiveLoadBalancer in handler generation filter
  - Pass include_execute=True for LiveLoadBalancer (local dev)
  - Pass include_execute=False for LoadBalancerSlsResource (deployed)
- Added integration tests:
  - Verify LiveLoadBalancer handlers include /execute endpoint
  - Verify deployed handlers exclude /execute endpoint
- Fixes critical bug: LiveLoadBalancer now gets /execute endpoint in generated handlers

* fix(scanner): Discover LoadBalancer resources in addition to Serverless resources

- Updated scanner to extract LiveLoadBalancer and LoadBalancerSlsResource resources
- Previously only looked for 'Serverless' in class name, missing LoadBalancer endpoints
- Now checks for both 'Serverless' and 'LoadBalancer' in resource type names
- Added integration test to verify scanner discovers both resource types
- Fixes critical bug that prevented flash build from finding LoadBalancer endpoints

* chore: Format code for line length and remove unused imports

- Wrap long lines in manifest.py, lb_handler.py, and load_balancer_sls.py
- Remove unused httpx import in test_load_balancer_sls_stub.py
- Apply consistent formatting across codebase

* fix: Address PR #131 review feedback

- Scanner: Use exact type name matching instead of substring matching
  - Whitelist specific resource types to avoid false positives
  - Prevents matching classes like 'MyServerlessHelper' or 'LoadBalancerUtils'

- Type hints: Use Optional[str] for nullable fields in manifest
  - ManifestFunction.http_method and http_path now properly typed

- Timeout: Make HTTP client timeout configurable
  - Added LoadBalancerSlsStub.DEFAULT_TIMEOUT class attribute
  - Added timeout parameter to __init__
  - Updated both _execute_function and _execute_via_user_route to use self.timeout

- Deprecated datetime: Replace datetime.utcnow() with datetime.now(timezone.utc)
  - Updated manifest.py and test_lb_remote_execution.py
  - Ensures Python 3.12+ compatibility
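The exact-name matching from the first item can be sketched as a set membership check; the whitelist below is a plausible subset (the full list in the scanner may include more resource types):

```python
# Exact type-name whitelist: substring matching on 'Serverless'/'LoadBalancer'
# would wrongly pick up user classes like MyServerlessHelper.
DISCOVERABLE_TYPES = {
    "LiveServerless",
    "ServerlessEndpoint",
    "LiveLoadBalancer",
    "LoadBalancerSlsResource",
}


def is_discoverable(type_name: str) -> bool:
    """Return True only for resource types the scanner should extract."""
    return type_name in DISCOVERABLE_TYPES
```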

* style: Format datetime chaining for line length

* fix: LiveLoadBalancer template not serialized to RunPod GraphQL

The set_serverless_template model_validator was being overwritten by sync_input_fields
(both had mode="after"). In Pydantic v2, when two validators with the same mode are
defined in a class, only one is registered.

This caused templates to never be created from imageName, resulting in:
  "GraphQL errors: One of templateId, template is required to create an endpoint"

Solution:
- Move set_serverless_template validator from ServerlessResource base class to subclasses
  (ServerlessEndpoint and LoadBalancerSlsResource) where the validation is actually needed
- Keep helper methods (_create_new_template, _configure_existing_template) in base class
  for reuse
- Add comprehensive tests for LiveLoadBalancer template serialization

This allows:
1. Base ServerlessResource to be instantiated freely for testing/configuration
2. Subclasses (ServerlessEndpoint, LoadBalancerSlsResource) to enforce template
   requirements during deployment
3. Proper template serialization in GraphQL payload for RunPod API

Fixes: One of templateId, template is required to create an endpoint error when
deploying LiveLoadBalancer with custom image tags like runpod/tetra-rp-lb:local
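The move-validator-to-subclass fix can be illustrated framework-agnostically. The real code uses Pydantic model_validators; this plain-Python sketch only shows the inheritance split (helpers in the base, enforcement in the deployable subclass) with illustrative payloads:

```python
class ServerlessResource:
    """Base class: no template enforcement, so it can be instantiated
    freely for testing and configuration."""

    def __init__(self, image_name=None, template=None):
        self.imageName = image_name
        self.template = template

    def _create_new_template(self):
        # Helper kept in the base class for reuse (placeholder payload).
        return {"imageName": self.imageName}


class LoadBalancerSlsResource(ServerlessResource):
    """Deployable subclass: enforces the template requirement."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.set_serverless_template()

    def set_serverless_template(self):
        if self.template is None:
            if not self.imageName:
                raise ValueError("One of templateId, template is required")
            self.template = self._create_new_template()
```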

* fix: LoadBalancer endpoint URL and add CPU support

- Fix: Use correct endpoint URL format for load-balanced endpoints
  (https://{id}.api.runpod.ai instead of https://api.runpod.ai/v2/{id})
  This fixes 404 errors on /ping health check endpoints

- Feature: Add CPU LoadBalancer support
  * Create CpuLoadBalancerSlsResource for CPU-based load-balanced endpoints
  * Create CpuLiveLoadBalancer for local CPU LB development
  * Add TETRA_CPU_LB_IMAGE constant for CPU LB Docker image
  * Update example code to use CpuLiveLoadBalancer for CPU worker
  * Add 8 comprehensive tests for CPU LoadBalancer functionality

- Tests: Add 2 tests for endpoint URL format validation
- All 474 tests passing, 64% code coverage
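The two URL shapes in the fix above (formats as given in the commit message; the helper names are illustrative):

```python
def queue_based_url(endpoint_id: str) -> str:
    # Queue-based serverless endpoints go through the /v2 API gateway.
    return f"https://api.runpod.ai/v2/{endpoint_id}"


def load_balanced_url(endpoint_id: str) -> str:
    # Load-balanced endpoints are addressed by subdomain; user routes and
    # the /ping health check are served directly under this host.
    return f"https://{endpoint_id}.api.runpod.ai"
```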

* fix: Export CpuLiveLoadBalancer and CpuLoadBalancerSlsResource from tetra_rp package

LoadBalancer resources were not being discovered by ResourceDiscovery because
the new CPU variants (CpuLiveLoadBalancer, CpuLoadBalancerSlsResource) were
not exported from the main tetra_rp package. This prevented undeploy from
picking up these resources.

Added exports to:
- TYPE_CHECKING imports for type hints
- __getattr__ function for lazy loading
- __all__ list for public API

This fixes the issue where 'flash undeploy list' could not find LoadBalancer
resources that were deployed with 'flash run --auto-provision'.

* fix: Add API key authentication to LoadBalancer health check

The /ping endpoint for RunPod load-balanced endpoints requires the
RUNPOD_API_KEY header for authentication. Without it, the health check
fails with 401 Unauthorized, causing provisioning to timeout.

This fix adds the Authorization header to the health check request if
the RUNPOD_API_KEY environment variable is available, allowing the
endpoint health check to succeed during provisioning.

Fixes issue where 'flash run --auto-provision' would fail even though
the endpoint was successfully created on RunPod.

* fix(lb): Exclude flashboot from CpuLoadBalancerSlsResource GraphQL payload

CpuLoadBalancerSlsResource was overriding _input_only without including flashboot,
causing it to be sent to the RunPod GraphQL API which doesn't accept this field.
This caused deployment to fail with: Field "flashboot" is not defined by type "EndpointInput".

* fix(lb): Expand CpuInstanceType.ANY to all CPU flavors in CpuLoadBalancerSlsResource

Add field_validator to expand [CpuInstanceType.ANY] to all available CPU instance
types (cpu3g, cpu3c, cpu5c variants). This matches the behavior in CpuServerlessEndpoint
and prevents deployment errors like 'instanceId must be in the format of flavorId-vcpu-ram'.
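The expansion can be sketched as below. The real code is a Pydantic field_validator over a CpuInstanceType enum; here a plain string sentinel stands in, and the flavor list is an assumed example (the commit names cpu3g, cpu3c, and cpu5c families, in the flavorId-vcpu-ram format the API expects):

```python
# Assumed example flavors; the real list covers all available CPU instance
# types across the cpu3g / cpu3c / cpu5c families.
ALL_CPU_INSTANCE_IDS = ["cpu3g-2-8", "cpu3c-2-4", "cpu5c-2-4"]


def expand_instance_ids(instance_ids):
    """Replace the ANY sentinel with the full list of concrete flavors,
    so the API never sees an instanceId that isn't flavorId-vcpu-ram."""
    if instance_ids == ["ANY"]:
        return list(ALL_CPU_INSTANCE_IDS)
    return instance_ids
```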

* refactor(cpu): Move instanceIds validator to CpuEndpointMixin

Move the instanceIds field_validator from CpuServerlessEndpoint to CpuEndpointMixin
so both CpuServerlessEndpoint and CpuLoadBalancerSlsResource share the same validator
that expands [CpuInstanceType.ANY] to all available CPU flavors. This eliminates
code duplication and ensures consistent behavior across all CPU endpoint types.

* test: Update CPU instance test to reflect validator expansion

Update test_cpu_live_load_balancer_defaults_to_cpu_any to verify that
[CpuInstanceType.ANY] is correctly expanded to all available CPU instance types
by the field_validator in CpuEndpointMixin.

* fix(lb): Increase health check timeout from 5s to 15s

Load-balanced workers need more time to respond during cold starts and initialization.
RunPod docs recommend at least 10-15 second timeouts for health checks. Workers may
return 204 during initialization, which is normal and expected.

* fix(lb): Fix CPU load balancer template deployment error

Fixes two bugs in CpuLoadBalancerSlsResource that prevented CPU load balancers from deploying:

1. Added gpuCount and allowedCudaVersions to _input_only exclusion set to prevent
   GPU-specific fields from being sent to RunPod API
2. Overrode set_serverless_template() to call _sync_cpu_fields() first, ensuring
   GPU defaults are overridden to CPU-appropriate values (gpuCount=0)

The RunPod API was rejecting CPU load balancer templates because GPU-specific
fields were being included in the GraphQL payload. These changes align
CpuLoadBalancerSlsResource behavior with CpuServerlessEndpoint.

Also added comprehensive test coverage (30+ tests) to verify:
- GPU fields are correctly overridden to CPU defaults
- GPU fields are excluded from API payloads
- CPU-specific fields are properly included
- Consistency with CpuServerlessEndpoint behavior

* fix(drift): Exclude runtime fields from config hash to prevent false positives

Fixes false positive configuration drift detection by separating concerns:

1. Update ServerlessResource.config_hash to exclude runtime fields
   - Fields like template, templateId, aiKey, userId are API-assigned
   - Prevents false drift when same config is redeployed across processes
   - Now only hashes user-specified configuration

2. Add config_hash override to CpuLoadBalancerSlsResource
   - CPU load balancers hash only CPU-relevant fields
   - Excludes GPU-specific fields and runtime fields
   - Follows same pattern as CpuServerlessEndpoint

3. Fix _has_structural_changes to exclude template/templateId
   - CRITICAL: These runtime fields were causing false structural changes
   - Was forcing unnecessary redeployments despite update() being available
   - Now system correctly uses update() instead of undeploy+deploy

4. Make field serializers robust to handle string/enum values
   - Prevents serialization errors when fields are pre-converted to strings

5. Add comprehensive drift detection tests (16 tests)
   - Test hash stability with runtime field changes
   - Test exclusion of env, template, templateId, and other runtime fields
   - Test that actual config changes (image, flashboot) are detected
   - Test structural change detection behavior
   - Test real-world deployment scenarios

Results:
- Same config deployed multiple times: no false drift
- Different env vars with same config: no false drift
- Template/templateId changes: no false drift
- API-assigned fields: no false drift
- User config changes (image, flashboot): drift detected correctly
- All 512 unit tests pass
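The runtime-field exclusion can be sketched as hashing only the user-specified portion of the config. The excluded field names come from the commit above; the hashing scheme itself is illustrative (env stays in the hash, matching the later change that makes env changes trigger drift):

```python
import hashlib
import json

# API-assigned runtime fields that must never cause drift.
RUNTIME_FIELDS = {"template", "templateId", "aiKey", "userId"}


def config_hash(config: dict) -> str:
    """Stable hash over user-specified configuration only."""
    user_config = {k: v for k, v in config.items() if k not in RUNTIME_FIELDS}
    payload = json.dumps(user_config, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()
```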

* fix(http): Standardize RunPod HTTP client authentication across codebase

Centralizes HTTP client creation for RunPod load-balanced endpoints to prevent
manual Authorization header code duplication and ensure consistent authentication:

1. Create centralized HTTP utility function (src/tetra_rp/core/utils/http.py)
   - New function: get_authenticated_httpx_client()
   - Automatically adds Bearer token Authorization header if RUNPOD_API_KEY set
   - Provides consistent timeout handling (default 30s, customizable)
   - Follows existing GraphQL/REST client authentication pattern

2. Fix critical authentication bug in LoadBalancerSlsStub._execute_via_user_route()
   - Previously: Missing Authorization header (401 errors on user routes)
   - Now: Uses centralized utility for proper authentication
   - Enables direct HTTP calls to user-defined routes with auth

3. Refactor two methods to use centralized utility
   - LoadBalancerSlsStub._execute_function() - removes 7+ lines of manual auth code
   - LoadBalancerSlsResource._check_ping_endpoint() - simplifies auth setup

4. Add comprehensive unit tests (tests/unit/core/utils/test_http.py)
   - Tests API key presence/absence handling
   - Tests custom and default timeout configuration
   - Tests edge cases (empty key, zero timeout)
   - All 7 tests pass with 100% coverage

Results:
- Single source of truth for HTTP authentication (centralized utility)
- Fixes 401 Unauthorized errors on load-balanced endpoints
- Eliminates repetitive manual auth code across 3+ locations
- Easier to maintain and update authentication patterns in future
- All 499 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)
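The centralized utility can be sketched as below. The function and env-var names follow the commit; the header-building logic is kept stdlib-only here, with httpx imported lazily, so only the wrapper at the bottom is httpx-specific:

```python
import os


def build_auth_headers() -> dict:
    """Bearer token header from RUNPOD_API_KEY, or empty dict if unset."""
    headers = {}
    api_key = os.environ.get("RUNPOD_API_KEY")
    if api_key:  # only attach the header when a key is configured
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


def get_authenticated_httpx_client(timeout: float = 30.0):
    import httpx  # deferred import; the header logic above is stdlib-only

    # Single source of truth: every LB HTTP call goes through this client.
    return httpx.AsyncClient(headers=build_auth_headers(), timeout=timeout)
```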

* feat(http): Extend HTTP utilities to cover both sync and async authentication

Extends the centralized HTTP authentication pattern to all RunPod API calls:

1. Add get_authenticated_requests_session() for synchronous requests
   - Creates requests.Session with automatic Bearer token Authorization header
   - Follows same pattern as async get_authenticated_httpx_client()
   - Single source of truth for sync HTTP authentication

2. Refactor template.py to use centralized utility
   - Removes manual Authorization header setup (line 86)
   - Now uses get_authenticated_requests_session() for all template updates
   - Improves error handling with raise_for_status()
   - Token parameter marked deprecated; uses RUNPOD_API_KEY env var

3. Add comprehensive tests for sync utility (4 tests)
   - Tests API key presence/absence handling
   - Tests empty API key edge case
   - Tests Session object validation
   - All tests pass with proper cleanup

Benefits:
- True single source of truth for all RunPod HTTP authentication (sync + async)
- Consistent patterns across entire codebase
- Easier future auth changes across all HTTP client types
- Eliminates manual auth header code in template.py
- All 503 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

Note: requests.Session doesn't support default timeouts; timeout should be
specified per request (e.g., session.post(url, json=data, timeout=30.0))

* fix: Address PR feedback on HTTP utilities implementation

Addresses three feedback items from code review:

1. Fix breaking parameter order change in update_system_dependencies()
   - Restored original parameter order: template_id, token, system_dependencies, base_entry_cmd
   - Maintains backward compatibility with existing callers
   - Token parameter now optional (default None)

2. Add proper deprecation warning for token parameter
   - Issues DeprecationWarning when token parameter is used
   - Clearly communicates migration to RUNPOD_API_KEY environment variable
   - Follows Python deprecation best practices (warnings.warn with stacklevel=2)

3. Standardize test mocking approach across all health check tests
   - All tests now use consistent 'tetra_rp.core.utils.http.httpx.AsyncClient' patching
   - Removed inconsistent 'side_effect=lambda' pattern
   - Improved test maintainability by using same strategy everywhere

All 503 tests pass with consistent, clean implementation.
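The deprecation pattern from items 1 and 2 can be sketched as follows (parameter order and the warnings.warn/stacklevel pattern are from the commit; the function body is a stub):

```python
import warnings


def update_system_dependencies(template_id, token=None,
                               system_dependencies=None, base_entry_cmd=None):
    """Restored parameter order; token is optional and deprecated."""
    if token is not None:
        warnings.warn(
            "The 'token' parameter is deprecated; set the RUNPOD_API_KEY "
            "environment variable instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this frame
        )
    # ... perform the template update via the authenticated session ...
```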

* refactor(drift): Extract runtime field constants and improve maintainability

- Extract RUNTIME_FIELDS and EXCLUDED_HASH_FIELDS as ClassVar constants in ServerlessResource for centralized field list management
- Add clarifying comments to enum serializers explaining defensive isinstance() checks for nested model serialization
- Document CPU load balancer field list coupling in docstring with maintenance guidelines
- Add TestSerializerDefensiveBehavior class with 4 tests verifying pre-stringified enum value handling
- Use ClassVar annotation to satisfy Pydantic v2 model field requirements

This reduces maintenance burden by centralizing field definitions and improves code clarity without changing functionality.

* docs: Improve LoadBalancer documentation accuracy and completeness

- Fix health check timeout: Add clarification that timeout is 15 seconds per check
- Add HTTP authentication details explaining RUNPOD_API_KEY header injection
- Document stub decision logic for incomplete routing metadata (fallback behavior)
- Clarify function signature inspection with concrete example showing parameter mapping
- Expand /execute security explanation with explicit threats and best practices
- Add detailed parameter type constraints for deployed endpoints (supported vs unsupported)
- Add troubleshooting guide for missing routing metadata (404 errors)
- Strengthen security warnings about never exposing /execute in production

All documentation now matches actual implementation verified through codebase analysis.

* docs: add resource config drift detection documentation

- comprehensive guide on drift detection implementation
- covers hash computation, field exclusion, and cpu-specific behavior
- includes testing patterns and troubleshooting guide
- documents all fields that trigger drift vs those ignored

* docs: proper name for the file

* test(build): Add comprehensive test coverage for scanner and handler improvements

- Add 6 new scanner tests for directory filtering (.venv, .flash, .runpod exclusion)
- Add test for resource type validation to prevent false positives
- Add test for fallback behavior when resource name extraction fails
- Add test for handling resource names with special characters
- Update existing tests to reflect new dynamic import format and resource name extraction

These tests guarantee that improvements to the scanner (resource type validation,
directory filtering, fallback behavior) and handler generator (dynamic imports for
invalid Python identifiers) won't regress in future changes.

* test(scanner): Fix resource type assertions to match scanner behavior

The scanner now extracts resource names from the name= parameter rather
than using variable names. Update test assertions to expect the actual
resource names ('test-api', 'deployed-api') instead of variable names.

* chore: merge correction

* fix(drift): Remove manual undeploy/deploy from update() method

Use saveEndpoint mutation for all changes instead of manual lifecycle
management. Server-side automatically detects version-triggering fields
(GPU, template, volumes) and increments endpoint version accordingly.

Keep _has_structural_changes() as informational for logging purposes only.
This aligns with RunPod API's version-based deployment model.

* docs(drift): Clarify _has_structural_changes detects version-triggering changes

Update docstring to reflect that this method identifies changes that
trigger server-side version increment and worker recreation, not manual
redeploy cycles. Explain which changes are version-triggering vs rolling
updates, and note that the method is now informational for logging only.

* feat(drift): Enable environment variable drift detection

Remove env from EXCLUDED_HASH_FIELDS so changes to environment variables
trigger drift detection and endpoint updates. Environment changes are
non-version-triggering (rolling updates), so server will apply them via
saveEndpoint without recreating workers.

Add env to CPU LoadBalancer config_hash for consistent behavior across
all resource types. Update comments to reflect that env is user-specified
configuration, not dynamically computed.

* test(drift): Update tests for environment variable drift detection

- test_lb_config_hash_excludes_env_variables → test_lb_config_hash_detects_env_changes
- test_env_var_changes_no_drift → test_env_var_changes_trigger_drift
- test_config_hash_excludes_env_from_drift → test_config_hash_detects_env_from_drift

Update assertions to expect different hashes when env changes, matching
new behavior where environment variable changes trigger drift and updates.

* fix: Address Copilot review feedback on type hints and documentation

- Fix type annotation for timeout parameter in LoadBalancerSlsStub (Optional[float])
- Replace hardcoded "30s" with actual self.timeout in error messages (2 locations)
- Update Resource_Config_Drift_Detection.md to reflect actual EXCLUDED_HASH_FIELDS
- Remove duplicate Load-Balanced Endpoints section from README.md

Addresses Copilot review comments (PR #132, review 3642596664)

* chore: Update Python version compatibility to 3.10-3.14

- Drop Python 3.9 support (EOL)
- Ensure support for Python 3.14
- Update requires-python in pyproject.toml from >=3.9,<3.14 to >=3.10,<3.15
- Update mypy python_version from 3.9 to 3.10
- Update CI matrix to test Python 3.10, 3.11, 3.12, 3.13, 3.14

* chore: Increase code coverage requirement to 65%

* refactor: remove dead code and add serialization tests

Remove unused functions and improve test coverage:
- Remove deprecated update_system_dependencies from template.py
- Remove unused utility functions from utils.py and json.py
- Add comprehensive test suite for serialization module (100% coverage)

Tests cover serialization/deserialization of args, kwargs, and error handling
for cloudpickle failures across Python 3.10-3.14.

deanq added a commit that referenced this pull request Jan 22, 2026

* feat(runtime): Add generic handler factory for serverless execution

Implement a factory function that creates RunPod serverless handlers,
eliminating code duplication across generated handler files.

The generic_handler module provides:
- create_handler(function_registry) factory that accepts a dict of
  function/class objects and returns a RunPod-compatible handler
- Automatic serialization/deserialization using cloudpickle + base64
- Support for both function execution and class instantiation + method calls
- Structured error responses with full tracebacks for debugging
- Load manifest for cross-endpoint function discovery

This design centralizes all handler logic in one place, making it easy to:
- Fix bugs once, benefit all handlers
- Add new features without regenerating projects
- Keep deployment packages small (handler files are ~23 lines each)

Implementation:
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Handles function vs. class execution
- load_manifest(): Loads flash_manifest.json for service discovery
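The factory and its serialization helpers can be sketched as below. The real module uses cloudpickle (which also handles lambdas and closures); stdlib pickle is a drop-in for this illustration, and the event/response shapes are assumptions based on the description above:

```python
import base64
import pickle
import traceback


def serialize_result(result) -> str:
    # Pickle + base64 so arbitrary objects survive JSON transport.
    return base64.b64encode(pickle.dumps(result)).decode("utf-8")


def deserialize_arguments(payload: dict):
    args = pickle.loads(base64.b64decode(payload["args"])) if payload.get("args") else ()
    kwargs = pickle.loads(base64.b64decode(payload["kwargs"])) if payload.get("kwargs") else {}
    return args, kwargs


def create_handler(function_registry: dict):
    """Factory returning a RunPod-style handler closed over the registry."""

    def handler(event: dict) -> dict:
        try:
            fn = function_registry[event["function_name"]]
            args, kwargs = deserialize_arguments(event.get("input", {}))
            return {"success": True, "result": serialize_result(fn(*args, **kwargs))}
        except Exception as exc:
            # Structured error response with full traceback, not a raised exception.
            return {"success": False, "error": str(exc),
                    "traceback": traceback.format_exc()}

    return handler
```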

* feat(cli): Add handler generator, manifest builder, and scanner for build process

Implement the build pipeline components that work together to generate
serverless handlers from @Remote decorated functions.

Three core components:

1. RemoteDecoratorScanner (scanner.py)
   - Uses Python AST to discover all @Remote decorated functions
   - Extracts function metadata: name, module, async status, is_class
   - Groups functions by resource_config for handler generation
   - Handles edge cases like decorated classes and async functions

2. ManifestBuilder (manifest.py)
   - Groups functions by their resource_config
   - Creates flash_manifest.json structure for service discovery
   - Maps functions to their modules and handler files
   - Enables cross-endpoint function routing at runtime

3. HandlerGenerator (handler_generator.py)
   - Creates lightweight handler_*.py files for each resource config
   - Each handler imports functions and registers them in FUNCTION_REGISTRY
   - Handler delegates to create_handler() factory from generic_handler
   - Generated handlers are ~23 lines (vs ~98 with duplication)

Build Pipeline Flow:
1. Scanner discovers @Remote functions
2. ManifestBuilder groups them by resource_config
3. HandlerGenerator creates handler_*.py for each group
4. All files + manifest bundled into archive.tar.gz

This eliminates ~95% duplication across handlers by using the factory pattern
instead of template-based generation.
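What HandlerGenerator emits for one resource config can be sketched as a thin rendered file: register functions, delegate to the factory. The import path, helper name, and module names below are illustrative assumptions, not the generator's actual template:

```python
HANDLER_TEMPLATE = '''\
"""Auto-generated handler for resource config: {resource_name}."""
import runpod
from tetra_rp.core.runtime.generic_handler import create_handler  # assumed path

{imports}

FUNCTION_REGISTRY = {{
{registry_entries}
}}

handler = create_handler(FUNCTION_REGISTRY)
runpod.serverless.start({{"handler": handler}})
'''


def render_handler(resource_name: str, functions: dict) -> str:
    """functions maps function name -> module path."""
    imports = "\n".join(f"from {mod} import {name}" for name, mod in functions.items())
    entries = "\n".join(f'    "{name}": {name},' for name in functions)
    return HANDLER_TEMPLATE.format(
        resource_name=resource_name, imports=imports, registry_entries=entries
    )
```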

* test(runtime): Add comprehensive tests for generic handler

Implement 19 unit tests covering all major paths through the generic_handler
factory and its helper functions.

Test Coverage:

Serialization/Deserialization (7 tests):
- serialize_result() with simple values, dicts, lists
- deserialize_arguments() with empty, args-only, kwargs-only, mixed inputs
- Round-trip encoding/decoding of cloudpickle + base64

Function Execution (4 tests):
- Simple function execution with positional and keyword arguments
- Keyword argument handling
- Class instantiation and method calls
- Argument passing to instance methods

Handler Factory (8 tests):
- create_handler() returns callable RunPod handler
- Handler with simple function registry
- Missing function error handling (returns error response, not exception)
- Function exceptions caught with traceback included
- Multiple functions in single registry
- Complex Python objects (classes, lambdas, closures)
- Empty registry edge case
- Default execution_type parameter
- None return values
- Correct RunPod response format (success, result/error, traceback)

Test Strategy:
- Arrange-Act-Assert pattern for clarity
- Isolated unit tests (no external dependencies)
- Tests verify behavior, not implementation
- Error cases tested for proper error handling
- All serialization tested for round-trip correctness

All tests passing, 83% coverage on generic_handler.py

* test(cli): Add tests for handler generation, manifest building, and scanning

Implement integration tests validating the build pipeline components work
correctly together.

Test Coverage:

HandlerGenerator Tests:
- Handler files created with correct names (handler_<resource_name>.py)
- Generated files import required functions from workers
- FUNCTION_REGISTRY properly formatted
- create_handler() imported from generic_handler
- Handler creation via factory
- RunPod start call present and correct
- Multiple handlers generated for multiple resource configs

ManifestBuilder Tests:
- Manifest structure with correct version and metadata
- Resources grouped by resource_config
- Handler file paths correct
- Function metadata preserved (name, module, is_async, is_class)
- Function registry mapping complete

Scanner Tests:
- @Remote decorated functions discovered via AST
- Function metadata extracted correctly
- Module paths resolved properly
- Async functions detected
- Class methods detected
- Edge cases handled (multiple decorators, nested classes)

Test Strategy:
- Integration tests verify components work together
- Tests verify generated files are syntactically correct
- Tests validate data structures match expected schemas
- No external dependencies in build process

Validates that the entire build pipeline:
1. Discovers functions correctly
2. Groups them appropriately
3. Generates valid Python handler files
4. Creates correct manifest structure

* docs(runtime): Document generic handler factory architecture

Add comprehensive architecture documentation explaining why the factory
pattern was chosen and how it works.

Documentation includes:

Overview & Context:
- Problem statement: Handler files had 95% duplication
- Design decision: Use factory function instead of templates
- Benefits: Single source of truth, easier maintenance, consistency

Architecture Diagrams (MermaidJS):
- High-level flow: @Remote functions → Scanner → Manifest → Handlers → Factory
- Component relationships: HandlerGenerator, GeneratedHandler, generic_handler
- Function registry pattern: Discovery → Grouping → Registration → Factory

Implementation Details:
- create_handler(function_registry) signature and behavior
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Function vs. class execution
- load_manifest(): Service discovery via flash_manifest.json

Design Decisions (with rationale):
- Factory Pattern over Inheritance: Simpler, less coupling, easier to test
- CloudPickle + Base64: Handles arbitrary objects, safe JSON transmission
- Manifest in Generic Handler: Runtime service discovery requirement
- Structured Error Responses: Debugging aid, functional error handling
- Both Execution Types: Supports stateful classes and pure functions

Usage Examples:
- Simple function handler
- Class execution with methods
- Multiple functions in one handler

Build Process Integration:
- 4-phase pipeline: Scanner → Grouping → Generation → Packaging
- Manifest structure and contents
- Generated handler structure (~23 lines)

Testing Strategy:
- 19 unit tests covering all major paths
- 7 integration tests verifying handler generation
- Manual testing with example applications

Performance:
- Zero runtime penalty (factory called once at startup)
- No additional indirection in request path

* docs(cli): Add flash build command documentation

Document the flash build command and update CLI README to include it.

New Documentation:

flash-build.md includes:

Usage & Options:
- Command syntax: flash build [OPTIONS]
- --no-deps: Skip transitive dependencies (faster, smaller archives)
- --keep-build: Keep build directory for inspection/debugging
- --output, -o: Custom archive name (default: archive.tar.gz)

What It Does (5-step process):
1. Discovery: Scan for @Remote decorated functions
2. Grouping: Group functions by resource_config
3. Handler Generation: Create lightweight handler files
4. Manifest Creation: Generate flash_manifest.json
5. Packaging: Create archive.tar.gz for deployment

Build Artifacts:
- .flash/archive.tar.gz: Deployment package (ready for RunPod)
- .flash/flash_manifest.json: Service discovery configuration
- .flash/.build/: Temporary build directory

Handler Generation:
- Explains factory pattern and minimal handler files
- Links to Runtime_Generic_Handler.md for details

Dependency Management:
- Default behavior: Install all dependencies including transitive
- --no-deps: Only direct dependencies (when base image has transitive)
- Trade-offs explained

Cross-Endpoint Function Calls:
- Example showing GPU and CPU endpoints
- Manifest enables routing automatically

Output & Troubleshooting:
- Sample build output with progress indicators
- Common failure scenarios and solutions
- How to debug with --keep-build

Next Steps:
- Test locally with flash run
- Deploy to RunPod
- Monitor with flash undeploy list

Updated CLI README.md:
- Added flash build to command list in sequence
- Links to full flash-build.md documentation

* docs: Add build process and handler generation section to README

Add a new section explaining how the build system works and why the
factory pattern reduces code duplication.

New Section: Build Process and Handler Generation

Explains:

How Flash Builds Your Application (5-step pipeline):
1. Discovery: Scans code for @Remote decorated functions
2. Grouping: Groups functions by resource_config
3. Handler Generation: Creates lightweight handler files
4. Manifest Creation: Generates flash_manifest.json for service discovery
5. Packaging: Bundles everything into archive.tar.gz

Handler Architecture (with code example):
- Shows generated handler using factory pattern
- Single source of truth: All handler logic in one place
- Easier maintenance: Bug fixes don't require rebuilding projects

Cross-Endpoint Function Calls:
- Example of GPU and CPU endpoints calling each other
- Manifest and runtime wrapper handle service discovery

Build Artifacts:
- .flash/.build/: Temporary build directory
- .flash/archive.tar.gz: Deployment package
- .flash/flash_manifest.json: Service configuration

Links to detailed documentation:
- docs/Runtime_Generic_Handler.md for architecture details
- src/tetra_rp/cli/docs/flash-build.md for CLI reference

This section bridges the main README and detailed documentation,
providing an entry point for new users discovering the build system.

* feat(cli): Integrate build utilities into flash build command

Wire up the handler generator, manifest builder, and scanner into the
actual flash build command implementation.

Changes to build.py:

1. Integration:
   - Import RemoteDecoratorScanner for function discovery
   - Import ManifestBuilder for manifest creation
   - Import HandlerGenerator for handler file creation
   - Call these in sequence during the build process

2. Build Pipeline:
   - After copying project files, scan for @Remote functions
   - Build manifest from discovered functions
   - Generate handler files for each resource config
   - Write manifest to build directory
   - Progress indicators show what's being generated

3. Fixes:
   - Change .tetra directory references to .flash
   - Uncomment actual build logic (was showing "Coming Soon" message)
   - Fix progress messages to show actual file counts

4. Error Handling:
   - Try/except around handler generation
   - Warning shown if generation fails but build continues
   - User can debug with --keep-build flag

Build Flow Now:
1. Load ignore patterns
2. Collect project files
3. Create build directory
4. Copy files to build directory
5. [NEW] Scan for @Remote functions
6. [NEW] Build and write manifest
7. [NEW] Generate handler files
8. Install dependencies
9. Create archive
10. Clean up build directory (unless --keep-build)

Dependencies:
- Updated uv.lock with all required dependencies

* refactor(build): Fix directory structure and add comprehensive error handling

**Critical Fixes:**
- Remove "Coming Soon" message blocking build command execution
- Fix build directory to use .flash/.build/ directly (no app_name subdirectory)
- Fix tarball to extract with flat structure using arcname="."
- Fix cleanup to remove correct build directory

**Error Handling & Validation:**
- Add specific exception handling (ImportError, SyntaxError, ValueError)
- Add import validation to generated handlers
- Add duplicate function name detection across resources
- Add proper error logging throughout build process

**Resource Type Tracking:**
- Add resource_type field to RemoteFunctionMetadata
- Track actual resource types (LiveServerless, CpuLiveServerless)
- Use actual types in manifest instead of hardcoding

**Robustness Improvements:**
- Add handler import validation post-generation
- Add manifest path fallback search (cwd, module dir, legacy location)
- Add resource name sanitization for safe filenames
- Add specific exception logging in scanner (UnicodeDecodeError, SyntaxError)

**User Experience:**
- Add troubleshooting section to README
- Update manifest path documentation in docs
- Change "Zero Runtime Penalty" to "Minimal Runtime Overhead"
- Mark future enhancements as "Not Yet Implemented"
- Improve build success message with next steps

Fixes the issues identified in code review (issues #1-13, #19-22)

* feat(resources): Add LoadBalancerSlsResource for LB endpoints

Implement LoadBalancerSlsResource class for provisioning RunPod load-balanced
serverless endpoints. Load-balanced endpoints expose HTTP servers directly to
clients without queue-based processing, enabling REST APIs, webhooks, and
real-time communication patterns.

Key features:
- Type enforcement (always LB, never QB)
- Scaler validation (REQUEST_COUNT required, not QUEUE_DELAY)
- Health check polling via /ping endpoint (200/204 = healthy)
- Post-deployment verification with configurable retries
- Async and sync health check methods
- Comprehensive unit tests
- Full documentation with architecture diagrams and examples

Architecture:
- Extends ServerlessResource with LB-specific behavior
- Validates configuration before deployment
- Polls /ping endpoint until healthy (10 retries × 5s = 50s timeout)
- Raises TimeoutError if endpoint fails to become healthy

This forms the foundation for Mothership architecture where a load-balanced
endpoint serves as a directory server for child endpoints.
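The polling behavior described above might look roughly like this sketch (function and parameter names are assumptions, not the real method names):

```python
import time

def wait_until_healthy(check_ping, retries=10, delay=5.0):
    """Poll a /ping-style health check until it returns 200 or 204.

    Defaults mirror the description above: 10 retries x 5s = 50s timeout.
    """
    for _ in range(retries):
        if check_ping() in (200, 204):
            return True
        time.sleep(delay)
    raise TimeoutError("endpoint failed to become healthy")
```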

* fix(test): Fix LoadBalancerSlsResource deployment test mocks

Import ServerlessResource directly and use patch.object on the imported class
instead of string-based patches. This ensures the mocks properly intercept the
parent class's _do_deploy method when called via super(). Simplifies mock
configuration and removes an unused variable assertion.

Fixes the three failing deployment tests that were making real GraphQL API calls.
All tests now pass: 418 passed, 1 skipped.

* feat(resources): Phase 1 - Core infrastructure for @Remote on LB endpoints

Implement core infrastructure for enabling @Remote decorator on
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Changes:
- Create LoadBalancerSlsStub: HTTP-based stub for direct endpoint execution
  (src/tetra_rp/stubs/load_balancer_sls.py, 170 lines)
  - Serializes functions and arguments using cloudpickle + base64
  - Direct HTTP POST to /execute endpoint (no queue polling)
  - Proper error handling and deserialization

- Register stub with singledispatch (src/tetra_rp/stubs/registry.py)
  - Enables @Remote to dispatch to LoadBalancerSlsStub for LB resources

- Extend @Remote decorator with HTTP routing parameters (src/tetra_rp/client.py)
  - Add 'method' parameter: GET, POST, PUT, DELETE, PATCH
  - Add 'path' parameter: /api/endpoint routes
  - Validate method/path required for LoadBalancerSlsResource
  - Store routing metadata on decorated functions/classes
  - Warn if routing params used with non-LB resources

Foundation for Phase 2 (Build system integration) and Phase 3 (Local dev).
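The serialization step the stub performs can be sketched as a round trip. Stdlib `pickle` stands in here for cloudpickle so the example is self-contained; the payload field names are illustrative:

```python
import base64
import pickle  # the real stub uses cloudpickle; stdlib pickle stands in here

def encode_payload(args, kwargs):
    """Serialize call arguments into base64 strings safe for a JSON body."""
    return {
        "args": base64.b64encode(pickle.dumps(args)).decode("ascii"),
        "kwargs": base64.b64encode(pickle.dumps(kwargs)).decode("ascii"),
    }

def decode_payload(payload):
    """Inverse of encode_payload, as the /execute handler would apply it."""
    return (
        pickle.loads(base64.b64decode(payload["args"])),
        pickle.loads(base64.b64decode(payload["kwargs"])),
    )
```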

* feat(build): Phase 2.1 - Enhanced scanner for HTTP routing extraction

Update RemoteDecoratorScanner to extract HTTP method and path from
@Remote decorator for LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to RemoteFunctionMetadata
- Add _extract_http_routing() method to parse decorator keywords
- Extract method (GET, POST, PUT, DELETE, PATCH) from decorator
- Extract path (/api/process) from decorator
- Store routing metadata for manifest generation

Foundation for Phase 2.2 (Manifest updates) and Phase 2.3 (Handler generation).

* feat(build): Phase 2.2 - Updated manifest schema for HTTP routing

Enhance ManifestBuilder to support HTTP method/path routing for
LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to ManifestFunction
- Validate LB endpoints have both method and path
- Detect and prevent route conflicts (same method + path)
- Prevent use of reserved paths (/execute, /ping)
- Add 'routes' section to manifest for LB endpoints
- Conditional inclusion of routing fields (only for LB)

Manifest structure for LB endpoints now includes:
{
  "resources": {
    "api_service": {
      "resource_type": "LoadBalancerSlsResource",
      "functions": [
        {
          "name": "process_data",
          "http_method": "POST",
          "http_path": "/api/process"
        }
      ]
    }
  },
  "routes": {
    "api_service": {
      "POST /api/process": "process_data"
    }
  }
}
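Given that structure, route resolution at request time is a dictionary lookup. A minimal sketch, assuming the manifest is loaded as a plain dict:

```python
manifest = {
    "routes": {
        "api_service": {"POST /api/process": "process_data"},
    },
}

def resolve_route(manifest, resource, method, path):
    """Map an incoming (method, path) pair back to the function name."""
    return manifest["routes"][resource].get(f"{method.upper()} {path}")
```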

* feat(cli): Add LB handler generator for FastAPI app creation

Implement LBHandlerGenerator to create FastAPI applications for
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Key features:
- Generates FastAPI apps with explicit route registry
- Creates (method, path) -> function mappings from manifest
- Validates route conflicts and reserved paths
- Imports user functions and creates dynamic routes
- Includes required /ping health check endpoint
- Validates generated handler Python syntax via import

Generated handler structure enables:
- Direct HTTP routing to user functions via FastAPI
- Framework /execute endpoint for @Remote stub execution
- Local development with uvicorn

* feat(runtime): Implement LB handler factory for FastAPI app creation

Create create_lb_handler() factory function that dynamically builds FastAPI
applications from route registries for LoadBalancerSlsResource endpoints.

Key features:
- Accepts route_registry: Dict[(method, path)] -> handler_function mapping
- Registers all user-defined routes from registry to FastAPI app
- Provides /execute endpoint for @Remote stub function execution
- Handles async function execution automatically
- Serializes results with cloudpickle + base64 encoding
- Comprehensive error handling with detailed logging

The /execute endpoint enables:
- Remote function code execution via @Remote decorator
- Automatic argument deserialization from cloudpickle/base64
- Result serialization for transmission back to client
- Support for both sync and async functions
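The route-registry idea can be shown framework-agnostically; the real factory registers these routes on a FastAPI app rather than dispatching from a plain dict, and the `include_execute` default shown is an assumption drawn from the description above:

```python
def build_dispatcher(route_registry, include_execute=False):
    """Sketch of a (method, path) -> handler registry with the framework routes."""
    routes = {(m.upper(), p): fn for (m, p), fn in route_registry.items()}
    routes[("GET", "/ping")] = lambda: {"status": "healthy"}  # required health check
    if include_execute:  # only enabled for local development (LiveLoadBalancer)
        routes[("POST", "/execute")] = lambda: {"status": "execute enabled"}

    def dispatch(method, path, *args, **kwargs):
        handler = routes.get((method.upper(), path))
        if handler is None:
            raise LookupError(f"no route for {method} {path}")
        return handler(*args, **kwargs)

    return dispatch
```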

* feat(cli): Route build command to separate handlers for LB endpoints

Update build command to use appropriate handler generators based on
resource type. Separates LoadBalancerSlsResource endpoints (using FastAPI)
from queue-based endpoints (using generic handler).

Changes:
- Import LBHandlerGenerator alongside HandlerGenerator
- Inspect manifest resources and separate by type
- Generate LB handlers via LBHandlerGenerator
- Generate QB handlers via HandlerGenerator
- Combine all generated handler paths for summary

Enables users to mix LB and QB endpoints in same project with correct
code generation for each resource type.

* feat(resources): Add LiveLoadBalancer for local LB endpoint testing

Implement LiveLoadBalancer resource following the LiveServerless pattern
for local development and testing of load-balanced endpoints.

Changes:
- Add TETRA_LB_IMAGE constant for load-balanced Tetra image
- Create LiveLoadBalancer class extending LoadBalancerSlsResource
- Uses LiveServerlessMixin to lock imageName to Tetra LB image
- Register LiveLoadBalancer with LoadBalancerSlsStub in singledispatch
- Export LiveLoadBalancer from core.resources and top-level __init__

This enables users to test LB-based functions locally before deploying,
using the same pattern as LiveServerless for queue-based endpoints.

Users can now write:
  from tetra_rp import LiveLoadBalancer, remote

  api = LiveLoadBalancer(name="test-api")

  @remote(api, method="POST", path="/api/process")
  async def process_data(x, y):
      return {"result": x + y}

  result = await process_data(5, 3)  # Local execution

* test(stubs): Add comprehensive unit tests for LoadBalancerSlsStub

Implement unit tests for LoadBalancerSlsStub covering:
- Request preparation with arguments and dependencies
- Response handling for success and error cases
- Error handling for invalid responses
- Base64 encoding/decoding of serialized data
- Endpoint URL validation
- Timeout and HTTP error handling

Test coverage:
- _prepare_request: 4 tests
- _handle_response: 5 tests
- _execute_function: 3 error case tests
- __call__: 2 integration tests

Tests verify proper function serialization, argument handling,
error propagation, and response deserialization.

* fix(test): Correct LB endpoint test decorator to match assertions

Fix test_load_balancer_vs_queue_based_endpoints by updating the @Remote
decorator to use method='POST' and path='/api/echo' to match the test
assertions. This was a test-level bug where the decorator definition
didn't match what was being asserted.

* docs: Add comprehensive documentation for @Remote with LoadBalancer endpoints

- Using_Remote_With_LoadBalancer.md: User guide for HTTP routing, local development, building and deploying
- LoadBalancer_Runtime_Architecture.md: Technical details on deployment, request flows, security, and performance
- Updated README.md with LoadBalancer section and code example
- Updated Load_Balancer_Endpoints.md with cross-references to new guides

* security: Remove /execute from deployed LoadBalancer endpoints

Split @Remote execution behavior between local and deployed:
- LiveLoadBalancer (local): Uses /execute endpoint for function serialization
- LoadBalancerSlsResource (deployed): Uses user-defined routes with HTTP param mapping

Changes:
1. LoadBalancerSlsStub routing detection:
   - _should_use_execute_endpoint() determines execution path
   - _execute_via_user_route() maps args to JSON and POSTs to user routes
   - Auto-detects resource type and routing metadata

2. Conditional /execute registration:
   - create_lb_handler() now accepts include_execute parameter
   - Generated handlers default to include_execute=False (security)
   - LiveLoadBalancer can enable /execute if needed

3. Updated handler generator:
   - Added clarity comments on /execute exclusion for deployed endpoints

4. Comprehensive test coverage:
   - 8 new tests for routing detection and execution paths
   - All 31 tests passing (22 unit + 9 integration)

5. Documentation updates:
   - Using_Remote_With_LoadBalancer.md: clarified /execute scope
   - Added 'Local vs Deployed Execution' section explaining differences
   - LoadBalancer_Runtime_Architecture.md: updated execution model
   - Added troubleshooting for deployed endpoint scenarios

Security improvement:
- Deployed endpoints only expose user-defined routes
- /execute endpoint removed from production (prevents arbitrary code execution)
- Lower attack surface for deployed endpoints
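The argument-to-JSON mapping that `_execute_via_user_route()` performs can be sketched with `inspect.signature`; the stub's actual mapping logic may differ, and `process_data` here is just an example function:

```python
import inspect

def map_args_to_json(fn, args, kwargs):
    """Bind positional and keyword arguments to parameter names so the call
    can be sent as a JSON body to a user-defined route."""
    bound = inspect.signature(fn).bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)

def process_data(x, y, scale=1):
    return {"result": (x + y) * scale}
```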

* feat(build): Phase 4 - Fix LiveLoadBalancer handler generation to include /execute endpoint

- Modified manifest.py to validate LiveLoadBalancer endpoints like LoadBalancerSlsResource
- Updated lb_handler_generator to:
  - Include LiveLoadBalancer in handler generation filter
  - Pass include_execute=True for LiveLoadBalancer (local dev)
  - Pass include_execute=False for LoadBalancerSlsResource (deployed)
- Added integration tests:
  - Verify LiveLoadBalancer handlers include /execute endpoint
  - Verify deployed handlers exclude /execute endpoint
- Fixes critical bug: LiveLoadBalancer now gets /execute endpoint in generated handlers

* fix(scanner): Discover LoadBalancer resources in addition to Serverless resources

- Updated scanner to extract LiveLoadBalancer and LoadBalancerSlsResource resources
- Previously only looked for 'Serverless' in class name, missing LoadBalancer endpoints
- Now checks for both 'Serverless' and 'LoadBalancer' in resource type names
- Added integration test to verify scanner discovers both resource types
- Fixes critical bug that prevented flash build from finding LoadBalancer endpoints

* chore: Format code for line length and remove unused imports

- Wrap long lines in manifest.py, lb_handler.py, and load_balancer_sls.py
- Remove unused httpx import in test_load_balancer_sls_stub.py
- Apply consistent formatting across codebase

* fix: Address PR #131 review feedback

- Scanner: Use exact type name matching instead of substring matching
  - Whitelist specific resource types to avoid false positives
  - Prevents matching classes like 'MyServerlessHelper' or 'LoadBalancerUtils'

- Type hints: Use Optional[str] for nullable fields in manifest
  - ManifestFunction.http_method and http_path now properly typed

- Timeout: Make HTTP client timeout configurable
  - Added LoadBalancerSlsStub.DEFAULT_TIMEOUT class attribute
  - Added timeout parameter to __init__
  - Updated both _execute_function and _execute_via_user_route to use self.timeout

- Deprecated datetime: Replace datetime.utcnow() with datetime.now(timezone.utc)
  - Updated manifest.py and test_lb_remote_execution.py
  - Ensures Python 3.12+ compatibility

* style: Format datetime chaining for line length

* fix: LiveLoadBalancer template not serialized to RunPod GraphQL

The set_serverless_template model_validator was being overwritten by sync_input_fields
(both had mode="after"). In Pydantic v2, when two validators with the same mode are
defined in a class, only one is registered.

This caused templates to never be created from imageName, resulting in:
  "GraphQL errors: One of templateId, template is required to create an endpoint"

Solution:
- Move set_serverless_template validator from ServerlessResource base class to subclasses
  (ServerlessEndpoint and LoadBalancerSlsResource) where the validation is actually needed
- Keep helper methods (_create_new_template, _configure_existing_template) in base class
  for reuse
- Add comprehensive tests for LiveLoadBalancer template serialization

This allows:
1. Base ServerlessResource to be instantiated freely for testing/configuration
2. Subclasses (ServerlessEndpoint, LoadBalancerSlsResource) to enforce template
   requirements during deployment
3. Proper template serialization in GraphQL payload for RunPod API

Fixes: One of templateId, template is required to create an endpoint error when
deploying LiveLoadBalancer with custom image tags like runpod/tetra-rp-lb:local

* fix: LoadBalancer endpoint URL and add CPU support

- Fix: Use correct endpoint URL format for load-balanced endpoints
  (https://{id}.api.runpod.ai instead of https://api.runpod.ai/v2/{id})
  This fixes 404 errors on /ping health check endpoints

- Feature: Add CPU LoadBalancer support
  * Create CpuLoadBalancerSlsResource for CPU-based load-balanced endpoints
  * Create CpuLiveLoadBalancer for local CPU LB development
  * Add TETRA_CPU_LB_IMAGE constant for CPU LB Docker image
  * Update example code to use CpuLiveLoadBalancer for CPU worker
  * Add 8 comprehensive tests for CPU LoadBalancer functionality

- Tests: Add 2 tests for endpoint URL format validation
- All 474 tests passing, 64% code coverage

* fix: Export CpuLiveLoadBalancer and CpuLoadBalancerSlsResource from tetra_rp package

LoadBalancer resources were not being discovered by ResourceDiscovery because
the new CPU variants (CpuLiveLoadBalancer, CpuLoadBalancerSlsResource) were
not exported from the main tetra_rp package. This prevented undeploy from
picking up these resources.

Added exports to:
- TYPE_CHECKING imports for type hints
- __getattr__ function for lazy loading
- __all__ list for public API

This fixes the issue where 'flash undeploy list' could not find LoadBalancer
resources that were deployed with 'flash run --auto-provision'.

* fix: Add API key authentication to LoadBalancer health check

The /ping endpoint for RunPod load-balanced endpoints requires the
RUNPOD_API_KEY header for authentication. Without it, the health check
fails with 401 Unauthorized, causing provisioning to timeout.

This fix adds the Authorization header to the health check request if
the RUNPOD_API_KEY environment variable is available, allowing the
endpoint health check to succeed during provisioning.

Fixes issue where 'flash run --auto-provision' would fail even though
the endpoint was successfully created on RunPod.

* fix(lb): Exclude flashboot from CpuLoadBalancerSlsResource GraphQL payload

CpuLoadBalancerSlsResource was overriding _input_only without including flashboot,
causing it to be sent to the RunPod GraphQL API which doesn't accept this field.
This caused deployment to fail with: Field "flashboot" is not defined by type "EndpointInput".

* fix(lb): Expand CpuInstanceType.ANY to all CPU flavors in CpuLoadBalancerSlsResource

Add field_validator to expand [CpuInstanceType.ANY] to all available CPU instance
types (cpu3g, cpu3c, cpu5c variants). This matches the behavior in CpuServerlessEndpoint
and prevents deployment errors like 'instanceId must be in the format of flavorId-vcpu-ram'.
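A plain-function sketch of the validator's effect; the real code is a Pydantic `field_validator` on `CpuEndpointMixin`, the string `"any"` stands in for `CpuInstanceType.ANY`, and the flavor list below is illustrative, not the full set:

```python
ALL_CPU_INSTANCE_IDS = ["cpu3g-2-8", "cpu3c-2-4", "cpu5c-2-4"]  # format: flavorId-vcpu-ram
ANY = "any"  # stand-in for CpuInstanceType.ANY

def expand_instance_ids(instance_ids):
    """Expand the ANY sentinel to every concrete CPU flavor; pass others through."""
    if instance_ids == [ANY]:
        return list(ALL_CPU_INSTANCE_IDS)
    return instance_ids
```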

* refactor(cpu): Move instanceIds validator to CpuEndpointMixin

Move the instanceIds field_validator from CpuServerlessEndpoint to CpuEndpointMixin
so both CpuServerlessEndpoint and CpuLoadBalancerSlsResource share the same validator
that expands [CpuInstanceType.ANY] to all available CPU flavors. This eliminates
code duplication and ensures consistent behavior across all CPU endpoint types.

* test: Update CPU instance test to reflect validator expansion

Update test_cpu_live_load_balancer_defaults_to_cpu_any to verify that
[CpuInstanceType.ANY] is correctly expanded to all available CPU instance types
by the field_validator in CpuEndpointMixin.

* fix(lb): Increase health check timeout from 5s to 15s

Load-balanced workers need more time to respond during cold starts and initialization.
RunPod docs recommend at least 10-15 second timeouts for health checks. Workers may
return 204 during initialization, which is normal and expected.

* fix(lb): Fix CPU load balancer template deployment error

Fixes two bugs in CpuLoadBalancerSlsResource that prevented CPU load balancers from deploying:

1. Added gpuCount and allowedCudaVersions to _input_only exclusion set to prevent
   GPU-specific fields from being sent to RunPod API
2. Overrode set_serverless_template() to call _sync_cpu_fields() first, ensuring
   GPU defaults are overridden to CPU-appropriate values (gpuCount=0)

The RunPod API was rejecting CPU load balancer templates because GPU-specific
fields were being included in the GraphQL payload. These changes align
CpuLoadBalancerSlsResource behavior with CpuServerlessEndpoint.

Also added comprehensive test coverage (30+ tests) to verify:
- GPU fields are correctly overridden to CPU defaults
- GPU fields are excluded from API payloads
- CPU-specific fields are properly included
- Consistency with CpuServerlessEndpoint behavior

* fix(drift): Exclude runtime fields from config hash to prevent false positives

Fixes false positive configuration drift detection by separating concerns:

1. Update ServerlessResource.config_hash to exclude runtime fields
   - Fields like template, templateId, aiKey, userId are API-assigned
   - Prevents false drift when same config is redeployed across processes
   - Now only hashes user-specified configuration

2. Add config_hash override to CpuLoadBalancerSlsResource
   - CPU load balancers hash only CPU-relevant fields
   - Excludes GPU-specific fields and runtime fields
   - Follows same pattern as CpuServerlessEndpoint

3. Fix _has_structural_changes to exclude template/templateId
   - CRITICAL: These runtime fields were causing false structural changes
   - Was forcing unnecessary redeployments despite update() being available
   - Now system correctly uses update() instead of undeploy+deploy

4. Make field serializers robust to handle string/enum values
   - Prevents serialization errors when fields are pre-converted to strings

5. Add comprehensive drift detection tests (16 tests)
   - Test hash stability with runtime field changes
   - Test exclusion of env, template, templateId, and other runtime fields
   - Test that actual config changes (image, flashboot) are detected
   - Test structural change detection behavior
   - Test real-world deployment scenarios

Results:
- Same config deployed multiple times: no false drift
- Different env vars with same config: no false drift
- Template/templateId changes: no false drift
- API-assigned fields: no false drift
- User config changes (image, flashboot): drift detected correctly
- All 512 unit tests pass
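The hashing idea can be sketched as follows. The excluded-field set here is an illustrative subset; the real `EXCLUDED_HASH_FIELDS` constant lives on `ServerlessResource`:

```python
import hashlib
import json

# Illustrative subset of API-assigned runtime fields excluded from hashing.
EXCLUDED_HASH_FIELDS = {"template", "templateId", "aiKey", "userId"}

def config_hash(config):
    """Hash only user-specified configuration so API-assigned runtime
    fields never register as drift."""
    stable = {k: v for k, v in config.items() if k not in EXCLUDED_HASH_FIELDS}
    payload = json.dumps(stable, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()
```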

* fix(http): Standardize RunPod HTTP client authentication across codebase

Centralizes HTTP client creation for RunPod load-balanced endpoints to prevent
manual Authorization header code duplication and ensure consistent authentication:

1. Create centralized HTTP utility function (src/tetra_rp/core/utils/http.py)
   - New function: get_authenticated_httpx_client()
   - Automatically adds Bearer token Authorization header if RUNPOD_API_KEY set
   - Provides consistent timeout handling (default 30s, customizable)
   - Follows existing GraphQL/REST client authentication pattern

2. Fix critical authentication bug in LoadBalancerSlsStub._execute_via_user_route()
   - Previously: Missing Authorization header (401 errors on user routes)
   - Now: Uses centralized utility for proper authentication
   - Enables direct HTTP calls to user-defined routes with auth

3. Refactor two methods to use centralized utility
   - LoadBalancerSlsStub._execute_function() - removes 7+ lines of manual auth code
   - LoadBalancerSlsResource._check_ping_endpoint() - simplifies auth setup

4. Add comprehensive unit tests (tests/unit/core/utils/test_http.py)
   - Tests API key presence/absence handling
   - Tests custom and default timeout configuration
   - Tests edge cases (empty key, zero timeout)
   - All 7 tests pass with 100% coverage

Results:
- Single source of truth for HTTP authentication (centralized utility)
- Fixes 401 Unauthorized errors on load-balanced endpoints
- Eliminates repetitive manual auth code across 3+ locations
- Easier to maintain and update authentication patterns in future
- All 499 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)
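The header logic described above might look roughly like this; the real utility, `get_authenticated_httpx_client()`, returns a configured httpx client rather than a headers dict:

```python
import os

def build_auth_headers():
    """Add a Bearer token Authorization header when RUNPOD_API_KEY is set."""
    headers = {}
    api_key = os.environ.get("RUNPOD_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```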

* feat(http): Extend HTTP utilities to cover both sync and async authentication

Extends the centralized HTTP authentication pattern to all RunPod API calls:

1. Add get_authenticated_requests_session() for synchronous requests
   - Creates requests.Session with automatic Bearer token Authorization header
   - Follows same pattern as async get_authenticated_httpx_client()
   - Single source of truth for sync HTTP authentication

2. Refactor template.py to use centralized utility
   - Removes manual Authorization header setup (line 86)
   - Now uses get_authenticated_requests_session() for all template updates
   - Improves error handling with raise_for_status()
   - Token parameter marked deprecated; uses RUNPOD_API_KEY env var

3. Add comprehensive tests for sync utility (4 tests)
   - Tests API key presence/absence handling
   - Tests empty API key edge case
   - Tests Session object validation
   - All tests pass with proper cleanup

Benefits:
- True single source of truth for all RunPod HTTP authentication (sync + async)
- Consistent patterns across entire codebase
- Easier future auth changes across all HTTP client types
- Eliminates manual auth header code in template.py
- All 503 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

Note: requests.Session doesn't support default timeouts; timeout should be
specified per request (e.g., session.post(url, json=data, timeout=30.0))

* fix: Address PR feedback on HTTP utilities implementation

Addresses three feedback items from code review:

1. Fix breaking parameter order change in update_system_dependencies()
   - Restored original parameter order: template_id, token, system_dependencies, base_entry_cmd
   - Maintains backward compatibility with existing callers
   - Token parameter now optional (default None)

2. Add proper deprecation warning for token parameter
   - Issues DeprecationWarning when token parameter is used
   - Clearly communicates migration to RUNPOD_API_KEY environment variable
   - Follows Python deprecation best practices (warnings.warn with stacklevel=2)

3. Standardize test mocking approach across all health check tests
   - All tests now use consistent 'tetra_rp.core.utils.http.httpx.AsyncClient' patching
   - Removed inconsistent 'side_effect=lambda' pattern
   - Improved test maintainability by using same strategy everywhere

All 503 tests pass with consistent, clean implementation.
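The deprecation pattern from item 2 can be sketched like this, with the signature approximated from the description above (the real function body is elided):

```python
import warnings

def update_system_dependencies(template_id, token=None, system_dependencies=None,
                               base_entry_cmd=None):
    """Sketch of the deprecation pattern for the legacy token parameter."""
    if token is not None:
        warnings.warn(
            "The 'token' parameter is deprecated; set the RUNPOD_API_KEY "
            "environment variable instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
    # ... perform the template update ...
```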

* refactor(drift): Extract runtime field constants and improve maintainability

- Extract RUNTIME_FIELDS and EXCLUDED_HASH_FIELDS as ClassVar constants in ServerlessResource for centralized field list management
- Add clarifying comments to enum serializers explaining defensive isinstance() checks for nested model serialization
- Document CPU load balancer field list coupling in docstring with maintenance guidelines
- Add TestSerializerDefensiveBehavior class with 4 tests verifying pre-stringified enum value handling
- Use ClassVar annotation to satisfy Pydantic v2 model field requirements

This reduces maintenance burden by centralizing field definitions and improves code clarity without changing functionality.

* docs: Improve LoadBalancer documentation accuracy and completeness

- Fix health check timeout: Add clarification that timeout is 15 seconds per check
- Add HTTP authentication details explaining RUNPOD_API_KEY header injection
- Document stub decision logic for incomplete routing metadata (fallback behavior)
- Clarify function signature inspection with concrete example showing parameter mapping
- Expand /execute security explanation with explicit threats and best practices
- Add detailed parameter type constraints for deployed endpoints (supported vs unsupported)
- Add troubleshooting guide for missing routing metadata (404 errors)
- Strengthen security warnings about never exposing /execute in production

All documentation now matches actual implementation verified through codebase analysis.

* docs: add resource config drift detection documentation

- comprehensive guide on drift detection implementation
- covers hash computation, field exclusion, and cpu-specific behavior
- includes testing patterns and troubleshooting guide
- documents all fields that trigger drift vs those ignored

* docs: proper name for the file

* test(build): Add comprehensive test coverage for scanner and handler improvements

- Add 6 new scanner tests for directory filtering (.venv, .flash, .runpod exclusion)
- Add test for resource type validation to prevent false positives
- Add test for fallback behavior when resource name extraction fails
- Add test for handling resource names with special characters
- Update existing tests to reflect new dynamic import format and resource name extraction

These tests guarantee that improvements to the scanner (resource type validation,
directory filtering, fallback behavior) and handler generator (dynamic imports for
invalid Python identifiers) won't regress in future changes.

* test(scanner): Fix resource type assertions to match scanner behavior

The scanner now extracts resource names from the name= parameter rather
than using variable names. Update test assertions to expect the actual
resource names ('test-api', 'deployed-api') instead of variable names.

* chore: merge correction

* fix(drift): Remove manual undeploy/deploy from update() method

Use saveEndpoint mutation for all changes instead of manual lifecycle
management. Server-side automatically detects version-triggering fields
(GPU, template, volumes) and increments endpoint version accordingly.

Keep _has_structural_changes() as informational for logging purposes only.
This aligns with RunPod API's version-based deployment model.

* docs(drift): Clarify _has_structural_changes detects version-triggering changes

Update docstring to reflect that this method identifies changes that
trigger server-side version increment and worker recreation, not manual
redeploy cycles. Explain which changes are version-triggering vs rolling
updates, and note that the method is now informational for logging only.

* feat(drift): Enable environment variable drift detection

Remove env from EXCLUDED_HASH_FIELDS so changes to environment variables
trigger drift detection and endpoint updates. Environment changes are
non-version-triggering (rolling updates), so server will apply them via
saveEndpoint without recreating workers.

Add env to CPU LoadBalancer config_hash for consistent behavior across
all resource types. Update comments to reflect that env is user-specified
configuration, not dynamically computed.

* test(drift): Update tests for environment variable drift detection

- test_lb_config_hash_excludes_env_variables → test_lb_config_hash_detects_env_changes
- test_env_var_changes_no_drift → test_env_var_changes_trigger_drift
- test_config_hash_excludes_env_from_drift → test_config_hash_detects_env_from_drift

Update assertions to expect different hashes when env changes, matching
new behavior where environment variable changes trigger drift and updates.

* fix: Address Copilot review feedback on type hints and documentation

- Fix type annotation for timeout parameter in LoadBalancerSlsStub (Optional[float])
- Replace hardcoded "30s" with actual self.timeout in error messages (2 locations)
- Update Resource_Config_Drift_Detection.md to reflect actual EXCLUDED_HASH_FIELDS
- Remove duplicate Load-Balanced Endpoints section from README.md

Addresses Copilot review comments (PR #132, review 3642596664)

* chore: Update Python version compatibility to 3.10-3.14

- Drop Python 3.9 support (EOL)
- Ensure support for Python 3.14
- Update requires-python in pyproject.toml from >=3.9,<3.14 to >=3.10,<3.15
- Update mypy python_version from 3.9 to 3.10
- Update CI matrix to test Python 3.10, 3.11, 3.12, 3.13, 3.14

* chore: Increase code coverage requirement to 65%

* perf(tests): make parallel test execution the default

Implement AE-1748 by making parallel test execution the default for all quality checks,
achieving a 4.6x speedup (from ~96s to ~20s on 12-core machines).

Changes:
- Configure pytest-xdist for parallel test execution
- Add worker isolation fixtures to prevent file system conflicts
- Mark concurrency tests (~26 tests) as serial to avoid race conditions
- Update Makefile to make parallel execution the default
- Provide serial execution commands for debugging (quality-check-serial)

Performance:
- make quality-check: 96s → 20s (4.6x faster)
- All 719 tests pass in both parallel and serial modes
- Coverage maintained at 64%+

Technical details:
- Worker-specific temp directories via worker_temp_dir fixture
- Module-level cache clearing in reset_singletons
- State file isolation per worker via isolate_resource_state_file
- Serial markers on threading-specific tests

Rollback: Use `make quality-check-serial` if parallel execution causes issues

* refactor: remove dead code and add serialization tests

Remove unused functions and improve test coverage:
- Remove deprecated update_system_dependencies from template.py
- Remove unused utility functions from utils.py and json.py
- Add comprehensive test suite for serialization module (100% coverage)

Tests cover serialization/deserialization of args, kwargs, and error handling
for cloudpickle failures across Python 3.10-3.14.
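The serialization contract tested above can be sketched as a simple round-trip; `pickle` stands in for cloudpickle here (cloudpickle additionally handles lambdas and closures), and the function names are illustrative, not the module's actual API:

```python
import base64
import pickle

def serialize_arguments(args, kwargs):
    # Encode the (args, kwargs) pair for safe JSON transmission.
    return base64.b64encode(pickle.dumps((args, kwargs))).decode("ascii")

def deserialize_arguments(payload):
    # Decode, surfacing any unpickling failure as a ValueError.
    try:
        args, kwargs = pickle.loads(base64.b64decode(payload))
    except Exception as exc:
        raise ValueError(f"could not deserialize arguments: {exc}") from exc
    return args, kwargs
```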

* fix: regenerate uv.lock with correct dependency versions

The previous uv.lock was corrupted with an incomplete pytest-xdist==3.8.0 entry
that referenced pytest==8.4.2 which wasn't locked. Regenerating the lock file
resolves the CI/CD dependency installation failures across all Python versions.

* fix: mark TestLoadBalancerSlsStubRouting as serial

The @Remote decorator used in TestLoadBalancerSlsStubRouting modifies module-level
state and can cause race conditions when run in parallel. Mark this test class as
serial to prevent flaky failures, particularly on Python 3.10.

* fix: simplify parallel test execution - remove unnecessary two-pass approach

All tests pass with xdist parallel execution without needing to filter serial
tests. pytest-xdist handles workers independently and coverage merges properly.
Simplified Makefile to use single -n auto command for all test runs.

* fix: re-add serial marker for TestLoadBalancerSlsStubRouting

The @Remote decorator modifies module-level state that isn't properly isolated
between parallel workers. Adding the serial marker prevents race conditions on
Python 3.12 and 3.14. pytest-xdist respects the serial marker automatically.

* fix: implement proper serial test handling with two-pass execution

Add pytest hook to mark serial tests with xdist_group so they run without
parallelization. Use two-pass test execution:
1. Parallel: Run all non-serial tests with -n auto
2. Serial: Run serial tests without parallelization, appending coverage

This ensures:
- No race conditions in serial tests (file locking, @Remote decorator)
- Coverage properly merged across both passes
- Maintains ~4.6x speedup for non-serial tests

* fix: implement proper serial test handling with two-pass execution

Add pytest hook to mark serial tests with xdist_group so they run without
parallelization. Use two-pass test execution:
1. Parallel: Run all non-serial tests with -n auto (--cov-fail-under=0)
2. Serial: Run serial tests without parallelization, appending coverage

This ensures:
- No race conditions in serial tests (file locking, @Remote decorator)
- Coverage properly merged across both passes
- Maintains ~4.6x speedup for non-serial tests
- Both passes complete even if first has < 65% coverage

* chore: consistent coverage failure point

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore: this is about reporting coverage (no need to fail)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore: don't know why it was 64

* chore: make test commands parallel by default with serial variants

- All test commands now run in parallel by default using pytest-xdist auto-detect
- Serial versions available with -serial suffix for debugging
- test-parallel, test-parallel-workers, test-unit-parallel removed in favor of cleaner naming
- test-workers added as shorthand for specifying worker count
- test-fast now includes parallel execution
- Quality check commands already use parallel-by-default test-coverage

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
deanq added a commit that referenced this pull request Jan 22, 2026
* feat(runtime): Add generic handler factory for serverless execution

Implement a factory function that creates RunPod serverless handlers,
eliminating code duplication across generated handler files.

The generic_handler module provides:
- create_handler(function_registry) factory that accepts a dict of
  function/class objects and returns a RunPod-compatible handler
- Automatic serialization/deserialization using cloudpickle + base64
- Support for both function execution and class instantiation + method calls
- Structured error responses with full tracebacks for debugging
- Load manifest for cross-endpoint function discovery

This design centralizes all handler logic in one place, making it easy to:
- Fix bugs once, benefit all handlers
- Add new features without regenerating projects
- Keep deployment packages small (handler files are ~23 lines each)

Implementation:
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Handles function vs. class execution
- load_manifest(): Loads flash_manifest.json for service discovery
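A minimal sketch of the factory shape described above, assuming a job payload with base64-encoded pickled arguments (pickle stands in for cloudpickle; field names are illustrative):

```python
import base64
import pickle
import traceback

def create_handler(function_registry):
    """Return a RunPod-style handler closed over a dict of callables."""
    def handler(job):
        inp = job.get("input", {})
        name = inp.get("function_name")
        fn = function_registry.get(name)
        if fn is None:
            return {"success": False, "error": f"Unknown function: {name}"}
        try:
            # Decode arguments, execute, and re-encode the result.
            args = pickle.loads(base64.b64decode(inp["args"])) if inp.get("args") else ()
            kwargs = pickle.loads(base64.b64decode(inp["kwargs"])) if inp.get("kwargs") else {}
            result = fn(*args, **kwargs)
            payload = base64.b64encode(pickle.dumps(result)).decode("ascii")
            return {"success": True, "result": payload}
        except Exception as exc:
            # Structured error response with full traceback for debugging.
            return {"success": False, "error": str(exc), "traceback": traceback.format_exc()}
    return handler
```

Because the factory is called once at startup, generated handler files only need to build the registry and call `create_handler()`.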

* feat(cli): Add handler generator, manifest builder, and scanner for build process

Implement the build pipeline components that work together to generate
serverless handlers from @Remote decorated functions.

Three core components:

1. RemoteDecoratorScanner (scanner.py)
   - Uses Python AST to discover all @Remote decorated functions
   - Extracts function metadata: name, module, async status, is_class
   - Groups functions by resource_config for handler generation
   - Handles edge cases like decorated classes and async functions

2. ManifestBuilder (manifest.py)
   - Groups functions by their resource_config
   - Creates flash_manifest.json structure for service discovery
   - Maps functions to their modules and handler files
   - Enables cross-endpoint function routing at runtime

3. HandlerGenerator (handler_generator.py)
   - Creates lightweight handler_*.py files for each resource config
   - Each handler imports functions and registers them in FUNCTION_REGISTRY
   - Handler delegates to create_handler() factory from generic_handler
   - Generated handlers are ~23 lines (vs ~98 with duplication)

Build Pipeline Flow:
1. Scanner discovers @Remote functions
2. ManifestBuilder groups them by resource_config
3. HandlerGenerator creates handler_*.py for each group
4. All files + manifest bundled into archive.tar.gz

This eliminates ~95% duplication across handlers by using the factory pattern
instead of template-based generation.

* test(runtime): Add comprehensive tests for generic handler

Implement 19 unit tests covering all major paths through the generic_handler
factory and its helper functions.

Test Coverage:

Serialization/Deserialization (7 tests):
- serialize_result() with simple values, dicts, lists
- deserialize_arguments() with empty, args-only, kwargs-only, mixed inputs
- Round-trip encoding/decoding of cloudpickle + base64

Function Execution (4 tests):
- Simple function execution with positional and keyword arguments
- Keyword argument handling
- Class instantiation and method calls
- Argument passing to instance methods

Handler Factory (8 tests):
- create_handler() returns callable RunPod handler
- Handler with simple function registry
- Missing function error handling (returns error response, not exception)
- Function exceptions caught with traceback included
- Multiple functions in single registry
- Complex Python objects (classes, lambdas, closures)
- Empty registry edge case
- Default execution_type parameter
- None return values
- Correct RunPod response format (success, result/error, traceback)

Test Strategy:
- Arrange-Act-Assert pattern for clarity
- Isolated unit tests (no external dependencies)
- Tests verify behavior, not implementation
- Error cases tested for proper error handling
- All serialization tested for round-trip correctness

All tests passing, 83% coverage on generic_handler.py

* test(cli): Add tests for handler generation, manifest building, and scanning

Implement integration tests validating the build pipeline components work
correctly together.

Test Coverage:

HandlerGenerator Tests:
- Handler files created with correct names (handler_<resource_name>.py)
- Generated files import required functions from workers
- FUNCTION_REGISTRY properly formatted
- create_handler() imported from generic_handler
- Handler creation via factory
- RunPod start call present and correct
- Multiple handlers generated for multiple resource configs

ManifestBuilder Tests:
- Manifest structure with correct version and metadata
- Resources grouped by resource_config
- Handler file paths correct
- Function metadata preserved (name, module, is_async, is_class)
- Function registry mapping complete

Scanner Tests:
- @Remote decorated functions discovered via AST
- Function metadata extracted correctly
- Module paths resolved properly
- Async functions detected
- Class methods detected
- Edge cases handled (multiple decorators, nested classes)

Test Strategy:
- Integration tests verify components work together
- Tests verify generated files are syntactically correct
- Tests validate data structures match expected schemas
- No external dependencies in build process

Validates that the entire build pipeline:
1. Discovers functions correctly
2. Groups them appropriately
3. Generates valid Python handler files
4. Creates correct manifest structure

* docs(runtime): Document generic handler factory architecture

Add comprehensive architecture documentation explaining why the factory
pattern was chosen and how it works.

Documentation includes:

Overview & Context:
- Problem statement: Handler files had 95% duplication
- Design decision: Use factory function instead of templates
- Benefits: Single source of truth, easier maintenance, consistency

Architecture Diagrams (MermaidJS):
- High-level flow: @Remote functions → Scanner → Manifest → Handlers → Factory
- Component relationships: HandlerGenerator, GeneratedHandler, generic_handler
- Function registry pattern: Discovery → Grouping → Registration → Factory

Implementation Details:
- create_handler(function_registry) signature and behavior
- deserialize_arguments(): Base64 + cloudpickle decoding
- serialize_result(): Cloudpickle + base64 encoding
- execute_function(): Function vs. class execution
- load_manifest(): Service discovery via flash_manifest.json

Design Decisions (with rationale):
- Factory Pattern over Inheritance: Simpler, less coupling, easier to test
- CloudPickle + Base64: Handles arbitrary objects, safe JSON transmission
- Manifest in Generic Handler: Runtime service discovery requirement
- Structured Error Responses: Debugging aid, functional error handling
- Both Execution Types: Supports stateful classes and pure functions

Usage Examples:
- Simple function handler
- Class execution with methods
- Multiple functions in one handler

Build Process Integration:
- 4-phase pipeline: Scanner → Grouping → Generation → Packaging
- Manifest structure and contents
- Generated handler structure (~23 lines)

Testing Strategy:
- 19 unit tests covering all major paths
- 7 integration tests verifying handler generation
- Manual testing with example applications

Performance:
- Zero runtime penalty (factory called once at startup)
- No additional indirection in request path

* docs(cli): Add flash build command documentation

Document the flash build command and update CLI README to include it.

New Documentation:

flash-build.md includes:

Usage & Options:
- Command syntax: flash build [OPTIONS]
- --no-deps: Skip transitive dependencies (faster, smaller archives)
- --keep-build: Keep build directory for inspection/debugging
- --output, -o: Custom archive name (default: archive.tar.gz)

What It Does (5-step process):
1. Discovery: Scan for @Remote decorated functions
2. Grouping: Group functions by resource_config
3. Handler Generation: Create lightweight handler files
4. Manifest Creation: Generate flash_manifest.json
5. Packaging: Create archive.tar.gz for deployment

Build Artifacts:
- .flash/archive.tar.gz: Deployment package (ready for RunPod)
- .flash/flash_manifest.json: Service discovery configuration
- .flash/.build/: Temporary build directory

Handler Generation:
- Explains factory pattern and minimal handler files
- Links to Runtime_Generic_Handler.md for details

Dependency Management:
- Default behavior: Install all dependencies including transitive
- --no-deps: Only direct dependencies (when base image has transitive)
- Trade-offs explained

Cross-Endpoint Function Calls:
- Example showing GPU and CPU endpoints
- Manifest enables routing automatically

Output & Troubleshooting:
- Sample build output with progress indicators
- Common failure scenarios and solutions
- How to debug with --keep-build

Next Steps:
- Test locally with flash run
- Deploy to RunPod
- Monitor with flash undeploy list

Updated CLI README.md:
- Added flash build to command list in sequence
- Links to full flash-build.md documentation

* docs: Add build process and handler generation section to README

Add a new section explaining how the build system works and why the
factory pattern reduces code duplication.

New Section: Build Process and Handler Generation

Explains:

How Flash Builds Your Application (5-step pipeline):
1. Discovery: Scans code for @Remote decorated functions
2. Grouping: Groups functions by resource_config
3. Handler Generation: Creates lightweight handler files
4. Manifest Creation: Generates flash_manifest.json for service discovery
5. Packaging: Bundles everything into archive.tar.gz

Handler Architecture (with code example):
- Shows generated handler using factory pattern
- Single source of truth: All handler logic in one place
- Easier maintenance: Bug fixes don't require rebuilding projects

Cross-Endpoint Function Calls:
- Example of GPU and CPU endpoints calling each other
- Manifest and runtime wrapper handle service discovery

Build Artifacts:
- .flash/.build/: Temporary build directory
- .flash/archive.tar.gz: Deployment package
- .flash/flash_manifest.json: Service configuration

Links to detailed documentation:
- docs/Runtime_Generic_Handler.md for architecture details
- src/tetra_rp/cli/docs/flash-build.md for CLI reference

This section bridges the main README and the detailed documentation,
providing an entry point for new users discovering the build system.

* feat(cli): Integrate build utilities into flash build command

Wire up the handler generator, manifest builder, and scanner into the
actual flash build command implementation.

Changes to build.py:

1. Integration:
   - Import RemoteDecoratorScanner for function discovery
   - Import ManifestBuilder for manifest creation
   - Import HandlerGenerator for handler file creation
   - Call these in sequence during the build process

2. Build Pipeline:
   - After copying project files, scan for @Remote functions
   - Build manifest from discovered functions
   - Generate handler files for each resource config
   - Write manifest to build directory
   - Progress indicators show what's being generated

3. Fixes:
   - Change .tetra directory references to .flash
   - Uncomment actual build logic (was showing "Coming Soon" message)
   - Fix progress messages to show actual file counts

4. Error Handling:
   - Try/catch around handler generation
   - Warning shown if generation fails but build continues
   - User can debug with --keep-build flag

Build Flow Now:
1. Load ignore patterns
2. Collect project files
3. Create build directory
4. Copy files to build directory
5. [NEW] Scan for @Remote functions
6. [NEW] Build and write manifest
7. [NEW] Generate handler files
8. Install dependencies
9. Create archive
10. Clean up build directory (unless --keep-build)

Dependencies:
- Updated uv.lock with all required dependencies

* refactor(build): Fix directory structure and add comprehensive error handling

**Critical Fixes:**
- Remove "Coming Soon" message blocking build command execution
- Fix build directory to use .flash/.build/ directly (no app_name subdirectory)
- Fix tarball to extract with flat structure using arcname="."
- Fix cleanup to remove correct build directory

**Error Handling & Validation:**
- Add specific exception handling (ImportError, SyntaxError, ValueError)
- Add import validation to generated handlers
- Add duplicate function name detection across resources
- Add proper error logging throughout build process

**Resource Type Tracking:**
- Add resource_type field to RemoteFunctionMetadata
- Track actual resource types (LiveServerless, CpuLiveServerless)
- Use actual types in manifest instead of hardcoding

**Robustness Improvements:**
- Add handler import validation post-generation
- Add manifest path fallback search (cwd, module dir, legacy location)
- Add resource name sanitization for safe filenames
- Add specific exception logging in scanner (UnicodeDecodeError, SyntaxError)

**User Experience:**
- Add troubleshooting section to README
- Update manifest path documentation in docs
- Change "Zero Runtime Penalty" to "Minimal Runtime Overhead"
- Mark future enhancements as "Not Yet Implemented"
- Improve build success message with next steps

Fixes all 20 issues identified in code review (issues #1-13, #19-22)

* feat(resources): Add LoadBalancerSlsResource for LB endpoints

Implement LoadBalancerSlsResource class for provisioning RunPod load-balanced
serverless endpoints. Load-balanced endpoints expose HTTP servers directly to
clients without queue-based processing, enabling REST APIs, webhooks, and
real-time communication patterns.

Key features:
- Type enforcement (always LB, never QB)
- Scaler validation (REQUEST_COUNT required, not QUEUE_DELAY)
- Health check polling via /ping endpoint (200/204 = healthy)
- Post-deployment verification with configurable retries
- Async and sync health check methods
- Comprehensive unit tests
- Full documentation with architecture diagrams and examples

Architecture:
- Extends ServerlessResource with LB-specific behavior
- Validates configuration before deployment
- Polls /ping endpoint until healthy (10 retries × 5s = 50s timeout)
- Raises TimeoutError if endpoint fails to become healthy

This forms the foundation for Mothership architecture where a load-balanced
endpoint serves as a directory server for child endpoints.
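The polling behavior described above (10 retries × 5s against `/ping`, 200/204 = healthy) can be sketched generically; `check` is any callable returning an HTTP status code, and the function name is illustrative:

```python
import time

def wait_until_healthy(check, retries=10, delay=5.0):
    """Poll until the endpoint reports healthy, else raise TimeoutError."""
    for _ in range(retries):
        if check() in (200, 204):  # healthy per the health-check contract
            return True
        time.sleep(delay)
    raise TimeoutError(f"endpoint not healthy after {retries} x {delay:.0f}s")
```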

* fix(test): Fix LoadBalancerSlsResource deployment test mocks

Import ServerlessResource directly and use patch.object on the imported class
instead of string-based patches. This ensures the mocks properly intercept the
parent class's _do_deploy method when called via super(). Simplifies mock
configuration and removes an unused variable assertion.

Fixes the three failing deployment tests that were making real GraphQL API calls.
All tests now pass: 418 passed, 1 skipped.

* feat(resources): Phase 1 - Core infrastructure for @Remote on LB endpoints

Implement core infrastructure for enabling @Remote decorator on
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Changes:
- Create LoadBalancerSlsStub: HTTP-based stub for direct endpoint execution
  (src/tetra_rp/stubs/load_balancer_sls.py, 170 lines)
  - Serializes functions and arguments using cloudpickle + base64
  - Direct HTTP POST to /execute endpoint (no queue polling)
  - Proper error handling and deserialization

- Register stub with singledispatch (src/tetra_rp/stubs/registry.py)
  - Enables @Remote to dispatch to LoadBalancerSlsStub for LB resources

- Extend @Remote decorator with HTTP routing parameters (src/tetra_rp/client.py)
  - Add 'method' parameter: GET, POST, PUT, DELETE, PATCH
  - Add 'path' parameter: /api/endpoint routes
  - Validate method/path required for LoadBalancerSlsResource
  - Store routing metadata on decorated functions/classes
  - Warn if routing params used with non-LB resources

Foundation for Phase 2 (Build system integration) and Phase 3 (Local dev).
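The validation and metadata storage described above might look roughly like this; the attribute names (`_http_method`, `_http_path`) and the simplified signature are illustrative, not the library's actual implementation:

```python
VALID_METHODS = {"GET", "POST", "PUT", "DELETE", "PATCH"}

def remote(resource, method=None, path=None):
    """Decorator sketch: validate and attach HTTP routing metadata."""
    def wrapper(fn):
        if method is not None or path is not None:
            if method not in VALID_METHODS:
                raise ValueError(f"unsupported HTTP method: {method!r}")
            if not path or not path.startswith("/"):
                raise ValueError(f"path must start with '/': {path!r}")
        fn._http_method = method
        fn._http_path = path
        return fn
    return wrapper
```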

* feat(build): Phase 2.1 - Enhanced scanner for HTTP routing extraction

Update RemoteDecoratorScanner to extract HTTP method and path from
@Remote decorator for LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to RemoteFunctionMetadata
- Add _extract_http_routing() method to parse decorator keywords
- Extract method (GET, POST, PUT, DELETE, PATCH) from decorator
- Extract path (/api/process) from decorator
- Store routing metadata for manifest generation

Foundation for Phase 2.2 (Manifest updates) and Phase 2.3 (Handler generation).

* feat(build): Phase 2.2 - Updated manifest schema for HTTP routing

Enhance ManifestBuilder to support HTTP method/path routing for
LoadBalancerSlsResource endpoints.

Changes:
- Add http_method and http_path fields to ManifestFunction
- Validate LB endpoints have both method and path
- Detect and prevent route conflicts (same method + path)
- Prevent use of reserved paths (/execute, /ping)
- Add 'routes' section to manifest for LB endpoints
- Conditional inclusion of routing fields (only for LB)

Manifest structure for LB endpoints now includes:
{
  "resources": {
    "api_service": {
      "resource_type": "LoadBalancerSlsResource",
      "functions": [
        {
          "name": "process_data",
          "http_method": "POST",
          "http_path": "/api/process"
        }
      ]
    }
  },
  "routes": {
    "api_service": {
      "POST /api/process": "process_data"
    }
  }
}
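The conflict and reserved-path checks can be sketched as a small validator over the per-function routing metadata (a simplified stand-in for the `ManifestBuilder` logic; names are illustrative):

```python
RESERVED_PATHS = {"/execute", "/ping"}

def build_route_table(functions):
    """Build '<METHOD> <path>' -> function-name mapping, rejecting conflicts."""
    routes = {}
    for fn in functions:  # each fn: {"name", "http_method", "http_path"}
        method, path = fn["http_method"], fn["http_path"]
        if not method or not path:
            raise ValueError(f"{fn['name']}: LB functions need both method and path")
        if path in RESERVED_PATHS:
            raise ValueError(f"{fn['name']}: {path} is a reserved path")
        key = f"{method} {path}"
        if key in routes:
            raise ValueError(f"route conflict: {key}")
        routes[key] = fn["name"]
    return routes
```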

* feat(cli): Add LB handler generator for FastAPI app creation

Implement LBHandlerGenerator to create FastAPI applications for
LoadBalancerSlsResource endpoints with HTTP method/path routing.

Key features:
- Generates FastAPI apps with explicit route registry
- Creates (method, path) -> function mappings from manifest
- Validates route conflicts and reserved paths
- Imports user functions and creates dynamic routes
- Includes required /ping health check endpoint
- Validates generated handler Python syntax via import

Generated handler structure enables:
- Direct HTTP routing to user functions via FastAPI
- Framework /execute endpoint for @Remote stub execution
- Local development with uvicorn

* feat(runtime): Implement LB handler factory for FastAPI app creation

Create create_lb_handler() factory function that dynamically builds FastAPI
applications from route registries for LoadBalancerSlsResource endpoints.

Key features:
- Accepts route_registry: Dict[(method, path)] -> handler_function mapping
- Registers all user-defined routes from registry to FastAPI app
- Provides /execute endpoint for @Remote stub function execution
- Handles async function execution automatically
- Serializes results with cloudpickle + base64 encoding
- Comprehensive error handling with detailed logging

The /execute endpoint enables:
- Remote function code execution via @Remote decorator
- Automatic argument deserialization from cloudpickle/base64
- Result serialization for transmission back to client
- Support for both sync and async functions
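The route-registry idea behind `create_lb_handler()` can be illustrated without FastAPI; this framework-free dispatcher is a sketch only (the real factory registers routes on a FastAPI app and uses cloudpickle for `/execute` payloads):

```python
def create_lb_handler(route_registry, include_execute=False):
    """Return a dispatcher over (method, path) -> handler, plus /ping."""
    routes = {("GET", "/ping"): lambda body=None: {"status": "ok"}}
    routes.update(route_registry)
    if include_execute:
        # /execute runs an arbitrary serialized function; real handlers
        # deserialize fn/args with cloudpickle + base64 first.
        def execute(body):
            fn = body["fn"]
            return {"result": fn(*body.get("args", ()), **body.get("kwargs", {}))}
        routes[("POST", "/execute")] = execute

    def dispatch(method, path, body=None):
        handler = routes.get((method, path))
        return handler(body) if handler is not None else {"error": 404}
    return dispatch
```

With `include_execute=False` (the deployed default, per the security commit below in history), the `/execute` route simply does not exist.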

* feat(cli): Route build command to separate handlers for LB endpoints

Update build command to use appropriate handler generators based on
resource type. Separates LoadBalancerSlsResource endpoints (using FastAPI)
from queue-based endpoints (using generic handler).

Changes:
- Import LBHandlerGenerator alongside HandlerGenerator
- Inspect manifest resources and separate by type
- Generate LB handlers via LBHandlerGenerator
- Generate QB handlers via HandlerGenerator
- Combine all generated handler paths for summary

Enables users to mix LB and QB endpoints in same project with correct
code generation for each resource type.

* feat(resources): Add LiveLoadBalancer for local LB endpoint testing

Implement LiveLoadBalancer resource following the LiveServerless pattern
for local development and testing of load-balanced endpoints.

Changes:
- Add TETRA_LB_IMAGE constant for load-balanced Tetra image
- Create LiveLoadBalancer class extending LoadBalancerSlsResource
- Uses LiveServerlessMixin to lock imageName to Tetra LB image
- Register LiveLoadBalancer with LoadBalancerSlsStub in singledispatch
- Export LiveLoadBalancer from core.resources and top-level __init__

This enables users to test LB-based functions locally before deploying,
using the same pattern as LiveServerless for queue-based endpoints.

Users can now write:
  from tetra_rp import LiveLoadBalancer, remote

  api = LiveLoadBalancer(name="test-api")

  @remote(api, method="POST", path="/api/process")
  async def process_data(x, y):
      return {"result": x + y}

  result = await process_data(5, 3)  # Local execution

* test(stubs): Add comprehensive unit tests for LoadBalancerSlsStub

Implement unit tests for LoadBalancerSlsStub covering:
- Request preparation with arguments and dependencies
- Response handling for success and error cases
- Error handling for invalid responses
- Base64 encoding/decoding of serialized data
- Endpoint URL validation
- Timeout and HTTP error handling

Test coverage:
- _prepare_request: 4 tests
- _handle_response: 5 tests
- _execute_function: 3 error case tests
- __call__: 2 integration tests

Tests verify proper function serialization, argument handling,
error propagation, and response deserialization.

* fix(test): Correct LB endpoint test decorator to match assertions

Fix test_load_balancer_vs_queue_based_endpoints by updating the @Remote
decorator to use method='POST' and path='/api/echo' to match the test
assertions. This was a test-level bug where the decorator definition
didn't match what was being asserted.

* docs: Add comprehensive documentation for @Remote with LoadBalancer endpoints

- Using_Remote_With_LoadBalancer.md: User guide for HTTP routing, local development, building and deploying
- LoadBalancer_Runtime_Architecture.md: Technical details on deployment, request flows, security, and performance
- Updated README.md with LoadBalancer section and code example
- Updated Load_Balancer_Endpoints.md with cross-references to new guides

* security: Remove /execute from deployed LoadBalancer endpoints

Split @Remote execution behavior between local and deployed:
- LiveLoadBalancer (local): Uses /execute endpoint for function serialization
- LoadBalancerSlsResource (deployed): Uses user-defined routes with HTTP param mapping

Changes:
1. LoadBalancerSlsStub routing detection:
   - _should_use_execute_endpoint() determines execution path
   - _execute_via_user_route() maps args to JSON and POSTs to user routes
   - Auto-detects resource type and routing metadata

2. Conditional /execute registration:
   - create_lb_handler() now accepts include_execute parameter
   - Generated handlers default to include_execute=False (security)
   - LiveLoadBalancer can enable /execute if needed

3. Updated handler generator:
   - Added clarity comments on /execute exclusion for deployed endpoints

4. Comprehensive test coverage:
   - 8 new tests for routing detection and execution paths
   - All 31 tests passing (22 unit + 9 integration)

5. Documentation updates:
   - Using_Remote_With_LoadBalancer.md: clarified /execute scope
   - Added 'Local vs Deployed Execution' section explaining differences
   - LoadBalancer_Runtime_Architecture.md: updated execution model
   - Added troubleshooting for deployed endpoint scenarios

Security improvement:
- Deployed endpoints only expose user-defined routes
- /execute endpoint removed from production (prevents arbitrary code execution)
- Lower attack surface for deployed endpoints
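The client-side half of this split — mapping a decorated function's kwargs to a JSON POST against its user-defined route, as `_execute_via_user_route()` does — can be sketched like this (the helper name and `_http_*` attributes are illustrative; the actual HTTP call is omitted):

```python
def build_user_route_request(base_url, fn, kwargs):
    """Describe the HTTP request for a function's user-defined route."""
    method = getattr(fn, "_http_method", None)
    path = getattr(fn, "_http_path", None)
    if not (method and path):
        raise ValueError("function has no HTTP routing metadata")
    return {"method": method, "url": base_url.rstrip("/") + path, "json": kwargs}
```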

* feat(build): Phase 4 - Fix LiveLoadBalancer handler generation to include /execute endpoint

- Modified manifest.py to validate LiveLoadBalancer endpoints like LoadBalancerSlsResource
- Updated lb_handler_generator to:
  - Include LiveLoadBalancer in handler generation filter
  - Pass include_execute=True for LiveLoadBalancer (local dev)
  - Pass include_execute=False for LoadBalancerSlsResource (deployed)
- Added integration tests:
  - Verify LiveLoadBalancer handlers include /execute endpoint
  - Verify deployed handlers exclude /execute endpoint
- Fixes critical bug: LiveLoadBalancer now gets /execute endpoint in generated handlers

* fix(scanner): Discover LoadBalancer resources in addition to Serverless resources

- Updated scanner to extract LiveLoadBalancer and LoadBalancerSlsResource resources
- Previously only looked for 'Serverless' in class name, missing LoadBalancer endpoints
- Now checks for both 'Serverless' and 'LoadBalancer' in resource type names
- Added integration test to verify scanner discovers both resource types
- Fixes critical bug that prevented flash build from finding LoadBalancer endpoints

* chore: Format code for line length and remove unused imports

- Wrap long lines in manifest.py, lb_handler.py, and load_balancer_sls.py
- Remove unused httpx import in test_load_balancer_sls_stub.py
- Apply consistent formatting across codebase

* fix: Address PR #131 review feedback

- Scanner: Use exact type name matching instead of substring matching
  - Whitelist specific resource types to avoid false positives
  - Prevents matching classes like 'MyServerlessHelper' or 'LoadBalancerUtils'

- Type hints: Use Optional[str] for nullable fields in manifest
  - ManifestFunction.http_method and http_path now properly typed

- Timeout: Make HTTP client timeout configurable
  - Added LoadBalancerSlsStub.DEFAULT_TIMEOUT class attribute
  - Added timeout parameter to __init__
  - Updated both _execute_function and _execute_via_user_route to use self.timeout

- Deprecated datetime: Replace datetime.utcnow() with datetime.now(timezone.utc)
  - Updated manifest.py and test_lb_remote_execution.py
  - Ensures Python 3.12+ compatibility

* style: Format datetime chaining for line length

* fix: LiveLoadBalancer template not serialized to RunPod GraphQL

The set_serverless_template model_validator was being overwritten by sync_input_fields
(both had mode="after"). In Pydantic v2, when two validators with the same mode are
defined in a class, only one is registered.

This caused templates to never be created from imageName, resulting in:
  "GraphQL errors: One of templateId, template is required to create an endpoint"

Solution:
- Move set_serverless_template validator from ServerlessResource base class to subclasses
  (ServerlessEndpoint and LoadBalancerSlsResource) where the validation is actually needed
- Keep helper methods (_create_new_template, _configure_existing_template) in base class
  for reuse
- Add comprehensive tests for LiveLoadBalancer template serialization

This allows:
1. Base ServerlessResource to be instantiated freely for testing/configuration
2. Subclasses (ServerlessEndpoint, LoadBalancerSlsResource) to enforce template
   requirements during deployment
3. Proper template serialization in GraphQL payload for RunPod API

Fixes: One of templateId, template is required to create an endpoint error when
deploying LiveLoadBalancer with custom image tags like runpod/tetra-rp-lb:local
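
One plausible mechanism for this kind of validator collision is plain class-body name shadowing; a minimal pure-Python illustration (names are hypothetical, no Pydantic dependency — this is a sketch of the failure mode, not the actual code):

```python
# A class body is an ordinary dict, so a second definition bound to the
# same attribute name silently replaces the first. Names illustrative only.
class Resource:
    def _validate(self):            # stand-in for set_serverless_template
        return "set_serverless_template"

    def _validate(self):            # stand-in for sync_input_fields; wins
        return "sync_input_fields"

assert Resource()._validate() == "sync_input_fields"
```

Moving the validator into each subclass sidesteps the collision while keeping the shared helpers on the base class.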

* fix: LoadBalancer endpoint URL and add CPU support

- Fix: Use correct endpoint URL format for load-balanced endpoints
  (https://{id}.api.runpod.ai instead of https://api.runpod.ai/v2/{id})
  This fixes 404 errors on /ping health check endpoints

- Feature: Add CPU LoadBalancer support
  * Create CpuLoadBalancerSlsResource for CPU-based load-balanced endpoints
  * Create CpuLiveLoadBalancer for local CPU LB development
  * Add TETRA_CPU_LB_IMAGE constant for CPU LB Docker image
  * Update example code to use CpuLiveLoadBalancer for CPU worker
  * Add 8 comprehensive tests for CPU LoadBalancer functionality

- Tests: Add 2 tests for endpoint URL format validation
- All 474 tests passing, 64% code coverage
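
The two URL schemes contrasted above can be sketched as simple formatters (endpoint id illustrative):

```python
def queue_endpoint_url(endpoint_id: str) -> str:
    """Queue-based serverless endpoints live under the /v2 API path."""
    return f"https://api.runpod.ai/v2/{endpoint_id}"

def load_balancer_url(endpoint_id: str) -> str:
    """Load-balanced endpoints get their own subdomain, so /ping resolves."""
    return f"https://{endpoint_id}.api.runpod.ai"

assert load_balancer_url("abc123") == "https://abc123.api.runpod.ai"
```

Hitting `/ping` on the `/v2/{id}` form explains the 404s this commit fixes.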

* fix: Export CpuLiveLoadBalancer and CpuLoadBalancerSlsResource from tetra_rp package

LoadBalancer resources were not being discovered by ResourceDiscovery because
the new CPU variants (CpuLiveLoadBalancer, CpuLoadBalancerSlsResource) were
not exported from the main tetra_rp package. This prevented undeploy from
picking up these resources.

Added exports to:
- TYPE_CHECKING imports for type hints
- __getattr__ function for lazy loading
- __all__ list for public API

This fixes the issue where 'flash undeploy list' could not find LoadBalancer
resources that were deployed with 'flash run --auto-provision'.
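
The lazy-export pattern referenced here (PEP 562 module `__getattr__`) can be sketched with a synthetic module — names are stand-ins for the real deferred imports:

```python
import types

# Sketch of the PEP 562 lazy-export pattern used in package __init__.py
# files: heavy imports are deferred until the name is first accessed.
pkg = types.ModuleType("tetra_rp_sketch")
pkg.__all__ = ["CpuLiveLoadBalancer", "CpuLoadBalancerSlsResource"]

def _lazy(name):
    if name in pkg.__all__:
        cls = type(name, (), {})   # stand-in for the real deferred import
        setattr(pkg, name, cls)    # cache so __getattr__ runs only once
        return cls
    raise AttributeError(f"module {pkg.__name__!r} has no attribute {name!r}")

pkg.__getattr__ = _lazy
assert pkg.CpuLiveLoadBalancer.__name__ == "CpuLiveLoadBalancer"
```

Discovery tools that iterate `__all__` (like ResourceDiscovery here) only see what is exported, which is why the missing entries broke `flash undeploy list`.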

* fix: Add API key authentication to LoadBalancer health check


The /ping endpoint for RunPod load-balanced endpoints requires an
Authorization header (derived from RUNPOD_API_KEY) for authentication.
Without it, the health check fails with 401 Unauthorized, causing
provisioning to time out.

This fix adds the Authorization header to the health check request if
the RUNPOD_API_KEY environment variable is available, allowing the
endpoint health check to succeed during provisioning.

Fixes issue where 'flash run --auto-provision' would fail even though
the endpoint was successfully created on RunPod.

* fix(lb): Exclude flashboot from CpuLoadBalancerSlsResource GraphQL payload

CpuLoadBalancerSlsResource was overriding _input_only without including flashboot,
causing it to be sent to the RunPod GraphQL API which doesn't accept this field.
This caused deployment to fail with: Field "flashboot" is not defined by type "EndpointInput".

* fix(lb): Expand CpuInstanceType.ANY to all CPU flavors in CpuLoadBalancerSlsResource

Add field_validator to expand [CpuInstanceType.ANY] to all available CPU instance
types (cpu3g, cpu3c, cpu5c variants). This matches the behavior in CpuServerlessEndpoint
and prevents deployment errors like 'instanceId must be in the format of flavorId-vcpu-ram'.

* refactor(cpu): Move instanceIds validator to CpuEndpointMixin

Move the instanceIds field_validator from CpuServerlessEndpoint to CpuEndpointMixin
so both CpuServerlessEndpoint and CpuLoadBalancerSlsResource share the same validator
that expands [CpuInstanceType.ANY] to all available CPU flavors. This eliminates
code duplication and ensures consistent behavior across all CPU endpoint types.
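
The shared expansion logic described in these two commits can be sketched as follows — the enum values are hypothetical placeholders (real instanceIds follow the `flavorId-vcpu-ram` format mentioned in the deployment error):

```python
from enum import Enum

class CpuInstanceType(str, Enum):
    ANY = "any"
    CPU3G_2_8 = "cpu3g-2-8"
    CPU3C_2_4 = "cpu3c-2-4"
    CPU5C_2_8 = "cpu5c-2-8"

def expand_instance_ids(ids):
    """Sketch of the shared mixin validator: expand [ANY] to all flavors."""
    if ids == [CpuInstanceType.ANY]:
        return [t for t in CpuInstanceType if t is not CpuInstanceType.ANY]
    return ids

assert CpuInstanceType.ANY not in expand_instance_ids([CpuInstanceType.ANY])
```

In the real code this would run as a Pydantic `field_validator` on `instanceIds`; putting it on a mixin is what lets both endpoint classes share it.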

* test: Update CPU instance test to reflect validator expansion

Update test_cpu_live_load_balancer_defaults_to_cpu_any to verify that
[CpuInstanceType.ANY] is correctly expanded to all available CPU instance types
by the field_validator in CpuEndpointMixin.

* fix(lb): Increase health check timeout from 5s to 15s

Load-balanced workers need more time to respond during cold starts and initialization.
RunPod docs recommend at least 10-15 second timeouts for health checks. Workers may
return 204 during initialization, which is normal and expected.

* fix(lb): Fix CPU load balancer template deployment error

Fixes two bugs in CpuLoadBalancerSlsResource that prevented CPU load balancers from deploying:

1. Added gpuCount and allowedCudaVersions to _input_only exclusion set to prevent
   GPU-specific fields from being sent to RunPod API
2. Overrode set_serverless_template() to call _sync_cpu_fields() first, ensuring
   GPU defaults are overridden to CPU-appropriate values (gpuCount=0)

The RunPod API was rejecting CPU load balancer templates because GPU-specific
fields were being included in the GraphQL payload. These changes align
CpuLoadBalancerSlsResource behavior with CpuServerlessEndpoint.

Also added comprehensive test coverage (30+ tests) to verify:
- GPU fields are correctly overridden to CPU defaults
- GPU fields are excluded from API payloads
- CPU-specific fields are properly included
- Consistency with CpuServerlessEndpoint behavior

* fix(drift): Exclude runtime fields from config hash to prevent false positives

Fixes false positive configuration drift detection by separating concerns:

1. Update ServerlessResource.config_hash to exclude runtime fields
   - Fields like template, templateId, aiKey, userId are API-assigned
   - Prevents false drift when same config is redeployed across processes
   - Now only hashes user-specified configuration

2. Add config_hash override to CpuLoadBalancerSlsResource
   - CPU load balancers hash only CPU-relevant fields
   - Excludes GPU-specific fields and runtime fields
   - Follows same pattern as CpuServerlessEndpoint

3. Fix _has_structural_changes to exclude template/templateId
   - CRITICAL: These runtime fields were causing false structural changes
   - Was forcing unnecessary redeployments despite update() being available
   - Now system correctly uses update() instead of undeploy+deploy

4. Make field serializers robust to handle string/enum values
   - Prevents serialization errors when fields are pre-converted to strings

5. Add comprehensive drift detection tests (16 tests)
   - Test hash stability with runtime field changes
   - Test exclusion of env, template, templateId, and other runtime fields
   - Test that actual config changes (image, flashboot) are detected
   - Test structural change detection behavior
   - Test real-world deployment scenarios

Results:
- Same config deployed multiple times: no false drift
- Different env vars with same config: no false drift
- Template/templateId changes: no false drift
- API-assigned fields: no false drift
- User config changes (image, flashboot): drift detected correctly
- All 512 unit tests pass
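
The hashing idea behind this drift fix can be sketched like so (field names illustrative, not the actual implementation): API-assigned runtime fields are dropped before hashing, so redeploying the same user config never looks like drift.

```python
import hashlib
import json

EXCLUDED_HASH_FIELDS = {"template", "templateId", "aiKey", "userId"}

def config_hash(config: dict) -> str:
    """Hash only user-specified fields, with stable key ordering."""
    user_config = {k: v for k, v in config.items()
                   if k not in EXCLUDED_HASH_FIELDS}
    payload = json.dumps(user_config, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

a = config_hash({"image": "x:1", "templateId": "t-1"})
b = config_hash({"image": "x:1", "templateId": "t-2"})
assert a == b                                # runtime field change: no drift
assert a != config_hash({"image": "x:2"})    # real config change: drift
```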

* fix(http): Standardize RunPod HTTP client authentication across codebase

Centralizes HTTP client creation for RunPod load-balanced endpoints to prevent
manual Authorization header code duplication and ensure consistent authentication:

1. Create centralized HTTP utility function (src/tetra_rp/core/utils/http.py)
   - New function: get_authenticated_httpx_client()
   - Automatically adds Bearer token Authorization header if RUNPOD_API_KEY set
   - Provides consistent timeout handling (default 30s, customizable)
   - Follows existing GraphQL/REST client authentication pattern

2. Fix critical authentication bug in LoadBalancerSlsStub._execute_via_user_route()
   - Previously: Missing Authorization header (401 errors on user routes)
   - Now: Uses centralized utility for proper authentication
   - Enables direct HTTP calls to user-defined routes with auth

3. Refactor two methods to use centralized utility
   - LoadBalancerSlsStub._execute_function() - removes 7+ lines of manual auth code
   - LoadBalancerSlsResource._check_ping_endpoint() - simplifies auth setup

4. Add comprehensive unit tests (tests/unit/core/utils/test_http.py)
   - Tests API key presence/absence handling
   - Tests custom and default timeout configuration
   - Tests edge cases (empty key, zero timeout)
   - All 7 tests pass with 100% coverage

Results:
- Single source of truth for HTTP authentication (centralized utility)
- Fixes 401 Unauthorized errors on load-balanced endpoints
- Eliminates repetitive manual auth code across 3+ locations
- Easier to maintain and update authentication patterns in future
- All 499 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

* feat(http): Extend HTTP utilities to cover both sync and async authentication

Extends the centralized HTTP authentication pattern to all RunPod API calls:

1. Add get_authenticated_requests_session() for synchronous requests
   - Creates requests.Session with automatic Bearer token Authorization header
   - Follows same pattern as async get_authenticated_httpx_client()
   - Single source of truth for sync HTTP authentication

2. Refactor template.py to use centralized utility
   - Removes manual Authorization header setup (line 86)
   - Now uses get_authenticated_requests_session() for all template updates
   - Improves error handling with raise_for_status()
   - Token parameter marked deprecated; uses RUNPOD_API_KEY env var

3. Add comprehensive tests for sync utility (4 tests)
   - Tests API key presence/absence handling
   - Tests empty API key edge case
   - Tests Session object validation
   - All tests pass with proper cleanup

Benefits:
- True single source of truth for all RunPod HTTP authentication (sync + async)
- Consistent patterns across entire codebase
- Easier future auth changes across all HTTP client types
- Eliminates manual auth header code in template.py
- All 503 unit tests pass
- Code coverage: 64% (exceeds 35% requirement)

Note: requests.Session doesn't support default timeouts; timeout should be
specified per request (e.g., session.post(url, json=data, timeout=30.0))

* fix: Address PR feedback on HTTP utilities implementation

Addresses three feedback items from code review:

1. Fix breaking parameter order change in update_system_dependencies()
   - Restored original parameter order: template_id, token, system_dependencies, base_entry_cmd
   - Maintains backward compatibility with existing callers
   - Token parameter now optional (default None)

2. Add proper deprecation warning for token parameter
   - Issues DeprecationWarning when token parameter is used
   - Clearly communicates migration to RUNPOD_API_KEY environment variable
   - Follows Python deprecation best practices (warnings.warn with stacklevel=2)

3. Standardize test mocking approach across all health check tests
   - All tests now use consistent 'tetra_rp.core.utils.http.httpx.AsyncClient' patching
   - Removed inconsistent 'side_effect=lambda' pattern
   - Improved test maintainability by using same strategy everywhere

All 503 tests pass with consistent, clean implementation.
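
The deprecation pattern from item 2 can be sketched as follows (the signature is illustrative, matching the parameter order described above):

```python
import warnings

def update_system_dependencies(template_id, token=None,
                               system_dependencies=None, base_entry_cmd=None):
    """Hypothetical signature sketch showing the deprecation pattern."""
    if token is not None:
        warnings.warn(
            "'token' is deprecated; set the RUNPOD_API_KEY environment "
            "variable instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
    # ... rest of the update logic would go here ...
```

`stacklevel=2` is the key detail: the warning points at the deprecated call site rather than at the library internals.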

* refactor(drift): Extract runtime field constants and improve maintainability

- Extract RUNTIME_FIELDS and EXCLUDED_HASH_FIELDS as ClassVar constants in ServerlessResource for centralized field list management
- Add clarifying comments to enum serializers explaining defensive isinstance() checks for nested model serialization
- Document CPU load balancer field list coupling in docstring with maintenance guidelines
- Add TestSerializerDefensiveBehavior class with 4 tests verifying pre-stringified enum value handling
- Use ClassVar annotation to satisfy Pydantic v2 model field requirements

This reduces maintenance burden by centralizing field definitions and improves code clarity without changing functionality.

* docs: Improve LoadBalancer documentation accuracy and completeness

- Fix health check timeout: Add clarification that timeout is 15 seconds per check
- Add HTTP authentication details explaining RUNPOD_API_KEY header injection
- Document stub decision logic for incomplete routing metadata (fallback behavior)
- Clarify function signature inspection with concrete example showing parameter mapping
- Expand /execute security explanation with explicit threats and best practices
- Add detailed parameter type constraints for deployed endpoints (supported vs unsupported)
- Add troubleshooting guide for missing routing metadata (404 errors)
- Strengthen security warnings about never exposing /execute in production

All documentation now matches actual implementation verified through codebase analysis.

* docs: add resource config drift detection documentation

- comprehensive guide on drift detection implementation
- covers hash computation, field exclusion, and cpu-specific behavior
- includes testing patterns and troubleshooting guide
- documents all fields that trigger drift vs those ignored

* docs: proper name for the file

* test(build): Add comprehensive test coverage for scanner and handler improvements

- Add 6 new scanner tests for directory filtering (.venv, .flash, .runpod exclusion)
- Add test for resource type validation to prevent false positives
- Add test for fallback behavior when resource name extraction fails
- Add test for handling resource names with special characters
- Update existing tests to reflect new dynamic import format and resource name extraction

These tests guarantee that improvements to the scanner (resource type validation,
directory filtering, fallback behavior) and handler generator (dynamic imports for
invalid Python identifiers) won't regress in future changes.

* test(scanner): Fix resource type assertions to match scanner behavior

The scanner now extracts resource names from the name= parameter rather
than using variable names. Update test assertions to expect the actual
resource names ('test-api', 'deployed-api') instead of variable names.

* chore: merge correction

* fix(drift): Remove manual undeploy/deploy from update() method

Use saveEndpoint mutation for all changes instead of manual lifecycle
management. Server-side automatically detects version-triggering fields
(GPU, template, volumes) and increments endpoint version accordingly.

Keep _has_structural_changes() as informational for logging purposes only.
This aligns with RunPod API's version-based deployment model.

* docs(drift): Clarify _has_structural_changes detects version-triggering changes

Update docstring to reflect that this method identifies changes that
trigger server-side version increment and worker recreation, not manual
redeploy cycles. Explain which changes are version-triggering vs rolling
updates, and note that the method is now informational for logging only.

* feat(drift): Enable environment variable drift detection

Remove env from EXCLUDED_HASH_FIELDS so changes to environment variables
trigger drift detection and endpoint updates. Environment changes are
non-version-triggering (rolling updates), so server will apply them via
saveEndpoint without recreating workers.

Add env to CPU LoadBalancer config_hash for consistent behavior across
all resource types. Update comments to reflect that env is user-specified
configuration, not dynamically computed.

* test(drift): Update tests for environment variable drift detection

- test_lb_config_hash_excludes_env_variables → test_lb_config_hash_detects_env_changes
- test_env_var_changes_no_drift → test_env_var_changes_trigger_drift
- test_config_hash_excludes_env_from_drift → test_config_hash_detects_env_from_drift

Update assertions to expect different hashes when env changes, matching
new behavior where environment variable changes trigger drift and updates.

* fix: Address Copilot review feedback on type hints and documentation

- Fix type annotation for timeout parameter in LoadBalancerSlsStub (Optional[float])
- Replace hardcoded "30s" with actual self.timeout in error messages (2 locations)
- Update Resource_Config_Drift_Detection.md to reflect actual EXCLUDED_HASH_FIELDS
- Remove duplicate Load-Balanced Endpoints section from README.md

Addresses Copilot review comments (PR #132, review 3642596664)

* chore: Update Python version compatibility to 3.10-3.14

- Drop Python 3.9 support (EOL)
- Ensure support for Python 3.14
- Update requires-python in pyproject.toml from >=3.9,<3.14 to >=3.10,<3.15
- Update mypy python_version from 3.9 to 3.10
- Update CI matrix to test Python 3.10, 3.11, 3.12, 3.13, 3.14

* chore: Increase code coverage requirement to 65%

* perf(tests): make parallel test execution the default

Implement AE-1748 by making parallel test execution the default for all quality checks,
achieving a 4.6x speedup (from ~96s to ~20s on 12-core machines).

Changes:
- Configure pytest-xdist for parallel test execution
- Add worker isolation fixtures to prevent file system conflicts
- Mark concurrency tests (~26 tests) as serial to avoid race conditions
- Update Makefile to make parallel execution the default
- Provide serial execution commands for debugging (quality-check-serial)

Performance:
- make quality-check: 96s → 20s (4.6x faster)
- All 719 tests pass in both parallel and serial modes
- Coverage maintained at 64%+

Technical details:
- Worker-specific temp directories via worker_temp_dir fixture
- Module-level cache clearing in reset_singletons
- State file isolation per worker via isolate_resource_state_file
- Serial markers on threading-specific tests

Rollback: Use `make quality-check-serial` if parallel execution causes issues
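
The worker-isolation fixture mentioned above can be sketched from pytest-xdist's `PYTEST_XDIST_WORKER` environment variable (set to e.g. `gw0` on each worker, absent in serial runs); the directory naming here is illustrative:

```python
import os
import tempfile
from pathlib import Path

def worker_temp_dir() -> Path:
    """Per-xdist-worker temp dir; falls back to 'master' when not parallel."""
    worker = os.environ.get("PYTEST_XDIST_WORKER", "master")
    path = Path(tempfile.gettempdir()) / f"tetra_tests_{worker}"
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Giving each worker its own directory is what prevents the file-system conflicts that made parallel runs flaky.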

* refactor: remove dead code and add serialization tests

Remove unused functions and improve test coverage:
- Remove deprecated update_system_dependencies from template.py
- Remove unused utility functions from utils.py and json.py
- Add comprehensive test suite for serialization module (100% coverage)

Tests cover serialization/deserialization of args, kwargs, and error handling
for cloudpickle failures across Python 3.10-3.14.

* fix: regenerate uv.lock with correct dependency versions

The previous uv.lock was corrupted with an incomplete pytest-xdist==3.8.0 entry
that referenced pytest==8.4.2 which wasn't locked. Regenerating the lock file
resolves the CI/CD dependency installation failures across all Python versions.

* fix: mark TestLoadBalancerSlsStubRouting as serial

The @remote decorator used in TestLoadBalancerSlsStubRouting modifies module-level
state and can cause race conditions when run in parallel. Mark this test class as
serial to prevent flaky failures, particularly on Python 3.10.

* fix: simplify parallel test execution - remove unnecessary two-pass approach

All tests pass with xdist parallel execution without needing to filter serial
tests. pytest-xdist handles workers independently and coverage merges properly.
Simplified Makefile to use single -n auto command for all test runs.

* fix: re-add serial marker for TestLoadBalancerSlsStubRouting

The @remote decorator modifies module-level state that isn't properly isolated
between parallel workers. Adding the serial marker prevents race conditions on
Python 3.12 and 3.14. pytest-xdist respects the serial marker automatically.

* fix: implement proper serial test handling with two-pass execution

Add pytest hook to mark serial tests with xdist_group so they run without
parallelization. Use two-pass test execution:
1. Parallel: Run all non-serial tests with -n auto
2. Serial: Run serial tests without parallelization, appending coverage

This ensures:
- No race conditions in serial tests (file locking, @remote decorator)
- Coverage properly merged across both passes
- Maintains ~4.6x speedup for non-serial tests

* fix: implement proper serial test handling with two-pass execution

Add pytest hook to mark serial tests with xdist_group so they run without
parallelization. Use two-pass test execution:
1. Parallel: Run all non-serial tests with -n auto (--cov-fail-under=0)
2. Serial: Run serial tests without parallelization, appending coverage

This ensures:
- No race conditions in serial tests (file locking, @remote decorator)
- Coverage properly merged across both passes
- Maintains ~4.6x speedup for non-serial tests
- Both passes complete even if first has < 65% coverage

* chore: consistent coverage failure point

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore: this is about reporting coverage (no need to fail)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore: don't know why it was 64

* chore: make test commands parallel by default with serial variants

- All test commands now run in parallel by default using pytest-xdist auto-detect
- Serial versions available with -serial suffix for debugging
- test-parallel, test-parallel-workers, test-unit-parallel removed in favor of cleaner naming
- test-workers added as shorthand for specifying worker count
- test-fast now includes parallel execution
- Quality check commands already use parallel-by-default test-coverage

* test: add comprehensive test coverage for json, init, and resource modules

- Add 26 tests for json.py normalization utility (100% coverage)
- Add 16 tests for init.py CLI command (91% coverage)
- Add 19 tests for resource.py CLI command (85% coverage)

Total: 61 new tests covering JSON serialization, project initialization,
and resource status reporting. Increases project coverage from 64.72% to 66.85%.

* fix: add coverage configuration for parallel test execution

- Add [tool.coverage.run] with parallel mode enabled for pytest-xdist
- Add [tool.coverage.report] with proper exclude patterns
- Add [tool.coverage.paths] to handle different installation paths
- Implement normalize_for_json utility function for JSON serialization

This fixes the coverage discrepancy between parallel and serial test execution.
Parallel now reports 68.43%, matching serial execution at 68.59% (within 0.16%).
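
The pyproject.toml sections described above might look roughly like this (paths and patterns illustrative, not the actual file):

```toml
[tool.coverage.run]
parallel = true                # each xdist worker writes .coverage.<suffix>
source = ["src/tetra_rp"]      # illustrative source path

[tool.coverage.report]
exclude_lines = ["pragma: no cover", "if TYPE_CHECKING:"]

[tool.coverage.paths]
source = ["src/tetra_rp", "*/site-packages/tetra_rp"]
```

With `parallel = true`, coverage.py writes one data file per worker and merges them on `coverage combine`, which is what closes the gap between parallel and serial numbers.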

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>