feat: add PINNED_MODELS and PRELOAD_API_KEY for preload on serverless #2048

Merged: hansent merged 2 commits into main from feat/preload-pinned-models on Feb 27, 2026
hansent (Collaborator) commented Feb 27, 2026

Summary

  • PRELOAD_API_KEY: Dedicated API key for model preloading. On user-facing deployments, setting API_KEY globally causes unintended side effects (fallback auth on unauthenticated requests, billing attribution, model-access changes). PRELOAD_API_KEY provides the credential needed for model download during startup without affecting per-request behaviour. Falls back to API_KEY when not set.
  • PINNED_MODELS: Comma-separated list of model IDs that are always preloaded at startup (bypassing the LAMBDA/GCP_SERVERLESS gate) and pinned in the LRU cache so they are never evicted under size or memory pressure. This replaces the need for a separate FORCE_PRELOAD flag.
  • Improved preload logging: Load timing and the resolved model ID are logged for each preloaded model.
  • Direct model_manager.add_model() call: The model_add route handler is defined inside if not (LAMBDA or GCP_SERVERLESS), so it doesn't exist on serverless deployments. Preload now calls the underlying method directly.
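
The pinning behaviour described for PINNED_MODELS can be illustrated with a toy cache. This is a minimal sketch, not the code in fixed_size_cache.py: the class name `PinnedLRUCache`, its `add`/`keys` methods, and the choice to allow overflow when only pinned entries remain are all assumptions made for illustration. Only the `pin_model()` name comes from this PR.

```python
from collections import OrderedDict


class PinnedLRUCache:
    """Toy LRU cache (hypothetical) whose pinned keys are skipped on eviction."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._items = OrderedDict()  # least recently used entry first
        self._pinned = set()

    def pin_model(self, model_id):
        # Mark a model as exempt from eviction.
        self._pinned.add(model_id)

    def add(self, model_id, model):
        if model_id in self._items:
            # Refresh recency for an entry that is already cached.
            self._items.move_to_end(model_id)
            self._items[model_id] = model
            return
        self._items[model_id] = model
        self._evict_if_needed()

    def _evict_if_needed(self):
        while len(self._items) > self.max_size:
            # Pick the least recently used entry that is not pinned.
            victim = next((k for k in self._items if k not in self._pinned), None)
            if victim is None:
                break  # assumption: overflow is allowed when only pinned entries remain
            del self._items[victim]

    def keys(self):
        return list(self._items)


cache = PinnedLRUCache(max_size=2)
cache.add("a", object())
cache.pin_model("a")
cache.add("b", object())
cache.add("c", object())  # "b" is evicted, the pinned "a" stays
print(cache.keys())  # ['a', 'c']
```

Without the pin, "a" would be the least recently used entry and would be the first to go.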

New env vars

| Variable | Values | Default |
| --- | --- | --- |
| PRELOAD_API_KEY | Roboflow API key for preloading | Falls back to API_KEY |
| PINNED_MODELS | Comma-separated model IDs | unset |
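
One way the two variables might be read is sketched below. The helper names `resolve_preload_api_key` and `parse_pinned_models` are hypothetical and the actual code in inference/core/env.py may differ. In particular, treating an empty PRELOAD_API_KEY as "not set" and stripping whitespace around model IDs are assumptions.

```python
import os


def resolve_preload_api_key(env=os.environ):
    """Return PRELOAD_API_KEY, falling back to API_KEY when it is unset or empty."""
    return env.get("PRELOAD_API_KEY") or env.get("API_KEY")


def parse_pinned_models(env=os.environ):
    """Split PINNED_MODELS on commas, dropping blank entries and stray whitespace."""
    raw = env.get("PINNED_MODELS", "")
    return [model_id.strip() for model_id in raw.split(",") if model_id.strip()]


example_env = {
    "API_KEY": "rf_global_key",
    "PINNED_MODELS": "sam2/hiera_large, sam3/sam3_final",
}
print(resolve_preload_api_key(example_env))  # rf_global_key
print(parse_pinned_models(example_env))      # ['sam2/hiera_large', 'sam3/sam3_final']
```

Passing the environment as a mapping keeps the helpers easy to test without touching the real process environment.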

Example usage

# User-facing deployment on GCP with GCP_SERVERLESS=True:
PINNED_MODELS=sam2/hiera_large,sam3/sam3_final,sam3/sam3_interactive
PRELOAD_API_KEY=rf_your_api_key
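
The preload gate and timing log described in the summary could look roughly like the sketch below. It rests on several assumptions: `models_to_preload`, `preload`, and `StubModelManager` are hypothetical names, the stub's two-argument `add_model(model_id, api_key)` stands in for the real model manager call, and the log wording is invented. The only behaviour taken from this PR is that pinned models load regardless of the serverless flag while PRELOAD_MODELS entries load only on non-serverless deployments.

```python
import time


def models_to_preload(preload_models, pinned_models, serverless):
    """Pinned models always load; PRELOAD_MODELS entries only when not serverless."""
    selected = list(pinned_models)
    if not serverless:
        selected += [m for m in preload_models if m not in selected]
    return selected


class StubModelManager:
    """Stand-in for the real model manager (hypothetical, records calls only)."""

    def __init__(self):
        self.loaded = []

    def add_model(self, model_id, api_key):
        self.loaded.append((model_id, api_key))


def preload(manager, model_ids, api_key, log=print):
    """Load each model directly through the manager and log how long it took."""
    for model_id in model_ids:
        start = time.perf_counter()
        manager.add_model(model_id, api_key)
        elapsed = time.perf_counter() - start
        log(f"Preloaded {model_id} in {elapsed:.2f}s")


manager = StubModelManager()
to_load = models_to_preload(
    preload_models=["some/model"],
    pinned_models=["sam2/hiera_large"],
    serverless=True,
)
preload(manager, to_load, api_key="rf_your_api_key")
```

With `serverless=True`, only the pinned model is loaded in this sketch; the PRELOAD_MODELS entry is skipped by the gate.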

Files changed

  • inference/core/env.py: new PRELOAD_API_KEY and PINNED_MODELS env vars
  • inference/core/interfaces/http/http_api.py: preload gate uses PRELOAD_API_KEY and PINNED_MODELS, calls model_manager.add_model() directly, improved logging
  • inference/core/managers/decorators/fixed_size_cache.py: pin_model() method + eviction skip logic

Test plan

  • Verify server starts normally without new env vars set (backward compat)
  • Set PINNED_MODELS + PRELOAD_API_KEY with GCP_SERVERLESS=True, verify preloading works
  • Verify logs show model loading with timing
  • Verify /model/registry shows preloaded model after startup
  • Load enough models to exceed MAX_ACTIVE_MODELS and verify pinned models are NOT evicted
  • Verify PRELOAD_MODELS still works with API_KEY on non-serverless (backward compat)
  • Verify per-request auth is unaffected when only PRELOAD_API_KEY is set

🤖 Generated with Claude Code

- PRELOAD_API_KEY: dedicated API key for model preloading so the global
  API_KEY doesn't need to be set on user-facing deployments (falls back
  to API_KEY when not set)
- PINNED_MODELS: comma-separated model IDs that are always preloaded at
  startup (bypassing the LAMBDA/GCP_SERVERLESS gate) and pinned in the
  LRU cache so they are never evicted by size or memory pressure limits
- Improved preload logging with timing and resolved model IDs
- Call self.model_manager.add_model() directly instead of the model_add
  route handler (which doesn't exist when GCP_SERVERLESS=True)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
hansent merged commit c692969 into main on Feb 27, 2026
51 of 57 checks passed
hansent deleted the feat/preload-pinned-models branch on February 27, 2026 17:05