feat: add PINNED_MODELS and PRELOAD_API_KEY for preload on serverless#2048
Merged
Conversation
- PRELOAD_API_KEY: dedicated API key for model preloading so the global API_KEY doesn't need to be set on user-facing deployments (falls back to API_KEY when not set)
- PINNED_MODELS: comma-separated model IDs that are always preloaded at startup (bypassing the LAMBDA/GCP_SERVERLESS gate) and pinned in the LRU cache so they are never evicted by size or memory pressure limits
- Improved preload logging with timing and resolved model IDs
- Call self.model_manager.add_model() directly instead of the model_add route handler (which doesn't exist when GCP_SERVERLESS=True)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
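The fallback and parsing behaviour described above can be sketched as follows. This is a minimal illustration only: `resolve_preload_api_key` and `parse_pinned_models` are hypothetical helper names, not the actual functions in `inference/core/env.py`.

```python
import os


def resolve_preload_api_key() -> "str | None":
    # PRELOAD_API_KEY wins when set; otherwise fall back to the
    # global API_KEY (matching the fallback described in the PR).
    return os.getenv("PRELOAD_API_KEY") or os.getenv("API_KEY")


def parse_pinned_models() -> "list[str]":
    # PINNED_MODELS is a comma-separated list of model IDs;
    # strip whitespace and drop empty segments.
    raw = os.getenv("PINNED_MODELS", "")
    return [m.strip() for m in raw.split(",") if m.strip()]
```

Either helper returns a harmless default when the variable is unset, so non-serverless deployments that never define these variables are unaffected.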
PawelPeczek-Roboflow approved these changes on Feb 27, 2026
Summary
- PRELOAD_API_KEY: Dedicated API key for model preloading. On user-facing deployments, setting API_KEY globally causes unintended side effects (fallback auth on unauthenticated requests, billing attribution, model-access changes). PRELOAD_API_KEY provides the credential needed for model download during startup without affecting per-request behaviour. Falls back to API_KEY when not set.
- PINNED_MODELS: Comma-separated list of model IDs that are always preloaded at startup (bypassing the LAMBDA/GCP_SERVERLESS gate) and pinned in the LRU cache so they are never evicted under size or memory pressure. This replaces the need for a separate FORCE_PRELOAD flag.
- model_manager.add_model() call: The model_add route handler is defined inside if not (LAMBDA or GCP_SERVERLESS), so it doesn't exist on serverless deployments. Preload now calls the underlying method directly.

New env vars
- PRELOAD_API_KEY (falls back to API_KEY when unset)
- PINNED_MODELS

Example usage
```shell
# User-facing deployment on GCP with GCP_SERVERLESS=True:
PINNED_MODELS=sam2/hiera_large,sam3/sam3_final,sam3/sam3_interactive
PRELOAD_API_KEY=rf_your_api_key
```

Files changed
- inference/core/env.py — new PRELOAD_API_KEY and PINNED_MODELS env vars
- inference/core/interfaces/http/http_api.py — preload gate uses PRELOAD_API_KEY and PINNED_MODELS, calls model_manager.add_model() directly, improved logging
- inference/core/managers/decorators/fixed_size_cache.py — pin_model() method + eviction skip logic

Test plan
- Set PINNED_MODELS + PRELOAD_API_KEY with GCP_SERVERLESS=True, verify preloading works
- /model/registry shows preloaded model after startup
- Exceed MAX_ACTIVE_MODELS and verify pinned models are NOT evicted
- PRELOAD_MODELS still works with API_KEY on non-serverless (backward compat)
- Verify preloading works when only PRELOAD_API_KEY is set

🤖 Generated with Claude Code
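As context for the eviction test above, the "pinned models are never evicted" behaviour can be sketched as an LRU cache whose eviction pass skips pinned keys. This is a minimal, self-contained sketch: `PinnedLRU` is a hypothetical class, not the actual decorator in `fixed_size_cache.py`.

```python
from collections import OrderedDict


class PinnedLRU:
    """Toy LRU cache in which pinned entries are never eviction candidates."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._entries: "OrderedDict[str, object]" = OrderedDict()
        self._pinned: "set[str]" = set()

    def pin(self, key: str) -> None:
        # Mark a model ID as pinned so eviction skips it.
        self._pinned.add(key)

    def add(self, key: str, value: object) -> None:
        # Insert (or refresh) an entry as most-recently-used, then
        # enforce the size budget.
        self._entries[key] = value
        self._entries.move_to_end(key)
        self._evict_if_needed()

    def _evict_if_needed(self) -> None:
        # Evict least-recently-used UNPINNED entries until within budget.
        while len(self._entries) > self.max_size:
            victim = next(
                (k for k in self._entries if k not in self._pinned), None
            )
            if victim is None:
                # Everything remaining is pinned: tolerate the overflow
                # rather than evict a pinned model.
                break
            del self._entries[victim]
```

Under this scheme a pinned model survives even when the model count exceeds `max_size`, which is the property the MAX_ACTIVE_MODELS test checks.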