@trevin-lee trevin-lee commented Oct 13, 2025

PR description:

This PR introduces dynamic model loading/unloading capabilities and server health monitoring to the SONIC Triton integration in CMSSW. The main features include:

1. Dynamic Model Loading and Unloading:

  • Adds loadModel() and unloadModel() methods to TritonService for managing model lifecycle at runtime
  • Implements thread-safe model operations using mutex protection
  • Introduces DynamicModelLoadingProducer test module to validate dynamic model management
  • Models can be loaded on-demand and unloaded when no longer needed, improving resource utilization

4. Code Improvements:

  • Moves retry configuration options to customize.py for better configurability
  • Updates TritonClient with new constructor for testing and enhanced server connection methods
  • Improves logging for model operations and server health status
  • Refactors code for better maintainability and documentation

Expected Output Changes:

  • Users can now dynamically load and unload models during job execution
  • Improved resilience through automatic server health monitoring and failover
  • Better error handling and retry logic for transient server failures
  • Enhanced logging messages for model operations and server health

Dependencies:

  • Based on CMSSW_15_1_0_pre6
  • No external PR dependencies

Files Modified:

  • HeterogeneousCore/SonicCore/src/SonicClientBase.cc
  • HeterogeneousCore/SonicCore/plugins/BuildFile.xml
  • HeterogeneousCore/SonicTriton/interface/TritonService.h
  • HeterogeneousCore/SonicTriton/interface/TritonClient.h
  • HeterogeneousCore/SonicTriton/src/TritonService.cc
  • HeterogeneousCore/SonicTriton/src/TritonClient.cc
  • HeterogeneousCore/SonicTriton/src/RetryActionDiffServer.cc
  • HeterogeneousCore/SonicTriton/test/BuildFile.xml
  • HeterogeneousCore/SonicTriton/test/tritonTest_cfg.py

New Files:

  • HeterogeneousCore/SonicTriton/test/DynamicModelLoadingProducer.cc
  • HeterogeneousCore/SonicTriton/test/test_RetryActionDiffServer.cc

Removed Files:

  • HeterogeneousCore/SonicTriton/test/RetryActionDiffServer.cc (replaced with unit test)

PR validation:

Integration Tests:

  • Compiled successfully with scram b -j 8
  • No compilation warnings or errors
  • All modified code follows CMS coding standards

Known Issues:

  • Dynamic model loading tests require polling to be disabled in configuration
  • Some unit tests need mock server environment adjustments
  • Will be addressed in follow-up commits or configuration updates

Documentation:

  • Code includes inline documentation for new methods
  • Test configurations demonstrate usage patterns
  • README updates may be needed (can be done in follow-up)

Backport Information:

This PR is NOT a backport. It is intended for the CMSSW_15_1_X release cycle.

If backporting becomes necessary, it would target future release cycles after initial integration and validation in CMSSW_15_1_X.

Martin and others added 21 commits October 12, 2025 22:17
…r method in TritonClient. Update BuildFile.xml and fix formatting in header files.
…tructor for TritonClient, and update BuildFile.xml to include Catch2 for testing.
…lection; remove unused parameters and improve documentation.
- Introduced `loadModel` and `unloadModel` methods for managing model lifecycle.
- Added mutex for thread safety during model operations.
- Updated `TritonService` header and implementation to support dynamic model management.
- Enhanced logging for model loading and unloading processes.
- Updated test configurations to include dynamic model loading tests.
…requirements

- Modified input handling to utilize actual model input for "x" instead of dummy data.
- Adjusted shape and data allocation for input to meet base class expectations.
- Updated parameter set description method to use TritonClient for configuration.
@kpedro88

I have not looked at the test code yet because I think the logic in the TritonService needs to be addressed first.

Another general point: part of the idea for dynamic loading with the fallback server would be to get rid of the unservedModels_ list that is currently formed at the start of the job. The model repository folder for the fallback server should be created with all models known to the job included, but the fallback server should be launched in explicit model control mode (modifying the cmsTriton script). Then, whenever a module switches over to the fallback server, it should ask the fallback server to load its model. (Dynamic loading with remote servers may reuse some of the logic, but will be somewhat different and needs more thought.)
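
To make the proposed flow concrete, here is a minimal sketch of "switch to the fallback server, then ask it to load the model". It does not use the real TritonService or Triton client API: `FakeFallbackServer`, `ModuleState`, and `switchToFallback` are hypothetical names invented for illustration, and the fake server only mimics explicit model control mode (every model is in the repository, none is loaded at startup).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a fallback server started in explicit model control mode:
// every model known to the job is in the repository, but nothing is loaded at startup.
class FakeFallbackServer {
public:
  explicit FakeFallbackServer(std::set<std::string> repository) : repository_(std::move(repository)) {}

  // Mimics a load request; returns false if the model is not in the repository.
  bool load(const std::string& model) {
    if (repository_.count(model) == 0)
      return false;
    loaded_.insert(model);
    return true;
  }

  bool isLoaded(const std::string& model) const { return loaded_.count(model) != 0; }
  std::size_t numLoaded() const { return loaded_.size(); }

private:
  std::set<std::string> repository_;
  std::set<std::string> loaded_;
};

// Hypothetical per-module state: the model it needs and the server it currently uses.
struct ModuleState {
  std::string model;
  std::string server;
};

// When a module gives up on its current server, it asks the fallback to load its model
// and only switches over if that load succeeds.
bool switchToFallback(ModuleState& module, FakeFallbackServer& fallback) {
  if (!fallback.load(module.model))
    return false;
  assert(fallback.isLoaded(module.model));
  module.server = "fallback";
  return true;
}
```

With this shape there is no need to decide at the start of the job which models are unserved: the fallback server loads a model only when a module actually switches to it.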

- Applied scram b code-format to fix formatting issues
- Added TRITON_THROW_IF_ERROR for error handling in loadModel/unloadModel
- Fixed unload semantics to not erase tracking data on failure
- Added comments explaining dynamic loading test requirements
- Removed spurious formatting changes
- Removed loadedModels_ set, now derive loaded status from modelRefCount_ > 0
- Simplified loadModel() and unloadModel() to only work with fallback server
- Removed server searching logic since only fallback is supported
- Added documentation clarifying fallback-only limitation
- Updated error messages to reflect this architectural decision
- Fixed XML comment syntax in BuildFile.xml
…ading

- Updated retry logic to switch to the fallback server directly for model loading.
- Removed previous best server selection logic, simplifying the retry process.
- Added logging for fallback loading failures and re-evaluation after loading.
- Ensured dynamic loading is explicitly handled in TritonService with updated command options.
…in TritonService

- Introduced FallbackModelState to track dynamic model state, including reference count and model path.
- Updated loadModel and unloadModel methods to utilize FallbackModelState for managing model lifecycle.
- Enhanced logging to reflect model state changes and operations.
- Adjusted test configuration to reflect changes in module handling for dynamic model loading.
… handling

- Eliminated unservedModels_ from TritonService, simplifying model management.
- Updated addModel and preBeginJob methods to work directly with models_.
- Enhanced error handling in loadModel for fallback server scenarios.
- Improved comments for clarity on model path handling and fallback server usage.
- Move --model-control-mode flag from TritonService.cc to cmsTriton script
  with new -L option to disable (enabled by default)
- Remove path parameter from loadModel(); retrieve path from models_ map
- Use client's modelName() in DynamicModelLoadingProducer instead of
  separate testModelName/testModelPath parameters
- Restore getBestServer() logic in RetryActionDiffServer to try remote
  servers before falling back to local server with dynamic loading
- Remove unnecessary fallbackName variable in RetryActionDiffServer
- Restore default --modules value to TritonGraphProducer in test config
- Add clarifying comment for path fix-up in addModel()

Deferred for future work:
- Consolidate Model/FallbackModelState structs to eliminate refcount duplication
- Further examination of lock scope in loadModel/unloadModel
- Add RefCount.cc: unit test for refcount correctness with 7 test cases
  - Verifies multiple loads increment refcount without server calls
  - Verifies unloads only call server when refcount reaches 0
  - Verifies models are independent and paths are preserved
- Rename test_RetryActionDiffServer.cc to RetryActionDiffServer.cc
  to match naming convention of other test files
- Fix error message: fallback server not found is global, not model-specific
- Reorder checks: check startedFallback_ before checking servers_ map
- Wrap LoadModel/UnloadModel calls directly in TRITON_THROW_IF_ERROR
- Remove models_.emplace in loadModel (models already added during init)
- Remove unnecessary parentheses in --state.refCount
- Add refCount member to Model struct (removes duplication)
- Remove FallbackModelState struct (no longer needed)
- Remove modelRefCount_ map (redundant with Model.refCount)
- Rename fallbackModels_ to fallbackLoadedModels_ (now just a set of names)
- Update loadModel/unloadModel to operate on Model directly

This addresses review comment about unnecessary duplication between
Model and FallbackModelState objects.
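
The reference-counting behavior described in these commits can be sketched as follows. This is an illustration under stated assumptions, not the code in TritonService.cc: `FakeServer` and `ModelRegistry` are hypothetical stand-ins, the `Model` struct is reduced to a path and a `refCount`, and a single mutex guards the whole map.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the fallback server; counts requests so the effect is observable.
struct FakeServer {
  int loadCalls = 0;
  int unloadCalls = 0;
  bool failNextUnload = false;

  void load(const std::string&) { ++loadCalls; }
  void unload(const std::string&) {
    if (failNextUnload) {
      failNextUnload = false;
      throw std::runtime_error("unload failed");
    }
    ++unloadCalls;
  }
};

// Simplified model record with the reference count stored on the model itself.
struct Model {
  std::string path;
  unsigned refCount = 0;
};

class ModelRegistry {
public:
  explicit ModelRegistry(FakeServer& server) : server_(server) {}

  // Models are registered up front; loadModel() never inserts new entries.
  void addModel(const std::string& name, const std::string& path) {
    std::lock_guard<std::mutex> lock(mutex_);
    models_[name].path = path;
  }

  void loadModel(const std::string& name) {
    std::lock_guard<std::mutex> lock(mutex_);
    Model& model = models_.at(name);  // throws std::out_of_range for unknown models
    if (model.refCount == 0)
      server_.load(name);  // only the first user triggers a server call
    ++model.refCount;
  }

  void unloadModel(const std::string& name) {
    std::lock_guard<std::mutex> lock(mutex_);
    Model& model = models_.at(name);
    assert(model.refCount > 0);  // precondition: every unload matches an earlier load
    if (model.refCount == 1)
      server_.unload(name);  // may throw; the count is left untouched in that case
    --model.refCount;
  }

  unsigned refCount(const std::string& name) const {
    std::lock_guard<std::mutex> lock(mutex_);
    return models_.at(name).refCount;
  }

private:
  FakeServer& server_;
  mutable std::mutex mutex_;
  std::map<std::string, Model> models_;
};
```

Because the server request happens before the decrement, a failed unload leaves the count and the stored path as they were, which matches the "do not erase tracking data on failure" fix mentioned earlier in the commit list.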
;;
I) INSTANCES="$OPTARG"
;;
L) MODELCONTROL=""

to go back to the old behavior, the value should be "none" rather than empty string
(https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_management.md)


if (!startedFallback_) {
throw cms::Exception("TritonService")
<< "loadModel: fallback server not started; cannot load model '" << modelName << "'";

this message also does not need to reference the model name (the fallback server not being started is a global problem, not model-specific)

kpedro88 commented Jan 8, 2026

@trevin-lee thanks, this is in good shape now! Just a few more minor comments.

Further thoughts on mutex for load/unload: I have actually changed my mind somewhat. Because we are doing both find() and then insert() operations, locking at some level is necessary.

There is a more streamlined approach possible using tbb::concurrent_hash_map, which has an accessor feature that locks a specific slot in the map, instead of the whole map. That allows operations on different keys to proceed in parallel. However, since we actually insert() into multiple maps, synchronization might still be a challenge with this approach. Needs further thought/investigation.
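
For comparison, the per-key locking idea can be illustrated with the standard library alone. The sketch below is not TBB and not the PR's code: it swaps in one `std::mutex` per entry, and it assumes the map is filled during single-threaded initialization and only read afterwards. `LockedModel` and `PerKeyRegistry` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical entry type: the mutex guards only this entry, not the whole map.
struct LockedModel {
  std::mutex mutex;
  unsigned refCount = 0;
};

class PerKeyRegistry {
public:
  // Assumed to be called only during single-threaded initialization.
  void addModel(const std::string& name) { models_.emplace(name, std::make_unique<LockedModel>()); }

  // After initialization the map is only read, so calls for different names
  // do not block each other; calls for the same name serialize on the entry lock.
  unsigned acquire(const std::string& name) {
    LockedModel& model = *models_.at(name);
    std::lock_guard<std::mutex> lock(model.mutex);
    return ++model.refCount;
  }

  unsigned release(const std::string& name) {
    LockedModel& model = *models_.at(name);
    std::lock_guard<std::mutex> lock(model.mutex);
    assert(model.refCount > 0);  // precondition: every release matches an earlier acquire
    return --model.refCount;
  }

private:
  std::map<std::string, std::unique_ptr<LockedModel>> models_;
};
```

This covers a single container only. It does not address the open question above about keeping several maps consistent when one operation inserts into more than one of them.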
