Default LLM Android model loading to mmap-only (no mlock)#18398
psiddh merged 1 commit into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18398
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 2 Unrelated Failures as of commit ab3893c with merge base f7977a6.
NEW FAILURE - the following job has failed. BROKEN TRUNK - the following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
This PR adjusts Android ExecuTorch LLM model loading to default to mmap-only (no mlock), reducing OOM risk for large .pte models while still allowing apps to opt back into mlock behavior.
Changes:
- Adds `LOAD_MODE_*` constants plus `loadMode` plumbing to `LlmModuleConfig`.
- Threads `loadMode` from `LlmModule` through the JNI `initHybrid` boundary.
- Updates `jni_layer_llama.cpp` to map Java `loadMode` ints to `executorch::extension::Module::LoadMode` and apply it across text, multimodal, and QNN runner creation paths.
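The int-to-enum conversion done in `jni_layer_llama.cpp` can be pictured with a small standalone sketch. The enum names mirror `executorch::extension::Module::LoadMode`, but the integer values and the mmap fallback shown here are illustrative assumptions, not the PR's exact code:

```java
// Standalone sketch of mapping the Java-side loadMode int to a LoadMode enum,
// in the spirit of what jni_layer_llama.cpp does for Module::LoadMode.
// The integer values and the MMAP fallback are assumptions for illustration.
public class LoadModeSketch {
  enum LoadMode { FILE, MMAP, MMAP_USE_MLOCK, MMAP_USE_MLOCK_IGNORE_ERRORS }

  static LoadMode fromInt(int loadMode) {
    switch (loadMode) {
      case 0: return LoadMode.FILE;
      case 1: return LoadMode.MMAP;
      case 2: return LoadMode.MMAP_USE_MLOCK;
      case 3: return LoadMode.MMAP_USE_MLOCK_IGNORE_ERRORS;
      default: return LoadMode.MMAP; // mmap-only is the new default
    }
  }

  public static void main(String[] args) {
    System.out.println(fromInt(1));  // MMAP
    System.out.println(fromInt(99)); // unknown value falls back to MMAP
  }
}
```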
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `extension/android/jni/jni_layer_llama.cpp` | Accepts `load_mode` from Java, converts it to `Module::LoadMode`, and uses it in runner/module creation instead of hardcoding `MmapUseMlockIgnoreErrors`. |
| `extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/llm/LlmModuleConfig.java` | Introduces load mode constants, stores `loadMode` in the config, and exposes a builder method and getter. |
| `extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/llm/LlmModule.java` | Extends the native init signature and constructors to pass `loadMode`, defaulting existing constructors to mmap-only. |
```java
public Builder loadMode(int loadMode) {
  this.loadMode = loadMode;
  return this;
}
```
The new `loadMode` plumbing changes runtime behavior (defaulting to mmap-only instead of mmap+mlock). There are existing Android instrumentation tests for `LlmModule`; please add coverage that constructs an `LlmModule` via `LlmModuleConfig` with a non-default `loadMode` and verifies that `load()` succeeds (and/or that invalid modes are rejected once validation is added).
On Android, ExecuTorch LLM apps previously used mmap+mlock to load .pte model files. While mmap memory-maps the file (pages loaded on demand), mlock pins all mapped pages into physical RAM upfront, defeating mmap's lazy-loading benefit for large models (1-4GB). This creates a high OOM-kill risk on devices with 6-12GB of RAM shared across all apps.

Changes:
- `LlmModuleConfig.java`: Add `LOAD_MODE_*` constants and a `loadMode` field (default `LOAD_MODE_MMAP`) with a builder method and getter.
- `LlmModule.java`: Thread `loadMode` through to the JNI `initHybrid`; existing constructors default to `LOAD_MODE_MMAP`, so there is no breaking change.
- `jni_layer_llama.cpp`: Accept `loadMode` from Java, map it to the C++ `Module::LoadMode` enum, and pass it to all runner creation paths (text, multimodal, QNN) instead of the hardcoded `MmapUseMlockIgnoreErrors`.

Apps needing the old behavior can pass `LOAD_MODE_MMAP_USE_MLOCK_IGNORE_ERRORS`.
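The opt-in pattern can be sketched with a self-contained mini builder. Only the constant names `LOAD_MODE_MMAP` and `LOAD_MODE_MMAP_USE_MLOCK_IGNORE_ERRORS` come from the PR; the integer values and the builder shape here are illustrative assumptions standing in for the real `LlmModuleConfig`:

```java
// Standalone sketch of a config builder that defaults to mmap-only and lets
// apps opt back into the old mmap+mlock behavior. The constant values and the
// Builder class are hypothetical stand-ins for LlmModuleConfig's real API.
public class ConfigSketch {
  static final int LOAD_MODE_MMAP = 1;                         // new default
  static final int LOAD_MODE_MMAP_USE_MLOCK_IGNORE_ERRORS = 3; // old behavior

  static class Builder {
    private int loadMode = LOAD_MODE_MMAP; // mmap-only unless overridden

    Builder loadMode(int loadMode) {
      this.loadMode = loadMode;
      return this;
    }

    int build() {
      return loadMode;
    }
  }

  public static void main(String[] args) {
    int defaultMode = new Builder().build();
    int legacyMode =
        new Builder().loadMode(LOAD_MODE_MMAP_USE_MLOCK_IGNORE_ERRORS).build();
    System.out.println(defaultMode); // 1 (mmap-only)
    System.out.println(legacyMode);  // 3 (mmap + mlock, ignore errors)
  }
}
```

The builder defaulting to `LOAD_MODE_MMAP` is what makes the change non-breaking: existing callers that never touch `loadMode` silently get the safer mmap-only path.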