
Default LLM Android model loading to mmap-only (no mlock) #18398

Merged
psiddh merged 1 commit into pytorch:main from psiddh:main
Mar 24, 2026

Conversation

@psiddh
Contributor

@psiddh psiddh commented Mar 23, 2026

On Android, ExecuTorch LLM apps previously used mmap+mlock to load .pte model files. While mmap memory-maps the file so pages are loaded on demand, mlock pins all mapped pages into physical RAM upfront, defeating mmap's lazy-loading benefit for large models (1-4GB). This creates a high risk of OOM kills on devices with 6-12GB of RAM shared across all apps.

Changes:

  • LlmModuleConfig.java: Add LOAD_MODE_* constants and loadMode field (default LOAD_MODE_MMAP) with builder method and getter
  • LlmModule.java: Thread loadMode through to JNI initHybrid; existing constructors default to LOAD_MODE_MMAP — no breaking change
  • jni_layer_llama.cpp: Accept loadMode from Java, map to C++ Module::LoadMode enum, pass to all runner creation paths (text, multimodal, QNN) instead of hardcoded MmapUseMlockIgnoreErrors

Apps needing the old behavior can pass LOAD_MODE_MMAP_USE_MLOCK_IGNORE_ERRORS.
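The configuration surface described above can be pictured with a small, self-contained Java sketch. Only `LOAD_MODE_MMAP`, `LOAD_MODE_MMAP_USE_MLOCK_IGNORE_ERRORS`, and the `loadMode(int)` builder method are named in this PR; the other constant names and all numeric values below are illustrative assumptions, not the real `LlmModuleConfig` definitions.

```java
// Hypothetical sketch of the loadMode builder surface this PR adds to
// LlmModuleConfig. Constant values are assumed for illustration only.
public class LoadModeSketch {
  // Assumed to mirror the C++ Module::LoadMode ordering.
  static final int LOAD_MODE_FILE = 0;
  static final int LOAD_MODE_MMAP = 1;
  static final int LOAD_MODE_MMAP_USE_MLOCK = 2;
  static final int LOAD_MODE_MMAP_USE_MLOCK_IGNORE_ERRORS = 3;

  static class Builder {
    // New default per this PR: mmap-only, no mlock pinning.
    private int loadMode = LOAD_MODE_MMAP;

    Builder loadMode(int loadMode) {
      this.loadMode = loadMode;
      return this;
    }

    int build() {
      return loadMode;
    }
  }

  public static void main(String[] args) {
    // Default: lazy mmap, pages faulted in on demand.
    int def = new Builder().build();
    // Opt back into the pre-PR mmap+mlock behavior.
    int legacy =
        new Builder().loadMode(LOAD_MODE_MMAP_USE_MLOCK_IGNORE_ERRORS).build();
    System.out.println(def + " " + legacy);
  }
}
```

Because existing constructors default to the mmap-only value, callers that never touch the builder keep working and simply gain the lazier loading behavior.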

@pytorch-bot

pytorch-bot bot commented Mar 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18398

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit ab3893c with merge base f7977a6:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 23, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@psiddh psiddh requested a review from GregoryComer March 23, 2026 22:18
@psiddh psiddh marked this pull request as ready for review March 23, 2026 22:24
@psiddh psiddh requested a review from kirklandsign as a code owner March 23, 2026 22:24
Copilot AI review requested due to automatic review settings March 23, 2026 22:24
Contributor

Copilot AI left a comment


Pull request overview

This PR adjusts Android ExecuTorch LLM model loading to default to mmap-only (no mlock), reducing OOM risk for large .pte models while still allowing apps to opt back into mlock behavior.

Changes:

  • Adds LOAD_MODE_* constants plus loadMode plumbing to LlmModuleConfig.
  • Threads loadMode from LlmModule through the JNI initHybrid boundary.
  • Updates jni_layer_llama.cpp to map Java loadMode ints to executorch::extension::Module::LoadMode and apply it across text, multimodal, and QNN runner creation paths.
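The int-to-enum conversion at the JNI boundary can be sketched in Java as well. The real mapping lives in jni_layer_llama.cpp and targets the C++ `executorch::extension::Module::LoadMode` enum; the enum variant names here follow the `MmapUseMlockIgnoreErrors` spelling the PR mentions, but the numeric values and the unknown-value fallback are assumptions.

```java
// Hypothetical Java rendering of the int -> LoadMode mapping that
// jni_layer_llama.cpp performs in C++. Values and fallback are assumed.
public class LoadModeMapping {
  enum LoadMode {
    File,
    Mmap,
    MmapUseMlock,
    MmapUseMlockIgnoreErrors
  }

  static LoadMode fromInt(int loadMode) {
    switch (loadMode) {
      case 0:
        return LoadMode.File;
      case 1:
        return LoadMode.Mmap;
      case 2:
        return LoadMode.MmapUseMlock;
      case 3:
        return LoadMode.MmapUseMlockIgnoreErrors;
      default:
        // Assumed fallback: unknown values degrade to the new mmap-only default.
        return LoadMode.Mmap;
    }
  }
}
```

Keeping the mapping in one place means the text, multimodal, and QNN runner creation paths can all consume the same resolved mode instead of each hardcoding `MmapUseMlockIgnoreErrors`.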

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Changed files:

  • extension/android/jni/jni_layer_llama.cpp: Accepts load_mode from Java, converts it to Module::LoadMode, and uses it in runner/module creation instead of hardcoding MmapUseMlockIgnoreErrors.
  • extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/llm/LlmModuleConfig.java: Introduces load mode constants, stores loadMode in the config, and exposes a builder method and getter.
  • extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/llm/LlmModule.java: Extends the native init signature and constructors to pass loadMode, defaulting existing constructors to mmap-only.


Comment on lines +227 to +230
    public Builder loadMode(int loadMode) {
      this.loadMode = loadMode;
      return this;
    }

Copilot AI Mar 23, 2026


New loadMode plumbing changes runtime behavior (defaulting to mmap-only vs mmap+mlock). There are existing Android instrumentation tests for LlmModule; please add coverage that constructs via LlmModuleConfig with a non-default loadMode and verifies load() succeeds (and/or that invalid modes are rejected once validation is added).

@psiddh psiddh merged commit 24d1337 into pytorch:main Mar 24, 2026
135 of 138 checks passed
