Default LLM Android model loading to mmap-only (no mlock)#18398
psiddh merged 1 commit into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18398
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 2 Unrelated Failures as of commit ab3893c with merge base f7977a6.
NEW FAILURE - the following job has failed. BROKEN TRUNK - the following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
This PR adjusts Android ExecuTorch LLM model loading to default to mmap-only (no mlock), reducing OOM risk for large .pte models while still allowing apps to opt back into mlock behavior.
Changes:
- Adds `LOAD_MODE_*` constants plus `loadMode` plumbing to `LlmModuleConfig`.
- Threads `loadMode` from `LlmModule` through the JNI `initHybrid` boundary.
- Updates `jni_layer_llama.cpp` to map Java `loadMode` ints to `executorch::extension::Module::LoadMode` and apply it across text, multimodal, and QNN runner creation paths.
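The int-to-enum conversion done in `jni_layer_llama.cpp` can be pictured with a small standalone sketch. The enum names mirror `executorch::extension::Module::LoadMode`, but the integer values and the mmap fallback shown here are illustrative assumptions, not the PR's exact code:

```java
// Standalone sketch of mapping the Java-side loadMode int to a LoadMode enum,
// in the spirit of what jni_layer_llama.cpp does for Module::LoadMode.
// The integer values and the MMAP fallback are assumptions for illustration.
public class LoadModeSketch {
  enum LoadMode { FILE, MMAP, MMAP_USE_MLOCK, MMAP_USE_MLOCK_IGNORE_ERRORS }

  static LoadMode fromInt(int loadMode) {
    switch (loadMode) {
      case 0: return LoadMode.FILE;
      case 1: return LoadMode.MMAP;
      case 2: return LoadMode.MMAP_USE_MLOCK;
      case 3: return LoadMode.MMAP_USE_MLOCK_IGNORE_ERRORS;
      default: return LoadMode.MMAP; // mmap-only is the new default
    }
  }

  public static void main(String[] args) {
    System.out.println(fromInt(1));  // MMAP
    System.out.println(fromInt(99)); // unknown value falls back to MMAP
  }
}
```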
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `extension/android/jni/jni_layer_llama.cpp` | Accepts `load_mode` from Java, converts it to `Module::LoadMode`, and uses it in runner/module creation instead of hardcoding `MmapUseMlockIgnoreErrors`. |
| `extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/llm/LlmModuleConfig.java` | Introduces load mode constants, stores `loadMode` in the config, and exposes a builder method and getter. |
| `extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/llm/LlmModule.java` | Extends the native init signature and constructors to pass `loadMode`, defaulting existing constructors to mmap-only. |
```java
public Builder loadMode(int loadMode) {
  this.loadMode = loadMode;
  return this;
}
```
The new `loadMode` plumbing changes runtime behavior (defaulting to mmap-only instead of mmap+mlock). There are existing Android instrumentation tests for `LlmModule`; please add coverage that constructs an `LlmModule` via `LlmModuleConfig` with a non-default `loadMode` and verifies that `load()` succeeds (and/or that invalid modes are rejected once validation is added).
On Android, ExecuTorch LLM apps previously used mmap+mlock to load .pte model files. While mmap memory-maps the file (pages loaded on demand), mlock pins all mapped pages into physical RAM upfront, defeating mmap's lazy-loading benefit for large models (1-4GB). This creates a high OOM-kill risk on devices with 6-12GB of RAM shared across all apps.

Changes:
- `LlmModuleConfig.java`: Add `LOAD_MODE_*` constants and a `loadMode` field (default `LOAD_MODE_MMAP`) with a builder method and getter.
- `LlmModule.java`: Thread `loadMode` through to the JNI `initHybrid`; existing constructors default to `LOAD_MODE_MMAP`, so there is no breaking change.
- `jni_layer_llama.cpp`: Accept `loadMode` from Java, map it to the C++ `Module::LoadMode` enum, and pass it to all runner creation paths (text, multimodal, QNN) instead of the hardcoded `MmapUseMlockIgnoreErrors`.

Apps needing the old behavior can pass `LOAD_MODE_MMAP_USE_MLOCK_IGNORE_ERRORS`.
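The opt-in pattern can be sketched with a self-contained mini builder. Only the constant names `LOAD_MODE_MMAP` and `LOAD_MODE_MMAP_USE_MLOCK_IGNORE_ERRORS` come from the PR; the integer values and the builder shape here are illustrative assumptions standing in for the real `LlmModuleConfig`:

```java
// Standalone sketch of a config builder that defaults to mmap-only and lets
// apps opt back into the old mmap+mlock behavior. The constant values and the
// Builder class are hypothetical stand-ins for LlmModuleConfig's real API.
public class ConfigSketch {
  static final int LOAD_MODE_MMAP = 1;                         // new default
  static final int LOAD_MODE_MMAP_USE_MLOCK_IGNORE_ERRORS = 3; // old behavior

  static class Builder {
    private int loadMode = LOAD_MODE_MMAP; // mmap-only unless overridden

    Builder loadMode(int loadMode) {
      this.loadMode = loadMode;
      return this;
    }

    int build() {
      return loadMode;
    }
  }

  public static void main(String[] args) {
    int defaultMode = new Builder().build();
    int legacyMode =
        new Builder().loadMode(LOAD_MODE_MMAP_USE_MLOCK_IGNORE_ERRORS).build();
    System.out.println(defaultMode); // 1 (mmap-only)
    System.out.println(legacyMode);  // 3 (mmap + mlock, ignore errors)
  }
}
```

The builder defaulting to `LOAD_MODE_MMAP` is what makes the change non-breaking: existing callers that never touch `loadMode` silently get the safer mmap-only path.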