Fixed the WASM integrity checking, XSS, prototype pollution, and RCE, and created the utils and scripts to generate and apply the hashes #771

Open
nextep wants to merge 7 commits into mlc-ai:main from nextep:main

nextep commented Feb 14, 2026

…ed the prototype buffer and fixed the RCE. Other improvements:

  • Created the generate-hashes and apply-hashes scripts
  • Updated the security test suite with the new security tests
  • Updated package.json to include the new hash-generation script
  • Created a generated chassis.jl script to be included as part of the build process.

@gemini-code-assist

Summary of Changes

Hello @nextep, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly strengthens the security posture of the @mlc-ai/web-llm library by addressing several critical vulnerabilities. It introduces a comprehensive integrity verification system for WebAssembly modules, safeguards against prototype pollution during configuration merging, and implements robust sanitization for configuration fields to prevent Cross-Site Scripting attacks. Additionally, new tooling has been developed to streamline the process of generating and applying model integrity hashes, ensuring ongoing security maintenance.

Highlights

  • WASM Integrity Verification: Implemented robust integrity checking for WASM files using SHA-384 hashes to prevent Remote Code Execution (RCE) via malicious WASM injection and persistent RCE through cache poisoning. The check runs before WASM instantiation, whether the module was fetched from the network or loaded from the cache.
  • Prototype Pollution Prevention: Introduced a safeDeepMerge utility function to securely merge configuration objects, explicitly blocking dangerous keys like __proto__, constructor, and prototype to prevent prototype pollution vulnerabilities.
  • Cross-Site Scripting (XSS) Protection: Added sanitizeString and sanitizeConfig utility functions to strip common XSS vectors (e.g., script tags, onXXX event handlers) from sensitive configuration fields, mitigating XSS risks.
  • Automated Model Hash Management: Created new utility scripts (generate_model_hashes.js, apply_model_hashes.js, prep_model_hashes.sh) to automate the generation and application of integrity hashes for model WASM files, ensuring that src/config.ts is kept up-to-date with verified hashes.
  • Enhanced Security Testing: Added a comprehensive security test suite (security.test.ts) to validate the effectiveness of the new integrity checks, prototype pollution prevention, and XSS sanitization mechanisms.
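A minimal sketch of SHA-384 integrity checking, assuming the recorded hashes use the Subresource Integrity style `sha384-<base64 digest>` (the exact string format the PR stores isn't shown here). It uses the Web Crypto `SubtleCrypto.digest` API, taken from `node:crypto`'s `webcrypto` so the sketch runs under Node; in a browser the same object is `globalThis.crypto.subtle`. The names mirror `computeSHA384` and `verifyIntegrity` from the changelog, but the bodies are illustrative, not the PR's code.

```typescript
import { webcrypto } from "node:crypto";

// Convert digest bytes to base64; btoa is a global in browsers and Node 16+.
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Hash the exact bytes that are about to be instantiated, formatted SRI-style.
async function computeSHA384(data: Uint8Array): Promise<string> {
  const digest = await webcrypto.subtle.digest("SHA-384", data);
  return `sha384-${toBase64(new Uint8Array(digest))}`;
}

// Compare against the recorded value; callers should refuse to instantiate on false.
async function verifyIntegrity(data: Uint8Array, expected: string): Promise<boolean> {
  return (await computeSHA384(data)) === expected;
}
```

In this sketch a caller would verify the buffer and throw before `WebAssembly.instantiate` on a mismatch, doing so on cache hits as well as network fetches, since cache poisoning is one of the threats listed above.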

Changelog
  • docs/SECURITY_FIXES.md
    • Added a detailed security hardening document outlining identified vulnerabilities and their respective fixes.
  • model_hashes.md
    • Added a markdown table listing model WASM files and their SHA-384 hashes.
  • package.json
    • Added a new script prep-models to automate model hash management.
  • scripts/apply_model_hashes.js
    • Added a script that reads model hashes from a JSON file and updates src/config.ts with integrity values.
  • scripts/generate_model_hashes.js
    • Added a script to fetch WASM files from URLs, compute their SHA-384 hashes, and output them to JSON and Markdown files.
  • scripts/prep_model_hashes.sh
    • Added a shell script to orchestrate the generation and application of model hashes.
  • src/config.ts
    • Updated the ModelRecord interface with new model_lib_integrity, chat_config_integrity, and tokenizer_integrity fields.
    • Populated model_lib_integrity for numerous prebuilt models with their respective SHA-384 hashes.
    • Removed deprecated Ministral 3 model entries.
  • src/engine.ts
    • Integrated new utility functions to verify WASM integrity before instantiation.
    • Replaced direct object spread with safeDeepMerge and sanitizeConfig for secure configuration loading.
    • Adjusted warning message formatting for maxStorageBufferBindingSize.
  • src/model_hashes.json
    • Added a JSON file containing computed SHA-384 hashes for various model WASM files.
  • src/utils.ts
    • Added computeSHA384 function for calculating SHA-384 hashes.
    • Added verifyIntegrity function to check buffer integrity against expected hashes.
    • Added safeDeepMerge function for deep merging objects while preventing prototype pollution.
    • Added sanitizeString function for basic XSS sanitization of strings.
    • Added sanitizeConfig function to recursively sanitize sensitive fields within configuration objects.
  • tests/security.test.ts
    • Added a new test suite to validate the implemented security fixes against WASM integrity, prototype pollution, and XSS vulnerabilities.
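The changelog describes `safeDeepMerge` as blocking `__proto__`, `constructor`, and `prototype`. The following is a minimal sketch of that idea, an assumption about the approach rather than the PR's code; it also recurses only into plain objects, so arrays and class instances are replaced rather than merged.

```typescript
// Keys that can reach Object.prototype or a constructor when merged.
const BLOCKED_KEYS = new Set(["__proto__", "constructor", "prototype"]);

type PlainObject = Record<string, unknown>;

// True only for object literals / null-prototype objects (not arrays or class instances).
function isPlainObject(value: unknown): value is PlainObject {
  if (value === null || typeof value !== "object" || Array.isArray(value)) return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

// Recursively merge `source` into a copy of `target`, skipping dangerous keys.
function safeDeepMerge<T extends PlainObject>(target: T, source: PlainObject): T {
  const result: PlainObject = { ...target };
  for (const key of Object.keys(source)) {
    if (BLOCKED_KEYS.has(key)) continue;
    const incoming = source[key];
    const existing = result[key];
    result[key] =
      isPlainObject(existing) && isPlainObject(incoming)
        ? safeDeepMerge(existing, incoming)
        : incoming;
  }
  return result as T;
}
```

`JSON.parse` creates an own `__proto__` key, so a naive recursive merge would read `target["__proto__"]`, get `Object.prototype`, and copy attacker-supplied keys into it. Skipping the key avoids that, and returning a fresh object leaves the caller's default config untouched.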

gemini-code-assist bot left a comment

Code Review

This pull request introduces significant security enhancements, addressing critical vulnerabilities like Remote Code Execution (RCE) through WASM integrity checks, prototype pollution, and Cross-Site Scripting (XSS) with robust implementations and new utility functions. However, the security hardening is incomplete. The XSS sanitizer is easily bypassed due to restrictive regular expressions, and integrity checks are missing for critical artifacts such as chat configuration and tokenizer files, leaving them vulnerable to cache poisoning. It is recommended to use a robust HTML sanitization library and enforce integrity verification for all remote and cached artifacts.

Comment on lines +216 to +224
export function sanitizeString(str: any): any {
  if (typeof str !== 'string') return str;
  // Strip <script> tags and onXXX event handlers
  return str
    .replace(/<script\b[^>]*>([\s\S]*?)<\/script>/gim, "")
    .replace(/on\w+="[^"]*"/gim, "")
    .replace(/on\w+='[^']*'/gim, "")
    .replace(/on\w+=[^\s>]+/gim, "");
}

medium (security)

The sanitizeString function's regex-based sanitization is easily bypassed due to its blacklist approach and restrictive regular expressions. For instance, the regexes for stripping event handlers do not account for spaces around the assignment operator, allowing bypasses like <img src=x onmouseover = "alert(1)">. Additionally, it fails to handle other common XSS vectors such as javascript: pseudo-protocols or dangerous tags like <iframe> or <object>. This could lead to Cross-Site Scripting (XSS) if strings are rendered in the UI. Consider using a robust, well-vetted HTML sanitization library like DOMPurify instead of brittle regex-based denylists for security-critical sanitization.
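To make the reviewer's point concrete, the sketch below reproduces the PR's four regexes and feeds them the two bypasses named in the comment; neither payload is changed. The second function, `escapeHtml` (a hypothetical name), shows a different technique, output encoding: if a field is only ever rendered as text, escaping the five HTML-significant characters makes every payload inert without trying to enumerate attacks. Where markup must be preserved, a vetted sanitizer such as DOMPurify, as the reviewer suggests, is still the better fit.

```typescript
// The denylist regexes from the PR, reproduced unchanged for demonstration.
function denylistSanitize(str: string): string {
  return str
    .replace(/<script\b[^>]*>([\s\S]*?)<\/script>/gim, "")
    .replace(/on\w+="[^"]*"/gim, "")
    .replace(/on\w+='[^']*'/gim, "")
    .replace(/on\w+=[^\s>]+/gim, "");
}

// Hypothetical alternative: encode instead of strip, so the result is plain text.
function escapeHtml(str: string): string {
  return str
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const bypasses = [
  `<img src=x onmouseover = "alert(1)">`, // spaces around "="
  `<a href="javascript:alert(1)">click</a>`, // javascript: pseudo-protocol
];

for (const payload of bypasses) {
  console.log(denylistSanitize(payload) === payload); // true: the denylist leaves it intact
  console.log(escapeHtml(payload)); // no raw "<", ">", or quote survives
}
```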

Comment on lines +273 to +277
const fetchedConfig = await configCache.fetchWithCache(
  configUrl,
  "json",
  this.reloadController?.signal,
);

medium (security)

The pull request introduces integrity verification for WASM binaries to prevent RCE and cache poisoning. However, it fails to implement similar checks for the chat configuration (mlc-chat-config.json), even though the chat_config_integrity field was added to the ModelRecord interface. This leaves the application vulnerable to persistent compromise via cache poisoning of the configuration file. An attacker who can poison the cache could manipulate model settings or inject XSS payloads into fields like system_message.
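A sketch of how `chat_config_integrity` could be enforced, under two assumptions: the expected value uses the `sha384-<base64>` format, and the raw response bytes are available before parsing. The second point matters because `fetchWithCache(configUrl, "json", ...)` returns an already-parsed object, and re-serializing it with `JSON.stringify` will not reproduce the original bytes, so the hash has to be taken over the downloaded text. `loadVerifiedChatConfig` and `ChatConfigIntegrity` are hypothetical names.

```typescript
import { webcrypto } from "node:crypto";

// Hypothetical subset of ModelRecord; the field name comes from the PR.
interface ChatConfigIntegrity {
  chat_config_integrity?: string;
}

// SHA-384 digest of the raw bytes, formatted as "sha384-<base64>".
async function sha384Sri(bytes: Uint8Array): Promise<string> {
  const digest = new Uint8Array(await webcrypto.subtle.digest("SHA-384", bytes));
  let binary = "";
  for (let i = 0; i < digest.length; i++) binary += String.fromCharCode(digest[i]);
  return `sha384-${btoa(binary)}`;
}

// Verify the raw bytes first, and only then parse them as JSON.
async function loadVerifiedChatConfig(
  raw: Uint8Array,
  record: ChatConfigIntegrity,
): Promise<unknown> {
  const expected = record.chat_config_integrity;
  if (expected !== undefined) {
    const actual = await sha384Sri(raw);
    if (actual !== expected) {
      throw new Error(`mlc-chat-config.json integrity mismatch: got ${actual}`);
    }
  }
  return JSON.parse(new TextDecoder().decode(raw));
}
```

This version fails open when no hash is recorded, matching the field being optional; a stricter variant would throw instead. The same pattern would apply to `tokenizer_integrity`, and it must run on cache hits, not only on network fetches.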

Comment on lines +64 to +89
const models = [];
const lines = configContent.split('\n');
let currentModelId = null;

for (let i = 0; i < lines.length; i++) {
  const line = lines[i];
  const idMatch = line.match(/model_id:\s+"(.+?)"/);
  if (idMatch) {
    currentModelId = idMatch[1];
  }

  if (currentModelId) {
    // Look for wasm file in subsequent lines
    const wasmMatch = line.match(/"\/(.+?\.wasm)"/);
    if (wasmMatch) {
      const wasmPath = wasmMatch[1];
      models.push({
        id: currentModelId,
        path: wasmPath,
        // Construct URL using the prefix and version from config
        url: `${modelLibURLPrefix}${modelVersion}/${wasmPath}`
      });
      currentModelId = null; // Reset
    }
  }
}

medium

The current approach of parsing src/config.ts with line-by-line string matching is quite fragile and could easily break if the file's formatting changes in the future. A more robust solution would be to refactor the model list into a separate JSON file. This file could then be imported directly by both this script (for hash generation) and src/config.ts (to build the prebuiltAppConfig), eliminating the need for brittle regex parsing.
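A minimal sketch of the reviewer's suggestion, assuming a hypothetical shared JSON file (called `model_list.json` here) whose entries carry a `model_id` and the WASM path relative to the versioned library prefix. Both the hash script and `src/config.ts` would read the same data, so no regex parsing is needed; the URL is built the same way as in the script quoted above. All type and function names are illustrative.

```typescript
// Hypothetical shape of one entry in a shared model_list.json.
interface ModelListEntry {
  model_id: string;
  model_lib: string; // WASM path relative to the versioned prefix
}

interface WasmTarget {
  id: string;
  path: string;
  url: string;
}

// Parse the shared list and derive the WASM URLs the hash script should fetch.
function wasmTargetsFromJson(
  json: string,
  modelLibURLPrefix: string,
  modelVersion: string,
): WasmTarget[] {
  const entries = JSON.parse(json) as ModelListEntry[];
  return entries.map((e) => {
    if (!e.model_lib.endsWith(".wasm")) {
      throw new Error(`model ${e.model_id} has no .wasm library`);
    }
    return {
      id: e.model_id,
      path: e.model_lib,
      url: `${modelLibURLPrefix}${modelVersion}/${e.model_lib}`,
    };
  });
}
```

Failing loudly on an entry without a `.wasm` path replaces the silent skip that the line-by-line parser would produce when formatting drifts.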

akaashrp (Collaborator) commented Mar 9, 2026

Hi @nextep, thanks for the contribution! I think we should separate the fixes for each vulnerability into separate PRs, both because they're distinct and because it will make reviewing easier. I'll work on the integrity verification separately, but if you could create separate PRs for vulnerabilities 2-4, that would be great!

akaashrp self-assigned this Mar 9, 2026

nextep (Author) commented Mar 11, 2026

Hi @akaashrp, thanks for checking the PR, and absolutely, I can do that. Give me some time to improve the fixes, since they were rather simple; I was in a rush.

I just remembered that for the integrity verification, I only modified the primary config file and not the sample config, despite adding the placeholder for it.
