DO NOT MERGE: QVAC-19214 transcription-parakeet Vulkan device-farm validation (overlay ggml-speech @ PR #14) #2476
Open
pratiknarola-t wants to merge 11 commits into
…test
Overlay-pins ggml-speech to qvac-ext-ggml PR #14 (commit 8bf760f4 -- Adreno Vulkan support + the Parakeet quantized-matmul fix) and forces the Parakeet Android tests onto Vulkan, to validate on Device Farm (Samsung S25 Ultra / Adreno + Pixel 9 / Mali) that Vulkan runs Parakeet correctly and nothing else breaks. NOT for merge -- the overlay + GPU-guard removal are test-only.
- vcpkg-overlay-ports/ggml-speech: pinned to qvac-ext-ggml@8bf760f4; Android default-features drop OpenCL (Vulkan-only) so the loader has no OpenCL .so and the tier policy selects Vulkan on Adreno.
- vcpkg-configuration.json: register ./vcpkg-overlay-ports.
- vcpkg.json: Android parakeet-cpp features=["vulkan"] + default-features:false.
- ParakeetModel.cpp: remove the Android useGPU=false guard so useGPU=true reaches the engine.
- mobile-perf-runner.js: assert Android use_gpu runs select Vulkan (backendId=3).
5122440 to 6fc214e
Contributor
Mobile integration tests: @qvac/transcription-parakeet (iOS)
Result: passed
Contributor
Mobile integration tests: @qvac/transcription-parakeet (Android)
Result: passed
…Mali RCA
Device-Farm Mali-G715 aborts in the EOU-q4_0 Vulkan run (ggml_abort), but the RN/bare runtime does not forward native stderr to logcat, so the exact GGML_ASSERT is invisible in Device-Farm logs. Install a synchronous ggml_set_abort_callback (llama.cpp upstream API) that forwards the formatted 'file:line: message' straight to logcat via __android_log_print before abort(), and mirror ggml log lines (incl. the DEBUG Vulkan device-caps banner) to logcat. Diagnostic only - strip with the rest of the device-farm test scaffolding.
…rce Vulkan on Adreno
Two fixes so the Device-Farm Mali RCA run actually exercises the models:
1. Link liblog into the addon .bare. The abort-capture instrumentation's __android_log_print was an undefined symbol (no liblog in DT_NEEDED), so the addon failed to dlopen at runtime -> bare ADDON_NOT_FOUND -> 18/19 tests failed on both devices. Verified: rebuilt .bare now lists liblog.so in DT_NEEDED.
2. Re-add default-features:false to the android parakeet-cpp dep (dropped by the main merge), so OpenCL is not built and Adreno selects Vulkan (backendId=3), not OpenCL (backendId=4).
Part of the DO-NOT-MERGE device-farm test scaffolding.
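The effect of leaving OpenCL out of the Android build can be pictured as a tier lookup. The following is a minimal sketch, not the project's tier policy: `selectBackend`, `ADRENO_TIERS`, and the CPU id are hypothetical, and the OpenCL-before-Vulkan order on Adreno is only inferred from the commit message above. The ids 3 (Vulkan) and 4 (OpenCL) are the ones quoted in this PR.

```javascript
// Backend ids quoted in this PR: 3 = Vulkan, 4 = OpenCL.
// The CPU id is a placeholder chosen for this sketch.
const BACKENDS = { cpu: 0, vulkan: 3, opencl: 4 };

// Assumed tier order for an Adreno device (inferred, not taken from the code).
const ADRENO_TIERS = ['opencl', 'vulkan', 'cpu'];

// Return the id of the first tier whose backend library was built.
function selectBackend(tiers, builtBackends) {
  const name = tiers.find((tier) => builtBackends.has(tier));
  if (name === undefined) throw new Error('no usable backend');
  return BACKENDS[name];
}

// Both GPU backends built: the higher tier (OpenCL) wins.
console.log(selectBackend(ADRENO_TIERS, new Set(['cpu', 'vulkan', 'opencl']))); // 4
// OpenCL left out of the build: Vulkan is selected.
console.log(selectBackend(ADRENO_TIERS, new Set(['cpu', 'vulkan']))); // 3
```

Under this model, removing a backend from the build is enough to change the selection without touching the policy itself, which is why the dropped default-features:false mattered.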
…gml (9be02126)
Point the device-farm overlay at qvac-ext-ggml@QVAC-19214-mali-rca (9be02126 = PR #14's 8bf760f4 + a ggml_abort->logcat diagnostic). The Vulkan backend's GGML_ASSERT on Mali-G715 is otherwise invisible: bare drops native stderr and the addon abort callback never fires for the dlopen'd backend (separate linker namespace). This rebuilds ggml-speech from the instrumented commit so the next Device-Farm Mali EOU crash logs the exact assert under logcat tag ggml_abort.
…scriptor-set overflow pipeline)
Bump the RCA overlay to qvac-ext-ggml@a066fa47 (= 9be02126 + a GGML_ABORT that prints the pipeline name/idx/size at the descriptor-set overflow). Next Mali EOU crash will name the exact under-requesting pipeline so we can write a targeted per-op request-count fix.
…e descriptor-set over-dispatcher)
…tor-set grow-on-demand fix)
Verification run for the actual fix: grow the Vulkan descriptor-set pool on demand so the Mali-G715 f16 small-tile matmul over-dispatch no longer overflows. Expect Mali EOU GPU to pass now (Adreno + Xclipse unaffected).
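The difference between a pre-sized pool and one that grows on demand can be modelled without any Vulkan calls. This is a language-neutral sketch only: `DescriptorSetPool`, its `chunk` option, and the placeholder set objects are hypothetical and do not mirror the ggml Vulkan backend's data structures.

```javascript
// Model of a descriptor-set pool; no Vulkan API is involved.
class DescriptorSetPool {
  constructor(requested, { growOnDemand = false, chunk = 8 } = {}) {
    this.growOnDemand = growOnDemand;
    this.chunk = chunk; // how many sets to add per growth step (assumed value)
    this.sets = [];
    this.used = 0;
    this.allocate(requested);
  }

  // Stand-in for the real allocation: appends placeholder set objects.
  allocate(count) {
    for (let i = 0; i < count; i++) this.sets.push({ id: this.sets.length });
  }

  // Hand out the next set for one dispatch.
  acquire() {
    if (this.used === this.sets.length) {
      if (!this.growOnDemand) {
        // A pre-sized pool fails when an op dispatches more than it requested.
        throw new Error(`descriptor-set overflow: ${this.used} >= ${this.sets.length}`);
      }
      this.allocate(this.chunk); // grow instead of aborting
    }
    return this.sets[this.used++];
  }
}
```

With `new DescriptorSetPool(2)` the third `acquire()` throws, which corresponds to the overflow abort; with `{ growOnDemand: true }` the same call allocates another chunk and succeeds.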
…w-on-demand to actually allocate)
…chive-gen race served a transient hash)
…#14 commit, no RCA instrumentation)
⛔ DO NOT MERGE — Vulkan-on-Adreno device-farm validation
Test-only PR to prove, on AWS Device Farm real devices, that our ggml Vulkan fix (qvac-ext-ggml PR #14, commit 8bf760f4) makes Parakeet run correctly on Vulkan for Adreno (and Mali), and that nothing else regresses. The overlay + GPU-guard removal must not ship.
What it does
- ggml-speech → tetherto/qvac-ext-ggml@8bf760f4 (branch QVAC-19213-adreno-vulkan-shmem-fix): Adreno Vulkan support + two Qualcomm-gated quantized-matmul fixes (disable integer-dot MMQ; force a quantized src0→f16 dequant, so the int8 shaders the Adreno SPIR-V compiler rejects are never used).
- ggml-speech drops opencl from Android default-features and the addon requests parakeet-cpp features:["vulkan"] + default-features:false, so no OpenCL .so is built → the backend tier policy selects Vulkan on the Adreno device.
- Removes the ParakeetModel.cpp #ifdef __ANDROID__ useGPU=false guard.
- mobile-perf-runner.js now requires backendId === 3 (Vulkan) for Android use_gpu runs.
Expected result on Device Farm
… cases on Vulkan (backendId=3), no crash, correct output.
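The backend requirement on Android use_gpu runs could be checked along these lines. This is a minimal sketch, not the code in mobile-perf-runner.js: the function name and the run-record shape (`platform`, `use_gpu`, `backendId` as plain fields) are assumptions made for illustration; only the `backendId === 3` rule comes from this PR.

```javascript
// Backend id quoted in this PR: 3 = Vulkan.
const BACKEND_VULKAN = 3;

// Hypothetical run record: { platform, use_gpu, backendId }.
function assertVulkanOnAndroidGpu(run) {
  // Only Android runs that asked for the GPU are constrained.
  if (run.platform !== 'android' || !run.use_gpu) return;
  if (run.backendId !== BACKEND_VULKAN) {
    throw new Error(
      `Android use_gpu run selected backendId=${run.backendId}, ` +
      `expected ${BACKEND_VULKAN} (Vulkan)`
    );
  }
}
```

A run reporting backendId 4 (OpenCL) would fail this check, which is the regression the PR hit when default-features:false was dropped by the main merge.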
Verified locally
Parakeet CTC/TDT/EOU (q4_0) + Sortformer (q8_0) already run on Vulkan on a local Adreno 740 with
output byte-identical to CPU. This PR extends validation to the Device Farm Adreno 830 + Mali.
Tracking: qvac-ext-ggml PR #14.