DO NOT MERGE: QVAC-19214 transcription-parakeet Vulkan device-farm validation (overlay ggml-speech @ PR #14) #2476

Open
pratiknarola-t wants to merge 11 commits into main from tmp-qvac-19214-parakeet-vulkan-devicefarm

@pratiknarola-t

Contributor

⛔ DO NOT MERGE — Vulkan-on-Adreno device-farm validation

Test-only PR to prove, on AWS Device Farm real devices, that our ggml Vulkan fix
(qvac-ext-ggml PR #14, commit 8bf760f4) makes Parakeet run correctly on Vulkan for Adreno
(and Mali), and that nothing else regresses. The overlay + GPU-guard removal must not ship.

What it does

  • Overlays ggml-speech → tetherto/qvac-ext-ggml@8bf760f4 (branch QVAC-19213-adreno-vulkan-shmem-fix):
    Adreno Vulkan support + two Qualcomm-gated quantized-matmul fixes (disable integer-dot MMQ; force a
    quantized src0→f16 dequant — so the int8 shaders the Adreno SPIR-V compiler rejects are never used).
  • Forces Vulkan on Android: the overlay ggml-speech drops opencl from Android default-features and
    the addon requests parakeet-cpp features:["vulkan"] + default-features:false, so no OpenCL .so is
    built → the backend tier policy selects Vulkan on the Adreno device.
  • Enables Android GPU: removes the temporary ParakeetModel.cpp #ifdef __ANDROID__ useGPU=false guard.
  • Asserts Vulkan: mobile-perf-runner.js now requires backendId === 3 (Vulkan) for Android use_gpu runs.

Expected result on Device Farm

  • Samsung S25 Ultra (Adreno) and Pixel 9 (Mali) both run the Parakeet CTC/EOU/Sortformer GPU perf
    cases on Vulkan (backendId=3), no crash, correct output.
  • All other platforms (desktop Vulkan, iOS Metal) build + pass → the ggml commit breaks nothing.

Verified locally

Parakeet CTC/TDT/EOU (q4_0) + Sortformer (q8_0) already run on Vulkan on a local Adreno 740 with
output byte-identical to CPU. This PR extends validation to the Device Farm Adreno 830 + Mali.

Tracking: qvac-ext-ggml PR #14.

…test

Overlay-pins ggml-speech to qvac-ext-ggml PR #14 (commit 8bf760f4 -- Adreno
Vulkan support + the Parakeet quantized-matmul fix) and forces the Parakeet
Android tests onto Vulkan, to validate on Device Farm (Samsung S25 Ultra /
Adreno + Pixel 9 / Mali) that Vulkan runs Parakeet correctly and nothing else
breaks. NOT for merge -- the overlay + GPU-guard removal are test-only.

- vcpkg-overlay-ports/ggml-speech: pinned to qvac-ext-ggml@8bf760f4; Android
  default-features drop OpenCL (Vulkan-only) so the loader has no OpenCL .so
  and the tier policy selects Vulkan on Adreno.
- vcpkg-configuration.json: register ./vcpkg-overlay-ports.
- vcpkg.json: Android parakeet-cpp features=["vulkan"] + default-features:false.
- ParakeetModel.cpp: remove the Android useGPU=false guard so useGPU=true
  reaches the engine.
- mobile-perf-runner.js: assert Android use_gpu runs select Vulkan (backendId=3).

@pratiknarola-t force-pushed the tmp-qvac-19214-parakeet-vulkan-devicefarm branch from 5122440 to 6fc214e on June 8, 2026 07:41

@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

Mobile integration tests — @qvac/transcription-parakeet (iOS)

Result: passed

| Metric | Value |
| --- | --- |
| Devices passed | 2 |
| Devices failed | 0 |
| Test cases total | 6 |
| Test cases passed | 6 |
| Test cases failed | 0 |
| Test cases skipped | 0 |

@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

Mobile integration tests — @qvac/transcription-parakeet (Android)

Result: passed

| Metric | Value |
| --- | --- |
| Devices passed | 2 |
| Devices failed | 0 |
| Test cases total | 6 |
| Test cases passed | 6 |
| Test cases failed | 0 |
| Test cases skipped | 0 |

…Mali RCA

Device-Farm Mali-G715 aborts in the EOU-q4_0 Vulkan run (ggml_abort), but the
RN/bare runtime does not forward native stderr to logcat, so the exact
GGML_ASSERT is invisible in Device-Farm logs. Install a synchronous
ggml_set_abort_callback (llama.cpp upstream API) that forwards the formatted
'file:line: message' straight to logcat via __android_log_print before abort(),
and mirror ggml log lines (incl. the DEBUG Vulkan device-caps banner) to logcat.
Diagnostic only - strip with the rest of the device-farm test scaffolding.
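
The abort-capture idea in the commit message above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `install_abort_callback`, `report_fatal`, and `captured_aborts` are hypothetical stand-ins so the example builds without ggml. In the real addon the callback would be registered through `ggml_set_abort_callback`, which is assumed here to take a `void (*)(const char *)` callback. Off-device the sketch writes to stderr instead of logcat.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>
#ifdef __ANDROID__
#include <android/log.h>
#endif

// Assumed shape of the ggml abort callback: one pre-formatted message string.
using abort_callback_t = void (*)(const char* error_message);

// Test-only record of forwarded messages (hypothetical helper).
inline std::vector<std::string>& captured_aborts() {
    static std::vector<std::string> lines;
    return lines;
}

// Forwards the formatted "file:line: message" text to the platform log.
void forward_abort_to_log(const char* error_message) {
    const char* msg = error_message ? error_message : "(null)";
#ifdef __ANDROID__
    // Tag "ggml_abort" matches the logcat tag named in the PR.
    __android_log_print(ANDROID_LOG_FATAL, "ggml_abort", "%s", msg);
#else
    std::fprintf(stderr, "ggml_abort: %s\n", msg);
#endif
    captured_aborts().emplace_back(msg);
}

// Hypothetical stand-in for ggml_set_abort_callback: stores the new
// callback and returns the previous one.
static abort_callback_t g_abort_callback = nullptr;
abort_callback_t install_abort_callback(abort_callback_t cb) {
    abort_callback_t previous = g_abort_callback;
    g_abort_callback = cb;
    return previous;
}

// Hypothetical stand-in for the failure path: formats the message and
// invokes the callback synchronously. A real abort path would then call abort().
void report_fatal(const char* file, int line, const char* what) {
    char buf[256];
    std::snprintf(buf, sizeof buf, "%s:%d: %s", file, line, what);
    if (g_abort_callback) {
        g_abort_callback(buf);
    }
}
```

Because the callback runs synchronously before the process dies, the message reaches the log even when native stderr is dropped. Linking against liblog is required on Android for `__android_log_print` to resolve.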
…rce Vulkan on Adreno

Two fixes so the Device-Farm Mali RCA run actually exercises the models:
1. Link liblog into the addon .bare. The abort-capture instrumentation's
   __android_log_print was an undefined symbol (no liblog in DT_NEEDED), so the
   addon failed to dlopen at runtime -> bare ADDON_NOT_FOUND -> 18/19 tests
   failed on both devices. Verified: rebuilt .bare now lists liblog.so in
   DT_NEEDED.
2. Re-add default-features:false to the android parakeet-cpp dep (dropped by the
   main merge), so OpenCL is not built and Adreno selects Vulkan (backendId=3),
   not OpenCL (backendId=4).

Part of the DO-NOT-MERGE device-farm test scaffolding.
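
Why dropping the OpenCL build changes the selected backend can be illustrated with a small sketch. The backend IDs 3 (Vulkan) and 4 (OpenCL) come from this PR; everything else is an assumption made for illustration: the preference order (OpenCL before Vulkan on Adreno) is inferred from the behaviour described above, and `select_gpu_backend`, `gpu_run_used_vulkan`, and the `kNoGpuBackend` sentinel are hypothetical names, not the addon's real tier policy.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

constexpr int kBackendVulkan = 3;   // backendId reported for Vulkan (from the PR)
constexpr int kBackendOpenCL = 4;   // backendId reported for OpenCL (from the PR)
constexpr int kNoGpuBackend  = -1;  // hypothetical sentinel: no GPU backend built

// Assumed tier policy on Adreno: prefer OpenCL when its library was built,
// otherwise fall back to Vulkan.
int select_gpu_backend(const std::vector<int>& built_backends) {
    auto has = [&](int id) {
        return std::find(built_backends.begin(), built_backends.end(), id) !=
               built_backends.end();
    };
    if (has(kBackendOpenCL)) return kBackendOpenCL;
    if (has(kBackendVulkan)) return kBackendVulkan;
    return kNoGpuBackend;
}

// Mirrors the check the perf runner makes for Android use_gpu runs:
// the run only counts if Vulkan was actually selected.
bool gpu_run_used_vulkan(int backend_id) {
    return backend_id == kBackendVulkan;
}
```

Under this assumed policy, a build that still ships the OpenCL library would select backend 4 and fail the Vulkan assertion, which is why `default-features:false` had to be restored.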
…gml (9be02126)

Point the device-farm overlay at qvac-ext-ggml@QVAC-19214-mali-rca (9be02126 =
PR #14's 8bf760f4 + a ggml_abort->logcat diagnostic). The Vulkan backend's
GGML_ASSERT on Mali-G715 is otherwise invisible: bare drops native stderr and
the addon abort callback never fires for the dlopen'd backend (separate linker
namespace). This rebuilds ggml-speech from the instrumented commit so the next
Device-Farm Mali EOU crash logs the exact assert under logcat tag ggml_abort.
…scriptor-set overflow pipeline)

Bump the RCA overlay to qvac-ext-ggml@a066fa47 (= 9be02126 + a GGML_ABORT that
prints the pipeline name/idx/size at the descriptor-set overflow). Next Mali EOU
crash will name the exact under-requesting pipeline so we can write a targeted
per-op request-count fix.
…tor-set grow-on-demand fix)

Verification run for the actual fix: grow the Vulkan descriptor-set pool on demand
so the Mali-G715 f16 small-tile matmul over-dispatch no longer overflows. Expect
Mali EOU GPU to pass now (Adreno + Xclipse unaffected).
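
The grow-on-demand idea can be sketched without any Vulkan calls. The class below is a hypothetical model, not the code in qvac-ext-ggml: `GrowableSetPool` and its members are invented names, and the chunked growth strategy is an assumption. It only shows the bookkeeping: when a dispatch needs one more descriptor set than the pool currently holds, the pool adds another chunk instead of aborting.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a descriptor-set pool that grows in fixed-size chunks.
class GrowableSetPool {
public:
    explicit GrowableSetPool(std::size_t sets_per_chunk)
        : sets_per_chunk_(sets_per_chunk == 0 ? 1 : sets_per_chunk) {}

    // Returns the index of the next free set, growing the pool if it is full.
    std::size_t acquire() {
        if (next_ == capacity()) {
            // A real backend would create another descriptor pool and
            // allocate sets_per_chunk_ sets from it here.
            ++chunks_;
        }
        return next_++;
    }

    // Makes all sets reusable for the next graph; keeps the grown capacity.
    void reset() { next_ = 0; }

    std::size_t capacity() const { return chunks_ * sets_per_chunk_; }
    std::size_t chunks() const { return chunks_; }

private:
    std::size_t sets_per_chunk_;
    std::size_t chunks_ = 0;
    std::size_t next_ = 0;
};
```

With a chunk size of 4, the fifth `acquire()` triggers a second chunk rather than an overflow, and `reset()` reuses the grown capacity on the next run. Growing on demand avoids having to predict an exact per-op request count up front.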