
fix: correct ROCm GPU name extraction and discrete GPU selection #307

Closed
octo-patch wants to merge 347 commits into AlexsJones:main from octo-patch:fix/issue-271-rocm-gpu-name-detection

Conversation

@octo-patch
Contributor

Fixes #271

Problem

On systems with both a discrete GPU and an iGPU visible to ROCm (e.g. Ryzen 9800X3D + RX 7900 XTX), two bugs in detect_amd_gpu_rocm_info caused the wrong GPU name to be reported, leaving the GPU listed as gpu_name: "Card Series":

  1. Wrong split index: The --showproductname parser used split(':').nth(1), which returns the field label (literally "Card series") instead of the model value. The line format is GPU[N] : Card series : <name>, so the value lives after the second colon.

  2. GPU index not tracked: VRAM filtering correctly identified discrete GPU entries by byte count, but didn't record which GPU[N] indices they belong to. The subsequent name lookup therefore had no way to target the correct GPU in the product-name output and always returned the first matching line (which could be the iGPU).

Solution

  • Fixed the split to splitn(3, ':').nth(2) so the actual model name is returned instead of the label.
  • Changed VRAM parsing to track (gpu_index, vram_bytes) tuples. The first discrete GPU index is then passed to the name parser, which performs a targeted GPU[N]-prefixed scan before falling back to the first match.
  • Extracted both parsing steps into parse_rocm_vram_indexed and parse_rocm_gpu_name helper methods so they can be unit-tested without a real ROCm installation.
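The two helpers described above can be sketched as follows. This is a hypothetical reconstruction from the PR description, not the actual llmfit source: the function names match the ones named in the bullets, but the exact rocm-smi line shapes (especially the VRAM line format) are assumptions.

```rust
/// Parse rocm-smi VRAM output into (gpu_index, vram_bytes) tuples.
/// Assumed line shape: "GPU[0] : VRAM Total Memory (B): 25753026560"
fn parse_rocm_vram_indexed(output: &str) -> Vec<(usize, u64)> {
    let mut result = Vec::new();
    for line in output.lines() {
        let Some(rest) = line.strip_prefix("GPU[") else { continue };
        let Some(end) = rest.find(']') else { continue };
        let Ok(idx) = rest[..end].parse::<usize>() else { continue };
        if line.contains("VRAM Total Memory") {
            // The byte count is the last colon-separated field.
            if let Some(bytes) = line
                .rsplit(':')
                .next()
                .and_then(|v| v.trim().parse::<u64>().ok())
            {
                result.push((idx, bytes));
            }
        }
    }
    result
}

/// Extract the model name for `target_index` from --showproductname output,
/// falling back to the first "Card series" line when no index is given.
/// Line shape per the PR: "GPU[N] : Card series : <name>"
fn parse_rocm_gpu_name(output: &str, target_index: Option<usize>) -> Option<String> {
    // Targeted scan: prefer the line prefixed with the requested GPU[N].
    if let Some(idx) = target_index {
        let prefix = format!("GPU[{idx}]");
        for line in output.lines() {
            if line.starts_with(&prefix) && line.contains("Card series") {
                // splitn(3, ':').nth(2) yields the value after the SECOND
                // colon — the model name, not the "Card series" label.
                return line.splitn(3, ':').nth(2).map(|s| s.trim().to_string());
            }
        }
    }
    // Fallback: first matching line (pre-fix behavior).
    output
        .lines()
        .find(|l| l.contains("Card series"))
        .and_then(|l| l.splitn(3, ':').nth(2))
        .map(|s| s.trim().to_string())
}
```

With this shape, the original `split(':').nth(1)` bug is easy to see: on `"GPU[0] : Card series : AMD Radeon Graphics"` it returns the `" Card series "` field rather than the name.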

Testing

Five new unit tests added in hardware::tests:

  • test_parse_rocm_vram_indexed_single_gpu — single GPU VRAM parsed with correct index
  • test_parse_rocm_vram_indexed_dual_gpu_apu — two entries (discrete + iGPU) parsed with correct indices
  • test_parse_rocm_gpu_name_single_gpu — card series value extracted (not the label)
  • test_parse_rocm_gpu_name_prefers_target_index — discrete GPU name returned when iGPU comes first
  • test_parse_rocm_gpu_name_falls_back_without_index — fallback path works when no target index given

All tests pass (cargo test -p llmfit-core -- rocm). No hardware required to run the tests.

counterposition and others added 30 commits February 21, 2026 12:35
The name 'huggingface-cli' is deprecated.
Their CLI is now called 'hf': https://huggingface.co/docs/huggingface_hub/en/guides/cli
fix: invoke hf instead of huggingface-cli
Restructure single-crate project into Cargo workspace:
- llmfit-core: core library (hardware detection, model fitting, providers)
- llmfit-tui: CLI/TUI binary (unchanged user experience)
- llmfit-desktop: macOS desktop app via Tauri 2

The workspace split enables the desktop app to reuse core logic
while keeping the CLI/TUI as the default build target.

Moved SortColumn to core crate for shared use across frontends.

Desktop app features:
- System specs display (RAM, CPU, GPU)
- Model compatibility table with fit scoring
- Dark theme UI using project icon from assets/icon.svg
- Tauri 2 with minimal permissions

No changes to data files — moved as-is via git mv.

Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxes53235@gmail.com>
Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
- Remove llama title/subtitle header from desktop app
- Show total + available RAM separately
- Render all detected GPUs with VRAM, backend, and count
- Show unified memory indicator for Apple Silicon
- Responsive grid layout for system spec cards

Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
…t/workspace-and-desktop-v2

feat: workspace restructure + Tauri desktop app
Adds build-desktop job that builds Tauri desktop app for both
aarch64-apple-darwin and x86_64-apple-darwin targets.
DMGs are uploaded alongside CLI tarballs in GitHub Releases.

Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
…t/workspace-and-desktop-v2

ci: build macOS desktop app (.dmg) in release workflow
Click any model row to open a detail modal showing:
- Parameters, quantization, runtime, score, speed, use case
- Memory utilization bar (color-coded green/yellow/red)
- Fit analysis with notes
- Installed status badge
- Download button (pulls via Ollama when available)
- Pull progress bar with live status polling

New Tauri commands: start_pull, poll_pull, is_ollama_available
Added runtime, installed, utilization_pct to model data.

Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
The Tauri build runs with working-directory: llmfit-desktop but
the workspace target dir may be at the repo root or under the
subcrate. Search both locations and fail with diagnostics if
neither contains the bundle.

Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
…t/desktop-modal

feat: model detail modal + Ollama download in desktop app
Signed-off-by: Alex <alexsimonjones@gmail.com>
- release.yml now excludes v*-mac tags (CLI + crate + homebrew only)
- New release-desktop.yml triggers on v*-mac tags
- Uses --bundles app to produce .app bundle without code signing
- Searches both target/ and llmfit-desktop/target/ for bundle
- Desktop releases no longer slow down normal CLI releases

Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
Problem: Multi-GPU systems had their VRAM summed into a single pool, leading to
overly optimistic model fit recommendations since most inference runtimes
(llama.cpp, Ollama, etc.) don't support tensor parallelism by default.

Changes:
- NVIDIA detection: group by model, keep max per-card VRAM (never sum)
- AMD ROCm detection: collect per-card VRAM, use max per-card
- Refactor nvidia-smi parsing into separate testable function
- Update display text from "GB VRAM total" → "GB VRAM each"
- Add unit tests for multi-GPU parsing behavior

This gives more realistic recommendations by assuming models must fit on
a single GPU unless explicitly configured for tensor parallelism.
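The grouping rule above ("group by model, keep max per-card VRAM, never sum") can be sketched as below. This is an illustrative reduction, not llmfit's actual parsing code; input is assumed to be a flat list of (model name, per-card VRAM in GB) pairs.

```rust
use std::collections::HashMap;

/// For each GPU model, keep the maximum single-card VRAM rather than the
/// sum across cards: without tensor parallelism, a model must fit on one card.
fn max_vram_per_model(cards: &[(String, u64)]) -> HashMap<String, u64> {
    let mut by_model: HashMap<String, u64> = HashMap::new();
    for (model, vram_gb) in cards {
        let entry = by_model.entry(model.clone()).or_insert(0);
        *entry = (*entry).max(*vram_gb);
    }
    by_model
}
```

So a dual RX 7900 XTX box reports 24 GB usable ("GB VRAM each"), not an optimistic 48 GB pool.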
fix: use per-card VRAM instead of summed for multi-GPU systems
fix: typo in CHANGELOG.md (suppor -> support)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…AlexsJones#49)

- For dense models: use choose_quant before deciding GPU path
- For MoE models: try quantization hierarchy in moe_offload_path
- Add moe_memory_for_quant helper to compute MoE memory at specific quant
- Add test_moe_offload_tries_lower_quantization test
- Add Remote Ollama instances section to README
- Documents OLLAMA_HOST env var for custom endpoints
- Addresses issue AlexsJones#40 - feature already exists but was undocumented
- Includes examples for remote servers, custom ports, Docker, etc.
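The endpoint resolution this commit documents can be sketched as below. The env var name and default port (11434) come from Ollama's own conventions; the function names and the exact default URL string are illustrative, not llmfit's actual API.

```rust
use std::env;

/// Resolve the Ollama endpoint from an optional env value, falling back
/// to the conventional local default when OLLAMA_HOST is unset.
fn resolve_ollama_host(env_value: Option<String>) -> String {
    env_value.unwrap_or_else(|| "http://localhost:11434".to_string())
}

/// Convenience wrapper reading the real environment.
fn ollama_host() -> String {
    resolve_ollama_host(env::var("OLLAMA_HOST").ok())
}
```

Usage: `OLLAMA_HOST=http://gpu-box:11434 llmfit` would then target the remote server instead of localhost.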
docs: document OLLAMA_HOST environment variable for remote connections
…ysfs

Improve GPU identification fallback on Linux containers
- Rename llmfit-tui package to llmfit for crates.io continuity
- Add homepage and keywords to llmfit-core for publishing
- Update authors field to proper format
- Add version requirement for llmfit-core dependency

Fixes AlexsJones#58

Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
- Publish llmfit-core first (dependency)
- Wait for crates.io index to update
- Then publish llmfit (depends on llmfit-core)

Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
…/crates-io-metadata

fix: correct crates.io metadata and prepare for publishing
AlexsJones and others added 28 commits March 12, 2026 01:13
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
…-android-hw-detection

Fix Android CPU and Vulkan GPU detection fallback
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
…rmux-gpu-limitations

docs: document Android GPU detection limitations
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
  - test_gguf_source_deserialization — GgufSource JSON round-trips correctly
  - test_gguf_sources_default_to_empty — models without gguf_sources in JSON default to []
  - test_catalog_popular_models_have_gguf_sources — 5 well-known models (Llama-3.3-70B, Qwen2.5-7B, etc.) have non-empty gguf_sources in the catalog
  - test_catalog_gguf_sources_have_valid_repos — every gguf_source in the catalog has owner/repo format, non-empty provider, and contains GGUF
  - test_catalog_has_significant_gguf_coverage — at least 25% of catalog models have GGUF sources (currently 30%)

  providers.rs (7 tests):
  - test_hf_name_to_gguf_candidates_generates_common_patterns — heuristic generates bartowski, ggml-org, TheBloke candidates
  - test_hf_name_to_gguf_candidates_strips_owner — strips the Org/ prefix correctly
  - test_lookup_gguf_repo_known_mappings — hardcoded mappings resolve for known models
  - test_lookup_gguf_repo_unknown_returns_none — unknown models return None
  - test_has_gguf_mapping_matches_known_models — boolean check works
  - test_gguf_candidates_fallback_covers_major_providers — fallback covers all 3 providers and all end in -GGUF
  - test_gguf_candidates_known_mapping_returns_single — hardcoded mapping returns exactly 1 result

Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
The JSON output (--json flag and API) was missing `moe_offloaded_gb`,
so MoE models showed only active-expert VRAM as `memory_required_gb`
without indicating the additional RAM needed for inactive experts.

Add `moe_offloaded_gb` and `total_memory_gb` (VRAM + offloaded RAM)
to both display and API JSON serializers so consumers can see the
full memory footprint.
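The serializer change can be sketched as below. The field names (`moe_offloaded_gb`, `total_memory_gb`) follow the commit message; the struct and manual JSON formatting are illustrative stand-ins for whatever serializer llmfit actually uses.

```rust
/// Minimal stand-in for a fit result with MoE memory split across
/// GPU VRAM (active experts) and system RAM (inactive experts).
struct FitResult {
    memory_required_gb: f64, // active-expert VRAM
    moe_offloaded_gb: f64,   // inactive experts offloaded to system RAM
}

/// Emit both components plus the derived total so JSON consumers see the
/// full memory footprint, not just the VRAM figure.
fn fit_result_json(fit: &FitResult) -> String {
    let total = fit.memory_required_gb + fit.moe_offloaded_gb;
    format!(
        "{{\"memory_required_gb\":{:.1},\"moe_offloaded_gb\":{:.1},\"total_memory_gb\":{:.1}}}",
        fit.memory_required_gb, fit.moe_offloaded_gb, total
    )
}
```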

Closes AlexsJones#230

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-fields

fix: surface MoE offloaded RAM in JSON output
Signed-off-by: Alex <alexsimonjones@gmail.com>
Signed-off-by: Alex <alexsimonjones@gmail.com>
Add support for Docker Desktop's built-in Model Runner as a fourth
runtime provider alongside Ollama, llama.cpp, and MLX. Detection probes
the OpenAI-compatible /v1/models endpoint on localhost:12434 (configurable
via DOCKER_MODEL_RUNNER_HOST). Downloads use `docker model pull`.
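The detection flow can be sketched as below. This is a simplified illustration: a real probe would issue an HTTP GET against `/v1/models` and check the response, whereas this sketch only resolves the configured endpoint and tests TCP reachability. The default address string is an assumption based on the port named above.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Resolve the Docker Model Runner endpoint, honoring the override env var.
fn docker_model_runner_addr() -> String {
    std::env::var("DOCKER_MODEL_RUNNER_HOST")
        .unwrap_or_else(|_| "127.0.0.1:12434".to_string())
}

/// Cheap liveness check: can we open a TCP connection within the timeout?
/// (A full implementation would follow up with GET /v1/models.)
fn endpoint_reachable(addr: &str) -> bool {
    addr.parse::<SocketAddr>()
        .ok()
        .and_then(|a| TcpStream::connect_timeout(&a, Duration::from_millis(300)).ok())
        .is_some()
}
```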

A new scraper (scripts/scrape_docker_models.py) queries Docker Hub's ai/
namespace and cross-references against the HF model database to produce
an embedded catalog (docker_models.json) of confirmed available models.
Only models verified in the catalog appear as downloadable via Docker.

- Provider: detect, list installed, pull via docker CLI
- TUI: status bar shows Docker availability, 'D' in Inst column,
  provider picker includes Docker Model Runner
- Inst column refactored from enum to bitfield for extensibility
- Makefile: `make update-catalogs` refreshes all scrapers and rebuilds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Alex <alexsimonjones@gmail.com>
Signed-off-by: Alex <alexsimonjones@gmail.com>
Signed-off-by: Alex <alexsimonjones@gmail.com>
Signed-off-by: Alex <alexsimonjones@gmail.com>
When rocm-smi reports multiple GPU agents (e.g. a discrete RX 7900 XTX
alongside the integrated Raphael/iGPU on a Ryzen 9800X3D), two bugs
caused the wrong name to be returned:

1. The --showproductname parser used split(':').nth(1) which returns
   the field label ("Card series") instead of the model value.
   The line format is "GPU[N] : Card series : <name>", so the value is
   after the second colon; fixed to splitn(3, ':').nth(2).

2. VRAM filtering correctly identified the discrete GPU by its byte count,
   but the GPU index was not tracked, so the subsequent name lookup had no
   way to target the right GPU[N] in the product-name output.
   Fixed by tracking (gpu_index, vram_bytes) tuples and passing the
   first discrete GPU index to the name parser.

Extracted both parsing steps into parse_rocm_vram_indexed and
parse_rocm_gpu_name helper methods so they can be unit-tested without
a real ROCm installation. Five new unit tests are added.

Fixes AlexsJones#271
@AlexsJones
Owner

Please cut a new PR against HEAD; this one has too many changes from a bad rebase.

AlexsJones closed this Apr 6, 2026

Development

Successfully merging this pull request may close these issues:

llmfit detecting Ryzen 9800X3D CPU as a 2nd 7900XTX