fix: correct ROCm GPU name extraction and discrete GPU selection#307
Closed
octo-patch wants to merge 347 commits into AlexsJones:main from
Conversation
The name 'huggingface-cli' is deprecated. Their CLI is now called 'hf': https://huggingface.co/docs/huggingface_hub/en/guides/cli
fix: invoke hf instead of huggingface-cli
Restructure single-crate project into Cargo workspace:
- llmfit-core: core library (hardware detection, model fitting, providers)
- llmfit-tui: CLI/TUI binary (unchanged user experience)
- llmfit-desktop: macOS desktop app via Tauri 2

The workspace split enables the desktop app to reuse core logic while keeping the CLI/TUI as the default build target. Moved SortColumn to the core crate for shared use across frontends.

Desktop app features:
- System specs display (RAM, CPU, GPU)
- Model compatibility table with fit scoring
- Dark theme UI using project icon from assets/icon.svg
- Tauri 2 with minimal permissions

No changes to data files — moved as-is via git mv.

Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxes53235@gmail.com>
Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
- Remove llama title/subtitle header from desktop app
- Show total + available RAM separately
- Render all detected GPUs with VRAM, backend, and count
- Show unified memory indicator for Apple Silicon
- Responsive grid layout for system spec cards

Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
…t/workspace-and-desktop-v2 feat: workspace restructure + Tauri desktop app
Adds a build-desktop job that builds the Tauri desktop app for both aarch64-apple-darwin and x86_64-apple-darwin targets. DMGs are uploaded alongside CLI tarballs in GitHub Releases.

Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
…t/workspace-and-desktop-v2 ci: build macOS desktop app (.dmg) in release workflow
Click any model row to open a detail modal showing:
- Parameters, quantization, runtime, score, speed, use case
- Memory utilization bar (color-coded green/yellow/red)
- Fit analysis with notes
- Installed status badge
- Download button (pulls via Ollama when available)
- Pull progress bar with live status polling

New Tauri commands: start_pull, poll_pull, is_ollama_available
Added runtime, installed, utilization_pct to model data.

Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
The Tauri build runs with working-directory: llmfit-desktop, but the workspace target dir may be at the repo root or under the subcrate. Search both locations and fail with diagnostics if neither contains the bundle.

Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
…t/desktop-modal feat: model detail modal + Ollama download in desktop app
Signed-off-by: Alex <alexsimonjones@gmail.com>
- release.yml now excludes v*-mac tags (CLI + crate + homebrew only)
- New release-desktop.yml triggers on v*-mac tags
- Uses --bundles app to produce a .app bundle without code signing
- Searches both target/ and llmfit-desktop/target/ for the bundle
- Desktop releases no longer slow down normal CLI releases

Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
Problem: Multi-GPU systems had their VRAM summed into a single pool, leading to overly optimistic model fit recommendations, since most inference runtimes (llama.cpp, Ollama, etc.) don't support tensor parallelism by default.

Changes:
- NVIDIA detection: group by model, keep max per-card VRAM (never sum)
- AMD ROCm detection: collect per-card VRAM, use max per-card
- Refactor nvidia-smi parsing into a separate testable function
- Update display text from "GB VRAM total" → "GB VRAM each"
- Add unit tests for multi-GPU parsing behavior

This gives more realistic recommendations by assuming models must fit on a single GPU unless explicitly configured for tensor parallelism.
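The "group by model, keep max per-card VRAM" behavior can be sketched as follows. The helper name `per_card_vram` and its signature are illustrative, not the project's actual API:

```rust
use std::collections::HashMap;

/// Group detected GPUs by model name, keeping the card count and the
/// MAXIMUM per-card VRAM — never the sum — so fit recommendations assume
/// a model must fit on one card unless tensor parallelism is configured.
fn per_card_vram(gpus: &[(&str, u64)]) -> HashMap<String, (usize, u64)> {
    let mut out: HashMap<String, (usize, u64)> = HashMap::new();
    for (name, vram_gb) in gpus {
        let entry = out.entry(name.to_string()).or_insert((0, 0));
        entry.0 += 1; // count of identical cards
        entry.1 = entry.1.max(*vram_gb); // max per-card, not summed
    }
    out
}
```

A dual RTX 4090 box would thus report "24 GB VRAM each" rather than a misleading 48 GB pool.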
fix: use per-card VRAM instead of summed for multi-GPU systems
fix: typo in CHANGELOG.md (suppor -> support)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Fix compile warnings in providers and TUI
…AlexsJones#49)
- For dense models: use choose_quant before deciding the GPU path
- For MoE models: try the quantization hierarchy in moe_offload_path
- Add moe_memory_for_quant helper to compute MoE memory at a specific quant
- Add test_moe_offload_tries_lower_quantization test
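The hierarchy walk described in the bullets above can be sketched like this. The quant names, effective bits-per-weight values, and the simple bytes-from-bits estimate are all illustrative assumptions, not the project's actual tables:

```rust
/// Rough GB estimate for the ACTIVE experts of a MoE model at a given
/// quantization: billions of params * bits per weight / 8 bits per byte.
fn moe_memory_for_quant(active_params_b: f64, bits_per_weight: f64) -> f64 {
    active_params_b * bits_per_weight / 8.0
}

/// Walk a quantization hierarchy from highest to lowest precision and
/// return the first quant whose active-expert footprint fits in VRAM.
fn choose_moe_quant(active_params_b: f64, vram_gb: f64) -> Option<&'static str> {
    // (name, effective bits per weight), highest precision first.
    let hierarchy = [("Q8_0", 8.5), ("Q6_K", 6.6), ("Q4_K_M", 4.8), ("Q3_K_M", 3.9)];
    hierarchy
        .iter()
        .find(|(_, bits)| moe_memory_for_quant(active_params_b, *bits) <= vram_gb)
        .map(|(name, _)| *name)
}
```

With ~13 B active parameters and 8 GB of VRAM, this walks past Q8_0 and Q6_K and settles on Q4_K_M instead of rejecting the model outright.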
- Add a Remote Ollama instances section to the README
- Documents the OLLAMA_HOST env var for custom endpoints
- Addresses issue AlexsJones#40 (the feature already exists but was undocumented)
- Includes examples for remote servers, custom ports, Docker, etc.
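A minimal sketch of how a client can honour OLLAMA_HOST, assuming the variable may hold either a bare `host:port` or a full URL; the resolver takes the variable's value as a parameter so it is testable without touching process environment (the function name is hypothetical):

```rust
/// Resolve the Ollama endpoint from an OLLAMA_HOST value, falling back
/// to Ollama's default local endpoint when the variable is unset/empty.
fn resolve_ollama_url(env_value: Option<&str>) -> String {
    match env_value {
        Some(host) if !host.is_empty() => {
            // Accept both "host:port" and full "http(s)://host:port" forms.
            if host.starts_with("http://") || host.starts_with("https://") {
                host.to_string()
            } else {
                format!("http://{host}")
            }
        }
        // 11434 is Ollama's default listening port.
        _ => "http://localhost:11434".to_string(),
    }
}
```

At a call site this would be fed with `std::env::var("OLLAMA_HOST").ok().as_deref()`.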
docs: document OLLAMA_HOST environment variable for remote connections
…ysfs Improve GPU identification fallback on Linux containers
- Rename llmfit-tui package to llmfit for crates.io continuity
- Add homepage and keywords to llmfit-core for publishing
- Update authors field to proper format
- Add a version requirement for the llmfit-core dependency

Fixes AlexsJones#58

Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
- Publish llmfit-core first (dependency)
- Wait for the crates.io index to update
- Then publish llmfit (depends on llmfit-core)

Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
…/crates-io-metadata fix: correct crates.io metadata and prepare for publishing
ci: enable windows build targets
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
…-android-hw-detection Fix Android CPU and Vulkan GPU detection fallback
Add more lfm models
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
…rmux-gpu-limitations docs: document Android GPU detection limitations
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
- test_gguf_source_deserialization — GgufSource JSON round-trips correctly
- test_gguf_sources_default_to_empty — models without gguf_sources in JSON default to []
- test_catalog_popular_models_have_gguf_sources — 5 well-known models (Llama-3.3-70B, Qwen2.5-7B, etc.) have non-empty gguf_sources in the catalog
- test_catalog_gguf_sources_have_valid_repos — every gguf_source in the catalog has owner/repo format, non-empty provider, and contains GGUF
- test_catalog_has_significant_gguf_coverage — at least 25% of catalog models have GGUF sources (currently 30%)

providers.rs (7 tests):
- test_hf_name_to_gguf_candidates_generates_common_patterns — heuristic generates bartowski, ggml-org, TheBloke candidates
- test_hf_name_to_gguf_candidates_strips_owner — strips the Org/ prefix correctly
- test_lookup_gguf_repo_known_mappings — hardcoded mappings resolve for known models
- test_lookup_gguf_repo_unknown_returns_none — unknown models return None
- test_has_gguf_mapping_matches_known_models — boolean check works
- test_gguf_candidates_fallback_covers_major_providers — fallback covers all 3 providers and all end in -GGUF
- test_gguf_candidates_known_mapping_returns_single — hardcoded mapping returns exactly 1 result

Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
The JSON output (--json flag and API) was missing `moe_offloaded_gb`, so MoE models showed only active-expert VRAM as `memory_required_gb` without indicating the additional RAM needed for inactive experts.

Add `moe_offloaded_gb` and `total_memory_gb` (VRAM + offloaded RAM) to both display and API JSON serializers so consumers can see the full memory footprint.

Closes AlexsJones#230

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
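The shape of the added fields can be sketched as below. The struct name, field layout, and hand-rolled serializer are illustrative, not the project's actual serde-based code; only the three field names come from the commit message:

```rust
/// Illustrative fit result carrying the two memory components.
struct FitResult {
    memory_required_gb: f64, // VRAM for active experts
    moe_offloaded_gb: f64,   // RAM for inactive (offloaded) experts
}

impl FitResult {
    /// Emit all three fields so consumers see the full footprint:
    /// total_memory_gb = VRAM + offloaded RAM.
    fn to_json(&self) -> String {
        let total = self.memory_required_gb + self.moe_offloaded_gb;
        format!(
            "{{\"memory_required_gb\":{:.1},\"moe_offloaded_gb\":{:.1},\"total_memory_gb\":{:.1}}}",
            self.memory_required_gb, self.moe_offloaded_gb, total
        )
    }
}
```

A MoE model needing 6 GB of VRAM plus 18 GB of offloaded RAM now reports a 24 GB total instead of a misleading 6 GB.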
…-fields fix: surface MoE offloaded RAM in JSON output
Signed-off-by: Alex <alexsimonjones@gmail.com>
Signed-off-by: Alex <alexsimonjones@gmail.com>
Add support for Docker Desktop's built-in Model Runner as a fourth runtime provider alongside Ollama, llama.cpp, and MLX.

Detection probes the OpenAI-compatible /v1/models endpoint on localhost:12434 (configurable via DOCKER_MODEL_RUNNER_HOST). Downloads use `docker model pull`.

A new scraper (scripts/scrape_docker_models.py) queries Docker Hub's ai/ namespace and cross-references against the HF model database to produce an embedded catalog (docker_models.json) of confirmed available models. Only models verified in the catalog appear as downloadable via Docker.

- Provider: detect, list installed, pull via docker CLI
- TUI: status bar shows Docker availability, 'D' in Inst column, provider picker includes Docker Model Runner
- Inst column refactored from enum to bitfield for extensibility
- Makefile: `make update-catalogs` refreshes all scrapers and rebuilds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
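The detection probe can be sketched with the standard library only. The function name and the raw-TCP HTTP request are illustrative (the real provider may use an HTTP client); the /v1/models path and 12434 default come from the commit message above:

```rust
use std::io::{Read, Write};
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Probe the OpenAI-compatible /v1/models endpoint; any HTTP response
/// within the timeout is taken as "Model Runner present".
fn probe_model_runner(host: &str) -> bool {
    // Resolve "host:port" and connect with a short timeout so a missing
    // runner does not stall hardware detection.
    let Some(addr) = host.to_socket_addrs().ok().and_then(|mut a| a.next()) else {
        return false;
    };
    let Ok(mut stream) = TcpStream::connect_timeout(&addr, Duration::from_millis(500)) else {
        return false;
    };
    let _ = stream.set_read_timeout(Some(Duration::from_millis(500)));
    let request =
        format!("GET /v1/models HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n");
    if stream.write_all(request.as_bytes()).is_err() {
        return false;
    }
    let mut buf = [0u8; 16];
    matches!(stream.read(&mut buf), Ok(n) if n > 0 && buf.starts_with(b"HTTP/"))
}
```

A caller would pass `DOCKER_MODEL_RUNNER_HOST` if set, else `"localhost:12434"`.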
Signed-off-by: Alex <alexsimonjones@gmail.com>
Signed-off-by: Alex <alexsimonjones@gmail.com>
Signed-off-by: Alex <alexsimonjones@gmail.com>
Signed-off-by: Alex <alexsimonjones@gmail.com>
When rocm-smi reports multiple GPU agents (e.g. a discrete RX 7900 XTX
alongside the integrated Raphael/iGPU on a Ryzen 9800X3D), two bugs
caused the wrong name to be returned:
1. The --showproductname parser used split(':').nth(1) which returns
the field label ("Card series") instead of the model value.
The line format is "GPU[N] : Card series : <name>", so the value is
after the second colon; fixed to splitn(3, ':').nth(2).
2. VRAM filtering correctly identified the discrete GPU by its byte count,
but the GPU index was not tracked, so the subsequent name lookup had no
way to target the right GPU[N] in the product-name output.
Fixed by tracking (gpu_index, vram_bytes) tuples and passing the
first discrete GPU index to the name parser.
Extracted both parsing steps into parse_rocm_vram_indexed and
parse_rocm_gpu_name helper methods so they can be unit-tested without
a real ROCm installation. Five new unit tests are added.
Fixes AlexsJones#271
Owner: Please cut a new PR against HEAD, this has too many changes from a bad rebase.
Fixes #271

Problem

On systems with both a discrete GPU and an iGPU visible to ROCm (e.g. Ryzen 9800X3D + RX 7900 XTX), two bugs in detect_amd_gpu_rocm_info caused the wrong GPU name to be reported and the GPU to still show up with gpu_name: "Card Series":

1. Wrong split index: The --showproductname parser used split(':').nth(1), which returns the field label (literally "Card series") instead of the model value. The line format is GPU[N] : Card series : <name>, so the value lives after the second colon.
2. GPU index not tracked: VRAM filtering correctly identified discrete GPU entries by byte count, but didn't record which GPU[N] indices they belong to. The subsequent name lookup therefore had no way to target the correct GPU in the product-name output and always returned the first matching line (which could be the iGPU).

Solution

- Use splitn(3, ':').nth(2) so the actual model name is returned instead of the label.
- Track (gpu_index, vram_bytes) tuples. The first discrete GPU index is then passed to the name parser, which performs a targeted GPU[N]-prefixed scan before falling back to the first match.
- Extract parse_rocm_vram_indexed and parse_rocm_gpu_name helper methods so they can be unit-tested without a real ROCm installation.

Testing

Five new unit tests added in hardware::tests:
- test_parse_rocm_vram_indexed_single_gpu — single GPU VRAM parsed with correct index
- test_parse_rocm_vram_indexed_dual_gpu_apu — two entries (discrete + iGPU) parsed with correct indices
- test_parse_rocm_gpu_name_single_gpu — card series value extracted (not the label)
- test_parse_rocm_gpu_name_prefers_target_index — discrete GPU name returned when iGPU comes first
- test_parse_rocm_gpu_name_falls_back_without_index — fallback path works when no target index given

All tests pass (cargo test -p llmfit-core -- rocm). No hardware required to run the tests.
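As a sketch of the two helpers, assuming the rocm-smi line shapes described in this PR ("GPU[N] : Card series : <name>" for names, and a per-card "VRAM Total Memory (B)" line); the exact field labels are assumptions and may differ across rocm-smi versions:

```rust
/// Collect (gpu_index, vram_bytes) pairs from lines such as
/// "GPU[1] : VRAM Total Memory (B): 25753026560".
fn parse_rocm_vram_indexed(output: &str) -> Vec<(usize, u64)> {
    let mut cards = Vec::new();
    for line in output.lines() {
        if !line.contains("VRAM Total Memory") {
            continue;
        }
        // The GPU index sits between the square brackets.
        let index = line
            .split(|c| c == '[' || c == ']')
            .nth(1)
            .and_then(|s| s.parse::<usize>().ok());
        // The byte count is everything after the LAST colon.
        let bytes = line
            .rsplit(':')
            .next()
            .and_then(|s| s.trim().parse::<u64>().ok());
        if let (Some(i), Some(b)) = (index, bytes) {
            cards.push((i, b));
        }
    }
    cards
}

/// Extract the model name after the SECOND colon of a
/// "GPU[N] : Card series : <name>" line, preferring the target index.
fn parse_rocm_gpu_name(output: &str, target_index: Option<usize>) -> Option<String> {
    let prefix = target_index.map(|i| format!("GPU[{i}]"));
    let mut fallback = None;
    for line in output.lines() {
        if !line.contains("Card series") {
            continue;
        }
        // split(':').nth(1) would yield the label " Card series";
        // splitn(3, ':').nth(2) yields the actual model value.
        if let Some(name) = line.splitn(3, ':').nth(2) {
            let name = name.trim().to_string();
            if let Some(p) = &prefix {
                if line.starts_with(p.as_str()) {
                    return Some(name); // targeted GPU[N] match
                }
            }
            fallback.get_or_insert(name); // first match, used as fallback
        }
    }
    fallback
}
```

On the Raphael + RX 7900 XTX example above, the VRAM pass picks the discrete card's index by byte count, and passing that index makes the name pass skip the iGPU line.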