Hnimrama/ix atom#238
Conversation
Stacked PR body covering motivation, technical details, test plan, lab results, and checklist targeting dev/dtni.
d62122b to
e17ab7f
Compare
Probe python3.13..python3 for import vllm; export BENCH_PY and BENCH_SCRIPT. Use shlex.quote for docker exec bash -c. Align InferenceMax client completion with Serving Benchmark Result or End-to-end Latency.
Search site-packages and ancestor paths, verify the file is readable, and document vllm[bench] when wheels omit benchmarks/.
Use CVS_GPU_MEMORY_UTIL in sample config and serve script to avoid vLLM unknown-env warnings. Extend default readiness poll budget to 60 and grep full server logs so Uvicorn ready is not missed after long model loads.
Wheels often omit vllm/benchmarks; resolve the driver via eval exports, run python -m vllm.entrypoints.cli.main bench serve when needed, and fail fast on missing-script log patterns in InferenceMax and base polling.
vLLM random workloads scale (ISL+OSL)*(1+r); clamp ratio when it would exceed MML, pass --temperature 0 for greedy parity, and forward --metric-percentiles in InferenceMax and vllm_single clients.
Read client_poll_count and client_poll_wait_time from benchmark_params (defaults 50/60), document them and fix the inferencemax.rst table, and surface the keys in sample MI300X/MI355X configs.
…st polling Gate benchmark success on Failed requests only after the summary is present; tail more client log lines for InferenceMax. Variant and benchmark_params accept bench_max_failed_requests (default 0 remains strict for CI).
Move InferenceMax loading onto substitute_config and a typed InferenceMaxVariantConfig with legacy adapters for InferenceMaxJob until the driver is ported.
…ase 2) Flatten MI300X and MI355X variant configs to paths/model/container/roles/params/sweep and client.* threshold specs with enforce_thresholds false until recalibrated.
…Phase 2) Use variant_config and legacy adapter fixtures, parametrization from sweep.runs, and unit tests for load_variant and threshold adapters.
…ion 1 (Phase 2) Point loader and threshold docs at inferencemax_config_loader.load_variant and the client.* sweep cell format.
Point run-cvs-tests and dtni-dev-guide at cvs.lib.utils and inference/utils loaders.
Standalone driver uses Python-built vllm serve, vllm bench serve, and artifact parsing. Drop legacy InferenceBaseJob path and factory construction.
…_args (Phase 3) MI300X and MI355X variants drop host-script and bench_serving params in favor of Python serve args.
… (Phase 3) Add model_fetch, test_metric, and new InferenceMaxJob lifecycle. Update conftest and unit tests for typed config.
…ase 3) Document Python serve, client.* metrics, and expanded lifecycle test stages.
Host script staging was dropped when InferenceMaxJob moved to Python-built vllm serve.
InferenceMax and vllm_single build vllm serve in Python; this package remains for InferenceBaseJob paths.
…(Phase 5) Replace legacy config/benchmark_params table with typed blocks and client.* thresholds. Document inferencemax_config_loader in AGENTS.md.
Verify stock results artifact maps to client.* metrics via FakeOrch.
…ngle Adopt InferenceX ATOM as the framework identity while the suite is still internal. Renames the driver, config loader, pytest suite, variant configs, and documentation to inferencex_atom_single.
Add Section 1.2 hardware policy, revise M1/Phase A exit criteria, and update milestone diagrams so MI355X confirmation is optional until nodes are available.
…log. Align branch state and phases with ATOM driver reality, add accuracy test catalog, metric tiers, and platform enhancements; update W1 README lab notes.
…and metrics. Document perf variant modes, workload-specific accuracy tests, inferencex_atom_vllm/sglang parity suites, supplemental and MTP metrics, and CI compare keys.
Route inferencemax_repo and framework=inferencemax to inferencex_atom with a warning. InferenceMax host jobs remain a placeholder that points callers at inferencex_atom_single.
Add validate_thresholds_cover_sweep() for reuse by vllm_single and inferencex_atom loaders. Optional gated_metrics parameter allows framework-specific SLO sets.
Rename W1 and GPT-OSS variant stems to {gpu}_inferencex-atom-single_{model}_{precision}[_{mode}] and remove per-variant subfolders. Update README and docs with new copy-config paths and W1 threshold gates for per-GPU throughput and tail latencies.
Add inferencex_atom_parsing with IX-specific GATED_METRICS (per_gpu_throughput, output_tput_per_gpu) without changing vllm_single. Wire InferenceXAtomJob and the suite to atom parsing; default metric_percentiles to 95,99 for p95 TPOT and p99 TTFT.
Clarify that W1 GATED_METRICS and output_tput_per_gpu live in inferencex_atom_parsing so vllm_single stays untouched until parity.
Drop vllm_single comparisons and internal threshold tables; keep naming pattern, variant list, and copy/run commands.
Document per-variant ~/input subdirs to avoid ambiguous threshold discovery, remote launcher vs GPU node prerequisites, and ~/cvs_results output paths.
Elevate scaling to P1 milestone M5 immediately after M4 parity when hardware and suite recipes support nnodes>1; defer MTP+P2 widen to M6.
544f8ad to
73dfb65
Compare
Gate p99_tpot_ms instead of absent p95_tpot_ms, skip missing tier metrics in actuals, and recalibrate MI300X perf thresholds from the 2026-06-25 lab run.
Replace per-node calibrated gates with conservative throughput floors and loose latency caps so healthy runs pass across lab nodes without recalibration.
atnair-amd
left a comment
There was a problem hiding this comment.
Automated review from five-pass analysis (structure · duplication · unit tests · code quality · live validation run).
Blockers (must fix before merge): false-green on missing config_file · parse_results error paths untested · _client_log_failures untested
Majors: driver default guard · config/threshold structure diverges from vllm · server reuse helpers untested · build_server_cmd suppression untested · _merged_serve_args untested · tier-explosion untested · reuse_server_across_sweep default · no ATOM early-failure detection · wheel not in shared venv
Minors/NITs: see inline comments
No changes are requested without reviewer approval — comments only.
| class InferenceXAtomParams(_Forbid): | ||
| # ``atom`` uses ATOM openai_server + benchmark_serving; ``vllm`` keeps the | ||
| # interim uplift path (vllm serve + vllm bench serve). | ||
| driver: Literal["atom", "vllm"] = "vllm" |
There was a problem hiding this comment.
[MAJOR] driver="vllm" default has no guard — silent misrouting for new ATOM configs
The two gpt-oss-120b configs intentionally omit driver and rely on this default (their serve args are vllm-specific — that is correct). But there is no safeguard for a new config author who adds an ATOM config and forgets the field: they silently get a different binary, different log paths, and a different client — all with exit 0. Suggest adding a Pydantic validator or a conftest-level log warning that fires when framework == "inferencex_atom_single" and driver == "vllm", so the mismatch is visible at run start rather than discovered in logs.
| @@ -0,0 +1,80 @@ | |||
| { | |||
There was a problem hiding this comment.
[MAJOR] Config files use ix_recipe_id indirection instead of the flat inline structure used by vllm — and threshold_json is absent
vllm configs are fully self-contained: model id, tensor parallelism, dtype, serve args, sweep, and threshold_json are all directly embedded. The ix-atom configs instead carry a single pointer (ix_recipe_id) and defer atom_args (e.g. -tp 8 --kv_cache_dtype fp8 --trust-remote-code), model id, and bench_extra_args to ix_recipes.json, resolved at load time by apply_ix_recipe. This means you cannot understand what a config will actually do without reading a second file.
Additionally, none of the ix-atom configs populate the threshold_json field (inherited from BaseVariantConfig). They rely on the proximity-glob fallback in substitute_config() which raises ValueError if more than one *threshold.json ever exists in the directory. The vllm pattern is explicit: each config carries "threshold_json": "<path>".
Suggested fix: (1) flatten atom_args and key recipe fields inline into the config (as vllm does with roles.server.serve_args), and (2) add "threshold_json": "<path>" to each config file pointing at its paired threshold.
| def orch(cluster_dict, variant_config, lifecycle): | ||
| container_block = _deep_merge( | ||
| cluster_dict.get("container", {}), | ||
| orchestrator_container_from_variant(variant_config), |
There was a problem hiding this comment.
[NIT] orchestrator_container_from_variant call site has no comment explaining its side-effect
The function now has a docstring (includes server env), but this call site has no annotation. A future author who copies the conftest and swaps back to variant_config.container.model_dump() silently drops server env vars. Suggest adding: # also injects roles.server.env — do not replace with .container.model_dump()
| lifecycle, | ||
| request, | ||
| ): | ||
| """One pytest row per metric tier per sweep cell (W1 gate batches).""" |
There was a problem hiding this comment.
[MINOR] test_cell_metrics failure modes are undocumented in the docstring
There are now two pytest.fail paths (lines 185 and 192–195) that are stricter than vllm's silent-return behaviour: (1) no threshold specs exist for a gated tier, and (2) specs exist but none of the expected metrics are present in the actuals. Neither is documented in the docstring, making it hard for future authors to understand the contract. Suggest updating the docstring to describe both failure conditions.
|
|
||
|
|
||
| @pytest.fixture(scope="module") | ||
| def server_session(): |
There was a problem hiding this comment.
[MINOR] server_session fixture has no docstring
This is the only fixture in conftest.py with no vllm analogue and its purpose is non-obvious. Suggest adding: """Tracks the active server session key to allow reuse across sweep cells when reuse_server_across_sweep=true."""
| self.orch.exec(f"mkdir -p {shlex.quote(self.out_dir)}") | ||
|
|
||
| @classmethod | ||
| def _merged_serve_args(cls, variant): |
There was a problem hiding this comment.
[MAJOR] _merged_serve_args gpu-memory-util promotion is untested
The promotion of CVS_GPU_MEMORY_UTIL from server_env into gpu-memory-utilization in the serve-args dict (lines 172–174) is a separate code path from orchestrator_container_from_variant and has no test. Please add a test asserting the promotion happens when the key is absent from merged, and is a no-op when already present.
| with open(config_file) as fp: | ||
| raw = json.load(fp) | ||
| cases, ids = expand_sweep(raw.get("sweep", {})) | ||
| if "metric_tier" in metafunc.fixturenames: |
There was a problem hiding this comment.
[MAJOR] pytest_generate_tests tier-explosion path is untested
The "metric_tier" in metafunc.fixturenames branch (line 84) and the tier × sweep multiplication (lines 85–92) are a separate code path from expand_sweep (which is tested). The len(cases) * len(METRIC_TIER_ORDER) product is not pinned anywhere. Please add a test using a SimpleNamespace fake metafunc (with a recording .parametrize) asserting the correct number of parametrize calls and IDs for a known sweep config.
|
|
||
|
|
||
| def pytest_generate_tests(metafunc): | ||
| config_file = metafunc.config.getoption("config_file") |
There was a problem hiding this comment.
[BLOCKER] Missing config_file produces a false-green zero-test run
Lines 78–79: if not config_file or not os.path.isfile(config_file): return causes pytest to collect 0 tests and exit 0 — indistinguishable from a fully passing run. This is a silent failure mode: a typo in --config_file or a missing file will look like success in CI. Please replace the return with pytest.exit("--config_file not found or not specified", returncode=1) or raise pytest.UsageError.
|
|
||
|
|
||
| def _reuse_server_flag(params) -> bool: | ||
| raw = str(getattr(params, "reuse_server_across_sweep", "true")).strip().lower() |
There was a problem hiding this comment.
[MAJOR] _reuse_server_flag defaults to "true" when reuse_server_across_sweep is absent from config
getattr(params, "reuse_server_across_sweep", "true") — a config that omits the field silently enables server reuse, which is a surprising default (especially for smoke runs). The same default is set at inferencex_atom_config_loader.py line 68. Suggest changing to "false" (safe: always restart) or making the field required in the schema.
| log.info("waiting %ds for server log to materialise", self._precheck_wait) | ||
| time.sleep(self._precheck_wait) | ||
|
|
||
| if self.driver != "atom": |
There was a problem hiding this comment.
[MAJOR] No early-failure detection for the ATOM driver during the wait_ready poll loop
Lines 294–298: the EARLY_FAILURE_RE log scan is gated on if self.driver != "atom" — the atom driver skips it entirely. During the smoke run the server was silent for 29 minutes with 28 consecutive health-poll NO responses; a crashed server would be indistinguishable from a loading one until timeout. start_server does check EARLY_FAILURE_RE on the immediate nohup shell output (line 257) but that only catches shell-level launch failures, not server-side startup failures written to the log file during the load window. Suggest adding EARLY_FAILURE_RE pattern matching against the ATOM server log inside the wait_ready poll loop, analogous to the vllm path.
| '_cvs_run_bench() { case "$BENCH_KIND" in cli_module) ' | ||
| '"$BENCH_PY" -m vllm.entrypoints.cli.main bench serve "$@" ' | ||
| ';; *) "$BENCH_PY" "$BENCH_SCRIPT" "$@" ;; esac; }; ' | ||
| "export BENCH_PY BENCH_KIND BENCH_SCRIPT" |
There was a problem hiding this comment.
This function returns a long shell string containing embedded Python and may create several problems.
RECOMMENDED
PYTHON_CANDIDATES = ["python3.13", "python3.12", "python3.11", "python3.10", "python3"]
def _build_interpreter_loop(py_q: str, candidates: list[str]) -> str:
"""Build the shell interpreter-discovery loop separately for testability."""
candidate_str = " ".join(candidates)
return (
f"for _cvs_py in {candidate_str}; do\n"
f" _cvs_eval=$($_cvs_py -c {py_q}) || continue\n"
...
)
| _inserted = True | ||
| if not _inserted: | ||
| CLIENT_METRICS.append(_METRIC_INSERT) | ||
| CLIENT_METRIC_UNITS = dict(CLIENT_METRICS) |
There was a problem hiding this comment.
Alternative:
idx = next((i for i, (n, _) in enumerate(_VLLM_CLIENT_METRICS) if n == "per_gpu_throughput"), None)
CLIENT_METRICS = list(_VLLM_CLIENT_METRICS)
CLIENT_METRICS.insert((idx + 1) if idx is not None else len(CLIENT_METRICS), _METRIC_INSERT)
| saw_marker = True | ||
| continue | ||
| if text.isdigit(): | ||
| total = max(total, int(text)) |
There was a problem hiding this comment.
Taking max across hosts means the return value is the largest single-host measurement, not the total bytes across the distributed model cache. I think it should be
if text.isdigit():
total += int(text) # sum shards, not max shard
| if not rows: | ||
| return | ||
| try: | ||
| import pytest_html |
There was a problem hiding this comment.
you can move this to the top in the import section
PR #238 — InferenceX ATOM W1 (MI300X perf gates)
PR: #238
Head:
hnimrama/IX-atom→ Base:dev/dtniJira: AIMVT-236 · AIMVT-244
Motivation
CVS needs a first-class InferenceX ATOM automation path aligned with the DTNI Validation Tracker (IX ATOM) — not the legacy
inferencemax_singleuplift. This PR:inferencex_atom_singlesuite with an ATOM-native driver (atom.entrypoints.openai_server+atom.benchmarks.benchmark_serving).enforce_thresholds: trueafter lab confirmation.plans/inferencex-atom-cvs-automation-plan.md.MI355X variant configs ship with
enforce_thresholds: falseuntil hardware is available — they do not block merge or MI300X milestone work.Base branch:
dev/dtni.Technical Details
Suite and orchestration
inferencemax_single→inferencex_atom_single; legacyinferencemax/configs removed.InferenceXAtomJob(inferencex_atom_orch.py):params.driver=atom→ ATOM server +benchmark_serving; JSON artifacts parsed viato_client_metrics.params.driver=vllmretained for interim GPT-OSS uplift variants only.cvs/input/config_file/inference/inferencex_atom_single/:{gpu}_inferencex-atom-single_{model}_{precision}[_{mode}]_config.json{gpu}_inferencex-atom-single_{model}_{precision}[_{mode}]_threshold.jsonschema_version: 1, typed loader, andix_recipes.jsonrecipe pins (dsr1-fp8-mi300x-atom, etc.).mi300x_atom_single.json,mi355x_atom_single.json; container names pinned (inferencex_atom_mi300x/inferencex_atom_mi355x).inference_suite_lifecycle.py,inference_suite_results_table.py.W1 variants shipped
enforce_thresholdsmi300x_inferencex-atom-single_deepseek-r1_fp8_perftruemi300x_inferencex-atom-single_deepseek-r1_fp8_smokefalsemi300x_inferencex-atom-single_deepseek-r1_fp8_mtp3falsemi355x_inferencex-atom-single_deepseek-r1_fp8_perffalsemi355x_inferencex-atom-single_deepseek-r1_fp8_mtp3falseThreshold / metrics plumbing
test_cell_metrics(METRIC_TIERS: throughput, ttft, tpot, health, record) — one pytest row per tier per sweep cell.to_client_metrics: deriveclient.failedwhen ATOM omits it (num_prompts - completed), then computeclient.success_rate; addclient.output_tput_per_gpu.p99_tpot_ms(ATOMbenchmark_servingemits p99 tails;p95_tpot_msmay be absent even withmetric_percentiles: "95,99"). Tier enforcement skips metrics missing from the artifact.Platform / shared infra (supporting changes)
cvs/lib/utils/— sharedconfig_loader,verdict, sweep selector.vllm_singlesuite refactor to the same lifecycle / metric pattern.cvs/input/config_file/inference/inferencex_atom_single/README.md.Out of scope (follow-up on
dev/dtni)inferencex_atom_vllm_single/inferencex_atom_sglang_singleparity frameworksTest Plan
CI / unit (no GPU)
pytest cvs/tests/inference/inferencex_atom/ -qpytest cvs/lib/inference/unittests/test_inferencex_atom_parsing.py -qpytest cvs/lib/inference/unittests/test_inferencex_atom_config_loader.py -qLab — MI300X smoke
Launcher:
CTR-SVDT-L005(10.7.54.167) · GPU node:10.245.135.75Lab — MI300X W1 perf (M1 gate)
CONC=128,CONC=256(ISL=OSL=1024, TP=8); server reused across cells.enforce_thresholds: true.MI355X
enforce_thresholdswhen hardware is available.Test Result
MI300X lab —
mi300x_inferencex-atom-single_deepseek-r1_fp8_perfhnimrama/IX-atom@07c90a7CTR-SVDT-L005(10.7.54.167) · GPU node:10.245.135.75make installfrom repo root (.cvs_venv)rocm/atom-dev:latestdeepseek-ai/DeepSeek-R1-0528(FP8, TP=8)client.output_throughput(measured)MI300X lab — smoke
Artifacts (attach to PR)
20260625_173442_ix-atom-w1-perf_mi300x.html
inferencex_atom_single_2026-06-25T175704.zip
Unit tests
Submission Checklist
dev/dtnimake installfrom this branch (not a stale site-packages install)enforce_thresholds: trueonly on MI300X W1 perf — confirmed in labenforce_thresholds: false(pending hardware)dev/dtni