feat: wire oem_manager_profiles config through to machine_setup (nvbugs-5834644) #533

Open
martinraumann wants to merge 5 commits into NVIDIA:main from martinraumann:nvbugs-5834644-oem-profiles

@martinraumann martinraumann commented Mar 11, 2026

Description

Wires the new `oem_manager_profiles` config parameter (introduced in libredfish v0.43.3) through carbide-core's config system and state controller, enabling site operators to apply vendor-specific BMC manager attributes during `machine_setup` via TOML config — without code changes.

The motivating use case is the Dell R760 PSU Hot Spare fan issue (nvbugs-5834644): the `ServerPwr.1.PSRapidOn` iDRAC attribute must be set to `Disabled` before BMC lockdown to prevent unnecessary fan spin-up. This PR provides the mechanism and enables the fix globally by default in `carbide-api-config.toml`.

Note: libredfish v0.43.3 was already bumped in #466. This PR builds on top of that.

Changes

  • `helm/charts/carbide-api/files/carbide-api-config.toml` — adds `oem_manager_profiles` entries for Dell PowerEdge R760 (both `performance` and `powerefficiency` profiles) so the PSU Hot Spare fix is active by default on all sites with R760 hosts.
  • `crates/api/src/cfg/file.rs` — adds `oem_manager_profiles: libredfish::BiosProfileVendor` to `CarbideConfig` with `#[serde(default)]`. Existing site configs require no changes.
  • `crates/api/src/state_controller/machine/handler.rs` — passes `&site_config.oem_manager_profiles` to `machine_setup` (previously `&HashMap::default()`, introduced in #466, "Upgrade libredfish to 0.43.3"). Adds a regression test verifying that the field flows through correctly.
  • `crates/api/src/redfish.rs` — updates test mock to record `MachineSetup` actions with the OEM profiles map, enabling test assertions.
  • `crates/api/src/tests/common/api_fixtures/mod.rs` — initializes new field to empty map in shared test fixture.
  • `crates/admin-cli/src/redfish/cmds.rs` — minor style: `HashMap::new()` → `HashMap::default()` for consistency.

Design

`oem_manager_profiles` uses the same `BiosProfileVendor` type as the existing `bios_profiles` field — a vendor-keyed nested map. It is vendor-neutral: libredfish routes entries to the correct OEM manager endpoint per vendor (e.g. `Managers/{id}/Oem/Dell/DellAttributes/{id}` for Dell). Full design notes: nvbugs-5834644-oem-profiles-design.md
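The vendor-keyed nested map can be pictured with a minimal sketch. The type aliases and the `lookup_attributes` and `dell_r760_profiles` helpers below are hypothetical stand-ins written for illustration; the real `libredfish::BiosProfileVendor` may use enum keys and JSON values rather than plain strings.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the libredfish types (assumption: plain strings
// at every level; the real type may differ).
type Attributes = HashMap<String, String>;
type ProfilesByType = HashMap<String, Attributes>;
type ProfilesByModel = HashMap<String, ProfilesByType>;
type OemManagerProfiles = HashMap<String, ProfilesByModel>;

/// Look up the attributes for one vendor/model/profile combination.
/// Returns `None` when any level of the map has no matching key.
fn lookup_attributes<'a>(
    profiles: &'a OemManagerProfiles,
    vendor: &str,
    model: &str,
    profile: &str,
) -> Option<&'a Attributes> {
    profiles.get(vendor)?.get(model)?.get(profile)
}

/// Build the Dell R760 entry described in this PR for both profiles.
fn dell_r760_profiles() -> OemManagerProfiles {
    let mut attrs = Attributes::new();
    attrs.insert("ServerPwr.1.PSRapidOn".to_string(), "Disabled".to_string());

    let mut by_type = ProfilesByType::new();
    by_type.insert("performance".to_string(), attrs.clone());
    by_type.insert("powerefficiency".to_string(), attrs);

    let mut by_model = ProfilesByModel::new();
    by_model.insert("PowerEdge_R760".to_string(), by_type);

    let mut by_vendor = OemManagerProfiles::new();
    by_vendor.insert("Dell".to_string(), by_model);
    by_vendor
}

fn main() {
    let profiles = dell_r760_profiles();
    let attrs = lookup_attributes(&profiles, "Dell", "PowerEdge_R760", "performance");
    println!("{:?}", attrs);
}
```

A lookup for a vendor, model, or profile that is not present simply yields `None`, which is why an unmatched host sees no extra attributes.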

Type of Change

  • Add - New feature or capability
  • Fix - Bug fixes

Testing

  • Unit tests added/updated

New test: `test_oem_manager_profiles_passed_to_machine_setup` — constructs a Dell R760 OEM profile map, puts it in `CarbideConfig`, calls `call_machine_setup_and_handle_no_dpu_error`, and asserts the mock received the correct map. Guards against future regressions where the argument is dropped or replaced with an empty map.
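The shape of that regression test can be sketched as a recording mock. Everything here is a simplified assumption for illustration: `MockRedfish`, `RecordedAction`, `SiteConfig`, and `run_machine_setup` are hypothetical names, the profile map is flattened to two levels, and the real mock in `crates/api/src/redfish.rs` is likely async and shaped differently.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Simplified profile map (assumption: the real type has more nesting levels).
type Profiles = HashMap<String, HashMap<String, String>>;

/// Actions the mock remembers so a test can assert on them afterwards.
#[derive(Debug, Clone, PartialEq)]
enum RecordedAction {
    MachineSetup { oem_manager_profiles: Profiles },
}

/// Hypothetical mock client that records each call instead of talking to a BMC.
#[derive(Default)]
struct MockRedfish {
    actions: RefCell<Vec<RecordedAction>>,
}

impl MockRedfish {
    fn machine_setup(&self, oem_manager_profiles: &Profiles) {
        self.actions.borrow_mut().push(RecordedAction::MachineSetup {
            oem_manager_profiles: oem_manager_profiles.clone(),
        });
    }
}

/// Hypothetical stand-in for the site configuration.
struct SiteConfig {
    oem_manager_profiles: Profiles,
}

/// Stand-in for the handler: forwards the configured map to the client.
fn run_machine_setup(client: &MockRedfish, site_config: &SiteConfig) {
    client.machine_setup(&site_config.oem_manager_profiles);
}

fn main() {
    let mut attrs = HashMap::new();
    attrs.insert("ServerPwr.1.PSRapidOn".to_string(), "Disabled".to_string());
    let mut profiles = Profiles::new();
    profiles.insert("PowerEdge_R760".to_string(), attrs);

    let config = SiteConfig { oem_manager_profiles: profiles.clone() };
    let mock = MockRedfish::default();
    run_machine_setup(&mock, &config);

    // The recorded action must carry the configured map, not an empty one.
    let expected = vec![RecordedAction::MachineSetup { oem_manager_profiles: profiles }];
    assert_eq!(*mock.actions.borrow(), expected);
}
```

If the handler were changed to pass an empty map again, the final comparison would fail, which is the regression the test is meant to catch.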

Additional Notes

The R760 entry in `carbide-api-config.toml` is keyed by model name as returned by the Redfish `/Systems/{id}` `Model` field (spaces replaced with underscores via `model_coerce`). Sites without R760 hosts are unaffected — an empty/non-matching map produces no PATCH body changes.
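The model-name matching can be illustrated with a small sketch. `coerce_model_name` is a hypothetical stand-in for `model_coerce` that implements only the space-to-underscore replacement mentioned above, and `merge_model_attributes` is an assumed helper that omits the vendor and profile levels for brevity; neither is the actual libredfish code.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for `model_coerce`: replace spaces with underscores.
fn coerce_model_name(model: &str) -> String {
    model.replace(' ', "_")
}

/// Merge the attributes configured for `redfish_model` into `patch_body`.
/// Returns how many attributes were merged; zero means the body is untouched.
fn merge_model_attributes(
    patch_body: &mut BTreeMap<String, String>,
    profiles_by_model: &HashMap<String, HashMap<String, String>>,
    redfish_model: &str,
) -> usize {
    let key = coerce_model_name(redfish_model);
    match profiles_by_model.get(&key) {
        Some(attrs) => {
            for (name, value) in attrs {
                patch_body.insert(name.clone(), value.clone());
            }
            attrs.len()
        }
        None => 0,
    }
}

fn main() {
    let mut attrs = HashMap::new();
    attrs.insert("ServerPwr.1.PSRapidOn".to_string(), "Disabled".to_string());
    let mut profiles_by_model = HashMap::new();
    profiles_by_model.insert("PowerEdge_R760".to_string(), attrs);

    // A model string with a space matches the underscore-keyed entry.
    let mut r760_body = BTreeMap::new();
    let merged = merge_model_attributes(&mut r760_body, &profiles_by_model, "PowerEdge R760");
    println!("R760: merged {} attribute(s): {:?}", merged, r760_body);

    // A model with no entry leaves the PATCH body unchanged.
    let mut other_body = BTreeMap::new();
    let merged = merge_model_attributes(&mut other_body, &profiles_by_model, "PowerEdge R750");
    println!("R750: merged {} attribute(s): {:?}", merged, other_body);
}
```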

@martinraumann martinraumann requested a review from a team as a code owner March 11, 2026 23:42

github-actions bot commented Mar 11, 2026

🛡️ Vulnerability Scan

🚨 Found 72 vulnerability(ies)
📊 vs main: 72 (no change)

Severity Breakdown:

  • 🔴 Critical/High: 72
  • 🟡 Medium: 0
  • 🔵 Low/Info: 0

🔗 View full details in Security tab

🕐 Last updated: 2026-03-16 15:53:27 UTC | Commit: c7f446b

@martinraumann martinraumann force-pushed the nvbugs-5834644-oem-profiles branch from 78ddba5 to 6e38623 Compare March 11, 2026 23:45
@github-actions

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-03-11 23:45:10 UTC | Commit: 78ddba5

@martinraumann martinraumann self-assigned this Mar 11, 2026

@kensimon kensimon left a comment

Just a minor readability nit, otherwise LGTM.

…utes

Extends machine_setup to accept oem_manager_profiles, a new config-driven
map (same structure as bios_profiles) for vendor/model-keyed Dell iDRAC
manager attributes.

In libredfish (v0.43.2 / f08c5e79), machine_setup gains a fourth parameter
oem_manager_profiles: &BiosProfileVendor. For Dell, matching attributes are
merged into the machine_setup_oem PATCH payload targeting
Managers/{id}/Oem/Dell/DellAttributes/{id}. All other vendors accept the
parameter as a no-op. disable_psu_hot_spare() is removed from the trait.

carbide-core changes:
- CarbideConfig gains oem_manager_profiles field (TOML-deserializable,
  same type as bios_profiles)
- state_controller passes oem_manager_profiles to machine_setup at all
  call sites
- mock machine_setup updated to match new signature
- admin-cli and site_explorer pass empty map (no OEM profiles needed there)

To enable the Dell R760 PSU Hot Spare fix, add to site config:
  [oem_manager_profiles.Dell.PowerEdge_R760.performance]
  "ServerPwr.1.PSRapidOn" = "Disabled"

Fixes: nvbugs-5834644
Signed-off-by: M Raumann <mraumann@nvidia.com>
@martinraumann martinraumann force-pushed the nvbugs-5834644-oem-profiles branch from 3b64aa8 to 3ef1d25 Compare March 13, 2026 15:31
@martinraumann martinraumann enabled auto-merge (squash) March 13, 2026 16:11
/// Used for vendor-specific pre-ingestion fixes that must run before BMC lockdown.
/// Example: Dell R760 PSU Hot Spare disable (ServerPwr.1.PSRapidOn = Disabled).
#[serde(default)]
pub oem_manager_profiles: libredfish::BiosProfileVendor,

I feel like the doc should say that this is a hashmap so that people know what to put in the config file. Better yet, maybe add an example of what goes in the config file.

only because you'd need to look at the redfish crate to know

Hopefully I'm just behind the times: are we using this for all production sites (that want the fix, anyway), or is there a corresponding change in the forged repo?
