
Conversation

@dankenigsberg
Member

In heterogeneous clusters with multiple CPU vendors and generations, choosing the best CPU model for VMs is challenging:

  • Using a newer CPU model limits VM placement to a subset of nodes
  • Using an older model impacts guest performance unnecessarily

This change adds automatic CPU model recommendations to the HyperConverged status. The recommendations are sorted by a weighted score that balances:

  • PassMark performance score (50%) - for guest performance
  • Available CPU cores (20%) - for workload capacity
  • Available memory (15%) - for workload capacity
  • Node count (15%) - for availability and redundancy

The top recommendations appear in status.nodeInfo.recommendedCpuModels, helping cluster administrators make informed decisions when setting spec.defaultCPUModel.

The CPU model data is gathered from node labels (cpu-model.node.kubevirt.io/*) set by KubeVirt's virt-handler, and is automatically refreshed when nodes change or hourly.

Assisted-by: claude-4-sonnet

Reviewer Checklist

Reviewers should review the PR against each aspect below, one by one. Checking an item means the PR is either "OK" or "Not Applicable" for that item. All items should be checked before the PR is merged.

  • PR Message
  • Commit Messages
  • How to test
  • Unit Tests
  • Functional Tests
  • User Documentation
  • Developer Documentation
  • Upgrade Scenario
  • Uninstallation Scenario
  • Backward Compatibility
  • Troubleshooting Friendly

Jira Ticket:

CNV-42906

Release note:

HCO CR status recommends cpu models for VMs

@kubevirt-bot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

1 similar comment by openshift-ci bot (Dec 30, 2025).

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Dec 30, 2025
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign nunnatsa for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dankenigsberg
Member Author

/test all

Collaborator

@nunnatsa nunnatsa left a comment

Thanks for fixing the comment. I've added some more inline.

Please also handle the linter issues (redefinition of the builtin function max).

@coveralls
Collaborator

coveralls commented Jan 2, 2026

Pull Request Test Coverage Report for Build 20880735694


  • 122 of 134 (91.04%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.2%) to 76.215%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| controllers/hyperconverged/hyperconverged_controller.go | 1 | 4 | 25.0% |
| pkg/internal/nodeinfo/cpu_models.go | 119 | 128 | 92.97% |

Change from base Build 20851170488: +0.2%
Covered Lines: 8623
Relevant Lines: 11314

💛 - Coveralls

@dankenigsberg dankenigsberg marked this pull request as ready for review January 4, 2026 16:02
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 4, 2026
@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 4 issues and left some high-level feedback:

  • The cpu_models_test.go description and expectations talk about “weak node filtering” and excluding models that appear only on a weak node (e.g. Penryn), but processCpuModels only sorts and truncates by score without any explicit filtering, so either the heuristic or the test/comments should be aligned to make the intended behavior clear and actually enforced.
  • In the CRD schema CpuModelInfo.cpu and memory are marked as required, but the Go struct uses *resource.Quantity and the code paths in processCpuModels can leave them nil if capacity is missing, which can violate the schema; consider either making these fields non-pointer in the API or relaxing the CRD required list and guarding against nil more explicitly.
  • You now have two separate equality implementations for CpuModelInfo (one in recommendedModelsEqual and one inline via slices.EqualFunc in updateStatus); it would be more robust to centralize this comparison logic in a shared helper to avoid future drift between the cache and status update behaviors.
## Individual Comments

### Comment 1
<location> `pkg/internal/nodeinfo/cpu_models.go:26-27` </location>
<code_context>
+	nodeWeight      = 0.15
+)
+
+// cpuModelPassMarkScores maps libvirt CPU model names to approximate PassMark scores.
+// Keys must match exactly the cpu-model.node.kubevirt.io/* label values.
+// The precise values do not matter much, this is intended only as a rough heuristic.
+var cpuModelPassMarkScores = map[string]int{
</code_context>

<issue_to_address>
**nitpick:** The comment about cpuModelPassMarkScores referring to label "values" is misleading given extractCpuModels uses the label key suffix.

`extractCpuModels` derives the model name from the suffix of the label *key* (after `CpuModelLabelPrefix`) and only checks that the label value is "true". So this map’s keys must match `strings.TrimPrefix(labelKey, CpuModelLabelPrefix)`, not the label value. Please update the comment to reflect that to avoid confusion.
</issue_to_address>

### Comment 2
<location> `pkg/internal/nodeinfo/cpu_models_test.go:120-126` </location>
<code_context>
+			Expect(modelNames).ToNot(ContainElement("Penryn"))
+		})
+
+		It("should correctly aggregate CPU across nodes", func() {
+			Expect(modelMap).To(HaveKey("Skylake"))
+			Expect(modelMap["Skylake"].CPU.String()).To(Equal("16"))
+
+			Expect(modelMap).To(HaveKey("Haswell"))
+			Expect(modelMap["Haswell"].CPU.String()).To(Equal("24")) // 16 + 8
+
+			Expect(modelMap).To(HaveKey("Broadwell"))
+			Expect(modelMap["Broadwell"].CPU.String()).To(Equal("8"))
+		})
+
</code_context>

<issue_to_address>
**suggestion (testing):** Add coverage for the Benchmark field and the mapping from CPU model name to PassMark score

Current tests cover CPU aggregation but not that `CpuModelInfo.Benchmark` is derived correctly from `cpuModelPassMarkScores`. Please add a test that creates nodes with a few known models (e.g., `Haswell`, `Skylake`, `EPYC`), runs the handler, and asserts that the resulting `Benchmark` values match the expected scores from the map. This will help catch regressions in the score table and mapping logic that aggregation tests alone won’t detect.

```suggestion
		It("should exclude models that appear only once on a weak node", func() {
			modelNames := make([]string, len(recommendedCpuModels))
			for i, model := range recommendedCpuModels {
				modelNames[i] = model.Name
			}
			Expect(modelNames).ToNot(ContainElement("Penryn"))
		})

		It("should correctly aggregate CPU across nodes", func() {
			Expect(modelMap).To(HaveKey("Skylake"))
			Expect(modelMap["Skylake"].CPU.String()).To(Equal("16"))

			Expect(modelMap).To(HaveKey("Haswell"))
			Expect(modelMap["Haswell"].CPU.String()).To(Equal("24")) // 16 + 8

			Expect(modelMap).To(HaveKey("Broadwell"))
			Expect(modelMap["Broadwell"].CPU.String()).To(Equal("8"))
		})

		It("should set Benchmark from PassMark scores", func() {
			Expect(modelMap).To(HaveKey("Skylake"))
			Expect(modelMap["Skylake"].Benchmark).To(Equal(cpuModelPassMarkScores["Skylake"]))

			Expect(modelMap).To(HaveKey("Haswell"))
			Expect(modelMap["Haswell"].Benchmark).To(Equal(cpuModelPassMarkScores["Haswell"]))

			Expect(modelMap).To(HaveKey("Broadwell"))
			Expect(modelMap["Broadwell"].Benchmark).To(Equal(cpuModelPassMarkScores["Broadwell"]))
		})
```
</issue_to_address>

### Comment 3
<location> `pkg/internal/nodeinfo/cpu_models_test.go:150-158` </location>
<code_context>
+			Expect(modelMap["Broadwell"].Memory.String()).To(Equal("32Gi"))
+		})
+
+		It("should correctly count nodes for each model", func() {
+			Expect(modelMap).To(HaveKey("Skylake"))
+			Expect(modelMap["Skylake"].Nodes).To(Equal(1))
+
+			Expect(modelMap).To(HaveKey("Haswell"))
+			Expect(modelMap["Haswell"].Nodes).To(Equal(2))
+
+			Expect(modelMap).To(HaveKey("Broadwell"))
+			Expect(modelMap["Broadwell"].Nodes).To(Equal(1))
+		})
+	})
</code_context>

<issue_to_address>
**suggestion (testing):** Add tests for recommendation ordering and the top-N truncation behavior

This scenario only checks node counts, not the relative ordering or the top-4 truncation described for this feature. Please add tests that:

- Use a cluster with >4 distinct CPU models and assert that only four `CpuModelInfo` entries are returned.
- Assert that the returned models are in the expected order based on benchmark, node count, and resource distribution.

That way the tests cover the core recommendation behavior rather than just the aggregation.

Suggested implementation:

```golang
			Expect(modelMap).To(HaveKey("Skylake"))
			Expect(modelMap["Skylake"].Memory.String()).To(Equal("64Gi"))

			Expect(modelMap).To(HaveKey("Haswell"))
			Expect(modelMap["Haswell"].Memory.String()).To(Equal("96Gi")) // 64Gi + 32Gi

			Expect(modelMap).To(HaveKey("Broadwell"))
			Expect(modelMap["Broadwell"].Memory.String()).To(Equal("32Gi"))
		})

		It("should correctly count nodes for each model", func() {
			Expect(modelMap).To(HaveKey("Skylake"))
			Expect(modelMap["Skylake"].Nodes).To(Equal(1))

			Expect(modelMap).To(HaveKey("Haswell"))
			Expect(modelMap["Haswell"].Nodes).To(Equal(2))

			Expect(modelMap).To(HaveKey("Broadwell"))
			Expect(modelMap["Broadwell"].Nodes).To(Equal(1))
		})

		Context("recommendation ordering and top-N truncation", func() {
			It("should only return the top 4 CPU models", func() {
				// cpuModels is expected to be the ordered recommendation slice under test.
				// It should contain more than 4 models before truncation; the implementation
				// should truncate it to 4.
				Expect(cpuModels).To(HaveLen(4))
			})

			It("should order models by benchmark, node count, and resource distribution", func() {
				// Verify that the models are in the expected order. The exact expected model
				// names here should match the test cluster setup used to build cpuModels.
				// For example, given a cluster where:
				//   - "Skylake" has the best benchmark score
				//   - "Broadwell" is next
				//   - "Haswell" and a fourth model follow
				// we can assert an explicit order.
				Expect(cpuModels[0].Model).To(Equal("Skylake"))
				Expect(cpuModels[1].Model).To(Equal("Broadwell"))
				Expect(cpuModels[2].Model).To(Equal("Haswell"))
				// The fourth entry should be the next-best model according to the scoring logic.
				Expect(cpuModels[3].Model).ToNot(BeEmpty())

				// Additionally, validate that the ordering is consistent with the scoring
				// dimensions: benchmark score first, then node count, then resource distribution.
				// These expectations assume that higher Nodes/Memory correspond to better
				// recommendations when benchmark scores are equal.
				Expect(cpuModels[0].Nodes).To(BeNumerically(">=", cpuModels[1].Nodes))
				Expect(cpuModels[1].Nodes).To(BeNumerically(">=", cpuModels[2].Nodes))

				Expect(cpuModels[0].Memory.Value()).To(BeNumerically(">=", cpuModels[1].Memory.Value()))
				Expect(cpuModels[1].Memory.Value()).To(BeNumerically(">=", cpuModels[2].Memory.Value()))
			})
		})
	})
})

```

To make these tests compile and correctly cover the behavior, you will need to:

1. Ensure that `cpuModels` is the ordered slice of `CpuModelInfo` returned by the recommendation function and that it is in scope for this `Describe`/`Context`. If the slice is currently named something else (e.g. `models` or `recommendations`), rename the references in the new tests accordingly.
2. Make sure the test fixture used to build `cpuModels` includes more than 4 distinct CPU models so that the top-4 truncation behavior is exercised. At least one additional model beyond `"Skylake"`, `"Haswell"`, and `"Broadwell"` should be present.
3. Adjust the explicit expected ordering (`"Skylake"`, `"Broadwell"`, `"Haswell"`, and the fourth model) so that it matches the actual scoring logic:
   - If you have an explicit benchmark/score field on `CpuModelInfo`, you may want to assert on that field directly rather than inferring via `Nodes`/`Memory`.
   - Update the per-field expectations (`Nodes`, `Memory`) so they align with how your recommendation algorithm uses benchmark, node count, and resource distribution to rank models.
4. If the scoring logic uses additional fields (e.g. CPU quantity or benchmark scores), add appropriate assertions mirroring the real ordering criteria so the tests fail when the ordering logic regresses.
</issue_to_address>

### Comment 4
<location> `pkg/internal/nodeinfo/cpu_models_test.go:30-39` </location>
<code_context>
+		BeforeEach(func(ctx context.Context) {
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding a smaller, more focused unit test for GetRecommendedCpuModels and the cache behavior

Right now the cache behavior is only covered indirectly via `HandleNodeChanges` and the fake client. A dedicated test could populate the cache once, call `GetRecommendedCpuModels` twice, mutate the slice from the first call, and then assert the second call is unchanged. That would explicitly verify that the function returns a copy rather than exposing internal state.

Suggested implementation:

```golang
		var modelMap map[string]v1beta1.CpuModelInfo

```

You’ll want to add a new context and spec similar to the following in `pkg/internal/nodeinfo/cpu_models_test.go`. Place it near your other `Context` blocks (for example, after the existing "3-node cluster with resource aggregation and weak node filtering" context), and adjust helper usage to match your test setup:

```go
	Context("GetRecommendedCpuModels cache behavior", func() {
		var (
			scheme                *runtime.Scheme
			nodes                 []client.Object
			cachePopulated        bool
			initialRecommended    []v1beta1.CpuModelInfo
			secondRecommended     []v1beta1.CpuModelInfo
			fakeClient            client.Client
			recommendationWatcher *CpuModelRecommender // or whatever type exposes HandleNodeChanges/GetRecommendedCpuModels
		)

		BeforeEach(func(ctx context.Context) {
			scheme = runtime.NewScheme()
			Expect(corev1.AddToScheme(scheme)).To(Succeed())
			Expect(v1beta1.AddToScheme(scheme)).To(Succeed())

			// Use a minimal set of nodes sufficient to generate at least one recommended model.
			nodes = []client.Object{
				&corev1.Node{
					ObjectMeta: metav1.ObjectMeta{
						Name: "node-1",
						Labels: map[string]string{
							nodeinfo.LabelNodeRoleWorker: "",
						},
					},
					Status: corev1.NodeStatus{
						Capacity: corev1.ResourceList{
							corev1.ResourceCPU:    resource.MustParse("4"),
							corev1.ResourceMemory: resource.MustParse("8Gi"),
						},
					},
				},
			}

			// Create fake client with the nodes
			fakeClient = fake.NewClientBuilder().
				WithScheme(scheme).
				WithObjects(nodes...).
				Build()

			// Instantiate whatever component owns the cache and GetRecommendedCpuModels.
			// This may need to be adjusted to match your code.
			recommendationWatcher = NewCpuModelRecommender(fakeClient, scheme)
		})

		JustBeforeEach(func(ctx context.Context) {
			// Populate the cache once, e.g. by handling node changes.
			// Adjust to the actual method you use to build the cache.
			Expect(recommendationWatcher.HandleNodeChanges(ctx)).To(Succeed())
			cachePopulated = true

			// First call: this should read from the cache and return a slice.
			initialRecommended = recommendationWatcher.GetRecommendedCpuModels(ctx)
			Expect(initialRecommended).ToNot(BeEmpty())

			// Second call: should also read from the cache and return a fresh copy.
			secondRecommended = recommendationWatcher.GetRecommendedCpuModels(ctx)
			Expect(secondRecommended).ToNot(BeEmpty())
		})

		It("returns a defensive copy of cached CPU models", func(ctx context.Context) {
			Expect(cachePopulated).To(BeTrue(), "cache should be populated in JustBeforeEach")

			// Sanity: both slices have the same length and content initially.
			Expect(secondRecommended).To(HaveLen(len(initialRecommended)))
			for i := range initialRecommended {
				Expect(secondRecommended[i]).To(Equal(initialRecommended[i]))
			}

			// Mutate the slice returned by the first call.
			initialRecommended[0].Model = "mutated-model-id"
			initialRecommended[0].BaselineModel = "mutated-baseline"

			// Verify the second slice is unaffected (defensive copy semantics).
			Expect(secondRecommended[0].Model).ToNot(Equal("mutated-model-id"))
			Expect(secondRecommended[0].BaselineModel).ToNot(Equal("mutated-baseline"))

			// Optional: call GetRecommendedCpuModels a third time and
			// assert it still matches the unmodified cache state.
			third := recommendationWatcher.GetRecommendedCpuModels(ctx)
			Expect(third[0].Model).To(Equal(secondRecommended[0].Model))
			Expect(third[0].BaselineModel).To(Equal(secondRecommended[0].BaselineModel))
		})
	})
```

You will likely need to adapt the following to your actual code:

1. **Constructor / type names**:
   - Replace `CpuModelRecommender` and `NewCpuModelRecommender` with the real type and constructor that:
     - Owns the CPU model cache.
     - Exposes `HandleNodeChanges` (or equivalent) to populate that cache.
     - Exposes `GetRecommendedCpuModels`.

2. **Client and scheme wiring**:
   - If the existing tests already have helpers for building a fake client or the recommender/controller, reuse those instead of the inline `fake.NewClientBuilder` shown above.
   - If `GetRecommendedCpuModels` is a package-level function instead of a method, call it directly, and adjust the cache-initialization call to whatever is appropriate in your design.

3. **Node setup**:
   - If your recommendation logic depends on particular CPU model labels or extended resources, mirror the minimal label set used in your existing 3-node cluster tests to ensure at least one CPU model appears in the recommendations.

4. **Ginkgo version**:
   - The example uses `func(ctx context.Context)` signatures for `BeforeEach`, `JustBeforeEach`, and `It`, aligned with Ginkgo v2 and your snippet. Ensure the imports and Ginkgo configuration in the file already support `ctx`; otherwise, drop the `ctx` argument and the `context.Context` parameters.

By adding this context and spec, you’ll explicitly validate that `GetRecommendedCpuModels` returns a defensive copy and does not expose the internal cache slice to callers.
</issue_to_address>


@hco-bot
Collaborator

hco-bot commented Jan 4, 2026

hco-e2e-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-sno-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-sno-azure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hco-bot
Collaborator

hco-bot commented Jan 4, 2026

hco-e2e-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-azure
hco-e2e-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-gcp

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-azure, ci/prow/hco-e2e-operator-sdk-gcp


@hco-bot
Collaborator

hco-bot commented Jan 4, 2026

hco-e2e-upgrade-prev-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure
hco-e2e-upgrade-prev-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure
hco-e2e-upgrade-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-operator-sdk-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure


@hco-bot
Collaborator

hco-bot commented Jan 4, 2026

hco-e2e-consecutive-operator-sdk-upgrades-aws lane succeeded.
/override ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure


@hco-bot
Collaborator

hco-bot commented Jan 4, 2026

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure


Collaborator

@nunnatsa nunnatsa left a comment

Thanks @dankenigsberg

Added some more comments.

Comment on lines 517 to 521

```go
if cpuModels := nodeinfo.GetRecommendedCpuModels(); !slices.EqualFunc(req.Instance.Status.NodeInfo.RecommendedCpuModels, cpuModels, func(a, b hcov1beta1.CpuModelInfo) bool {
	return a.Name == b.Name && a.Benchmark == b.Benchmark && a.Nodes == b.Nodes &&
		((a.CPU == nil && b.CPU == nil) || (a.CPU != nil && b.CPU != nil && a.CPU.Equal(*b.CPU))) &&
		((a.Memory == nil && b.Memory == nil) || (a.Memory != nil && b.Memory != nil && a.Memory.Equal(*b.Memory)))
}) {
	req.Instance.Status.NodeInfo.RecommendedCpuModels = cpuModels
	req.StatusDirty = true
}
```
Collaborator

  1. we already have a comparison function in the nodeinfo package, we can't expose it for reuse, because it's in an internal package, but maybe we can add it as the CpuModel type's method, instead?
  2. this code is not tested. Can we have a few unit tests for it? We can use a simple mock, as done for the rest of the functions in the nodeinfo package, to return any desired value.


@hco-bot
Copy link
Collaborator

hco-bot commented Jan 10, 2026

hco-e2e-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-gcp
hco-e2e-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-azure
hco-e2e-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-sno-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-azure, ci/prow/hco-e2e-operator-sdk-gcp, ci/prow/hco-e2e-operator-sdk-sno-azure


@hco-bot
Collaborator

hco-bot commented Jan 10, 2026

hco-e2e-upgrade-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-operator-sdk-azure


@hco-bot
Collaborator

hco-bot commented Jan 10, 2026

hco-e2e-consecutive-operator-sdk-upgrades-aws lane succeeded.
/override ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure
hco-e2e-upgrade-prev-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure
hco-e2e-upgrade-prev-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure


@openshift-ci

openshift-ci bot commented Jan 10, 2026

@dankenigsberg: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Required | Rerun command |
|---|---|---|---|
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure | f435588 | false | /test hco-e2e-upgrade-prev-operator-sdk-sno-azure |
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure | f435588 | true | /test hco-e2e-upgrade-prev-operator-sdk-azure |

Full PR test history. Your PR dashboard.


@hco-bot
Collaborator

hco-bot commented Jan 10, 2026

hco-e2e-upgrade-prev-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure


@hco-bot
Collaborator

hco-bot commented Jan 10, 2026

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure


@Barakmor1
Member

Thank you for this important effort. Have you considered clusters with multiple vendors or architectures? Would this recommendation still be valid in those cases?

@dankenigsberg
Member Author

Thank you for this important effort. Have you considered clusters with multiple vendors or architectures? Would this recommendation still be valid in those cases?

Yes: if most of the CPU power of the cluster comes from an "exotic" architecture, the code will recommend models from that architecture.

```go
// cpuModelPassMarkScores maps libvirt CPU model names to approximate PassMark scores.
// Keys must match exactly the cpu-model.node.kubevirt.io/* label values.
```
Member

@Barakmor1 Barakmor1 Jan 22, 2026

Just a heads-up that the keys in this list tend to change with different libvirt versions, so this would require continued maintenance.
