
Conversation

@dankenigsberg
Member

In heterogeneous clusters with multiple CPU vendors and generations, choosing the best CPU model for VMs is challenging:

  • Using a newer CPU model limits VM placement to a subset of nodes
  • Using an older model impacts guest performance unnecessarily

This change adds automatic CPU model recommendations to the HyperConverged status. The recommendations are sorted by a weighted score that balances:

  • PassMark performance score (50%) - for guest performance
  • Available CPU cores (20%) - for workload capacity
  • Available memory (15%) - for workload capacity
  • Node count (15%) - for availability and redundancy

The top recommendations appear in status.nodeInfo.recommendedCpuModels, helping cluster administrators make informed decisions when setting spec.defaultCPUModel.

The CPU model data is gathered from node labels (cpu-model.node.kubevirt.io/*) set by KubeVirt's virt-handler, and is automatically refreshed when nodes change or hourly.

Assisted-by: claude-4-sonnet

Reviewer Checklist

Reviewers should review the PR against each aspect below, one by one. Checking an item means the PR is either "OK" or "Not Applicable" for that item. All items should be checked before the PR is merged.

  • PR Message
  • Commit Messages
  • How to test
  • Unit Tests
  • Functional Tests
  • User Documentation
  • Developer Documentation
  • Upgrade Scenario
  • Uninstallation Scenario
  • Backward Compatibility
  • Troubleshooting Friendly

Jira Ticket:

CNV-42906

Release note:

HCO CR status recommends cpu models for VMs

@kubevirt-bot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

1 similar comment by openshift-ci bot (Dec 30, 2025).

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Dec 30, 2025
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign nunnatsa for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dankenigsberg
Member Author

/test all

Collaborator

@nunnatsa nunnatsa left a comment

Thanks for fixing the comment. I've added some more inline.

Please also handle the linter issues (redefinition of the builtin function max).

@coveralls
Collaborator

coveralls commented Jan 2, 2026

Pull Request Test Coverage Report for Build 20880735694


  • 122 of 134 (91.04%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.2%) to 76.215%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| controllers/hyperconverged/hyperconverged_controller.go | 1 | 4 | 25.0% |
| pkg/internal/nodeinfo/cpu_models.go | 119 | 128 | 92.97% |

Change from base Build 20851170488: +0.2%
Covered Lines: 8623
Relevant Lines: 11314

💛 - Coveralls

@dankenigsberg dankenigsberg marked this pull request as ready for review January 4, 2026 16:02
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 4, 2026
@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 4 issues and left some high-level feedback:

  • The cpu_models_test.go description and expectations talk about “weak node filtering” and excluding models that appear only on a weak node (e.g. Penryn), but processCpuModels only sorts and truncates by score without any explicit filtering, so either the heuristic or the test/comments should be aligned to make the intended behavior clear and actually enforced.
  • In the CRD schema CpuModelInfo.cpu and memory are marked as required, but the Go struct uses *resource.Quantity and the code paths in processCpuModels can leave them nil if capacity is missing, which can violate the schema; consider either making these fields non-pointer in the API or relaxing the CRD required list and guarding against nil more explicitly.
  • You now have two separate equality implementations for CpuModelInfo (one in recommendedModelsEqual and one inline via slices.EqualFunc in updateStatus); it would be more robust to centralize this comparison logic in a shared helper to avoid future drift between the cache and status update behaviors.
## Individual Comments

### Comment 1
<location> `pkg/internal/nodeinfo/cpu_models.go:26-27` </location>
<code_context>
+	nodeWeight      = 0.15
+)
+
+// cpuModelPassMarkScores maps libvirt CPU model names to approximate PassMark scores.
+// Keys must match exactly the cpu-model.node.kubevirt.io/* label values.
+// The precise values do not matter much, this is intended only as a rough heuristic.
+var cpuModelPassMarkScores = map[string]int{
</code_context>

<issue_to_address>
**nitpick:** The comment about cpuModelPassMarkScores referring to label "values" is misleading given extractCpuModels uses the label key suffix.

`extractCpuModels` derives the model name from the suffix of the label *key* (after `CpuModelLabelPrefix`) and only checks that the label value is "true". So this map’s keys must match `strings.TrimPrefix(labelKey, CpuModelLabelPrefix)`, not the label value. Please update the comment to reflect that to avoid confusion.
</issue_to_address>

### Comment 2
<location> `pkg/internal/nodeinfo/cpu_models_test.go:120-126` </location>
<code_context>
+			Expect(modelNames).ToNot(ContainElement("Penryn"))
+		})
+
+		It("should correctly aggregate CPU across nodes", func() {
+			Expect(modelMap).To(HaveKey("Skylake"))
+			Expect(modelMap["Skylake"].CPU.String()).To(Equal("16"))
+
+			Expect(modelMap).To(HaveKey("Haswell"))
+			Expect(modelMap["Haswell"].CPU.String()).To(Equal("24")) // 16 + 8
+
+			Expect(modelMap).To(HaveKey("Broadwell"))
+			Expect(modelMap["Broadwell"].CPU.String()).To(Equal("8"))
+		})
+
</code_context>

<issue_to_address>
**suggestion (testing):** Add coverage for the Benchmark field and the mapping from CPU model name to PassMark score

Current tests cover CPU aggregation but not that `CpuModelInfo.Benchmark` is derived correctly from `cpuModelPassMarkScores`. Please add a test that creates nodes with a few known models (e.g., `Haswell`, `Skylake`, `EPYC`), runs the handler, and asserts that the resulting `Benchmark` values match the expected scores from the map. This will help catch regressions in the score table and mapping logic that aggregation tests alone won’t detect.

```suggestion
		It("should exclude models that appear only once on a weak node", func() {
			modelNames := make([]string, len(recommendedCpuModels))
			for i, model := range recommendedCpuModels {
				modelNames[i] = model.Name
			}
			Expect(modelNames).ToNot(ContainElement("Penryn"))
		})

		It("should correctly aggregate CPU across nodes", func() {
			Expect(modelMap).To(HaveKey("Skylake"))
			Expect(modelMap["Skylake"].CPU.String()).To(Equal("16"))

			Expect(modelMap).To(HaveKey("Haswell"))
			Expect(modelMap["Haswell"].CPU.String()).To(Equal("24")) // 16 + 8

			Expect(modelMap).To(HaveKey("Broadwell"))
			Expect(modelMap["Broadwell"].CPU.String()).To(Equal("8"))
		})

		It("should set Benchmark from PassMark scores", func() {
			Expect(modelMap).To(HaveKey("Skylake"))
			Expect(modelMap["Skylake"].Benchmark).To(Equal(cpuModelPassMarkScores["Skylake"]))

			Expect(modelMap).To(HaveKey("Haswell"))
			Expect(modelMap["Haswell"].Benchmark).To(Equal(cpuModelPassMarkScores["Haswell"]))

			Expect(modelMap).To(HaveKey("Broadwell"))
			Expect(modelMap["Broadwell"].Benchmark).To(Equal(cpuModelPassMarkScores["Broadwell"]))
		})
```
</issue_to_address>

### Comment 3
<location> `pkg/internal/nodeinfo/cpu_models_test.go:150-158` </location>
<code_context>
+			Expect(modelMap["Broadwell"].Memory.String()).To(Equal("32Gi"))
+		})
+
+		It("should correctly count nodes for each model", func() {
+			Expect(modelMap).To(HaveKey("Skylake"))
+			Expect(modelMap["Skylake"].Nodes).To(Equal(1))
+
+			Expect(modelMap).To(HaveKey("Haswell"))
+			Expect(modelMap["Haswell"].Nodes).To(Equal(2))
+
+			Expect(modelMap).To(HaveKey("Broadwell"))
+			Expect(modelMap["Broadwell"].Nodes).To(Equal(1))
+		})
+	})
</code_context>

<issue_to_address>
**suggestion (testing):** Add tests for recommendation ordering and the top-N truncation behavior

This scenario only checks node counts, not the relative ordering or the top-4 truncation described for this feature. Please add tests that:

- Use a cluster with >4 distinct CPU models and assert that only four `CpuModelInfo` entries are returned.
- Assert that the returned models are in the expected order based on benchmark, node count, and resource distribution.

That way the tests cover the core recommendation behavior rather than just the aggregation.

Suggested implementation:

```golang
			Expect(modelMap).To(HaveKey("Skylake"))
			Expect(modelMap["Skylake"].Memory.String()).To(Equal("64Gi"))

			Expect(modelMap).To(HaveKey("Haswell"))
			Expect(modelMap["Haswell"].Memory.String()).To(Equal("96Gi")) // 64Gi + 32Gi

			Expect(modelMap).To(HaveKey("Broadwell"))
			Expect(modelMap["Broadwell"].Memory.String()).To(Equal("32Gi"))
		})

		It("should correctly count nodes for each model", func() {
			Expect(modelMap).To(HaveKey("Skylake"))
			Expect(modelMap["Skylake"].Nodes).To(Equal(1))

			Expect(modelMap).To(HaveKey("Haswell"))
			Expect(modelMap["Haswell"].Nodes).To(Equal(2))

			Expect(modelMap).To(HaveKey("Broadwell"))
			Expect(modelMap["Broadwell"].Nodes).To(Equal(1))
		})

		Context("recommendation ordering and top-N truncation", func() {
			It("should only return the top 4 CPU models", func() {
				// cpuModels is expected to be the ordered recommendation slice under test.
				// It should contain more than 4 models before truncation; the implementation
				// should truncate it to 4.
				Expect(cpuModels).To(HaveLen(4))
			})

			It("should order models by benchmark, node count, and resource distribution", func() {
				// Verify that the models are in the expected order. The exact expected model
				// names here should match the test cluster setup used to build cpuModels.
				// For example, given a cluster where:
				//   - "Skylake" has the best benchmark score
				//   - "Broadwell" is next
				//   - "Haswell" and a fourth model follow
				// we can assert an explicit order.
				Expect(cpuModels[0].Model).To(Equal("Skylake"))
				Expect(cpuModels[1].Model).To(Equal("Broadwell"))
				Expect(cpuModels[2].Model).To(Equal("Haswell"))
				// The fourth entry should be the next-best model according to the scoring logic.
				Expect(cpuModels[3].Model).ToNot(BeEmpty())

				// Additionally, validate that the ordering is consistent with the scoring
				// dimensions: benchmark score first, then node count, then resource distribution.
				// These expectations assume that higher Nodes/Memory correspond to better
				// recommendations when benchmark scores are equal.
				Expect(cpuModels[0].Nodes).To(BeNumerically(">=", cpuModels[1].Nodes))
				Expect(cpuModels[1].Nodes).To(BeNumerically(">=", cpuModels[2].Nodes))

				Expect(cpuModels[0].Memory.Value()).To(BeNumerically(">=", cpuModels[1].Memory.Value()))
				Expect(cpuModels[1].Memory.Value()).To(BeNumerically(">=", cpuModels[2].Memory.Value()))
			})
		})
	})
})

```

To make these tests compile and correctly cover the behavior, you will need to:

1. Ensure that `cpuModels` is the ordered slice of `CpuModelInfo` returned by the recommendation function and that it is in scope for this `Describe`/`Context`. If the slice is currently named something else (e.g. `models` or `recommendations`), rename the references in the new tests accordingly.
2. Make sure the test fixture used to build `cpuModels` includes more than 4 distinct CPU models so that the top-4 truncation behavior is exercised. At least one additional model beyond `"Skylake"`, `"Haswell"`, and `"Broadwell"` should be present.
3. Adjust the explicit expected ordering (`"Skylake"`, `"Broadwell"`, `"Haswell"`, and the fourth model) so that it matches the actual scoring logic:
   - If you have an explicit benchmark/score field on `CpuModelInfo`, you may want to assert on that field directly rather than inferring via `Nodes`/`Memory`.
   - Update the per-field expectations (`Nodes`, `Memory`) so they align with how your recommendation algorithm uses benchmark, node count, and resource distribution to rank models.
4. If the scoring logic uses additional fields (e.g. CPU quantity or benchmark scores), add appropriate assertions mirroring the real ordering criteria so the tests fail when the ordering logic regresses.
</issue_to_address>

### Comment 4
<location> `pkg/internal/nodeinfo/cpu_models_test.go:30-39` </location>
<code_context>
+		BeforeEach(func(ctx context.Context) {
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding a smaller, more focused unit test for GetRecommendedCpuModels and the cache behavior

Right now the cache behavior is only covered indirectly via `HandleNodeChanges` and the fake client. A dedicated test could populate the cache once, call `GetRecommendedCpuModels` twice, mutate the slice from the first call, and then assert the second call is unchanged. That would explicitly verify that the function returns a copy rather than exposing internal state.

Suggested implementation:

```golang
		var modelMap map[string]v1beta1.CpuModelInfo

```

You’ll want to add a new context and spec similar to the following in `pkg/internal/nodeinfo/cpu_models_test.go`. Place it near your other `Context` blocks (for example, after the existing "3-node cluster with resource aggregation and weak node filtering" context), and adjust helper usage to match your test setup:

```go
	Context("GetRecommendedCpuModels cache behavior", func() {
		var (
			scheme                *runtime.Scheme
			nodes                 []client.Object
			cachePopulated        bool
			initialRecommended    []v1beta1.CpuModelInfo
			secondRecommended     []v1beta1.CpuModelInfo
			fakeClient            client.Client
			recommendationWatcher *CpuModelRecommender // or whatever type exposes HandleNodeChanges/GetRecommendedCpuModels
		)

		BeforeEach(func(ctx context.Context) {
			scheme = runtime.NewScheme()
			Expect(corev1.AddToScheme(scheme)).To(Succeed())
			Expect(v1beta1.AddToScheme(scheme)).To(Succeed())

			// Use a minimal set of nodes sufficient to generate at least one recommended model.
			nodes = []client.Object{
				&corev1.Node{
					ObjectMeta: metav1.ObjectMeta{
						Name: "node-1",
						Labels: map[string]string{
							nodeinfo.LabelNodeRoleWorker: "",
						},
					},
					Status: corev1.NodeStatus{
						Capacity: corev1.ResourceList{
							corev1.ResourceCPU:    resource.MustParse("4"),
							corev1.ResourceMemory: resource.MustParse("8Gi"),
						},
					},
				},
			}

			// Create fake client with the nodes
			fakeClient = fake.NewClientBuilder().
				WithScheme(scheme).
				WithObjects(nodes...).
				Build()

			// Instantiate whatever component owns the cache and GetRecommendedCpuModels.
			// This may need to be adjusted to match your code.
			recommendationWatcher = NewCpuModelRecommender(fakeClient, scheme)
		})

		JustBeforeEach(func(ctx context.Context) {
			// Populate the cache once, e.g. by handling node changes.
			// Adjust to the actual method you use to build the cache.
			Expect(recommendationWatcher.HandleNodeChanges(ctx)).To(Succeed())
			cachePopulated = true

			// First call: this should read from the cache and return a slice.
			initialRecommended = recommendationWatcher.GetRecommendedCpuModels(ctx)
			Expect(initialRecommended).ToNot(BeEmpty())

			// Second call: should also read from the cache and return a fresh copy.
			secondRecommended = recommendationWatcher.GetRecommendedCpuModels(ctx)
			Expect(secondRecommended).ToNot(BeEmpty())
		})

		It("returns a defensive copy of cached CPU models", func(ctx context.Context) {
			Expect(cachePopulated).To(BeTrue(), "cache should be populated in JustBeforeEach")

			// Sanity: both slices have the same length and content initially.
			Expect(secondRecommended).To(HaveLen(len(initialRecommended)))
			for i := range initialRecommended {
				Expect(secondRecommended[i]).To(Equal(initialRecommended[i]))
			}

			// Mutate the slice returned by the first call.
			initialRecommended[0].Model = "mutated-model-id"
			initialRecommended[0].BaselineModel = "mutated-baseline"

			// Verify the second slice is unaffected (defensive copy semantics).
			Expect(secondRecommended[0].Model).ToNot(Equal("mutated-model-id"))
			Expect(secondRecommended[0].BaselineModel).ToNot(Equal("mutated-baseline"))

			// Optional: call GetRecommendedCpuModels a third time and
			// assert it still matches the unmodified cache state.
			third := recommendationWatcher.GetRecommendedCpuModels(ctx)
			Expect(third[0].Model).To(Equal(secondRecommended[0].Model))
			Expect(third[0].BaselineModel).To(Equal(secondRecommended[0].BaselineModel))
		})
	})
```

You will likely need to adapt the following to your actual code:

1. **Constructor / type names**:
   - Replace `CpuModelRecommender` and `NewCpuModelRecommender` with the real type and constructor that:
     - Owns the CPU model cache.
     - Exposes `HandleNodeChanges` (or equivalent) to populate that cache.
     - Exposes `GetRecommendedCpuModels`.

2. **Client and scheme wiring**:
   - If the existing tests already have helpers for building a fake client or the recommender/controller, reuse those instead of the inline `fake.NewClientBuilder` shown above.
   - If `GetRecommendedCpuModels` is a package-level function instead of a method, call it directly, and adjust the cache-initialization call to whatever is appropriate in your design.

3. **Node setup**:
   - If your recommendation logic depends on particular CPU model labels or extended resources, mirror the minimal label set used in your existing 3-node cluster tests to ensure at least one CPU model appears in the recommendations.

4. **Ginkgo version**:
   - The example uses `func(ctx context.Context)` signatures for `BeforeEach`, `JustBeforeEach`, and `It`, aligned with Ginkgo v2 and your snippet. Ensure the imports and Ginkgo configuration in the file already support `ctx`; otherwise, drop the `ctx` argument and the `context.Context` parameters.

By adding this context and spec, you’ll explicitly validate that `GetRecommendedCpuModels` returns a defensive copy and does not expose the internal cache slice to callers.
</issue_to_address>


@hco-bot
Collaborator

hco-bot commented Jan 4, 2026

hco-e2e-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-sno-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-sno-azure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hco-bot
Collaborator

hco-bot commented Jan 4, 2026

hco-e2e-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-azure
hco-e2e-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-gcp

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-azure, ci/prow/hco-e2e-operator-sdk-gcp


@hco-bot
Collaborator

hco-bot commented Jan 4, 2026

hco-e2e-upgrade-prev-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure
hco-e2e-upgrade-prev-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure
hco-e2e-upgrade-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-operator-sdk-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure


@hco-bot
Collaborator

hco-bot commented Jan 4, 2026

hco-e2e-consecutive-operator-sdk-upgrades-aws lane succeeded.
/override ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure


@hco-bot
Collaborator

hco-bot commented Jan 4, 2026

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure


Collaborator

@nunnatsa nunnatsa left a comment

Thanks @dankenigsberg

Added some more comments.

Comment on lines 517 to 521

```go
if cpuModels := nodeinfo.GetRecommendedCpuModels(); !slices.EqualFunc(req.Instance.Status.NodeInfo.RecommendedCpuModels, cpuModels, func(a, b hcov1beta1.CpuModelInfo) bool {
	return a.Name == b.Name && a.Benchmark == b.Benchmark && a.Nodes == b.Nodes &&
		((a.CPU == nil && b.CPU == nil) || (a.CPU != nil && b.CPU != nil && a.CPU.Equal(*b.CPU))) &&
		((a.Memory == nil && b.Memory == nil) || (a.Memory != nil && b.Memory != nil && a.Memory.Equal(*b.Memory)))
}) {
	req.Instance.Status.NodeInfo.RecommendedCpuModels = cpuModels
	req.StatusDirty = true
}
```
Collaborator

  1. we already have a comparison function in the nodeinfo package, we can't expose it for reuse, because it's in an internal package, but maybe we can add it as the CpuModel type's method, instead?
  2. this code is not tested. Can we have a few unit tests for it? We can use a simple mock, as done for the rest of the functions in the nodeinfo package, to return any desired value.


@hco-bot
Copy link
Collaborator

hco-bot commented Jan 10, 2026

hco-e2e-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-gcp
hco-e2e-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-azure
hco-e2e-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-sno-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-azure, ci/prow/hco-e2e-operator-sdk-gcp, ci/prow/hco-e2e-operator-sdk-sno-azure


@hco-bot
Collaborator

hco-bot commented Jan 10, 2026

hco-e2e-upgrade-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-operator-sdk-azure


@hco-bot
Collaborator

hco-bot commented Jan 10, 2026

hco-e2e-consecutive-operator-sdk-upgrades-aws lane succeeded.
/override ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure
hco-e2e-upgrade-prev-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure
hco-e2e-upgrade-prev-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure


@openshift-ci

openshift-ci bot commented Jan 10, 2026

@dankenigsberg: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Required | Rerun command |
|---|---|---|---|
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure | f435588 | false | /test hco-e2e-upgrade-prev-operator-sdk-sno-azure |
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure | f435588 | true | /test hco-e2e-upgrade-prev-operator-sdk-azure |

Full PR test history. Your PR dashboard.


@hco-bot
Collaborator

hco-bot commented Jan 10, 2026

hco-e2e-upgrade-prev-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure


@hco-bot
Collaborator

hco-bot commented Jan 10, 2026

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure


@Barakmor1
Member

Thank you for this important effort. Have you considered clusters with multiple vendors or architectures? Would this recommendation still be valid in those cases?

@dankenigsberg
Member Author

Thank you for this important effort. Have you considered clusters with multiple vendors or architectures? Would this recommendation still be valid in those cases?

Yes: if most of the CPU power of the cluster comes from an "exotic" architecture, the code will recommend models from that architecture.

```go
// cpuModelPassMarkScores maps libvirt CPU model names to approximate PassMark scores.
// Keys must match exactly the cpu-model.node.kubevirt.io/* label values.
```
Member

@Barakmor1 Barakmor1 Jan 22, 2026

Just a heads-up that the keys in this list tend to change with different libvirt versions, so this would require continued maintenance.
