Skipping MCAD CPU Preemption Test#696
Fiona-Waters wants to merge 29 commits into project-codeflare:main from
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files: Approvers can indicate their approval by writing
For future reference, the root cause analysis of the test's failure has been conducted by @dgrove-oss, and it can be found here:
Force-pushed from 0a8c451 to 5c3d3ed
Force-pushed from 5c3d3ed to 701e8e9
Thanks @ronensc, that's good to know!

I don't think it's worth backporting, but I did redo these test cases for mcad v2 to be robust against different cluster sizes in project-codeflare/mcad#83
More investigation is required as to why these tests are failing. Closing this PR. |
KPostOffice
left a comment
Looks good. I like the move to more generic tests. One question:
//aw := createDeploymentAWwith550CPU(context, appendRandomString("aw-deployment-2-550cpu"))
cap := getClusterCapacitycontext(context)
resource := cpuDemand(cap, 0.275).String()
aw := createGenericDeploymentCustomPodResourcesWithCPUAW(
What happens if the cluster has many smaller nodes, resulting in a high capacity but an inability to schedule AppWrappers because they do not fit on the individual nodes? Do we care about that at all in this test case?
From a test case perspective, the cluster is assumed to have homogeneous nodes, and the test requests deployments that fit on a single node in the cluster in the CPU dimension.
Fiona-Waters
left a comment
This looks great, so happy to move forward with this improvement. Just a couple of small comments.
func getClusterCapacitycontext(context *context) *clusterstateapi.Resource {
	capacity := clusterstateapi.EmptyResource()
	nodes, _ := context.kubeclient.CoreV1().Nodes().List(context.ctx, metav1.ListOptions{})
We should handle the error here.
podList, err := context.kubeclient.CoreV1().Pods("").List(context.ctx, metav1.ListOptions{FieldSelector: labelSelector})
// TODO: when no pods are listed, do we send entire node capacity as available
// this will cause false positive dispatch.
if err != nil {
Should the error be caught like this instead?
if err != nil {
	Expect(err).NotTo(HaveOccurred())
Skipping the MCAD CPU Preemption Test which is failing intermittently on PRs so that we can get some outstanding PRs merged.