Skip to content

fix(CubeMaster): preserve template network rules and resource defaults#581

Open
zyl1121 wants to merge 5 commits into
TencentCloud:masterfrom
zyl1121:fix/cubemaster-template-resource-egress
Open

fix(CubeMaster): preserve template network rules and resource defaults#581
zyl1121 wants to merge 5 commits into
TencentCloud:masterfrom
zyl1121:fix/cubemaster-template-resource-egress

Conversation

@zyl1121

@zyl1121 zyl1121 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Motivation

The template / sandbox request materialization path currently has a few correctness gaps where accepted or configured fields are not preserved consistently.

The most user-visible issue is in template creation from image: cube_network_config.rules can be submitted successfully, but the stored template create_request drops the rules array while keeping the rest of cube_network_config.

This PR fixes that confirmed regression and also tightens two closely related resource-handling cases in the same request materialization path:

  • preserve the first validation error when both CPU and memory exceed configured limits
  • preserve template CPU / memory defaults when applying them to a request whose Resources field is initially nil

Reproduction

I first observed the network rules regression on a one-click deployment running v0.4.0.

Steps:

  1. Create a template from image with both allowInternetAccess and an egress rules entry inside cube_network_config.
  2. Wait for the template image job to succeed.
  3. Inspect the stored template request:
GET /cube/template?template_id=...&include_request=true

Before this fix, the stored create_request.cube_network_config keeps:

{
  "allowInternetAccess": true
}

but silently drops the submitted rules array.

That means the request is accepted, but the resulting template no longer carries the intended egress rule set forward into later sandbox creation.

What This PR Changes

  • Preserve cube_network_config.rules when cloning a template create request from CreateTemplateFromImageReq into CreateCubeSandboxReq.

    • pkg/templatecenter/template_request.go
    • adds deep-copy helpers for egress rules
  • Preserve the first resource validation error when both CPU and memory exceed configured limits.

    • pkg/service/sandbox/util.go
    • prevents the later memory validation check from overwriting an earlier CPU validation error
  • Preserve template CPU / memory defaults when merging template resources into a request with Resources == nil.

    • pkg/service/httpservice/cube/cubeboxutil.go
    • makes applyTemplateResources() return the effective *types.Resource, and callers keep the returned pointer
  • Add focused regression tests for all three cases.

Validation

Manual validation on the deployed cluster:

  • Confirmed that POST /cube/template/from-image accepts cube_network_config.rules, but GET /cube/template?...&include_request=true drops rules while keeping allowInternetAccess.
  • Confirmed that POST /cube/sandbox with cpu=101, mem=2000Mi is rejected with a CPU validation error.
  • Confirmed that POST /cube/sandbox with both cpu=101 and mem=301Gi returns the later memory validation error, which matches the validation-order issue fixed by this PR.

Focused regression tests on this branch:

cd CubeMaster

go test -vet=off ./pkg/templatecenter -run TestGenerateTemplateCreateRequestClonesCubeNetworkRules -count=1
go test -vet=off ./pkg/service/sandbox -run 'TestGetReqResourceRejectsCPUOverflowWhenMemIsValid|TestGetReqResourceRejectsCPUOverflowBeforeMemOverflow' -count=1
go test -vet=off ./pkg/service/httpservice/cube -run TestApplyTemplateToContainerSetsResourcesWhenRequestResourceIsNil -count=1

The focused pkg/service/sandbox tests cover both the cpu overflow + mem valid case and the cpu overflow + mem overflow case.

Scope

This PR is limited to request correctness in the template / sandbox materialization path.

It does not change scheduler limits, template policy semantics, or introduce any new API fields.

The main behavior change is that already accepted request fields and already configured template resource defaults are now preserved consistently when requests are cloned, validated, and merged.

Preserve egress rules when cloning template create requests so accepted network policy is not lost during template creation from image.

Also preserve the first resource validation error when both CPU and memory exceed configured limits, and make template resource merging nil-safe so template CPU and memory defaults are retained when the destination resource struct is absent.

Signed-off-by: zhengyilei <zheng_yilei@qq.com>
Comment on lines 196 to 198
if resourceIn.Limit != nil {
resourceOut.Limit = resourceIn.Limit
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shallow copy of Resource.Limit — pointer aliasing with template

This assigns the pointer directly (line 198), not a deep copy:

if resourceIn.Limit != nil {
    resourceOut.Limit = resourceIn.Limit
}

After this, the container's Resources.Limit and the template's Resources.Limit point to the same RequestLimit struct. If the template's Limit is ever mutated downstream (even unintentionally), the container's Limit would change too.

The rest of this PR carefully deep-copies egress rule fields (using cloneStringPtr, struct-value copies, etc.), so this inconsistency stands out. A deep copy would be:

if resourceIn.Limit != nil {
    resourceOut.Limit = &types.RequestLimit{
        Cpu: resourceIn.Limit.Cpu,
        Mem: resourceIn.Limit.Mem,
    }
}

How to fix: Replace the pointer assignment with a struct copy on line 198. This is a pre-existing issue, but since applyTemplateResources is being refactored in this PR anyway, it's the right time to close the gap.

Comment on lines +174 to +186
if len(in.Action.Inject) > 0 {
action.Inject = make([]*types.EgressRuleInject, 0, len(in.Action.Inject))
for _, inj := range in.Action.Inject {
if inj == nil {
continue
}
cloned := &types.EgressRuleInject{
Header: inj.Header,
Secret: inj.Secret,
Format: cloneStringPtr(inj.Format),
}
action.Inject = append(action.Inject, cloned)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field-by-field construction vs struct-copy pattern diverges from the sibling implementation

This clones EgressRuleInject fields one by one:

cloned := &types.EgressRuleInject{
    Header: inj.Header,
    Secret: inj.Secret,
    Format: cloneStringPtr(inj.Format),
}

The identical function in cubeboxutil.go:355-367 uses a struct-value copy instead:

cp := *inj
if inj.Format != nil {
    format := *inj.Format
    cp.Format = &format
}
action.Inject = append(action.Inject, &cp)

The struct-copy pattern is more resilient: if a new field is added to EgressRuleInject (which carries credentials), the struct-copy version would automatically pick it up, while the field-by-field version would silently drop it. This is a maintenance hazard — someone adding a field to the type might not know to update all the clone functions.

How to fix: Align this with the cubeboxutil.go pattern by using a struct-value copy instead of field-by-field assignment.

@cubesandboxbot

cubesandboxbot Bot commented Jun 16, 2026

Copy link
Copy Markdown

Summary

This PR fixes three correctness bugs in the template/sandbox request materialization path. Two inline comments were posted for specific lines.


High Priority

1. Duplicated deep-copy functions across two packages

cloneEgressRules, cloneEgressRule, and cloneCubNetworkConfig are defined in both template_request.go and cubeboxutil.go with nearly identical logic but different styles. A third variant exists in local_service.go.

If a field is added to EgressRule or its sub-types, a maintainer must update all three copies.

Recommendation: Extract a shared DeepCopy method onto the type definitions in pkg/service/sandbox/types/types.go.


Medium Priority

2. ensureSandboxTestConfig uses a fragile relative path

util_test.go resolves config via os.Getwd + ../../../conf.yaml which breaks with go test ./... from repo root. Use runtime.Caller(0) instead.

3. Test coverage gaps in getReqResource

No happy-path test for valid values, no mem-only overflow test, no multi-container test, no nil-Resource test.

4. Doc comments missing on new clone functions

The four new functions in template_request.go have no contract comments. Add a comment on cloneEgressRule stating it returns a deep copy.


Low Priority

5. Method slice deep-copy not verified in test

6. Debug-path secret logging (pre-existing)


Working Well

  • Each bug fix is independently tested with focused regression tests.
  • The deep-copy test correctly verifies equality + pointer inequality.
  • The getReqResource fix is minimal and correct.
  • The applyTemplateResources return-type change correctly handles nil resources.

}
if resourceIn.Limit != nil {
resourceOut.Limit = resourceIn.Limit
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shallow copy of Resource.Limit — pointer aliasing with template

This assigns the pointer directly (line 198), not a deep copy:

if resourceIn.Limit != nil {
    resourceOut.Limit = resourceIn.Limit
}

After this, the container's Resources.Limit and the template's Resources.Limit point to the same RequestLimit struct. If the template's Limit is ever mutated downstream (even unintentionally), the container's Limit would change too.

The rest of this PR carefully deep-copies egress rule fields (using cloneStringPtr, struct-value copies, etc.), so this inconsistency stands out. A deep copy would be:

if resourceIn.Limit != nil {
    resourceOut.Limit = &types.RequestLimit{
        Cpu: resourceIn.Limit.Cpu,
        Mem: resourceIn.Limit.Mem,
    }
}

How to fix: Replace the pointer assignment with a struct copy on line 198. This is a pre-existing issue, but since applyTemplateResources is being refactored in this PR anyway, it's the right time to close the gap.

Comment thread CubeMaster/pkg/service/sandbox/util.go Outdated
cpu.MilliValue())
}
if mem.Cmp(config.GetConfig().Scheduler.MaxMvmMemoryRes()) >= 0 {
if err == nil && mem.Cmp(config.GetConfig().Scheduler.MaxMvmMemoryRes()) >= 0 {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's return early instead of nil check here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's return early instead of nil check here.

Done, I updated this in a new commit to return early.

ctr.Resources = &types.Resource{}
}
applyTemplateResources(templateCtr.Resources, ctr.Resources)
ctr.Resources = applyTemplateResources(templateCtr.Resources, ctr.Resources)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like changes to this file is unnecessary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like changes to this file is unnecessary.

Good point. I added this as defensive hardening for the case where the destination Resources is nil, but the current call sites already handle allocation. To keep this PR focused, I will drop this change.

out := &types.CubeNetworkConfig{
AllowOut: append([]string(nil), in.AllowOut...),
DenyOut: append([]string(nil), in.DenyOut...),
Rules: cloneEgressRules(in.Rules),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, there is no way to pass egress rules when create template.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, there is no way to pass egress rules when create template.

I think this is true for cubemastercli today, but not for the HTTP API.

CreateTemplateFromImageReq already includes cube_network_config.rules, and the /cube/template/from-image handler unmarshals the request body directly into that type.

I reproduced this on a v0.4.0 one-click deployment by sending cube_network_config.rules in the raw POST body. The request was accepted, allowInternetAccess was preserved, but rules was missing from the stored template create_request.

So the issue here seems to be that raw API callers can already pass rules, but the template request clone path drops them.

If the intended scope is to only support fields currently exposed by cubemastercli, I can split the rules preservation into a follow-up and keep this PR focused on other fixes.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think we should also revise mergeEgressRules(). The per sandbox rules should prepend to templates rules.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think we should also revise mergeEgressRules(). The per sandbox rules should prepend to templates rules.

Updated this part.

mergeEgressRules() now prepends per-sandbox/request rules before template rules, so first-match-wins evaluation gives request-side rules precedence.

I also added focused tests for name-based overrides, no-conflict merge order, empty-side handling, and nil rule entries.

zyl1121 added 2 commits June 16, 2026 19:27
Follow the review suggestion in getReqResource by returning immediately on CPU or memory limit validation failures instead of carrying an intermediate error variable through the next check.

Signed-off-by: zhengyilei <zheng_yilei@qq.com>
Follow the review feedback by removing the cubeboxutil.go resource-merge helper changes from this PR and keeping the PR focused on the confirmed template rules fix plus the resource validation fix.

Signed-off-by: zhengyilei <zheng_yilei@qq.com>
@zyl1121 zyl1121 force-pushed the fix/cubemaster-template-resource-egress branch from 22e1921 to 9bc295d Compare June 16, 2026 11:37
Make mergeEgressRules prepend per-sandbox request rules before template rules so first-match-wins evaluation honors request-side overrides. Skip template rules whose names are overridden by the request side, and add focused tests for override order, empty-side handling, and nil entries.

Signed-off-by: zhengyilei <zheng_yilei@qq.com>
// first-match-wins, per-sandbox/request rules must come before template rules.
// Template rules whose Name is overridden by the request side are skipped.
func mergeEgressRules(base []*types.EgressRule, extra []*types.EgressRule) []*types.EgressRule {
if len(extra) == 0 {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: mergeEgressRules now always clones (defensive, correct)

The new code on both early-return paths (len(extra)==0 → returns cloneEgressRules(base), and len(base)==0 → returns cloneEgressRules(extra)) always returns a deep copy. The old code returned base directly when no extra rules were provided.

This pairs with mergeCubeNetworkConfigs (line 267+279): out is already a clone of templateCfg, but mergeEgressRules re-clones it. When both sides have rules, template rules are cloned twice — once in cloneCubeNetworkConfig and again inside mergeEgressRules. Consider passing templateCfg.Rules directly to mergeEgressRules to avoid the redundant clone.

// first-match-wins, per-sandbox/request rules must come before template rules.
// Template rules whose Name is overridden by the request side are skipped.
func mergeEgressRules(base []*types.EgressRule, extra []*types.EgressRule) []*types.EgressRule {
if len(extra) == 0 {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR fixes three data-loss bugs in the template/sandbox request materialization path. The changes are well-scoped and the reasoning is clearly articulated. Below are the findings I consider noteworthy.


Pre-existing bug: parse errors silently dropped in getReqResource (Not PR-introduced)

File: CubeMaster/pkg/service/sandbox/util.go:90-98

The resource.ParseQuantity calls use := which declares a new for-loop-scoped err, shadowing the named return err from the function signature. After break, the for-loop-scoped err goes out of scope and line 104 reads the named return err which is still nil.

Suggestion: Use = instead of := to write to the named return err.


Docs describe wrong rule merge ordering

Files:

  • docs/guide/egress-network-policy.md (English, line 197)
  • docs/zh/guide/egress-network-policy.md (Chinese, line 197)

Both state that request rules with new names are appended after template rules. The mergeEgressRules implementation puts all request rules first, then template rules minus overridden ones. The code is correct; the docs should match.


Minor: redundant clone when both sides have egress rules

When both templateCfg and requestCfg contain rules (cubeboxutil.go:267,279), template rules are deep-cloned twice — once in cloneCubeNetworkConfig and again in mergeEgressRules.


Positive observations

  • The getReqResource early-return fix is correct and unambiguous.
  • Test coverage verifies pointer independence at all nesting levels.
  • Nil entry handling in mergeEgressRules is robust.

@zyl1121 zyl1121 force-pushed the fix/cubemaster-template-resource-egress branch from 93b1436 to 152d6af Compare June 17, 2026 07:03
Split cube network config cloning so template rules are not cloned twice when request rules are present, while still returning cloned output when the request has no rules.

Also sync the English and Chinese egress policy docs with the implemented first-match-wins merge order.

Signed-off-by: zhengyilei <zheng_yilei@qq.com>
@zyl1121 zyl1121 force-pushed the fix/cubemaster-template-resource-egress branch from 152d6af to f632d73 Compare June 17, 2026 07:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants