DevSpace support for deploying helm chart locally by kensimon · Pull Request #629 · NVIDIA/ncx-infra-controller-core

kensimon · 2026-03-18T22:28:00Z

Description

This makes it simple to deploy ncx-infra-controller-core to a local kubernetes cluster (like with k3s) with machine-a-tron mock hosts, leveraging the existing helm chart.

Add a bootstrap script, dev/deployment/devspace/bootstrap-prereqs.sh, which sets up prerequisites (cert-manager, postgres, and vault) into an empty kubernetes cluster
Add a devspace.yaml that can build and deploy carbide-api and machine-a-tron
Add Dockerfiles for the api and machine-a-tron, which devspace can use to build them from the current tree

Type of Change

Add - New feature or capability
Change - Changes in existing functionality
Fix - Bug fixes
Remove - Removed features or deprecated functionality
Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

This PR contains breaking changes

Testing

Unit tests added/updated
Integration tests added/updated
Manual testing performed
No testing required (docs, internal refactor, etc.)

Additional Notes

This was mostly done by OpenAI Codex, with 3-4 rounds of redoing it until I liked the result, plus a bunch of hand-editing by me afterward. I'm fairly confident I understand everything in this PR, and it seems to work well in local testing (I can spin up carbide-api pretty quickly with machine-a-tron talking to it, and the hosts get to Ready fairly quickly).

This makes it simple to deploy ncx-infra-controller-core to a local kubernetes cluster (like with k3s) leveraging the helm chart.

- Include a script, dev/deployment/devspace/bootstrap-prereqs.sh, which sets up cert-manager, postgres, and vault
- Add a helm chart for machine-a-tron inside dev/deployment/devspace (since machine-a-tron isn't generally included in "production" vault setups, it's only really for development)
- Add a devspace.yaml that can build and deploy carbide-api and machine-a-tron
- Make Dockerfiles for the api and machine-a-tron that use docker buildkit for caching, and can run straight from the produced image (rather than having a separate builder/runtime container split)

Matthias247

So this is a fully new chart? Not the same that is already offered as deployment option?

I think it would be good to just have one - where one would use special options to enable local development. Multiple charts will be confusing for OSS users (which one should I use?).

github-actions · 2026-03-18T22:31:10Z

🛡️ Vulnerability Scan

🚨 Found 72 vulnerability(ies)
📊 vs main: 72 (no change)

Severity Breakdown:

🔴 Critical/High: 72
🟡 Medium: 0
🔵 Low/Info: 0

🔗 View full details in Security tab

🕐 Last updated: 2026-03-18 22:31:09 UTC | Commit: 70bea03

github-actions · 2026-03-18T22:31:14Z

🔐 TruffleHog Secret Scan

✅ No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-03-18 22:31:14 UTC | Commit: 70bea03

kensimon · 2026-03-18T22:35:50Z

> So this is a fully new chart? Not the same that is already offered as deployment option?
>
> I think it would be good to just have one - where one would use special options to enable local development. Multiple charts will be confusing for OSS users (which one should I use?).

It's reusing the existing chart for carbide-api (that's the main point of this, at least). There's a separate helm chart for machine-a-tron though, since I wanted to keep it separate from the main chart. I don't think the main chart should be deploying machine-a-tron, since it's only for environments where you want mock hosts. (We can discuss though... we could technically add machine-a-tron to the main chart, disabled by default and overridable via values.yaml, but I decided against that for now.)

The two main things this PR is doing are:

- Giving a nice friendly script for bootstrapping the prerequisites into a local k3s cluster (basically cert-manager, postgres, and vault). You only need to run this once.
- Making a devspace.yaml and some more straightforward Dockerfiles for building carbide-api and machine-a-tron, so that you can run devspace deploy any time to deploy your local code. (It takes the place of what skaffold was doing before.) The nice thing here is that devspace is simply using the helm chart to deploy, but overriding the image tags with what is produced from the local build.
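
As a rough illustration of the second point, a devspace.yaml along these lines builds an image from the local tree and hands it to the existing helm chart. This is a minimal sketch, not the file from this PR: the project name, the image name, the chart path ./helm, and the image.repository / image.tag value keys are all assumptions, and the real chart may name them differently.

```yaml
version: v2beta1
name: carbide-local   # hypothetical project name

images:
  carbide-api:
    image: carbide-api                                   # assumed image name
    dockerfile: dev/deployment/devspace/Dockerfile.api   # Dockerfile added in this PR
    context: .                                           # build from the current tree

deployments:
  carbide-api:
    helm:
      chart:
        name: ./helm   # assumed location of the existing chart
      values:
        image:         # hypothetical value keys; check the chart's values.yaml
          repository: ${runtime.images.carbide-api.image}
          tag: ${runtime.images.carbide-api.tag}
```

With a config shaped like this, devspace deploy builds the image first and then installs the chart with the freshly built tag substituted into the values, which is the "override the image tags" behavior described above.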

ianderson-nvidia

Is this only for the api crate? Are we not worrying about DHCP/PXE/DNS/console services?

You will also need to bump the chart version for carbide-api in Chart.yaml, as well as Chart.yaml in the helm root dir.

dev/deployment/devspace/Dockerfile.api

kensimon · 2026-03-18T23:30:34Z

> Is this only for the api crate? Are we not worrying about DHCP/PXE/DNS/console services?

Not yet, no... I'm mainly just interested in getting development going with mocks right now, and none of those are really needed. I'd be interested to hear opinions on this though... When I think "local development workflow" I don't generally think of running a real DHCP/PXE/etc. server (i.e. not a real datacenter environment with DPUs). I just want the simplest way to get an API server running that you can test against, and the machine-a-tron bits are there just to make the default deployment a little more interesting (rather than just being an empty environment with no hosts).

> Can we use the build containers that we have been using?

I'll try and look into re-using the build container tomorrow, but we do need a different Dockerfile for this, for devspace to do its magic. What we don't want to do is what we did with skaffold, where we had a separate step that ran the build container to spit out a binary, and then another build step that built the runtime container from a simple COPY line that grabbed the built binary (That was always IMO a hack, we could have always just done it all in one step.)

Another thing to consider is just blowing away dev/deployment/localdev, since that is all very specific to the skaffold workflow which is now basically broken.

kensimon · 2026-03-19T15:01:50Z

> So this is a fully new chart? Not the same that is already offered as deployment option?
>
> I think it would be good to just have one - where one would use special options to enable local development. Multiple charts will be confusing for OSS users (which one should I use?).

I reworked the PR to have devspace use plain-old kubernetes yaml and kubectl apply for the machine-a-tron portions, so there's no confusion about a "new chart". (It's simpler this way anyway.)

ianderson-nvidia · 2026-03-19T15:58:33Z

> I'll try and look into re-using the build container tomorrow, but we do need a different Dockerfile for this, for devspace to do its magic. What we don't want to do is what we did with skaffold, where we had a separate step that ran the build container to spit out a binary, and then another build step that built the runtime container from a simple COPY line that grabbed the built binary (That was always IMO a hack, we could have always just done it all in one step.)
>
> Another thing to consider is just blowing away dev/deployment/localdev, since that is all very specific to the skaffold workflow which is now basically broken.

I don't think Skaffold is broken. The entire personal dev environment uses the dockerfiles in localdev/ with skaffold. If we want to pivot to devspace, that's fine. But we need a plan to support the current dev environments before we start deleting things.

wminckler · 2026-03-19T17:36:09Z

> > I'll try and look into re-using the build container tomorrow, but we do need a different Dockerfile for this, for devspace to do its magic. What we don't want to do is what we did with skaffold, where we had a separate step that ran the build container to spit out a binary, and then another build step that built the runtime container from a simple COPY line that grabbed the built binary (That was always IMO a hack, we could have always just done it all in one step.)
> >
> > Another thing to consider is just blowing away dev/deployment/localdev, since that is all very specific to the skaffold workflow which is now basically broken.
>
> I don't think Skaffold is broken. The entire personal dev environment uses the dockerfiles in localdev/ with skaffold. If we want to pivot to devspace, that's fine. But we need a plan to support the current dev environments before we start deleting things.

Please don't delete the old stuff yet...

kensimon requested a review from a team as a code owner March 18, 2026 22:28

Matthias247 reviewed Mar 18, 2026

ianderson-nvidia reviewed Mar 18, 2026

dev/deployment/devspace/Dockerfile.api (outdated; conversation resolved)

kensimon added 7 commits March 19, 2026 07:42

Make vault a StatefulSet

ab6532f

Use build-container-x86_64 as base

1cf2032

Add docs for reinitializing

20de66c

Bump helm chart versions

106053d

Fix version in toplevel helm chart

4628b1a

Add link to devspace deployment to README.md

235f199

Use plain k8s manifests instead of a helm chart for machine-a-tron

8b5eb6b

Having another helm chart was just confusing, since the purpose is only for local development, and the main helm chart shouldn't have machine-a-tron.

kensimon changed the title ~~Local helm chart deployment with devspace~~ DevSpace support for deploying helm chart locally Mar 19, 2026

ianderson-nvidia approved these changes Mar 19, 2026

kensimon wants to merge 8 commits into NVIDIA:main from kensimon:devspace-bootstrap

4 participants
