TJ's Kubernetes Service, or TKS, is an IaC project used to deliver Kubernetes on Proxmox. Over the years, it has evolved many times and has used a multitude of different technologies. Nowadays, it is a relatively simple collection of Terraform manifests thanks to the work of BPG and Sidero Labs.
| Requirement | Description |
|---|---|
| terraform | Used for creating the cluster |
| kubectl | Used for removing nodes from the cluster |
| talosctl | Used for removing nodes from the cluster |
| ssh-agent | Used for connecting to the Proxmox server to bootstrap the Talos image |
| Proxmox | You already know |
| DNS Resolver | Used for configuring DHCP reservations during cluster creation and DNS resolution within the cluster |
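Before starting, a quick sanity check that the CLI requirements above are on your PATH can save a failed run later. This is a generic shell sketch, not part of TKS itself:

```shell
# Report which of the required CLIs are available on this machine
missing=0
for tool in terraform kubectl talosctl; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
    missing=$((missing + 1))
  fi
done
echo "missing tools: $missing"
```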
- Configure SSH access with a private key to your Proxmox server. This is needed to provision the installation image and also for certain API actions executed by the Terraform provider.

- Create an API token on Proxmox. I use my create_user Ansible role to create mine.

- Add your SSH key to `ssh-agent`:

  ```shell
  eval "$(ssh-agent -s)"
  ssh-add --apple-use-keychain ~/.ssh/sol.Milkyway
  ```

- Set the environment variables required to authenticate to your Proxmox server according to the provider docs. I personally use an API Token and define them in `vars/config.env`. Source them into your shell:

  ```shell
  source vars/config.env
  ```

- Review `variables.tf` and set any overrides according to your environment in a new tfvars file.

- Create DNS records and DHCP reservations for your nodes according to your configured Hostname, MAC address, and IP Address prefixes. Here is how mine is configured for two clusters:

  | Hostname | MAC Address | IP Address |
  |---|---|---|
  | k8s-vip | N/A | 192.168.40.10 |
  | k8s-cp-1 | 00:00:00:00:00:11 | 192.168.40.11 |
  | k8s-cp-2 | 00:00:00:00:00:12 | 192.168.40.12 |
  | k8s-cp-3 | 00:00:00:00:00:13 | 192.168.40.13 |
  | k8s-node-1 | 00:00:00:00:00:21 | 192.168.40.21 |
  | k8s-node-2 | 00:00:00:00:00:22 | 192.168.40.22 |
  | k8s-node-3 | 00:00:00:00:00:23 | 192.168.40.23 |
  | test-k8s-vip | N/A | 192.168.40.50 |
  | test-k8s-cp-1 | 00:00:00:00:00:51 | 192.168.40.51 |
  | test-k8s-cp-2 | 00:00:00:00:00:52 | 192.168.40.52 |
  | test-k8s-cp-3 | 00:00:00:00:00:53 | 192.168.40.53 |
  | test-k8s-node-1 | 00:00:00:00:00:61 | 192.168.40.61 |
  | test-k8s-node-2 | 00:00:00:00:00:62 | 192.168.40.62 |
  | test-k8s-node-3 | 00:00:00:00:00:63 | 192.168.40.63 |

- Initialize Terraform and create a workspace for your Terraform state, or configure a different backend accordingly:

  ```shell
  terraform init
  terraform workspace new test
  ```

- Create the cluster:

  ```shell
  terraform apply --var-file="vars/test.tfvars"
  ```

- Retrieve the Kubernetes and Talos configuration files. Be sure not to overwrite any existing configs you wish to preserve. I use kubecm to add/merge configs and kubectx to change contexts.

  ```shell
  mkdir -p ~/.{kube,talos}
  touch ~/.kube/config
  terraform output -raw talosconfig > ~/.talos/config-test
  terraform output -raw kubeconfig > ~/.kube/config-test
  kubecm add -f ~/.kube/config-test
  kubectx admin@test
  ```

- Confirm Kubernetes is bootstrapped and that all of the nodes have joined the cluster. The controlplane nodes might take a moment to respond. You can confirm the status of each Talos node using `talosctl` or by reviewing the VM consoles in Proxmox.

  ```shell
  watch kubectl get nodes,all -A
  ```

- Kubernetes will only automatically approve certificate signing requests (CSRs) if your nodes use a standard FQDN that matches the cluster's expected domain. If your nodes have custom or non-standard hostnames, you may need to manually review and approve CSRs to complete cluster bootstrapping:

  ```shell
  # Review pending CSRs to validate they are as expected
  kubectl get csr
  kubectl describe csr csr-foobar

  # Approve all pending CSRs
  kubectl get csr -o name | xargs kubectl certificate approve
  ```
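The addressing scheme in the reservation table above is regular enough to script: the VIP sits at the cluster's base offset, control planes at offset+N, and workers at offset+10+N. A rough sketch (the `cluster`, `base`, and `offset` names are my own illustration, not TKS variables) that prints the hostname/IP pairs for the test cluster:

```shell
# Generate hostname/IP pairs for one cluster, following the offsets above:
# VIP at .offset, control planes at .offset+N, workers at .offset+10+N
cluster=test-k8s
base=192.168.40
offset=50
pairs="${cluster}-vip ${base}.${offset}"
for i in 1 2 3; do
  pairs="${pairs}
${cluster}-cp-${i} ${base}.$((offset + i))"
done
for i in 1 2 3; do
  pairs="${pairs}
${cluster}-node-${i} ${base}.$((offset + 10 + i))"
done
echo "$pairs"
```

The output matches the `test-k8s-*` rows of the table; changing `cluster` to `k8s` and `offset` to 10 reproduces the first cluster.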
By default, Talos uses Flannel. To use a different CNI, make sure that `var.talos_disable_flannel` is set to true during provisioning. The cluster will not be functional and you will not be able to upgrade the nodes until a CNI is enabled. Cilium can be installed using my project found here. You will also likely want to install Kubelet CSR Approver to automatically accept the required certificate signing requests. Alternatively, after installing you can accept them manually:

```shell
kubectl get csr
kubectl certificate approve $CSR
```

The Terraform provider makes it quite easy to scale in, out, up, or down. Simply adjust the variables for resources or the desired number of nodes and run `terraform plan` again. If the plan looks good, apply it.
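As an illustration, scaling out can be a one-line tfvars change. Aside from `var.talos_disable_flannel`, which is mentioned above, the variable name below is a hypothetical placeholder; check `variables.tf` for the real names in your checkout:

```hcl
# vars/test.tfvars -- illustrative only; worker_count is a hypothetical name
talos_disable_flannel = true # required when bringing your own CNI such as Cilium
worker_count          = 4    # was 3; terraform plan should show one new worker VM
```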
In the event you scale down a node, Terraform will execute a local-exec provisioner that runs `manage_nodes` to remove the node from the cluster for you as well:

```shell
./bin/manage_nodes remove $NODE
```

Considerations:
- At this time I don't think it's possible to choose a specific node to remove; you can only scale by adding or removing the last node.
- Due to the way I configure IP addressing using DHCP reservations, there is a limit of 9 controlplane nodes and 9 worker nodes.
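Conceptually, removing a node combines a Kubernetes drain/delete with a Talos reset. A hypothetical sketch of that sequence (the real logic lives in `./bin/manage_nodes`, which may differ):

```shell
# Hypothetical sketch; not the actual ./bin/manage_nodes implementation
remove_node() {
  node="$1"
  # Evict workloads, ignoring DaemonSet-managed pods
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  # Wipe the Talos node and leave it powered off
  talosctl --nodes "$node" reset --graceful=false --reboot=false
  # Remove the node object from Kubernetes
  kubectl delete node "$node"
}
```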
You can find my personal collection of manifests here.
If QEMU Guest Agent is not functioning correctly, Proxmox may hang when trying to issue a shutdown to the VMs. This can lead to Terraform trying to destroy nodes unsuccessfully until the API times out the command. In the event this occurs, you can connect to Proxmox manually and remove the VMs, then proceed with `terraform destroy` as usual. For example:
```shell
ssh -i ~/.ssh/sol.milkyway root@earth.sol.milkyway "rm /var/lock/qemu-server/lock-*; qm list | grep 40 | awk '{print \$1}' | xargs -L1 qm stop && sleep 5 && qm list | grep 40 | awk '{print \$1}' | xargs -L1 qm destroy"
```