Running Kubernetes on bare metal gives you something that no cloud provider can: full control over your hardware with zero per-hour compute costs. There is no managed service fee, no cloud tax on every CPU cycle, and no surprise bill at the end of the month. You own the servers, and you decide exactly how they are configured.
The tradeoff is that bare metal Kubernetes is notoriously manual. Without cloud provider integrations, you are responsible for load balancing, storage provisioning, networking, and every other piece of infrastructure that managed services handle for you. A single misconfigured firewall rule or a missed etcd port can leave you debugging for hours.
In this tutorial, you will use KubeOne with Terraform to provision a highly available, 3-node Kubernetes cluster on bare metal servers. KubeOne handles the heavy lifting of bootstrapping kubeadm, configuring etcd for high availability, and deploying the CNI plugin. Terraform provides the structured output format that KubeOne uses to discover your infrastructure. By the end, you will have a production-ready cluster with etcd redundancy across all three control plane nodes and a clear path to adding worker nodes.
This is not a toy cluster. The configuration you will build here is suitable for production workloads, with audit logging, node-local DNS caching, and a high availability architecture that tolerates the failure of any single node.
Step 1: Plan Your Network Architecture
Before you install anything or write a single line of configuration, plan your network. Bare metal networking does not forgive mistakes — there is no VPC wizard to set up subnets for you, and a wrong firewall rule can silently break inter-node communication.
Private Network
All three control plane nodes must communicate over a private network. A typical choice is a 10.0.0.0/24 subnet, but any RFC 1918 range works. The critical requirement is that inter-node traffic — etcd replication, kubelet communication, and internal API calls — does not traverse the public internet. If your servers are in the same data center, they likely already share a private network. If they are in different locations, you will need a VPN or overlay network between them.
Verify private connectivity before proceeding. SSH into each node and ping the private IPs of the other two. If any of those pings fail, stop here and fix your network.
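If you want something scriptable, the same check can be done over TCP instead of ICMP (ping is sometimes blocked even when the network is fine). A minimal sketch, assuming SSH (port 22) is open on each node and using this tutorial's example private IPs:

```python
#!/usr/bin/env python3
# TCP reachability sketch for the private network. Illustrative only:
# substitute your own private IPs. Uses SSH (22) as a stand-in for
# general inter-node connectivity, since ICMP may be filtered.
import socket

def reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    for ip in ("10.0.0.10", "10.0.0.11", "10.0.0.12"):
        status = "ok" if reachable(ip, 22) else "UNREACHABLE"
        print(f"{ip}: {status}")
```

Run this from each node in turn; any UNREACHABLE result means you should fix the network before continuing.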
Public IPs
Each node needs a public IP address for two purposes: SSH access from your workstation and Kubernetes API access. If you are running everything behind a VPN and do not need external API access, you can skip public IPs entirely, but most production setups need at least SSH reachability.
API Server Load Balancer
In a highly available cluster, the Kubernetes API server runs on all three control plane nodes. You need a load balancer in front of them so that kubectl and your applications always reach a healthy API server, even if one node goes down. You have several options:
- External hardware or software load balancer — HAProxy, Nginx, or an F5 device. This is the most reliable approach.
- DNS round-robin — Create a DNS A record with all three control plane IPs. This is simple but has no health checks. If one node goes down, roughly one-third of API requests will fail until you update the DNS record.
- keepalived with HAProxy — Run keepalived on two dedicated hosts (or on two of the control plane nodes) for automatic failover of a virtual IP. This is the gold standard for bare metal HA.
For this tutorial, you will set up HAProxy in Step 4. If you already have a load balancer, skip that step and use your existing load balancer’s IP as the API server address.
Firewall Rules
Control plane nodes need specific ports open between them. Get these wrong, and the cluster will fail to form or will exhibit intermittent failures that are difficult to diagnose.
| Port | Protocol | Purpose |
|---|---|---|
| 6443 | TCP | Kubernetes API server |
| 2379-2380 | TCP | etcd client and peer communication |
| 10250 | TCP | kubelet API |
| 10259 | TCP | kube-scheduler |
| 10257 | TCP | kube-controller-manager |
Warning: Port 6443 (API server) must be accessible from wherever you run kubectl. Ports 2379-2380 (etcd) should only be accessible between control plane nodes. Exposing etcd to the internet is a critical security vulnerability — anyone with access to etcd can read and modify every secret, config map, and resource in your cluster.
Open these ports between all three control plane nodes on the private network. If you are using ufw on Ubuntu:
# Run on each control plane node
sudo ufw allow from 10.0.0.0/24 to any port 6443 proto tcp
sudo ufw allow from 10.0.0.0/24 to any port 2379:2380 proto tcp
sudo ufw allow from 10.0.0.0/24 to any port 10250 proto tcp
sudo ufw allow from 10.0.0.0/24 to any port 10259 proto tcp
sudo ufw allow from 10.0.0.0/24 to any port 10257 proto tcp
Also allow SSH (port 22) from your workstation and port 6443 from wherever you need kubectl access.
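If you would rather generate these rules than retype them on every node, a small helper can emit the commands from the port table above. A sketch that assumes this tutorial's 10.0.0.0/24 private subnet; adjust to yours:

```python
#!/usr/bin/env python3
# Emit the ufw commands for the control plane ports listed in Step 1.
# Sketch only: the subnet and port list mirror this tutorial's examples.
SUBNET = "10.0.0.0/24"

# (port-or-range, purpose) pairs from the firewall table above
PORTS = [
    ("6443", "Kubernetes API server"),
    ("2379:2380", "etcd client and peer communication"),
    ("10250", "kubelet API"),
    ("10259", "kube-scheduler"),
    ("10257", "kube-controller-manager"),
]

def ufw_rules(subnet: str = SUBNET) -> list[str]:
    """Return one `ufw allow` command string per required port."""
    return [
        f"sudo ufw allow from {subnet} to any port {port} proto tcp  # {purpose}"
        for port, purpose in PORTS
    ]

if __name__ == "__main__":
    print("\n".join(ufw_rules()))
```

Pipe the output into a file, review it, and run it on each control plane node.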
Step 2: Create the Terraform Configuration
On cloud providers, Terraform provisions the actual infrastructure — VMs, networks, load balancers. On bare metal, your servers already exist. Terraform’s role here is different: it generates the structured JSON output that KubeOne expects, describing where your servers are and how to reach them.
This might feel like overkill for bare metal, but it gives you two significant benefits. First, your infrastructure description is version-controlled and reproducible. Second, if you later move to a cloud provider, your workflow stays the same — only the Terraform configuration changes.
Create a project directory:
mkdir kubeone-bare-metal && cd kubeone-bare-metal
Create main.tf with the following content:
# For bare metal, Terraform generates the output format that KubeOne expects.
# It does not provision infrastructure — your servers already exist.
variable "cluster_name" {
type = string
default = "production"
}
variable "control_plane_hosts" {
type = list(object({
public_address = string
private_address = string
ssh_user = string
}))
}
variable "api_server_address" {
type = string
description = "Load balancer or primary control plane IP for API server access"
}
variable "ssh_private_key_file" {
type = string
default = "~/.ssh/id_rsa"
}
output "kubeone_api" {
value = {
endpoint = {
host = var.api_server_address
port = 6443
connect_timeout = 60
}
}
}
output "kubeone_hosts" {
value = {
control_plane = {
cluster_name = var.cluster_name
hosts = [
for i, host in var.control_plane_hosts : {
public_address = host.public_address
private_address = host.private_address
ssh_user = host.ssh_user
ssh_private_key_file = var.ssh_private_key_file
ssh_agent_socket = ""
}
]
}
}
}
The kubeone_api output tells KubeOne where the API server is reachable. The kubeone_hosts output describes each control plane node — its public and private addresses, the SSH user, and the SSH key to use for authentication.
Now create terraform.tfvars with your actual server details. Replace the example IPs with your real addresses:
cluster_name = "production"
api_server_address = "203.0.113.10" # Your load balancer IP or primary node IP
control_plane_hosts = [
{
public_address = "203.0.113.10"
private_address = "10.0.0.10"
ssh_user = "ubuntu"
},
{
public_address = "203.0.113.11"
private_address = "10.0.0.11"
ssh_user = "ubuntu"
},
{
public_address = "203.0.113.12"
private_address = "10.0.0.12"
ssh_user = "ubuntu"
}
]
If you are using a dedicated load balancer, set api_server_address to the load balancer’s IP. If you do not have a load balancer yet, use the public IP of the first control plane node for now — you will set up HAProxy in Step 4.
Step 3: Generate Terraform Output
Initialize Terraform, apply the configuration, and export the JSON output that KubeOne needs:
terraform init
terraform apply -auto-approve
terraform output -json > tf.json
Since there are no actual cloud resources to create, terraform apply runs instantly. It simply evaluates the variables and generates the outputs.
Verify the output looks correct:
cat tf.json
You should see a JSON structure containing kubeone_api with your API server endpoint and kubeone_hosts with the details for all three control plane nodes. If any IP addresses are wrong or the structure looks malformed, fix terraform.tfvars and re-run terraform apply.
The tf.json file is what KubeOne reads to discover your infrastructure. Every time you change your server details — replacing a node, changing an IP — update terraform.tfvars, re-run the commands above, and regenerate tf.json.
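You can also check the structure programmatically instead of eyeballing it. A minimal sketch in Python; the key paths mirror the kubeone_api and kubeone_hosts outputs defined in main.tf above:

```python
#!/usr/bin/env python3
# Sanity-check the shape of the file produced by `terraform output -json`.
# The key paths follow the kubeone_api / kubeone_hosts outputs in main.tf.
import json
import os

def validate_tf_output(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the structure looks sane."""
    problems = []
    try:
        endpoint = data["kubeone_api"]["value"]["endpoint"]
        if endpoint.get("port") != 6443:
            problems.append(f"unexpected API port: {endpoint.get('port')}")
    except KeyError as exc:
        problems.append(f"missing key under kubeone_api: {exc}")
    try:
        hosts = data["kubeone_hosts"]["value"]["control_plane"]["hosts"]
        if len(hosts) != 3:
            problems.append(f"expected 3 control plane hosts, got {len(hosts)}")
        for host in hosts:
            for field in ("public_address", "private_address", "ssh_user"):
                if not host.get(field):
                    problems.append(f"host missing {field}: {host}")
    except KeyError as exc:
        problems.append(f"missing key under kubeone_hosts: {exc}")
    return problems

if __name__ == "__main__":
    if os.path.exists("tf.json"):
        with open("tf.json") as f:
            issues = validate_tf_output(json.load(f))
        print("\n".join(issues) if issues else "tf.json looks sane")
    else:
        print("tf.json not found; run `terraform output -json > tf.json` first")
```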
Step 4: Set Up the API Server Load Balancer
For a production HA cluster, you need a load balancer in front of the three control plane nodes on port 6443. Without one, your kubeconfig points to a single node, and if that node goes down, you lose API access even though the other two nodes are healthy.
Option A: HAProxy (Recommended for Bare Metal)
HAProxy is the standard choice for bare metal Kubernetes load balancing. It is lightweight, battle-tested, and handles TCP proxying with health checks.
Install HAProxy on a separate host. If you do not have a dedicated host available, you can run it on one of the control plane nodes, but a separate host is preferable to avoid a single point of failure.
sudo apt update && sudo apt install haproxy -y
Add the following configuration to /etc/haproxy/haproxy.cfg. If the file already has content, append this to the end:
frontend kubernetes-api
bind *:6443
mode tcp
default_backend kubernetes-control-plane
backend kubernetes-control-plane
mode tcp
balance roundrobin
option tcp-check
server cp-1 10.0.0.10:6443 check
server cp-2 10.0.0.11:6443 check
server cp-3 10.0.0.12:6443 check
Replace the 10.0.0.x addresses with the private IPs of your control plane nodes. The option tcp-check directive means HAProxy will verify that each backend server is accepting TCP connections on port 6443 before sending traffic to it.
Restart and enable HAProxy:
sudo systemctl restart haproxy
sudo systemctl enable haproxy
Verify it is running:
sudo systemctl status haproxy
If HAProxy is running on a separate host, update api_server_address in terraform.tfvars to point to the HAProxy host’s IP, then regenerate tf.json.
Option B: DNS Round-Robin (Simpler, Less Reliable)
If you want something simpler and can tolerate reduced availability, create a DNS A record that resolves to all three control plane IPs:
k8s-api.example.com A 203.0.113.10
k8s-api.example.com A 203.0.113.11
k8s-api.example.com A 203.0.113.12
Set api_server_address to k8s-api.example.com in your Terraform variables.
The downside is significant: DNS round-robin has no health checks. If one node goes down, roughly one-third of connections will fail until you manually remove the dead node’s IP from the DNS record. DNS TTLs mean the stale record can persist for minutes or hours.
Tip: For production, use HAProxy with keepalived for automatic failover of the load balancer itself. DNS round-robin is acceptable for development and staging environments but should not be used for workloads that require high availability.
Step 5: Create the KubeOneCluster Manifest
The KubeOneCluster manifest defines the Kubernetes version, cloud provider, CNI plugin, and cluster features. Create kubeone.yaml:
apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster
name: production
versions:
kubernetes: "v1.30.2"
cloudProvider:
none: {}
containerRuntime:
containerd: {}
clusterNetwork:
cni:
canal: {}
podSubnet: "10.244.0.0/16"
serviceSubnet: "10.96.0.0/12"
features:
nodeLocalDNS:
deploy: true
staticAuditLog:
enable: true
config:
policyFilePath: ""
logPath: /var/log/kubernetes/audit.log
logMaxAge: 30
logMaxBackup: 10
logMaxSize: 100
systemPackages:
configureRepositories: true
Here is what each section does:
cloudProvider.none tells KubeOne that this is a bare metal deployment. No cloud controller manager will be installed, and no cloud-specific integrations (like automatic load balancer provisioning or persistent disk creation) will be configured. You handle storage and load balancing yourself.
containerRuntime.containerd sets containerd as the container runtime. Kubernetes removed its built-in Docker Engine support (the dockershim) in v1.24, so containerd is the standard choice. KubeOne handles the installation and configuration automatically.
clusterNetwork.cni.canal deploys Canal as the CNI plugin. Canal combines Calico for network policy enforcement with Flannel for overlay networking. This is KubeOne’s default CNI and works well on bare metal because Flannel’s VXLAN overlay handles pod-to-pod networking across nodes without requiring any special network configuration from your infrastructure.
clusterNetwork.podSubnet and clusterNetwork.serviceSubnet define the IP ranges for pods and services. The defaults shown here are standard Kubernetes defaults. Make sure these ranges do not overlap with your node network (10.0.0.0/24 in our example) or with each other.
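The no-overlap requirement is easy to verify with Python's standard ipaddress module. A quick sketch using this tutorial's example ranges; substitute your own:

```python
#!/usr/bin/env python3
# Verify that the node, pod, and service CIDRs do not overlap.
# The ranges below are this tutorial's examples.
from ipaddress import ip_network
from itertools import combinations

cidrs = {
    "node network":  ip_network("10.0.0.0/24"),
    "podSubnet":     ip_network("10.244.0.0/16"),
    "serviceSubnet": ip_network("10.96.0.0/12"),
}

def overlapping(nets: dict) -> list[tuple[str, str]]:
    """Return every pair of names whose networks overlap."""
    return [(a, b) for (a, na), (b, nb) in combinations(nets.items(), 2)
            if na.overlaps(nb)]

if __name__ == "__main__":
    clashes = overlapping(cidrs)
    print("overlaps:", clashes if clashes else "none")
```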
features.nodeLocalDNS deploys a DNS cache on every node in the cluster. Without this, every DNS lookup from a pod goes to the CoreDNS pods, which can become a bottleneck in large clusters. With node-local DNS, lookups are served from a local cache first, reducing latency and CoreDNS load.
features.staticAuditLog enables Kubernetes audit logging. Every API request — who made it, what they changed, when it happened — is written to /var/log/kubernetes/audit.log on the control plane nodes. This is essential for security compliance and for debugging unexpected changes in your cluster.
systemPackages.configureRepositories allows KubeOne to configure the necessary package repositories (for containerd and Kubernetes) on each node during provisioning.
Step 6: Provision the Cluster
You now have three files in your project directory: main.tf, terraform.tfvars, and kubeone.yaml, along with the generated tf.json. Run the provisioning command:
kubeone apply --manifest kubeone.yaml --tfjson tf.json
KubeOne will ask you to confirm the planned actions. Review the output — it shows which nodes will be provisioned and what will be installed — then confirm.
Here is what happens during provisioning, phase by phase:
Validates connectivity. KubeOne SSHes into each host, verifies the operating system is supported, checks that the required ports are open, and confirms that the nodes can reach each other on the private network.
Installs container runtime. Containerd is installed and configured on all three nodes. KubeOne handles the repository setup, package installation, and systemd service configuration.
Installs Kubernetes binaries. kubeadm, kubelet, and kubectl are installed on all nodes at the version specified in your manifest (v1.30.2 in this case).
Bootstraps the first control plane node. KubeOne runs kubeadm init on the first node with the cluster configuration derived from your manifest and Terraform output. This creates the initial etcd member, generates the cluster certificates, and starts the API server.
Joins remaining control plane nodes. The second and third nodes join the cluster using kubeadm join. Each node gets its own etcd member, API server instance, controller manager, and scheduler.
Configures etcd for high availability. With three control plane nodes, etcd operates as a three-member cluster. This provides fault tolerance — the cluster remains operational if any single etcd member (and its associated node) goes down. Quorum requires two of three members to be healthy.
Deploys CNI. Canal is installed across the cluster, enabling pod-to-pod networking and network policy enforcement.
Deploys machine-controller. The Kubermatic machine-controller is installed in the cluster for declarative worker node management. On bare metal, this uses the static provider.
Generates kubeconfig. A production-kubeconfig file is created in your current directory. This file contains the credentials needed to access your cluster with kubectl.
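The fault-tolerance claim for etcd follows directly from its quorum rule: a cluster of n members needs floor(n/2) + 1 healthy members to accept writes. A tiny sketch of the arithmetic:

```python
#!/usr/bin/env python3
# etcd quorum arithmetic: a cluster of n members needs floor(n/2) + 1
# healthy members, so it tolerates n - quorum member failures.
# Note that even-sized clusters add no extra failure tolerance.
def quorum(n: int) -> int:
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    return n - quorum(n)

if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(f"{n} members: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

This is why three control plane nodes is the practical minimum for HA: one node can fail, and the remaining two still form a quorum.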
The entire process takes 5 to 10 minutes, depending on your network speed and server performance. Do not interrupt it.
Warning: If provisioning fails midway through — a network timeout, a package download failure, an SSH disconnection — run kubeone apply again with the same arguments. The command is idempotent. It detects what has already been completed and picks up from where it left off. Do not run kubeone reset unless you intentionally want to tear down the entire cluster and start from scratch.
Step 7: Verify Your Cluster
Set the kubeconfig and check your nodes:
export KUBECONFIG=$(pwd)/production-kubeconfig
kubectl get nodes
You should see three nodes, all in Ready state, all with the control-plane role:
NAME STATUS ROLES AGE VERSION
cp-1 Ready control-plane 5m v1.30.2
cp-2 Ready control-plane 4m v1.30.2
cp-3 Ready control-plane 4m v1.30.2
If any node shows NotReady, check the kubelet logs on that node: journalctl -u kubelet -f.
Verify etcd Cluster Health
The etcd cluster is the backbone of your Kubernetes control plane. Verify all three members are healthy:
kubectl -n kube-system exec -it etcd-cp-1 -- etcdctl \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
member list -w table
Replace etcd-cp-1 with the actual name of one of your etcd pods (check with kubectl get pods -n kube-system | grep etcd). You should see three members, all with started status.
Verify System Pods
Check that all system components are running:
kubectl get pods -A
You should see pods for the API server, controller manager, scheduler, etcd, CoreDNS, Canal, node-local DNS, and the machine-controller. All pods should be in Running or Completed state. Any pod in CrashLoopBackOff or Pending indicates a problem — check its logs with kubectl logs -n <namespace> <pod-name>.
Step 8: Add Worker Nodes with MachineDeployments
Your cluster currently has three control plane nodes, but no dedicated worker nodes. In production, you generally want to separate control plane and worker responsibilities so that application workloads do not compete with etcd and the API server for resources.
For bare metal environments, you have two options for adding worker nodes.
Option A: MachineDeployment with Static Provider
If your worker nodes have SSH access from the control plane, you can use the Kubermatic machine-controller’s static provider. Create a file called workers.yaml:
apiVersion: cluster.k8s.io/v1alpha1
kind: MachineDeployment
metadata:
name: production-workers
namespace: kube-system
spec:
replicas: 3
selector:
matchLabels:
cluster: production
template:
metadata:
labels:
cluster: production
spec:
providerSpec:
value:
cloudProvider: "none"
cloudProviderSpec: null
operatingSystem: "ubuntu"
operatingSystemSpec:
distUpgradeOnBoot: false
versions:
kubelet: "v1.30.2"
Apply it with:
kubectl apply -f workers.yaml
Tip: For bare metal without a cloud API, the machine-controller’s static provider handles node joining if you provide SSH access details. However, for purely bare metal setups where the control plane cannot SSH into worker nodes, manual joining is simpler and more predictable.
Option B: Manual Worker Node Joining
This approach is straightforward and works in any bare metal environment. On the control plane, generate a join command:
kubeadm token create --print-join-command
This outputs something like:
kubeadm join 203.0.113.10:6443 --token abc123.xyz789 --discovery-token-ca-cert-hash sha256:def456...
SSH into each worker node and run the printed command:
kubeadm join 203.0.113.10:6443 --token abc123.xyz789 --discovery-token-ca-cert-hash sha256:def456...
Before running this, make sure each worker node has containerd, kubeadm, and kubelet installed at the same version as the control plane. KubeOne does not manage manually joined worker nodes, so you are responsible for keeping the Kubernetes version in sync during upgrades.
After joining, verify the worker nodes appear:
kubectl get nodes
You should now see your worker nodes alongside the three control plane nodes, all in Ready state.
Step 9: Deploy a Test Workload
With worker nodes in the cluster, deploy a simple workload to verify everything is functioning:
kubectl create deployment nginx --image=nginx:latest --replicas=6
kubectl get pods -o wide
The six nginx pods should be distributed across your worker nodes. If all pods land on a single node, check that the other nodes are in Ready state and do not have any taints preventing scheduling.
To verify networking between pods, exec into one pod and curl another:
kubectl exec -it $(kubectl get pods -l app=nginx -o jsonpath='{.items[0].metadata.name}') -- curl -s -o /dev/null -w "%{http_code}" http://$(kubectl get pods -l app=nginx -o jsonpath='{.items[1].status.podIP}')
A 200 response confirms that pod-to-pod networking over Canal is working correctly.
Clean up the test deployment when you are done:
kubectl delete deployment nginx
Step 10: Production Hardening Checklist
Your cluster is functional, but “functional” and “production-ready” are not the same thing. Before running real workloads, work through this checklist:
- API server load balancer with health checks — HAProxy with keepalived for automatic failover, not just simple round-robin
- Persistent storage provisioner — Longhorn for distributed block storage, Rook-Ceph for block, file, and S3-compatible object storage, or NFS for simpler setups
- Monitoring stack — Prometheus for metrics collection and alerting, Grafana for dashboards
- Log aggregation — Loki (lightweight) or the EFK stack (Elasticsearch, Fluentd, Kibana) for centralized logging
- Backup solution — Velero for cluster state and resource backups, plus separate backup processes for persistent volumes
- Network policies — Default deny in every namespace, with explicit allow rules for required traffic flows
- Pod security standards — Enforce the restricted or baseline Pod Security Standard at the namespace level
- Certificate rotation — kubeadm renews control plane certificates automatically during cluster upgrades, but verify rotation is working by checking certificate expiration dates periodically
- Disaster recovery plan — Document the etcd snapshot and restore procedure, test it at least once before you need it in an emergency
Each of these items deserves its own deep dive, but the cluster you have built provides the foundation for all of them.
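As a concrete starting point for the network-policies item, a default-deny policy for a single namespace looks like this (a sketch: the my-app namespace is a placeholder, and you layer explicit allow policies on top of it):

```yaml
# Default-deny for both ingress and egress in one namespace.
# The empty podSelector ({}) selects every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-app        # hypothetical namespace; set per environment
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```

Because Canal includes Calico, this policy is actually enforced; on a CNI without policy support it would be silently ignored.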
Troubleshooting
Terraform State Drift
If your bare metal infrastructure changes outside of Terraform — a server is replaced, an IP address changes, or you add a new node — update terraform.tfvars with the new details and re-run:
terraform apply -auto-approve
terraform output -json > tf.json
Then run kubeone apply again to reconcile the cluster state with the new infrastructure description.
etcd Quorum Loss During Provisioning
If KubeOne fails while joining the second or third control plane node, the etcd cluster may be in a partially formed state. Run kubeone apply again — it detects the partial state and attempts to re-converge. KubeOne is designed to handle this scenario gracefully.
If repeated kubeone apply attempts fail, the etcd state may be unrecoverable without a full reset. In that case, run kubeone reset --manifest kubeone.yaml --tfjson tf.json to tear down the cluster entirely, then run kubeone apply to start fresh. This destroys all cluster data, so only do it during initial setup.
SSH Connection Timeouts
If KubeOne cannot connect to your servers, verify:
- Your SSH key is correct and has the right permissions (chmod 600 ~/.ssh/id_rsa).
- The SSH user specified in terraform.tfvars exists on each server and has passwordless sudo.
- Firewall rules allow SSH (port 22) from your workstation to each server.
- No intermediate NAT or bastion host is interfering with the connection.
Test manually with verbose output:
ssh -v -i ~/.ssh/id_rsa ubuntu@203.0.113.10
Pods Stuck in ContainerCreating
This is almost always a CNI issue. Check that Canal pods are running on every node:
kubectl get pods -n kube-system -l k8s-app=canal
If Canal pods are not running, check their logs for errors. The most common cause is that nodes cannot reach each other on the private network. Verify that the private network interfaces are up and that firewall rules allow traffic on the VXLAN port (UDP 8472) between all nodes.
API Server Unreachable After Setup
If kubectl cannot reach the API server after provisioning:
- Using a load balancer: Verify HAProxy is running and forwarding to port 6443 on all three nodes. Check sudo systemctl status haproxy and review the HAProxy logs.
- Using DNS round-robin: Verify all three IPs are in the DNS record and that the record has propagated. Run dig k8s-api.example.com to check.
- Firewall: Verify that port 6443 is open from your workstation to the load balancer or directly to the control plane nodes.
- Kubeconfig: Verify that the server field in your kubeconfig points to the correct address and port.
Next Steps
With your bare metal cluster running, consider these follow-up tutorials:
- Deploying KubeOne Clusters on Hetzner Cloud — a cloud alternative with lower cost than the major providers, useful for comparison with your bare metal setup
- KubeOne vs kOps — understand how KubeOne compares with the AWS-focused cluster management tool
- Migrating from Ingress-Nginx to Gateway API with KubeLB — add proper load balancing to your bare metal cluster using Gateway API
Summary
You provisioned a highly available, 3-node Kubernetes cluster on bare metal servers using KubeOne and Terraform. The cluster runs etcd distributed across all three control plane nodes, uses Canal for pod networking and network policy, and has audit logging and node-local DNS enabled out of the box.
The key thing to remember about this setup is that kubeone apply is your single command for both initial provisioning and ongoing maintenance. When you need to upgrade Kubernetes, change the version in kubeone.yaml and run kubeone apply again. When a node fails and you replace it, update terraform.tfvars, regenerate tf.json, and run kubeone apply. The same command handles installation, upgrades, and repair.
Bare metal Kubernetes requires more upfront effort than a managed service, but the control and cost savings are significant. You now have a cluster that you fully own, with no ongoing compute fees and no vendor lock-in. The production hardening checklist in Step 10 is your roadmap for turning this into a fully production-ready platform.
