I’ve been running services at home for a while now, but everything was scattered across different machines with no real orchestration. I finally decided to go all-in and build a proper Kubernetes cluster at home, and I’m really happy with how it turned out. This is the story of how I did it.
The hardware
The cluster is called Magi (yes, the Evangelion reference), and it runs on a mix of amd64 and ARM nodes:
- 2 amd64 machines with local disks — these are the workhorses, handling storage-heavy workloads and anything that needs more power.
- 4 Raspberry Pis (ARM nodes) — lightweight nodes that are perfect for running smaller services and spreading the load.
This mixed-architecture setup means I had to be mindful of multi-arch container images for everything I deploy, but honestly most popular images already support both amd64 and arm64, so it hasn’t been a big deal.
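When you do hit an amd64-only image, the fix is a nodeSelector on the built-in kubernetes.io/arch label, which the kubelet sets on every node automatically. A minimal sketch (the deployment name and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: amd64-only-app            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels: { app: amd64-only-app }
  template:
    metadata:
      labels: { app: amd64-only-app }
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64   # keeps this pod off the Raspberry Pis
      containers:
        - name: app
          image: example/amd64-only:latest   # placeholder image
```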
K3s — lightweight Kubernetes for the win
I went with K3s instead of full-blown Kubernetes. It’s lightweight, ships as a single binary, and is perfect for a home setup where you don’t want to deal with the complexity of kubeadm or managed control planes. It just works, and it comes with a lot of batteries included — built-in Traefik ingress, CoreDNS, and local storage out of the box.
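For anyone following along, getting a K3s cluster going really is just two commands, straight from the K3s quick-start (fill in your own server IP and token; the token lives at /var/lib/rancher/k3s/server/node-token on the server):

```shell
# On the server (control plane): installs K3s and starts it as a systemd service
curl -sfL https://get.k3s.io | sh -

# On each agent: point it at the server and hand it the node token
curl -sfL https://get.k3s.io | K3S_URL=https://<server-ip>:6443 K3S_TOKEN=<token> sh -
```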
GitOps with FluxCD — one repo to rule them all
This is probably the part I’m most proud of. Every single thing running on the cluster is defined in a single Git repository. I use FluxCD as my GitOps controller — it watches the repo and automatically reconciles the cluster state every minute.
The repo structure is clean and organized:
clusters/<cluster-name>/
├── flux-system/ # FluxCD itself
├── infra/ # Infrastructure components
└── apps/ # Application workloads
infra/
├── metallb/ # Load balancer
├── longhorn/ # Distributed storage
├── prometheus/ # Monitoring
├── glances/ # System stats
├── node-exporter/ # Hardware metrics
├── blackbox-exporter/ # Endpoint probing
└── flux-notifications/ # Chat alerts
apps/
├── ... # Your applications
The best part is the dependency chain. FluxCD lets you define dependsOn between kustomizations, so the cluster bootstraps itself in the right order:
MetalLB → MetalLB Config → Longhorn → Prometheus → Exporters → Apps
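A link in that chain looks roughly like this Flux Kustomization (names and paths are my guesses at the repo layout, not copied from it):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: longhorn
  namespace: flux-system
spec:
  interval: 1m
  path: ./clusters/magi/infra/longhorn
  prune: true                  # delete resources removed from Git
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: metallb-config     # won't reconcile until this is ready
  wait: true                   # block dependents until resources are Ready
```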
This means I can blow away the entire cluster and rebuild it from scratch — just point FluxCD at the repo and walk away. Everything comes up in the correct order, with health checks at each step before moving to the next.
Secrets management with SOPS
Obviously you can’t just push secrets to a Git repo in plain text. I use SOPS with age encryption — secrets are encrypted at rest in the repository and decrypted by FluxCD at reconciliation time. Container registry credentials, database passwords, API tokens — all safely stored in the repo alongside everything else.
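The setup is two small pieces: a .sops.yaml at the repo root telling sops which files to encrypt with which age key, and a decryption block on the Flux Kustomization. Both below are sketches; the path regex, key, and Secret name are assumptions:

```yaml
# .sops.yaml at the repo root (the age recipient is a placeholder)
creation_rules:
  - path_regex: .*\.secret\.ya?ml$
    encrypted_regex: ^(data|stringData)$   # only encrypt the secret values
    age: age1examplepublickey000000000000000000000000000000000000000000
```

```yaml
# On the Flux Kustomization that contains secrets
spec:
  decryption:
    provider: sops
    secretRef:
      name: sops-age   # in-cluster Secret holding the age private key
```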
Chat notifications
FluxCD also sends me chat notifications for every deployment event. Every time a kustomization reconciles, a HelmRelease updates, or something fails, I get a message on my phone. It’s a small thing but it makes the whole setup feel alive — I always know what’s happening on the cluster without having to check.
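This is Flux's notification controller doing the work: a Provider pointing at your chat service and an Alert wiring events to it. A sketch, assuming Telegram (the post doesn't say which service; swap the provider type for slack, discord, matrix, etc.):

```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: chat
  namespace: flux-system
spec:
  type: telegram            # assumption; Flux supports many chat providers
  channel: "<chat-id>"
  address: https://api.telegram.org
  secretRef:
    name: telegram-token    # bot token stored as a Secret
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: all-events
  namespace: flux-system
spec:
  providerRef:
    name: chat
  eventSeverity: info       # info = every reconcile; error = failures only
  eventSources:
    - kind: Kustomization
      name: '*'
    - kind: HelmRelease
      name: '*'
```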
MetalLB — solving networking on bare metal
One of the first problems you run into with Kubernetes at home is LoadBalancer services. In the cloud, your provider hands you an external IP. At home you get nothing: the service just sits there with its external IP stuck in Pending forever.
MetalLB solves this beautifully. I configured an L2 address pool with a small range of IPs reserved on my home network for LoadBalancer services. The L2 advertisement is configured to only announce from control-plane nodes, keeping things tidy.
Now when I create a service of type LoadBalancer, MetalLB assigns it a real IP on my home network. Combined with a local DNS setup using an internal domain, I can access everything by name — no more remembering IPs.
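The whole setup is two small custom resources. The address range and pool names below are illustrative, but the control-plane restriction uses the node-role label K3s sets out of the box:

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: home-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250   # example range, adjust to your LAN
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: home-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - home-pool
  nodeSelectors:                    # only announce from control-plane nodes
    - matchLabels:
        node-role.kubernetes.io/control-plane: "true"
```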
Longhorn — distributed storage that just works
Storage in Kubernetes is always a pain, and it’s even worse on bare metal. I needed something that could provide persistent volumes with replication across nodes, and Longhorn was the answer.
Longhorn runs as a distributed block storage system across my amd64 nodes only — the Raspberry Pis don’t have the disk space or I/O performance to be useful as storage nodes. The local disks on the amd64 machines are where all the data lives, mounted at /var/lib/longhorn.
I’m running with a default replica count of 1 for most volumes — this is a home lab after all, not a production data center. For critical stuff like databases, I bump it to 3x replication for safety. Media volumes stay at 1 replica because I’d rather have the performance and I can always re-download a movie.
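The per-workload replica count is easiest to manage with a second StorageClass on top of the default. A sketch of the 3x class (the class name is made up; Longhorn's provisioner and parameter names are the real ones):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-3x              # hypothetical name for the replicated class
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"          # the default class keeps this at "1"
  staleReplicaTimeout: "30"
```

Databases request longhorn-3x in their PVCs, media volumes just use the default.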
Longhorn also comes with a web UI, which makes it easy to monitor volume health, check space usage, and manage snapshots.
Monitoring — Prometheus, Glances, and Grafana
What’s the point of running a cluster if you can’t obsessively stare at dashboards? I set up a full monitoring stack:
Prometheus
I went with a hand-rolled Prometheus deployment instead of the kube-prometheus-stack Helm chart. It’s lighter, I understand exactly what it does, and for a home cluster I don’t need all the complexity that comes with the full operator.
Prometheus is pinned to a specific node with a 10Gi persistent volume on Longhorn and 7 days of retention. The scrape config is straightforward:
- Node Exporter — hardware and OS metrics from every node (CPU, memory, disk, temps)
- Blackbox Exporter — ICMP probes for network monitoring (I even ping IoT devices on my network to make sure they’re alive)
- Self-monitoring for both Prometheus and Blackbox Exporter
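For flavor, here's roughly what those two scrape jobs look like in prometheus.yml. The job names and the IoT target IP are illustrative; the blackbox relabeling is the standard pattern of pointing Prometheus at the exporter while passing the real target as a parameter:

```yaml
scrape_configs:
  - job_name: node-exporter
    kubernetes_sd_configs:
      - role: node                     # one target per cluster node
    relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'            # kubelet port from node discovery
        replacement: '${1}:9100'       # swap in the node-exporter port
        target_label: __address__
  - job_name: blackbox-icmp
    metrics_path: /probe
    params:
      module: [icmp]                   # ping probe
    static_configs:
      - targets: ['192.168.1.50']      # an IoT device to keep an eye on
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target   # probe this target...
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115   # ...via the exporter itself
```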
All of this feeds into Grafana dashboards where I can see everything at a glance — node temperatures, disk usage, network throughput, service uptime.
Glances
Glances runs as a DaemonSet with full host access (hostNetwork: true, hostPID: true), which means it’s deployed on every node in the cluster and gives me a web-based view of system resources per machine. Think of it as htop in your browser, for every node.
One fun quirk — I had to exclude one of my nodes from the Glances DaemonSet because the Glances container has a compatibility issue with Debian 13 on that machine. A little nodeAffinity rule takes care of that.
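The interesting bits of that DaemonSet, sketched out (the hostname in the exclusion and the web-mode env var are assumptions based on the Glances Docker image docs):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: glances
spec:
  selector:
    matchLabels: { app: glances }
  template:
    metadata:
      labels: { app: glances }
    spec:
      hostNetwork: true     # share the node's network namespace
      hostPID: true         # see all host processes, like htop would
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: NotIn
                    values: [problem-node]   # hypothetical hostname
      containers:
        - name: glances
          image: nicolargo/glances:latest
          env:
            - name: GLANCES_OPT
              value: "-w"   # run the web UI
```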
Node Exporter
Runs as a DaemonSet on every node (including ARM), tolerating all taints. It exposes hardware metrics on port 9100 — temperatures via hwmon and thermal_zone collectors are particularly useful for keeping an eye on the Raspberry Pis, which are known to thermal throttle.
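The relevant parts of that spec look something like this (a fragment, not the full manifest; both collectors are actually enabled by default in recent node_exporter releases, the flags just make it explicit):

```yaml
spec:
  template:
    spec:
      hostNetwork: true              # expose metrics on the node's own IP
      tolerations:
        - operator: Exists           # tolerate every taint, control plane included
      containers:
        - name: node-exporter
          image: quay.io/prometheus/node-exporter:latest   # multi-arch image
          args:
            - --collector.hwmon          # temperature sensors
            - --collector.thermal_zone   # SoC thermal zones, handy on the Pis
          ports:
            - containerPort: 9100
```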
What’s running on the cluster
The whole point of this is to actually run stuff, right? I’m running a mix of media servers, content management systems, custom web applications, monitoring dashboards, and even file shares for retro gaming (yes, the Gameboy from my GBA post gets its ROMs from the cluster 😄).
All accessible via internal DNS, all managed by FluxCD, all with persistent storage on Longhorn.
Lessons learned
Start simple. I didn’t build all of this in one weekend. I started with K3s + MetalLB + one app, and kept adding layers as I got comfortable.
GitOps is a game changer. Having everything in one repo with FluxCD means I can review changes before they go live, roll back with a git revert, and sleep well knowing the cluster matches what’s in Git.
Mixed architectures work, but check your images. Most popular Docker images support multi-arch now, but every now and then you’ll hit one that’s amd64-only and needs pinning to the right nodes. Longhorn, for instance, I keep on the amd64 nodes anyway, since the Pis can’t pull their weight as storage.
Monitoring from day one. Don’t wait until something breaks to set up Prometheus. The Raspberry Pis in particular need temperature monitoring — I’ve caught thermal throttling issues early thanks to the Node Exporter + Grafana combo.
Longhorn is great for home labs. It’s not the fastest storage solution, but it’s incredibly easy to set up, the UI is excellent, and having the option to replicate critical data across nodes gives real peace of mind.
What’s next
- Grafana dashboards — I want to build more custom dashboards for the services I’m running.
- Cert-manager + external access — exposing some services externally with proper TLS.
- Backup automation — Longhorn snapshots to an off-site NAS.
- More Raspberry Pis — because why not?
If you’re thinking about building a home Kubernetes cluster, just go for it. K3s makes it approachable, FluxCD keeps it sane, and the learning experience is invaluable. Plus, there’s something deeply satisfying about running kubectl get nodes and seeing your little fleet of machines ready to go.