Category Archives: Hardware

The state of the cluster – 2021 – part 1 (hardware)

It’s been a while since I’ve posted, and there have been a lot of changes in the lab. The cluster has grown, and in the process it has been reshaped several times. I’ll try to cover all of what’s happened in this post. This project has always been about learning, and I’ve learned a lot.

Pi-Kube consists of a cluster of 10 Raspberry Pi 4B single-board computers, housed in a Pico 10H cluster case. I went with the Picocluster case out of frustration with finding a good power supply solution for multiple Raspberry Pi boards. Without doing a full review, I’ll just say that the Pico 10H has been mostly satisfactory: I’ve not had any power issues with a full cluster of 4B boards, and the unit is extremely quiet and provides more than adequate cooling for the cluster.

The only technical downside I’ve encountered is with the included 8-port switches (the 10H includes 2 of these), which seem to drop enough packets to make PXE booting the cluster problematic; 2-3 boards would consistently fail to boot every time I cycled power. I ultimately switched to a 16-port TP-Link switch to resolve this, and later abandoned network booting entirely when I switched to k3os.

The non-technical downside of the Pico 10H is the price, which puts it out of range for most single-board computer projects. The best I can say is that you’re paying for the engineering, and the engineering is mostly solid.

The other component of the cluster, aside from the aforementioned switch, is a NAS consisting of an Odroid XU-4 board in a Cloudshell 2 case, with two 4TB drives installed. This has been a reliable little storage appliance for me, but unfortunately it’s made up of largely discontinued or out-of-stock components. Based on the experience, I’d probably recommend Hardkernel components for a NAS project. Having said that, I’d suggest you look at a dedicated NAS from Synology as well; I use a DiskStation DS220j unit for other projects in my homelab, and I’m very happy with it.

As for disk storage, I’ve used both Seagate Ironwolf NAS-class and Western Digital Red drives for the past 2 years without any incidents.

In the next post, I’ll discuss the software I’m currently using to run the cluster, and talk about how I got there.

Odroid XU-4 file server (CloudShell 2)

I’ve finished my NAS build. It’s built around an ODroid XU4 that I got from eBay. I replaced the fan/heatsink combination with a taller heatsink from ameriDroid, which brought CPU temp down by 8°C. I picked up a pair of Seagate Ironwolf 1TB drives to use with it. All of this is housed in HardKernel’s CloudShell 2 for XU4 case. The CloudShell 2 can support RAID 0, RAID 1, spanned, or separate volumes; I went with RAID 1 out of paranoia. It’s running an ODroid-provided Ubuntu Bionic (18.04) minimal system with Samba and Gluster installed.

I didn’t much care for the display software provided by HardKernel, and there was some discussion about how those scripts are CPU-intensive. I tried a few things from scratch, but in the end I settled on Dave Burkhardt’s nmon wrapper scripts for the CloudShell 2 display, and I’m happy with the results.

For now, I’m using this mainly as a shared drive for the family.

The current state of pi-kube

A closer look at the cluster

Hardware and Operating System

Currently, the cluster consists of 5 Raspberry Pi 4B systems with 4GB memory. Each of these has a 16GB micro SD card and a 16GB USB flash drive. I use the SD cards for boot only; the root filesystem is on the flash drives. I’m not overclocking anything in the cluster yet, mostly because I don’t trust the power supply arrangement enough.

The systems are running Ubuntu 20.04 minimal, the stock distribution code from Canonical.

A 6th Raspberry Pi 4B acts as a file server. It’s equipped with a 256GB USB flash drive. The file server runs Ubuntu 19.04 (it’s due for a rebuild soon). I use gluster to mount the flash drive on all of the cluster servers. Currently, I’m not using replication or sharding with gluster, although at some point I intend to pursue that. The file system is mounted as /data/shared on each member system in the cluster.
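A quick way to confirm that the shared volume is actually mounted on a node is to look for it in /proc/mounts. The following is only a minimal sketch: it assumes the gluster native (FUSE) client is in use, which normally shows up with the filesystem type fuse.glusterfs, and the function names are made up for illustration.

```python
def find_mount(mounts_text, mount_point):
    """Return (device, fstype) for mount_point, or None if it is not mounted."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[1] == mount_point:
            return fields[0], fields[2]
    return None


def is_gluster_mounted(mounts_text, mount_point="/data/shared"):
    """True if mount_point is present and uses the gluster FUSE client.

    Assumption: the native client reports its type as 'fuse.glusterfs'.
    """
    entry = find_mount(mounts_text, mount_point)
    return entry is not None and entry[1] == "fuse.glusterfs"


def check_this_host(mount_point="/data/shared"):
    """Run the check against the live mount table of the current machine."""
    with open("/proc/mounts") as f:
        return is_gluster_mounted(f.read(), mount_point)
```

Calling check_this_host() on each cluster member (for example from an ansible task) would flag any node where the mount silently failed.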

Kubernetes configuration

The system runs k3s from Rancher. One RPi 4B serves as the master (and is also a node); there are 4 RPi 4B nodes besides this.

The stock k3s configuration is deployed, using sqlite for storage and Traefik to manage ingresses. Cert-manager is used to manage letsencrypt requests. External (internet) DNS for ingress is provided by AWS Route 53, although this is not currently automated. My home network uses pihole for DNS and DHCP; static DHCP leases are used for all hardware nodes. pihole also provides internal DNS spoofing of the external domain names, since the external IP address of my router does not work on the internal network.
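To verify that the internal DNS override is in effect, one option is to resolve an ingress hostname from inside the network and check that the answer is a private address rather than the router’s public one. This is a rough sketch using only the Python standard library; the function names are hypothetical, and you would pass in one of your own ingress hostnames.

```python
import ipaddress
import socket


def is_private(address):
    """True if address is in a private range (for example an RFC 1918 range)."""
    return ipaddress.ip_address(address).is_private


def resolves_internally(hostname):
    """Resolve hostname with the system resolver and report whether every
    returned address is private, meaning the internal override answered."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); the address
    # is the first element of sockaddr.
    addresses = {info[4][0] for info in infos}
    return all(is_private(a) for a in addresses)
```

If resolves_internally() returns False on a machine inside the network, that machine is most likely not using pihole as its resolver.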

The entire system is deployed via a set of ansible playbooks. You can see those playbooks here on GitHub.

Orphan Single-Board Computers

A little side-excursion on hardware infrastructure.

pi_kube is a product of my fascination with Raspberry Pi and other single-board computers. The Raspberry Pi part is pretty much standard stuff in the IT-oriented parts of the maker universe. Other SBCs, though, are interesting both in terms of their technical capabilities and the ecosystems and communities they spawn. My home network includes two of these.

The first non-RPi hardware I acquired was a Rock Pi 4B. Getting the device configured and running was a bit of a challenge; it doesn’t use either Raspberry Pi OS images or (AFAIK) standard arm64 boot images. I was only able to get it working with the purpose-built images that Radxa has on their website. Since I wanted to use it as a server rather than as a desktop or media center, I chose the Ubuntu “server” image, which as it turns out is a very minimal distribution indeed. It took quite a bit of wrangling to get it working consistently. Most of the problems I had were with the network configuration, which was in a confused state. I ended up disabling netplan and NetworkManager, and got it to work with a fixed address. I had high hopes of using the Rock Pi as a NAS for the kubernetes cluster, but that’s a still-evolving project. Sourcing add-on boards and components for the board means ordering them from China, and with the current trade and travel restrictions and lockdowns due to the pandemic, I’m still about 35 days out from receiving what’s needed to attach an NVMe SSD to the system. So more will come on this one.

My second foray into the SBC wilderness was to acquire an Atomic Pi board. This is a full Intel CPU system, so it works with any amd64 OS you can get onto it. I’ve had less time to play with it, but it seems like a fairly capable system. It will handle the USB-SATA case from Amazon Basics that my RPi systems can’t drive (due to power limits, I suspect). With only a single USB port, getting it wired up to everything you want might be a challenge; fortunately, that’s an easy-to-solve issue. I was able to get the standard Ubuntu 19.10 distribution images to boot off an SSD, and then installed the server onto a USB drive. The BIOS on this system can be a little confusing, and I wasn’t able to install anything to the onboard eMMC, but I’ve got a functional mini-server running on it. I’m not quite sure what I’ll do with this system, but it’s fanless and thus quiet; it might make a good media center, or perhaps a backup for Kepler, my main development system, which runs Ubuntu 19 on an old Mac Mini that’s been tricked out with extra RAM and an SSD. The Atomic Pi has 2GB of RAM onboard, so it might not be the best desktop system, but it seems fast enough as a server.

Refactoring update

This is going to be kind of a rambling post. Sorry in advance.

I’ve done some reworking of both the hardware and software. These are presented in no particular order.

My original stack had a 1TB external HD wired up via a SATA2-to-USB3 connector, which has proved problematic; I was connecting it to the master node, but I had persistent low voltage warnings. I’ve got some plans to run the HD off a powered USB hub, but with COVID-19 messing with everyone’s work and shipment schedules, it might be a month before the required parts arrive. So in the meantime, I’ve replaced the drive with a USB flash drive. This is only a 16GB drive, but it’s enough to let me tinker with NFS and persistent volumes.
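For anyone chasing similar low voltage warnings, the Raspberry Pi firmware exposes a bit field through `vcgencmd get_throttled` that records under-voltage and throttling events. The sketch below decodes that output; it assumes vcgencmd is available on the node (it ships with Raspberry Pi OS and may need to be installed separately on Ubuntu), and decode_throttled is a name invented here for illustration.

```python
# Bit positions reported by `vcgencmd get_throttled`.
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output):
    """Turn a line such as 'throttled=0x50005' into a list of flag names."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [
        name
        for bit, name in sorted(THROTTLE_FLAGS.items())
        if value & (1 << bit)
    ]
```

A result containing "under-voltage has occurred" after attaching a drive is a strong hint that the drive is pulling more power than the port can supply.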

With that settled, I’ve done some ansible work to get an NFS server configured on the master node, and have NFS clients running on each of the worker nodes. This lets me have a common pool for persistent volume claims to work off. I haven’t actually started using PVCs yet, so no idea how well this will work, but it’s a start.
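As a starting point for the PVC experiments, an NFS export can be described to Kubernetes as a PersistentVolume. This is only a sketch of what such a manifest might look like, built as a Python dict and dumped as JSON (which kubectl accepts alongside YAML). The volume name, server address, export path, and size in the usage note are placeholders, not values from my cluster.

```python
import json


def nfs_persistent_volume(name, server, path, size="1Gi"):
    """Build a PersistentVolume manifest backed by an NFS export."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            # NFS allows many nodes to mount the same volume read-write.
            "accessModes": ["ReadWriteMany"],
            "persistentVolumeReclaimPolicy": "Retain",
            "nfs": {"server": server, "path": path},
        },
    }


def to_manifest(pv):
    """Serialize the manifest as JSON suitable for `kubectl apply -f`."""
    return json.dumps(pv, indent=2)
```

For example, to_manifest(nfs_persistent_volume("shared-pv", "192.168.1.20", "/export/shared")) produces a document that can be saved to a file and applied; a PersistentVolumeClaim requesting ReadWriteMany access could then bind to it.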

I’ve added a Rock Pi 4 from Radxa to the stack. Eventually, once the power issues are resolved, my plan is to convert this to a dedicated NAS (perhaps using Open Media Vault) and take the NFS server burden off the master node. The Rock Pi might be a challenge, as support for it seems spotty and it seems to run off images specifically created for it; we will see how this experiment works out. If all else fails, I’ve managed to pick up an Atomic Pi that might serve nicely.

I’ve replaced the BrambleCase with a Cloudlet Cluster case, also from C4Labs. I like the BrambleCase a lot, but the Cloudlet case offers easier access to the boards installed in it, which works better at this phase of the project. I’d still recommend, guardedly, the BrambleCase; it’s a fine piece of engineering, albeit a bit tough to assemble (especially with my near-60 eyes and fingers). I’ve kept the Bramble for some other RPi projects I’m planning.

After struggling with internal name resolution issues, I’ve made two sweeping changes. First, I’ve added a dedicated RPi 4 running pihole. My main reason for doing this is that I’ve got a DNS spoofing requirement, which I’ll cover below. The second change was to systematically disable systemd-resolved on all the Ubuntu 19 systems I’m running (which includes Kepler, my day-to-day Linux desktop, which is built off an old Mac Mini, and which probably deserves a whole series of posts itself). I have had nothing but grief and misfortune with systemd-resolved, and it’s bad enough that I’ve decided to disable it anywhere I can. There are a lot of critiques and defenses of systemd and related projects out on the web; I won’t go into the controversies, because I’m not firmly in either the pro- or anti- camps as far as systemd goes, but systemd-resolved violates the principle of least surprise, and the way it works both obscures DNS resolution and intentionally breaks how the classic resolv.conf/glibc resolver works. Systemd-resolved expects a world where there is one contiguous DNS namespace, and all DNS servers agree on all hosts. That doesn’t work for internal networks, which is basically every corporate network and a lot of home networks as well.

I’ve been trying to follow the series that Lee Carpenter has been doing on his RPi/k3s cluster, but I am hung up on getting cert-manager to work. I’ll update more on those issues in another post once I land on a solution, but the gist of things is: the external interface of my router is not reachable from my internal net. As a result, cert-manager fails its self-check, because the self-check tries to make sure that the ACME challenge URL is reachable (from its container) before it actually forwards the request to letsencrypt. This doesn’t work with “regular” DNS for me, because the internet DNS resolves to my external IP, and the container (inside my network) can’t reach that IP. To try to solve this, I use dnsmasq internally (via pihole). So far, this hasn’t helped, which I’ve tracked down to one of two things: either coreDNS is configured wrong in my cluster, or the cert-manager containers are hard-wired to use some external DNS rather than refer back to the node’s DNS configuration for resolving names. I’ll have more to say about this once I’ve solved it.
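The self-check can be imitated by hand to see where it breaks. The sketch below is not cert-manager’s actual code; it assumes the HTTP-01 challenge type, where the ACME standard places the challenge at /.well-known/acme-challenge/ followed by a token, and simply tries to fetch that URL. Run from a pod, it shows whether the container can reach the address that DNS hands it. The domain and token in the usage note are placeholders.

```python
import urllib.error
import urllib.request


def challenge_url(domain, token):
    """Build the HTTP-01 challenge URL for a domain and challenge token."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def reachable(url, timeout=5.0):
    """Return True if a GET on url answers with HTTP 200 within timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or a non-2xx response.
        return False
```

For example, reachable(challenge_url("www.example.com", "some-token")) returning False from inside a container, while the same call succeeds from a node, would point at the container’s DNS configuration rather than at the ingress.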

Hardware Choices

The hardware I used to create the cluster consists of:

Note: the links above are affiliate links; if you purchase through them, you support this project.