
Using letsencrypt with Zentyal

In this article, I will describe the steps I took to use acme.sh to obtain and manage TLS certificates for a Zentyal server running in my homelab.


On my home network, I run a single virtualized instance of Zentyal Development Edition to act as an internal DNS server. Let me make a few things clear about this server up front:

  • I am in no sense fully utilizing Zentyal. Zentyal can provide a multitude of services to a LAN, including acting as a Domain Controller, DHCP server, Mail server (with optional filtering and a Web Mail interface), NTP service, RADIUS, VPN services and more. I’m using it for DNS, which arguably isn’t even scratching the surface of Zentyal’s capabilities.
  • There is a dearth of Open Source DNS server offerings for small networks that fit my ideal requirements. Those would include:
    • Providing an API service (hopefully compatible with the API for something like AWS Route 53) that would support automation.
    • Providing a web user interface for maintaining DNS.
    • Including the ability to manage zones across providers. For example, I want to manage and serve “home.thejimnicholson.com” within my home network, but “thejimnicholson.com” would reside in an external DNS server or provider.

One of the things I’ve been trying to do in my home networking projects is to use TLS certificates from an actual, recognized “official” issuer. This is possible because I own the domain I use for internal names, and free certificate authorities like LetsEncrypt and ZeroSSL support DNS TXT records for their ownership challenges (the ACME “dns-01” challenge). What this means is that rather than verifying a certificate request by making an HTTP request to a web server on the target domain, the issuer

  • generates a random token and returns it to the requester,
  • waits for the requester to publish a value derived from that token as a TXT record in DNS and report that the record is in place, then
  • performs a DNS lookup for that TXT record (named _acme-challenge.<domain>) and ensures that the value returned matches.
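To make the challenge a little more concrete, here is a minimal bash sketch of the two values involved, following RFC 8555 section 8.4: the record name is `_acme-challenge.` followed by the domain, and the TXT value is the unpadded base64url encoding of the SHA-256 digest of the "key authorization" (the token, a dot, and a thumbprint of the requester's account key). The function names are hypothetical, and acme.sh computes all of this for you; the sketch assumes bash and GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch of the dns-01 challenge values from RFC 8555 (section 8.4).
# Function names are hypothetical; requires bash and GNU coreutils.

# Name of the TXT record the CA looks up for a given domain.
acme_challenge_name() {
  printf '_acme-challenge.%s\n' "$1"
}

# TXT value: base64url(SHA-256(key authorization)) with "=" padding removed.
acme_txt_value() {
  local key_authz="$1" hex
  # Hex-encoded SHA-256 digest of the key authorization string.
  hex="$(printf '%s' "$key_authz" | sha256sum | cut -d' ' -f1)"
  # Turn the hex digest back into raw bytes (\xHH escapes expanded by %b),
  # base64-encode them, switch to the URL-safe alphabet and strip padding.
  printf '%b' "$(printf '%s' "$hex" | sed 's/../\\x&/g')" \
    | base64 -w0 | tr '/+' '_-' | tr -d '='
  printf '\n'
}
```

While a certificate request is in flight, something like `dig +short TXT "$(acme_challenge_name home.thejimnicholson.com)"` shows whether the record the issuer is about to check is actually visible.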

ACME (Automatic Certificate Management Environment, standardized as RFC 8555) is a protocol for requesting, obtaining and renewing TLS certificates. It is used by many certificate authorities and services, and many certificate management vendors have implemented it in their products. There are a number of client tools for interacting with a service that supports ACME; the best known is certbot from the Electronic Frontier Foundation. Certbot is a fine tool for the job, but it has a potential drawback: it is written in Python, and thus requires a Python interpreter to work. An alternative client, acme.sh, is implemented entirely as a Unix shell script and offers similar capabilities with a simpler deployment process.

Zentyal’s own documentation for using LetsEncrypt certificates with its web administration interface is (sadly, like most of their documentation) terse and a bit incomplete. I was able to find a snippet for using acme.sh with GoDaddy on a Zentyal server and adapted it to use AWS Route 53, my DNS provider.

Installing acme.sh

Log into the Zentyal server using ssh with an account that has administrator privileges (i.e., one that can run sudo), and open a root shell (for example, with sudo -i).

You can either run the quick install as outlined at get.acme.sh, or clone the project source and run the installer. It shouldn’t make a difference which you choose; if you clone the project, make sure that git and socat are installed on the server first:

apt install -y git socat
cd /tmp
git clone https://github.com/acmesh-official/acme.sh.git
cd acme.sh
./acme.sh --install --accountemail <YOUR EMAIL>

After the installation, you will need to restart your root shell session to pick up changes that the install process makes to the root .bashrc. The simplest way to do this is to exit the shell and restart it with sudo again.

If everything went well, you will now have a ~/.acme.sh directory containing the installed script, and a shell alias acme.sh that runs the script.
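A quick way to confirm the install worked is sketched below in bash. Both functions are hypothetical helpers, not part of acme.sh: the first checks that the script exists and is executable in the install directory (defaulting to ~/.acme.sh), and the second checks root's crontab for the daily `acme.sh --cron` entry that the installer adds, which is what handles renewals later.

```shell
#!/usr/bin/env bash
# Hypothetical post-install checks for acme.sh (not part of acme.sh itself).

acme_install_ok() {
  # Install directory defaults to the standard location.
  local acme_dir="${1:-$HOME/.acme.sh}"
  # The installer copies the script into the install directory.
  if [ ! -x "${acme_dir}/acme.sh" ]; then
    echo "acme.sh not found in ${acme_dir}" >&2
    return 1
  fi
  echo "acme.sh installed in ${acme_dir}"
}

acme_cron_ok() {
  # The installer adds a crontab line that runs "acme.sh --cron" for renewals.
  crontab -l 2>/dev/null | grep -q -- '--cron'
}
```

Run as root, `acme_install_ok && acme_cron_ok && echo "renewal job present"` should print both confirmations.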

AWS Credentials

There are various tutorials for setting up an IAM user with access keys and policy rights to interact with Route 53; I followed this one. The bottom line is that before you can get certificates, you need to set the AWS credential variables that acme.sh’s dns_aws plugin reads. The last two variables are simply conveniences for typing the commands:

export AWS_ACCESS_KEY_ID="<your access key ID>"
export AWS_SECRET_ACCESS_KEY="<your secret access key>"
export HOST_FQDN="<your fully qualified domain name>"
export ACCOUNT_EMAIL="<your email address>"
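Before running the issue command, it can help to fail fast if one of these is missing. Here is a small bash sketch; require_env is a hypothetical helper name, not something acme.sh provides. Once dns_aws has used the keys successfully, acme.sh saves them in ~/.acme.sh/account.conf, so the renewal cron job does not need them exported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if any named variable is unset or empty.
require_env() {
  local name missing=()
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named $name.
    if [ -z "${!name:-}" ]; then
      missing+=("$name")
    fi
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing: ${missing[*]}" >&2
    return 1
  fi
}
```

For example: `require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY HOST_FQDN && acme.sh --issue ...`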

Choosing a CA

acme.sh is configured to use ZeroSSL by default. If you want to use that issuer, you need to read this page, which covers the initial steps you need to register your ZeroSSL credentials for the script.

I decided to use LetsEncrypt and make it the default issuer. To do this, run:

acme.sh --set-default-ca --server letsencrypt
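To check the setting later, you can look at acme.sh’s account configuration. The sketch below assumes that --set-default-ca records the CA’s directory URL as a DEFAULT_ACME_SERVER line in ~/.acme.sh/account.conf, which is how current acme.sh releases appear to store it; acme_default_ca is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the default CA recorded in acme.sh's account.conf.
# Assumption: the setting is stored as DEFAULT_ACME_SERVER=... in that file.
acme_default_ca() {
  local conf="${1:-$HOME/.acme.sh/account.conf}"
  # Take the last assignment, drop the key name, and strip any quotes.
  sed -n 's/^DEFAULT_ACME_SERVER=//p' "$conf" | tail -n 1 | tr -d "'\""
}
```

With LetsEncrypt selected, it should print https://acme-v02.api.letsencrypt.org/directory.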

Obtaining a certificate

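Assuming you have set the environment variables above, you can obtain a certificate by running the command below. The --force option tells acme.sh to issue a new certificate even if an existing one is not yet due for renewal. Note that LetsEncrypt stopped supporting OCSP Must-Staple requests in 2025, so on a new setup you will probably need to drop --ocsp-must-staple.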

acme.sh --issue -d ${HOST_FQDN} --dns dns_aws --ocsp-must-staple \
   --keylength 4096  --force

The script will run for a while (it has to wait until the TXT record is visible in DNS), and the result should be a brand-new certificate, stored under ~/.acme.sh/${HOST_FQDN}/.
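`acme.sh --list` shows the certificates acme.sh is managing. To look at the files themselves, here is a bash sketch that lists the four files acme.sh writes for an RSA certificate like this one (ECC certificates go in a `<domain>_ecc` directory instead) and uses openssl to print the subject, issuer and expiry date. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the files acme.sh keeps for an RSA certificate.
# Second argument overrides the acme.sh home directory.
acme_cert_files() {
  local domain="$1" dir="${2:-$HOME/.acme.sh}/$1"
  printf '%s\n' \
    "${dir}/${domain}.cer" \
    "${dir}/${domain}.key" \
    "${dir}/ca.cer" \
    "${dir}/fullchain.cer"
}

# Print subject, issuer and expiry of the issued certificate (needs openssl).
show_cert() {
  openssl x509 -in "$(acme_cert_files "$1" | head -n 1)" -noout -subject -issuer -enddate
}
```

Run as root, `show_cert "$HOST_FQDN"` should report LetsEncrypt as the issuer.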

Installing the certificate in Zentyal

This is the tricky part. acme.sh installs a cron job that keeps your certificates up to date, but for renewed certificates to reach Zentyal, you need to let the script handle the installation step too. Fortunately, there’s a way to do this:

acme.sh --install-cert -d "${HOST_FQDN}" \
  --reloadcmd "cat /root/.acme.sh/${HOST_FQDN}/fullchain.cer /root/.acme.sh/${HOST_FQDN}/${HOST_FQDN}.key > /var/lib/zentyal/conf/ssl/ssl.pem && systemctl restart zentyal.webadmin-nginx.service"

This command builds a .pem file from the issued certificate and its private key, puts it where Zentyal’s web admin interface expects it, and restarts that interface to pick up the new certificate. acme.sh saves the reload command and runs it again each time it renews the certificate, so renewals reach Zentyal without further work.
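If you would rather not keep a long one-liner in --reloadcmd, the PEM step can live in a small script. This is a sketch, not Zentyal’s or acme.sh’s own method: the function name is hypothetical, it uses fullchain.cer (certificate plus intermediate) rather than the bare .cer, and it writes to a temporary file with mode 0600 before renaming it into place, assuming the webadmin nginx reads the file as root.

```shell
#!/usr/bin/env bash
# Hypothetical reload helper for Zentyal (a sketch, not Zentyal's or acme.sh's code).
# Usage: zentyal-pem.sh CERT_DIR DOMAIN DEST_PEM

build_zentyal_pem() {
  local cert_dir="$1" domain="$2" dest="$3" tmp
  # mktemp creates the file with mode 0600, which suits a file holding a private key.
  tmp="$(mktemp "$(dirname "$dest")/.ssl.pem.XXXXXX")" || return 1
  # Full chain (leaf plus intermediate) first, then the private key.
  if cat "${cert_dir}/fullchain.cer" "${cert_dir}/${domain}.key" > "$tmp"; then
    # Rename into place so nginx never reads a half-written file.
    mv -f "$tmp" "$dest"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Run only when called with exactly three arguments, so the file can also be sourced.
if [ "$#" -eq 3 ]; then
  build_zentyal_pem "$@"
fi
```

Saved as, say, /root/bin/zentyal-pem.sh (a hypothetical path) and made executable, it could be called from --reloadcmd with /root/.acme.sh/${HOST_FQDN}, ${HOST_FQDN} and /var/lib/zentyal/conf/ssl/ssl.pem as its arguments, followed by the same systemctl restart.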

If everything goes well, you should now have an “official” certificate for your server.

Getting cert-manager to work

I’ve been sort of following the series that Lee Carpenter is doing over at carpie.net, but for a while I was hung up on getting cert-manager to work. The specific failure mode I had was this:

My external IP address (the IP assigned to my router by the cable company) for some reason isn’t routed correctly from inside my home network, a symptom usually associated with NAT loopback (also called hairpinning). The IP responds to pings, and DNS resolves it, but SSH, HTTP and HTTPS traffic (and presumably any other TCP connection) all hang indefinitely. I believe this is a router issue: my router, a TP-Link Archer 20-based model, doesn’t use an alternate port for its web admin UI. It presents the UI on port 443 with a self-signed certificate, and redirects port 80 traffic to 443. I suspect that the web server embedded in the router’s firmware is catching my web connections (the ones that originate inside the network) and doesn’t know what to do with them, so they just hang.

External connections are properly routed, as I’ve got port-forwarding configured to send the traffic to the kube cluster.

Here’s why this is a problem: cert-manager runs a “sanity check” (its self check) before it asks letsencrypt to validate a challenge. If you are using the http01 verification strategy, cert-manager first tries to reach the challenge response URL itself. This makes sense, since there’s no point in asking letsencrypt to validate a challenge response that can’t be reached.

Except, in this case, the response actually is correctly configured, and if you hit that URL from outside my home network, you would see it. The sanity check, however, runs inside the network, so it was failing, and no certificate for me!
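A rough re-creation of that check is sketched below; it is not cert-manager’s code, and the function names are hypothetical. Per RFC 8555 section 8.3, the HTTP-01 response is served at /.well-known/acme-challenge/<token> and its body must equal the key authorization. The --max-time limit keeps a hairpin hang like the one above from blocking forever.

```shell
#!/usr/bin/env bash
# Rough re-creation of an HTTP-01 self check (not cert-manager's code).
# Hypothetical helper names; requires curl.

# RFC 8555 places the HTTP-01 response at this well-known path.
http01_url() {
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$1" "$2"
}

# Fetch the challenge URL and compare the body with the expected key authorization.
http01_self_check() {
  local domain="$1" token="$2" expected="$3" body
  # -f fails on HTTP errors; --max-time caps how long a hung connection can take.
  body="$(curl -fsS --max-time 10 "$(http01_url "$domain" "$token")")" || return 1
  [ "$body" = "$expected" ]
}
```

Running the same check from a machine inside the home network and from one outside it is a quick way to see the difference described above.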

The solution to this was simple: I run pi-hole on my home network, as both a DHCP server and a DNS server. So all I had to do was “spoof” my external DNS name on the internal network, so that it resolved to the internal address of the kube cluster, rather than the external address of the router.
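Pi-hole’s “Local DNS Records” page is the usual way to add such an override. On Pi-hole v5 those records live in /etc/pihole/custom.list as “IP hostname” lines and `pihole restartdns` reloads them; Pi-hole v6 moved this configuration, so treat the path as an assumption. A minimal bash sketch of scripting the record follows; the helper name, address and hostname are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: add or replace one "IP hostname" line in a hosts-format file.
# Hypothetical helper; on Pi-hole v5 the file would be /etc/pihole/custom.list.

upsert_hosts_record() {
  local file="$1" ip="$2" host="$3" tmp
  tmp="$(mktemp)" || return 1
  # Keep every existing line except those whose hostname column matches.
  if [ -f "$file" ]; then
    awk -v h="$host" '$2 != h' "$file" > "$tmp"
  fi
  printf '%s %s\n' "$ip" "$host" >> "$tmp"
  # Copy back over the original (rather than mv) to keep its owner and mode.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (placeholder address and name), then reload Pi-hole's DNS:
#   upsert_hosts_record /etc/pihole/custom.list 192.168.1.240 www.example.com
#   pihole restartdns
```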

At least, it sounds simple. In reality, it proved to be difficult, mainly because when I started building my cluster I decided to use Ubuntu Server (a full 64-bit OS) rather than Raspbian (which runs userspace in 32-bit, even on the Raspberry Pi 4). And I’m running Ubuntu 19.10, which means that by default I’m using systemd-resolved to handle DNS resolution.

I got over my distaste for systemd long ago, but man, systemd-resolved is pure evil. If you think you understand how Linux DNS resolution works, be prepared to feel dumb. I won’t go into all the reasons why I think what they’ve done with resolution in systemd is evil, but I will say this: no matter what I did, the cert-manager pods did not seem to use my internal DNS server until I fully disabled (and apt purged) systemd-resolved and did a whole bunch of other stuff to get resolv.conf back to what anyone who’s used Unix for 30 years would expect.
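The exact steps vary by release, but the core of it can be sketched in bash, assuming Ubuntu’s default layout where /etc/resolv.conf is a symlink into /run/systemd/resolve/. The hypothetical helper below replaces that link with a plain file listing only the nameservers you pass; the address in the usage comment is a placeholder for the pi-hole’s IP.

```shell
#!/usr/bin/env bash
# Sketch of replacing systemd-resolved's stub resolv.conf with a static file.
# Hypothetical helper; the nameserver address below is a placeholder.

write_static_resolv_conf() {
  local target="$1" ns
  shift
  # On Ubuntu, /etc/resolv.conf is normally a symlink into /run/systemd/resolve/;
  # remove the link so we write a regular file instead of the link's target.
  if [ -L "$target" ]; then
    rm -f "$target"
  fi
  : > "$target"
  for ns in "$@"; do
    printf 'nameserver %s\n' "$ns" >> "$target"
  done
}

# On the real host (as root), something like:
#   systemctl disable --now systemd-resolved
#   write_static_resolv_conf /etc/resolv.conf 192.168.1.2
```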

I actually walked away from this for a while, because it was so frustrating. And in the course of trying to figure out what was wrong, I rebuilt the kube cluster without traefik, and installed metallb and nginx using Grégoire Jeanmart’s helpful articles as a guide. Let me be clear: traefik was NOT the problem, and not even related. My issue was with DNS. But at this point, I’ve got the cluster working with cert-manager, so I think I’m just going to leave it the way it is for now.