What are you using your RPi for?

This is what a friend whom I’d not seen for some time asked me yesterday. Well, these days, not much.

In the past I’ve used it for running OpenELEC, but these days a FireTV and whatever Android box the ISP gives you do the job fine.

I’m currently using it for two things:

Next project, with no actual ETA because I’m interested in other stuff more: run these in k3s.

I’m sure other people have more exciting stuff to do with it, but what little spare computing time I have right now goes to Groovy and J.

Using systemd-nspawn to run an older Ubuntu version and Docker images

Work on this post was partly triggered by something I saw at work and partly by a lightning talk I attended on Monday. A friend remarked that even though we love and want to run the latest and greatest supported versions, we cannot always upgrade in time, and so we sometimes end up with unsupported OS versions for a significant period. Not unlike the law of welded systems.

But it got me thinking. Assuming, for example, that you have a number of Xenial systems that you cannot upgrade due to library dependencies, is there a middle ground that might be acceptable for a while until you push forward? Using a Xenial docker container and treating it as a lightweight VM could be a solution. But what if you need to run docker containers too? Maybe you need something more elaborate.

I fired up a VM running Ubuntu Focal and decided to figure out how to run a systemd-nspawn Xenial container.
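On a stock Focal VM the tooling itself may need installing first; on Ubuntu, debootstrap lives in the debootstrap package and systemd-nspawn in systemd-container:

# apt-get install -y debootstrap systemd-container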

Create the machine:

# debootstrap --arch=amd64 xenial /var/lib/machines/xenial1

Start the machine:

# systemd-nspawn -D /var/lib/machines/xenial1
root@xenial1 # rm /etc/securetty
root@xenial1 # passwd root
root@xenial1 # apt-get update
root@xenial1 # apt-get install dbus resolvconf
root@xenial1 # systemctl enable systemd-resolved
root@xenial1 # cat > /etc/resolvconf/resolv.conf.d/base
nameserver 127.0.0.53
options edns0 trust-ad
search home
ctrl-D
root@xenial1 #

Yes, I know that it is best to fix securetty instead of removing it, but this is a PoC on my VM.
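If you would rather keep the file, adding the terminals that container logins end up on should be enough; a minimal sketch, assuming console and the first pts devices are what nspawn/machinectl logins use:

root@xenial1 # printf 'console\npts/0\npts/1\n' >> /etc/securetty

You can now proceed to configure systemd to run the container: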

# /etc/systemd/system/xenial1.service

[Unit]
Description=Xenial1 Container

[Service]
LimitNOFILE=100000
ExecStart=/usr/bin/systemd-nspawn --machine=xenial1 --directory=/var/lib/machines/xenial1/ --bind /var/run/docker.sock:/var/run/docker.sock --bind /mnt2:/mnt2 -b 
Restart=always

[Install]
WantedBy=machines.target
Also=dbus.service

In your host system you can now:

root@focal # systemctl daemon-reload
root@focal # systemctl start xenial1
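If all went well, the host should now report the container as running; machinectl is the quickest way to check:

root@focal # machinectl list
root@focal # machinectl status xenial1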

You can of course log into the machine with machinectl login xenial1. Notice above that the nspawn container mounts the docker socket, since we assume that you have docker installed on the Focal host and that you want to "run" docker containers from within the Xenial machine.

We only need to install the docker client in the Xenial container, and so:

root@xenial1 # cat > /etc/apt/sources.list.d/docker.list
deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu xenial stable
ctrl-D
root@xenial1 # apt-get update
root@xenial1 # apt-get install docker-ce-cli=5:18.09.7~3-0~ubuntu-xenial
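One thing to watch out for: the sources list above references /usr/share/keyrings/docker-archive-keyring.gpg, which nothing so far has created. Before the apt-get update you will most likely need to fetch Docker's signing key into that path, something along these lines (curl and gnupg may need installing in the container first):

root@xenial1 # curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg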

Now you’re all set. The docker client inside Xenial starts whatever container you want on the Focal host and does whatever you like.

Suppose that you want a non-root user like ubuntu in the Xenial machine to be able to run docker commands; what do you do? You add the /etc/group line for the docker group from Focal to Xenial, and in Xenial you simply usermod -aG docker ubuntu.
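A minimal sketch of that; the GID 999 below is an assumption, use whatever Focal actually reports, since the bind-mounted socket carries the host's numeric group:

root@focal   # getent group docker
docker:x:999:
root@xenial1 # groupadd -g 999 docker
root@xenial1 # usermod -aG docker ubuntu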

ubuntu@xenial1 > docker run -d -p 8080:8080 bitnami/nginx
:
ubuntu@focal > curl http://127.0.0.1:8080/
:
If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.
:

Enjoy.

/dev/null as a service

This is kind of an obvious trick, but you never know when you’re going to need a web server that just receives stuff and responds back that all is well, while discarding its input. If you’re in such a situation:

docker run -p 8080:8080 docker.elastic.co/logstash/logstash-oss:7.15.0 -e "input { http { } } output { sink {} }"

It puts the http input plugin and sink to good use.
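A quick sanity check; the exact response body is logstash's business, a 200 is what you care about:

$ curl -s -o /dev/null -w '%{http_code}\n' -XPOST -d 'discard me' http://127.0.0.1:8080/
200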

Return a blank favicon.ico with Python bottle

Bottle is a fine framework when you need to quickly start a small Python application and serve web calls from it. However, you will soon notice that browser calls to your application ask for favicon.ico. Of course you can ignore those calls, but what if you want to serve them too?

One choice is to use server_static() and return the icon as described here. But what if you want to return a blank icon, since this is a quick hack anyway and you do not want to spend time hunting for an icon?

Starting from the description of a blank favicon, you can add this to your code:

import base64
from bottle import get, response

# a 16x16 fully transparent PNG, decoded once and served as the favicon bytes
BLANK_FAVICON = base64.b64decode(
    "iVBORw0KGgoAAAANSUhEUgAAABAAAAAQEAYAAABPYyMiAAAABmJLR0T///////8JWPfcAAAACXBIWXMAAABIAAAASABGyWs+AAAAF0lEQVRIx2NgGAWjYBSMglEwCkbBSAcACBAAAeaR9cIAAAAASUVORK5CYII="
)

@get('/favicon.ico')
def get_favicon():
    response.content_type = 'image/x-icon'
    return BLANK_FAVICON

What if my Kubernetes cluster on AWS cannot pull from ECR?

When you run an RKE cluster on AWS, pulling images from an ECR is something to be expected. However, it is not the easiest of things to do, since the credentials produced by aws ecr get-login expire every few hours and thus you need something to refresh them. In the Kubernetes world this means a CronJob.

What we do in this post is a simple improvement on other work here and here. The biggest difference is using the Bitnami docker images for aws-cli and kubectl to achieve the same result.

So we need a CronJob that will schedule a pod to run every hour and refresh the credentials. This job will:

  • run in a specific namespace and refresh the credentials there
  • have the ability to delete and re-create the credentials
  • do its best not to leak them
  • use "well known" images outside the ECR in question

To this end our Pod needs an initContainer that runs aws ecr get-login and stores the resulting password in an ephemeral space (emptyDir) for the main container to pick up. The main container in turn picks up the password generated by the init container and completes the credential refresh.

Are we done yet? No, because by default the service account the Pod runs as does not have the ability to delete and create the credentials. So we need to create an appropriate Role and RoleBinding for this.

All of the above is reproduced in the below YAML. If you do not wish to hardcode the AWS region and ECR URL, you can of course make them environment variables.

---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: dns
  name: ecr-secret-role
rules:
- apiGroups: [""]
  resources:
  - secrets
  - serviceaccounts
  - serviceaccounts/token
  verbs:
  - 'create'
  - 'delete'
  - 'get'
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: dns
  name: ecr-secret-rolebinding
subjects:
- kind: ServiceAccount
  name: default
  namespace: dns
roleRef:
  kind: Role
  name: ecr-secret-role
  apiGroup: ""
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: ecr-update-login
  namespace: dns
spec:
  schedule: "38 */1 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          initContainers:
          - name: awscli
            image: bitnami/aws-cli
            command:
            - /bin/bash
            - -c
            - |-
              aws ecr get-login --region REGION_HERE | cut -d' ' -f 6 > /ecr/ecr.token
            volumeMounts:
            - mountPath: /ecr
              name: ecr-volume
          containers:
          - name: kubectl
            image: bitnami/kubectl
            command:
            - /bin/bash
            - -c
            - |-
              kubectl -n dns delete secret --ignore-not-found ecr-registry
              kubectl -n dns create secret docker-registry ecr-registry --docker-username=AWS --docker-password=$(cat /ecr/ecr.token) --docker-server=ECR_URL_HERE
            volumeMounts:
            - mountPath: /ecr
              name: ecr-volume
          volumes:
          - name: ecr-volume
            emptyDir: {}

You can now use the ECR secret to pull images by adding to your Pod spec:

  imagePullSecrets:
  - name: ecr-registry
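To see whether the refresh actually happens, checking the job and the secret's creation timestamp is usually enough (names as defined in the YAML above):

$ kubectl -n dns get cronjob ecr-update-login
$ kubectl -n dns get secret ecr-registry -o jsonpath='{.metadata.creationTimestamp}{"\n"}'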

And yes, it is possible to solve the same issue with instance profiles:

"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:GetRepositoryPolicy",
"ecr:DescribeRepositories",
"ecr:ListImages",
"ecr:BatchGetImage"

This is a solution for when, for whatever reason, you cannot do that. Plus it may give you ideas for other helpers using aws-cli and kubectl, or even for other clouds and registries.

Why I like Groovy

I am writing this following a discussion with a colleague, where he pointed out his dislike of the Jenkins pipeline language and I commented that I liked Groovy. He sounded astonished, so this gives me a chance to elaborate a bit on that:

I run systems and I am not a Software Engineer, but I do write glue code all the time. The past few years I’ve come across a number of Jenkins installations, and Groovy is a bit of a required asset for more complex Jenkins stuff. That’s how I got my working, trial and error, knowledge of Groovy.

I happen to like functional languages. But since I am not a SWE, and since there were no FP jobs in my local market, I am paid to do the stuff people know I do well, not the stuff I want to play with. To this end, Groovy is the closest thing to FP I can get paid to work with. And it is a trick in life to find something that is play to you, that other people consider work and want to pay you for.

On the same track, Groovy runs on the JVM. Why not Clojure, you say? Because while I can get paid to work with Groovy, I would only be considered a junior engineer when seeking Clojure work. And time is a valuable and constrained resource; I do not have infinite free time to learn Clojure. I did, however, happen to be allowed to learn Groovy to save the day on a system.

Even though I first worked with Java when it ran on SPARC Solaris 2.3 machines, I decided not to invest my time in the language, foreseeing that such an investment would make me a monoglot, and I very much enjoy my ability to switch languages, even for glue code and projects. But it also happens that the JVM is one of the most engineered pieces of software of the past 25 years and you will find it running almost everywhere. So when you run systems you need to understand it somehow, and you will sometimes be required to write code that runs on it.

Hence my answer: Groovy and the book next to me on the desk.

It is not my first choice, but it seems to be the optimal for me. Today Groovy is number 12 on the TIOBE index and Go (my next similar choice because of Kubernetes) 14.

Beware of true in Pod specifications

true is a reserved word in YAML (unquoted, it parses as a boolean) and this can bite you when you least expect it. Consider the following extremely simple Pod and livenessProbe:

apiVersion: v1
kind: Pod
metadata:
  name: true-test
spec:
  containers:
  - name: nginx
    image: nginx
    livenessProbe:
      exec:
        command:
        - true

Let’s create this, shall we?

$ kubectl create -f true.yaml 

Error from server (BadRequest): error when creating "true.yaml": Pod in version "v1" cannot be handled as a Pod: v1.Pod.Spec: v1.PodSpec.Containers: []v1.Container: v1.Container.LivenessProbe: v1.Probe.Handler: Exec: v1.ExecAction.Command: []string: ReadString: expects " or n, but found t, error found in #10 byte of ...|ommand":[true]}},"na|..., bigger context ...|age":"nginx","livenessProbe":{"exec":{"command":[true]}},"name":"nginx"}]}}
|...
$

All because in the above specification true is not quoted, as it should be:

    livenessProbe:
      exec:
        command:
        - "true"

A different take, if you do not want to mess with true, would be:

    livenessProbe:
      exec:
        command:
        - exit

Beware though, if you want to exit 0 you need to quote again:

    livenessProbe:
      exec:
        command:
        - exit
        - "0"

Oh, the many ways you can waste your time…

When will those in charge finally learn?

TL;DR: Never.

One of the best lessons I ever got was at one of the Athens ISACA Chapter Infocom conferences. The speaker started with something like:

– How many of you here have identified issues that, if they ever “blow up”, will be real problems, and you find that management does nothing about them?

The audience looked at one another, because quite simply the answer was: everyone.

The speaker, a consultant at one of the big 4, then said: this happens because management takes a bet: it will not blow up on my shift.

How long is that shift? Two years? Three? After that they have moved on to another post or retired. When it blows up it is somebody else’s problem, whoever happens to be holding it at the time.

That is why those in charge only care about projects that come with a ribbon to cut. Those advance their career and set up their next step. A firebreak or a brush clearing, what photo opportunities can that offer? None.

You may say: what about the employee below them, the one who stays in the service for the long haul? The employee, conscientious or not, is perfectly covered by the behaviour of those politically in charge. When asked “brother, what were you doing all this time?”, they will open the drawer, pull out the memos (with protocol numbers) sent to every administration and say “This”.

The public sector does not reward personal risk-taking by the employee (if things go well they do not even hear a “well done”, because they spoiled the soup; if things go badly nobody backs them, because they took the risk on their own) and it diffuses responsibility masterfully.

So every time, the person in charge takes a bet: will it blow up while I am sitting in this chair? If yes, let me get ahead of it. Otherwise, in two years who will remember whether I did anything?

Management avoids errors of commission by making errors of omission.

[Originally a FB post]

Vagrant was unable to mount VirtualBox shared folders

After upgrading my ubuntu/focal64 box I got greeted by this wonderful message:

Vagrant was unable to mount VirtualBox shared folders. This is usually
because the filesystem "vboxsf" is not available. This filesystem is
made available via the VirtualBox Guest Additions and kernel module.
Please verify that these guest additions are properly installed in the
guest. This is not a bug in Vagrant and is usually caused by a faulty
Vagrant box. For context, the command attempted was:

mount -t vboxsf -o uid=1000,gid=1000 vagrant /vagrant

The error output from the command was:

: Invalid argument

The solution was rather simple: a sudo apt-get install -y virtualbox-guest-dkms inside the VirtualBox guest, followed by a vagrant reload on the host.
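In command form that is roughly (a sketch; the package name is the one from the fix above):

$ vagrant ssh -c 'sudo apt-get update && sudo apt-get install -y virtualbox-guest-dkms'
$ vagrant reload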

In case it matters, the VirtualBox host machine was running Ubuntu 21.04.

F1 (random) thoughts

I had not watched an F1 championship for years, maybe the occasional race once or twice a year. My interest in the sport was renewed by Formula 1: Drive to Survive. It offered a unique (if somewhat reality-TV flavoured) insight into the sport. So I watched the second half of last year’s championship and I am watching the 2021 one as well.

I started wondering about the telemetry, monitoring and observability tools the teams use. After all, using your current understanding of things to understand something new is what we humans do most of the time. I understand monitoring and analytics infrastructures, so I have an interest in how people set these up in F1. A friend mentioned Atlas 10 in my FB feed.

I then started paying attention to the small advertising stickers on the cars. Not to the usual suspects like Oracle, Kaspersky and Citrix. JuliaHub, what are you doing there? For those who do not know it, Julia is a programming language for scientific computing. Not everything is about Python in computational science. Fascinating.

And then there was the most interesting observation: Tezos. I’ve seen it on McLaren and Red Bull. And to show that advertising works, I once had $5 invested in Ethereum. I converted it to Tezos :)

While you’re here and bored, check this WIPO decision about the f1.com domain name from once upon a time.