How I set up Rancher these days

Rancher is a very handy interface when you need to manage Kubernetes clusters. In the past I deployed Rancher on a single VM, running the container as per the getting-started instructions. You can go a long way with a single-machine setup.

However, if you observe how recent versions of the container start, you will notice that a k3s cluster is launched within the container, which makes running it this way something of an overkill. The Rancher documentation also includes directions on how to run it in a Kubernetes cluster (k3s being their obvious preference) and how to do certificate management (which is something you need, since otherwise the Rancher agent deployed in your clusters won’t be able to communicate via web sockets). Well, I am not a big fan of how the Rancher documentation describes launching it in a Kubernetes cluster, and more importantly I am annoyed at how SSL certificates are managed. You can go a long way using microk8s, and this is what I did in this case.

Assuming you have set up a microk8s cluster (single node, or three nodes for HA), we are almost ready to start. Rancher deploys its components in the cattle-system namespace, so we create that first with kubectl create ns cattle-system. We will use helm to install Rancher, and we want to provide some basic values to the installation, so we create a file named values.yaml with the following contents:

auditLog:
  level: 1
bootstrapPassword: A_PASSWORD_HERE
hostname: rancher.example.net
replicas: 1
ingress:
  enabled: false

With the above we instruct helm not to deal with the Ingress, since we will provide it later (we want to manage certificates either on our own or via cert-manager, at the Ingress object). We then run helm -n cattle-system install rancher rancher-latest/rancher -f values.yaml --version 2.8.3 to install it.
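The install command assumes the rancher-latest chart repository is already configured; if it is not, it can be added first (repository URL as published by Rancher):

```shell
# add the rancher-latest chart repository and refresh the local index
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo update
```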

After some time passes (verify with something like kubectl -n cattle-system get pod) Rancher is installed, and we now need to make it accessible from the “outside” world. Microk8s offers nginx-ingress as an add-on (microk8s enable ingress sets this up), or we can use a different ingress controller, for example haproxy-ingress, again via helm: helm -n ingress-haproxy install haproxy-ingress haproxy-ingress/haproxy-ingress -f ./values-haproxy.yaml --version 0.14.6. The contents of values-haproxy.yaml are:

controller:
  hostNetwork: true
  ingressClassResource:
    enabled: true

And now that we have the Ingress controller installed, we can also set up the Ingress object itself:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rancher-haproxy
  namespace: cattle-system
  annotations:
    haproxy-ingress.github.io/ssl-redirect: "true"
spec:
  ingressClassName: "haproxy"
  tls:
  - hosts:
    - rancher.example.net
    secretName: example-net
  rules:
  - host: rancher.example.net
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: rancher
            port:
              number: 80
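Saved to a file (the name rancher-ingress.yaml here is just an example), the manifest is applied as usual; the namespace is already set in its metadata:

```shell
kubectl apply -f rancher-ingress.yaml
```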

And you are done. You can of course set up a cert-manager Issuer to automate certificate issuing and renewal.
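For example, a minimal cert-manager ClusterIssuer for Let’s Encrypt could look like the following sketch (the e-mail address and the haproxy ingress class are assumptions to adapt):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.net
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
    - http01:
        ingress:
          class: haproxy
```

With an issuer like this in place, annotating the Ingress with cert-manager.io/cluster-issuer: letsencrypt-prod lets cert-manager populate the TLS secret for you.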

Happy ranchering.

PS: When a new version of Rancher is out, you can upgrade with something like helm -n cattle-system upgrade rancher rancher-latest/rancher -f values.yaml --version 2.8.4

Network Request Failed when configuring OpenLDAP authentication in Rancher

It may be the case that you have installed Rancher in a cluster via helm with something like

helm install rancher rancher-latest/rancher \
--namespace=cattle-system \
--set hostname=rancher.storfund.net \
--set replicas=1 \
--set bootstrapPassword=PASSWORD_HERE \
--set auditLog.level=1 \
--version 2.8.3

If you try to configure OpenLDAP authentication (and maybe other directories), you will be greeted with the not-at-all-helpful message Network Request Failed, while in the logs you will see that your OpenLDAP server was never contacted. What gives?

Well, the above helm command installs Rancher with a self-signed certificate, and you have to open the developer tools in the browser to see that a wss:// call failed because of it. The solution, of course, is to use a certificate that your browser considers valid. First we ask helm to give us the current configuration values with helm -n cattle-system get values rancher -o yaml > values.yaml and then we augment values.yaml with:

ingress:
  tls:
    source: secret
privateCA: true

It does not have to be a “really” private CA; I did the above with a certificate issued by Let’s Encrypt. Now upgrade with helm -n cattle-system upgrade rancher rancher-latest/rancher -f values.yaml --version 2.8.3 and we are ready to add our own working certificate with:

kubectl -n cattle-system delete secret tls-rancher-ingress
kubectl -n cattle-system create secret tls tls-rancher-ingress --key ./key.pem --cert ./cert.pem

Of course, if you are using cert-manager there are other ways to do this.

Rancher’s cattle-cluster-agent and error 404

It may be the case that when you deploy a new Rancher2 Kubernetes cluster, all pods are working fine, with the exception of cattle-cluster-agent (whose job is to connect to the Kubernetes API of Rancher-launched Kubernetes clusters), which enters a CrashLoopBackOff state (a red state in your UI under the System project).

One common error you will see via View Logs on the agent’s pod is a 404 due to an HTTP ping failing:

ERROR: https://rancher-ui.example.com/ping is not accessible (The requested URL returned error: 404)

It is a DNS problem

The issue here is that if you watch the network traffic on your Rancher2 UI server, you will never see pings coming from the pod, yet the pod is sending traffic somewhere. Where?

Observe the contents of your pod’s /etc/resolv.conf:

nameserver 10.43.0.10
search default.svc.cluster.local svc.cluster.local cluster.local example.com
options ndots:5

Now, if you happen to have a wildcard DNS A record in example.com, the HTTP ping in question becomes https://rancher-ui.example.com.example.com/ping, which resolves to the wildcard’s A record (most likely not the A RR of the host where the Rancher UI runs). Hence, if that machine runs a web server, you are at the mercy of whatever that web server responds.
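To see why, here is a rough sketch of what a glibc-style resolver does with ndots:5: rancher-ui.example.com contains only two dots, fewer than five, so the search suffixes from resolv.conf are tried first, before the bare name:

```shell
# candidate names the resolver tries, in order, for a name with < 5 dots
name="rancher-ui.example.com"
for suffix in default.svc.cluster.local svc.cluster.local cluster.local example.com; do
  echo "$name.$suffix"
done
```

The last candidate, rancher-ui.example.com.example.com, matches the wildcard record, and whatever web server answers there responds to the ping with a 404.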

One quick hack is to edit your Rancher2 cluster’s YAML and instruct the kubelet to start with a different resolv.conf, one whose search path does not contain the domain with the wildcard record. The kubelet appends the search path line to the default, and in this particular case you do not want that. So you tell your Rancher2 cluster the following:

  kubelet:
    extra_args:
      resolv-conf: /host/etc/resolv.rancher

resolv.rancher contains only nameserver entries in my case. The path is /host/etc/resolv.rancher because you have to remember that in Rancher2 clusters the kubelet itself runs in a container and accesses the host’s file system under /host.
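As an illustration (the resolver addresses below are placeholders, use your own), /etc/resolv.rancher on each host can be as simple as:

```
nameserver 1.1.1.1
nameserver 8.8.8.8
```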

Now, I am pretty certain this can also be dealt with via some CoreDNS configuration, but I did not have the time to pursue it.

once again bitten by the MTU

At work we use Rancher2 clusters a lot. The UI makes some things easier, I have to admit, like shipping logs from the cluster somewhere. I wanted to test sending such logs to Elasticsearch, and thus I set up a test installation with docker-compose:

version: "3.4"

services:
  elasticsearch:
    restart: always
    image: elasticsearch:7.5.1
    container_name: elasticsearch
    ports:
      - "9200:9200"
    environment:
      - ES_JAVA_OPTS=-Xmx16g
      - cluster.name=lala-cluster
      - bootstrap.memory_lock=true
      - discovery.type=single-node
      - node.name=lala-node
      - http.port=9200
      - xpack.security.enabled=true
      - xpack.monitoring.collection.enabled=true
    volumes:
      # ensure chown 1000:1000 /opt/elasticsearch/data please.
      - /opt/elasticsearch/data:/usr/share/elasticsearch/data

  kibana:
    restart: always
    image: kibana:7.5.1
    ports:
      - "5601:5601"
    container_name: kibana
    depends_on:
      - elasticsearch
    volumes:
      - /etc/docker/compose/kibana.yml:/usr/share/kibana/config/kibana.yml

Yes, this is a yellow cluster, but then again, it is a test cluster on a single machine.
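The compose file mounts a kibana.yml from the host; since xpack security is enabled, Kibana needs credentials to talk to Elasticsearch, so a minimal sketch (hostname and credentials are placeholders) could be:

```yaml
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://elasticsearch:9200"]
elasticsearch.username: "kibana"
elasticsearch.password: "CHANGE_ME"
```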

This seemed to work for some days, and then it stopped. tcpdump showed packets arriving at the machine, but no responses going back after the three-way handshake. So the old mantra kicked in:

It is an MTU problem.

Editing /etc/docker/daemon.json to accommodate that assumption:

{
  "mtu": 1400
}

and logging was back to normal.
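daemon.json is only read at startup, so the daemon needs a restart for the new MTU to take effect; afterwards the setting can be checked on the default bridge network (commands assume a systemd host):

```shell
# restart the daemon and confirm the MTU option on the default bridge
systemctl restart docker
docker network inspect bridge --format '{{index .Options "com.docker.network.driver.mtu"}}'
```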

I really hate fixes like this, but when pressed by other priorities they are sometimes a handy tool to have.

rkube: Rancher2 Kubernetes cluster on a single VM using RKE

There are many solutions for running a complete Kubernetes cluster in a VM on your machine: minikube, microk8s, or even kubeadm. So, embarking on what others have done before me, I wanted to do the same with RKE, mostly because I work with Rancher2 lately and want to experiment on VirtualBox without remorse.

Enter rkube (the name directly inspired by minikube and rke). It does not do the many things that minikube does, but it is closer to my work environments.

We use vagrant to boot an Ubuntu Bionic box; it creates a machine with 4G RAM and 2 CPUs. We provision the machine using ansible_local and install docker from the Ubuntu archives (version 17 for Bionic; if you need a newer version, check the docker documentation and modify ansible.yml accordingly).

Once the machine boots up and is provisioned, it is ready for use. You will find the kubectl configuration file, named kube_cluster_config.yml, in the cloned repository directory. You can now run a simple echo server with:

kubectl --kubeconfig kube_cluster_config.yml apply -f echo.yml

Check that the cluster is deployed with:

kubectl --kubeconfig kube_cluster_config.yml get pod
kubectl --kubeconfig kube_cluster_config.yml get deployment
kubectl --kubeconfig kube_cluster_config.yml get svc
kubectl --kubeconfig kube_cluster_config.yml get ingress

and you can visit the echo server at http://192.168.98.100/echo. Ignore the SSL error; we have not created a specific SSL certificate for the Ingress controller yet.

You can change the IP address used to connect to the RKE VM in the Vagrantfile.

Suppose you now want to upgrade the Kubernetes version: vagrant ssh into the VM and run rke config -l -s -a, then pick the new version that you want to install (look for the images named hyperkube). Now edit /vagrant/cluster.yml and run rke up --config /vagrant/cluster.yml.
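The edit in /vagrant/cluster.yml boils down to pinning the version field; the exact tag below is only an example of the kind of value that rke config -l -s -a lists:

```yaml
kubernetes_version: "v1.17.4-rancher1-1"
```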

Note that thanks to vagrant’s niceties, the /vagrant directory within the VM is the directory you cloned the repository into.

I developed the whole thing on Windows 10, so it should be able to run just about anywhere. I hope you like it, and help me make it a bit better if you find it useful.

You can browse rkube here