“Management Science Fiction”

In this episode of ArrayCast, Turing Award winner Ken Iverson talks about Ian P. Sharp (founder of I.P. Sharp Associates) and shares this:

He was one of the early people in operations research, which came to be called management science. So he knows what this stuff is, but he also likes to speak of management science fiction, which I think reflects the correct thing that those techniques were very much overblown and oversold, at least for a period.

This of course reminded me of Gene Woolsey (again someone well known in operations research), who at the beginning of his book writes:

5. Does it work?
6. If yes, is there a measurable, verifiable reduction in cost over what was done before, or a measurable, verifiable increase in readiness?
7. If yes, show it to me NOW.

If you think I am trying to take a dig at some current trend of overblown and oversold techniques, using lessons and parallels from the past, you are correct :) In private conversations I have expressed the very same opinion about what is happening with GenAI today as I.P. Sharp had years ago about Management Science.

That made me smile for the rest of the weekend.

A peculiarity with Kubernetes immutable secrets

This post again comes from a question that popped up in #kubernetes-users. A user was updating an immutable secret, yet the value was not propagating. But what is an immutable secret in the first place?

Once a Secret or ConfigMap is marked as immutable, it is not possible to revert this change nor to mutate the contents of the data field. You can only delete and recreate the Secret. Existing Pods maintain a mount point to the deleted Secret – it is recommended to recreate these pods.

So the workflow is: Delete old immutable secret, create a new one, restart the pods that use it.

Let’s create an immutable secret from the command line:

% kubectl create secret generic version -o json --dry-run=client --from-literal=version=1  | jq '. += {"immutable": true}' | kubectl apply -f -
secret/version created

If there’s a way to create an immutable secret from the command line without using jq please tell me.
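One jq-free alternative I can think of is to pipe a small manifest straight into kubectl with a heredoc (the secret name and key here match the example above; this is a sketch, not the only way):

```shell
# Create the same immutable secret without jq, via a heredoc manifest
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: version
immutable: true
stringData:
  version: "1"
EOF
```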

Now let’s create a deployment that uses it as an environment variable:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        env:
        - name: VERSION
          valueFrom:
            secretKeyRef:
              name: version
              key: version

Now let’s check the value for $VERSION:

% kubectl exec -it nginx-68bbbd9d9f-rwtfm -- env | grep ^VERSION
VERSION=1

The time has come for us to update the secret with a new value. Since it is immutable, we delete and recreate it:

% kubectl delete secrets version
secret "version" deleted
% kubectl create secret generic version -o json --dry-run=client --from-literal=version=2  | jq '. += {"immutable": true}' | kubectl apply -f -
secret/version created

We restart the pod and check the value of $VERSION again:

% kubectl delete pod nginx-68bbbd9d9f-rwtfm
pod "nginx-68bbbd9d9f-rwtfm" deleted
% kubectl exec -it nginx-68bbbd9d9f-x4c74 -- env | grep ^VERSION
VERSION=1

What happened here? Why does the pod still have the old value? It seems that there is some caching at play and the new immutable secret is not passed to the pod. But let’s try something different now:

% kubectl delete secrets version
secret "version" deleted
% kubectl create secret generic version -o json --dry-run=client --from-literal=version=3  | jq '. += {"immutable": true}' | kubectl apply -f -
secret/version created
% kubectl scale deployment nginx --replicas 0
deployment.apps/nginx scaled
% kubectl scale deployment nginx --replicas 1
deployment.apps/nginx scaled
% kubectl exec -it nginx-68bbbd9d9f-zkj8h -- env | grep ^VERSION
VERSION=3

From what I understand, scaling the deployment down and back up gives the kubelet enough time to release the old cached value and serve the new, proper one.

All of the above were tested with Docker Desktop Kubernetes. I ran similar tests on a multinode microk8s cluster: when restarting the pods, the environment variable was updated properly, but if you used a volume mount for the secret instead, it was not, and you needed to scale down to zero first.
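A pattern that sidesteps the caching question entirely is to never reuse the secret name: put a revision in the name and point the workload at the new name, so the pod template changes and a normal rollout picks up the value. A sketch, reusing the nginx deployment from above (the name version-v2 is my own convention):

```shell
# Create a new, uniquely named immutable secret instead of deleting
# and recreating under the same name
kubectl create secret generic version-v2 -o json --dry-run=client \
  --from-literal=version=2 | jq '. += {"immutable": true}' | kubectl apply -f -

# Point the container's secretKeyRef at the new name; the pod template
# changes, so the deployment rolls out fresh pods with the new value
kubectl patch deployment nginx --type=json -p '[{
  "op": "replace",
  "path": "/spec/template/spec/containers/0/env/0/valueFrom/secretKeyRef/name",
  "value": "version-v2"
}]'
```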

Can a pod belong to 2 workloads?

In the Kubernetes Slack an interesting question was posed:

Hi, can a pod belong to 2 workloads? For example, can a pod belong both the a workload and to the control plane workload?

My initial reaction was that, while a Pod can belong to two (or three, or more) Services, it cannot belong to two workloads (Deployments, for example). I put my theory to the test by initially creating a pod with some labels:

apiVersion: v1
kind: Pod
metadata:
  name: caddy
  labels:
    apache: ok
    nginx: ok
spec:
  containers:
  - name: caddy
    image: caddy
    ports:
    - name: http
      containerPort: 80
    - name: https
      containerPort: 443

Sure enough, the pod was created:

% kubectl get pod
NAME    READY   STATUS    RESTARTS   AGE
caddy   1/1     Running   0          2s

Next I created a ReplicaSet whose pods have a label that the above (caddy) pod also has:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      nginx: ok
  template:
    metadata:
      labels:
        app: nginx
        nginx: ok
    spec:
      containers:
      - name: nginx
        image: bitnami/nginx
        ports:
        - containerPort: 8080

Since the original pod and the ReplicaSet share a common label (nginx: ok), the pod is assimilated into the ReplicaSet, which therefore launches only one additional pod:

% kubectl get pod
NAME          READY   STATUS    RESTARTS   AGE
caddy         1/1     Running   0          2m52s
nginx-lmmbk   1/1     Running   0          3s

We can now ask Kubernetes to create an identical ReplicaSet that launches Apache instead of nginx and matches on the apache: ok label:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: apache
  labels:
    app: apache
spec:
  replicas: 2
  selector:
    matchLabels:
      apache: ok
  template:
    metadata:
      labels:
        app: apache
        apache: ok
    spec:
      containers:
      - name: apache
        image: bitnami/apache
        ports:
        - containerPort: 8080

If a pod could be shared among workloads, it should start only a single additional apache pod (since caddy already carries apache: ok). Does it?

% kubectl get pod
NAME           READY   STATUS    RESTARTS   AGE
apache-8fwdz   1/1     Running   0          4s
apache-9xwhd   1/1     Running   0          4s
caddy          1/1     Running   0          5m17s
nginx-lmmbk    1/1     Running   0          2m28s

As you can see, it starts two apache pods, and there are three pods carrying the apache: ok label:

% kubectl get pod -l apache=ok
NAME           READY   STATUS    RESTARTS   AGE
apache-8fwdz   1/1     Running   0          6m20s
apache-9xwhd   1/1     Running   0          6m20s
caddy          1/1     Running   0          11m

% kubectl get rs
NAME     DESIRED   CURRENT   READY   AGE
apache   2         2         2       6m21s
nginx    2         2         2       8m45s

So there you have it, a Pod cannot be shared among workloads.
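The mechanism enforcing this is the ownerReferences field: the first ReplicaSet whose selector matched the bare caddy pod adopted it and recorded itself as the pod’s controller, and a pod can have at most one controller, so the apache ReplicaSet ignores it and starts its own two replicas. You can inspect this yourself:

```shell
# Show which controller (if any) owns the caddy pod;
# for the setup above, this should print ReplicaSet/nginx
kubectl get pod caddy \
  -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}{"\n"}'
```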

Writing a Jenkinsfile

I like Jenkins a lot. Even with a plethora of systems that have a vastly better web UI, many of them tailored to specific platforms, it is still my first choice. It is not the first choice for many other people, and they are right, because Jenkins lets you easily shoot yourself in the foot at the worst of times. That is why, when people are new to Jenkins, I have an opinionated method to start them working with it: you work only with Multibranch Pipelines (even when you have a single branch), and you write them as declarative pipelines:

Introduction

Multibranch Pipelines, which are what we would like to make use of at work, are driven by Jenkinsfiles. The language of a Jenkinsfile is a DSL based on Groovy. Groovy is based on (and resembles) Java and thus is vast, as are the Jenkins declarative pipeline DSL and the multitude of plugins that are supported. This guide aims to help you write your first Jenkinsfile when you have no prior experience. As such, it is opinionated. You are welcome to deviate from it once you get more experience with the tooling.

So, with your editor, open a new file named Jenkinsfile at the top of your repository and let’s start!

Define a pipeline

To define a pipeline, simply type:

pipeline {
}

That’s it! You have defined a pipeline!

Lock the pipeline

Assuming we do not want two builds of the same project running concurrently, we acquire a temporary lock:

pipeline {
  options {
    lock('poc-pipeline')
  }
}

Now, if two different people start the same build, the builds will be executed sequentially.
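If you only care about serializing builds of the same job, a close alternative worth knowing is the built-in declarative option disableConcurrentBuilds(); it needs no resource name, though unlike lock it cannot serialize across different jobs sharing a resource. A sketch:

```groovy
pipeline {
  options {
    // queue new builds of this job until the running one finishes
    disableConcurrentBuilds()
  }
}
```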

But where will the build run?

Builds run on Jenkins agents. Agents are labeled, and we can select them by label. In the general case we run Docker-based builds, so we need to select an agent that has Docker installed and also provide a container image to be launched for the build to run in:

pipeline {
  options {
    lock('poc-pipeline')
  }
  
  agent {
    docker {
      label 'docker'
      image 'busybox'
    }
  }
}

So with the above we select a Jenkins node labeled docker, which will launch a Docker container inside which all our intended operations will run.

Build stages

Builds in Jenkins happen in stages, so we define a stages section in the Jenkinsfile:

pipeline {
  options {
    lock('poc-pipeline')
  }
  
  agent {
    docker {
      label 'docker'
      image 'busybox'
    }
  }
  
  stages {
    stage("build") {
    }
    stage("test") {
    }
    stage("deploy") {
    }
  }
}

Above we have defined three stages, build, test and deploy, which may run on any of the Jenkins agents labeled docker, and not necessarily on the same one. Because this can lead to confusion, we require, for now, that our entire build runs on the same node. One way to do this is to have “substages” within a stage. The syntax becomes a bit convoluted when you are not very experienced, but let’s see how it transforms:

pipeline {
  options {
    lock('poc-pipeline')
  }
  
  agent {
    docker {
      label 'docker'
      image 'busybox'
    }
  }
  
  stages {
    stage("acquire node") {
      stages {
        stage("build") {
        }
      
        stage("test") {
        }
    
        stage("deploy") {
        }
      }
    } 
  }
}

The “acquire node” stage is assigned to a node labeled docker, and the “sub-stages” build, test and deploy will run on this node.

Each stage has steps

Each stage in a pipeline executes a series of steps:

pipeline {
  options {
    lock('poc-pipeline')
  }
  
  agent {
    docker {
      label 'docker'
      image 'busybox'
    }
  }
  
  stages {
    stage("acquire node") {
      stages {
        stage("build") {
          steps {
          }
        }
      
        stage("test") {
          steps {
          }
        }
    
        stage("deploy") {
          steps {
          }
        }
      }
    } 
  }
}

Time to say Hello, World!

It is now time to make the Jenkinsfile do something meaningful, like have it tell us Hello, World!. We will show two ways to do this: one via a script section, which allows us to run some Groovy code (in case we need to check some logic), and one using direct sh commands:

pipeline {
  options {
    lock('poc-pipeline')
  }
  
  agent {
    docker {
      label 'docker'
      image 'busybox'
    }
  }
  
  stages {
    stage("acquire node") {
      stages {
        stage("build") {
          steps {
            script {
              // This is Groovy code here
              println "This is the build stage executing"
            }
          }
        }
      
        stage("test") {
          steps {
            sh """
            echo This is the test stage executing
            """
          }
        }
    
        stage("deploy") {
          steps {
            sh """
            echo This is the deploy stage executing
            """
            script {
              println "Hello, World!"
            }
          }
        }
      }
    } 
  }
}

Congratulations! You have now created a complete Jenkinsfile.

Epilogue

Where do we go from here? You are set for your Jenkins journey. By using the above boilerplate and understanding how it is put together, you can now specify jobs, describe them in code and run them. Most likely you will need to read about credentials in order to talk to services where authentication is needed.
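As a taste of what that looks like, here is a hedged sketch (the credential ID registry-creds and the registry hostname are made up; withCredentials is provided by the Credentials Binding plugin):

```groovy
stage("deploy") {
  steps {
    withCredentials([usernamePassword(credentialsId: 'registry-creds',
                                      usernameVariable: 'REG_USER',
                                      passwordVariable: 'REG_PASS')]) {
      // the variables exist only inside this block and are masked in the log
      sh 'echo "$REG_PASS" | docker login -u "$REG_USER" --password-stdin registry.example.com'
    }
  }
}
```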

I understand there is a lot of curly-brace hell, which can be abstracted away by extending the pipeline DSL (I am, very slowly, experimenting with Pkl to see how to best achieve this, but there is a book on Groovy DSLs if you like).

Happy SysAdmin Day

The CrowdStrike thing happened one Friday too soon :) Which got me thinking: we give third-party software a lot of permissions in kernel mode, when in fact its authors are most likely not involved in kernel development. And we ship updates to this software that get interpreted and execute actions in kernel space.

The only real difference from malware here is intent. Which reminded me of this old story, where the author of a piece of adware described how they used TinyScheme for their purposes.

Or the case where a friend figured out that a driver was crashing because it used an XML parser (not designed for kernel space) to parse five lines of XML.

Or when Prolog was used in the Windows NT kernel.

Random thoughts of the day.

Have a lovely weekend.

How I setup Rancher these days

Rancher is a very handy interface when you need to manage Kubernetes clusters. In the past I deployed Rancher on a single VM, running the container as per the getting-started instructions. You could go a long way with a single-machine setup.

However, if you observe how recent versions of the container start, a k3s cluster is launched within the container, which kind of makes it overkill to work this way. The Rancher documentation also includes multiple directions on how to run it in a Kubernetes cluster (k3s being their obvious preference) and how to do certificate management (which is something you need, since otherwise the Rancher agent deployed in your clusters won’t be able to communicate via WebSockets). Well, I am not a big fan of how the Rancher documentation describes the actions needed to launch it in a Kubernetes cluster, and more importantly I am annoyed at how SSL certificates are managed. You can go a long way using microk8s, and this is what I did in this case.

Assuming you have set up a microk8s cluster (single node, or three nodes for HA), we are almost ready to start. Rancher deploys its components in the cattle-system namespace, so we create this first with kubectl create ns cattle-system. We will use helm to install Rancher, and we want to provide some basic values to the installation, so we create a file named values.yaml with the following contents:

auditLog:
  level: 1
bootstrapPassword: A_PASSWORD_HERE
hostname: rancher.example.net
replicas: 1
ingress:
  enabled: false

With the above we instruct helm not to deal with the Ingress, since we will provide it later (we want to manage certificates either on our own or via cert-manager at the Ingress object). Thus we run helm -n cattle-system install rancher rancher-latest/rancher -f values.yaml --version 2.8.3 to install it.
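If helm does not know the rancher-latest repository yet, the install above will fail; add it first (the URL is the one the Rancher chart instructions point to):

```shell
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo update
```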

After some time passes (verified by something like kubectl -n cattle-system get pod), Rancher is installed and we now need to make it accessible from the “outside” world. Microk8s offers nginx-ingress as an add-on (microk8s enable ingress sets this up), or we can use a different ingress controller, for example haproxy, again via helm: helm -n ingress-haproxy install haproxy-ingress haproxy-ingress/haproxy-ingress -f ./values-haproxy.yaml --version 0.14.6. The contents of values-haproxy.yaml are:

controller:
  hostNetwork: true
  ingressClassResource:
    enabled: true

And now that we have the Ingress controller installed, we can also set up the Ingress itself:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rancher-haproxy
  namespace: cattle-system
  annotations:
    haproxy-ingress.github.io/ssl-redirect: "true"
spec:
  ingressClassName: "haproxy"
  tls:
  - hosts:
    - rancher.example.net
    secretName: example-net
  rules:
  - host: rancher.example.net
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: rancher
            port:
              number: 80

And you are done. You can of course set up a cert-manager Issuer that will help you automate certificate management and issuing.
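For completeness, a minimal sketch of such an Issuer, assuming cert-manager is already installed (the email is a placeholder; the ingressClassName matches the haproxy setup above):

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt
  namespace: cattle-system
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.net
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
    - http01:
        ingress:
          ingressClassName: haproxy
```

With a cert-manager.io/issuer: letsencrypt annotation on the Ingress, cert-manager then keeps the example-net TLS secret populated and renewed.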

Happy ranchering.

PS: Assuming a new version of Rancher is out, you can upgrade with something like helm -n cattle-system upgrade rancher rancher-latest/rancher -f values-rancher.yaml --version 2.8.4.

A quick Dockerfile for traduora

If you want to use or test traduora, at the time of writing this blog post the latest provided Docker image is over a year old and the supplied Dockerfile does not really work. So, in order to be able to test, you can start with the following:

FROM node:18
WORKDIR /usr/app
RUN git clone https://github.com/ever-co/ever-traduora
WORKDIR /usr/app/ever-traduora
# The build needs the OpenSSL legacy provider under Node 18,
# both at build time and at run time
ENV NODE_OPTIONS='--openssl-legacy-provider'
ARG NODE_OPTIONS='--openssl-legacy-provider'
RUN bin/build.sh
CMD /usr/app/ever-traduora/bin/start.sh

And if you happen to want to connect to an AWS RDS instance, you may need to set the environment variable PGSSLMODE=no-verify. Traduora uses TypeORM and, the way it is written, assumes that the database runs as a sidecar container, so it cannot handle the default SSL requirement of AWS RDS without messing with PGSSLMODE.
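Putting it together, roughly (the image tag is my own, and the port follows traduora's default; database connection variables are documented upstream and omitted here):

```shell
# Build the image from the Dockerfile above
docker build -t traduora:local .
# PGSSLMODE is read by the underlying postgres driver, not by traduora itself
docker run --rm -p 8080:8080 -e PGSSLMODE=no-verify traduora:local
```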

Network Request Failed when configuring OpenLDAP authentication in Rancher

It may be the case that you have installed Rancher in a cluster via helm, with something like:

helm install rancher rancher-latest/rancher \
--namespace=cattle-system \
--set hostname=rancher.storfund.net \
--set replicas=1 \
--set bootstrapPassword=PASSWORD_HERE \
--set auditLog.level=1 \
--version 2.8.3

If you try to configure OpenLDAP authentication (and maybe other directories), you will be greeted with the not-at-all-helpful message Network Request Failed, while in the logs you will see that your OpenLDAP server was never contacted. What gives?

Well, the above helm command installs Rancher with a self-signed certificate, and you have to open the developer tools in the browser to see that a wss:// call failed because of that certificate. The solution, of course, is to use a certificate that your browser considers valid. First we ask helm to give us the configuration values with helm -n cattle-system get values rancher -o yaml > values.yaml, and then we augment values.yaml with:

ingress:
  tls:
    source: secret
privateCA: true

It does not have to be a “really” private CA; I did the above with a certificate issued by Let’s Encrypt. We apply the change with helm -n cattle-system upgrade rancher rancher-latest/rancher -f values.yaml --version 2.8.3. And now we are ready to add our own working certificate with:

kubectl -n cattle-system delete secret tls-rancher-ingress
kubectl -n cattle-system create secret tls tls-rancher-ingress --key ./key.pem --cert ./cert.pem

Of course, if you are using cert-manager, there are other ways to do this.

How many env: blocks per container?

While creating a Deployment earlier today, I faced a weird situation where a specific environment variable, held in a Secret, was not being set. I tried deleting and recreating the secret, with no success. Mind you, this was a long YAML with volumes, volumeMounts, ConfigMaps as environment variables, lots of lines. In the end the issue was pretty simple, and I missed it because kubectl silently accepted the submitted YAML: I had two(!) env: blocks defined for the same container and somehow missed that. It turns out that when you do so, only the last one gets accepted, and whatever is defined in the previous ones is not taken into account. To show this with an example:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: box
  name: box
spec:
  replicas: 1
  selector:
    matchLabels:
      app: box
  template:
    metadata:
      labels:
        app: box
    spec:
      containers:
      - image: busybox
        name: busybox
        env:
        - name: FIRST_ENV
          value: "True"
        - name: ANOTHER_FIRST
          value: "True"
        command:
        - sleep
        - infinity
        env:
        - name: SECOND_ENV
          value: "True"

In the above example, when the Pod starts the container, only SECOND_ENV is set. FIRST_ENV and ANOTHER_FIRST are not.
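The duplicate-key behavior is not specific to Kubernetes: most YAML and JSON parsers silently keep the last occurrence of a repeated mapping key. A quick illustration using Python's json module (and note that, depending on your kubectl version, --validate=strict may catch duplicate fields instead of silently accepting them):

```shell
# Last duplicate key wins; the first "env" is silently dropped
python3 -c 'import json; print(json.loads("{\"env\": \"first\", \"env\": \"second\"}")["env"])'
# prints: second
```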

I do not know whether this is a well-known YAML fact or not, but it cost me some head-scratching and I hope it won’t cost you any.

“Works on my computer” was and still is the wrong attitude

I have waited some years before posting this. I started writing this document as a means of coping with my frustration back then. It is now promoted from my private to my public journal.

I get it. It is all too common and frustrating: you try something on your machine, you are happy with it, and when you push your changes and the CI/CD takes over, the build fails.

@here X is working fine on my machine, but failing on Jenkins

typical Slack message everywhere

You have now pinged hundreds of people, across multiple timezones. Only a tiny fraction of them are in a position to support you. By any chance, have you scrolled up a bit before posting? Assuming it was infrastructure’s fault, are you really the first one facing it?

For the sake of the argument, you don’t find anything relevant in the last five Slack messages, so you go ahead and ping everybody. You have now provided zero useful information. And the person you implicitly demand fix this is not your part-time psychologist, there to take it with a smile. If anything, they equally (if not more than you, since they are dealing with hundreds of running builds) want you to have green builds, and wish you green builds for your birthday.

Your laptop is not part of production. When your code runs OK on it, you do not ship the laptop with a courier to a data center. So, in a way, whether the code runs on your computer or not does not matter, as you are not developing for it to run on your laptop. You are developing for something else, and supposedly this is what your CI/CD is trying to show you. Hence, when it fails, try to think why. You know your tooling better than anyone else: your language of choice, its libraries and whatnot failed, and you reach out for help to a person who most likely has zero experience with your tooling and certainly knows even less about the application you write with it. Think about it: if they knew all that, they would be a member of your team already!

“But this is a blocker and we cannot release.” Well, your P1 is a P1 for your world, and I sympathise. But the whole constellation of systems and builds in your organization does not revolve around it. If it were a P1 for everyone, it would be known through every means of communication available. Your P1 is my P5, just like sometimes my P1 was your P7.

Would you ever complain if it ran on the CI/CD but not on your computer? No. One more reason why “but it runs on my computer” is irrelevant and conveys no useful information in an already stressful situation.