Writing a Jenkinsfile

I like Jenkins a lot. Even with a plethora of systems that have a vastly better web UI, many of them tailored for specific platforms, it is still my first choice. Many other people disagree, and they have a point, because you can easily shoot yourself in the foot at the worst of times. That is why I have an opinionated method for getting people who are new to Jenkins started: you work only with Multibranch pipelines (even when there is a single branch), and you write them as declarative pipelines.

Introduction

Multibranch pipelines, which are what we would like to use at work, are driven by Jenkinsfiles. A Jenkinsfile is written in a DSL based on the Groovy language. Groovy is based on (and resembles) Java and is therefore vast, as are the Jenkins declarative pipeline DSL and the multitude of plugins it supports. This guide aims to help you write your first Jenkinsfile when you have no prior experience. As such, it is opinionated. You are welcome to deviate from it once you gain more experience with the tooling.

So, use your editor to create a new file named Jenkinsfile at the top of your repository, and let’s start!

Define a pipeline

To define a pipeline, simply type:

pipeline {
}

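That’s it! You have defined a pipeline! (Jenkins will not accept it as is, since a declarative pipeline also needs the agent and stages sections that we add below.)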

Lock the pipeline

Assuming we do not want two builds of the same project running concurrently, we acquire a lock for the duration of the build (the lock option comes from the Lockable Resources plugin):

pipeline {
  options {
    lock('poc-pipeline')
  }
}

Now if two different people start the same build, the builds will be executed sequentially.

But where will the build run?

Builds run on Jenkins agents. Agents are labeled, and we select them by their labels. In the general case we run Docker-based builds, so we need to select an agent that has Docker installed and also name a container image to be launched for the build to run in:

pipeline {
  options {
    lock('poc-pipeline')
  }
  
  agent {
    docker {
      label 'docker'
      image 'busybox'
    }
  }
}

So with the above we select a Jenkins node labeled docker, which will launch a Docker container inside which all our intended operations will run.

Build stages

Builds in Jenkins happen in stages. As such, we define a stages section in the Jenkinsfile:

pipeline {
  options {
    lock('poc-pipeline')
  }
  
  agent {
    docker {
      label 'docker'
      image 'busybox'
    }
  }
  
  stages {
    stage("build") {
    }
    stage("test") {
    }
    stage("deploy") {
    }
  }
}

Above we have defined three stages, build, test and deploy, which will run on any of the Jenkins agents labeled docker and not necessarily on the same one. Because this can lead to confusion, we require, for now, that the whole build runs on the same node. One way to do this is to have “substages” within a stage in Jenkins. The syntax becomes a bit convoluted when you are not very experienced, but let’s see how it transforms:

pipeline {
  options {
    lock('poc-pipeline')
  }
  
  agent {
    docker {
      label 'docker'
      image 'busybox'
    }
  }
  
  stages {
    stage("acquire node") {
      stages {
        stage("build") {
        }
      
        stage("test") {
        }
    
        stage("deploy") {
        }
      }
    } 
  }
}

The stage acquire node is assigned to a node labeled docker and the “sub-stages” build, test and deploy will run within this node.

Each stage has steps

Each stage in a pipeline executes a series of steps:

pipeline {
  options {
    lock('poc-pipeline')
  }
  
  agent {
    docker {
      label 'docker'
      image 'busybox'
    }
  }
  
  stages {
    stage("acquire node") {
      stages {
        stage("build") {
          steps {
          }
        }
      
        stage("test") {
          steps {
          }
        }
    
        stage("deploy") {
          steps {
          }
        }
      }
    } 
  }
}

Time to say Hello, World!

It is now time to do something meaningful with the Jenkinsfile, like having it tell us Hello, World!. We will show you two ways to do this: one via a script section, which allows us to run some Groovy code (in case we need to check some logic or something), and one using direct sh commands:

pipeline {
  options {
    lock('poc-pipeline')
  }
  
  agent {
    docker {
      label 'docker'
      image 'busybox'
    }
  }
  
  stages {
    stage("acquire node") {
      stages {
        stage("build") {
          steps {
            script {
              // This is Groovy code here
              println "This is the build stage executing"
            }
          }
        }
      
        stage("test") {
          steps {
            sh """
            echo This is the test stage executing
            """
          }
        }
    
        stage("deploy") {
          steps {
            sh """
            echo This is the deploy stage executing
            """
            script {
              println "Hello, World!"
            }
          }
        }
      }
    } 
  }
}

Congratulations! You have now created a complete Jenkinsfile.

Epilogue

Where do we go from here? You are set for your Jenkins journey. By using the above boilerplate and understanding how it is put together, you can now specify jobs, have them described in code, and run them. Most likely you will next need to read about credentials, in order to perform operations on services that require authentication.

I understand there is a lot of curly-brace hell, which can be abstracted by extending the pipeline DSL (I am, very slowly, experimenting with Pkl to see how to best achieve this, but here is a book for Groovy DSLs if you like).

“Works on my computer” was and still is the wrong attitude

I have waited some years before posting this. I started writing this document as a means of coping with my frustration back then. It is now promoted from my private to my public journal.

I get it. It is all too common and frustrating to try something on your machine, be happy with it, and then, when you push your changes and the CI/CD takes over, see the build fail.

@here X is working fine on my machine, but failing on Jenkins
typical Slack message everywhere

You have now pinged hundreds of people, across multiple timezones. Only a tiny fraction of them are in a position to support you. By any chance, have you scrolled up a bit before posting? Assuming it was infrastructure’s fault, are you really the first one facing it?

Say, for the sake of argument, that you find nothing relevant in the last five Slack messages and you go ahead and ping everybody. You have now provided zero useful information. And the person you implicitly demand a fix from is not your part-time psychologist, there to take it with a smile. If anything, they want you to have green builds as much as you do (if not more, since they are dealing with hundreds of running builds), and they wish you green builds for your birthday.

Your laptop is not part of production. When your code runs OK on it, you do not ship it with a courier to a data center. So, in a way, whether it runs on your computer or not does not matter, as you are not developing for it to run on your laptop. Your laptop is not production. You’re developing for something else, and supposedly this is what your CI/CD is trying to show you. Hence when it fails, try to think why. You know your tooling better than anyone else. Your language of choice, its libraries and whatnot failed. You reach out for help to a person who most likely has zero experience in your tooling and certainly knows even less about the application you write with it. Think about it: if they knew all that, they’d be a member of your team already!

“But this is a blocker and we cannot release.” Well, your P1 is a P1 for your world and I sympathise. But the whole constellation of systems and builds in your organization does not revolve around it. If it were a P1 for everyone, it would be known by every means of communication available. Your P1 is my P5, just like sometimes my P1 was your P7.

Would you ever complain if it ran on the CI/CD, but not on your computer? No. One more reason why “but it runs on my computer” is irrelevant and conveys no useful information in an already stressful situation.

Canceling all Jenkins jobs in queue

This is a continuation of a previous post where I showed how to disable all configured jobs in a Jenkins server (for example, when launching a copy for test purposes). Along the same lines, it may be the case that you have placed your Jenkins controller in quiet mode to have some peace of mind while examining what goes on with your queue, or you simply want to clean up the queue and have the system start with no jobs submitted. Whatever the reason, if you need to erase all of your Jenkins queue, python-jenkins and a few lines come to your assistance:

import jenkins

# USERNAME and PASSWORD are placeholders: set them to your Jenkins
# credentials before running.
server = jenkins.Jenkins('http://127.0.0.1:8080/',
        timeout=3600,
        username=USERNAME,
        password=PASSWORD)

# Cancel every item currently waiting in the build queue.
for item in server.get_queue_info():
    print(item['id'])
    server.cancel_queue(item['id'])
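If you would rather know which queue items were actually cancelled, the loop can be wrapped in a small helper. What follows is only a sketch: cancel_all_queued and FakeServer are names made up for illustration, and FakeServer merely imitates the two python-jenkins calls used above (get_queue_info and cancel_queue) so that the helper can be tried without a live Jenkins.

```python
def cancel_all_queued(server):
    """Cancel every queued item; return (cancelled_ids, failed_ids)."""
    cancelled, failed = [], []
    for item in server.get_queue_info():
        try:
            server.cancel_queue(item["id"])
            cancelled.append(item["id"])
        except Exception:
            # Keep going: one stubborn item should not stop the cleanup.
            failed.append(item["id"])
    return cancelled, failed


class FakeServer:
    """Hypothetical stand-in for jenkins.Jenkins, for a dry run only."""

    def __init__(self, ids):
        self._ids = list(ids)

    def get_queue_info(self):
        return [{"id": i} for i in self._ids]

    def cancel_queue(self, item_id):
        if item_id == 13:  # simulate one item that cannot be cancelled
            raise RuntimeError("simulated failure")


cancelled, failed = cancel_all_queued(FakeServer([11, 12, 13]))
print("cancelled:", cancelled, "failed:", failed)
```

Against a real server you would pass the jenkins.Jenkins object created above instead of FakeServer.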

RUN --mount=type=ssh is not always easy

Let’s take a very barebones Jenkinsfile and use it to build a docker image that clones something from GitHub (and possibly does other stuff next):

pipeline {
  agent any

  environment {
    DOCKER_BUILDKIT=1
  }

  stages {
    stage('200ok') {
      steps {
        sshagent(["readonly-ssh-key-here"]) {
          script {
            sh 'docker build --ssh default -t adamo/200ok .'
          }
        }
      }
    }
  }
}

We are using the SSH Agent Plugin in order to allow a clone that happens in the Dockerfile:

# syntax=docker/dockerfile:experimental
FROM bitnami/git
RUN mkdir /root/.ssh && ssh-keyscan github.com >> /root/.ssh/known_hosts
RUN --mount=type=ssh git clone git@github.com:a-yiorgos/200ok.git

This builds fine. But what if you need this to be some "rootless" container?

# syntax=docker/dockerfile:experimental
FROM bitnami/git
USER bitnami
WORKDIR /home/bitnami
RUN mkdir /home/bitnami/.ssh && ssh-keyscan github.com >> /home/bitnami/.ssh/known_hosts
RUN --mount=type=ssh git clone git@github.com:a-yiorgos/200ok.git

This will fail with something like:

#14 [7/7] RUN --mount=type=ssh git clone git@github.com:a-yiorgos/200ok.git
#14       digest: sha256:fb15ac6ca5703d056c7f9bf7dd61bf7ff70b32dea87acbb011e91152b4c78ad4
#14         name: "[7/7] RUN --mount=type=ssh git clone git@github.com:a-yiorgos/200ok.git"
#14      started: 2021-12-17 12:00:22.859388318 +0000 UTC
#14 0.572 fatal: destination path '200ok' already exists and is not an empty directory.
#14    completed: 2021-12-17 12:00:23.508950696 +0000 UTC
#14     duration: 649.562378ms
#14        error: "executor failed running [/bin/sh -c git clone git@github.com:a-yiorgos/200ok.git]: exit code: 128"

rpc error: code = Unknown desc = executor failed running [/bin/sh -c git clone git@github.com:a-yiorgos/200ok.git]: exit code: 128

Why is that? Is SSH agent forwarding not working? Well, kind of. Let’s add a couple of commands to the Dockerfile to see what might be the issue:

# syntax=docker/dockerfile:experimental
FROM bitnami/git
USER bitnami
WORKDIR /home/bitnami
RUN mkdir /home/bitnami/.ssh && ssh-keyscan github.com >> /home/bitnami/.ssh/known_hosts
RUN --mount=type=ssh env
RUN --mount=type=ssh ls -l ${SSH_AUTH_SOCK}
RUN --mount=type=ssh git clone git@github.com:a-yiorgos/200ok.git

Then the build output gives us:

:
#13 [6/7] RUN --mount=type=ssh ls -l ${SSH_AUTH_SOCK}
#13       digest: sha256:ce8fcd7187eb813c16d84c13f8d318d21ac90945415b647aef9c753d0112a8a7
#13         name: "[6/7] RUN --mount=type=ssh ls -l ${SSH_AUTH_SOCK}"
#13      started: 2021-12-17 12:00:22.460172872 +0000 UTC
#13 0.320 srw------- 1 root root 0 Dec 17 12:00 /run/buildkit/ssh_agent.0
#13    completed: 2021-12-17 12:00:22.856049431 +0000 UTC
#13     duration: 395.876559ms
:

and subsequently fails to clone. This happens because the socket file /run/buildkit/ssh_agent.0 for the SSH agent forwarding is not accessible by user bitnami and thus no ssh identity is available to it.
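To make the permission problem concrete, here is a small Python sketch of the classic Unix permission check, applied to the mode bits from the listing above (srw------- owned by root:root). It is an illustration only, not what Docker or BuildKit run internally. can_connect is a hypothetical helper, and uid 1001 for the bitnami user is an assumption. Connecting to a Unix socket needs write permission on it, which is what the helper tests.

```python
import stat


def can_connect(mode, owner_uid, owner_gid, uid, gids=()):
    """Classic Unix check: may `uid` write to (connect to) this socket?"""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        needed = stat.S_IWUSR
    elif owner_gid in gids:
        needed = stat.S_IWGRP
    else:
        needed = stat.S_IWOTH
    return bool(mode & needed)


# Mode of /run/buildkit/ssh_agent.0 as printed by ls -l: a socket with 0600.
SOCKET_MODE = stat.S_IFSOCK | 0o600

print(stat.filemode(SOCKET_MODE))
print("root:", can_connect(SOCKET_MODE, 0, 0, uid=0))
# uid/gid 1001 for the bitnami user is an assumption made for this example.
print("bitnami:", can_connect(SOCKET_MODE, 0, 0, uid=1001, gids=(1001,)))
```

With owner root and mode 0600, only root passes the check, which matches the failure seen in the build.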

I do not know whether it is possible to make use of RUN --mount=type=ssh in combination with USER where the user is not root. Please leave a comment if you know whether/how this can be accomplished.

So which Jenkins system am I running on?

It is often the case that you run a staging / test Jenkins server whose jobs are configured identically to the production one. In such cases you want your pipeline to be able to tell which system it is running on.

One way to do so is by checking the value of the BUILD_URL environment variable. However, this is not very helpful when you’re running the master inside a container, in which case you get back the container hostname in response.

There are also a number of solutions on StackOverflow you can look at, but you may opt to utilise the fact that you can add labels to each master and then query the master for the labels it carries. Our solution depends on the httpRequest plugin in order to query the master.

import groovy.json.JsonSlurper

def get_jenkins_master_labels() {
    def response = httpRequest httpMode: 'GET', url: "http://127.0.0.1:8080/computer/(master)/api/json"
    def j = new JsonSlurper().parseText(response.content)
    return j.assignedLabels.name
}

def MASTER_NODE = get_jenkins_master_labels()

pipeline {
    agent {
        label 'docker'
    }
    stages {
        stage("test") {
            steps {
                println MASTER_NODE
            }
        }
    }
}

The trick here is that the part outside of the pipeline { ... } block runs directly on the master, so we can go ahead and call http://127.0.0.1:8080/computer/(master)/api/json to figure things out. get_jenkins_master_labels() queries the master and returns a list of all the labels assigned to it (or a single string, master, if no other labels are assigned to it). By checking the values in the list, one can infer which Jenkins environment they are running in and continue from there.
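The label check itself is easy to try offline. Below is a minimal Python sketch of the same logic, not part of the pipeline: SAMPLE_RESPONSE is a hypothetical payload trimmed down to the assignedLabels field that the Groovy helper reads, and the label names production and staging, as well as the helper names, are assumptions made for illustration.

```python
import json

# Hypothetical payload, trimmed to the one field the Groovy helper reads.
SAMPLE_RESPONSE = '{"assignedLabels": [{"name": "master"}, {"name": "staging"}]}'


def labels_from_payload(payload_text):
    """Return the label names, like j.assignedLabels.name in the Groovy code."""
    data = json.loads(payload_text)
    return [label["name"] for label in data.get("assignedLabels", [])]


def environment_for(labels, known=("production", "staging")):
    """Return the first known environment label present, else 'unknown'."""
    for name in known:
        if name in labels:
            return name
    return "unknown"


labels = labels_from_payload(SAMPLE_RESPONSE)
print(labels, environment_for(labels))
```

In the Jenkinsfile the same decision would be a check on the contents of MASTER_NODE.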

What does the file $JENKINS_HOME/.owner do?

I have four books on Jenkins and have read numerous posts on the Net that discuss weird Jenkins details and internals (more than I ever wished to know about), but none that explains what the file $JENKINS_HOME/.owner does (even though they include listings like this). I found out about it recently because I was greeted by the message:

Jenkins detected that you appear to be running more than one instance of Jenkins
that share the same home directory. This greatly confuses Jenkins and you will
likely experience strange behaviours, so please correct the situation.

This Jenkins:  1232342241 contextPath="" at 2288@ip-172.31.0.10
Other Jenkins: 863352860 contextPath="" at 1994@ip-172.31.0.14

[Ignore this problem and keep using Jenkins anyway]

Indeed it appears that Jenkins, after initialisation, does run a test to check whether another process already runs from the same directory. When the check is run, it creates the file $JENKINS_HOME/.owner. The .owner part of the name is hardcoded.

Even more interesting is the fact that, in order to avoid having the two processes write to .owner at the same time, Jenkins randomises when each process is going to write to the file, so even if both processes start at the same time, chances that their writes coincide are slim.

What does it write in this file, you ask? There you go. When was this feature added? 2008/01/31. The mechanism is documented in the comments of the code:

The mechanism is simple. This class occasionally updates a known file inside the hudson home directory, and whenever it does so, it monitors the timestamp of the file to make sure no one else is updating this file. In this way, while we cannot detect the problem right away, within a reasonable time frame we can detect the collision.
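The idea in that comment can be sketched in a few lines of Python. This is not Jenkins’ code: OwnerFileMonitor and the instance names are hypothetical, the randomised write timing is left out, and the comparison of the file content is an addition of this sketch so that the demo does not depend on the timestamp granularity of the filesystem.

```python
import os
import tempfile


class OwnerFileMonitor:
    """Sketch of a timestamp-based double-launch check (not Jenkins' code)."""

    def __init__(self, home, identity):
        self.path = os.path.join(home, ".owner")
        self.identity = identity
        self.last_mtime_ns = None  # mtime observed right after our last write

    def check_and_touch(self):
        """Return True if someone else appears to have written the file."""
        collision = False
        if self.last_mtime_ns is not None and os.path.exists(self.path):
            with open(self.path) as f:
                content = f.read()
            mtime_ns = os.stat(self.path).st_mtime_ns
            # Timestamp check as in the quoted comment; the content
            # comparison is an addition of this sketch.
            collision = (mtime_ns != self.last_mtime_ns
                         or content != self.identity)
        with open(self.path, "w") as f:
            f.write(self.identity)
        self.last_mtime_ns = os.stat(self.path).st_mtime_ns
        return collision


with tempfile.TemporaryDirectory() as home:
    first = OwnerFileMonitor(home, "instance-A")
    second = OwnerFileMonitor(home, "instance-B")
    results = [
        first.check_and_touch(),   # first write, nothing to compare yet
        first.check_and_touch(),   # still the only writer
        second.check_and_touch(),  # a second process starts writing
        first.check_and_touch(),   # the first process notices the change
    ]

print(results)
```

Note that the second instance does not notice anything on its first write, which is the "cannot detect the problem right away" part of the comment.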

You may want to keep that in mind, especially in cases when you’re greeted by the above message but know for a fact that a second process is not running. Perhaps the previous process ended abruptly and you did not notice. Or indeed a second process is messing with your CI.

Mass disabling all Jenkins jobs

There are times when you need to disable all jobs on a Jenkins server, especially when you’ve made a backup copy for testing or other purposes. You do not want jobs to start executing from that second server before you’re ready. Sure, you can start Jenkins in quiet mode, but at some point you have to exit it and scheduled jobs will start running. What can you do?

Well, there are plenty of pages that show Groovy code that allows you to stop jobs, and there are even suggestions to locate and change every config.xml file by running something like sed -i 's/disabled>false/disabled>true/' config.xml on each of them. Or, even better, to use the Configuration Slicing plugin. Firstly, you may feel uneasy about mass-changing all config.xml files from a process external to Jenkins. Secondly, the Configuration Slicing plugin does not give you a "select all" option, nor does it handle Multibranch Pipeline jobs. Thirdly, the Groovy scripts I’ve found shared by others online also do not handle Pipelines and Multibranch Pipelines. If you rely on Multibranch Pipelines, you’re kind of stuck then. Or you have to go and manually disable each one of them.

Thankfully there’s a solution using Jenkins’s REST API and python-jenkins. An example follows:

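import jenkins

# USERNAME and PASSWORD are placeholders: set them to your Jenkins
# credentials before running.
server = jenkins.Jenkins('http://127.0.0.1:8080/',
        timeout=3600,
        username=USERNAME,
        password=PASSWORD)

# Try to disable every job; report each one that could not be disabled.
for job in server.get_all_jobs():
    try:
        server.disable_job(job['fullname'])
    except Exception as e:
        print(f"could not disable {job['fullname']}: {e}")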

I hope it helps you out in maintaining your Jenkins.