Data Engineer’s Lunch #15: Introduction to Jenkins

In Data Engineer’s Lunch #15: Introduction to Jenkins, we discussed Jenkins, the automation platform. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is embedded below in case you were not able to attend live. If you would like to attend a Data Engineer’s Lunch live, it is hosted every Monday at noon EST. Register here now!

Jenkins is an open-source automation tool, generally used for automated builds, tests, and deployments. That, together with its ability to trigger actions on various types of events, makes it a good CI/CD platform. It may also work well as a scheduling tool. This post covers some of the tool’s basic abilities and compares its potential as a scheduler against Airflow, a scheduling tool that we have discussed previously. That post can be found here.

Jenkins Overview

Jenkins runs on a number of platforms, including Windows, Mac, and Linux, as well as in containers like Docker and on orchestration platforms like Kubernetes. Its base functionality enables it to work as a complete CI/CD platform, handling both the integration and delivery workloads. It comes with a web UI for ease of use and configuration. Plugins extend Jenkins’s functionality and connect it to external tools; users can install, update, and delete them from the UI, and it is also possible to write custom plugins. Jenkins can also work as a distributed platform, running across multiple machines.

The main tool we will use for defining workloads in Jenkins is the pipeline. A Jenkins pipeline defines the environment, configuration, and commands needed to set up a complex task. Each pipeline has defined build conditions, and each time a condition is triggered the entire pipeline runs once. The environment and commands are defined in a Jenkinsfile, which needs to be packaged with any extra code or other resources the pipeline works with.

Jenkins Scheduling

Jenkins can start pipelines in response to a number of triggers. The default for CI/CD work is to run when the code base is updated, and Jenkins has a setting for this. To serve as a scheduling platform like Airflow, it needs other ways to trigger builds as well, and it has them. The first we will discuss is running builds on a schedule. Airflow can trigger DAG runs on a schedule defined in cron notation or with standard shorthands for common schedules. Jenkins has the same functionality: we can use cron notation as well as aliases such as @daily and @midnight. We can also enter several schedules in the schedule box, one per line, to define irregular schedules.
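
To make the idea of combining several schedule lines concrete, here is a minimal Python sketch of how a scheduler might decide whether a given minute is covered by any of several cron lines. It is an illustration only, not Jenkins’s implementation. It handles plain five-field cron expressions (asterisks, numbers, ranges, lists, and "/" steps) and leaves out Jenkins-specific features such as the H token and the @daily-style aliases. The function names and the example schedule are assumptions made for this sketch.

```python
from datetime import datetime

# Allowed value range for each of the five cron fields, in order:
# minute, hour, day of month, month, day of week (0 = Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field, low, high):
    """Expand one cron field such as '*/15', '1-5' or '0,30' into a set of ints."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step))
    return values


def matches(cron_line, moment):
    """Return True if the datetime falls on the given five-field cron line.

    Simplification for this sketch: all five fields must match.
    """
    fields = cron_line.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields: " + cron_line)
    allowed = [expand_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)]
    # Python counts Monday as 0; cron counts Sunday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    actual = [moment.minute, moment.hour, moment.day, moment.month, cron_weekday]
    return all(value in options for value, options in zip(actual, allowed))


def should_trigger(schedule_lines, moment):
    """A build is due if any one of the schedule lines matches."""
    return any(matches(line, moment) for line in schedule_lines)


# Hypothetical irregular schedule: 09:00 on weekdays, plus 18:30 on Fridays.
SCHEDULE = ["0 9 * * 1-5", "30 18 * * 5"]

if __name__ == "__main__":
    # 7 May 2021 was a Friday, so the second line applies.
    print(should_trigger(SCHEDULE, datetime(2021, 5, 7, 18, 30)))
```

Run as a script, this prints True, because the Friday 18:30 line matches. Neither line alone describes the whole schedule, which is the point of entering more than one.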

Inside the Jenkinsfile, a pipeline is split into stages, each of which runs a set of commands, called steps, in linear order. This means that a pipeline on its own runs a linear workflow. If we had no other tools to work with, this would make Jenkins inferior to Airflow, since Airflow runs DAGs (directed acyclic graphs) of tasks to work through complex workflows with dependencies. To do the same in Jenkins, we use the “build after other projects are built” build trigger. With it, we can split each linear section of a more complex workflow into its own pipeline and have the build triggers combine those pipelines into a DAG.
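
The following Python sketch shows how linear pipelines linked by upstream triggers add up to a DAG. It does not call Jenkins. It only models the dependency graph and works out an order in which the pipelines could run. The pipeline names, stage names, and helper functions are hypothetical, and the sketch assumes Python 3.9 or later for the standard-library graphlib module. It also assumes a pipeline with several upstream pipelines waits for all of them; how Jenkins itself handles that case should be checked against its documentation.

```python
from graphlib import TopologicalSorter

# Hypothetical pipelines, each one a linear list of stage names.
PIPELINES = {
    "extract": ["checkout", "pull-data"],
    "clean": ["validate", "deduplicate"],
    "enrich": ["join-reference-data"],
    "report": ["aggregate", "publish"],
}

# Upstream triggers: each pipeline maps to the pipelines that must finish first.
UPSTREAM = {
    "extract": [],
    "clean": ["extract"],
    "enrich": ["extract"],
    "report": ["clean", "enrich"],
}


def run_order(upstream):
    """Return the pipelines in an order that respects every upstream trigger."""
    # TopologicalSorter takes {node: predecessors} and raises CycleError on a cycle.
    return list(TopologicalSorter(upstream).static_order())


def run_all(pipelines, upstream):
    """Simulate the workflow and return a log of 'pipeline/stage' entries."""
    log = []
    for name in run_order(upstream):
        for stage in pipelines[name]:  # stages inside one pipeline stay linear
            log.append(name + "/" + stage)
    return log


if __name__ == "__main__":
    for entry in run_all(PIPELINES, UPSTREAM):
        print(entry)
```

In this example "extract" always runs first and "report" always runs last, while "clean" and "enrich" may run in either order because neither depends on the other. That branch-and-join shape is what a single linear pipeline cannot express by itself.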

Compared to Airflow, Jenkins can’t dynamically define DAGs or pipelines in code (unless that code modifies Jenkinsfiles). Building a DAG in Airflow only requires defining the dependencies in code, whereas in Jenkins the dependencies must be built into each pipeline’s build trigger. Jenkins also can’t trigger specific steps, while the Airflow API allows individual tasks to be run on their own.

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was to not only fill the gap of Planet Cassandra but to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!

