Apache Mesos and Jenkins - Elastic Build Slaves

Apache Mesos is a cluster management tool for building your own elastic clouds (see more at mesosphere.io). Here I will take you through setting up Jenkins to use Mesos as an elastic cloud of build slaves. Mesos is a fast-changing project and one to watch.

The Mesos project provides an excellent Jenkins plugin to do the heavy lifting for you. Mesos can take on much of the work of running a large pool of Jenkins build slaves - something that we at CloudBees have a lot of experience with, and can appreciate (and assist with).

 

 

Quick intro to Mesos

 

 

Say you have a bunch of hardware - virtual or otherwise - you can run Mesos across it to manage it as a cluster available for all sorts of tasks: applications, build jobs, indexing, and more. 
 
Mesos is a multi-master (masters get elected) and multi-slave system - meaning that you set up a few masters, and many slaves - and then you can ask the masters to provide you with resources - Mesos is then responsible for finding the right capacity for you. Some may think of it as a PaaS - for highly variable workloads.
 
(Image credit: mesosphere.io)
 
 
You can even go so far as to think of it as a toolkit for building a PaaS - which the Typesafe folk have written about here:
 
(Image credit: typesafe.com)
 
What is interesting here is that in a Mesos “cloud” you have lots of slots available - so why not use them for build slaves as well? The workloads Mesos appeals to are often highly elastic, and there are times when you have a lot of spare capacity (e.g. at some times you want to run many apps, or many indexers, or many builds - but hopefully not all at once). Mesos lets you manage this, and make the most of the hardware you do have.
 
 
 
You can read more about Mesos here.
 
Mesos uses the concept of “frameworks” for launching apps (a framework is like an app that can be launched on many slaves). Jenkins, via the Mesos plugin, registers a framework that can then be deployed as Mesos slaves “offer” to host the framework's jobs - giving your Jenkins master as many build slaves as the Mesos cloud is able to provide.
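To make the “offer” idea a little more concrete, here is a toy sketch in shell. It is not the plugin's actual logic: it only shows the basic decision a framework makes, which is to accept an offer when the offered CPU and memory cover what the task needs. The function name offer_fits and the numbers are made up for illustration.

```shell
#!/bin/sh
# Toy illustration only: the real matching happens inside the framework's scheduler.
# offer_fits OFFERED_CPUS OFFERED_MEM_MB NEEDED_CPUS NEEDED_MEM_MB
# Returns 0 (success) if the offer covers the request, 1 otherwise.
offer_fits() {
  awk -v oc="$1" -v om="$2" -v nc="$3" -v nm="$4" \
    'BEGIN { exit !((oc + 0 >= nc + 0) && (om + 0 >= nm + 0)) }'
}

# Example: an offer of 2.9 CPUs and 2029 MB against a task needing 1 CPU and 512 MB.
if offer_fits 2.9 2029 1 512; then
  echo "accept offer"
else
  echo "decline offer"
fi
```

An offer that is too small on either dimension is declined, and the resources go back to the master to be offered elsewhere.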
 
That is the theory. Let's see how it works.
 

 

Setting up Jenkins with Mesos

 
 
I built a simple cluster to try this out - you can too. 
 
The high level steps for getting going are: 
  • Install Mesos and Jenkins (I turned the finished product into an image so I could then launch many of them)
  • Launch at least one master and one slave
  • Install and configure a Jenkins with the Mesos plugin, connected to a master
  • Run a test build

1. Install Mesos and Jenkins on one server

This is currently the hardest bit - if you are already using Mesos, skip it. Mesos is not readily packaged, so you will probably need to build it from source.
 

Follow the getting started instructions - step by step.
(I used Ubuntu 13.10)
 
If you are building 0.18.0 - you will need to apply a patch to prevent it from trying (and failing) to unzip the slave.jar. Sorry - this is unpleasant - I am asking you to edit a .cpp file and recompile - I know. You could also get a distro from here - but it may not have the fix to make slave.jar work.
 
In your build directory there will be a file: src/.libs/libmesos.so - you will need this for when you run the Jenkins master. 
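Since Jenkins will need the full path to this file later, it can be worth checking that the build actually produced it. A minimal sketch, assuming the build directory layout described above; the function name find_libmesos and the MESOS_BUILD_DIR variable are hypothetical:

```shell
#!/bin/sh
# find_libmesos BUILD_DIR
# Prints the absolute path of libmesos.so under BUILD_DIR/src/.libs,
# or prints an error and returns 1 if the file is missing.
find_libmesos() {
  lib="$1/src/.libs/libmesos.so"
  if [ -f "$lib" ]; then
    # Resolve to an absolute path, since the plugin config wants the full path
    echo "$(cd "$(dirname "$lib")" && pwd)/libmesos.so"
  else
    echo "libmesos.so not found under $1/src/.libs" >&2
    return 1
  fi
}

# Usage (MESOS_BUILD_DIR is a placeholder for wherever you ran the build):
# find_libmesos "${MESOS_BUILD_DIR:-./build}"
```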
 
 
At this point you have everything set up, and you can snapshot/create an image (e.g. if you are on EC2) so you can launch it later, if you like.
 

2. Launch a Master and a Slave

On a server with Mesos installed, run:
./build/bin/mesos-master.sh --ip=MASTER_IP --port=8999
 
I use the specific private IP address I would like it to listen on. 
 
On another server (this will be the slave) - actually it can even be the same server as a master, if you like: 
./build/bin/mesos-slave.sh --master=MASTER_IP:8999
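If you launch these by hand on several servers, it is easy to mistype the IP or port on one of them. One way to keep them in sync is to generate both commands from the same two variables. This is only a sketch: MASTER_PORT, the helper names and the default IP are made up, and the script prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch: build the two launch commands from variables so the IP and port
# are set in one place. The defaults below are made-up example values.
MASTER_IP="${MASTER_IP:-10.0.0.10}"
MASTER_PORT="${MASTER_PORT:-8999}"

# Print (not run) the master command for the given IP and port
master_cmd() {
  echo "./build/bin/mesos-master.sh --ip=$1 --port=$2"
}

# Print (not run) the slave command pointing at that master
slave_cmd() {
  echo "./build/bin/mesos-slave.sh --master=$1:$2"
}

master_cmd "$MASTER_IP" "$MASTER_PORT"
slave_cmd "$MASTER_IP" "$MASTER_PORT"
```

Run the first printed line on the master server and the second on each slave.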
 
At this point - you have a mesos cluster running.
You can even look at the web interface for the mesos master (http://MASTER_IP:8999):
 
 
 
You can then see the Mesos slave attached, on the slave screen:
 
This is what will actually do the work for Mesos - read on for how to make it do the work of a build slave.

3. Set up Jenkins

 
As Mesos is a moving target, I recommend running the Mesos Jenkins plugin from the git repo - it will also have updated instructions.
 
Launch a server - or you can use your master/slave server above, and run:
 
  • apt-get install git maven
  • git clone https://github.com/jenkinsci/mesos-plugin.git && cd mesos-plugin
  • Modify the pom.xml to use the Mesos version that you compiled in the steps above (in my case it was 0.18.0) 
  • mvn hpi:run
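Before editing the pom.xml, it helps to see where the Mesos version is declared. A small sketch, assuming the plugin depends on the Mesos Java bindings under the artifactId "mesos" and that the version appears within a couple of lines of it (if the pom uses a property instead, the output will show the property name to look for). The function name show_mesos_dep is hypothetical.

```shell
#!/bin/sh
# show_mesos_dep POM_FILE
# Prints the line declaring the mesos artifact plus the two lines after it,
# with line numbers, so you can find the version to change.
show_mesos_dep() {
  grep -n -A 2 '<artifactId>mesos</artifactId>' "$1"
}

# Usage, from inside the cloned mesos-plugin directory:
# show_mesos_dep pom.xml
```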
 
At this point - Jenkins is running on port 8080 - browse to it with a web browser. 
 
Go to the /configure screen - “Add a new cloud” - pick “Mesos” 
 
This will give you the config screen for setting up Mesos slaves: 
 
 
 
 
 
I have highlighted the important bits. Firstly - put in the full path to the Mesos native library - this will be located in build/src/.libs/libmesos.so - where you built Mesos above. This is how the plugin connects to the Mesos cloud.
 
Secondly - put in the master IP:PORT - this must be the IP that the master is listening on.
 
Finally - note the Label String - this defaults to “mesos” and is how you tell build jobs to run on Mesos rather than elsewhere.
 

4. Set up and try out a build job

Now - set up a new job in Jenkins. Then on the configure screen, check the box that says “restrict where this build can run”
 
 
 
Put in “mesos” (what was in the label in plugin configuration). 
 
At this point you are good to go. If you check the Mesos console, you should see that Jenkins Scheduler is now set up as a framework - which means it is able to accept jobs:
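If you would rather check from a terminal than from the console, something like the following sketch may work. It assumes your Mesos version serves its state as JSON at /master/state.json on the master (check the documentation for your version), and it only looks for the framework name in the response. The function name has_jenkins_framework is made up.

```shell
#!/bin/sh
# has_jenkins_framework: reads the master's state JSON on stdin and succeeds
# if the text "Jenkins Scheduler" appears anywhere in it.
has_jenkins_framework() {
  grep -q 'Jenkins Scheduler'
}

# Usage, assuming the master from step 2 and the /master/state.json endpoint:
# curl -s "http://MASTER_IP:8999/master/state.json" | has_jenkins_framework && echo "registered"
```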
 
 
 
 
 
Finally, joy of joys, you can run the job - and it will run on Mesos. You will see an executor magically appear, pause for a little bit (while the slave.jar is set up etc), and then run the job. Behind the scenes, this asks the Mesos master to find a suitable slave (you can have multiple slave types set up), sets that slave up with the slave.jar connected to the Jenkins master, and runs the build on the Mesos slave (so whatever tools are available on the slave, or whatever your build installs, can be used as normal):
 
 
 
 
So there you go. Mesos is a fascinating project to watch which can give you, for example, elastic build slaves, that can work alongside other jobs and apps running on Mesos. 
 
 
You can read more about Mesos here and here.
 

Comments

Versions: mesos 0.21.0, jenkins 1.595, jenkins mesos plugin 0.5.0.

I followed your post, but Mesos keeps trying to register the framework and it never works. The Mesos frameworks view shows Jenkins Scheduler registering and then immediately unregistering. For each attempt to register the Jenkins Scheduler framework, the Mesos logs show:

I.. 10:44:03... 2849 mtr.cpp:1383] Received registration request for framework 'Jenkins Scheduler' at scheduler-b25cff90-7204-4a4f-9c6e-c5ee6c04a64b@127.0.1.1:46580
I.. 10:44:03... 2849 mtr.cpp:1447] Registering framework 20150105-100316-1275074476-5050-2830-2438 (Jenkins Scheduler) at scheduler-b25cff90-7204-4a4f-9c6e-c5ee6c04a64b@127.0.1.1:46580
I.. 10:44:03... 2847 hierarchical_allocator_process.hpp:329] Added framework 20150105-100316-1275074476-5050-2830-2438
I.. 10:44:03... 2849 mtr.cpp:789] Framework 20150105-100316-1275074476-5050-2830-2438 (Jenkins Scheduler) at scheduler-b25cff90-7204-4a4f-9c6e-c5ee6c04a64b@127.0.1.1:46580 disconnected
I.. 10:44:03... 2849 mtr.cpp:1752] Disconnecting framework 20150105-100316-1275074476-5050-2830-2438 (Jenkins Scheduler) at scheduler-b25cff90-7204-4a4f-9c6e-c5ee6c04a64b@127.0.1.1:46580
I.. 10:44:03... 2849 mtr.cpp:1768] Deactivating framework 20150105-100316-1275074476-5050-2830-2438 (Jenkins Scheduler) at scheduler-b25cff90-7204-4a4f-9c6e-c5ee6c04a64b@127.0.1.1:46580
I.. 10:44:03... 2849 mtr.cpp:811] Giving framework 20150105-100316-1275074476-5050-2830-2438 (Jenkins Scheduler) at scheduler-b25cff90-7204-4a4f-9c6e-c5ee6c04a64b@127.0.1.1:46580 0ns to failover
W0105 10:44:03... 2849 mtr.cpp:3726] Master returning resources offered to framework 20150105-100316-1275074476-5050-2830-2438 because the framework has terminated or is inactive
I.. 10:44:03... 2847 hierarchical_allocator_process.hpp:405] Deactivated framework 20150105-100316-1275074476-5050-2830-2438
I.. 10:44:03... 2849 mtr.cpp:3713] Framework failover timeout, removing framework 20150105-100316-1275074476-5050-2830-2438 (Jenkins Scheduler) at scheduler-b25cff90-7204-4a4f-9c6e-c5ee6c04a64b@127.0.1.1:46580
I.. 10:44:03... 2849 mtr.cpp:4271] Removing framework 20150105-100316-1275074476-5050-2830-2438 (Jenkins Scheduler) at scheduler-b25cff90-7204-4a4f-9c6e-c5ee6c04a64b@127.0.1.1:46580
I.. 10:44:03... 2847 hierarchical_allocator_process.hpp:563] Recovered cpus(*):2.9; mem(*):2029; disk(*):296627; ports(*):[31003-32000] (total allocatable: cpus(*):2.9; mem(*):2029; disk(*):296627; ports(*):[31003-32000]) on slave 20150105-090907-1275074476-5050-1164-S1 from framework 20150105-100316-1275074476-5050-2830-2438
I.. 10:44:03... 2847 hierarchical_allocator_process.hpp:563] Recovered cpus(*):7.5; mem(*):6612; disk(*):925636; ports(*):[31001-32000] (total allocatable: cpus(*):7.5; mem(*):6612; disk(*):925636; ports(*):[31001-32000]) on slave 20150105-090907-1275074476-5050-1164-S0 from framework 20150105-100316-1275074476-5050-2830-2438
I.. 10:44:03... 2847 hierarchical_allocator_process.hpp:360] Removed framework 20150105-100316-1275074476-5050-2830-2438
I.. 10:44:04... 2853 mtr.cpp:3843] Sending 1 offers to framework 20141104-152335-1275074476-5050-13789-0000 (marathon-0.7.6) at scheduler-4deff1f8-16b3-496a-b382-8597b1918336@172.23.0.76:32796
I.. 10:44:04... 2848 mtr.cpp:2344] Processing reply for offers: [ 20150105-100316-1275074476-5050-2830-O1539 ] on slave 20150105-090907-1275074476-5050-1164-S0 at slave(1)@172.23.0.85:5051 (m4.vc.datys.cu) for framework 20141104-152335-1275074476-5050-13789-0000 (marathon-0.7.6) at scheduler-4deff1f8-16b3-496a-b382-8597b1918336@172.23.0.76:32796
I.. 10:44:04... 2848 hierarchical_allocator_process.hpp:563] Recovered cpus(*):7.5; mem(*):6612; disk(*):925636; ports(*):[31001-32000] (total allocatable: cpus(*):7.5; mem(*):6612; disk(*):925636; ports(*):[31001-32000]) on slave 20150105-090907-1275074476-5050-1164-S0 from framework 20141104-152335-1275074476-5050-13789-0000

My setup differs from your example in one way: my Mesos cluster runs on a different server from Jenkins.

This could well be out of date - I wrote this against quite an old version of Mesos - so I am not sure what that error means. I would try running it with "mvn hpi:run" from the Mesos plugin source code and see if that is better.
