Health Check-up for Your Jenkins

Just like human beings, Jenkins benefits from periodic health check-ups. A check-up helps you establish baselines in case performance problems develop later, and sometimes it even lets you discover a problem and nip it in the bud before it becomes serious. In this post, I’m going to talk about a couple of very accessible tools that let you do the check-up yourself.

The first thing you want to check is memory usage. Tracking memory usage is important because it affects the UI performance of Jenkins, and if left unattended, memory pressure will eventually kill Jenkins.

To track memory usage, you can use jconsole. You need X on the server to do this, since it’s a graphical tool, but it’s the more accessible option for the first-time user.

You run it like: 

$ ssh -X jenkins@myserver jconsole 

and then choose the PID of the Jenkins process to get a heap summary.
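
If you’d rather not forward X over ssh, one alternative is to open a JMX port on the server and connect jconsole from your own workstation. This is just a sketch: the port number and config file locations below are assumptions you’ll need to adapt to however you launch Jenkins, and an unauthenticated JMX port should never be exposed on an untrusted network.

# Add these to the JVM options Jenkins starts with (for example JAVA_ARGS in
# /etc/default/jenkins or JENKINS_JAVA_OPTIONS in /etc/sysconfig/jenkins,
# depending on your package); 9010 is an arbitrary example port.
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false

Then run jconsole on your workstation and point it at myserver:9010.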

Pay particular attention to the “PS Old Gen” space, and compare its “used” field against the “max” field to see how much of it you have used up. The old generation heap is where long-lived objects go, and it is also the space Jenkins uses for caching build records, page templates, and so on, so you don’t want to see it stay above the 90% line for long. Keeping track of what memory usage normally looks like lets you spot abnormal increases.
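
If you want to run the same check from a shell (say, from a cron job so you build up a baseline over time), jmap from the JDK prints a similar generational breakdown; look for the “PS Old Generation” section in its output. Run it as the user Jenkins runs as. The PID lookup and log path below are just illustrative assumptions:

$ jmap -heap $(pgrep -f jenkins.war) | tee -a /var/log/jenkins/heap-checkup.log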

Another thing you want to keep track of is the round-trip time of HTTP requests. The Monitoring plugin provides an excellent, detailed view of this, allowing you to see the actual numbers and which pages are taking a long time.

It’s also very useful to keep a long-term record of page load times, as this allows you to correlate changes in those times with changes you made to Jenkins. We use Nagios extensively for this, both at CloudBees and in the Jenkins project itself. Once you install Nagios, you can use it to monitor page load times or any other metric you care to track, such as the length of the build queue.
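
To give an idea of what such a check looks like (this is a sketch, not the exact check we run), a Nagios plugin for queue length can be a small script that reads the Jenkins remote API at /queue/api/json and follows the usual plugin convention of exit codes 0/1/2 for OK/WARNING/CRITICAL. The URL, the thresholds, and the crude counting of “id” fields are all illustrative assumptions:

#!/bin/sh
# Hypothetical Nagios check: alert when the Jenkins build queue gets long.
JENKINS_URL=http://jenkins.example.com   # adjust for your instance
WARN=10
CRIT=50

# /queue/api/json lists the queued items; -g stops curl from globbing the brackets.
LEN=$(curl -gs "$JENKINS_URL/queue/api/json?tree=items[id]" | grep -o '"id"' | wc -l)

if [ "$LEN" -ge "$CRIT" ]; then
  echo "CRITICAL - $LEN builds in queue"; exit 2
elif [ "$LEN" -ge "$WARN" ]; then
  echo "WARNING - $LEN builds in queue"; exit 1
else
  echo "OK - $LEN builds in queue"; exit 0
fi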

Once you find suspiciously slow pages, we’d like you to take thread dumps and see if you can spot the hot code that’s causing the slowdown. That would be immensely helpful to the Jenkins developers in understanding the problem.
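
A rough way to do that (again a sketch, with an assumed PID lookup and output path) is to capture a few thread dumps in a row while the slow page is loading; threads that keep appearing in the same stack frames point at the hot code. jstack ships with the JDK, and Jenkins also serves a thread dump of the master and slaves at the /threadDump URL.

# Run as the user Jenkins runs as, while reproducing the slow page.
# Assumes Jenkins was launched from jenkins.war.
for i in 1 2 3; do
  jstack $(pgrep -f jenkins.war) > /tmp/jenkins-threads-$i.txt
  sleep 5
done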

Lastly, check out the load statistics of the Jenkins instance, which you can access from the “Manage Jenkins” page. This gives you a chart of the overall utilization of your slaves:

The blue line tracks the total capacity of your Jenkins instance, and the red line tracks the busy slaves. The grey line is the length of the waiting queue.

For example, if you see a very high queue length (the grey line), it indicates that you don’t have enough slaves to keep up with the workload. If at the same time the red line sits well below the blue line, some slaves are sitting idle; this happens because some builds need to run on a specific kind of slave.

A chart like this indicates that your slave portfolio isn’t optimal. Perhaps you have more Windows slaves than necessary, and converting some of them to Linux slaves would improve utilization.

Those are just a few of the things you can do to get insights into how Jenkins is behaving. Hope you find some of this useful. 

 

Kohsuke Kawaguchi

Founder, Jenkins CI & Elite Developer, CloudBees
www.cloudbees.com

 

Comments

Also pay attention to the used permgen space. We found the JVM's default MaxPermGen size to be woefully inadequate for our instance of Jenkins, leading to OutOfMemory exceptions every few hours until we increased the MaxPermGen size.

Can you say a little more on how you use Nagios to monitor things like queue length? A little while ago I tried to set up monitoring like this in Zabbix, but it wasn't obvious to me how to get app-specific metrics like that into Zabbix. Do you create an mbean for this or something?
