Our use of RabbitMQ

One interesting point that came out in recent talks I (Michael) have given is that developers are curious about the innards of CloudBees. 

On this blog I hope we can talk about how some of the interesting bits work. Some posts will be an approximation (as the detail is boring), and some may contain misinformation (in case the person writing forgets exactly how a system they didn’t build themselves works!). This post is one of those…

Deploying an application, creating a database, collecting stats, and many other operations all need a way to get a message to the agents that manage each server (each server runs an “agent” that has local control of the box).

On the RUN@cloud side of things, each agent connects to a broker - in this case the routing/broker is RabbitMQ. The connection is outbound from the agent to the broker, which simplifies firewall rules (and is also what we use with AnyCloud, where servers run in arbitrary data centres).

Initially we connected AMQP clients directly to the broker. This was kind of OK, although we struggled to get various clients (in Erlang and Java) to be reliable over internet-style networks (where dropouts happen and connections have to be re-established). We remedied this by shifting to ZeroMQ - at least for the agent-broker communications.

RabbitMQ is still there behind the ZeroMQ endpoint - and this has proved to be quite a reliable setup, even with flaky networks.

A 10,000-foot view of how we use Rabbit is this: each client connects and, on connection, is given an “inbox”. Should a message appear in it, the agent-side message handler fires and dispatches to the appropriate handler (remember that agents run on all the worker/server machines, not on the broker).

When a message is sent, it is sent to something like a “topic name” that any interested party can listen in on. If an interested party (a client, an agent) declares its interest in that name, the broker ensures the message is placed in the inbox for that agent/client.
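
A rough sketch of the inbox idea in Python. This is a toy in-memory model, not RabbitMQ's API; the names ToyBroker, declare_interest and drain are made up for illustration.

```python
from collections import defaultdict, deque


class ToyBroker:
    """In-memory stand-in for the broker (hypothetical, not a real client API)."""

    def __init__(self):
        self.inboxes = {}                  # client id -> queue of (topic, body)
        self.interest = defaultdict(set)   # topic name -> interested client ids

    def connect(self, client_id):
        # Each client is given an inbox when it connects.
        self.inboxes[client_id] = deque()
        return self.inboxes[client_id]

    def declare_interest(self, client_id, topic):
        self.interest[topic].add(client_id)

    def send(self, topic, body):
        # Copy the message into the inbox of every interested client.
        delivered = 0
        for client_id in sorted(self.interest.get(topic, ())):
            self.inboxes[client_id].append((topic, body))
            delivered += 1
        return delivered


def drain(inbox, handlers):
    """Agent-side loop: dispatch each inbox message to the handler for its topic."""
    results = []
    while inbox:
        topic, body = inbox.popleft()
        results.append(handlers[topic](body))
    return results
```

An agent would call connect once, declare interest in the names it cares about, and then run something like drain whenever its inbox is non-empty. A message sent to a name nobody has declared interest in is simply not delivered in this toy version.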

Messages can be targeted at a specific server/agent, or “broadcast” based on name. For example, we would use a naming standard like stax.genapp.SERVERID, where SERVERID is some unique/random-looking id. A message can be addressed to stax.genapp.SERVERID (in which case it is point to point) or to “stax.genapp” (in which case it is more like a broadcast to interested parties). In this naming scheme we call the “stax.genapp” part the “service name” (a service is made up of 1 or more servers), and the “SERVERID” is known as the “target”.
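
To make the naming scheme concrete, here is a minimal Python sketch. It assumes each agent listens on two names, its service name and its own full address; that assumption and the function names are illustrative rather than a description of the real agents.

```python
def subscriptions(service, target):
    """Names an agent listens on: its service name and its own full address."""
    return {service, f"{service}.{target}"}


def recipients(address, agents):
    """Return the targets that should receive a message sent to `address`.

    `agents` is an iterable of (service, target) pairs.
    """
    return sorted(
        target
        for service, target in agents
        if address in subscriptions(service, target)
    )
```

With agents ("stax.genapp", "a1") and ("stax.genapp", "b2"), an address of stax.genapp.a1 reaches only a1 (point to point), while stax.genapp reaches both (broadcast to the service).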

Some messages are “send and forget” and some expect a response. Some only look for the first response that comes back (i.e. we don’t care which server in a service responds, as long as one does).
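
These patterns can be sketched in Python with threads standing in for servers and a queue standing in for the caller's reply inbox. The function names and the one-second default timeout are assumptions for the example, not the real implementation.

```python
import queue
import threading


def broadcast(handlers, body, reply_inbox=None):
    """Deliver `body` to every handler, each on its own thread.

    With reply_inbox=None this is send-and-forget. Otherwise each handler's
    return value is put on reply_inbox as a (server_id, reply) pair.
    """
    def run(server_id, handler):
        reply = handler(body)
        if reply_inbox is not None:
            reply_inbox.put((server_id, reply))

    threads = [
        threading.Thread(target=run, args=(server_id, handler), daemon=True)
        for server_id, handler in handlers.items()
    ]
    for thread in threads:
        thread.start()
    return threads


def first_response(handlers, body, timeout=1.0):
    """Return the first (server_id, reply) to arrive; later replies are ignored."""
    reply_inbox = queue.Queue()
    broadcast(handlers, body, reply_inbox)
    # Raises queue.Empty if no server answers within `timeout` seconds.
    return reply_inbox.get(timeout=timeout)
```

Collecting every response would be the same idea with one reply_inbox.get per server instead of a single one.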

This scheme is fairly generic and simple, but it can be built up into powerful messaging patterns. We even use it to collect statistical data from servers: each agent reports data to a service called something like “app.stat”. The agents don’t care who is listening - it is just a firehose of information (in fact, tapping into it is a good way to test the “scalability” of your message handler - and to spot memory leaks!).
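
On the memory leak point: a handler that appends every sample to a list will grow without bound on a firehose. One possible shape for a listener that avoids this is a rolling window per server, sketched below in Python. StatTap and the window size are hypothetical, and the message format (a server id plus a numeric value) is an assumption.

```python
from collections import defaultdict, deque


class StatTap:
    """Keeps only the most recent samples per server so memory stays bounded."""

    def __init__(self, window=100):
        # deque(maxlen=...) silently drops the oldest sample once it is full.
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def on_message(self, server_id, value):
        self.samples[server_id].append(value)

    def average(self, server_id):
        values = self.samples.get(server_id)
        return sum(values) / len(values) if values else None
```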

Makes sense? No? Oh well… consider yourself informed, or something.

Enjoy !

Michael
@michaelneale on twitter.