In 2015, Docker announced their plugin system and revealed a list of early network and storage integrators. During Docker Global Hackday #3, we started playing with Swarm and Calico; however, the tooling and integration at that stage made the combination difficult to implement.
Thanks to the hard work of contributors to both projects, the barrier to entry for implementing Swarm and Calico has been greatly reduced. In this article, we’ll use Docker Machine to set up a Docker Swarm with Calico as its network plugin.
Swarm allows a set of Docker hosts to be clustered, presenting a container API which abstracts scheduling of containers. By using a swarm instead of a set of hosts, much of the complexity around managing application availability and distributing resources is taken care of.
Project Calico provides a layer 3 network implementation, aimed at scalable datacenter deployments. Compared to traditional network overlays, Calico provides a more efficient implementation with minimal packet encapsulation. This allows better usage of node resources and a simple yet powerful network stack for your infrastructure.
While Calico is a great SDN solution for many cases, it has drawbacks in advanced cases. You can read more about why you should use Calico here.
To prototype Swarm with Calico, I’ll use Docker Machine to create VMs on VirtualBox. There may be some minor changes when using a different driver, but the core process should be the same.
Many of the steps taken here are not production safe -- keep an eye out for warnings around implementation specifics in this document, as well as in the relevant provider docs.
Implementing Docker Swarm without Calico
Let’s start by creating a non-Calico Docker swarm using the standard Docker Machine interface. This is purely an exercise for comparison. We’ll be following the documentation listed in the Swarm docs.
You’ll need an existing Docker instance to get started, purely to create a Swarm discovery token, so if need be, create a Docker Machine instance just for this. Let’s create a Swarm discovery token:
$ docker run --rm swarm create
c25aa882df76a92ae962f4b4fc26168d
Next we can launch the necessary Swarm containers using this token to coordinate the swarm.
$ docker-machine create -d virtualbox --swarm --swarm-master --swarm-discovery token://c25aa882df76a92ae962f4b4fc26168d swarm-master
Running pre-create checks...
Creating machine...
Waiting for machine to be running, this may take a few minutes...
Machine is running, waiting for SSH to be available...
Detecting operating system of created instance...
Provisioning created instance...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Configuring swarm...
To see how to connect Docker to this machine, run: docker-machine env swarm-master
$ docker-machine create -d virtualbox --swarm --swarm-discovery token://c25aa882df76a92ae962f4b4fc26168d swarm-agent-00
Running pre-create checks...
Creating machine...
Waiting for machine to be running, this may take a few minutes...
Machine is running, waiting for SSH to be available...
Detecting operating system of created instance...
Provisioning created instance...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Configuring swarm...
To see how to connect Docker to this machine, run: docker-machine env swarm-agent-00
...
$ docker-machine create -d virtualbox --swarm --swarm-discovery token://c25aa882df76a92ae962f4b4fc26168d swarm-agent-NN
...
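The repeated create commands above lend themselves to a small loop. The following is a minimal sketch, not part of the original walkthrough: the helper names agent_name and swarm_create_cmds are hypothetical, and the functions only print the docker-machine commands (with the same flags used above) so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch: print the docker-machine commands for one Swarm master plus N agents.
# Nothing is executed here; the helper names are illustrative assumptions.

# agent_name INDEX -> zero-padded name such as swarm-agent-03
agent_name() {
    printf 'swarm-agent-%02d\n' "$1"
}

# swarm_create_cmds TOKEN AGENT_COUNT -> one create command per line
swarm_create_cmds() {
    token=$1
    count=$2
    echo "docker-machine create -d virtualbox --swarm --swarm-master --swarm-discovery token://$token swarm-master"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "docker-machine create -d virtualbox --swarm --swarm-discovery token://$token $(agent_name "$i")"
        i=$((i + 1))
    done
}
```

For example, swarm_create_cmds c25aa882df76a92ae962f4b4fc26168d 2 prints the master command followed by the commands for swarm-agent-00 and swarm-agent-01. Once the output looks right, it could be piped to sh to create the machines.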
At this point, you should have a running Swarm cluster. Let’s take a look at it.
$ docker-machine ls
NAME             ACTIVE   DRIVER       STATE     URL                         SWARM
swarm-agent-00   -        virtualbox   Running   tcp://192.168.99.101:2376   swarm-master
swarm-agent-01   -        virtualbox   Running   tcp://192.168.99.102:2376   swarm-master
swarm-master     *        virtualbox   Running   tcp://192.168.99.100:2376   swarm-master (master)
$ eval $(docker-machine env --swarm swarm-master)
$ docker ps -a
CONTAINER ID   IMAGE          COMMAND                  CREATED          STATUS          PORTS                                     NAMES
8f4cef9a02b3   swarm:latest   "/swarm join --advert"   9 minutes ago    Up 9 minutes    2375/tcp                                  swarm-agent-01/swarm-agent
01a3712457c7   swarm:latest   "/swarm join --advert"   10 minutes ago   Up 10 minutes   2375/tcp                                  swarm-agent-00/swarm-agent
5220dca465d2   swarm:latest   "/swarm join --advert"   12 minutes ago   Up 12 minutes   2375/tcp                                  swarm-master/swarm-agent
1205877156ee   swarm:latest   "/swarm manage --tlsv"   12 minutes ago   Up 12 minutes   2375/tcp, 192.168.99.100:3376->3376/tcp   swarm-master/swarm-agent-master
$ docker info
Containers: 4
Images: 3
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 3
 swarm-agent-00: 192.168.99.101:2376
  └ Status: Healthy
  └ Containers: 1
  └ Reserved CPUs: 0 / 1
  └ Reserved Memory: 0 B / 1.021 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.1.13-boot2docker, operatingsystem=Boot2Docker 1.9.1 (TCL 6.4.1); master : cef800b - Fri Nov 20 19:33:59 UTC 2015, provider=virtualbox, storagedriver=aufs
 swarm-agent-01: 192.168.99.102:2376
  └ Status: Healthy
  └ Containers: 1
  └ Reserved CPUs: 0 / 1
  └ Reserved Memory: 0 B / 1.021 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.1.13-boot2docker, operatingsystem=Boot2Docker 1.9.1 (TCL 6.4.1); master : cef800b - Fri Nov 20 19:33:59 UTC 2015, provider=virtualbox, storagedriver=aufs
 swarm-master: 192.168.99.100:2376
  └ Status: Healthy
  └ Containers: 2
  └ Reserved CPUs: 0 / 1
  └ Reserved Memory: 0 B / 1.021 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.1.13-boot2docker, operatingsystem=Boot2Docker 1.9.1 (TCL 6.4.1); master : cef800b - Fri Nov 20 19:33:59 UTC 2015, provider=virtualbox, storagedriver=aufs
CPUs: 3
Total Memory: 3.064 GiB
Name: 1205877156ee
docker ps now shows containers running on all nodes. You can interact with containers on the swarm as a whole using the Docker CLI.
Integrating Calico
Using Swarm with Calico requires a few changes from the standard swarm creation process.
Each node within the Swarm cluster must be configured to offload the cluster store to a location also accessible to the Calico cluster. Each host also needs to create a Calico node using the calicoctl binary. This configures the host and creates a set of Docker containers to maintain the cluster.
Because of these changes, we can’t create a swarm using the Docker Machine helpers. We will need to create individual machine hosts and then manually create Swarm and Calico clusters. We’ll be following the guide from the Calico docs for implementing Calico as a Docker network plugin. We hope to see more direct plugin support around the docker-machine tool in the future.
Create a machine cluster
To implement a Calico cluster, we’ll need a set of unclustered Docker hosts.
$ docker-machine create -d virtualbox node-00
Running pre-create checks...
Creating machine...
Waiting for machine to be running, this may take a few minutes...
Machine is running, waiting for SSH to be available...
Detecting operating system of created instance...
Provisioning created instance...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
To see how to connect Docker to this machine, run: docker-machine env node-00
$ docker-machine create -d virtualbox node-01
Running pre-create checks...
Creating machine...
Waiting for machine to be running, this may take a few minutes...
Machine is running, waiting for SSH to be available...
Detecting operating system of created instance...
Provisioning created instance...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
To see how to connect Docker to this machine, run: docker-machine env node-01
...
$ docker-machine create -d virtualbox node-NN
...
$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM
node-00   *        virtualbox   Running   tcp://192.168.99.100:2376
node-01   -        virtualbox   Running   tcp://192.168.99.101:2376
node-02   -        virtualbox   Running   tcp://192.168.99.102:2376
Set up a cluster store
We need to use an external cluster store, etcd, in order to synchronize the Calico and Swarm clusters.
We can run this etcd store on our cluster. In a production environment, this should be a scaled cluster supporting HA. However, in this example, we’ll just run a single instance of etcd on node-00, which is on IP 192.168.99.100.
$ eval $(docker-machine env node-00)
$ docker run -d -p 2379:2379 quay.io/coreos/etcd -advertise-client-urls http://192.168.99.100:2379 -listen-client-urls http://0.0.0.0:2379
...
$ curl 192.168.99.100:2379/v2/keys
{"action":"get","node":{"dir":true}}
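Because both Calico and Swarm depend on this store, it can be handy to wrap the curl check above in a reusable helper. This is a rough sketch under stated assumptions: the function names etcd_up and wait_for_etcd are hypothetical, and the check simply looks for the "action":"get" field that the v2 keys endpoint returned in the listing above.

```shell
#!/bin/sh
# Sketch: poll the etcd v2 keys endpoint until it answers.
# Function names are illustrative; the URL path is the one used with curl above.

# etcd_up HOST:PORT -> succeeds if the keys API answers a GET
etcd_up() {
    curl -fsS --max-time 5 "http://$1/v2/keys" | grep -q '"action":"get"'
}

# wait_for_etcd HOST:PORT [TRIES] -> retries once a second, fails after TRIES attempts
wait_for_etcd() {
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        if etcd_up "$1"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    echo "etcd at $1 did not respond" >&2
    return 1
}
```

Calling wait_for_etcd 192.168.99.100:2379 before configuring Calico or restarting Docker gives an early, explicit failure if the store is not reachable.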
Set up a Calico cluster
Thanks to tooling and the fact that we're using boot2docker as a base OS in Docker Machine, Calico requires very little setup.
We simply need to download calicoctl and use it to set up the cluster on each host. The ETCD_AUTHORITY variable will remain the same on each host; however, NODE_IP should be the IP of the host.
$ docker-machine ssh node-00
...
docker $ wget http://www.projectcalico.org/latest/calicoctl
Connecting to www.projectcalico.org (64.91.234.195:80)
Connecting to www.projectcalico.org (64.91.234.195:80)
Connecting to github.com (192.30.252.128:443)
Connecting to github-cloud.s3.amazonaws.com (54.231.114.122:443)
calicoctl 100% |*******************************| 5428k 0:00:00 ETA
docker $ chmod +x calicoctl
docker $ sudo ETCD_AUTHORITY=192.168.99.100:2379 ./calicoctl node --libnetwork --ip=$NODE_IP
Pulling Docker image calico/node:v0.14.0
Calico node is running with id: 3cb0b50060d2bf423bd22f51ec30b9408bf5d199f631a70da7ca340902e7e134
Pulling Docker image calico/node-libnetwork:v0.7.0
Calico libnetwork driver is running with id: a8a9e8bd63e2540de76fb89fc40e31652b2df08392366046d3304626387e5b01
docker $ sudo ETCD_AUTHORITY=192.168.99.100:2379 ./calicoctl status
calico-node container is running. Status: Up 7 minutes
Running felix version 1.3.0rc6
IPv4 BGP status
IP: 192.168.99.100    AS Number: 64511 (inherited)
+--------------+-----------+-------+-------+------+
| Peer address | Peer type | State | Since | Info |
+--------------+-----------+-------+-------+------+
+--------------+-----------+-------+-------+------+
IPv6 BGP status
No IPv6 address configured.
This needs to be run on every host, using the relevant host IP. Once this is done, calicoctl status should list all connected nodes.
docker $ sudo ETCD_AUTHORITY=192.168.99.100:2379 ./calicoctl status
calico-node container is running. Status: Up 19 seconds
Running felix version 1.3.0rc6
IPv4 BGP status
IP: 192.168.99.102    AS Number: 64511 (inherited)
+----------------+-------------------+-------+----------+-------------+
|  Peer address  |     Peer type     | State |  Since   |     Info    |
+----------------+-------------------+-------+----------+-------------+
| 192.168.99.100 | node-to-node mesh |   up  | 23:38:29 | Established |
| 192.168.99.101 | node-to-node mesh |   up  | 23:38:30 | Established |
+----------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 address configured.
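Since the only per-host difference is the IP passed to --ip, the per-host step can be generated rather than typed. The sketch below is an illustration, not the Calico docs' method: calico_node_cmds and the DOCKER_MACHINE override are hypothetical names, the host IP is looked up with docker-machine ip, and the script assumes calicoctl has already been downloaded into the default SSH user's home directory on each host. It prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: print one `calicoctl node` invocation per Docker Machine host.
# Assumes ./calicoctl already exists on each host (see the wget step above).

# calico_node_cmds ETCD_AUTHORITY NODE...
calico_node_cmds() {
    etcd=$1
    shift
    for node in "$@"; do
        # DOCKER_MACHINE is a hypothetical override, useful for testing with a stub
        ip=$("${DOCKER_MACHINE:-docker-machine}" ip "$node") || return 1
        echo "docker-machine ssh $node sudo ETCD_AUTHORITY=$etcd ./calicoctl node --libnetwork --ip=$ip"
    done
}
```

Running calico_node_cmds 192.168.99.100:2379 node-00 node-01 node-02 prints three commands, each with the correct host IP filled in for NODE_IP.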
Configure your Docker hosts
Before we can set up a Swarm, we need to reconfigure each host to use the same cluster store as Calico.
This is complicated by the fact we are using boot2docker. However, we can still reconfigure our Docker initialization parameters with an external cluster store. We can do this by adding a --cluster-store argument to /var/lib/boot2docker/profile.
$ cat <<EOF > profile.new
EXTRA_ARGS='
--label provider=virtualbox
--cluster-store=etcd://192.168.99.100:2379
'
CACERT=/var/lib/boot2docker/ca.pem
DOCKER_HOST='-H tcp://0.0.0.0:2376'
DOCKER_STORAGE=aufs
DOCKER_TLS=auto
SERVERKEY=/var/lib/boot2docker/server-key.pem
SERVERCERT=/var/lib/boot2docker/server.pem
EOF
$ sudo chown root:root profile.new && sudo mv profile.new /var/lib/boot2docker/profile && sudo /etc/init.d/docker restart
Now your Docker hosts are configured to use the same cluster store as Calico.
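If the etcd address differs from the one in this walkthrough, the profile can be rendered from a parameter instead of being edited by hand. This is a small sketch; render_profile is a hypothetical helper name, and the file contents simply mirror the profile shown above with the cluster store address substituted.

```shell
#!/bin/sh
# Sketch: print a boot2docker profile with a parameterized cluster store.
# The contents mirror the profile above; only the etcd address varies.

# render_profile ETCD_HOST:PORT -> profile text on stdout
render_profile() {
    cat <<EOF
EXTRA_ARGS='
--label provider=virtualbox
--cluster-store=etcd://$1
'
CACERT=/var/lib/boot2docker/ca.pem
DOCKER_HOST='-H tcp://0.0.0.0:2376'
DOCKER_STORAGE=aufs
DOCKER_TLS=auto
SERVERKEY=/var/lib/boot2docker/server-key.pem
SERVERCERT=/var/lib/boot2docker/server.pem
EOF
}
```

On each host, render_profile 192.168.99.100:2379 > profile.new produces the same file as the here-document above, after which the chown, mv, and restart steps apply unchanged.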
Setting up a Swarm
Just like with the simple Swarm example, we first need to create a cluster token and then Swarm nodes and a manager.
If you plan on deploying Swarm in a production environment, be sure to read the Swarm docs on discovery methods and use something other than the standard Swarm token service.
Starting Swarm nodes is fairly standard. However, we need to configure the manager with TLS enabled and mount the generated boot2docker certificates when starting the container.
Keep in mind that in this configuration the Swarm manager is a single point of failure. For production environments, be sure to follow the Swarm docs on HA deployments.
$ docker run --rm swarm create
938891a526e627d6ab11dd2e92cb8694
$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM
node-00   -        virtualbox   Running   tcp://192.168.99.100:2376
node-01   -        virtualbox   Running   tcp://192.168.99.101:2376
node-02   *        virtualbox   Running   tcp://192.168.99.102:2376
$ eval $(docker-machine env node-00)
$ docker run -d swarm join --addr=192.168.99.100:2376 token://938891a526e627d6ab11dd2e92cb8694
$ eval $(docker-machine env node-01)
$ docker run -d swarm join --addr=192.168.99.101:2376 token://938891a526e627d6ab11dd2e92cb8694
$ eval $(docker-machine env node-02)
$ docker run -d swarm join --addr=192.168.99.102:2376 token://938891a526e627d6ab11dd2e92cb8694
$ docker run -dp 2377:2375 -v /var/lib/boot2docker:/var/lib/boot2docker swarm manage --tlsverify --tlscert /var/lib/boot2docker/server.pem --tlscacert /var/lib/boot2docker/ca.pem --tlskey /var/lib/boot2docker/server-key.pem token://938891a526e627d6ab11dd2e92cb8694
At this point, we have a Swarm manager with control over all nodes in the cluster, available via 192.168.99.102:2377. From here, we can follow the standard steps to test Docker Swarm and Calico listed in the Calico tutorial.
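The per-node join steps in the listing above follow a fixed pattern, so they can be generated for any number of nodes. The following sketch prints the same eval and swarm join pairs; swarm_join_cmds and the DOCKER_MACHINE override are hypothetical names, and port 2376 is assumed to be the Docker daemon port on every host, as it is in this walkthrough.

```shell
#!/bin/sh
# Sketch: print the env switch and `swarm join` command for each node.
# Mirrors the manual steps above; nothing is executed.

# swarm_join_cmds TOKEN NODE...
swarm_join_cmds() {
    token=$1
    shift
    for node in "$@"; do
        # DOCKER_MACHINE is a hypothetical override, useful for testing with a stub
        ip=$("${DOCKER_MACHINE:-docker-machine}" ip "$node") || return 1
        echo "eval \$(docker-machine env $node)"
        echo "docker run -d swarm join --addr=$ip:2376 token://$token"
    done
}
```

The manager container still needs to be started separately with the TLS flags shown above, since it runs on only one node.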
Rather than using the -H flag, you can also just redefine the DOCKER_HOST variable to specify the Swarm endpoint. Be sure to check your etcd cluster store to make sure it's still running; restarting the Docker service may not have automatically started it.
$ eval $(docker-machine env node-02)
$ export DOCKER_HOST=tcp://192.168.99.102:2377
$ docker ps
fe3fcff4e0d2   calico/node-libnetwork:v0.7.0   "./start.sh"             About an hour ago   Up 56 minutes                                                                       node-02/calico-libnetwork
83a472b3ba65   calico/node:v0.14.0             "/sbin/start_runit"      About an hour ago   Up 56 minutes                                                                       node-02/calico-node
dbe7968c4707   calico/node-libnetwork:v0.7.0   "./start.sh"             About an hour ago   Up 53 minutes                                                                       node-01/calico-libnetwork
4bddb4e3d6b4   calico/node:v0.14.0             "/sbin/start_runit"      About an hour ago   Up 53 minutes                                                                       node-01/calico-node
a8a9e8bd63e2   calico/node-libnetwork:v0.7.0   "./start.sh"             About an hour ago   Up 54 minutes                                                                       node-00/calico-libnetwork
3cb0b50060d2   calico/node:v0.14.0             "/sbin/start_runit"      About an hour ago   Up 54 minutes                                                                       node-00/calico-node
3c9092601f5a   quay.io/coreos/etcd             "/etcd -advertise-cli"   2 hours ago         Up About a minute   2380/tcp, 4001/tcp, 192.168.99.100:2379->2379/tcp, 7001/tcp   node-00/boring_panini
It is also simpler to manage your Calico cluster by directly executing commands inline via docker-machine ssh, which can be aliased in Bash or scripted. But keep in mind that any time your Docker Machine needs to be restarted, you’ll need to re-download the calicoctl binary.
$ docker-machine ssh node-00 sudo ETCD_AUTHORITY=192.168.99.100:2379 ./calicoctl pool show
+----------------+---------+
|   IPv4 CIDR    | Options |
+----------------+---------+
| 192.168.0.0/16 |         |
+----------------+---------+
+--------------------------+---------+
|        IPv6 CIDR         | Options |
+--------------------------+---------+
| fd80:24e2:f998:72d6::/64 |         |
+--------------------------+---------+
Using Calico
Now that Calico is set up, what does that mean for your infrastructure and your applications?
First of all, Calico is not an entirely self-managing service. Be sure to read the documentation thoroughly to ensure you are applying the correct base iptables rules and configuring your nodes in a sensible manner. It’s important to understand what restrictions Calico brings to your infrastructure, as well as what it does and does not protect against. The Calico docs provide information on these concerns.
With the Calico libnetwork driver in place, you can manage networks and container IPs via the docker network interface. This means that after standard Calico node configuration, most operational changes required can be implemented through the libnetwork driver in much the same way such changes would be made using an overlay network.
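As a rough illustration of what that looks like, the sketch below prints a short sequence of standard docker network and docker run commands for a Calico-backed network. Several things here are assumptions rather than facts from this article: the helper name calico_net_demo_cmds, the network and container names, and above all the driver name calico. Depending on the Calico libnetwork version, additional options (for example an IPAM driver) may be required, so check the Calico tutorial before running these against the Swarm endpoint.

```shell
#!/bin/sh
# Sketch: print commands that create a network with a libnetwork driver
# and attach two containers to it. The driver name "calico" is an assumption.

# calico_net_demo_cmds [NETWORK] [DRIVER]
calico_net_demo_cmds() {
    net=${1:-net1}
    driver=${2:-calico}
    echo "docker network create --driver $driver $net"
    echo "docker run --net $net --name workload-a -tid busybox"
    echo "docker run --net $net --name workload-b -tid busybox"
    echo "docker network inspect $net"
}
```

With DOCKER_HOST pointing at the Swarm manager, the two containers may be scheduled on different nodes while still sharing the same network.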
Conclusion
Calico can greatly enhance your Docker infrastructure by facilitating scale and providing a more full-featured SDN than a standard overlay.
With recent updates in Docker and Calico tooling, setting up and maintaining a cluster is far simpler, and running one in a highly available manner in production is well documented. As further tooling and integrations develop, we expect this process to become even simpler and more configurable.
Calico provides a simple-to-implement container network, a highly scalable alternative to the standard overlay. By integrating with these networks via libnetwork and using a standard control interface, you can make it easy to switch Calico out for another network layer or connect different geographic regions as a single homogeneous network.
Before settling on Calico, be sure to read up on its benefits and those of its alternatives such as Weave and the standard overlay.