Building Cloud Apps with Civo and Docker Part V: Managing State

Written by: Lee Sylvester
12 min read

Over the past four articles, you've seen how to distribute a Docker application across multiple nodes and load balance it. This was performed using Docker Swarm and Kubernetes. These techniques are powerful and increase the availability and fault tolerance of your applications. However, up until now, none of the examples explained managing state.

In this article, you'll take a look at how state can be handled with Kubernetes as well as a little theory for tackling bigger and more complex applications.

What Is State?

In case you're new to this, it helps to identify what state actually is.

When creating the examples in the previous articles, each of the nodes was agnostic of both user and application-based information. The containers created were simply processing handlers. Information went in one end and came out transformed; a simple request -> response transaction.

In the wild, very few applications in their entirety function in this way. While we endeavor to make microservices as stateless as possible, an application on the whole has many elements of state, such as database files, uploaded user assets, user session files, etc.

When dealing with state in a microservice application, state should be reduced to as few services as possible. If multiple services utilize the same data, then that data should be extracted to a single independent service, allowing other interested services access to the data.

Categorizing State

While state can be categorized into multiple types, such as database data, static assets, user assets, log files, user session files, etc., it helps instead to consider the state in terms of availability. At its root, state is usually either remote or local data and exists either ephemerally or physically (in memory or on disk). By grouping data into as few categories as possible, you are then able to reduce the necessary resources required to serve this information.

For instance, providing a single stored file resource for logs, user assets, and static assets means requiring fewer marshalling services (the services in charge of managing such files) as well as fewer attached storage mediums when spread over a cluster of nodes.

The caveat of this approach is with regard to black-box data. Databases, for instance, will typically run independently of other services in larger applications and, as they will require direct management of their files, will likely warrant an independent file storage medium.

Ultimately, the strategy you leverage will be unique to your application's requirements. Since application state requirements can vary wildly, this will be an important consideration at every stage of your development.

Managing State: an Example

Any physical state in your application will require file storage space. With Docker, this means using volumes.

Kubernetes provides a wealth of functionality for attaching and managing persistent storage through its PersistentVolume and PersistentVolumeClaim resources, which you'll see shortly. The majority of this functionality is oriented around third-party storage mediums, such as GlusterFS, CephFS, Amazon Web Services' Elastic Block Store, Google Compute Engine's Persistent Disk, and many others, most of which are remote storage options. A decent explanation of the supported mediums is provided in the Kubernetes documentation.

For Kubernetes applications, these are considered the go-to options, as they increase the reliability of your application should any nodes become unstable.

Despite this, however, storing data simply and locally is still an important requirement and one that is surprisingly infrequently documented on the web. As such, the example detailed in this article will demonstrate just that.

A note about PersistentVolume and PersistentVolumeClaim

Kubernetes is designed to abstract pods (those services performing functionality within your cluster) and the orchestration of those pods. The same is true of any and all storage or volumes.

For persistent volumes, we have PersistentVolume and PersistentVolumeClaim; the former is the representation of your physical storage volume and the latter is the association assigned to any given pod type. I like to think of them as velcro! You attach the spiky strip (PersistentVolume) to the volume and the fluffy strip (PersistentVolumeClaim) to your pods. This way, pods can be attached and detached as needed.

A PersistentVolumeClaim can be assigned to a single pod type only, and with local storage, a given volume can be associated with pods on the same node only. This means that with storage that can only be attached to a single node, those pods that need to access the storage medium must also be assigned to that node.

However, storage such as AWS Elastic File System (EFS), which can be attached to multiple nodes, allows the pods using it to exist on any of those nodes. You can also assign multiple single-node storage mediums if you need to distribute pods across multiple nodes; useful if you're hosting your own GlusterFS or CephFS cluster.

Getting started

This example continues from the previous article, so if you do not have your cluster running with Kubernetes installed, go ahead and do that now.

Ensure the services in the previous article are no longer running. To check, simply run the following:

$ kubectl get pods

If you receive the following error:

The connection to the server localhost:8080 was refused - did you specify the right host or port?

then make sure you switch to the kubeuser, using su - kubeuser before executing any kubectl commands.
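
If any deployments or services from the previous article are still listed, delete them before continuing. The resource names below are placeholders; use whichever names kubectl reports:

$ kubectl get deployments
$ kubectl get services
$ kubectl delete deployment <deployment-name>
$ kubectl delete service <service-name>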

You'll create new services for this tutorial and, while it may seem a little contrived, you will find yourself utilizing a similar approach for many applications in Kubernetes.

Attaching storage

Now that your nodes are ready to run services, you'll need to allocate some local storage. Civo provides block storage for this purpose. This storage type is a single-node only medium, so it should illustrate the purposes of this example nicely.

An excellent how-to for attaching the storage is presented on Civo's website, so I will not be reproducing that info. Go ahead and attach a 1GB block storage to kube-worker1 with the path /mnt/assets.

Attaching storage to the controller node is always a bad idea. Kubernetes prefers not to allocate any services to the controller node if possible. If you were to follow this tutorial and attempt to attach pods to storage on the controller node, the pods would remain in a Pending state. Describing one of those pods would show that the designated node has taints. This is because Kubernetes taints the controller node with the NoSchedule effect.
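
You can see this taint for yourself with a standard kubectl query; the node name kube-controller is assumed to match your setup:

$ kubectl describe node kube-controller | grep -i taint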

Now that your storage is attached, you'll need to place a file or two in there. For this demonstration, I added an image called cats.png. After all, who doesn't like cats? If you wish to do the same, simply run the following on kube-worker1:

$ wget http://pngimg.com/uploads/cat/cat_PNG133.png -O /mnt/assets/cats.png

Backend service

With that out of the way, let's get down to building the services. As before, you'll use a PHP service to serve a file, but with a slight twist. The content of the served page will contain an added img tag, to display our cute little cats image.

<?php
  echo getHostName();
  echo "<br />";
  echo getHostByName(getHostName());
?>
<img src="assets/cats.png" />

And the Dockerfile:

FROM webgriffe/php-apache-base:5.5
COPY src/ /var/www/html/

I've already created Docker images for each of the services in this tutorial on Docker Hub, so I'll use those in the following YAML files, but you're free to upload your own if you wish.
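
If you do want to publish your own, a minimal sketch for building and pushing the backend image looks like this (the account and repository names are placeholders; substitute your own Docker Hub details):

$ docker build -t <your-dockerhub-user>/php-backend .
$ docker push <your-dockerhub-user>/php-backend

You would then reference that image in place of mine in the YAML files that follow.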

With the Docker image in Docker Hub, you can now create the service. Create a new file on kube-controller called backend.yaml with the following content:

apiVersion: v1
kind: Service
metadata:
  name: webserver
spec:
  selector:
    app: php-server
    srv: backend
  ports:
  - protocol: TCP
    port: 80
    targetPort: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webserver
spec:
  selector:
    matchLabels:
      app: php-server
      srv: backend
  replicas: 3
  template:
    metadata:
      labels:
        app: php-server
        srv: backend
    spec:
      containers:
      - name: php-server
        image: "leesylvester/codeship-p5-server"
        ports:
        - name: http
          containerPort: 80

This includes both the pod deployment definition and the service definition in a single file. You can then go ahead and create the service, using:

$ kubectl create -f backend.yaml

Now, check that the service is running:

$ kubectl get pods
NAME                         READY     STATUS    RESTARTS   AGE
webserver-5dc468f9bd-8kcf9   1/1       Running   0          1m
webserver-5dc468f9bd-fbz8c   1/1       Running   0          1m
webserver-5dc468f9bd-mw24f   1/1       Running   0          1m

Great!
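
You can also confirm that the webserver Service has registered the three pods by inspecting its endpoints:

$ kubectl get service webserver
$ kubectl get endpoints webserver

The endpoints listing should show one pod IP per replica; if it is empty, double-check that the Service's selector matches the labels in the Deployment's pod template.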

Assigning PersistentVolume and PersistentVolumeClaim

You now have a server for serving your PHP files, but before you can add one for the static assets, you'll first need to inform Kubernetes of the storage medium.

As stated previously, this is done using the PersistentVolume resource. For the block storage you have attached to kube-worker1, your YAML file will look like this:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: bls-pv
  labels:
    name: assets
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  storageClassName: local-storage
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
  local:
    path: /mnt/assets
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kube-worker1

This file contains a number of new properties, but each is relatively easy to explain:

  • capacity: provides a means to tell Kubernetes how much space the storage has. This is important, because if a pod requests storage that is greater than this, Kubernetes will continue searching through any other attached PersistentVolume definitions until a suitable size is found.

  • accessModes: is a little misleading. One would assume this enforces how the storage can be accessed, but it actually acts as a simple guide label for the benefit of storage service providers. Possible options are ReadWriteOnce, ReadWriteMany, and ReadOnlyMany.

  • storageClassName: names the storage class the volume belongs to. This can correspond to any of the supported storage provisioners, or local-storage for, well, local storage.

  • persistentVolumeReclaimPolicy: dictates whether the data stored in the storage medium should be kept when the claim on it is deleted. Options include Retain, Recycle, and Delete.

  • volumeMode: should be set to Filesystem when the storage is mounted as a directory, as the block storage is here at /mnt/assets, or Block when the volume is exposed to pods as a raw block device.

  • local.path: provides the path to the storage directory or mount, nested under the local key.

  • nodeAffinity: declares which node(s) the volume is physically available on; Kubernetes then schedules any pod using the volume onto one of those nodes.

The PersistentVolume definition is applied cluster-wide, but its nodeAffinity ties the data to specific nodes. Therefore, if you wished to make this storage available to pods on multiple nodes, the defined storage would be expected to exist on each of those nodes with the same configuration.

Copy the above YAML into a file called persistent_volume.yaml and execute it on kube-controller with:

$ kubectl create -f persistent_volume.yaml

Next, you need to partner it with a claim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: bls-pvc
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: local-storage
  resources:
     requests:
       storage: 1Gi
  selector:
    matchLabels:
      name: assets

This definition matches the volume definition by specifying the same accessModes and a storage request that is equal to or less than the volume's capacity value.

The selector property pairs this claim with the PersistentVolume you defined above, by matching its name: assets label.

Once more, copy the definition to a file called persistent_volume_claim.yaml and execute with:

$ kubectl create -f persistent_volume_claim.yaml

You can check if this worked by executing the following:

$ kubectl get pv
NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM             STORAGECLASS    REASON    AGE
bls-pv    1Gi        RWX            Retain           Bound     default/bls-pvc   local-storage             1m
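
You can run the same check from the claim's side:

$ kubectl get pvc bls-pvc

Its STATUS should also read Bound, with bls-pv listed as the bound volume.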

Labeling your storage

Kubernetes is unaware of the hardware capabilities of each node, at least as far as additional storage, etc., is concerned. As such, you'll need to manually mark the node with the storage. You do this with labels.

Labels can be anything you choose and can be associated with any resource you choose. In the case of the backend service, you labeled it with the keys app and srv and the values php-server and backend respectively.

Labeling nodes works in the same way. However, in this instance, you'll assign the label directly, without the use of a YAML file, with the following command:

$ kubectl label node kube-worker1 node_type=storage
node "kube-worker1" labeled

With that done, you can then assign your asset server pod directly to the node with the storage, without needing to identify the node by IP or other such specific notation in the pod definition.

The asset service

With your persistent volume in place, it's now time to create the asset service. This service will be a simple NGINX service that serves any files in a given directory. The nginx.conf file for this server will look like this:

worker_processes 2;
events { worker_connections 1024; }
http {
  server {
    root /var/www/html;
    listen 80;
    location / {
    }
  }
}

and the Dockerfile:

FROM nginx
COPY nginx.conf /etc/nginx/nginx.conf

Super simple!

You'll then pair this with the PersistentVolumeClaim in its YAML file:

apiVersion: v1
kind: Service
metadata:
  name: assetserver
spec:
  selector:
    app: asset-server
    srv: backend
  ports:
  - protocol: TCP
    port: 80
    targetPort: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: assetserver
spec:
  selector:
    matchLabels:
      app: asset-server
      srv: backend
  template:
    metadata:
      labels:
        app: asset-server
        srv: backend
    spec:
      volumes:
        - name: assetvol
          persistentVolumeClaim:
            claimName: bls-pvc
      nodeSelector:
        node_type: storage
      containers:
      - name: asset-server
        image: "leesylvester/codeship-p5-asset"
        volumeMounts:
        - name: assetvol
          mountPath: /var/www/html/assets
        ports:
        - name: http
          containerPort: 80

This file should look similar to the backend.yaml definition. However, it also supplies a volumes segment which associates it with the PersistentVolumeClaim. The definition then references the volumeMounts for the pod, which is supplied to the associated Docker container as a Docker volume.

The mountPath extends the root parameter in the nginx.conf file. This means that the block storage will be mapped to ./assets relative to the NGINX root.

Also, note the nodeSelector option. This binds the pod to only those nodes that have the label node_type with the value storage, which we created earlier for kube-worker1. This pod will not, therefore, be assigned to any other node!

Go ahead and copy the above definition to asset_backend.yaml and execute with:

$ kubectl create -f asset_backend.yaml

Now, check that the new pod is running:

$ kubectl get pods
NAME                         READY     STATUS    RESTARTS   AGE
assetserver-8746b67d-rlcxh   1/1       Running   0          1m
webserver-5dc468f9bd-8kcf9   1/1       Running   0          3m
webserver-5dc468f9bd-fbz8c   1/1       Running   0          3m
webserver-5dc468f9bd-mw24f   1/1       Running   0          3m

As you can see, only one asset pod has been deployed, which will be running on the node with the block storage.
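
You can confirm the placement by requesting the wide output, which adds a NODE column to the listing:

$ kubectl get pods -o wide

The assetserver pod should be listed against kube-worker1, while the webserver pods may land on any node.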

The load balancer

Finally, you will need the load balancer. As before, this will also be an NGINX service but will distribute to the appropriate backend service based on the supplied URL.

The nginx.conf will therefore contain:

worker_processes 2;
events { worker_connections 1024; }
http {
  server {
    listen 80;
    location / {
      proxy_pass http://webserver;
      proxy_http_version 1.1;
    }
    location /assets/ {
      proxy_pass http://assetserver/assets/;
      proxy_http_version 1.1;
    }
  }
}

Here, the NGINX instance will route all requests to the PHP server, unless the URL path starts with /assets/, in which case the request will be routed to the asset server.

The Dockerfile for this instance will look like this:

FROM nginx
COPY nginx.conf /etc/nginx/nginx.conf
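
As with the backend, the load balancer needs both a Service and a Deployment. The following is a minimal sketch of what frontend.yaml could look like, assuming a NodePort Service exposed on port 30001 (the port used to check the result shortly) and three replicas; the image name is a placeholder for whichever image you built and pushed from the NGINX configuration above.

apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  type: NodePort
  selector:
    app: frontend
    srv: loadbalancer
  ports:
  - protocol: TCP
    port: 80
    targetPort: http
    nodePort: 30001
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  selector:
    matchLabels:
      app: frontend
      srv: loadbalancer
  replicas: 3
  template:
    metadata:
      labels:
        app: frontend
        srv: loadbalancer
    spec:
      containers:
      - name: frontend
        # Placeholder image name; substitute your own published NGINX load balancer image
        image: "<your-dockerhub-user>/frontend-nginx"
        ports:
        - name: http
          containerPort: 80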

Copy the definition above to frontend.yaml and execute with:

$ kubectl create -f frontend.yaml

Checking the running pods will now show all of the services in attendance:

$ kubectl get pods
NAME                         READY     STATUS    RESTARTS   AGE
assetserver-8746b67d-rlcxh   1/1       Running   0          2m
frontend-5cdddc6458-5ltvw    1/1       Running   0          1m
frontend-5cdddc6458-6rpbv    1/1       Running   0          1m
frontend-5cdddc6458-x9576    1/1       Running   0          1m
webserver-5dc468f9bd-8kcf9   1/1       Running   0          4m
webserver-5dc468f9bd-fbz8c   1/1       Running   0          4m
webserver-5dc468f9bd-mw24f   1/1       Running   0          4m

Checking your handiwork

If you navigate to port 30001 of any of your nodes in a web browser, you should be presented with the page below:

[Image: Cute Cats Page]

Refreshing the page will update the hostname and IP shown on the page, informing you that it was served by a different backend pod. However, the image will remain present, as it is served by the single asset server.
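
If you prefer the command line, you can run the same check with curl; substitute the public IP address of any of your nodes:

$ curl http://<node-ip>:30001/
$ curl -I http://<node-ip>:30001/assets/cats.png

The first request should return the hostname and IP of whichever PHP pod answered, and the second should return a 200 response for the image served by the asset server.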

Taking It Further

As with any example, this tutorial has been a little contrived, but I'm sure it's easy to see the power and potential this route offers.

When applied to a controller -> agent MySQL deployment, a single block storage volume is more than adequate, allowing the controller to store its necessary files within the attached storage while the agent nodes manage their data ephemerally. Likewise, utilizing block storage on each node is perfect for GlusterFS and CephFS deployments, where the stored contents are mirrored across nodes for high availability.

The important point to note is that there is no "one solution fits all" with distributed applications. Spending some time to work out what your application needs is paramount before tackling how it should be deployed. Then, it's simply a matter of getting creative and keeping an eye on its performance.
