Infrastructure Monitoring with TICK Stack

Written by: Gianluca Arbezzano


You can design a distributed system with 400 microservices, but if you can't understand what's going on inside them and how they affect your application's behavior, you can't control anything. You can deploy 100 times a day, but without visibility into how your code impacts the system, you're headed for failure.

How fast we can predict and detect failures or new behaviors is an important piece of the puzzle. That's why I'm fascinated by time series and monitoring systems. In this article, I will give you a high-level overview of InfluxDB and the TICK stack, a set of open-source projects focused on time series and monitoring.

What Is InfluxDB and How Does the TICK Stack Work?

TICK is an acronym for Telegraf, InfluxDB, Chronograf, and Kapacitor. It's a set of open-source tools that can be combined together or used separately to collect, store, visualize, and manipulate time series data.

First, we need to understand what a time series is. Essentially, it is a collection of points, and every point has a special label called time. Time can be stored at different levels of precision: seconds, nanoseconds, and so on.

The data structure in InfluxDB looks like this:

h2o_feet,location=coyote_creek water_level=5.617,level_description="between 3 and 6 feet" 1439862840

The measurement is h2o_feet; it identifies the time series, a set of points. As you can see, the last value is always the timestamp: 1439862840. There are two other concepts: tags and fields. A tag is a key-value pair that gets indexed. Fields are not indexed, and their values can have different types, such as integer, boolean, or float.

The schema is measurement,tags fields timestamp. With the previous example:

  • measurement: h2o_feet

  • tags: location=coyote_creek

  • fields: water_level=5.617,level_description="between 3 and 6 feet"

  • timestamp: 1439862840

You can store more than one tag by separating them with a comma, like so: location=coyote_creek,region=us.
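Under the assumptions above, the schema can be sketched as a small Python helper that assembles a line-protocol string. This is an illustrative function, not an official client; it simplifies the real escaping rules (for example, it skips integer and boolean field suffixes) but it reproduces the example line exactly:

```python
# Illustrative helper (not an official client): build an InfluxDB
# line-protocol string from a measurement, tags, fields, and a timestamp.

def to_line_protocol(measurement, tags, fields, timestamp):
    # Tag keys and values must escape commas, spaces, and equals signs.
    def esc(s):
        return s.replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")

    tag_part = ",".join(f"{esc(k)}={esc(v)}" for k, v in tags.items())

    def fmt_field(v):
        # String fields are double-quoted; numbers are written as-is.
        # (Real line protocol also marks integers with an "i" suffix.)
        return f'"{v}"' if isinstance(v, str) else str(v)

    field_part = ",".join(f"{esc(k)}={fmt_field(v)}" for k, v in fields.items())
    return f"{measurement},{tag_part} {field_part} {timestamp}"

line = to_line_protocol(
    "h2o_feet",
    {"location": "coyote_creek"},
    {"water_level": 5.617, "level_description": "between 3 and 6 feet"},
    1439862840,
)
print(line)  # same line-protocol string as the example above
```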

Why do we need a database focused on time series? Can't we just use MongoDB, MySQL, or Elasticsearch? This is a common question, and there are benchmarks comparing these options.

The first reason is resource usage. Time series is a specific domain; by supporting only that structure, the database can offer storage optimizations and queries tailored to this use case.

The second reason is that the database needs to be very fast, since a monitoring system receives a lot of points in a short amount of time. It also needs to be able to serve reads without blocking writes.

InfluxDB is the first and best-known project in the TICK stack, and it's the storage engine. It handles points and time series, it supports two transport protocols (UDP and TCP), and it provides an HTTP API to write and query time series.

Getting Started with InfluxDB

I use Docker a lot, and it makes it easy to run a few containers together. Let's start InfluxDB on an isolated network called tsdb-test, just to be sure that we can configure the internal communication without trouble:

docker network create tsdb-test
docker run -d -it --name influxdb -p 8086:8086 --network tsdb-test influxdb

Let's test it:

$ curl -I http://localhost:8086/ping
HTTP/1.1 204 No Content
Content-Type: application/json
Request-Id: e1096af5-6bb3-11e7-8001-000000000000
X-Influxdb-Version: 1.2.4
Date: Tue, 18 Jul 2017 12:23:17 GMT

By default, port 8086 serves the HTTP API. We called the /ping endpoint just to double-check. InfluxDB also has a powerful CLI called influx, and we'll use it to run some other tests.
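The same HTTP API that serves /ping also accepts writes on /write and queries on /query. As a sketch (an illustrative helper, not an official client), here is how a write request against the 1.x API is shaped: line protocol goes in the POST body, and the target database goes in the db query parameter:

```python
import urllib.parse
import urllib.request

# Sketch of a write against the InfluxDB 1.x HTTP API: POST line
# protocol to /write with the target database in the "db" parameter.
def build_write_request(host, db, lines, port=8086):
    url = f"http://{host}:{port}/write?" + urllib.parse.urlencode({"db": db})
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")

req = build_write_request(
    "localhost", "NOAA_water_database",
    ["h2o_feet,location=coyote_creek water_level=5.617 1439862840"],
)
print(req.full_url)  # http://localhost:8086/write?db=NOAA_water_database
# urllib.request.urlopen(req) would perform the actual write; like /ping,
# a successful write answers 204 No Content.
```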

Copy/paste this into a file at ~/influx_data.txt:

# DDL
CREATE DATABASE NOAA_water_database
# DML
# CONTEXT-DATABASE: NOAA_water_database
h2o_feet,location=coyote_creek water_level=2.943,level\ description="below 3 feet" 1439870400
h2o_feet,location=coyote_creek water_level=2.831,level\ description="below 3 feet" 1439870760
h2o_feet,location=coyote_creek water_level=2.717,level\ description="below 3 feet" 1439871120
h2o_feet,location=coyote_creek water_level=2.625,level\ description="below 3 feet" 1439871480
h2o_feet,location=coyote_creek water_level=2.533,level\ description="below 3 feet" 1439871840
h2o_feet,location=coyote_creek water_level=2.451,level\ description="below 3 feet" 1439872200
h2o_feet,location=coyote_creek water_level=2.385,level\ description="below 3 feet" 1439872560
h2o_feet,location=coyote_creek water_level=2.339,level\ description="below 3 feet" 1439872920
h2o_feet,location=coyote_creek water_level=2.293,level\ description="below 3 feet" 1439873280
h2o_feet,location=coyote_creek water_level=2.287,level\ description="below 3 feet" 1439873640
h2o_feet,location=coyote_creek water_level=2.290,level\ description="below 3 feet" 1439874000
h2o_feet,location=coyote_creek water_level=2.313,level\ description="below 3 feet" 1439874360
h2o_feet,location=coyote_creek water_level=2.359,level\ description="below 3 feet" 1439874720
h2o_feet,location=coyote_creek water_level=2.425,level\ description="below 3 feet" 1439875080
h2o_feet,location=coyote_creek water_level=2.513,level\ description="below 3 feet" 1439875440
h2o_feet,location=coyote_creek water_level=2.608,level\ description="below 3 feet" 1439875800
h2o_feet,location=coyote_creek water_level=2.703,level\ description="below 3 feet" 1439876160
h2o_feet,location=coyote_creek water_level=2.822,level\ description="below 3 feet" 1439876520
h2o_feet,location=coyote_creek water_level=2.927,level\ description="below 3 feet" 1439876880
h2o_feet,location=coyote_creek water_level=3.054,level\ description="between 3 and 6 feet" 1439877240
h2o_feet,location=coyote_creek water_level=3.176,level\ description="between 3 and 6 feet" 1439877600
h2o_feet,location=coyote_creek water_level=3.304,level\ description="between 3 and 6 feet" 1439877960
h2o_feet,location=coyote_creek water_level=3.432,level\ description="between 3 and 6 feet" 1439878320
h2o_feet,location=coyote_creek water_level=3.570,level\ description="between 3 and 6 feet" 1439878680
h2o_feet,location=coyote_creek water_level=3.720,level\ description="between 3 and 6 feet" 1439879040
h2o_feet,location=coyote_creek water_level=3.881,level\ description="between 3 and 6 feet" 1439879400
h2o_feet,location=coyote_creek water_level=4.049,level\ description="between 3 and 6 feet" 1439879760
h2o_feet,location=coyote_creek water_level=4.209,level\ description="between 3 and 6 feet" 1439880120
h2o_feet,location=coyote_creek water_level=4.383,level\ description="between 3 and 6 feet" 1439880480
h2o_feet,location=coyote_creek water_level=4.560,level\ description="between 3 and 6 feet" 1439880840
h2o_feet,location=coyote_creek water_level=4.744,level\ description="between 3 and 6 feet" 1439881200
h2o_feet,location=coyote_creek water_level=4.915,level\ description="between 3 and 6 feet" 1439881560
h2o_feet,location=coyote_creek water_level=5.102,level\ description="between 3 and 6 feet" 1439881920
h2o_feet,location=coyote_creek water_level=5.289,level\ description="between 3 and 6 feet" 1439882280
h2o_feet,location=coyote_creek water_level=5.469,level\ description="between 3 and 6 feet" 1439882640
h2o_feet,location=coyote_creek water_level=5.643,level\ description="between 3 and 6 feet" 1439883000
h2o_feet,location=coyote_creek water_level=5.814,level\ description="between 3 and 6 feet" 1439883360
h2o_feet,location=coyote_creek water_level=5.974,level\ description="between 3 and 6 feet" 1439883720
h2o_feet,location=coyote_creek water_level=6.138,level\ description="between 6 and 9 feet" 1439884080
h2o_feet,location=coyote_creek water_level=6.293,level\ description="between 6 and 9 feet" 1439884440
h2o_feet,location=coyote_creek water_level=6.447,level\ description="between 6 and 9 feet" 1439884800
h2o_feet,location=coyote_creek water_level=6.601,level\ description="between 6 and 9 feet" 1439885160

Now we can import that data:

$ docker run -it --rm --network tsdb-test -v ${HOME}:${HOME} -w ${HOME} influxdb influx -host influxdb -import -path ./influx_data.txt
2017/07/18 12:35:20 Processed 1 commands
2017/07/18 12:35:20 Processed 42 inserts
2017/07/18 12:35:20 Failed 0 inserts

At this point, we can start the Influx CLI to make some queries:

  • show databases lists all the databases.

  • use NOAA_water_database moves the scope of the CLI to a specific database, in our case NOAA_water_database.

$ docker run -it --rm --network tsdb-test -v ${HOME}:${HOME} -w ${HOME} influxdb influx -host influxdb
Connected to http://influxdb:8086 version 1.2.4
InfluxDB shell version: 1.2.4
> show databases
name: databases
name
----
_internal
NOAA_water_database
> use NOAA_water_database
Using database NOAA_water_database
  • select * from h2o_feet limit 10 gets 10 points from the h2o_feet measurement.

> select * from h2o_feet limit 10
name: h2o_feet
time       level description location     water_level
----       ----------------- --------     -----------
1439870400 below 3 feet      coyote_creek 2.943
1439870760 below 3 feet      coyote_creek 2.831
1439871120 below 3 feet      coyote_creek 2.717
1439871480 below 3 feet      coyote_creek 2.625
1439871840 below 3 feet      coyote_creek 2.533
1439872200 below 3 feet      coyote_creek 2.451
1439872560 below 3 feet      coyote_creek 2.385
1439872920 below 3 feet      coyote_creek 2.339
1439873280 below 3 feet      coyote_creek 2.293
1439873640 below 3 feet      coyote_creek 2.287

As you probably noticed, the queries look like SQL. The language, InfluxQL, is deliberately similar, so it's easy to interact with the database in a familiar way. You can read more about it in the InfluxQL documentation.
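The /query endpoint returns results as JSON, with each series carrying parallel "columns" and "values" arrays. A minimal sketch of turning one series back into per-point records (the sample payload below is trimmed to the shape the 1.x API returns, with the data from our query):

```python
import json

# The /query endpoint returns rows as parallel "columns" and "values"
# arrays; zip them back together into one dict per point.
sample = json.loads("""
{"results": [{"series": [{
  "name": "h2o_feet",
  "columns": ["time", "level description", "location", "water_level"],
  "values": [[1439870400, "below 3 feet", "coyote_creek", 2.943],
             [1439870760, "below 3 feet", "coyote_creek", 2.831]]
}]}]}
""")

series = sample["results"][0]["series"][0]
points = [dict(zip(series["columns"], row)) for row in series["values"]]
print(points[0]["water_level"])  # 2.943
```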

Collect Information From Any Server with Telegraf

Building an efficient engine to store and manage time series is one of the challenges of a monitoring system. Collecting that information from its sources is just as important. There are many different sources you need to collect data from: your applications, hardware, virtual machines, and so on.

Telegraf is an agent written in Go, and its main focus is to simplify this task. It's made of input and output plugins. Input plugins include MySQL, CouchDB, Spark, HAProxy, Disqus, Docker, AWS, and so on; the list is very long. The input plugins represent the services you can get data from.

Output plugins are the stores where you can save your data; InfluxDB is just one of them. As an open-source and standalone project, Telegraf supports storage platforms other than InfluxDB, such as AMQP, Kafka, Kinesis, MQTT, OpenTSDB, Prometheus, and others.

Telegraf is configuration driven: there is a configuration file you need to edit in order to tell it which systems and services to collect from:

[global_tags]
  department = "it"
[agent]
  interval = "10s"
  round_interval = true
  metric_buffer_limit = 5000
  flush_buffer_when_full = true
  collection_jitter = "0s"
  flush_interval = "30s"
  flush_jitter = "30s"
  debug = false
  hostname = ""
# Send metrics to the monitoring instance
[[outputs.influxdb]]
  urls = ["http://influxdb:8086"]
  database = "telegraf"
  retention_policy = "autogen"
  precision = "s"
  timeout = "10s"
  username = ""
  password = ""
[[inputs.cpu]]
  percpu = false
  totalcpu = true
  fieldpass = ["usage_idle", "usage_user", "usage_system"]
[[inputs.diskio]]
[[inputs.diskio]]
  name_prefix = "local_"
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  container_names = []
  namepass = [ "docker", "docker_container_cpu", "docker_container_mem" ]
[[inputs.mem]]
[[inputs.netstat]]
[[inputs.system]]

Copy this into $HOME/telegraf.conf. It's a TOML file with a few main sections:

  • global_tags is a free-form set of tags that will be added to every point. Here we use department to identify the laptop within a company. On a real infrastructure, you could use provider to distinguish whether your application runs on bare metal, AWS, or Google Cloud, or region if you have a multi-region architecture.

  • agent contains the settings for the single Telegraf agent, such as the collection and flush intervals.

  • outputs and inputs. As described before, these declare the destinations and sources of your points. This is a very easy, standard configuration with the system, memory, CPU, Docker, and network input plugins and InfluxDB as the output.

There are different ways to install and run Telegraf depending on your distribution. As before, we will use Docker to run our example:

docker run -it -d --network tsdb-test \
    --name telegraf --hostname my-laptop \
    -v /sys:/rootfs/sys:ro \
    -v /proc:/rootfs/proc:ro \
    -v /var/run/docker.sock:/var/run/docker.sock:ro \
    -v /var/run/utmp:/var/run/utmp:ro \
    -v ${HOME}/telegraf.conf:/etc/telegraf/telegraf.conf:ro \
    telegraf

We shared a couple of volumes from the host -- sys, proc, utmp -- because Telegraf needs to read process and system information from the host itself. If you don't share these directories, Telegraf will read that information from inside the container. For this example, that's not the right behavior, because we are using Telegraf to monitor our laptop.

As you can see, we shared the Docker socket because Telegraf has a plugin capable of getting events and useful information from Docker itself, such as the number of running containers, images, and so on.

Now that our Telegraf is running, we can log into InfluxDB to have a look at the metrics stored by Telegraf.

14:24 $ docker run -it --rm --network tsdb-test -v ${HOME}:${HOME} -w ${HOME} influxdb influx -host influxdb
Connected to http://influxdb:8086 version 1.2.4
InfluxDB shell version: 1.2.4
> show databases
name: databases
name
----
_internal
telegraf

Now you can see that we have two databases; telegraf is the one used by the agent.

> use telegraf
Using database telegraf
> show measurements
name: measurements
name
----
cpu
diskio
docker
docker_container_cpu
docker_container_mem
local_diskio
mem
netstat
system

You can see that we have some measurements, and they are related to the plugins that we enabled in the configuration file. You can go deeper and query them if you like.


What Is Chronograf?

Now we know how to collect and store data, but we need to make it useful. One way is to read it in nice graphs and collect them into dashboards to share within our company. One of the most famous open-source tools for building dashboards is Grafana; there is another one, built for InfluxDB and the TICK stack, called Chronograf.

docker run -it -p 8888:8888 -d --name chronograf --network tsdb-test chronograf


Open your browser at http://localhost:8888 and configure the first source:

  • url: influxdb:8086

  • name: influx

  • password and username are empty for this example.

You will also notice that it asks about Telegraf. Leave that part as it is, because we are using the default configuration.

If you are asking yourself, "When do I use Chronograf versus Grafana?": Chronograf is part of the TICK stack, and there are utilities that increase the interoperability between these projects. For example, you can see the list of Telegraf agents storing information in InfluxDB. They're split by hostname, and from this page you already have high-level visibility into your cluster. A green circle tells you that an agent is running as expected; it turns red if Telegraf stops sending data.

As I said, Chronograf is built to work well with the TICK stack, and if you look at the Apps column, you can see system and docker. Chronograf detects what you are monitoring via Telegraf, and it has a set of built-in dashboards for these plugins.

You can create dashboards and graphs to go deeper and combine information not only from a server but also from your applications, to analyze, for example, how a specific application behavior changes how a server works, or vice versa.

Chronograf only works with InfluxDB. It's designed to be the single UI to manage and interact with the entire stack. We saw the powerful integration with Telegraf, but you can also use the query builder to create InfluxDB queries in an easy, step-by-step way, manage your InfluxDB instance, configure ACLs, and add new users.

Using Kapacitor to Send Alerts

Kapacitor is the last piece of the puzzle. We now know how to store, collect, and read time series; now we need to process them to do something like alerting or proactive monitoring.

Alerting is easy to understand: if your server runs at high CPU usage (usually more than 70 percent), you can page somebody via Slack, PagerDuty, email, or other channels in order to get a human on the problem.

For some tasks, you really don't need a human. If you're on the cloud, or if you're able to spin up more servers via an API, you can trigger an action via an HTTP POST, for example.
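Kapacitor can deliver alerts to such an endpoint as JSON. As a hypothetical sketch (the decide_action function and the "scale-up" action are assumptions, not Kapacitor API), a webhook handler could branch on the alert level, using fields like id, level, and message that appear in the alert payloads shown later in this article:

```python
import json

# Hypothetical webhook decision function: given an alert JSON payload
# (with "id", "level", and "message" fields), pick an automated action.
def decide_action(payload):
    alert = json.loads(payload)
    if alert["level"] == "CRITICAL":
        return f"scale-up: {alert['id']}"  # e.g. call your cloud API here
    return f"no-op: {alert['id']}"

body = '{"id": "high cpu:nil", "level": "CRITICAL", "message": "high cpu is CRITICAL"}'
print(decide_action(body))  # scale-up: high cpu:nil
```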

Let's start Kapacitor via a Docker container:

docker run -d --network tsdb-test \
    -p 9092:9092 --hostname kapacitor \
    -e KAPACITOR_INFLUXDB_0_URLS_0=http://influxdb:8086 \
    --name kapacitor \
    kapacitor

Chronograf has a set of features to manage and interact with Kapacitor. You can open the Alerting tab in Chronograf, and it will show you how to add a new Kapacitor instance.

  • url: kapacitor:9092

  • name: My kapacitor

You can configure the target of your alerts; as I said before, Kapacitor supports different services like Slack, HipChat, and PagerDuty.

You can use the alert builder to create rules. In this example, we trigger an alert, written to a file, when the CPU usage of the laptop with hostname my-laptop goes over 70 percent.

I use stress, a utility that you can easily install on Ubuntu/Debian via apt, to load my laptop and trigger an alert:

$ stress -c 3

Leave it running for about three minutes. After that, look at the Alerting > Alert History page, where you will see that Kapacitor triggered some alerts and also recovered them (status OK) once the alert was resolved, in our case when CPU usage dropped back below 70 percent.
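The trigger/recover cycle can be sketched as a threshold evaluator that emits an event only when the level changes. This is a minimal sketch of the idea, not how Kapacitor implements it:

```python
# Minimal sketch (not Kapacitor's implementation): emit an event only
# when CPU usage crosses the 70 percent threshold in either direction.
def evaluate(samples, threshold=70.0):
    events, level = [], "OK"
    for t, value in samples:
        new_level = "CRITICAL" if value > threshold else "OK"
        if new_level != level:
            events.append((t, new_level, value))
            level = new_level
    return events

# (time, cpu%) samples: quiet, then under stress, then recovered
samples = [(0, 30.0), (10, 77.1), (20, 85.0), (30, 29.2)]
print(evaluate(samples))  # [(10, 'CRITICAL', 77.1), (30, 'OK', 29.2)]
```

Note that the second CRITICAL sample produces no event: only the transitions are reported, which matches the CRITICAL-then-OK pairs you see in the alert history.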

If you remember, we configured Kapacitor to store alerts in a file, in my case /tmp/alerts.log. If I look at that file in the Kapacitor container, the alerts are stored there:

$ docker exec -it kapacitor cat /tmp/alerts.log
{"id":"high cpu:nil","message":" high cpu:nil is CRITICAL value: 77.1220492215092","details":"{\"Name\":\"cpu\",\"TaskName\":\"chronograf-v1-8b222593-a176-4678-b94d-88043c79e289\",\"Group\":\"nil\",\"Tags\":{\"cpu\":\"cpu-total\",\"department\":\"it\",\"host\":\"my-laptop\"},\"ServerInfo\":{\"Hostname\":\"kapacitor\",\"ClusterID\":\"9eca4366-b17f-4b9e-9e16-930931353272\",\"ServerID\":\"59574edb-5a0c-4269-ab5c-6e57cf7b49c2\"},\"ID\":\"high cpu:nil\",\"Fields\":{\"value\":77.1220492215092},\"Level\":\"CRITICAL\",\"Time\":\"2017-07-31T13:26:40Z\",\"Message\":\" high cpu:nil is CRITICAL value: 77.1220492215092\"}\n","time":"2017-07-31T13:26:40Z","duration":0,"level":"CRITICAL","data":{"series":[{"name":"cpu","tags":{"cpu":"cpu-total","department":"it","host":"my-laptop"},"columns":["time","value"],"values":[["2017-07-31T13:26:40Z",77.1220492215092]]}]}}
{"id":"high cpu:nil","message":" high cpu:nil is OK value: 29.250830989511382","details":"{\"Name\":\"cpu\",\"TaskName\":\"chronograf-v1-8b222593-a176-4678-b94d-88043c79e289\",\"Group\":\"nil\",\"Tags\":{\"cpu\":\"cpu-total\",\"department\":\"it\",\"host\":\"my-laptop\"},\"ServerInfo\":{\"Hostname\":\"kapacitor\",\"ClusterID\":\"9eca4366-b17f-4b9e-9e16-930931353272\",\"ServerID\":\"59574edb-5a0c-4269-ab5c-6e57cf7b49c2\"},\"ID\":\"high cpu:nil\",\"Fields\":{\"value\":29.250830989511382},\"Level\":\"OK\",\"Time\":\"2017-07-31T13:26:50Z\",\"Message\":\" high cpu:nil is OK value: 29.250830989511382\"}\n","time":"2017-07-31T13:26:50Z","duration":10000000000,"level":"OK","data":{"series":[{"name":"cpu","tags":{"cpu":"cpu-total","department":"it","host":"my-laptop"},"columns":["time","value"],"values":[["2017-07-31T13:26:50Z",29.250830989511382]]}]}}
.....
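Since each line of alerts.log is a single JSON document, summarizing it is straightforward. A sketch that tallies alert levels from log lines like the ones above (here trimmed to the fields we use):

```python
import json
from collections import Counter

# Each line of alerts.log is one JSON alert event; tally the levels.
log_lines = [
    '{"id": "high cpu:nil", "level": "CRITICAL", "time": "2017-07-31T13:26:40Z"}',
    '{"id": "high cpu:nil", "level": "OK", "time": "2017-07-31T13:26:50Z"}',
]

levels = Counter(json.loads(line)["level"] for line in log_lines)
print(levels["CRITICAL"], levels["OK"])  # 1 1
```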

Chronograf offers an easy-to-use Kapacitor rules builder and a historical visualization of what Kapacitor does, but Kapacitor is a standalone project: it runs an API on port 9092 by default, and you can also interact with it via its CLI.

$ docker exec -it kapacitor kapacitor list tasks
ID                                                 Type      Status    Executing Databases and Retention Policies
chronograf-v1-8b222593-a176-4678-b94d-88043c79e289 stream    enabled   true      ["telegraf"."autogen"]

kapacitor list tasks, for example, is the command that lists the tasks currently managed by Kapacitor. We have only the one task created via Chronograf.

docker exec -it kapacitor kapacitor show chronograf-v1-8b222593-a176-4678-b94d-88043c79e289
ID: chronograf-v1-8b222593-a176-4678-b94d-88043c79e289
Error:
Template:
Type: stream
Status: enabled
Executing: true
Created: 31 Jul 17 13:21 UTC
Modified: 31 Jul 17 13:27 UTC
LastEnabled: 31 Jul 17 13:27 UTC
Databases Retention Policies: ["telegraf"."autogen"]
TICKscript:
var db = 'telegraf'
var rp = 'autogen'
var measurement = 'cpu'
var groupBy = []
var whereFilter = lambda: ("cpu" == 'cpu-total') AND ("host" == 'my-laptop')
var name = 'high cpu'
var idVar = name + ':{{.Group}}'
var message = ' {{.ID}} is {{.Level}} value: {{ index .Fields "value" }}'
var idTag = 'alertID'
var levelTag = 'level'
var messageField = 'message'
.....

kapacitor show <task-id> prints the details of a task. It includes useful information like the raw TICKscript, plus a representation of how many points the script is handling and the alerts it triggered:

DOT:
digraph chronograf-v1-8b222593-a176-4678-b94d-88043c79e289 {
graph [throughput="0.00 points/s"];
stream0 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
stream0 -> from1 [processed="6795"];
from1 [avg_exec_time_ns="60.552µs" errors="0" working_cardinality="0" ];
from1 -> eval2 [processed="6795"];
eval2 [avg_exec_time_ns="18.618µs" errors="0" working_cardinality="1" ];
eval2 -> alert3 [processed="6795"];
alert3 [alerts_triggered="10" avg_exec_time_ns="1.75714ms" crits_triggered="5" errors="0" infos_triggered="0" oks_triggered="5" warns_triggered="0" working_cardinality="1" ];
alert3 -> http_out5 [processed="10"];
alert3 -> influxdb_out4 [processed="10"];
http_out5 [avg_exec_time_ns="5.085µs" errors="0" working_cardinality="1" ];
influxdb_out4 [avg_exec_time_ns="3.02µs" errors="0" points_written="10" working_cardinality="0" write_errors="0" ];
}

Conclusion

If you are looking for a set of open-source, free projects to manage your monitoring system, now you know that the TICK stack offers storage, a collector, visualization tools, and an alert system. All of them offer APIs, so you can build your own implementation if you need a specific or different approach for one of these tools.

Monitoring is a very hot topic -- you cannot manage a fast-growing system without a deep understanding of how your applications are working. When you consider a monitoring system, the hard part is that it needs to be up when all your other systems are down. If your monitoring goes down with your infrastructure, you have an even bigger problem. That's why you need the best tools and methodologies available to manage your applications on your own.

Learn more about docker exec from our blog post An Introductory How-To, With Examples, of Docker Exec.
