InfluxDB is a popular time series database and I am here to show you how to query it. It uses a SQL-like (or to be totally honest, an "almost-SQL") language.
SELECT * FROM "foodships" WHERE time > now() - 1h
This language has worked and been supported in InfluxDB since day one and it has both pros and cons:
It's easy for new users to interact with a database in a language that they already know. SQL is popular, and a lot of developers already know it. This query syntax helped InfluxDB grow in its early days.
As I said before, InfluxQL is an "amost-SQL" language. Writing a SQL query language is not a simple job. One problem is that DBAs have some expectations of SQL, but our language is not fully compatible with SQL. This creates a lot of mismatches and doubts.
One of the characteristics of a time series database is that write cannot block reads and reads cannot block writes. This is a must-have. Your IOT devices or your servers will always write metrics and in the meantime, your dashboard, analytic tools, and alert managers will read from the database. InfluxDB needs something more scalable.
At the moment, the storage engine and query language are in the same project, which blocks fast iteration on the code that we would like to have. The query language evolves much quicker compared to the storage engine. The engine needs stability, but the query language needs more functions and capabilties.
InfluxQL needs a re-architecturing because the team wasn't able to implement all the amazing ideas required by the community.
Currently SQL is very simple to learn, but there is another DSL avaiable in our stack: the TICK script used from Kapacitor to send alerts and downsample your series.
InfluxQL doesn't have mathematical functions because the language doesn't allow us to add them in a simple way. With IFQL, they will be very easy to implement.
During InfluxDays in San Francisco, Paul Dix, CTO and founder of InfluxData, released on stage a new open-source query language to bring data analysis close to InfluxDB. It is called IFQL.
Architecture
var cpu = select(db: "telegraf") .filter(exp:{"_measurement" == "cpu" and "_field" == "usage_user"}) .range(start: -30m) select(db: "telegraf") .filter(exp:{"_measurement" == "mem" and "_field" == "used_percent"}) .range(start: -30m) .join(on:["host"], eval:{$ + cpu})
IFQL is not just a query language. It has a parser that reads a query like the one above and generates a plan that can be visualized as JSON and run directly via IFQL. This means that the syntax you are seeing is just one of the possible implementations. The real contract is between the plan and the processor.
The architecture and the idea behind the project will allow us to reimplement InfluxQL and PromQL, TICK Script on top of IFQL AST. Besides that, if you have your query builder or you need to create a custom implementation to interact with InfluxDB, you can generate the plan as simple JSON and push it to IFQL. Flexible and amazing.
IFQL is designed to be extensible and to achieve this goal -- we took an important lesson learned from Telegraf. You need a clean interface and a single entrypoint to make developers happy to contribute to your project. That's why IFQL has a directory called /functions
. It contains all the functions currently supported, and contributors looking to add functions should work only in that directory.
TICK Script and IFQL
One of the goals for IFQL is to provide all the capabilities currently provided by TICK Script so that we will be able to query, manipulate, and send alerts without learning two languages.
Currently, we have functions replicated across both projects, but we will be able to reuse them with IFQL.
At the moment, IFQL is a standalone project available on GitHub. Other than the parser and planner, it provides a simple http-daemon called ifqld
to start some tests.
It is in beta, which means that the API and query syntax will change, but it is very easy to set up. I will use Docker for this example, but you can find the IFQL binaries, deb, and rpm available here if you don't know Docker.
Let's download an InfluxDB configuration file from the IFQL repository:
wget https://raw.githubusercontent.com/influxdata/ifql/master/docker-compose.yml mkdir examples wget https://raw.githubusercontent.com/influxdata/ifql/master/examples/influxdb.conf -O examples/influxdb.conf
The main difference is a paragraph that enables IFQL:
[ifql] enabled = true log-enabled = true bind-address = ":8082"
InfluxDB uses port 8082 to start a gRPC communication with IFQL.
docker-compose up
As you can see, we created a network and deployed two containers. One is InfluxDB with our modified configuration and 8082 and 8086 exposed. The second one is IFQL v0.0.2. It exposes port 8093 and we can verify everything is working as expected by running a simple curl:
15:21 $ curl -I http://localhost:8093/ HTTP/1.1 404 Not Found Content-Type: text/plain; charset=utf-8 X-Content-Type-Options: nosniff Date: Tue, 28 Nov 2017 14:21:40 GMT Content-Length: 19
Now let's populate our database with some sample data:
curl https://s3.amazonaws.com/noaa.water-database/NOAA_data.txt -o NOAA_data.txt influx -import -path=NOAA_data.txt -precision=s -database=NOAA_water_database influx use NOAA_water_database SELECT COUNT(*) FROM NOAA_water_database > SELECT COUNT(*) FROM h2o_feet name: h2o_feet time count_level description count_water_level ---- ----------------------- ----------------- 0 15258 15258
As you can see, we have some points and measurements. Now we will interact with ifqld
using curl
to run some basic examples:
20:11 $ curl -XPOST --data-urlencode 'q=from(db:"NOAA_water_database").filter(exp:{"_measurement"=="h2o_temperature"}).range(start: -500000h).limit(n: 10)' http://localhost:8093/query h2o_temperature,location=coyote_creek degrees=60 1439856000000000000 h2o_temperature,location=coyote_creek degrees=65 1439856360000000000 h2o_temperature,location=coyote_creek degrees=68 1439856720000000000 h2o_temperature,location=coyote_creek degrees=62 1439857080000000000 h2o_temperature,location=coyote_creek degrees=62 1439857440000000000 h2o_temperature,location=coyote_creek degrees=69 1439857800000000000 h2o_temperature,location=coyote_creek degrees=67 1439858160000000000 h2o_temperature,location=coyote_creek degrees=67 1439858520000000000 h2o_temperature,location=coyote_creek degrees=70 1439858880000000000 h2o_temperature,location=coyote_creek degrees=65 1439859240000000000 h2o_temperature,location=santa_monica degrees=70 1439856000000000000 h2o_temperature,location=santa_monica degrees=60 1439856360000000000 h2o_temperature,location=santa_monica degrees=62 1439856720000000000 h2o_temperature,location=santa_monica degrees=62 1439857080000000000 h2o_temperature,location=santa_monica degrees=60 1439857440000000000 h2o_temperature,location=santa_monica degrees=63 1439857800000000000 h2o_temperature,location=santa_monica degrees=64 1439858160000000000 h2o_temperature,location=santa_monica degrees=63 1439858520000000000 h2o_temperature,location=santa_monica degrees=63 1439858880000000000 h2o_temperature,location=santa_monica degrees=61 1439859240000000000
Let's compare the ranned query from(db:"NOAA_water_database").filter(exp:{"_measurement"=="h2o_temperature"}).range(start: -500000h).limit(n: 10)
with InfluxQL:
from
works a bit likeselect
in SQL. It selects thedatabase
to query.filter
is thewhere
on steroids. You can filter bytags
using operators like<
,>
,==
, and!=
.Inside
filter
, there are some specific keywords like_measurement
used by the database itself. In this case, it filters by the measurementh2o_temperature
.limit(10)
returns 20 lines in this case. What is going on? This is one of the first differences around how we built the query language. There is an implicitgroup by *
that groups by series. In this case, there are two series:h2o_temperature,location=santa_monica
andh2o_temperature,location=coyote_creek
. If you need a specific series, you should add.filter(exp:{"_measurement"=="h2o_temperature" AND "location" == "coyote_creek"})
.
Some important decisions behind the query language:
The arguments' order for a function isn't important. Every argument has a name and is not mandatory.
Use
"
everywhere to avoid mismatches between single and double quotes.
IFQL simplifies every query from a specific syntax to a JSON AST. You can get the AST for your specific query by adding the query param analyze=true
.
19:56 $ curl -s -XPOST --data-urlencode 'q=from(db:"NOAA_water_database").range(start: 200h).count()' http://localhost:8093/query?analyze=true | jq { "operations": [ { "id": "from0", "kind": "from", "spec": { "database": "NOAA_water_database", "hosts": null } }, { "id": "range1", "kind": "range", "spec": { "start": "200h0m0s", "stop": "now" } }, { "id": "count2", "kind": "count", "spec": {} } ], "edges": [ { "parent": "from0", "child": "range1" }, { "parent": "range1", "child": "count2" } ] }
The README.md inside the IFQL project documents all the other available functions.
!Sign up for a free Codeship Account
Scalability
ifqld
is a stateless HTTP daemon, so it can scale up and down easier than InfluxDB (or other databases) because there is no state or data involved. Some functions, like filters
, are pushed down to the engine (not in all cases) because InfluxDB is able to retrieve the time series and all of the required points.
Other functions, like aggregation functions, are not pushed to InfluxDB but are run by the ifqld
process as soon as it gets the data back from the database. This allows us to scale reading and aggregation in a more flexible way. In case of more queries or more computation requirements, we can spin up more ifqld
processes without needing to care that much about the underlying storage.
Another important aspect of scalability is that ifqld
can query more than one influxdb
.
00:04 $ ./bin/ifqld --help Usage: ifqld [OPTIONS] Options for the IFQLD server Application Options: -h, --host= influx hosts to query from. Can be specified more than once for multiple hosts. (default: localhost:8082) [$HOSTS]
Using the CLI argument --host=
or the environment variable $HOSTS
allows you to specify more than one backend. This means that if you are running the open-source version and you are able to manage the data sharding on top of more databases, you can sort of horizontally scale.
Roadmap
The project is unstable and under fast development. The API and the output can change based on yours and our experience with the query language. The idea is to lock the API in early 2018. If you are an active InfluxDB user, try it out and feel free to open PRs or issues with feedback about your use case to help us improve the syntax.