PostgreSQL has a great feature called Foreign Data Wrappers (FDW) that allows it to connect directly to outside systems. Although the setup can be a little complicated, once it's configured you can run queries with joins or subqueries against those systems, insert data, create views, and more. Heroku has dramatically simplified the process of using FDW with PostgreSQL and Redis thanks to Data Links. Let's try it out.
Data Links
We previously used data links with Heroku PGBackups when we used it to connect our live PostgreSQL database to a recently restored backup. That enabled us to run a query across both to reload our accidentally deleted data. That was awesome, I know… but we were only scratching the surface.
PostgreSQL is amazing. It's the "how did I ever live without this" experience once you tap into its goodness. But like any other relational database, it still has use cases for which it isn't ideal. Foreign data wrappers largely eliminate that tradeoff: you can offload the parts of your data that are better suited to other systems, while PostgreSQL can still interact with them and query them as if they were any other table.
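For context, here's roughly what a hand-rolled FDW setup looks like in plain PostgreSQL using postgres_fdw. Every name in it is hypothetical, and this is exactly the ceremony Data Links spares you:

-- Manual FDW setup (hypothetical names throughout):
CREATE EXTENSION postgres_fdw;

CREATE SERVER remote_db
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'remote.example.com', dbname 'analytics');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER remote_db
  OPTIONS (user 'reporter', password 'secret');

-- Expose one remote table locally; it can then join like any other table.
CREATE FOREIGN TABLE remote_events (id integer, name text)
  SERVER remote_db
  OPTIONS (schema_name 'public', table_name 'events');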
Redis, as it turns out, is an excellent fit for almost every case where PostgreSQL might not be ideal. As an in-memory data store, it is ideally suited to high-traffic direct reads and writes, and it offers specialty data structures like sorted sets, expiring keys, pub/sub, and more. These structures provide functionality on par with shared objects in a threaded system, without the mutex locks and single-machine limitations that come with them.
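As a quick taste, here's what a couple of those structures look like from the redis-rb gem (a sketch; the key names are made up):

require "redis"

redis = Redis.new

# Sorted set: a leaderboard ranked by score, updated atomically
redis.zincrby("leaderboard", 1, "player:42")
top_ten = redis.zrevrange("leaderboard", 0, 9, with_scores: true)

# Expiring key: a rate-limit counter that clears itself after 60 seconds
redis.incr("throttle:user:7")
redis.expire("throttle:user:7", 60)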
When you put the two together, you'll be hard pressed to find a data problem you can't solve, short of outgrowing the machine. Data Links puts them together. You're welcome.
Get Connected
Feel free to walk through these code examples with me. However, be aware that Heroku Data Links is only available on production-tier Heroku PostgreSQL databases running 9.4 and above. That means you'll have to pay to play. Detailed instructions can be found here, but so we can get to the showing off faster, I'm going to skip ahead.
Once your production tier database is set up using the Ruby getting started app, provision Heroku Redis and then link them up like so:
$ heroku pg:links create REDIS_URL DATABASE_URL --as hredis
And to see that it worked:
$ heroku pg:links
=== DATABASE_URL (postgresql-vertical-7958)
==== redis_sinuous_9442
Created:     2015-09-11 03:44 UTC
Remote:      REDIS_URL (redis-sinuous-9442)
Remote Name: redis_sinuous_9442

=== HEROKU_POSTGRESQL_ONYX_URL (postgresql-vertical-7958)
==== redis_sinuous_9442
Created:     2015-09-11 03:44 UTC
Remote:      REDIS_URL (redis-sinuous-9442)
Remote Name: redis_sinuous_9442
Perfect! Now, let's have some fun.
Query Against Outside Data
Data makes the world go round, but disk writes strain a machine more than almost anything else. Counters illustrate the problem well, because you can use a counter for virtually anything: view counters, usage-throttling counters, download counters, share counters, click counters, and so on. Usually there will be some type of database record associated with whatever you're counting, whether that's a blog post, a user record, or something else.
PostgreSQL takes no chances with your data, insisting that it be written to disk before it will tell you the write was successful. Redis can write to disk at an interval, such as every second, allowing it to handle logic like this much, much faster.
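That interval-based persistence is a tunable tradeoff. Heroku Redis manages this for you, but on a self-managed Redis it corresponds to append-only-file settings along these lines:

# redis.conf: durability traded for speed
appendonly yes        # log every write to the append-only file
appendfsync everysec  # sync the log to disk about once per second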
Here's a quick benchmark so the difference is clear: 10 concurrent threads each run 50,000 increments, spread across 25 records:
p = Benchmark.measure do
  threads = []
  10.times {
    threads << Thread.new {
      puts Benchmark.measure {
        50000.times { |i| Widget.increment_counter(:stock, (i % 25) + 1) }
      }.real
    }
  }
  threads.each { |thr| thr.join }
end
puts "PostgreSQL: #{p.real} seconds"

PostgreSQL: 258.2195017640479 seconds

r = Benchmark.measure do
  threads = []
  10.times {
    threads << Thread.new {
      puts Benchmark.measure {
        $redis.pipelined {
          50000.times { |i| $redis.incr "widget:#{(i % 25) + 1}" }
        }
      }.real
    }
  }
  threads.each { |thr| thr.join }
end
puts "Redis: #{r.real} seconds"

Redis: 4.218563660047948 seconds
With the two connected via Data Link, I can run a query in PostgreSQL to join against the Redis data and include it in my query results like so:
SELECT w.id, w.name, w.stock, r.value my_counter
FROM widgets w
INNER JOIN hredis.redis r ON r.key = CONCAT('widget:', w.id)
ORDER BY r.value DESC
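Since the link is exposed as an ordinary foreign table, you can also query it on its own. For instance, fetching a single counter looks like this (a sketch based on the key/value columns used in the join above):

SELECT value
FROM hredis.redis
WHERE key = 'widget:1';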
Because it's just another query, I can create a view from that. Or I can create a Dataclip to send around the office.
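For instance, a view over that join might look like this (the widget_popularity name is mine, not anything Data Links creates):

CREATE VIEW widget_popularity AS
  SELECT w.id, w.name, w.stock, r.value AS my_counter
  FROM widgets w
  INNER JOIN hredis.redis r ON r.key = CONCAT('widget:', w.id);

After that, SELECT * FROM widget_popularity behaves like any other table in reports or Dataclips.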
That's just one use case, but by now you should see how live analytic data can be folded into your queries. For more complex queries, you can periodically write the values you need into a cache column in PostgreSQL with a single INSERT or UPDATE statement that references Redis directly, as sketched below. That makes it far more practical to combine PostgreSQL features, say, running a full-text search that's geographically limited with PostGIS data and ranked against live/trending popularity, all in a performant way. A lot of possibilities open up.
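Here's a sketch of that caching step, assuming a cached_popularity column on widgets (the column is my invention):

-- Pull current counters out of Redis into a local cache column.
-- The cast assumes the Redis values are plain integer counters.
UPDATE widgets w
SET    cached_popularity = r.value::bigint
FROM   hredis.redis r
WHERE  r.key = CONCAT('widget:', w.id);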
Push Data Out
Unfortunately, at the moment the Redis link is read-only. When FDW functionality is expanded to support writes, it will open many more doors, such as using a trigger to write data out to Redis or Memcached as it's updated.
In very high-performance environments, this can offload certain read traffic from PostgreSQL entirely: it can push changes out to caches directly, so cache management and expiration stop being concerns. Until then, or to work around any limitations of the wrapper itself, you can always pair LISTEN/NOTIFY with an outside script that pushes changes out to caches, as sketched below.
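Here's a minimal sketch of such a listener in Ruby, assuming a trigger that issues a NOTIFY on a widget_changes channel with an "id:stock" payload; the channel name and payload format are my own inventions, not part of Data Links:

require "pg"
require "redis"

pg    = PG.connect(ENV["DATABASE_URL"])
redis = Redis.new(url: ENV["REDIS_URL"])

# Assumes a trigger on widgets that runs something like:
#   SELECT pg_notify('widget_changes', NEW.id || ':' || NEW.stock);
pg.exec("LISTEN widget_changes")

loop do
  pg.wait_for_notify do |_channel, _pid, payload|
    id, stock = payload.split(":")
    # Push the fresh value straight out to the cache.
    redis.set("widget:#{id}", stock)
  end
end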
Happy linking!