How to Use Heroku PGBackups

Written by: Barry Jones

Backing up your data is one of the most critical activities for your application. Heroku PGBackups makes the entire experience pretty simple but comes with a lot of flexibility too.

Setup a Test App

Heroku provides a free PostgreSQL database to every Rails application, because Heroku loves you. To walk through the backups functionality, let's start with creating a new Heroku app. If you've never setup a Heroku app before, check out their Getting Started page for the language of your choice.

Heroku provides a getting-started Ruby app that already has PostgreSQL configured, which is what I'll be using.

git clone https://github.com/heroku/ruby-getting-started.git
cd ruby-getting-started
heroku create
git push heroku master
heroku run rake db:migrate
heroku open

Alright, now that we've got our test app working, let's load in some sample data.

heroku run rails console
100.times do |i|
  Widget.create(
    name: "Widget #{i}",
    description: "This is my widget. There are many like it, but this one is mine.",
    stock: i
    )
end

Heroku PGBackups Basics

Now that we have some critical data loaded into our system, we want to make sure that if something bad happens we can get it back to its original state. Heroku PGBackups provides a lot of options for doing this, but we'll step through them all.

Create a manual backup

heroku pg:backups capture

This will begin a backup immediately with a progress meter. You can press CTRL + C to get your prompt back, and the backup will continue in the background. It defaults to running against the main DATABASE_URL but if you have multiple databases you can specify the environment variable that the connection string is stored in to trigger a backup there.

Additionally, if you visit the Heroku Apps web interface and click on your Heroku PostgreSQL database you'll notice a button called PG Backups. From here you can quickly create a database snapshot and view all of your existing snapshots to download or delete them.

Scheduling backups

Manual backups are great if you need to pull a copy of the database to run some tests or save things as they are before doing something major with the data. But for most applications we're going to want those backups to be scheduled regularly. Heroku PGBackups lets you set those schedules from the command line like so:

heroku pg:backups schedule DATABASE_URL --at '01:00 America/New_York'

That just scheduled a backup for 1 a.m. EST every night. Heroku keeps seven daily backups and one weekly backup for every plan. As your plan increases so does your backup retention. Premium plans will include 12-month retention with monthly backups in addition to the daily and weekly backups.

You can see a list of your scheduled backups with:

heroku pg:backups schedules
=== Backup Schedules
DATABASE_URL: daily at 1:00 (America/New_York)

And cancel them with:

heroku pg:backups unschedule DATABASE_URL

Download your backups

Whether it's a manual or scheduled backup, at some point you may want to save it outside of Heroku. The PG Backups interface described earlier will give you point and click access to download any of them. But we're all about automation, so ideally that's something we'd want to do from the command line.

Heroku provides a command to give you an expiring link. You can either use the command with no arguments to get the latest backup, or you can specify the name of a specific snapshot. To use that command to automatically download our image we can utilize curl and pass in the url that the Heroku command gives us.

curl -o latest.dump `heroku pg:backups public-url`

Restoring from backups

Loading data back in from one of these is fairly straight forward as well. Just reference a backup name, an app name::backup name to pull from another one of your apps (like production to staging), or a public url of a pg_dump followed by the config reference for the database to restore the data into. Here are a few examples:

heroku pg:backups restore b101 DATABASE_URL
heroku pg:backups restore myapp-production::b101 DATABASE_URL
heroku pg:backups restore 'https://mys3url/that-leads-to-db.dump' DATABASE_URL --app sushi

Be careful when doing restores this way; it will completely overwrite the target database.

The database dumps are just standard PostgreSQL pg_dump files, so you can easily load them into any other non-Heroku PostgresSQL database using pg_restore. More details on how to do that can be found in the Heroku import/export docs.

You can also make direct copies from one database to another using almost the same syntax (assuming the colors listed below are references to database environment variables).

heroku pg:copy COBALT GREEN
heroku pg:copy myapp-production::ORANGE GREEN

Advanced Backups for Heroku PGBackups

In a production application, depending solely on nightly backups probably isn't going to be good enough. What happens if there's a problem in the middle of the day? Manually triggering a backup every hour might seem like a good idea at first, but that could get seriously out of hand and put a lot of extra stress on the database. Worse yet, what happens when even the nightly backups cause enough stress on the database that you'd like to avoid them?

The Write Ahead Log (WAL) records commits before the commits are actually written to the database. The records of these commits are used to stream read replicas as well as backups to allow point-in-time restoration. There are a number of different ways to take advantage of this to solve our backup problems.

Schedule backups from a follower

If you're on a database plan at the Standard level or up, you can create followers/read replicas. A follower will receive streaming updates; if nightly backups against your primary database are out of the question because of the stress involved, scheduling those backups against a follower will provide the same benefit without degrading application performance.

hbspt.cta.load(1169977, '11903a5d-dfb4-42f2-9dea-9a60171225ca');

Point-in-time recovery

Now, imagine a scenario when some very important data gets accidentally deleted, overwritten, changed, or corrupted somehow. Doesn't matter how it happens, user error, developer error, security breach; all that matters is that it happened.

This is where Heroku's Continuous Protection comes into play. To rollback to a previous point in time, you can visit the web interface and create a rollback -- a clone of your database but at a specific point in time. Just specify the date and time, and poof… instant recovery (relative to the size of your database).

You can also create a rollback from the command line. Continuous Protection is set up using the open source tool WAL-E that allows for continuously archiving the PostgreSQL WAL to Amazon S3.

Real World

Now let's look at a real world scenario. You need to get data from a backup without taking down the running application. Running pg:copy (pg_restore) isn't going to be good enough because that will overwrite the existing database, costing you all of your new data since the backup. Point in time restore would even have the same problem, although less pronounced.

Realistically in such a situation, you restore a copy of your backup or rollback database and then proceed to code in a way to find and merge in the data that was missing. We'll simulate that now.

If you haven't already captured a backup with our sample data, do so now. This is a real world situation so I'm going to be working with a production tier database (Standard or higher) and that will mean utilizing functionality that isn't available to Hobby and Free tier databases to handle the restore.

If you want to follow along, here are the instructions to upgrade.

Now open up a heroku run rails console and run the following to delete half of our data in a semi-random fashion:

Widget.all.each_with_index do |widget, i|
  widget.delete if i % 2 == 0
end

Oh no! Now, using one of the above techniques, either create a rollback or create a new database and restore a backup to it. I'm going with the rollback, so I'll visit the web interface and choose the version of my database from a few minutes ago.

Now, let's connect to our database and make sure it has everything.

$ heroku pg:psql HEROKU_POSTGRESQL_CYAN_URL
=> SELECT COUNT(\*) FROM widgets;
100

Perfect! Now how do we get the data merged? There are a lot of ways to do that. You could write a script that connected to both databases and inserted the missing records or use a desktop client, like Navicat, that included data syncing. But there's a much easier way.

PostgreSQL includes something called Foreign Data Wrappers that allow one PostgreSQL database to connect to a number of external data sources like Redis or another PostgreSQL database. Heroku makes this even easier with data links. You can read more about data links there, but for our purposes we want to link our main database (BRONZE) with our restored backup (CYAN) like so:

$ heroku pg:links create HEROKU_POSTGRESQL_CYAN HEROKU_POSTGRESQL_BRONZE
New link 'sighing_duly_2312' successfully created.
$ heroku pg:links

The latter command will show you all of the links that are configured. If for some reason you get an error after the second command, make sure you've installed the pg-extras plugin:

heroku plugins:install git://github.com/heroku/heroku-pg-extras.git

Now that we're connected, open up your psql prompt and let's merge that data with a query.

heroku pg:psql
INSERT INTO widgets (id, name, description, stock, created_at, updated_at) (
  SELECT *
  FROM sighing_duly_2312.widgets backup
  WHERE backup.id NOT IN (SELECT id FROM widgets)
);

That query does a select against our backup database that we restored for all of the ids that weren't in our main table, then inserts them into our main table for us.

Conclusion

PostgreSQL can manage your data integrity, and Heroku PostgreSQL makes sure it's safe in case of emergency with almost no effort on your part. Restoring is a more complicated beast but thanks to Foreign Data Wrappers (pg:links), merging in just what we need gets a lot easier. Always backup your data.

Stay up to date

We'll never share your email address and you can opt out at any time, we promise.