Hypermedia APIs with Apache CouchDB

Written by: Benjamin Young

RESTful web services are a thing I love to discuss, build, and use. However, most of them stop short at sending JSON (or XML) back and forth over the wire.

I read the documentation. I write some code with URLs in it and tie my JavaScript, Python, or PHP right into the vendor's API as it's currently structured. Any changes to the URLs, and I need to change my code.

What I know about the web doesn't jibe with that. You didn't read any documentation to find the URL Codeship uses for this blog post, the blog's main page, their services list, or their sign-up page. That's because HTML, the original web hypermedia format, has these fabulous things called links. JSON and XML don't.

Let's take a quick dive into a world of browsable, hypermedia APIs.

Hypermedia APIs

Hypermedia APIs are RESTful APIs that use links and media types to avoid some of the chaos of change.

When you visit a web page, you don't have to read the website's documentation to know which thing is something you can click or which part of the site can receive data. The web page has all that information in it. You just click. You fill out forms. You browse.

Web browsers take the HTML documents on the web and make them browsable, make them linked, and make the web updatable via forms. Hypermedia APIs offer that same flexibility and opportunity to code.

Media Type controllery

Hypermedia enables browsable APIs by using media types that are defined to contain hypermedia affordances -- essentially, links and forms.

Sadly, sprinkling URLs into some JSON values doesn't cut it. In essence, application/json is no more a hypermedia, link-containing format than text/plain. They may contain text that is a URL, but they could be anywhere and mean anything. Most importantly, to a JSON parser, they're just strings.
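To make that concrete, here is a small Python sketch. The payload, its `docs_url` key, and the example.com URL are invented for illustration. It shows that a generic JSON parser hands back plain strings, with nothing marking one of them as a link:

```python
import json

# Hypothetical payload: the key names and URL are made up for illustration.
payload = '{"name": "hypercouch", "docs_url": "https://example.com/hypercouch/_all_docs"}'

data = json.loads(payload)

# Both values come back as ordinary Python strings; the parser attaches no
# "this is a link" meaning to either of them.
value_types = {key: type(value).__name__ for key, value in data.items()}
print(value_types)
```

Any knowledge that `docs_url` is followable has to live in the client code or the documentation, not in the format.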

Hypermedia-enabled media types, however, contain this information within their definitions. JSON-based examples include HAL (application/hal+json) and JSON-LD (application/ld+json), among others.

Many of these also have more specific extensions or additional vocabularies such as Hydra and Schema.org Actions for JSON-LD. In this case, the media type stays the same, but the contents have additional "meaning" specific to APIs.

For today, we're going to run with HAL, the Hypermedia Application Language.

Apache CouchDB: an HTTP Database

Apache CouchDB is a NoSQL database that uses JSON and HTTP for its entire API. It's about as simple and straightforward an API as you'll find. It's also a great foundation for building CRUD applications quickly.

However, it lacks any hypermedia affordances, which means if you land at any of its endpoints, you'll have no idea where to go next or what else may be possible.

The stock API is about as simple as it gets:

  • /{db}

    • metadata about the database (size, number of documents, etc)

    • and the POST endpoint to create JSON documents

  • /{db}/{doc}

    • CRUD (or GPD: GET, PUT, DELETE) for a specific document

  • /{db}/_all_docs

    • the list of documents, returned as a rows array inside a JSON object

All three of these endpoints return "pure JSON," typically sent as application/json. Handy, if you know what you're getting at each endpoint (a.k.a., you've read the docs). However, if all you'd been given was the /{db} URL, you'd have no idea how to even find the list of documents within it.

What would this API be like if you didn't have to read the docs? What would it mean to be able to "browse your database" from your code? Let's hypermedia this thing.

HyperCouch(DB)

HyperCouch is an exploration of what CouchDB might be like with a hypermedia API. It wraps the core CouchDB API in various hypermedia formats. It can also be used to hypermedia-enable a CouchDB-based data store such as your next API.

Here are the standard responses to the three endpoints referenced above, along with the new Hypermedia Application Language variations:

GET /{db}

{
  "db_name":"hypercouch",
  "doc_count":2,
  "doc_del_count":0,
  "update_seq":2,
  "purge_seq":0,
  "compact_running":false,
  "disk_size":12393,
  "data_size":7377,
  "instance_start_time":"1459778528583743",
  "disk_format_version":6,
  "committed_update_seq":2
}

Just a pure JSON object with some numbers in it.

Now. If you'd asked for that first and wanted to see the contents of the database, where would you go? Don't go search for the CouchDB docs! Your code can't.

Let's take a look at a HAL variation:

{
  "db_name":"hypercouch",
  "doc_count":2,
  "doc_del_count":0,
  "update_seq":2,
  "purge_seq":0,
  "compact_running":false,
  "disk_size":12393,
  "data_size":7377,
  "instance_start_time":"1459778528583743",
  "disk_format_version":6,
  "committed_update_seq":2,
  "_links": {
    "self": { "href": "/hypercouch/" },
    "index": { "href": "/hypercouch/_all_docs" }
  }
}

HAL+JSON documents are essentially "just JSON" plus a _links object and (optionally) an _embedded object (which we'll look at shortly). Each key within the _links object is the link relation of the link it contains. The included href does what any old href does: It contains a URL. In this case, it's a small additional bit of content that routes the code (or user) to the appropriate endpoint for the primary index within the database.
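Here is a minimal sketch of how client code might read such a link. The `link_href` helper is a hypothetical name, not part of HAL or HyperCouch, and the document is a trimmed copy of the response above:

```python
from typing import Optional


def link_href(hal_doc: dict, rel: str) -> Optional[str]:
    """Return the href for a link relation in a HAL document, or None."""
    link = hal_doc.get("_links", {}).get(rel)
    # HAL allows a relation to hold either one link object or a list of them;
    # this sketch simply takes the first entry of a list.
    if isinstance(link, list):
        link = link[0] if link else None
    return link.get("href") if link else None


db_info = {
    "db_name": "hypercouch",
    "doc_count": 2,
    "_links": {
        "self": {"href": "/hypercouch/"},
        "index": {"href": "/hypercouch/_all_docs"},
    },
}

print(link_href(db_info, "index"))  # /hypercouch/_all_docs
```

The client asks for a relation name ("index") rather than building the URL itself, so the server stays free to move that endpoint later.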

Let's look at the other two endpoints.

GET /{db}/{doc}

{
  "_id": "db74fd2411e1a0da283b3f0fc8000074",
  "_rev": "1-1d7fcf86f6f6ec8d0e63f6de7af5ce81",
  "first_name": "Benjamin"
}

The _id and _rev values are reserved by CouchDB and record the name and revision identifier of the document.

{
  "_id": "db74fd2411e1a0da283b3f0fc8000074",
  "_rev": "...",
  "first_name": "Benjamin",
  "_links": {
    "self": { "href": "/hypercouch/db74fd2411e1a0da283b3f0fc8000074" },
    "index": { "href": "/hypercouch/_all_docs" },
    "collection": { "href": "/hypercouch/" }
  }
}

The _links object here feels at home among the native underscore-prefixed CouchDB reserved keys. In this case, it contains the link relations we saw earlier for self and index. I've also added a collection link relation from RFC 6573: "The Item and Collection Link Relations." Essentially, it points back to the database this document is contained within.

GET /{db}/_all_docs

{
  "total_rows":2,
  "offset":0,
  "rows":[
    {
      "id": "_design/hal",
      "key": "_design/hal",
      "value": { "rev":"1-0b1f6e8f2662fc5e7efcd9f930acec60" }
    },
    {
      "id": "db74fd2411e1a0da283b3f0fc8000074",
      "key": "db74fd2411e1a0da283b3f0fc8000074",
      "value": { "rev":"1-1d7fcf86f6f6ec8d0e63f6de7af5ce81" }
    }
  ]
}

If CouchDB served its own JSON-based media type for this endpoint, it could have been defined to state that id here was a relative URL. However, since that didn't happen, we'll upgrade this one to HAL also. The changes to this one though are more significant and change the shape of the original JSON:

{
  "total_rows":2,
  "offset":0,
  "_embedded":{
    "item": [
    {
      "id": "_design/hal",
      "key": "_design/hal",
      "value": { "rev":"1-0b1f6e8f2662fc5e7efcd9f930acec60" },
      "_links": {
        "self": { "href": "/hypercouch/_design/hal" }
      }
    },
    {
      "id": "db74fd2411e1a0da283b3f0fc8000074",
      "key": "db74fd2411e1a0da283b3f0fc8000074",
      "value": { "rev":"1-1d7fcf86f6f6ec8d0e63f6de7af5ce81" },
      "_links": {
        "self": { "href": "/hypercouch/db74fd2411e1a0da283b3f0fc8000074" }
      }
    }
    ]
  },
  "_links": {
    "self": { "href": "/hypercouch/_all_docs" },
    "collection": { "href": "/hypercouch/" }
  }
}

The _embedded object contains partial representations of the embedded resources. Each of these partial representations includes its own _links object, whose self link can be used to retrieve the entire representation -- which of course contains links back to the index and collection.
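As a rough sketch of that reshaping, the Python below turns a stock _all_docs response into the HAL form. The `all_docs_to_hal` function is a hypothetical illustration, not HyperCouch's actual code, and it assumes document IDs can be placed in a URL path without escaping:

```python
def all_docs_to_hal(db_name: str, all_docs: dict) -> dict:
    """Reshape a CouchDB _all_docs response into a HAL-style document."""
    items = []
    for row in all_docs.get("rows", []):
        item = dict(row)  # shallow copy so the input is left untouched
        # Assumption: the ID is safe to use in a path as-is (no escaping).
        item["_links"] = {"self": {"href": f"/{db_name}/{row['id']}"}}
        items.append(item)
    return {
        "total_rows": all_docs.get("total_rows"),
        "offset": all_docs.get("offset"),
        "_embedded": {"item": items},
        "_links": {
            "self": {"href": f"/{db_name}/_all_docs"},
            "collection": {"href": f"/{db_name}/"},
        },
    }


plain = {
    "total_rows": 1,
    "offset": 0,
    "rows": [
        {
            "id": "db74fd2411e1a0da283b3f0fc8000074",
            "key": "db74fd2411e1a0da283b3f0fc8000074",
            "value": {"rev": "1-1d7fcf86f6f6ec8d0e63f6de7af5ce81"},
        }
    ],
}

hal = all_docs_to_hal("hypercouch", plain)
print(hal["_embedded"]["item"][0]["_links"]["self"]["href"])
```

The rows array moves under _embedded.item, and every row gains a self link, which is the shape change described above.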

Now, with these links in place, client code could start at any one of these endpoints and could follow the values of those link relationships -- rather than hard coding URLs -- to find its way around the database.
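The sketch below walks those links in Python. To stay runnable without a server, it assumes an in-memory table of trimmed responses in place of real HTTP requests; `fetch` and `follow` are hypothetical helper names:

```python
DOC_ID = "db74fd2411e1a0da283b3f0fc8000074"

# Stand-in for the HTTP server: URL -> trimmed HAL document from the examples.
RESOURCES = {
    "/hypercouch/": {
        "db_name": "hypercouch",
        "_links": {
            "self": {"href": "/hypercouch/"},
            "index": {"href": "/hypercouch/_all_docs"},
        },
    },
    "/hypercouch/_all_docs": {
        "total_rows": 2,
        "_embedded": {
            "item": [
                {"id": "_design/hal",
                 "_links": {"self": {"href": "/hypercouch/_design/hal"}}},
                {"id": DOC_ID,
                 "_links": {"self": {"href": "/hypercouch/" + DOC_ID}}},
            ]
        },
        "_links": {
            "self": {"href": "/hypercouch/_all_docs"},
            "collection": {"href": "/hypercouch/"},
        },
    },
    "/hypercouch/" + DOC_ID: {
        "_id": DOC_ID,
        "first_name": "Benjamin",
        "_links": {
            "self": {"href": "/hypercouch/" + DOC_ID},
            "index": {"href": "/hypercouch/_all_docs"},
            "collection": {"href": "/hypercouch/"},
        },
    },
}


def fetch(url: str) -> dict:
    """Pretend GET: look the URL up in the in-memory table."""
    return RESOURCES[url]


def follow(start_url: str, rels: list) -> dict:
    """Fetch start_url, then follow each link relation in turn."""
    doc = fetch(start_url)
    for rel in rels:
        doc = fetch(doc["_links"][rel]["href"])
    return doc


# Start at a single document and find every document in its database.
index = follow("/hypercouch/" + DOC_ID, ["collection", "index"])
doc_urls = [item["_links"]["self"]["href"] for item in index["_embedded"]["item"]]
print(doc_urls)
```

Only the starting URL is written into the client. Every later URL comes out of a response, so swapping `fetch` for a real HTTP call would not change the navigation logic.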

Write Semantics

One thing HAL leaves out of the equation is in-line hints about which methods can be used at which URL. Thankfully, HTTP has that skill.

The HTTP Allow header is intended to provide a comma-separated list of the HTTP method names available on the requested resource. HTTP also comes with a handy HEAD method that lets you fetch only the meta information about the resource -- without the overhead of GETing the entire representation.

Alternatively, it's possible to simply attempt a PUT (for overwrite/update) or POST (for append) operation and see how it goes. If the server's doing its job, it should respond with a 405 Method Not Allowed error. If it's really doing its job well, then that response will have the Allow header.
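Either way, the client ends up reading an Allow header. Here is a small Python sketch of that step; the function names and the sample header values are assumptions for illustration, and the headers are passed in as a plain dict rather than fetched over the network:

```python
from typing import Optional


def parse_allow(header_value: str) -> set:
    """Split an Allow header value into a set of upper-cased method names."""
    return {part.strip().upper() for part in header_value.split(",") if part.strip()}


def allowed_methods(headers: dict) -> Optional[set]:
    """Return the methods a response advertises, or None if it has no Allow header."""
    # Header names are case-insensitive, so normalise before looking up.
    normalised = {name.lower(): value for name, value in headers.items()}
    allow = normalised.get("allow")
    return None if allow is None else parse_allow(allow)


# Hypothetical headers from a HEAD request or a Method Not Allowed response.
sample_headers = {"Content-Type": "application/hal+json", "Allow": "GET, HEAD, PUT"}

methods = allowed_methods(sample_headers)
print("PUT" in methods)   # True
print("POST" in methods)  # False
```

Returning None when the header is absent keeps "the server said nothing" distinct from "the server allows nothing", which matters when deciding whether to simply try the request.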

However, there are efforts, such as the in-progress HAL Forms specification, that allow developers to encode HTML-style form information directly in the HAL response.

HAL Browsing

The HAL community has its own (minimal) browser for finding your way around these HAL responses, known as (obviously enough) the HAL Browser. It loads an example API by default, but it also works on localhost URLs or your very own hypercouch.dev instance.

In the end, your next API may be "just JSON" over HTTP. However, your API will change, and when it does, how will you address those changes? The web, thanks in part to links and media types, has survived a couple decades of incremental change. Perhaps it's time our APIs were this resilient.
