Global PaaS apps - your apps in EU and US with geo-aware DNS and failover

In this post I would like to take you through how to set up geo-location aware balancing via DNS provided by Amazon Route53 (and a bit of how it works), with apps in both the US and the EU.

Geo-location aware means that users get directed to the region that is closest to them, latency-wise. Generally this is for user experience, but it can also help with things like EU cookie rules, and with fail-over (a region being unavailable, or a network problem - which looks like the same thing to the user anyway!).

Conceptually, this looks like: 

How it works: a client wants to access an app, so it talks to its local DNS resolver, which in turn gets its data from the Route53 DNS service. Route53 decides, based on latency and availability, the best destination for the request, and then directs the client to go directly there. Meanwhile, Route53 continuously checks the health of the endpoints so that the app stays reachable even if a region is unavailable (due to failure, network or otherwise).
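The decision described above can be pictured with a small sketch. It only illustrates the idea and is not Amazon's actual algorithm: the `Endpoint` type, the `pick_endpoint` function and the latency numbers are all made up for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    region: str        # label invented for this sketch, e.g. "us" or "eu"
    ip: str
    latency_ms: float  # assumed latency from the client's resolver to the region
    healthy: bool      # result of the most recent health check


def pick_endpoint(endpoints: Sequence[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest latency, or None if all are down."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.latency_ms)


if __name__ == "__main__":
    us = Endpoint("us", "75.101.143.131", latency_ms=140.0, healthy=True)
    eu = Endpoint("eu", "176.34.251.5", latency_ms=25.0, healthy=True)
    print(pick_endpoint([us, eu]).region)       # eu: closest healthy region

    eu_down = Endpoint("eu", "176.34.251.5", latency_ms=25.0, healthy=False)
    print(pick_endpoint([us, eu_down]).region)  # us: fail-over when EU is unhealthy
```

A client near the EU normally gets the EU address, and only falls back to the US one when the EU health check fails.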

Sounds simple? OK, let's set it up:

1. You need to start with an app. In my case I ran this simple Clickstart for Clojure (purely because it is a trivial app deployed as a war file - any app will do really).

2. Change your build config to deploy the app to the EU as well as the US, so that in response to a change to your code it will build, test, and then deploy to both regions. Run a build to deploy it:

(pick a different name for each region - the name doesn’t matter too much - but they must be different)

3. You need to tell each app to use a URL whose name servers are provided by Route53. I have already set up a DNS name, *.beeapp.us, to do the geo DNS magic automatically (for testing purposes only). For example:

bees app:update -a yourAppId aliases=globalapp.beeapp.us
bees app:update -a yourAppIdEU -ep eu aliases=globalapp.beeapp.us
(this is my example alias - make up your own!)

You are then done! Any further changes will be deployed to both regions.
Now, when you go to your alias (globalapp.beeapp.us in my example) you will be routed to the closest available region of your “global” app.

Verifying

Verifying my example - you can see each app responding differently in different areas based on the IP returned:

From somewhere in EU:

nslookup globalapp.beeapp.us
Non-authoritative answer:
Name: globalapp.beeapp.us
Address: 176.34.251.5

From somewhere in the US/Pacific:

nslookup globalapp.beeapp.us
Non-authoritative answer:
Name: globalapp.beeapp.us
Address: 75.101.143.131
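If you prefer to script the check, a minimal sketch could look like the following. The two IP addresses are the ones from the nslookup output above; the `KNOWN_ENDPOINTS` table, the region labels and the `region_for` helper are names invented for this example, and the resolver is passed in so the function can be tried without network access.

```python
import socket
from typing import Callable

# IP addresses taken from the nslookup output above; the region labels are
# just names chosen for this sketch.
KNOWN_ENDPOINTS = {
    "176.34.251.5": "EU",
    "75.101.143.131": "US",
}


def region_for(hostname: str,
               resolve: Callable[[str], str] = socket.gethostbyname) -> str:
    """Resolve hostname and report which region's endpoint the answer points to."""
    ip = resolve(hostname)
    return KNOWN_ENDPOINTS.get(ip, f"unknown ({ip})")
```

Calling `region_for("globalapp.beeapp.us")` from machines in different parts of the world should then report different regions, as long as the demo name still exists.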

Now, the *.beeapp.us service is just for demonstration purposes (it is for testing only and may disappear one day). For production use you will need your own Route53 setup with Amazon (we can help you - just ask). I will take you through the steps and the thinking now:

Route 53 Setup

This is how I set the above up. It is simple, I guess, but requires some understanding of DNS concepts.
Go to your Route53 console in AWS, and set up some health checks for the endpoints you are going to be accessing with DNS. These can be simple TCP “alive” checks, or HTTP checks that indicate something about the health of your app:
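To get a feel for what a TCP “alive” check amounts to, here is a minimal sketch of one. It is not how Route53 implements its checks; the `tcp_alive` function and its default timeout are assumptions made for this example.

```python
import socket


def tcp_alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

Something like `tcp_alive("176.34.251.5", 80)` is handy for confirming an endpoint is reachable before you point a health check at it. An HTTP check goes one step further and also looks at the response from your app.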

Next you need to create your hosted zone for a DNS name you own:

You will need to tell your DNS provider about the Amazon-provided name servers (Amazon will tell you how, but it is really just a list of name server host names that you copy and paste from the AWS console to your DNS provider).

Next you have to click on “create record set” on your newly created hosted zone, and set up 2 wildcard entries for the name pattern you want your apps to have (they don’t have to be wildcards, but in this case I wanted anything that matches the pattern to be resolved accordingly via DNS):

(Note the name servers that Amazon provides - your DNS provider will need to know these.) You can see I set up *.beeapp.us twice, as A records (an IP address), but the value can be a name if you like. The entry is duplicated because each “Value” corresponds to the US or EU endpoint I want it to resolve to (based on latency). Each record should look something like:

You can see I have set the TTL to 1 minute, and that the record is latency based, according to Amazon's definition of a geographic region. Down the bottom you will see it bound to the health check we created earlier.

See, easy? Well, maybe not trivial, but certainly easier than it used to be.
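For those who would rather script this than click through the console, the two records could be described roughly as below. This is a sketch under assumptions: the region names `us-east-1` and `eu-west-1` and the health check IDs are placeholders I have picked for illustration (use the values from your own setup), and `latency_record` and `build_change_batch` are helper names invented here. The dictionary follows the change batch shape used by the Route 53 `ChangeResourceRecordSets` API.

```python
from typing import Any, Dict

# (region, IP, health check ID). The IPs are the ones from the post; the
# region names and health check IDs are placeholders, not real values.
ENDPOINTS = [
    ("us-east-1", "75.101.143.131", "HEALTH-CHECK-ID-US"),
    ("eu-west-1", "176.34.251.5", "HEALTH-CHECK-ID-EU"),
]


def latency_record(name: str, ip: str, region: str, health_check_id: str,
                   ttl: int = 60) -> Dict[str, Any]:
    """Build one latency-based A record change."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": f"{name}-{region}",  # must differ between the two records
            "Region": region,                     # makes the record latency based
            "TTL": ttl,                           # 60 seconds = 1 minute
            "ResourceRecords": [{"Value": ip}],
            "HealthCheckId": health_check_id,     # ties the record to a health check
        },
    }


def build_change_batch(name: str = "*.beeapp.us") -> Dict[str, Any]:
    """Build a change batch with one latency record per endpoint."""
    return {
        "Comment": "latency-based wildcard records for US and EU",
        "Changes": [latency_record(name, ip, region, hc)
                    for region, ip, hc in ENDPOINTS],
    }
```

With boto3 installed and credentials configured, the batch could be submitted with `boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=build_change_batch())`, where the hosted zone ID is your own.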

Databases

Well, this is the challenge!
There is no easy answer for this, at least not as easy as the above. It depends more on your application than on the technology. Can your app cope with data being temporarily unavailable? Can it cope with partitions, data merging, high latency, and more? Each application has a different answer to this.

For example, say you are using an RDBMS - you could have the master in one region and a slave in another, with synchronous replication. This means that you always have to write to the master, but network partitions are not uncommon: the master may not be available, or the slave may not be reachable from the master. Do you stop accepting transactions? OK, try master-master - how do you deal with “split brain” and reconciling conflicts once the network joins up again?

Making this more complicated is latency - the time to commit/write a transaction - even when the network is perfect, it is still high enough that the user experience can suffer.

Hence people like to choose multi-master or NoSQL distributed data stores - but there is no escaping the fact that you will have network partitions, and you will come across design challenges with merging data that has been changed in two places. Modelling your data so that it uses “append only” stores (i.e. data isn’t changed so much as new transactions are written - CouchDB style) makes it easier, as does “natural sharding”, where your data is naturally different for the different regions (this makes the fail-over scenario less appealing though).
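To show why “append only” makes merging easier, here is a minimal sketch of reconciling two regional logs after a partition heals. It is not how CouchDB or any particular store does it; the `Event` type and `merge_logs` function are invented for this example, and it assumes every event carries a globally unique ID assigned by whoever wrote it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    event_id: str     # globally unique, e.g. a UUID generated by the writer
    timestamp: float  # writer's clock; only used to order the merged log
    region: str
    payload: str


def merge_logs(*logs: Iterable[Event]) -> List[Event]:
    """Union several append-only logs, dropping duplicates by event_id."""
    seen: Dict[str, Event] = {}
    for log in logs:
        for event in log:
            seen.setdefault(event.event_id, event)
    # Sort by timestamp, with event_id as a tie-breaker so the order is stable.
    return sorted(seen.values(), key=lambda e: (e.timestamp, e.event_id))
```

Because events are only ever added, the merge is a plain union: it does not matter which region's log is read first, and there is nothing to "undo". Updating records in place gives you no such guarantee, since two regions can change the same row in conflicting ways.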

There is no silver bullet for data - in some ways this is the “hard part”. Not to sound discouraging, but it is hard to be generic, as every app has different needs (this also explains why making resilient multi-zone regions is a key aim of Amazon: the latency within a region is low enough that you don’t have to make too many compromises in the name of high availability).

Still, you may find your app, or part of it, can work well as a “global app” - in which case good for you! And go for it, it could be easier than you think.
