About PayPal's Node vs Java “fight”


So far I have held back from writing this blog post… but today in my email inbox I saw the following:

[Image: screenshot, 2013-12-13 13:38:29]

Yep, somebody is pimping their book, because of PayPal’s switch from Java to Node.js.

Let’s get one thing clear up front… namely

Which is the “faster” virtual machine…

I like the JVM. I think it is a superb piece of engineering. When it first came out, Java was dog slow… but by 2006 it was quite fast. At the time, as I was using Java for writing web applications, my brother asked for some help playing with different protein folding modelling algorithms. I knocked a couple of them out in Java and started running them while working on hand tuning the algorithms in C. Back in 2006, once the first 10,000 iterations had passed, the JVM’s full steam optimisations kicked in. My best hand tuned C versions of the algorithms were at least 20% slower than the JVM, so my immediate thought was “I must be useless at hand tuning C”. So then I implemented a standard algorithm for protein folding in Java and pitted it against the best of breed native code version that my brother’s supervisor had spent grant money getting optimised… the JVM was still faster in server mode after the compilation threshold had kicked in… by somewhere between 10 and 15%.

Of course the reason is that the JVM can optimise for the exact CPU architecture that it is running on and can benefit from call flow analysis  as well as other things. That was Java 6.
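The warm-up effect described above can be observed with a rough timing loop. What follows is only a minimal sketch: the kernel is a made-up stand-in for the protein folding code (which is not available here), the class and method names are hypothetical, and a proper measurement would use a benchmark harness rather than System.nanoTime. On a typical HotSpot JVM the first batches tend to take longer than the later ones, but the exact numbers depend on the JVM version and flags.

```java
public class WarmupSketch {

    // Hypothetical CPU-bound kernel standing in for the real algorithm.
    static double kernel(int n) {
        double sum = 0.0;
        for (int i = 1; i <= n; i++) {
            sum += Math.sqrt(i) / (i + 1.0);
        }
        return sum;
    }

    // Calls the kernel `calls` times and returns the elapsed time in nanoseconds.
    static long timeBatch(int calls, int n) {
        double sink = 0.0;
        long start = System.nanoTime();
        for (int c = 0; c < calls; c++) {
            sink += kernel(n);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the JIT cannot discard the work as dead code.
        if (sink == 42.0) {
            System.out.println(sink);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Same amount of work in every batch; only the JIT state changes.
        for (int batch = 0; batch < 10; batch++) {
            long nanos = timeBatch(2_000, 1_000);
            System.out.printf("batch %d: %.2f ms%n", batch, nanos / 1e6);
        }
    }
}
```

Any comparison against native code is only fair once the later, steady-state batches are the ones being compared.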

Node.js runs on the V8 JavaScript virtual machine. That is a very fast virtual machine for JavaScript. When you are dealing with JavaScript you have an advantage: because JavaScript is single threaded, you can make all sorts of optimisations that you cannot achieve with the JVM, because the Java virtual machine has to handle potentially multi-threaded code. The downside of V8 is that it is dealing with JavaScript, a language which can provide far fewer hints to the virtual machine. Type information has to be inferred, and some JavaScript programming patterns make such type inference almost impossible.

So which is the faster virtual machine, V8 or JVM? My belief is that when you are coding in the most basic building blocks of either virtual machine (i.e. JavaScript vs JVM byte code) the JVM will win out every time. If you start to compare higher up, you may end up comparing apples with oranges and see false comparisons. For example, consider this comparison of V8 vs JVM performance at mathematical calculations. This blog post tells us that if we relax our specifications we can calculate things faster. Here is the specification that V8 must use for Math.pow and here is the specification that the JVM must use for Math.pow. Notice that the JavaScript specification allows for an “implementation-dependent approximation” (of unspecified accuracy) while the JVM version has the addition that

The computed result must be within 1 ulp of the exact result. Results must be semi-monotonic.

And there are additional restrictions about when numbers can be considered to be integers. V8 has a faster version of Math.pow because the specification that it is implementing allows for a faster version. If we throw off the shackles of the JVM runtime specification we can (and do if you read the blog post) get an equivalently fast result… and if it turns out that we don’t even need the accuracy of V8’s implementation, we can make more trade-offs and get even faster performance.
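To make the trade-off concrete, here is a minimal sketch of a relaxed power function. It is not the implementation from the blog post referred to above, and it is not what V8 does; fastPow and ulpError are hypothetical names. It handles integer exponents only, uses nothing but multiplications, and makes no promise about staying within 1 ulp, which is exactly the kind of guarantee that Math.pow has to pay for.

```java
public class PowSketch {

    // Integer power by repeated squaring. Cheap, but rounding error can
    // accumulate, so there is no 1 ulp guarantee as there is for Math.pow.
    static double fastPow(double base, int exp) {
        long e = Math.abs((long) exp); // widen first so Integer.MIN_VALUE is safe
        double result = 1.0;
        double b = base;
        while (e > 0) {
            if ((e & 1L) == 1L) {
                result *= b;
            }
            b *= b;
            e >>= 1;
        }
        return exp < 0 ? 1.0 / result : result;
    }

    // Distance between two results, measured in ulps of the reference value.
    static double ulpError(double approx, double reference) {
        return Math.abs(approx - reference) / Math.ulp(reference);
    }

    public static void main(String[] args) {
        double x = 1.0000001;
        int n = 1_000_000;
        double strict = Math.pow(x, n);
        double fast = fastPow(x, n);
        System.out.println("Math.pow: " + strict);
        System.out.println("fastPow:  " + fast);
        System.out.println("difference in ulps: " + ulpError(fast, strict));
    }
}
```

A benchmark that pits something like fastPow against Math.pow is measuring two different specifications, not two virtual machines.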

My point is this:

You will see people throw out micro-benchmarks showing that the JVM is faster than V8 or V8 is faster than the JVM. Unless those benchmarks are comparing like for like, the innate specification differences between the two virtual machines will likely render such comparisons useless. A valid comparison would be between say Nashorn or DynJS and V8. At least then we are comparing on top of the same specification… 

What are PayPal comparing?

Here is what we know about PayPal’s original Java application:

  • It uses their internal framework based on Spring
  • Under minimum load the best page rendering time was 233ms
  • It doesn’t scale very well reaching a plateau at about 11 requests per second.

Here is what we know about PayPal’s Node.js application:

  • It uses their internal kraken.js framework
  • Under minimum load the best page rendering time was 249ms
  • It scales better than the Java application but still doesn’t scale very well.

So we are comparing two crappy applications in terms of scalability and concluding that because the Node.js one scales slightly better, then Node.js is better than Java.

I can only say one thing…

[Image: screenshot, 2013-12-13 15:26:48]

What we can conclude is that the internal Spring-based framework is overly complex for the task at hand. As Baron Schwartz says:

really? 1.8 pages/sec for a single user in Java, and 3.3 in Node.js? That’s just insanely, absurdly low if that amount of latency is really blamed on the application and the container running it. If the API calls that it depends on aren’t slow, I’d like to see many hundreds of pages per second, if not thousands or even more. Someone needs to explain this much more thoroughly.
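To put those figures in perspective, here is a small sketch of the arithmetic, assuming a single user who requests pages back to back with no think time (my assumption, not something stated in the quote). The class and method names are made up for illustration. Under that assumption 1.8 pages per second is roughly 556 ms per page and 3.3 pages per second is roughly 303 ms per page, whereas even 500 pages per second would mean 2 ms per page.

```java
public class ThroughputSketch {

    // Time spent on each page when one user requests pages back to back.
    static double millisPerPage(double pagesPerSecond) {
        return 1000.0 / pagesPerSecond;
    }

    // The inverse: pages per second that one back-to-back user would see.
    static double pagesPerSecond(double millisPerPage) {
        return 1000.0 / millisPerPage;
    }

    public static void main(String[] args) {
        System.out.printf("1.8 pages/s -> %.0f ms per page%n", millisPerPage(1.8));
        System.out.printf("3.3 pages/s -> %.0f ms per page%n", millisPerPage(3.3));
        System.out.printf("500 pages/s -> %.0f ms per page%n", millisPerPage(500.0));
    }
}
```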

Who is to say what performance they would have been able to achieve if they had built their Java application on a more modern framework? Spring brings a lot of functionality to the table. Likely far too much functionality. Most people are moving away from the monolithic application and moving towards smaller more lightweight frameworks… but if you have a corporate mandated framework that you must use when developing Java applications in-house… well you may not have much choice. On the other hand, if you move to a different technology stack there may be no corporate framework that you have to use.

Now we come to the second “benefit”, namely faster development.

We are told from the PayPal blog post that at the comparison point both applications had the same set of functionality…

Are we sure? How much functionality was the in-house Spring-based framework bringing to the table “for free” (or more correctly for a performance cost)?

I am not defending the in-house Spring framework, but I do find it a stretch to believe that the two applications were delivering the entirety of equivalent functionality. I do believe that the context-specific functional tests were passed by both applications. So this tells us that the user will not see a difference between the two applications. But what about logging requests, transactions, etc? What about scalability and load reporting? I don’t want to defend the in-house Spring framework, in part because I find Spring to be an over-baked framework to start with, but potentially that framework is bringing a lot more to the table. If we threw all that extra “goodness” out would the Java developers have been able to develop the application faster? If we asked the Node.js developers to add all that extra “goodness” would they have been able to deliver as fast?

It is likely that we will never know the answers to these questions. What we do know is that the extra “goodness” that the in-house framework adds appears to be a waste of time, as they are happy to go into production without it.

In other words, the in-house framework sounds a bit like one of these (at least from the point of view of somebody writing this specific application):

[Image]

So it would not surprise me to hear that you can develop an application, when released from the shackles of an in-house framework, in 33% fewer lines of code and with 40% fewer files…

  • Spring as a base framework loves to have many many small files with lots of boilerplate
  • Even with annotations Spring can be rather XML heavy

If you were using a more modern Java based framework, likely you would not have had the same restrictions. For example I like using Jersey as my base framework, I find that it needs very little boilerplate and helps you to keep clear of the multi-threaded cargo cults that a lot of developers fall into. Node.js also helps you keep clear of the multi-threaded cargo cults… by forcing you to live in a single-threaded world.

OK, so the in-house framework is over-baked and delivers very bad performance, so all we are left with in terms of benefits is that the Node.js version was

Built almost twice as fast with fewer people

Well first off, two people can develop faster than a team of five people when you are working in a close-knit codebase. The application itself has three routes. If you have a team of up to three developers, you give each one a route and let them code that route out. If you have more than three developers you will have more than one developer per route, which means that they will either end up pair-programming or stepping on each other’s toes. Add on top the unclear difference in delivered specification, i.e. the added “goodness” of the in-house framework… which will require hooking up before you even get out the gate… All we can really say is that this is probably at best an unfair comparison and at worst an apples to oranges comparison.

So what can we conclude?  

The above was my original thought when I read the PayPal blog post. 

  • I think that, in the scope of this application, the in-house framework was over-engineered on top of the over-baked Spring framework. It probably does not bring much real value to the table and only costs a significant performance hit.
  • Any solution built on top of the JVM would have technically been able to be “integrated” with the in-house framework.
  • The only political route to avoid the in-house framework was to ditch the JVM
  • Node.js is simultaneously “just cool enough” and “just serious enough” to be a valid non-JVM candidate (you could try Ruby, but that’s been around long enough that there is likely an in-house framework for that too… and anyway you can run Ruby on the JVM… so it may not be the escape you need)

My take-home for you, the persistent reader who has read all my ramblings in this post…

Don’t build your app on top of a pile of crap in-house framework.

PayPal may have ditched one pile of crap framework based on Spring. What is not clear is whether the scalability limits in their Node.js in-house framework (i.e. 28 simultaneous clients with 64 instances) are a limit of their new framework or a limit of Node.js itself when used with the backing APIs that they have to call through to.

Time will tell, but don’t jump from one platform to another just because apples are greener than oranges.

Update

Just to be clear, this post is not intended as a criticism of PayPal, PayPal’s internal frameworks, or their decision to switch from Java to Node.js.

The intention of this post is to criticise anyone who cites a performance gain from 1.8 pages per second to 3.3 pages per second in what cannot be a CPU bound web application as being the primary reason to switch from Java to Node.js.

Similarly anyone citing PayPal’s blog as evidence that Java web development is harder than Node.js is mis-using the evidence. The only evidence on ease of development from PayPal’s blog is that their internal Node.js framework is easier to develop for than their internal Spring-based framework.

My personal opinion is that there were other non-performance related reasons for the switch. For example the reactive programming style enforced by Node.js’s single threaded model may suit the application layer that the switch was made in better than the Spring-based framework that the Java version was written in. Similarly, it may be that the responsible architect analysed the requirements of this tier and came to the conclusion that a lot of what the internal framework brings to the table is just not required for this tier. It is a pity that such detail was not provided in their blog post announcing their switch, as without such detail their blog post is being incorrectly used by others to draw conclusions that are just not supported by the data presented in that blog post. Hopefully PayPal’s development team will provide some of this additional information and analysis that was unfortunately lacking in their first blog post.

Finally, we should always remember that premature optimization is a major root of bugs and performance issues in software engineering. If the application tier they are developing in Node.js is not the bottleneck, in fact until it is proven to be the bottleneck, there is no need to worry about whether it is written in the most performant language or framework. What is most important with those elements that are not the bottleneck is that they be written in the simplest form so that if they do become the bottleneck later on (due to optimization of the current slowest moving part) it will be easy to rework them.

For a tier with just three routes that is the front-end and likely calling through to multiple APIs, my gut tells me that a reactive framework such as Node.js or Vert.x will give you a very simple expression of the required logic without becoming the bottleneck. Perhaps that was the real reason why Node.js was considered as an experiment for this tier.
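As a rough illustration of that reactive style, here is a sketch of a single route that fans out to two backing calls and combines the results once both have completed. It uses plain CompletableFuture from the JDK rather than Node.js or Vert.x, and the route and the two backing calls (fetchProfile, fetchBalance) are invented for the example; a real tier would make non-blocking HTTP calls instead of returning canned strings.

```java
import java.util.concurrent.CompletableFuture;

public class ReactiveRouteSketch {

    // Hypothetical backing API call, simulated with a canned value.
    static CompletableFuture<String> fetchProfile(String userId) {
        return CompletableFuture.supplyAsync(() -> "profile:" + userId);
    }

    // Second hypothetical backing API call.
    static CompletableFuture<String> fetchBalance(String userId) {
        return CompletableFuture.supplyAsync(() -> "balance:" + userId);
    }

    // One "route": start both calls, then combine the results when both
    // have completed. No thread waits inside the route itself.
    static CompletableFuture<String> accountOverview(String userId) {
        CompletableFuture<String> profile = fetchProfile(userId);
        CompletableFuture<String> balance = fetchBalance(userId);
        return profile.thenCombine(balance, (p, b) -> "{" + p + ", " + b + "}");
    }

    public static void main(String[] args) {
        // join() blocks only here, at the edge of the demo.
        System.out.println(accountOverview("42").join());
    }
}
```

The logic for the route fits in three lines, which is the kind of simple expression that makes such a tier easy to rework if it ever does become the bottleneck.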

 

Stephen Connolly
Elite Developer and Architect
CloudBees