The Dark Side of NPM

Written by: Kfir Erez

For a long time I have been a fan of npm for its simplicity. I’m accustomed to working with Java ANT (yes, that old...) and Maven projects with their complex, deep dependency trees, which is why the simple, flat package.json felt like a work of magic. Just run 'npm install' and everything falls into place. You want a new package? No problem, run 'npm install --save thisRealyCoolPackage' and it stays with you for better or worse. Need some scripts to manage your builds? npm provides a mechanism for that too, to the point that you may wonder why you use grunt or gulp at all (check Keith Cirkel's article on How To Use NPM as a Build Tool).

But like any fairy tale, there is no story without some dark element lurking in the corner. During the development of a new analytics system we, at CloudBees Feature Management.io, decided to reuse some code which had already been developed in another project. To accomplish that we defined some very simple goals:

  1. Follow the "micro services" paradigm, as much as possible, of small and independent components which expose an API to the "world" and thus will be reusable from anywhere at any time.

  2. Each component must include its own package.json to define its dependencies as well as tests, scripts and more (with great dependency comes great responsibility).

  3. While the components have their own dependency trees, they should share instances of the same modules. For example, if moduleA and moduleB both depend on mongoose they should use the same mongoose instance unless specified otherwise, which means that shared libraries should not be included in a local "node_modules" folder unless necessary.

We extracted these modules from our "oldProj", organised each with its own package.json and put them all in one shared place.

Great and simple!!!

Well... not so much

Due to the nature of our build scripts (a different story) we didn't want to put each module in a git repository of its own. So to make things fast (and dirty) we used the notorious old-school method of referencing modules by relative paths, with the notion that we would fix it later.

When referencing moduleA from different places we ended up with something like this:

---analytics---

var moduleA = require('../../../../core/moduleA');

---oldProj---

var moduleA = require('../../../../../../workspace/core/moduleA');

Besides the ugliness of such code, we ran into another major problem: since each of the new core modules has its own package.json, each of them came with its own "node_modules" folder. Therefore, if both moduleA and moduleB require mongoose in the same project, we end up with two instances of mongoose, which led to other catastrophes.

Before I continue, let me first elaborate on what happens under the hood of node.js and its module requirement mechanism.

  1. Require moduleA in the code as follows:

    var moduleA = require('../../../../core/moduleA');

  2. module.js resolves the relative path to an absolute path.

  3. If there is an entry for that path in the module cache (exposed as require.cache), it returns the instance referenced by that entry.

  4. If not, it checks whether this is a core module (e.g. fs, path, http etc.). If so, it returns its instance.

  5. Otherwise module.js looks for the module in all possible paths (according to the node_modules, NODE_PATH and global node_modules heuristics).

  6. When the module is found, it returns an instance of that module and saves an entry for that instance in the module cache, keyed by the resolved absolute file path.

  7. If it is not found, an exception is thrown.
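To see why the cache key matters, here is a minimal sketch of the flow above. It assumes a hypothetical stateful library called "shared-lib" (standing in for something like mongoose) and writes two identical copies of it into the private node_modules folders of two components in a temporary directory. Because the cache is keyed by absolute file path, each copy is loaded separately.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Temporary root; realpath it so it matches the keys node stores in require.cache.
var root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'dup-demo-')));

// Hypothetical stateful library: every loaded copy gets its own "connections" array.
var source = 'module.exports = { connections: [] };';

// Writes a private copy of shared-lib under <root>/<component>/node_modules.
function installCopy(component) {
    var dir = path.join(root, component, 'node_modules', 'shared-lib');
    fs.mkdirSync(dir, { recursive: true });
    var file = path.join(dir, 'index.js');
    fs.writeFileSync(file, source);
    return file;
}

var copyA = require(installCopy('moduleA'));
var copyB = require(installCopy('moduleB'));

// State added through one copy is invisible to the other.
copyA.connections.push('db-connection');

var sameInstance = copyA === copyB;
var cachedCopies = Object.keys(require.cache).filter(function (file) {
    return file.indexOf(root) === 0;
});

console.log(sameInstance);             // false
console.log(cachedCopies.length);      // 2
console.log(copyB.connections.length); // 0
```

The file contents are byte-for-byte identical, yet node treats them as two modules because they resolve to two different absolute paths.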

As you may understand from that flow, if both moduleA and moduleB require mongoose but each has its own node_modules folder, the module cache will have two entries for mongoose, and thus connections and other stateful properties will be different.

One solution to that problem was to require mongoose in the parent module (the application level) and supply its instance when requiring moduleA and moduleB. However, that introduces a dependency between the parent and these modules which we didn't want to create (e.g. why does the parent need to know how moduleA and moduleB implement their persistence?).

Another solution was to change the way we install and require modules. When installing with npm you can install:

  1. Locally, in a node_modules folder next to your package.json. npm can automatically install all dependencies from package.json into this folder.

  2. Globally, where npm installs all its global packages. However, we cannot automatically install the dependencies in package.json to that folder (for obvious reasons, of course).

  3. To a specific folder, using the --prefix directive. However, we cannot automatically install the dependencies from package.json to that folder (not obvious at all).

I really liked the third option of installing a whole project's dependencies in one specific place, but it has one caveat: how can my code require modules from that location? While researching a solution I found a great article by "Bran van der Meer" on "Better local require() paths for node.js". He has several great solutions for getting rid of the ugly relative paths when requiring modules locally, and I decided to use NODE_PATH as part of our solution. I created the following script, which installs the dependencies listed in the local package.json into the first path defined in NODE_PATH (and some more stuff):

#!/usr/bin/env node
/**
* An extended install script which installs the packages listed in package.json's dependencies, or the packages
*  you name on the command line (similar to npm install), in the first path that is in $NODE_PATH.
* This allows you to install your packages in one repository and require('your_own_private_package') from this
*  private repository (e.g. a local folder) without worrying about relative paths or about duplicate
*  instances. For example, modules A and B are located in different places without a common root, both require
*  module C and both are required by module D; normally this creates two instances of C.
*  With this approach, however, module D caches module C for both of them, using one instance.
* Created by kfirerez on 7/15/15.
*/

var npm = undefined;
try{
   npm = require("npm");
}catch(e){
   console.log('Cannot find npm. Try to include the global node modules (e.g. /usr/local/lib/node_modules) in NODE_PATH env.');
   console.log('Note that if you would like to install your modules in a special location other than global you should prefix NODE_PATH with that location');
   process.exit(1);
}
var packageJson = require('./package');

npm.load(packageJson, function (er, npm) {
   if (er) {
       console.error('Failed to load npm: ' + er);
       process.exit(1);
   }
   // use the npm object, now that it's loaded.

   var packages = resolvePackages(packageJson);
   var location = resolveEXModulesLocation();
   npm.commands.install(location, packages.list, function(){
       console.log('Install command arguments' + JSON.stringify(arguments));
   });
});

/**
* Returns a location as follows:
* 1. If the --prefix parameter was specified with a location after it, returns that location.
* 2. If NODE_PATH contains more than one path, or contains exactly one path and --allowOnePath is specified in the
*  execution arguments, returns the first path in NODE_PATH.
* 3. Otherwise returns './'
* @return {*}
*/
function resolveEXModulesLocation() {
   var paths = [];
   var allowOnePath = false;
   if(process.env.NODE_PATH){
       paths = process.env.NODE_PATH.split(':');
   }
   if(process.argv.length > 2){
       for(var i=2; i<process.argv.length; i++){
           if(process.argv[i] === '--prefix' && i<process.argv.length-1){
               return process.argv[i+1];
           }
           if(process.argv[i] === '--allowOnePath'){
               allowOnePath = true;
           }
       }
   }
   if(!paths || paths.length === 0 || (paths.length === 1 && !allowOnePath) ) {
       console.log('NODE_PATH is either empty or contains only one entry, which is assumed to be the global node modules location. ' +
           'If you would like to use that entry, specify --allowOnePath in the script arguments');
       return './';
   }

   return paths[0];
}

/**
* Resolves which packages to install: either the packages specified on the command line or the ones listed in package.json dependencies
* @param packageJson
* @return {{list: Array}}
*/
function resolvePackages(packageJson){
   if(process.argv.length <=2 && (!packageJson || !packageJson.dependencies)){
       console.error('You must specify package to install or include dependencies in package.json');
       process.exit(1);
   }

   if(process.argv.length <= 2){
       return {
           list: Object.keys(packageJson.dependencies)
       }
   }
   var packages = {
       list: []
   };
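   for(var i=2; i<process.argv.length; i++){
       if(process.argv[i].indexOf('--') === 0){
           switch(process.argv[i]){
               case '--save':
               case '--save-dev':
               case '--save-optional':
               case '--save-exact':
                   //TODO:Fix or handle this issues
                   break;
               case '--prefix':
                   i++;//skip the next argument as well
                   break;
           }
           continue;//flags (e.g. --allowOnePath) are never package names
       }
       packages.list.push(process.argv[i]);
   }
   if(packages.list.length === 0){
       //only flags were given: fall back to the dependencies listed in package.json
       if(packageJson && packageJson.dependencies){
           return {
               list: Object.keys(packageJson.dependencies)
           };
       }
       console.error('You must specify package to install or include dependencies in package.json');
       process.exit(1);
   }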

   return packages;
}
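The location rules in resolveEXModulesLocation above are easier to check in isolation. The following is a small sketch, not part of the original script: resolveLocation is a hypothetical pure variant that receives the arguments and the NODE_PATH value explicitly, and it uses path.delimiter instead of a hard-coded ':' (an assumption made here for portability). The example paths are made up.

```javascript
var path = require('path');

// Hypothetical pure variant of the location rules: no access to process globals.
function resolveLocation(args, nodePath) {
    // Rule 1: an explicit --prefix followed by a location always wins.
    var prefixIndex = args.indexOf('--prefix');
    if (prefixIndex !== -1 && prefixIndex < args.length - 1) {
        return args[prefixIndex + 1];
    }
    // Rule 2: use the first NODE_PATH entry, unless it is the only one and
    // --allowOnePath was not given.
    var paths = nodePath ? nodePath.split(path.delimiter) : [];
    var allowOnePath = args.indexOf('--allowOnePath') !== -1;
    if (paths.length === 0 || (paths.length === 1 && !allowOnePath)) {
        // Rule 3: fall back to the current folder.
        return './';
    }
    return paths[0];
}

// Made-up NODE_PATH with a private location in front of the global one.
var twoPaths = ['/opt/shared', '/usr/local/lib/node_modules'].join(path.delimiter);

console.log(resolveLocation(['lodash', '--prefix', '/tmp/core'], twoPaths)); // /tmp/core
console.log(resolveLocation(['lodash'], twoPaths));                          // /opt/shared
console.log(resolveLocation(['lodash'], '/opt/shared'));                     // ./
console.log(resolveLocation(['lodash', '--allowOnePath'], '/opt/shared'));   // /opt/shared
console.log(resolveLocation([], undefined));                                 // ./
```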

Then reference the install script in your package.json (I actually created a component generator using slush that includes the script and its package.json reference by default):

{
 "name": "application-name",
 "version": "0.0.1",
 "scripts": {
   "installEx": "scripts/installEx.js"
 }
}

  To install lodash we will run (same as npm install):

npm run installEx lodash

To install lodash in /Users/me/myProject/core/node_modules:

npm run installEx lodash --prefix /Users/me/myProject/core

To install all dependencies from package.json to the same place above (this is what I really wanted):

npm run installEx --prefix /Users/me/myProject/core

And if you have set NODE_PATH to that location (e.g. export NODE_PATH=/Users/me/myProject/core):

npm run installEx --allowOnePath --prefix /Users/me/myProject/core

Once NODE_PATH points to that location, you can require your modules without relative paths in your code and node.js will load them from the given path:

var moduleA = require('core/moduleA');

var mongoose = require('mongoose');
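As a sanity check of the shared-instance claim, here is a minimal sketch. It assumes a hypothetical library named "shared-lib-demo" placed in a single shared folder, plus two components in unrelated folders that require it by bare name. NODE_PATH is read when node starts, so the sketch runs the small application in a child process with NODE_PATH set.

```javascript
var childProcess = require('child_process');
var fs = require('fs');
var os = require('os');
var path = require('path');

var root = fs.mkdtempSync(path.join(os.tmpdir(), 'node-path-demo-'));

// Writes a file under the temporary root, creating parent folders as needed.
function writeFile(relativePath, content) {
    var file = path.join(root, relativePath);
    fs.mkdirSync(path.dirname(file), { recursive: true });
    fs.writeFileSync(file, content);
    return file;
}

// One shared copy of the hypothetical stateful library.
writeFile('shared/shared-lib-demo/index.js', 'module.exports = { connections: [] };');

// Two components in unrelated folders, neither with its own node_modules.
writeFile('a/moduleA.js', "module.exports = require('shared-lib-demo');");
writeFile('b/moduleB.js', "module.exports = require('shared-lib-demo');");

// The application compares the instances the two components received.
var app = writeFile('app.js',
    "var a = require('./a/moduleA');\n" +
    "var b = require('./b/moduleB');\n" +
    "console.log(a === b);\n");

// Run the application with NODE_PATH pointing at the shared folder.
var env = Object.assign({}, process.env, { NODE_PATH: path.join(root, 'shared') });
var output = childProcess.execFileSync(process.execPath, [app], { env: env, encoding: 'utf8' });

console.log(output.trim()); // true
```

Both components resolve the bare name to the same absolute file, so the module cache holds a single entry and they share one instance.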

Everything is awesome: we can install and require modules from a specified place which is not node’s global location, and we are happy!!!

Eventually we decided to drop that solution and go with Sinopia… But that story is for another day (coming soon).
