Playing trade-offs with Maven

Sometimes mailing lists are great, sometimes they are not. This post is a response to one of those cases where mailing lists are not great.

Hi all,
I have two jars from an external source and need to produce an über-jar consisting of my project plus those two external jar files.
What is the best way to do this?
Sincerely
Mr A. Random

This type of question comes up every so often. There is a hierarchy of solutions to this type of problem. In fact there are many. I will now present the hierarchy ranked in terms of what is best for the global ecosystem of Maven users.

1. Get thee to Central

The best possible solution to this problem is to get those external jars into the central repository. If that is the case your solution basically becomes:

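    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.your-company.your-project</groupId>
      <artifactId>your-module</artifactId>
      <version>1.0-SNAPSHOT</version>
      <dependencies>
        <dependency>
          <groupId>com.external-company.external-project</groupId>
          <artifactId>external-dep1</artifactId>
          <version>2.0</version>
        </dependency>
        <dependency>
          <groupId>org.external-org.external-project</groupId>
          <artifactId>external-dep2</artifactId>
          <version>3.7</version>
        </dependency>
      </dependencies>
      <build>
        <plugins>
          <plugin>
            <artifactId>maven-shade-plugin</artifactId>
            <version>1.4</version>
            <executions>
              <execution>
                <phase>package</phase>
                <goals>
                  <goal>shade</goal>
                </goals>
                <configuration>
                  <artifactSet>
                    <includes>
                      <include>com.external-company.external-project:external-dep1</include>
                      <include>org.external-org.external-project:external-dep2</include>
                    </includes>
                  </artifactSet>
                  <createDependencyReducedPom>true</createDependencyReducedPom>
                </configuration>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </project>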

This is by far the best solution for everyone in the Maven ecosystem:

  • Your jar file's transitive dependencies are correct
  • Your IDE will put the shaded artifacts on its classpath as they are regular dependencies
  • Those dependencies are available for others to use
  • Assuming you are publishing your project to Central too, then others can do similar tricks or even tricks you have not anticipated without having to heavily hack their own POM files. 

In short, if you get your upstream dependencies into Central and you are putting your own project into Central, you are a first class citizen of the Maven ecosystem. You are making life better for everyone.


It is a little tricky to decide the relative placement of these next two items, so they will both get called #2

2. Get the external jars into a public Maven repository 

If you cannot convince the people responsible for these external jars to publish to Central, you might find that they are willing to agree on a half-way house, namely publishing to a public Maven repository hosted by somebody else (that somebody else is usually the people responsible for those external jars). The solution from before is nearly there; we just need to add the <repository> definitions to our POM:

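    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.your-company.your-project</groupId>
      <artifactId>your-module</artifactId>
      <version>1.0-SNAPSHOT</version>
      <dependencies>
        <dependency>
          <groupId>com.external-company.external-project</groupId>
          <artifactId>external-dep1</artifactId>
          <version>2.0</version>
        </dependency>
        <dependency>
          <groupId>org.external-org.external-project</groupId>
          <artifactId>external-dep2</artifactId>
          <version>3.7</version>
        </dependency>
      </dependencies>
      <repositories>
        <repository>
          <id>external-company</id>
          <url>http://repo.external-company.com/</url>
        </repository>
        <repository>
          <id>external-org</id>
          <url>http://maven-repo.external-org.org/</url>
        </repository>
      </repositories>
      <build>
        <plugins>
          <plugin>
            <artifactId>maven-shade-plugin</artifactId>
            <version>1.4</version>
            <executions>
              <execution>
                <phase>package</phase>
                <goals>
                  <goal>shade</goal>
                </goals>
                <configuration>
                  <artifactSet>
                    <includes>
                      <include>com.external-company.external-project:external-dep1</include>
                      <include>org.external-org.external-project:external-dep2</include>
                    </includes>
                  </artifactSet>
                  <createDependencyReducedPom>true</createDependencyReducedPom>
                </configuration>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </project>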

This is less optimal than the previous solution because:

  • Maven now has to check two more repositories for every dependency, both in your build and in the build of anyone consuming your project. Maven does not know which artifacts a repository contains, so it must check each repository for each dependency, even when you know that the dependency will never exist there. The problem compounds with each new repository that gets added
  • In corporate environments, the best practice is to have a Maven Repository Manager (MRM) coupled with <mirrorOf>*</mirrorOf> in the settings.xml that all employees use. That means employees will need corporate approval to add those external repositories to the internal Repository Manager. Ultimately this is a good thing, and it is why corporates use an MRM: it gives them control over the code that is being used and isolates them from the failure or disappearance of the upstream repository (their MRM will have permanently cached the artifacts)
  • In non-corporate environments, I now have to trust that those Maven repositories will not just vanish off the face of the interwebs. While you could argue that there is a similar risk with Central, the reality is that the contents of Central are mirrored to some third parties so if Sonatype were to vanish, the contents of Central are already mirrored to other backup stores managed by other organizations. A six line addition to your settings.xml is all that would be required to get you back up and building until the DNS entries for central were recovered. We cannot say the same for all those small personal/corporate public Maven repositories.  
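
To make the "six line addition" concrete, here is a rough sketch of what such a settings.xml entry could look like. The id, name and URL are made up for illustration (no specific backup mirror is named above), so substitute whichever third-party mirror of Central you actually trust:

```xml
<!-- Goes inside <mirrors> in ~/.m2/settings.xml -->
<mirror>
  <!-- hypothetical id, name and URL: replace with a real mirror of Central -->
  <id>central-backup</id>
  <name>Backup mirror of Central</name>
  <url>https://central-mirror.example.org/maven2/</url>
  <!-- redirect only requests aimed at the repository whose id is "central" -->
  <mirrorOf>central</mirrorOf>
</mirror>
```

Because <mirrorOf> is set to central rather than *, any other repositories declared in your POMs would still be contacted directly.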

Now it is not all bad, because:

  • Your jar file's transitive dependencies are correct
  • Your IDE will put the shaded artifacts on its classpath as they are regular dependencies
  • Those dependencies are available for others to use
  • Someone could always take those artifacts from the repository and publish them to Central (assuming they meet the validation criteria for publishing to central) 

2. Get the external jars into the internal Maven repository

So these external jar files are closed source. So they cannot be published to Central or some other public Maven repository. Never mind. You can get them into your company’s internal Maven repository. There are lots of reasons why your company should be using its own internal Maven repository. The number one reason is that it isolates the company from the failure of external infrastructure.
If a company is managing its Maven use through an internal Maven repository, then the best practice for that company is to ensure that the internal repository is used to resolve all dependencies. That forces the build to break if somebody tries to pull a dependency from a repository that is not being mirrored and cached by the company’s internal repository. The company will typically mandate a settings.xml something like this:


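    <settings>
      ...
      <mirrors>
        <mirror>
          <id>your-company</id>
          <url>http://repo.your-company.com/</url>
          <mirrorOf>*</mirrorOf>
        </mirror>
      </mirrors>
      ...
    </settings>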
 

So if that is the case we just have to upload these external jar files into the internal repository and reuse the solution as if the jars were in Maven Central:

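    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.your-company.your-project</groupId>
      <artifactId>your-module</artifactId>
      <version>1.0-SNAPSHOT</version>
      <dependencies>
        <dependency>
          <groupId>com.external-company.external-project</groupId>
          <artifactId>external-dep1</artifactId>
          <version>2.0</version>
        </dependency>
        <dependency>
          <groupId>org.external-org.external-project</groupId>
          <artifactId>external-dep2</artifactId>
          <version>3.7</version>
        </dependency>
      </dependencies>
      <build>
        <plugins>
          <plugin>
            <artifactId>maven-shade-plugin</artifactId>
            <version>1.4</version>
            <executions>
              <execution>
                <phase>package</phase>
                <goals>
                  <goal>shade</goal>
                </goals>
                <configuration>
                  <artifactSet>
                    <includes>
                      <include>com.external-company.external-project:external-dep1</include>
                      <include>org.external-org.external-project:external-dep2</include>
                    </includes>
                  </artifactSet>
                  <createDependencyReducedPom>true</createDependencyReducedPom>
                </configuration>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </project>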


This is by far the best solution for everyone in the Maven ecosystem within your company:

  • Your jar file's transitive dependencies are correct
  • Your IDE will put the shaded artifacts on its classpath as they are regular dependencies
  • Those dependencies are available for others to use
  • Assuming you are publishing your project to the internal Maven repository too, then others can do similar tricks or even tricks you have not anticipated without having to heavily hack their own POM files. 

We now enter the realm of solutions that do not involve a Maven repository.

4. Use the reactor (and Stephen’s non-maven-jar plugin)

I debated where to place this in the hierarchy, but finally I have settled on putting it here. The main reason is that it keeps closer to the way IDEs should expect Maven to work, but if you find this solution doesn’t work well for you, then the ANT task solution (i.e. the next one) is probably the best for you.
With this solution we need to split to a multi-module build. We will have a directory structure something like this

    external-company/
        pom.xml
        src/
            external-dep1.jar
    external-org/
        pom.xml
        src/
            external-dep2.jar
    pom.xml
    your-project/
        pom.xml
        src/
            ...

The root POM will look something like this:

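    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.your-company.your-project</groupId>
      <artifactId>your-parent</artifactId>
      <version>1.0-SNAPSHOT</version>
      <packaging>pom</packaging>
      ...
      <modules>
        <module>external-company</module>
        <module>external-org</module>
        <module>your-project</module>
      </modules>
      ...
      <build>
        ...
        <plugins>
          ...
          <plugin>
            <groupId>com.github.stephenc.nonmavenjar</groupId>
            <artifactId>non-maven-jar-maven-plugin</artifactId>
            <version>1.0</version>
            <extensions>true</extensions>
          </plugin>
          ...
        </plugins>
        ...
      </build>
    </project>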
 

The external project POMs will look like

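    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <parent>
        <groupId>com.your-company.your-project</groupId>
        <artifactId>your-parent</artifactId>
        <version>1.0-SNAPSHOT</version>
      </parent>
      <artifactId>external-dep1</artifactId>
      <packaging>non-maven-jar</packaging>
    </project>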
 

and

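    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <parent>
        <groupId>com.your-company.your-project</groupId>
        <artifactId>your-parent</artifactId>
        <version>1.0-SNAPSHOT</version>
      </parent>
      <artifactId>external-dep2</artifactId>
      <packaging>non-maven-jar</packaging>
    </project>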

respectively, and finally the your-project/pom.xml will look quite similar to the solutions from before:

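    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <parent>
        <groupId>com.your-company.your-project</groupId>
        <artifactId>your-parent</artifactId>
        <version>1.0-SNAPSHOT</version>
      </parent>
      <artifactId>your-module</artifactId>
      <dependencies>
        <dependency>
          <groupId>com.your-company.your-project</groupId>
          <artifactId>external-dep1</artifactId>
          <version>1.0-SNAPSHOT</version>
        </dependency>
        <dependency>
          <groupId>com.your-company.your-project</groupId>
          <artifactId>external-dep2</artifactId>
          <version>1.0-SNAPSHOT</version>
        </dependency>
      </dependencies>
      <build>
        <plugins>
          <plugin>
            <artifactId>maven-shade-plugin</artifactId>
            <version>1.4</version>
            <executions>
              <execution>
                <phase>package</phase>
                <goals>
                  <goal>shade</goal>
                </goals>
                <configuration>
                  <artifactSet>
                    <includes>
                      <include>com.your-company.your-project:external-dep1</include>
                      <include>com.your-company.your-project:external-dep2</include>
                    </includes>
                  </artifactSet>
                  <createDependencyReducedPom>true</createDependencyReducedPom>
                </configuration>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </project>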

Now this solution does not do any good for others in the Maven ecosystem, but it does have some plus points:

  • Not really that different from the other solutions.
  • We have used standard techniques (e.g. maven-shade-plugin and regular dependencies) so somebody looking at the your-project/pom.xml file on its own can quite readily discover what it is doing (assuming they are familiar with the shade plugin)
  • The only bits which look slightly strange are the <extensions>true</extensions> in the parent POM and the <packaging>non-maven-jar</packaging> in the external-*/pom.xml files.

    Maven users should learn that <extensions>true</extensions> is required to define non-standard packaging types, so that makes it fairly obvious which plugin is pulling in the non-maven-jar packaging. On top of that those POMs are exceedingly trivial, so there should be next to no learning curve.

  • If we eventually get to a stage where those external jars are available from a Maven repository, we can switch to those without making major changes.  

5. Use an ANT task

If you find yourself resorting to this solution a lot, then it might make sense to write a Maven plugin that does the same as this and publish it to Central so that everyone in the Maven ecosystem can benefit. But for now we will assume that there is something very one-time about this problem. One-time says it all. This is a one-off solution for a one-off problem. Use an ANT task.

We are already in the realm of solutions where we cannot rely on getting things into a Maven repository, and we should not build and consume a Maven plugin within the same multi-module reactor (there are always some tricks and hacks that can make it possible, but as a general rule you should not)

In some ways the ANT task looks simpler than the previous solution, but it is less Maven-like. What puts it ranking lower is that if at some future point in time these external dependencies end up in a Maven repository, then you have to completely undo all this solution to pull the dependencies from the Maven repository. You may, quite legitimately, view that as premature optimization. If these solutions were being ranked on that basis it would be probably right up there at the top. This is, however, being ranked on the basis of what is best for the global ecosystem of Maven users, so we assume that everything will eventually end up in a Maven repository, and on that basis we prioritize solutions that are closer to that ideal.

The directory structure is probably the simplest:

    pom.xml
    src/
        external/
            external-dep1.jar
            external-dep2.jar
        main/
            ...

The POM itself looks like this:

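    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.your-company.your-project</groupId>
      <artifactId>your-module</artifactId>
      <version>1.0-SNAPSHOT</version>
      <build>
        <plugins>
          <plugin>
            <artifactId>maven-antrun-plugin</artifactId>
            <version>1.7</version>
            <executions>
              <execution>
                <phase>generate-resources</phase>
                <goals>
                  <goal>run</goal>
                </goals>
                <configuration>
                  <target>
                    <unzip src="${basedir}/src/external/external-dep1.jar"
                           dest="${project.build.outputDirectory}"/>
                    <unzip src="${basedir}/src/external/external-dep2.jar"
                           dest="${project.build.outputDirectory}"/>
                  </target>
                </configuration>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </project>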

This solution at least has the advantage that it is not polluting the transitive dependencies of downstream projects. It is not making things better for people in the Maven ecosystem, but at the same time it is not making things worse. If you get to publish your project to a Maven repository, anyone else consuming via that repository can treat your project as a regular dependency (though the same can be said for the non-maven-jar solution, and that wins over this in that the external jars can also be consumed individually)
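
As a small illustration of "treat your project as a regular dependency", a downstream POM would need nothing more than the usual declaration. This is only a sketch using the placeholder coordinates from the example above, and it assumes your-module has been deployed to a repository that the consumer can reach:

```xml
<!-- In the downstream project's <dependencies> section -->
<dependency>
  <groupId>com.your-company.your-project</groupId>
  <artifactId>your-module</artifactId>
  <version>1.0-SNAPSHOT</version>
</dependency>
```

The external jars do not appear anywhere in the consumer's POM, because their classes were unpacked into your-module's own jar at build time.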

For somebody else coming to look at this solution and maintain it afterwards, well we have questions about why the specific phase was chosen, and we need to understand ANT, but this is not overly complex… if we start layering many many other hacks into our POM to make it do what we want, though, it could prove difficult to figure out exactly what is going on and why each phase has been chosen. So in this specific case this is a good solution, but the alarm bells are ready to start ringing.

Everything after this point is “Donny Don’t”. I am listing them here so that I can explain why you should not use them and also to illustrate that they are worse than the solutions above.

6. The file:///${basedir} repository hack

Sooner or later somebody will decide to try this one. I am not entirely sure if I invented this one, certainly I was responsible for a rather prominent use of this hack (in a plugin for the Jenkins/Hudson project some time back in 2007-8), with a corresponding slew of hate mail.

My advice is to steer well clear of this. You might think it is a solution. Technically it is if you are building one and only one project, and as given here, it is the absolute safest I can make it, but it is still not safe.

Project layout:

    pom.xml
    src/
        main/
            ...
        repo/
            com/
                external-company/
                    external-project/
                        external-dep1/
                            2.0/
                                external-dep1-2.0.pom
                                external-dep1-2.0.jar
            org/
                external-org/
                    external-project/
                        external-dep2/
                            3.7/
                                external-dep2-3.7.pom
                                external-dep2-3.7.jar
 

The POM file:

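    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.your-company.your-project</groupId>
      <artifactId>your-module</artifactId>
      <version>1.0-SNAPSHOT</version>
      <dependencies>
        <dependency>
          <groupId>com.external-company.external-project</groupId>
          <artifactId>external-dep1</artifactId>
          <version>2.0</version>
        </dependency>
        <dependency>
          <groupId>org.external-org.external-project</groupId>
          <artifactId>external-dep2</artifactId>
          <version>3.7</version>
        </dependency>
      </dependencies>
      <repositories>
        <repository>
          <id>your-project-local</id>
          <url>file:///${basedir}/src/repo/</url>
        </repository>
      </repositories>
      <build>
        <plugins>
          <plugin>
            <artifactId>maven-shade-plugin</artifactId>
            <version>1.4</version>
            <executions>
              <execution>
                <phase>package</phase>
                <goals>
                  <goal>shade</goal>
                </goals>
                <configuration>
                  <artifactSet>
                    <includes>
                      <include>com.external-company.external-project:external-dep1</include>
                      <include>org.external-org.external-project:external-dep2</include>
                    </includes>
                  </artifactSet>
                  <createDependencyReducedPom>true</createDependencyReducedPom>
                </configuration>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </project>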

And here is the number one reason to avoid this:

You leak an extra <repository> to any downstream projects… except that ${basedir} will evaluate to a different directory depending on the context in which Maven is resolving it. It could be:

  • The ${basedir} of the project invoking it
  • The ${basedir} of your project
  • The ${basedir} of your project when resolved from the local cache (i.e. ~/.m2/repository/com/your-company/your-project/your-module/1.0-SNAPSHOT/)

Basically you are pushing a world of pain on anyone consuming your-module as a dependency. That anyone could even be you if you happen to be bundling that jar in an ear/war.

There are even worse ways to do the above hack… you could make the dependencies provided and use dependency:unpack-dependencies rather than the shade plugin… that is worse because now your-module’s exported pom is calling out the dependencies too. OK so they are provided and so should not pollute the classpath, but that is assuming that the downstream consumer is Maven… the consumer downstream from you might be parsing the dependency tree to see what dependencies are supposed to be provided by the container so that it can provide them.
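
For readers who have not seen that variant, the fragment below is a rough sketch of what it tends to look like, shown only so that you can recognise it and avoid it. It is my own reconstruction, not taken from any real project: the plugin version is simply borrowed from the system scope example further down, and includeArtifactIds is used here to limit the unpacking to the two external jars.

```xml
<!-- ANTI-PATTERN, for recognition only: provided scope + unpack-dependencies -->
<dependencies>
  <dependency>
    <groupId>com.external-company.external-project</groupId>
    <artifactId>external-dep1</artifactId>
    <version>2.0</version>
    <!-- "provided" ends up in your-module's published POM for all to see -->
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.external-org.external-project</groupId>
    <artifactId>external-dep2</artifactId>
    <version>3.7</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
<build>
  <plugins>
    <plugin>
      <artifactId>maven-dependency-plugin</artifactId>
      <version>2.7</version>
      <executions>
        <execution>
          <phase>generate-resources</phase>
          <goals>
            <goal>unpack-dependencies</goal>
          </goals>
          <configuration>
            <outputDirectory>${project.build.outputDirectory}</outputDirectory>
            <includeArtifactIds>external-dep1,external-dep2</includeArtifactIds>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

The classes still end up inside your jar, but the published POM now claims that a container will provide those two artifacts, which is exactly the misleading signal described above.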

7. The system scope hack

In the realm of solutions for this problem, this is the absolute worst way to solve it.

To understand why, you need to know what system scope is for. System scope is for those rare dependencies that must be in specific locations on the JRE classpath, you know the ${java.home}/lib/ext/… ones.

So the path that you provide when using system scope is supposed to be a path starting with ${java.home}/lib.
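
For contrast, a use of system scope in the spirit it was intended for might look roughly like this. The coordinates are purely illustrative (they follow the style of example used in Maven's own dependency documentation); the point is only that the path hangs off ${java.home} rather than off your project:

```xml
<dependency>
  <!-- illustrative coordinates for classes that ship with the JRE itself -->
  <groupId>javax.sql</groupId>
  <artifactId>jdbc-stdext</artifactId>
  <version>2.0</version>
  <scope>system</scope>
  <!-- resolved relative to the JRE running Maven, not relative to ${basedir} -->
  <systemPath>${java.home}/lib/rt.jar</systemPath>
</dependency>
```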

Unfortunately Maven does not enforce this, and as a consequence enables the following hack.

The directory structure is as simple as the ANT task one:

    pom.xml
    src/
        external/
            external-dep1.jar
            external-dep2.jar
        main/
            ...

The POM itself looks like this:

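    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.your-company.your-project</groupId>
      <artifactId>your-module</artifactId>
      <version>1.0-SNAPSHOT</version>
      <dependencies>
        <dependency>
          <groupId>com.external-company.external-project</groupId>
          <artifactId>external-dep1</artifactId>
          <version>2.0</version>
          <scope>system</scope>
          <systemPath>${basedir}/src/external/external-dep1.jar</systemPath>
        </dependency>
        <dependency>
          <groupId>org.external-org.external-project</groupId>
          <artifactId>external-dep2</artifactId>
          <version>3.7</version>
          <scope>system</scope>
          <systemPath>${basedir}/src/external/external-dep2.jar</systemPath>
        </dependency>
      </dependencies>
      <build>
        <plugins>
          <plugin>
            <artifactId>maven-dependency-plugin</artifactId>
            <version>2.7</version>
            <executions>
              <execution>
                <phase>generate-resources</phase>
                <goals>
                  <goal>unpack-dependencies</goal>
                </goals>
                <configuration>
                  <outputDirectory>${project.build.outputDirectory}</outputDirectory>
                  <includeScope>system</includeScope>
                </configuration>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </project>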

This suffers from all the same problems as the file:///${basedir} repository hack plus:

  • It leaks transitive dependencies to downstream consumers that are not Maven and may be doing other things with the dependency tree (e.g. the same complaint I make about using scope provided with the file:/// hack)
  • It will keep on trying to find the pom files for those system dependencies, thereby slowing the build
  • It is more complex than the ANT task solution, which does the same job with the same project layout and without any of the down sides.

Summary

So there you have it dear readers (if you have stuck with me this far). There are many many ways to skin a cat with Maven, and there are almost as many ways to rank those potential solutions. But if you have to solve the “unpack external jars into my project and make an überjar” problem, I recommend picking one of the solutions 1-5 above and please please never pick 6 or 7.

If the external jars are open source, I would favour option 1 as the top priority.

If the external jars are closed source, I would favour option 2 (internal repo) as the top priority.

If deploying to a Maven repository is absolutely not an option I would pick one of option 4 or 5. They are both equally valid options. Deciding between 4 and 5 is a question of whether you believe a Maven repository to host those external artifacts could ever be on the cards. If you believe never then pick 5 as it is the simplest. If you believe there is greater than 20% chance they could end up in a Maven repo, I would suggest picking 4 (but you can pick your own threshold). Oh and one final argument in favour of option 4 over option 5, if you will have a second module that needs one of the external jars to make a second überjar, then option 4 wins as you don’t need two copies of the .jar file for the two projects in the same reactor.

—Stephen Connolly
CloudBees
www.cloudbees.com

Comments

BTW a variant of #4 that requires no special plugins is for the external JAR placeholder projects to just use the install plugin. https://jira.codehaus.org/secure/attachment/53864/MNG-1867.zip is a complete example. Compared to your plugin the configuration is slightly more verbose, though it has the ability to attach sources and Javadoc which is valuable; it has the disadvantage that ‘mvn test’ at top level will not work unless you first ‘mvn install’ at least once. Probably non-maven-jar-maven-plugin should be improved as per https://github.com/stephenc/non-maven-jar-maven-plugin/issues/1 and then mentioned in MNG-1867.

Stephen Connolly:

Actually the install plugin has issues. If you have a plugin bound to the invoked lifecycle that declares @requiresDependencyResolution, then Maven 3.x will go looking for the dependencies, not find them, and blow up the build *before* invoking any of the lifecycle on any of the modules. Hence I reject your "alternative", as it is so unreliable as to not be worth considering... it would be more "maven like" than 6 & 7 except that it doesn't work consistently, which puts it below 6 & 7 as it is more pain than it is worth.
