FileInputStream / FileOutputStream Considered Harmful

Written by: Stephen Connolly

Suricata suricatta -Auckland Zoo -group-8a (with captions added by Stephen Connolly)

Ok, so you have been given an array of bytes that you have to write to a file. You're a Java developer. You have been writing Java code for years. You got this:

public void writeToFile(String fileName, byte[] content) throws IOException {
  try (FileOutputStream os = new FileOutputStream(fileName)) {
    os.write(content);
  }
}

Can you spot the bug?

What about this method to read the files back again?

public byte[] readFromFile(String fileName) throws IOException {
  byte[] buf = new byte[8192];
  try (FileInputStream is = new FileInputStream(fileName)) {
    int len = is.read(buf);
    if (len < buf.length) {
      return Arrays.copyOf(buf, len);
    }
    ByteArrayOutputStream os = new ByteArrayOutputStream(16384);
    while (len != -1) {
      os.write(buf, 0, len);
      len = is.read(buf);      
    }
    return os.toByteArray();
  }
} 

Spotted the bug yet?

Of course the bug is in the title of this post! We are using FileInputStream and FileOutputStream .

So what exactly is wrong with that?

Have you ever noticed that FileInputStream overrides finalize() ? Same goes for FileOutputStream by the way

Every time you create either a FileInputStream or a FileOutputStream you are creating an object, even if you close it correctly and promptly, it will be put into a special category that only gets cleaned up when the garbage collector does a full GC. Sadly, due to backwards compatibility constraints, this is not something that can be fixed in the JDK any time soon  as there could  be some code out there  where somebody has extended FileInputStream / FileOutputStream and is relying on those finalize() methods ensuring the call to close() .

Now that is not an issue for short lived programs... or for programs that do very little file I/O... but for programs that create a lot of files, it can cause issues. For example, Hadoop found "long GC pauses were devoted to process high number of final references" resulting from the creation of lots of FileInputStream instances.

The solution (at least if you are using Java 7 or newer) is not too hard - apart from retraining your muscle memory - just switch to Files.newInputStream(...) and Files.newOutputStream(...)

Our code becomes:

public void writeToFile(String fileName, byte[] content) throws IOException {
  try (OutputStream os = Files.newOutputStream(Paths.get(fileName))) {
    os.write(content);
  }
}

public byte[] readFromFile(String fileName) throws IOException {
  byte[] buf = new byte[8192];
  try (InputStream is = Files.newInputStream(Paths.get(fileName))) {
    int len = is.read(buf);
    if (len < buf.length) {
      return Arrays.copyOf(buf, len);
    }
    ByteArrayOutputStream os = new ByteArrayOutputStream(16384);
    while (len != -1) {
      os.write(buf, 0, len);
      len = is.read(buf);      
    }
    return os.toByteArray();
  }
} 

A seemingly small change that could reduce GC pauses if you do a lot of File I/O!

Oh and yeah, we're making this change in Jenkins


Learn More

Want to learn the latest in Jenkins? Subscribe to the Jenkins newsletter , Continuous Information . This monthly newsletter contains all of the latest interesting and useful happenings in the Jenkins community, sent directly to your inbox. 

 

 


 

 

 

Stay up to date

We'll never share your email address and you can opt out at any time, we promise.