1. Input/Output Streams

In the previous chapter, as well as in the chapter about exceptions, we have already dealt with the basic reading and writing of file contents. This chapter focuses on input and output streams, that is, on a continuous flow of data that is written to a destination or read from a source and can pass through multiple filters on the way. The nesting of Java’s input/output streams is a good example of abstraction and flexibility, and it also helps a lot when you model your own filters.

Requirements

  • know type hierarchy of input/output classes

  • be able to distinguish between character-oriented and byte-oriented classes

  • understand decoration of streams

  • be able to send stream data through filters

  • be able to compress data

Data types used in this chapter:

1.1. Direct data streams

In Java, four base types are used: InputStream and OutputStream (byte-oriented reading and writing) and Reader and Writer (character-oriented reading and writing). We start with exercises on those streams that write directly to a resource or read directly from a resource.

1.1.1. get number of different places (read files) ⭐

Captain CiaoCiao gets two text files that look very much the same at first glance. But he wants to know whether the two files match exactly or whether there are differences.

Exercise:

  • Write a method long distance(Path file1, Path file2) that returns the number of different characters. In computer science this is called Hamming distance.

  • It is assumed that the two files are exactly the same length.

Example: A file contains the string

To Err is Human. To Arr is Pirate.

and another file contains the string

To Arr is Human. To Err is Pirate!

The distance is 3 because 3 symbols do not match.
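The counting rule can be checked on plain strings before any file is involved. The following is only a sketch of the definition, not the requested file-based method; the class name HammingSketch is made up for illustration.

```java
class HammingSketch {

  // Counts the positions at which two equally long strings differ.
  static long distance( String a, String b ) {
    if ( a.length() != b.length() )
      throw new IllegalArgumentException( "Strings must have the same length" );

    long result = 0;
    for ( int i = 0; i < a.length(); i++ )
      if ( a.charAt( i ) != b.charAt( i ) )
        result++;
    return result;
  }

  public static void main( String[] args ) {
    // The two example strings from the exercise text
    System.out.println( distance( "To Err is Human. To Arr is Pirate.",
                                  "To Arr is Human. To Err is Pirate!" ) );
  }
}
```

The file-based version has to do the same comparison, only with characters read from two streams instead of charAt(…).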

1.1.2. Convert Python program to Java (write file) ⭐

In the chapter "Imperative Language Concepts", we completed several exercises that write SVG output to the screen — now we want to write that output directly to HTML files. As a reminder, the following HTML contains an SVG with a rectangle of height and width 1 and x/y coordinates 10/10:

<!DOCTYPE html>
<html><body>
 <svg width="256" height="256">
  <rect x="10" y="10" width="1" height="1" style="fill:rgb(0,29,0);" />
 </svg>
</body></html>

In a book about computer generated art, Captain CiaoCiao finds an illustration on the first few pages. The pattern is generated by a Python program:

import Image, ImageDraw

image = Image.new("RGB", (256, 256))
drawingTool = ImageDraw.Draw(image)

for x in range(256):
    for y in range(256):
        drawingTool.point((x, y), (0, x^y, 0))

del drawingTool
image.save("xorpic.png", "PNG")

The Python function point(…​) gets the x-y coordinate and RGB color information, where the three arguments 0, x^y, 0 represent the red, green, blue components.

Exercise:

  • Since Captain CiaoCiao does not like snakes, the Python program must be converted to a Java program.

  • Instead of a PNG file, end up with an HTML file with an SVG block where each pixel is a 1 × 1 SVG rectangle.

Bonus: At the end, open the HTML file with the browser; the Desktop class will help you here.

1.1.3. Generate target code (write file) ⭐

From the post office, Captain CiaoCiao is getting more and more letters with pink barcodes. At first he thinks of encoded love letters from Bonny Brain, but then he realizes that there is a so-called destination code on the envelope, which encodes the postal code.

The encoding of the numbers in dashes is as follows, where the underscore _ symbolizes the spacing by a space:

Table 1. Values and encodings

value  encoding
0      | | | |
1      | | | _
2      | | _ |
3      | | _ _
4      | _ | |
5      | _ | _
6      | _ _ |
7      _ | _ |
8      _ | | |
9      _ | | _

Exercise:

  • Write a static method writeTargetCode(String, Writer) that writes a string of digits in the named encoding to a writer.

  • There should be two spaces between the four symbols for a digit.

Example:

  • The string "023" is written to the file as ||||  || |  ||  .

Obtain a Writer from the Files class to write to a file.

1.1.4. Convert file contents to lowercase (read and write file) ⭐

Text conversions from one format to another are common operations.

Exercise:

  • Open a text file, read each character, convert it to lowercase, and write it to a new file. Write a method that does this and call it convertFileToLowercase(Path inPath, Path outPath).

1.1.5. Convert PPM graphics to ASCII grayscale ⭐⭐⭐

Generating pixel graphics is always a bit more complex because of the different formats. However, there is PPM (Portable Pixel Map), a very simple ASCII-based file format. The specification (http://netpbm.sourceforge.net/doc/ppm.html) is rather simple and a Java program can easily generate PPM images. A disadvantage under Windows, however, is that third-party programs are required for display, such as the free software GIMP (https://www.gimp.org/).

The following example shows the basic structure of a PPM file:

P3
3 2
255
255   0   0
  0 255   0
  0   0 255
255 255   0
255 255 255
  0   0   0

There are various tokens separated by white space. We define the following rules:

  • The first token is the identifier P3.

  • It is followed by the width and height of the image.

  • The maximum color value follows, we always assume 255.

  • Red, green, blue values follow for all pixels from top left to bottom right.

  • Height, width, and the color values are never negative.

Exercise:

  • Read a PPM file, and extract all color values.

  • Transfer each color value to a grayscale value.

  • Each point of the graphic should become an ASCII character. Convert each grayscale value from 0 to 255 to an ASCII character.

  • Allow in the program the parameterization of the conversion of the RGB values to the grayscale value, so that the algorithm is interchangeable.

  • Allow the parameterization of the conversion from the grayscale value to the ASCII character.

For conversion to grayscale value the following interface and constant can be used:

public interface RgbToGray {
  RgbToGray DEFAULT = (r, g, b) -> (r + g + b) / 3;
  int toGray( int r, int g, int b );
}

Java provides a mapping from (int, int) to an int with the IntBinaryOperator, but there is no functional interface that has three parameters.

The average method is fast, but it does not match human perception: the eye is more sensitive to green than to red, and least sensitive to blue. The well-known 'luminosity method' weights the channels accordingly: 0.21 R + 0.72 G + 0.07 B.
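As a sketch, both mappings can be written as lambda expressions. The nested interface below repeats the shape of RgbToGray so that the example is self-contained; the class name LuminositySketch, the constant names, and the rounding with Math.round(…) are assumptions for illustration.

```java
class LuminositySketch {

  // Same shape as the RgbToGray interface from the exercise text
  interface RgbToGray {
    int toGray( int r, int g, int b );
  }

  // Average method: every channel has the same weight
  static final RgbToGray AVERAGE = ( r, g, b ) -> (r + g + b) / 3;

  // Luminosity method with the weights 0.21, 0.72 and 0.07
  static final RgbToGray LUMINOSITY =
      ( r, g, b ) -> (int) Math.round( 0.21 * r + 0.72 * g + 0.07 * b );

  public static void main( String[] args ) {
    // Pure green and pure blue get the same gray value with the average
    // method, but very different ones with the luminosity method
    System.out.println( AVERAGE.toGray( 0, 255, 0 ) );
    System.out.println( AVERAGE.toGray( 0, 0, 255 ) );
    System.out.println( LUMINOSITY.toGray( 0, 255, 0 ) );
    System.out.println( LUMINOSITY.toGray( 0, 0, 255 ) );
  }
}
```

Because the three weights add up to 1, the result stays in the range from 0 to 255.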

The interface IntUnaryOperator can be used very well for mapping a grayscale value (int) to an ASCII character (char, expanded to int). A default converter may look like this:

public enum GrayToAscii implements IntUnaryOperator {
  DEFAULT;

  private final char[] ASCII_FOR_SHADE_OF_GRAY =
    // black = 0, white = 255
    "@MBENRWDFQASUbehGmLOYkqgnsozCuJcry1v7lit{}?j|()=~!-/<>\"^_';,:`. ".toCharArray();
  private final int CHARS_PER_RGB = 256 / ASCII_FOR_SHADE_OF_GRAY.length;
  @Override public int applyAsInt( int gray ) {
    return ASCII_FOR_SHADE_OF_GRAY[ gray / CHARS_PER_RGB ];
  }
}

The given string [1] is 64 characters long. Basically, this means black becomes @, and white becomes a space.

Example:

  • The result for the PPM file above is:

    kkk
    ? @

1.1.6. Split files (read and write files) ⭐⭐

On Anaa Atoll, the port software has been running on a Commodore PC-30 for about 40 years. Bonny Brain has manipulated the computer successfully, but now the software needs an update, which must be installed via floppy disks. 3.5-inch HD floppy disks can store 1,474,560 bytes (1440 KiB) by default. The software update doesn’t fit on a floppy disk, so software is needed to break up a large file into several small files in a "disk-compatible" manner.

Exercise:

  • Write a program that is passed a file name on the command line and then splits that file into several smaller parts.

Example:

  • The call looks like this:

    $ java com.tutego.exercise.io.FileSplitter Hanjaab.bin

    If the file Hanjaab.bin is 2440 KiB in size, then the Java program will turn it into the files Hanjaab.bin.1 and Hanjaab.bin.2 with sizes 1440 KiB and 1000 KiB.

1.2. Nesting streams

Streams can be nested like Russian dolls; one stream is the actual resource in the core, and other streams are wrapped around it like a hull. Operations that go through the wrappers eventually go into the core.
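A minimal sketch of such a nesting, assuming a byte array in memory as the core resource: a DataOutputStream wraps a BufferedOutputStream, which wraps a ByteArrayOutputStream. The class and method names are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class NestingSketch {

  // Writes an int through two wrappers into a byte array and reads it back.
  static int roundTrip( int value ) {
    try {
      // The resource in the core
      ByteArrayOutputStream core = new ByteArrayOutputStream();

      // DataOutputStream -> BufferedOutputStream -> ByteArrayOutputStream
      try ( DataOutputStream out =
                new DataOutputStream( new BufferedOutputStream( core ) ) ) {
        out.writeInt( value );
      } // closing the outermost stream flushes and closes the inner ones

      try ( DataInputStream in = new DataInputStream(
                new ByteArrayInputStream( core.toByteArray() ) ) ) {
        return in.readInt();
      }
    }
    catch ( IOException e ) {
      throw new UncheckedIOException( e );
    }
  }

  public static void main( String[] args ) {
    System.out.println( roundTrip( 4711 ) );
  }
}
```

Only the outermost stream is used and closed by the program; the call is passed on from wrapper to wrapper until it reaches the core.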

1.2.1. Quiz: DataInputStream and DataOutputStream ⭐

DataInputStream and DataOutputStream are decorators that enhance a simple InputStream and OutputStream respectively.

1.2.2. Compress number sequences with the GZIPOutputStream ⭐

java.util.zip.GZIPOutputStream is a special output stream that compresses data without loss.

Exercise:

  • Create a compressed file with the numbers from 0 (inclusive) to N (exclusive). Write the numbers with writeLong(…) of a DataOutputStream that wraps a GZIPOutputStream.

  • Compare the file sizes for different N.

  • At which N is compression worthwhile?

1.3. Serialization

Java uses serialization to allow object states to be written to a data stream, and then later to recreate the object from a data stream; this process is called deserialization.

To convert Java objects to a binary stream and vice versa, the classes ObjectOutputStream and ObjectInputStream are used; all object types to be serialized must be Serializable. We will use the types in the next exercises and see practical examples of serialization.

Both classes are typical decorators: When serializing, the ObjectOutputStream determines the data and writes the serialized byte sequences to the OutputStream specified in the constructor. When reading, it is the other way around: the ObjectInputStream reads from the InputStream passed to it.
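The following sketch shows the two decorators in a round trip through a byte array instead of a file. The class name SerializationSketch and the helper roundTrip(…) are made up for illustration; checked exceptions are wrapped only to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

class SerializationSketch {

  // Serializes an object into a byte array and deserializes a copy from it.
  static Object roundTrip( Object object ) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();

      // The ObjectOutputStream decorates the stream passed to the constructor
      try ( ObjectOutputStream oos = new ObjectOutputStream( bytes ) ) {
        oos.writeObject( object );
      }

      // The ObjectInputStream decorates the stream it reads from
      try ( ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream( bytes.toByteArray() ) ) ) {
        return ois.readObject();
      }
    }
    catch ( IOException | ClassNotFoundException e ) {
      throw new IllegalStateException( e );
    }
  }

  public static void main( String[] args ) {
    List<String> crew = new ArrayList<>();
    crew.add( "Bonny Brain" );
    crew.add( "Captain CiaoCiao" );

    Object copy = roundTrip( crew );
    System.out.println( copy.equals( crew ) );  // same content
    System.out.println( copy == crew );         // but a different object
  }
}
```

ArrayList and String both implement Serializable, which is why the list can be written without further preparation.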

1.3.1. (de)serialize data for chat and convert it to text ⭐⭐

A chat program should be used to transmit Java objects. However, the chat program can only transmit ASCII characters. Therefore, the objects must not only be (de)serialized, but also converted to or from text format.

Exercise:

  • Write a method String objectToBase64(Object) that serializes an object, then compresses it with a DeflaterOutputStream and returns it Base64 encoded.

  • Write a method deserializeObjectFromBase64(String) that will wrap a Base64 encoded string into a byte stream, unpack it with the InflaterInputStream, and use it as a source for deserialization.

To convert binary data into a string and vice versa, the Base64.Encoder and Base64.Decoder and especially the wrap(…​) method can help.
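How wrap(…) behaves can be tried out with plain bytes before serialization and compression come into play. This is a sketch with hypothetical names (Base64WrapSketch, encode, decode), not the requested solution.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64WrapSketch {

  // Everything written to the wrapped stream arrives Base64 encoded in the sink.
  static String encode( byte[] data ) {
    try {
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      try ( OutputStream out = Base64.getEncoder().wrap( sink ) ) {
        out.write( data );
      } // close() is important, it writes the final padding
      return new String( sink.toByteArray(), StandardCharsets.ISO_8859_1 );
    }
    catch ( IOException e ) {
      throw new UncheckedIOException( e );
    }
  }

  // Everything read from the wrapped stream is decoded on the fly.
  static byte[] decode( String text ) {
    byte[] encoded = text.getBytes( StandardCharsets.ISO_8859_1 );
    try ( InputStream in = Base64.getDecoder().wrap(
              new ByteArrayInputStream( encoded ) ) ) {
      return in.readAllBytes();
    }
    catch ( IOException e ) {
      throw new UncheckedIOException( e );
    }
  }

  public static void main( String[] args ) {
    String text = encode( "Ahoy".getBytes( StandardCharsets.US_ASCII ) );
    System.out.println( text );
    System.out.println( new String( decode( text ), StandardCharsets.US_ASCII ) );
  }
}
```

Since wrap(…) returns ordinary OutputStream and InputStream objects, further decorators can be placed around them.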

1.3.2. Quiz: Requirement for serialization ⭐

If we form an object from the following class Inputs, can we serialize it using the ObjectOutputStream? Or what preconditions might not be met?

class Inputs {
  public static class Input {
    String input;
  }
  public List<Input> inputs = new ArrayList<>();
}

1.3.3. Save last inputs ⭐⭐

Bonny Brain regularly uses the STRING2UPPERCASE application, which at its core looks like this:

Scanner scanner = new Scanner( System.in );   // one Scanner for all lines
while ( scanner.hasNextLine() )               // false at the end of the input
  System.out.println( scanner.nextLine().toUpperCase() );

But now every user input should be stored in the file system, so that at startup the application can display the inputs made so far.

Exercise:

  • Add the following container for all inputs to the project:

    class Inputs implements Serializable {
      public static class Input implements Serializable {
        String input;
      }
      public List<Input> inputs = new ArrayList<>();
    }
  • Whenever a user input is made, it shall be included in an Inputs object.

  • After each input, Inputs shall be serialized to a file.

  • When the application restarts, all serialized values shall be displayed on the screen at the beginning. Exceptions due to non-existent files or wrong file formats can be logged, but shall be ignored.

  • In Input change the data type String of the object variable input to the data type CharSequence. Restart the program. What happens during the deserialization of inputs? Are there any problems?

  • Add the following line to both Inputs and Input:

    private static final long serialVersionUID = 1;
  • Restart the program and serialize new data.

  • In Input add the line

    LocalDateTime localDateTime = LocalDateTime.now();

    for an additional object variable. Restart the program: what happens or doesn’t happen?

1.4. Suggested solutions

1.4.1. get number of different places (read files)

com/tutego/exercise/io/HammingDistance.java
public static long distance( Path file1, Path file2 ) throws IOException {

  long filesize1 = Files.size( file1 );
  long filesize2 = Files.size( file2 );

  if ( filesize1 != filesize2 )
    throw new IllegalStateException(
        String.format( "File size is not equal, but %d for %s and %d for %s",
                       filesize1, file1, filesize2, file2 ) );
  long result = 0;

  try ( Reader input1 = Files.newBufferedReader( file1 );
        Reader input2 = Files.newBufferedReader( file2 ) ) {

    for ( int i = 0; i < filesize1; i++ )
      if ( input1.read() != input2.read() )
        result++;
  }

  return result;
}

One important requirement is an equal file size. Therefore, the program first retrieves the file sizes, compares them, and if they do not match, an IllegalStateException follows. The error message is very precise and conveys which file has which size, so outsiders can easily understand the error.

If all goes well, in the next step we build two resources for the two files. We call the Files method newBufferedReader(…​) for a Reader. There are two reasons for this method: first, we want to process strings and not binary streams, hence the Reader and not an InputStream. Second, buffering is important for performance reasons and newBufferedReader(…​) returns a Reader with internal buffer. Individual characters are read from the internal buffer, and there is no file system access for each individual character, which would be slow.

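The file size in bytes is an upper bound for the number of characters, so a loop runs that many times and asks each of the two streams for one symbol. If the symbols do not match, we increment a counter, which we return at the end. If a file contains multi-byte characters, the streams reach their end before the loop does; once both are exhausted, read() returns -1 on both sides, which compares as equal and is not counted.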

The try-with-resources closes the two streams again, even if there should be an error in processing. The method does not handle exceptions, but passes them on to the caller. Errors can occur when requesting the file size, opening the file, and reading the character.

1.4.2. Convert Python program to Java (write file)

com/tutego/exercise/io/XorFractal.java
public static void main( String[] args ) {
  final String filename = "xorpic.html";
  try {
    try ( Writer out = Files.newBufferedWriter( Paths.get( filename ) );
          PrintWriter printer = new PrintWriter( out ) ) {

      printer.println( "<!DOCTYPE html>" );
      printer.println( "<html><body><svg width=\"256\" height=\"256\">" );

      for ( int x = 0; x < 256; x++ )
        for ( int y = 0; y < 256; y++ )
          printer.printf(
              "<rect x=\"%d\" y=\"%d\" width=\"1\" " +
              "height=\"1\" style=\"fill:rgb(0,%d,0);\" />",
              x, y, x ^ y );

      printer.println( "</svg></body></html>" );
    }
    Desktop.getDesktop().open( new File( filename ) );
  }
  catch ( IOException e ) {
    e.printStackTrace();
  }
}

The Java and Python languages are very different, and so are their libraries. The two programs therefore have little code in common.

There are several ways to write to files in Java. The common classes are FileOutputStream, FileWriter, PrintWriter, and Formatter. Types based on OutputStream are ruled out because we don't want to write bytes but Unicode characters. FileWriter is ruled out because it offers no format strings, which are quite useful here, and Formatter is left out because it can only write via format strings and cannot write plain strings directly. That leaves PrintWriter.

Since something can always go wrong with input/output, the Java methods throw exceptions that we have to handle. This is what the first try block takes care of. It catches every IOException.

Files are resources that need to be closed; therefore, the creation of the resource is also put into a try-with-resources block. This particular block has no catch branch, because its only job is to close the resource at the end; all error handling is done by the outer try-catch block.

First we build a BufferedWriter, then we decorate it with a PrintWriter so that we also have a method for writing formatted strings.

The next step is to write the prolog of the HTML file. In the two nested loops, printf(…​) writes the SVG rectangle to the data stream. The three values in Python are the color values for RGB, where the red and blue parts are 0, so they remain unused. The program writes only the green part, as XOR of the coordinates x and y. The value ranges of x and y are between 0 and 255, and this also happens to be the maximum value for the 8-bit RGB color values.

After passing through the two loops, the try-with-resources block closes the open stream. The two try blocks look strangely nested at first sight; the reason is that after a successful write, the file is to be opened with a browser. We have to consider two peculiarities: the try-with-resources must first write and close the file before we are allowed to reopen it for viewing, and we are only allowed to open the file if it was really written without errors. If there was an error while writing, the file must not be opened. The two nested try blocks implement exactly this logic.

1.4.3. Generate target code (write file)

com/tutego/exercise/io/Zielcode.java
private static final String[] ZIELCODE = {
    "||||",     // 0000 = 0
    "||| ",     // 0001 = 1
    "|| |",     // 0010 = 2
    "||  ",     // 0011 = 3
    "| ||",     // 0100 = 4
    "| | ",     // 0101 = 5
    "|  |",     // 0110 = 6
    " | |",     // not 0111 = 7 but 1010 = 10
    " |||",     // 1000 = 8
    " || " };   // 1001 = 9

public static void writeZielcode( String string, Writer writer )
    throws IOException {
  for ( int i = 0; i < string.length(); i++ ) {
    int value = Character.getNumericValue( string.charAt( i ) );
    if ( value >= 0 && value <= 9 ) {
      writer.write( ZIELCODE[ value ] );
      if ( i != string.length() - 1 )
        writer.write( "  " );
    }
  }
}

To solve the task, we need to run through a String character by character and map each character to its symbol sequence. There are different approaches. For example, we could use a switch statement on the digit and then write the corresponding string to the writer. Another solution uses a Map, built up beforehand, that associates each character with its target code. The proposed solution shown here uses an array in which the entry at each index is exactly the target code for that digit.
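The map-based alternative mentioned above could look like the following sketch. The class name ZielcodeMapSketch and the helper encode(…) are hypothetical; unlike Character.getNumericValue(…), the map only knows the ASCII digits 0 to 9.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;

class ZielcodeMapSketch {

  // Digit -> four symbols; 7 has the irregular pattern " | |"
  private static final Map<Character, String> CODES = Map.of(
      '0', "||||", '1', "||| ", '2', "|| |", '3', "||  ", '4', "| ||",
      '5', "| | ", '6', "|  |", '7', " | |", '8', " |||", '9', " || " );

  static void write( String digits, Writer writer ) throws IOException {
    for ( int i = 0; i < digits.length(); i++ ) {
      String code = CODES.get( digits.charAt( i ) );
      if ( code == null )
        continue;                          // skip everything that is not a digit
      writer.write( code );
      if ( i != digits.length() - 1 )
        writer.write( "  " );              // two spaces between the digits
    }
  }

  // Convenience for trying it out: collects the output in a String
  static String encode( String digits ) {
    try {
      StringWriter writer = new StringWriter();
      write( digits, writer );
      return writer.toString();
    }
    catch ( IOException e ) {
      throw new UncheckedIOException( e );
    }
  }

  public static void main( String[] args ) {
    System.out.println( "[" + encode( "023" ) + "]" );
  }
}
```

A StringWriter is a Writer as well, which makes it a handy stand-in for a file when testing.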

A switch-case can make a case distinction directly on the char, but for an index on the array we need the numeric value; here Character.getNumericValue(…​) helps. The big advantage of this method is that it works for all digits in all languages. A valid result is in the value range between 0 and 9, with this number you can access the array and then write the value into the writer. If we have not yet reached the last digit in the input string, two spaces are written as separators.

The listing contains comments for the array, which show well that the dashes and spaces are in principle nothing other than a binary representation of the number. An anomaly is the number 7, which is not represented as the predictable bit pattern 0111, but as 1010, i.e. symbolically _ | _ |; 1010, however, would be the bit pattern for the number 10. If | _ _ _ were to represent 7, too much white space would be involved, which could irritate readers. Again, the underscore stands symbolically for the space.

If we interpret the number as a bit pattern, then a slightly different solution can be programmed that does not require an array:

com/tutego/exercise/io/Zielcode.java
String string = "0123456789";
for ( int i = 0; i < string.length(); i++ ) {
  BigInteger v = new BigInteger(
      string.charAt( i ) == '7' ? "10" : string.substring( i, i + 1 ) );
  System.out.print( v.testBit( 3 ) ? ' ' : '|' );
  System.out.print( v.testBit( 2 ) ? ' ' : '|' );
  System.out.print( v.testBit( 1 ) ? ' ' : '|' );
  System.out.print( v.testBit( 0 ) ? ' ' : '|' );
  System.out.print( "  " );
}

As usual, we loop through the string and first check if there is a 7 at the position. If so, we use the string "10" instead; if the digit is not a seven, we cut out a String of length 1 with exactly that character using substring(…). The result in both cases is a string. This string goes into the constructor of BigInteger for initialization. BigInteger has a handy method testBit(…) which answers with true or false whether a bit is set at a position or not. We only have to query the bits 3, 2, 1 and 0 and, depending on the result, print either a space or a vertical bar. Contrary to what the task requires, the output appears directly on the screen.

1.4.4. Convert file contents to lowercase (read and write file)

com/tutego/exercise/io/ConvertFileToLowercase.java
private static final int EOF = -1;

static void convertFileToLowercase( String source, String target )
    throws IOException {
  convertFileToLowercase( Paths.get( source ), Paths.get( target ) );
}

static void convertFileToLowercase( Path source, Path target )
    throws IOException {
  try ( BufferedReader reader = Files.newBufferedReader( source );
        BufferedWriter writer = Files.newBufferedWriter( target ) ) {
    for ( int c; (c = reader.read()) != EOF; )
      writer.write( Character.toLowerCase( (char) c ) );
  }
}

The proposed solution first declares a private static variable EOF, which we will use later because we run through the file character by character, and -1 signals that there are no more characters in the stream.

The actual method convertFileToLowercase(…​) is overloaded once with the parameter type String and once with the parameter type Path. The variant with the filenames creates Path objects and delegates to the actual conversion, to the second method.

Given a Path for the input file and a Path for the output file, we can use the Files methods to request a Reader and a Writer. Both objects have the nice property that they buffer automatically, so character-by-character processing is much faster than if Reader and Writer were not buffered. When reading, BufferedReader first creates a buffer of 8192 characters, which is then filled as far as possible. Single characters are then read from this buffer. When writing, the same applies: first all data is collected in an internal buffer, and when the buffer is full, the BufferedWriter writes the data of the buffer into the output stream below.

The for loop declares a variable c for the character to be read. In the condition expression of the for loop, the program first reads a character and assigns the result to the variable c; in the next step, it compares with EOF. The loop runs as long as characters can be read. In the body of the loop, the character is converted to a lowercase letter and written to the Writer.

1.4.5. Convert PPM graphics to ASCII grayscale

com/tutego/exercise/io/PPM.java
class PPM {

  public interface RgbToGray {
    RgbToGray DEFAULT = (r, g, b) -> (r + g + b) / 3;
    int toGray( int r, int g, int b );
  }

  public enum GrayToAscii implements IntUnaryOperator {
    DEFAULT;

    // black = 0, white = 255
    private static final char[] ASCII_FOR_SHADE_OF_GRAY =
        "@MBENRWDFQASUbehGmLOYkqgnsozCuJcry1v7lit{}?j|()=~!-/<>\"^_';,:`. "
            .toCharArray();
    private static final int CHARS_PER_RGB =
        256 / ASCII_FOR_SHADE_OF_GRAY.length;
    @Override public int applyAsInt( int gray ) {
      return ASCII_FOR_SHADE_OF_GRAY[ gray / CHARS_PER_RGB ];
    }
  }

  private static final String MAGIC_NUMBER = "P3";

  private PPM() { }

  private static String nextStringOrThrow( Scanner scanner, String msg ) {
    if ( ! scanner.hasNext() )
      throw new IllegalStateException( msg );
    return scanner.next();
  }

  private static int nextIntOrThrow( Scanner scanner, String msg ) {
    if ( ! scanner.hasNextInt() )
      throw new IllegalStateException( msg );
    int number = scanner.nextInt();
    if ( number < 0 )
      throw new IllegalStateException( "Value has to be positive but was "
                                       + number );
    return number;
  }

  public static void renderP3PpmImage( Readable input, RgbToGray rgbToGray,
                                       IntUnaryOperator grayToAscii,
                                       Appendable output )
      throws IOException {

    Scanner scanner = new Scanner( input );

    // Header P3
    String magicNumber = nextStringOrThrow( scanner,
                                            "End of file, missing header" );
    if ( ! magicNumber.equals( MAGIC_NUMBER ) )
      throw new IllegalStateException( "No P3 image file, but " + magicNumber );

    // Width Height
    int width  = nextIntOrThrow( scanner,
                                 "End of file or wrong format for width" );
    int height = nextIntOrThrow( scanner,
                                 "End of file or wrong format for height" );

    // Max color value
    int maxVal = nextIntOrThrow( scanner,
                                 "End of file or wrong format for max value" );
    if ( maxVal != 255 )
      throw new IllegalStateException(
          "Only the maximum color value 255 is allowed but was " + maxVal );

    // Matrix
    for ( int y = 0; y < height; y++ ) {
      for ( int x = 0; x < width; x++ ) {
        int r = nextIntOrThrow( scanner,
                                "End of file or wrong format for red value" );
        int g = nextIntOrThrow( scanner,
                                "End of file or wrong format for green value" );
        int b = nextIntOrThrow( scanner,
                                "End of file or wrong format for blue value" );
        int gray = rgbToGray.toGray( r, g, b );
        output.append( (char) grayToAscii.applyAsInt( gray ) );
      }
      output.append( '\n' );
    }
  }

  public static void renderP3PpmImage( Readable input, Appendable output )
      throws IOException {
    renderP3PpmImage( input, RgbToGray.DEFAULT, GrayToAscii.DEFAULT, output );
  }
}

Since the class has only static methods, no instances are needed, and the constructor is therefore private. The class does not store any object state.

For retrieving consecutive tokens the class Scanner is useful. Two kinds of errors can occur: Data can be missing in the stream, or the data type is wrong. Two helper methods nextStringOrThrow(…​) and nextIntOrThrow(…​) simplify the reading of strings and integers respectively, and raise an exception if there is no token in the stream. The method for reading integers also checks whether the number is incorrectly negative, and also throws an IllegalStateException in that case.

Accessible from the outside are the two overloaded methods renderP3PpmImage(…). Let’s start with the full method, which has four parameters:

  1. an input from a Readable

  2. a mapping of type RgbToGray for the RGB value to the grayscale value

  3. a mapping of type IntUnaryOperator for the conversion of a grayscale value to the ASCII value

  4. an Appendable as output destination for writing the result

The method may throw an IOException because, as usual, input/output errors may occur during reading and writing.

The Scanner is connected to the Readable, which is the source from which it can read data. We fetch a token and expect a special header, P3. This is the only use of the nextStringOrThrow(…​) method.

After reading the header, the width and height must follow. They must not be negative; the value 0, however, does not lead to an error, because we want to allow it. Afterwards the largest possible color value is read in, which according to our definition must always be 255. In principle the standard allows other values, but we simplify this.

Once we have the height and width, we can write two nested loops, each reading the three color tones. In principle, one loop would also suffice, but the program may want to refer to the x/y coordinates of the points later. After reading the RGB values, the converter function is called, and the grayscale tone is created, which then becomes the ASCII character via the next mapping. The result from the IntUnaryOperator is an int, which we convert to a char and write to the output stream. At the end of the line we write a newline.

The second method renderP3PpmImage(…) uses the default implementations of the two mappings. Users of the library can choose to use the default converters or pass in their own mappings.

1.4.6. Split files (read and write files)

com/tutego/exercise/io/FileSplitter.java
private static final int EOF = -1;

private static void splitFile( Path source, int size ) throws IOException {
  Objects.requireNonNull( source );
  Objects.checkIndex( size, Integer.MAX_VALUE );

  try ( InputStream fis = Files.newInputStream( source ) ) {
    byte[] buffer = new byte[ size ];
    for ( int cnt = 1, remaining;
          (remaining = fis.read( buffer )) != EOF;
          cnt++ ) {
      Path path = Paths.get( source + "." + cnt );
      try ( OutputStream fos = Files.newOutputStream( path ) ) {
        fos.write( buffer, 0, remaining );
      }
    }
  }
}

public static void main( String[] args ) {

  if ( args.length == 0 ) {
    System.err.println( "You need to specify a file name to split the file." );
    return;
  }

  try {
    String filename = args[ 0 ];
    splitFile( Paths.get( filename ), 1_474_560 );
  }
  catch ( IOException e ) {
    System.err.println( e.getMessage() );
  }
}

If we later work via the read(…​) method, it will return -1 if no new bytes can be read. For this we introduce a constant EOF.

Our splitFile(…) method takes a path to the file and the size. The path could be null and the size negative. Although an exception would be thrown later because of this, we want to check the correctness beforehand. Here we turn to two static methods of the Objects class.

If input/output exceptions occur in the following, splitFile(…) does not catch them (what would the handling even look like?) but passes them up to the caller. There may be exceptions when opening the file, reading the contents, and writing.

In the first step, we open the file for reading. Since the file is to be processed byte by byte, we obtain an InputStream. This is a resource that the program has to close at the end in any case, so try-with-resources is used. Splitting can be done in two different ways: One possibility would be to open an OutputStream, read bytes from the InputStream in small portions, and write them to the OutputStream; this would be very memory efficient. The other option is chosen here and saves some program code, but bears the risk of an OutOfMemoryError because the solution reads an entire byte array in one go, and this array is as large as the passed size. However, with the intended size of a floppy disk this is not to be expected, and reading into the buffer and writing directly gives good performance.

The array holds size bytes. We use the array repeatedly in the loop. In the loop head we declare two variables: a counter for the generated file extension and a variable remaining for the number of bytes actually read from the input stream. The actual reading happens in the condition part of the for loop; after each read, remaining is either -1 or contains the number of bytes actually read.

In the body of the loop, the byte array is written to the file. Files.newOutputStream(…) returns an output stream. The first argument to the method is a generated Path object that takes the file name and appends a dot and the counter. On the OutputStream, write(byte[], int, int) writes the filled part of the array to the file. If the loop runs multiple times, the solution expects the byte array to be completely filled in every pass except the last; this is what typically happens when reading local files, although the contract of read(byte[]) only promises to read at least one byte. In the last pass, probably fewer than size bytes are read, so fewer bytes of the array must be written; remaining <= size always holds.
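If the parts must have exactly the requested size regardless of how the stream delivers its data, readNBytes(byte[], int, int) (available since Java 9) can be used instead of read(byte[]). The following sketch is a variation, not the proposed solution; it collects the parts in memory instead of writing files, and the names ChunkReaderSketch and chunks(…) are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChunkReaderSketch {

  // Cuts the stream into chunks of exactly 'size' bytes;
  // only the last chunk may be shorter.
  static List<byte[]> chunks( InputStream in, int size ) {
    try {
      List<byte[]> result = new ArrayList<>();
      byte[] buffer = new byte[ size ];
      // readNBytes(...) keeps reading until 'size' bytes have arrived or the
      // stream has ended; at the end it returns 0, not -1
      for ( int read; (read = in.readNBytes( buffer, 0, size )) > 0; )
        result.add( Arrays.copyOf( buffer, read ) );
      return result;
    }
    catch ( IOException e ) {
      throw new UncheckedIOException( e );
    }
  }

  public static void main( String[] args ) {
    InputStream in = new ByteArrayInputStream( new byte[ 10 ] );
    for ( byte[] chunk : chunks( in, 4 ) )
      System.out.println( chunk.length );
  }
}
```

In a file splitter, the line that adds the chunk to the list would be replaced by writing buffer from 0 to read into the next part file.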

The main(…​) method checks if an argument was passed on the command line, and if so, fileSplit(…​) is called with the argument. We write exceptions to the error output channel.
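The memory-efficient alternative mentioned above can be sketched as follows. This is only an illustration under assumptions: the class name StreamingFileSplitter, the method name splitFileStreaming and the block size of 8192 bytes are made up for this sketch and are not part of the proposed solution.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class StreamingFileSplitter {

  // Hypothetical helper: copies the source in small blocks, so the memory
  // consumption does not depend on chunkSize. Returns the number of parts.
  static int splitFileStreaming( Path source, long chunkSize ) throws IOException {
    if ( chunkSize <= 0 )
      throw new IllegalArgumentException( "chunkSize must be positive" );

    byte[] buffer = new byte[ 8192 ];   // assumed block size
    int parts = 0;

    try ( InputStream in = Files.newInputStream( source ) ) {
      int read = in.read( buffer, 0, (int) Math.min( buffer.length, chunkSize ) );
      while ( read > 0 ) {
        parts++;
        long remaining = chunkSize;     // bytes that still fit into this part
        Path target = Paths.get( source + "." + parts );
        try ( OutputStream out = Files.newOutputStream( target ) ) {
          while ( read > 0 ) {
            out.write( buffer, 0, read );
            remaining -= read;
            if ( remaining == 0 )
              break;                    // part is full
            read = in.read( buffer, 0, (int) Math.min( buffer.length, remaining ) );
          }
        }
        // Only start another part if the source is not exhausted yet
        if ( read > 0 )
          read = in.read( buffer, 0, (int) Math.min( buffer.length, chunkSize ) );
      }
    }
    return parts;
  }
}
```

The price for the constant memory footprint is the extra bookkeeping in remaining, which is why the shorter array-based variant is a reasonable choice for floppy-sized chunks.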

1.4.7. Quiz: DataInputStream and DataOutputStream

The classes FilterInputStream, FilterOutputStream and also FilterReader and FilterWriter are useful superclasses for custom filters. On the one hand, they store the wrapped object in an object variable; on the other hand, they spare us from implementing every method of InputStream, OutputStream, Reader or Writer from scratch. To understand this in more detail, let’s take a look at the implementation of FilterOutputStream:

Snippet from the OpenJDK implementation of FilterOutputStream
package java.io;

public class FilterOutputStream extends OutputStream {
    protected OutputStream out;
    // some fields omitted

    public FilterOutputStream(OutputStream out) {
        this.out = out;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
    }

    @Override
    public void write(byte b[]) throws IOException {
        write(b, 0, b.length);
    }

    @Override
    public void write(byte b[], int off, int len) throws IOException {
        if ((off | len | (b.length - (len + off)) | (off + len)) < 0)
            throw new IndexOutOfBoundsException();

        for (int i = 0 ; i < len ; i++) {
            write(b[off + i]);
        }
    }

   // flush() / close() omitted
}

The code makes it clear that only the write(int) method passes data to the wrapped stream; the other two write(…) methods merely call write(int). Custom filters therefore only need to override write(int). For performance reasons, however, it makes sense to override the other methods as well, because writing an entire byte array in one call is faster than starting a separate write access for each element of the array.
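A custom filter along these lines could look like the following sketch. The class UpperCaseAsciiOutputStream and its helper methods are hypothetical and only serve as an illustration: write(int) is the required override, and write(byte[], int, int) is overridden in addition so that a whole array reaches the wrapped stream in one call.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical filter: converts ASCII lowercase letters to uppercase
class UpperCaseAsciiOutputStream extends FilterOutputStream {

  UpperCaseAsciiOutputStream( OutputStream out ) {
    super( out );
  }

  private static int toUpper( int b ) {
    int low = b & 0xFF;   // only the low eight bits of the argument count
    return (low >= 'a' && low <= 'z') ? low - ('a' - 'A') : low;
  }

  // The only method a filter has to override
  @Override
  public void write( int b ) throws IOException {
    out.write( toUpper( b ) );
  }

  // Optional override for speed: one bulk write instead of len single writes
  @Override
  public void write( byte[] b, int off, int len ) throws IOException {
    if ( (off | len | (b.length - (len + off)) | (off + len)) < 0 )
      throw new IndexOutOfBoundsException();
    byte[] converted = new byte[ len ];
    for ( int i = 0; i < len; i++ )
      converted[ i ] = (byte) toUpper( b[ off + i ] );
    out.write( converted, 0, len );
  }

  // Small usage helper: sends a string through the filter into memory
  static String demo( String text ) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try ( OutputStream out = new UpperCaseAsciiOutputStream( sink ) ) {
      out.write( text.getBytes( StandardCharsets.US_ASCII ) );
    }
    return new String( sink.toByteArray(), StandardCharsets.US_ASCII );
  }
}
```

Because the inherited write(byte[]) delegates to write(byte[], int, int), the call in demo(…) ends up in the bulk override without any further code.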

DataInputStream and DataOutputStream are special filter classes, as the UML diagram shows in more detail at method level for DataInputStream; in addition, they implement DataInput and DataOutput.

Figure 1. UML diagram of DataInputStream and DataOutputStream

The abstract superclasses InputStream and OutputStream work only with byte and byte arrays, just as the superclasses Reader and Writer work only with char, char arrays, String or CharBuffer. The special feature of the classes DataInputStream and DataOutputStream is that they also provide methods for the other primitive data types. Thus, integers or floating point numbers can also be read and written. This is a typical example of a decorator that provides a more powerful API and falls back on the simple methods in the background. The implementation of readInt() is a good example of how this works:
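Before looking at the implementation, a short round trip may help to see the richer API in use. The following is a minimal sketch; the class name DataStreamRoundTrip and the written values are chosen freely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataStreamRoundTrip {

  // Writes an int, a double and a string into a byte array
  static byte[] write() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try ( DataOutputStream out = new DataOutputStream( bytes ) ) {
      out.writeInt( 999_999 );   // 4 bytes
      out.writeDouble( 3.5 );    // 8 bytes
      out.writeUTF( "Ahoy" );    // 2 length bytes + 4 bytes for the characters
    }
    return bytes.toByteArray();
  }

  // Reads the values back in exactly the order in which they were written
  static String read( byte[] data ) throws IOException {
    try ( DataInputStream in = new DataInputStream( new ByteArrayInputStream( data ) ) ) {
      int number = in.readInt();
      double fraction = in.readDouble();
      String text = in.readUTF();
      return number + " " + fraction + " " + text;
    }
  }
}
```

The stream itself carries no type information, so the reader has to know the order and the types of the written values.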

OpenJDK implementation of readInt() from DataInputStream
public final int readInt() throws IOException {
    int ch1 = in.read();
    int ch2 = in.read();
    int ch3 = in.read();
    int ch4 = in.read();
    if ((ch1 | ch2 | ch3 | ch4) < 0)
        throw new EOFException();
    return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
}

A DataInputStream is a FilterInputStream whose protected variable in references exactly the data stream that the DataInputStream wraps. To read an integer, four individual bytes must be read from the underlying resource. In the next step, the individual bytes must be shifted to their positions and added up to form the int. The check whether one of the read() operations returned -1, which would mean the end of the data stream, is done very cleverly: The individual return values are combined with a bitwise OR, and if one of the values is negative, the overall result is negative as well.
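The two ideas, positioning the bytes and detecting the end of the stream with a single comparison, can be tried out in isolation. The class ReadIntByHand below is a hypothetical sketch that repeats the steps on an arbitrary InputStream; it is not part of the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ReadIntByHand {

  // Same steps as DataInputStream.readInt(), spelled out for any InputStream
  static int readInt( InputStream in ) throws IOException {
    int ch1 = in.read();   // each call returns 0..255 or -1 at the end
    int ch2 = in.read();
    int ch3 = in.read();
    int ch4 = in.read();
    // Values 0..255 have no sign bit; a single -1 makes the OR result negative
    if ( (ch1 | ch2 | ch3 | ch4) < 0 )
      throw new EOFException();
    // Big endian: the first byte becomes the most significant byte
    return (ch1 << 24) + (ch2 << 16) + (ch3 << 8) + ch4;
  }

  public static void main( String[] args ) throws IOException {
    byte[] bytes = { 0x00, 0x0F, 0x42, 0x3F };
    System.out.println( readInt( new ByteArrayInputStream( bytes ) ) );
  }
}
```

The four bytes in main(…) are 0x00, 0x0F, 0x42 and 0x3F, that is 15 · 65536 + 66 · 256 + 63 = 999999.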

1.4.8. Compress number sequences with the GZIPOutputStream

com/tutego/exercise/io/CompressLotOfNumbers.java
Path tempFile = Files.createTempFile( "numbers", ".bin.gz" );

final int n = 4;

try ( OutputStream     fos = Files.newOutputStream( tempFile );
      OutputStream     gos = new GZIPOutputStream( fos );
      DataOutputStream out = new DataOutputStream( gos ) ) {
  for ( int i = 0; i < n; i++ )
    out.writeLong( i );
}

System.out.println( "Uncompressed: " + n * Long.BYTES );
System.out.println( "Compressed:   " + Files.size( tempFile ) );

Files.delete( tempFile );

For the example, we do not want to create a file in the current directory, but in the temp directory of the operating system. The operating system should clean up this directory periodically, but we also delete the temporary file ourselves at the end of the program. After initializing the constant n, which we can easily change later for experiments, we build up three nested streams. Since all these streams are of type AutoCloseable, we use try-with-resources. The nested streams are like nested rings: The innermost ring is the output stream that writes to the file. Around it is a stream that compresses: Everything written to the GZIPOutputStream is compressed and then written to the file stream. The outermost ring is the decorator with the more powerful API. Therefore, the data type in the try-with-resources is no longer OutputStream, but DataOutputStream, because it has the desired writeLong(long) method. DataOutputStream wraps the compressing data stream. If we write something into the DataOutputStream, the data is passed to the GZIPOutputStream. The GZIPOutputStream passes the compressed data to the file output stream, which writes it.

In the loop we write n numbers. At the end, we calculate how big the file would be if the data were not compressed. We don’t need to create an actual file to do this, because the size is simply n times the size of a long. To get the compressed size, we resort to the Files.size(…) method; another solution would have been to count the number of bytes flowing through the streams right away, but we didn’t do that here.
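The alternative of counting the bytes as they flow through could be sketched with a small custom filter. The class CountingOutputStream and the method compressedSizeOf(…) are hypothetical names for this sketch; the filter sits between the GZIPOutputStream and a sink that simply discards everything.

```java
import java.io.DataOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical filter: counts every byte that passes through
class CountingOutputStream extends FilterOutputStream {

  private long count;

  CountingOutputStream( OutputStream out ) {
    super( out );
  }

  @Override
  public void write( int b ) throws IOException {
    out.write( b );
    count++;
  }

  @Override
  public void write( byte[] b, int off, int len ) throws IOException {
    out.write( b, off, len );
    count += len;
  }

  long count() {
    return count;
  }

  // Compresses the numbers 0..n-1 and returns the number of compressed bytes
  static long compressedSizeOf( int n ) throws IOException {
    OutputStream discard = new OutputStream() {
      @Override public void write( int b ) { /* drop the byte */ }
    };
    CountingOutputStream counter = new CountingOutputStream( discard );
    try ( DataOutputStream out = new DataOutputStream( new GZIPOutputStream( counter ) ) ) {
      for ( int i = 0; i < n; i++ )
        out.writeLong( i );
    }
    // Ask only after closing: GZIP writes its trailer when the stream is closed
    return counter.count();
  }
}
```

With this variant no file is needed at all, and the same counter could also be placed in front of the GZIPOutputStream to measure the uncompressed size.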

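With n set to 1,000,000, the uncompressed size of the file would be 8,000,000 bytes; the compressed size is 2,129,303 bytes. Since the data is written as long, many bits are 0. The following bit patterns are generated (the last one stands for 999999, spaces separate the bytes, and only the lower four of the eight bytes of each long are shown because the upper four are always 0):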

00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000001
...
00000000 00001111 01000010 00111110
00000000 00001111 01000010 00111111

The compressed file is about a quarter of the original size. Compression of this particular sequence is worthwhile for four or more long elements.
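Reading the numbers back mirrors the nesting of the streams: a GZIPInputStream decompresses, and a DataInputStream around it offers readLong(). The following sketch works in memory instead of on a file; the class name GzipNumbersRoundTrip and its methods are assumptions made for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipNumbersRoundTrip {

  // DataOutputStream -> GZIPOutputStream -> ByteArrayOutputStream
  static byte[] compress( long[] numbers ) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try ( DataOutputStream out = new DataOutputStream( new GZIPOutputStream( bytes ) ) ) {
      for ( long number : numbers )
        out.writeLong( number );
    }
    return bytes.toByteArray();
  }

  // DataInputStream <- GZIPInputStream <- ByteArrayInputStream
  static long[] decompress( byte[] compressed, int n ) throws IOException {
    long[] numbers = new long[ n ];
    try ( DataInputStream in = new DataInputStream(
              new GZIPInputStream( new ByteArrayInputStream( compressed ) ) ) ) {
      for ( int i = 0; i < n; i++ )
        numbers[ i ] = in.readLong();
    }
    return numbers;
  }
}
```

The number of elements has to be known to the reader here; a real file format would typically store it in front of the data.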

1.4.9. (de)serialize data for chat and convert it to text

The solution consists of two parts: a method for mapping an object to a string, and a mapping of a string representation back to an object. Let’s start with the mapping to a string:

com/tutego/exercise/io/ObjectBase64.java
public static String serializeObjectToBase64( Object object ) {
  ByteArrayOutputStream baos = new ByteArrayOutputStream();

  try ( OutputStream b64os     = Base64.getEncoder().wrap( baos );
        OutputStream dos       = new DeflaterOutputStream( b64os );
        ObjectOutputStream oos = new ObjectOutputStream( dos ) ) {
    oos.writeObject( object );
  }
  catch ( IOException e ) {
    throw new IllegalStateException( e );
  }

  try {
    return baos.toString( StandardCharsets.US_ASCII.name() );
  }
  catch ( UnsupportedEncodingException e ) {
    throw new IllegalStateException( e );
  }
}

The ObjectOutputStream writes the bytes to a DeflaterOutputStream, which compresses the bytes and then passes them on to an OutputStream of the Base64 class. The API is a bit unusual, because normally the target is specified in the constructor of the classes; for the Base64 encoder and decoder, however, this is done by the wrap(…) method. The Base64 conversion writes to a ByteArrayOutputStream. To summarize: If writeObject(Object) writes to the ObjectOutputStream, the data goes to the DeflaterOutputStream, then to the Base64 encoder, and finally to the ByteArrayOutputStream. Either the mapping succeeds, or there is an exception, which is caught and terminates the processing as an IllegalStateException, a runtime exception. Once the ByteArrayOutputStream contains the data, toString(…) returns the result. The resulting strings consist of pure ASCII characters, so the US_ASCII encoding can be used.

The reverse step turns a string into an object.

com/tutego/exercise/io/ObjectBase64.java
public static Object deserializeObjectFromBase64( String string ) {
  final byte[] bytes = string.getBytes( StandardCharsets.US_ASCII );

  try ( ByteArrayInputStream bis = new ByteArrayInputStream( bytes );
        InputStream b64is        = Base64.getDecoder().wrap( bis );
        InputStream iis          = new InflaterInputStream( b64is );
        ObjectInputStream ois    = new ObjectInputStream( iis ) ) {
    return ois.readObject();
  }
  catch ( IOException | ClassNotFoundException e ) {
    throw new IllegalStateException( e );
  }
}

We need to generate an InputStream from the String so that the ObjectInputStream class can be used. This is a bit of a problem, because Java does not provide a natural way to use a String as a source for an InputStream. Open source libraries such as Google Guava or Apache Commons IO have solutions here, for example in the form of the Apache class CharSequenceInputStream. Java itself offers only the other direction, for example with an InputStreamReader, which adapts an InputStream to a Reader; it offers nothing that lets a Reader appear as an InputStream. Therefore, a StringReader, which is usually used when a string must appear as a Reader, does not help us either.

The chosen solution converts the string into a byte array. This is not really satisfactory, since the input is traversed twice, once during the conversion and a second time when reading from the stream. On the other hand, this should have little practical relevance, and the bit of extra memory is not really a burden for our use case.

After converting to a byte[], ByteArrayInputStream creates the desired InputStream. The input stream consisting of ASCII characters becomes a byte stream via the decoder of Base64. The InflaterInputStream unpacks the data, and finally the ObjectInputStream reconstructs the object via readObject(). The serialized stream contains an identifier that states which data type is to be reconstructed. This data type might not exist on the current virtual machine, and in this case a ClassNotFoundException is thrown. Just like a possible IOException, we catch this checked exception and wrap it in an unchecked exception.
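Both directions can be checked with a small round trip. The following sketch uses the same stream chain in a condensed form; the class Base64RoundTrip and its method names are chosen for this illustration and are not the methods of the proposed solution.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class Base64RoundTrip {

  // Object -> compressed bytes -> Base64 text
  static String toBase64( Serializable object ) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try ( ObjectOutputStream oos = new ObjectOutputStream(
              new DeflaterOutputStream( Base64.getEncoder().wrap( baos ) ) ) ) {
      oos.writeObject( object );
    }
    // Closing the outer stream also finishes the compression and the Base64 padding
    return new String( baos.toByteArray(), StandardCharsets.US_ASCII );
  }

  // Base64 text -> compressed bytes -> object
  static Object fromBase64( String text ) throws IOException, ClassNotFoundException {
    byte[] bytes = text.getBytes( StandardCharsets.US_ASCII );
    try ( ObjectInputStream ois = new ObjectInputStream(
              new InflaterInputStream( Base64.getDecoder().wrap(
                  new ByteArrayInputStream( bytes ) ) ) ) ) {
      return ois.readObject();
    }
  }

  public static void main( String[] args ) throws Exception {
    String text = toBase64( "To Err is Human. To Arr is Pirate." );
    System.out.println( text );
    System.out.println( fromBase64( text ) );
  }
}
```

For short inputs the text form is usually longer than the original, because serialization, compression headers and Base64 each add overhead.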

1.4.10. Quiz: Requirement for serialization

An instance of Inputs cannot be serialized because Inputs and Input do not implement the Serializable interface. Correct would be:

class Inputs implements Serializable {
  public static class Input implements Serializable {
    String input;
  }
  public List<Input> inputs = new ArrayList<>();
}

The mechanism for serialization traverses an object graph recursively, and all elements must be serializable. In our example:

  1. Serialize Inputs. Is the class Serializable? Yes, then serialize the ArrayList inputs.

  2. Serialize ArrayList. Is the class Serializable? Yes, then serialize internally the array of inputs entries.

  3. Serialize Input. Is the class Serializable? Yes, then serialize the object variable String input.

  4. Serialize String. Is the class Serializable? Yes, then serialize the strings.

Primitive data types are automatically serializable, and visibility does not matter. Static variables are not serialized, and neither are variables marked transient. Many core Java types are inherently Serializable, such as String. Enumerations and arrays are also serializable.
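These rules can be observed with a small experiment. The classes in the following sketch (Treasure, Parrot, Cage) are invented for the illustration; serialize(…) and deserialize(…) are hypothetical helpers that work in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationRulesDemo {

  static class Treasure implements Serializable {
    private static final long serialVersionUID = 1L;
    static String owner = "CiaoCiao";              // static: not part of the stream
    transient String secret = "X marks the spot";  // transient: not part of the stream
    int coins = 42;                                // primitive: serialized
  }

  static class Parrot { }                          // does not implement Serializable

  static class Cage implements Serializable {
    private static final long serialVersionUID = 1L;
    Parrot parrot = new Parrot();                  // breaks serialization of Cage
  }

  static byte[] serialize( Object object ) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try ( ObjectOutputStream out = new ObjectOutputStream( bytes ) ) {
      out.writeObject( object );
    }
    return bytes.toByteArray();
  }

  static Object deserialize( byte[] data ) throws IOException, ClassNotFoundException {
    try ( ObjectInputStream in = new ObjectInputStream( new ByteArrayInputStream( data ) ) ) {
      return in.readObject();
    }
  }
}
```

After a round trip of a Treasure object, coins is restored, secret is null because neither the stream nor the field initializer sets it, and owner keeps whatever value the class currently holds. Serializing a Cage fails with a NotSerializableException because of the Parrot in the object graph.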

1.4.11. Save last inputs

The task consists of a series of statements, but one question needs to be asked and answered before presenting the proposed solution: What happens when the data type of an attribute changes and an object is to be deserialized from a data stream that was written before the change? The answer is:

java.io.InvalidClassException: com.tutego.exercise.net.Inputs$Input; local class incompatible: stream classdesc serialVersionUID = -8691588030053894297, local class serialVersionUID = 6463495757449665144

There is an exception that reports an incompatible serialVersionUID. The background is the following: Each class has an identifier, the serial version UID. This UID (Unique Identifier) is either statically fixed in the class, or it is dynamically calculated. Since the class from the task does not declare its own UID, the serializer calculates the UID similar to a hash code, only not from the values of the object variables, but from the structure of the class, such as its types. This happens on read and write; if an object is serialized, this UID is also written to the data stream. When reading, the deserializer checks whether the UID in the data stream matches the UID of the class. If there were structural changes, for example the change of a data type, the dynamic UID changes. This is exactly what the exception indicates. The two values represent the UID from the data stream and the calculated UID of the changed class.

If structural changes are not supposed to lead to an exception, a UID must be set manually in the code. This brings us to the proposed solution.

com/tutego/exercise/io/InputHistory.java
class Inputs implements Serializable {
  private static final long serialVersionUID = 1;

  public static class Input implements Serializable {
    private static final long serialVersionUID = 1;
    CharSequence input;
    LocalDateTime localDateTime = LocalDateTime.now();
  }

  public List<Input> inputs = new ArrayList<>();
}

Inputs and the nested class Input both contain the private static serialVersionUID; the concrete value does not matter. The serialver tool included with the JDK prints exactly the UID that is calculated and written to the data stream when serialVersionUID is missing. If there is a serialVersionUID and the UID in the data stream matches that of the class, deserialization is more relaxed: Unknown attributes in the data stream are ignored and attributes where the data type has changed are also skipped.
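The UID can also be queried programmatically via ObjectStreamClass. The following sketch compares a class with a declared UID and one without; both nested classes and the helper uidOf(…) are made up for this illustration.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionUidLookup {

  static class WithExplicitUid implements Serializable {
    private static final long serialVersionUID = 1;
    String input;
  }

  static class WithComputedUid implements Serializable {
    String input;   // no serialVersionUID: the UID is derived from the class structure
  }

  // Hypothetical helper; lookup(...) returns null for classes that are not serializable
  static long uidOf( Class<?> type ) {
    return ObjectStreamClass.lookup( type ).getSerialVersionUID();
  }

  public static void main( String[] args ) {
    System.out.println( uidOf( WithExplicitUid.class ) );   // the declared value 1
    System.out.println( uidOf( WithComputedUid.class ) );   // calculated value
  }
}
```

The calculated value stays the same as long as the class structure is unchanged, which is what makes the default mechanism work across program runs.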

CharSequence is an interface, and interfaces do not usually extend Serializable. Since type checking occurs at runtime and String implements the Serializable interface, there is no error. Nevertheless, String and CharSequence result in different UIDs.

The main program is embedded in a class InputHistory. The constructor of the class reads a file and deserializes the input. Another method, addAndSave(…​), updates Inputs and serializes the result to a file. The main(…​) method ties everything together.

com/tutego/exercise/io/InputHistory.java
public class InputHistory {

  private final static Path FILENAME =
      Paths.get( System.getProperty( "java.io.tmpdir" ),
                 "String2Uppercase.ser" );

  private Inputs inputs;

  InputHistory() {
    try ( InputStream       is  = Files.newInputStream( FILENAME );
          ObjectInputStream ois = new ObjectInputStream( is ) ) {
      inputs = (Inputs) ois.readObject();
      inputs.inputs.forEach( input -> System.out.println( input.input ) );
    }
    catch ( IOException | ClassNotFoundException e ) {
      inputs = new Inputs();
      e.printStackTrace();
    }
  }

  void addAndSave( String string ) {
    Inputs.Input newInput = new Inputs.Input();
    newInput.input = string;
    inputs.inputs.add( newInput );

    try ( OutputStream       os  = Files.newOutputStream( FILENAME );
          ObjectOutputStream oos = new ObjectOutputStream( os ) ) {
      oos.writeObject( inputs );
    }
    catch ( IOException e ) {
      e.printStackTrace();
    }
  }

  public static void main( String[] args ) {
    InputHistory inputHistory = new InputHistory();
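    // One Scanner for the whole session; hasNextLine() ends the loop at end of input
    Scanner scanner = new Scanner( System.in );
    while ( scanner.hasNextLine() ) {
      String line = scanner.nextLine();
      inputHistory.addAndSave( line );
      System.out.println( line.toUpperCase() );
    }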
  }
}

The class has a constant for the file name and an object variable inputs that references the inputs. The constructor opens an InputStream for the file and initializes the ObjectInputStream with it. readObject() starts the deserialization, and if there is an exception, it is caught and a new Inputs object is built. If Inputs could be reconstructed, all strings are output via the forEach(…) method of the list.

The addAndSave(String) method creates a new Input object, sets the passed string in this object and appends the new Input object to the inputs list. Then the Inputs object with its list is serialized via the ObjectOutputStream. Errors should not occur unless there are file system problems.

The main(…) method creates an InputHistory object, which invokes the constructor that deserializes the file. At the very first start this file does not exist and an exception is thrown, but the program is not aborted. After that, the file is created and grows with every console input that is saved. At the next program start the deserialization should work, and the previously entered strings should appear on the screen.


1. The string is a simplification of https://www.pouet.net/topic.php?which=8056&page=1.