1. Input/Output Streams

In the previous chapter, as well as the chapter about exceptions, we have already dealt with the basic reading and writing of file contents. This chapter focuses on input and output streams and the continuous flow of data; it can be written to a destination or read from a source, passing through multiple filters. The nesting of Java’s input/output streams is a good example of abstraction and flexibility that also helps a lot when modeling your filters.

Requirements

  • know type hierarchy of input/output classes

  • be able to distinguish between character-oriented and byte-oriented classes

  • understand decoration of streams

  • be able to send stream data through filters

  • be able to compress data

Data types used in this chapter:

1.1. Direct data streams

In Java, four different types are used: InputStream and OutputStream (byte-oriented reading and writing) and Reader and Writer (character-oriented reading and writing). We start with the exercises first with just those streams that directly write data to a resource or directly read from a resource.

1.1.1. Get the number of different places (read files) ⭐

Captain CiaoCiao gets two text files, and they look very much the same at first glance. But he wants to know exactly if the two files match exactly or if there are differences.

Exercise:

  • Write a method long distance(Path file1, Path file2) that returns the number of different characters. In computer science, this is called Hamming distance.

  • It is assumed that the two files are the same length.

Example: A file contains the string

To Err is Human. To Arr is Pirate.

and another file contains the string

To Arr is Human. To Err is Pirate!

The distance is 3 because the 3 symbols do not match.

1.1.2. Convert Python program to Java (write file) ⭐

In the chapter "Imperative Language Concepts", we completed several exercises that write SVG output to the screen — now we want to write that output directly to HTML files. As a reminder, the following HTML contains an SVG with a rectangle of height and width 1 and x/y coordinates 10/10:

<!DOCTYPE html>
<html><body>
 <svg width="256" height="256">
  <rect x="10" y="10" width="1" height="1" style="fill:rgb(0,29,0);" />
 </svg>
</body></html>

In a book about computer-generated art, Captain CiaoCiao finds an illustration on the first few pages. The pattern is generated by a Python program:

import Image, ImageDraw

image = Image.new("RGB", (256, 256))
drawingTool = ImageDraw.Draw(image)

for x in range(256):
    for y in range(256):
        drawingTool.point((x, y), (0, x^y, 0))

del drawingTool
image.save("xorpic.png", "PNG")

The Python function point(…​) gets the x-y coordinate and RGB color information, where the three arguments 0, x^y, 0 represent the red, green, blue components.

Exercise:

  • Since Captain CiaoCiao dislikes snakes, the Python program must be converted to a Java program.

  • Instead of a PNG file, end up with an HTML file with an SVG block where each pixel is a 1 × 1 SVG rectangle.

Bonus: At the end, open the HTML file with the browser — the desktop class will help you here.

1.1.3. Generate target code (write file) ⭐

From the post office, Captain CiaoCiao is getting more and more letters with pink barcodes. At first, he thinks of encoded love letters from Bonny Brain, but then he realizes that there is a so-called destination code on the envelope, which encodes the postal code.

Generate target code

The encoding of the numbers in dashes is as follows, where the underscore _ symbolizes the spacing by a space:

Table 1. Values and encodings
valueencoding

0

| | | |

1

| | | _

2

| | _ |

3

| | _ _

4

| _ | |

5

| _ | _

6

| _ _ |

7

_ | _ |

8

_ | | |

9

_ | | _

Exercise:

  • Write a static method writeTargetCode(String, Writer) that writes a string of digits in the named encoding to a writer.

  • There should be two spaces between the four symbols for a digit.

Example:

  • The string "023" is written to the file as ||||  || |  ||  .

Obtain a Writer from Files to be able to write in the files.

1.1.4. Convert file contents to lowercase (read and write file) ⭐

Text conversions from one format to another are common operations.

Exercise:

  • Open a text file, read each character, convert it to lowercase, and write it to a new file. Write a method that does this and call it convertFileToLowercase(Path inPath, Path outPath).

1.1.5. Convert PPM graphics to ASCII grayscale ⭐⭐⭐

Generating pixel graphics is always a bit more complex because of the different formats. However, there is PPM (Portable Pixel Map), a simple ASCII-based file format. The specification (http://netpbm.sourceforge.net/doc/ppm.html) is rather simple, and a Java program can easily generate PPM images. A disadvantage under Windows, however, is that third-party programs are required for display, such as the free software GIMP (https://www.gimp.org/).

The following example shows the basic structure of a PPM file:

P3
3 2
255
255   0   0
  0 255   0
  0   0 255
255 255   0
255 255 255
  0   0   0

There are various tokens separated by white space. We define the following rules:

  • The first token is the identifier P3.

  • It is followed by the width and height of the image.

  • The maximum color value follows, we always assume 255.

  • Red, green, blue values follow for all pixels from the top left to bottom right.

  • Height and width and the color values are always positive.

Exercise:

  • Parse a PPM file and retrieve all color values.

  • Convert each color value into its corresponding grayscale value.

  • Transform every pixel in the image into an ASCII symbol by assigning each grayscale value between 0 and 255 to a corresponding ASCII character.

  • Allow in the program the parameterization of the conversion of the RGB values to the grayscale value, so that the algorithm is interchangeable.

  • Allow the parameterization of the conversion from the grayscale value to the ASCII character.

For conversion to grayscale value, the following interface and constant can be used:

[[source,java]

public interface RgbToGray {
  RgbToGray DEFAULT = (r, g, b) -> (r + g + b) / 3;
  int toGray( int r, int g, int b );
}

Java provides a mapping from (int, int) to an int with the IntBinaryOperator, but there is no functional interface that has three parameters.

Although the average method for converting a color image to grayscale is efficient, it may not accurately reflect how humans perceive color. To create a more realistic grayscale image, it is necessary to take into account that people perceive colors differently. One popular method for doing this is the luminosity method, which takes into account the relative contributions of each color channel to perceived brightness. Specifically, the luminosity method assigns weights to each color channel (red, green, and blue) based on their perceived contribution to luminance, with red given a weight of 0.21, green given a weight of 0.72, and blue given a weight of 0.07. By combining these weighted color channel values, the luminosity method produces a grayscale image that more closely matches how people perceive the original color image.

The interface IntUnaryOperator can be used very well for mapping a grayscale value (int) to an ASCII character (char, expanded to int). A default converter may look like this:

public enum GrayToAscii implements IntUnaryOperator {
  DEFAULT;

  private final char[] ASCII_FOR_SHADE_OF_GRAY =
    // black = 0, white = 255
    "@MBENRWDFQASUbehGmLOYkqgnsozCuJcry1v7lit{}?j|()=~!-/<>\"^_';,:`. ".toCharArray();
  private final int CHARS_PER_RGB = 256 / ASCII_FOR_SHADE_OF_GRAY.length;
  @Override public int applyAsInt( int gray ) {
    return ASCII_FOR_SHADE_OF_GRAY[ gray / CHARS_PER_RGB ];
  }
}

The given string [1] is 64 characters long. Basically, this means black becomes @, and white becomes a space.

Example:

  • The result for upper PPM is:

    kkk
    ? @

1.1.6. Split files (read and write files) ⭐⭐

On Anaa Atoll, the port software has been running on a Commodore PC-30 for about 40 years. Bonny Brain has manipulated the computer successfully, but now the software needs an update, which must be installed via floppy disks. 3.5-inch HD floppy disks can store 1,474,560 bytes (1440 KiB) by default. The software update doesn’t fit on a floppy disk, so software is needed to break up a large file into several small files in a "disk-compatible" manner.

Exercise:

  • Write a program that is passed a file name on the command line and then splits that file into numerous smaller parts.

Example:

  • The call looks like this:

    $ java com.tutego.exercise.io.FileSplitter Hanjaab.bin

    If the file Hanjaab.bin is 2440 KiB in size, then the Java program will turn it into the files Hanjaab.bin.1 and Hanjaab.bin.2 with sizes 1440 KiB and 1000 KiB.

1.2. Nesting streams

Streams can be nested like Russian dolls; one stream is the actual resource in the core, and other streams are wrapped around it like a hull. Operations that go through the wrappers eventually go into the core.

1.2.1. Compress number sequences with the GZIPOutputStream ⭐

java.util.zip.GZIPOutputStream is a special output stream that compresses data without loss.

Exercise:

  • Create a compressed file with numbers from 0 to < N written to a GZIPOutputStream using writeLong(…​).

  • Compare the file sizes for different N.

  • At which N is compression worthwhile?

1.2.2. Spreading outputs with a class TeeOutputStream ⭐⭐

Java only provides classes that write one output stream to exactly one other output stream.

Exercise:

  • Write a new class TeeOutputStream that can write to two output streams simultaneously.

Example:

  • Outputs are to be written to two ByteArrayOutputStream:

    ByteArrayOutputStream baos1 = new ByteArrayOutputStream();
    ByteArrayOutputStream baos2 = new ByteArrayOutputStream();
    
    try ( OutputStream tos = new TeeOutputStream( baos1, baos2 );
          Writer ost = new OutputStreamWriter( tos, StandardCharsets.UTF_8 );
          PrintWriter pw = new PrintWriter( ost ) ) {
      pw.print( "Hey" );
      pw.write( '\n' );
      pw.printf( "%d times %d equals %d", 2, 3, 4 );
    }
    
    System.out.println( baos1.toString( StandardCharsets.UTF_8 ) );
    System.out.println( baos2.toString( StandardCharsets.UTF_8 ) );

Note: Consider that if there are exceptions on one channel, the operations are still written to the other channel.

1.2.3. Connect PrintWriter and other decorators ⭐

In this exercise, an entire chain of streams is to be brought together, the → specifies who writes where:

PrintWriterOutputStreamWriterBufferedOutputStreamCheckedOutputStream → file OutputStream.

For chaining of this type, always start with the actual resource.

Exercise:

  • Create a file with a OutputStream coming from Files.new*(...).

  • Use a CheckedOutputStream to be able to calculate a checksum with an Adler32 checksum implementation.

  • Set a java.io.BufferedOutputStream around the CheckedOutputStream for buffering.

  • To switch from the bytes of an OutputStream to the Unicode world we want to use an OutputStreamWriter. Configure it to UTF-8.

  • Then create a java.io.PrintWriter and write some lines of text into it.

Be sure to close the resources with try-with-resources.

1.2.4. Counting character frequencies with a FilterReader ⭐⭐

The language can often be inferred from the frequency of the characters. We are looking for a Java filter that can be inserted as a Writer between other Writer and internally counts the number of different characters.

Exercise:

  • Name the class CharacterFrequencyReader, derive it from java.io.FilterReader.

  • The method toString() should output a table of character frequencies.

Example: Given the string in a resource.

3.14% of sailors are Pi Rates.

If all characters flow through the CharacterFrequencyReader, and the program later calls toString(), a string with the following result should appear on the screen:

(0A) | 1 | 3,23
     | 5 | 16,13
   % | 1 | 3,23
   . | 2 | 6,45
   1 | 1 | 3,23
   3 | 1 | 3,23
   4 | 1 | 3,23
   P | 1 | 3,23
   R | 1 | 3,23
   a | 3 | 9,68
   e | 2 | 6,45
   f | 1 | 3,23
   i | 2 | 6,45
   l | 1 | 3,23
   o | 2 | 6,45
   r | 2 | 6,45
   s | 3 | 9,68
   t | 1 | 3,23

Can this be used to infer the language of the text?

1.3. Serialization

Java uses serialization to allow object states to be written to a data stream, and then later to recreate the object from a data stream; this process is called deserialization.

To convert Java objects to a binary stream and vice versa, the classes ObjectOutputStream and ObjectInputStream are used; all object types to be serialized must be Serializable. We will use the types in the next exercises and see practical examples of serialization. (Serialization(ObjectOutputStream

Both classes are typical decorators: When serializing, the ObjectOutputStream determines the data and writes the serialized byte sequences to the OutputStream specified in the constructor — when reading, it is the other way around, here the ObjectInputStream reads from a passed InputStream.

1.3.1. (De)serialize data for chat and convert it to text ⭐⭐

A chat program should be used to transmit Java objects. However, the chat program can only transmit ASCII characters. Therefore, the objects must not only be (de)serialized but also converted to or from text format.

Exercise:

  • Write a method String objectToBase64(Object) that serializes an object, then compresses it with a DeflaterOutputStream and returns it Base64 encoded.

  • Write a method deserializeObjectFromBase64(String) that will wrap a Base64 encoded string into a byte stream, unpack it with the InflaterInputStream, and use it as a source for deserialization.

To convert binary data into a string and vice versa, the Base64.Encoder and Base64.Decoder and especially the wrap(…​) method can help.

1.3.2. Quiz: Requirement for serialization ⭐

If we form an object from the following class Inputs, can we serialize it using the ObjectOutputStream? Or what preconditions might not be met?

class Inputs {
  public static class Input {
    String input;
  }
  public List<Input> inputs = new ArrayList<>();
}

1.3.3. Save last inputs ⭐⭐

Bonny Brain regularly uses the STRING2UPPERCASE application, which at its core looks like this:

for ( String line; (line = new Scanner( System.in ).nextLine()) != null; )
  System.out.println( line.toUpperCase() );

But now every user input should be stored in the file system so that at startup the application displays the input made.

Exercise:

  • Set the following container for all input in the project:

    class Inputs implements Serializable {
      public static class Input implements Serializable {
        String input;
      }
      public List<Input> inputs = new ArrayList<>();
    }
  • Whenever a user input is made, it shall be included in an Inputs object.

  • After each input, Inputs shall be serialized to a file.

  • When the application restarts, all serialized values shall be displayed on the screen at the beginning. Exceptions due to non-existent files or wrong file formats can be logged, but shall be ignored.

  • In Input change the data type String of the instance variable input to the data type CharSequence. Restart the program. What happens during the deserialization of inputs? Are there any problems?

  • Set in Inputs and Input the line

    private static final long serialVersionUID = 1;
  • Restart the program and serialize new data.

  • In Input add the line

    LocalDateTime localDateTime = LocalDateTime.now();

    for an additional instance variable. Restart the program: what happens or doesn’t happen?


1. The string is a simplification of https://www.pouet.net/topic.php?which=8056&page=1.