1. Process XML, JSON and other Data Formats

Two important data formats for exchanging documents are XML and JSON. XML is historically the older format; JSON is what we often find nowadays in communication between a server and a JavaScript application. JSON documents are also popular for configuration files.

While Java SE provides different classes for reading and writing XML documents, JSON support is only available in Java Enterprise Edition or through complementary open source libraries. Many of the tasks in this chapter therefore resort to external libraries.

Description languages form an important category of document formats. They define the structure of the data. Among the most important formats are HTML, XML, JSON and PDF.

Java does not provide support for other data formats except for property files and the ability to process ZIP archives. This is especially true for CSV files, PDFs or Office documents. Fortunately, dozens of open source libraries fill this gap, so you don’t have to program this functionality yourself.

Prerequisites

  • know how to add Maven dependencies

  • know StAX

  • be able to write XML documents

  • be able to create JAXB beans from XML schema files

  • be able to use object XML mapping with JAXB

  • be familiar with JSON library Jackson

  • be able to read ZIP archives

Data types used in this chapter:

1.1. XML processing with Java

There are different Java APIs for handling XML documents. One approach holds the complete XML document as objects in memory; the other works like a data stream. StAX is a pull API with which the program actively pulls elements from the data stream and can also write them. This processing model suits large documents that do not need to be completely in memory.
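The pull principle can be tried out on a small in-memory document. The following is a minimal sketch and not one of the proposed solutions; the class name StaxPullSketch and the method textsOf(…) are made up for illustration, and the method assumes that the requested elements contain only text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxPullSketch {

  // Collects the text of every element with the given local name.
  // Assumption: these elements contain text only, no child elements.
  static List<String> textsOf( String xml, String elementName ) {
    List<String> result = new ArrayList<>();
    try {
      XMLStreamReader reader =
          XMLInputFactory.newFactory().createXMLStreamReader( new StringReader( xml ) );
      while ( reader.hasNext() ) {
        // The program asks for the next event; nothing is pushed to it.
        if ( reader.next() == XMLStreamConstants.START_ELEMENT
             && elementName.equals( reader.getLocalName() ) )
          result.add( reader.getElementText() );
      }
      reader.close();
    }
    catch ( XMLStreamException e ) {
      throw new IllegalArgumentException( "XML could not be read", e );
    }
    return result;
  }

  public static void main( String[] args ) {
    String xml = "<categories><cat>Canning</cat><cat>Preserves</cat></categories>";
    System.out.println( textsOf( xml, "cat" ) );
  }
}
```

Called with the categories fragment from main(…), the sketch prints [Canning, Preserves]. Only the current event is held in memory, which is what makes the model attractive for large documents.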

JAXB provides an easy way to convert Java objects to XML and XML back to Java objects later. Using annotations or external configuration files, the mapping can be precisely controlled.

1.1.1. Write XML file with recipe ⭐

Captain CiaoCiao has so many recipes that he needs a database. He has several quotes for database management systems, and wants to see if they can import all his recipes.

His own recipes are in RecipeML format, an XML format that is loosely specified: http://www.formatdata.com/recipeml/. There is a large database at https://dsquirrel.tripod.com/recipeml/indexrecipes2.html. An example from "Key Gourmet":

<?xml version="1.0" encoding="UTF-8"?>
<recipeml version="0.5">
  <recipe>
    <head>
      <title>11 Minute Strawberry Jam</title>
      <categories>
        <cat>Canning</cat>
        <cat>Preserves</cat>
        <cat>Jams &amp; jell</cat>
      </categories>
      <yield>8</yield>
    </head>
    <ingredients>
      <ing>
        <amt>
          <qty>3</qty>
          <unit>cups</unit>
        </amt>
        <item>Strawberries</item>
      </ing>
      <ing>
        <amt>
          <qty>3</qty>
          <unit>cups</unit>
        </amt>
        <item>Sugar</item>
      </ing>
    </ingredients>
    <directions>
      <step>Put the strawberries in a pan.</step>
      <step>Add 1 cup of sugar.</step>
      <step>Bring to a boil and boil for 4 minutes.</step>
      <step>Add the second cup of sugar and boil again for 4 minutes.</step>
      <step>Then add the third cup of sugar and boil for 3 minutes.</step>
      <step>Remove from stove, cool, stir occasionally.</step>
      <step>Pour in jars and seal.</step>
    </directions>
  </recipe>
</recipeml>

Task:

  • Write a program that outputs an XML document in RecipeML format.

1.1.2. Check if all images have an alt attribute ⭐

Images in HTML documents should always have an alt attribute.

Task:

1.1.3. Writing Java objects with JAXB ⭐

JAXB simplifies access to XML documents by allowing a convenient mapping from a Java object to an XML document and vice versa.

JAXB was included in the Standard Edition in Java 6 and removed in Java 11. To be prepared for current Java versions, include the following in the POM file:

<dependency>
  <groupId>javax.xml.bind</groupId>
  <artifactId>jaxb-api</artifactId>
  <version>2.3.1</version>
</dependency>
<dependency>
  <groupId>org.glassfish.jaxb</groupId>
  <artifactId>jaxb-runtime</artifactId>
  <version>2.3.3</version>
  <scope>runtime</scope>
</dependency>

Task:

  • Write JAXB beans so that we can generate the following XML:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <ingredients>
        <ing>
            <amt>
                <qty>3</qty>
                <unit>cups</unit>
            </amt>
            <item>Sugar</item>
        </ing>
        <ing>
            <amt>
                <qty>3</qty>
                <unit>cups</unit>
            </amt>
        </ing>
    </ingredients>

  • Create the classes Ingredients, Ing, Amt.

  • Give the classes corresponding object variables; it is ok if these are public.

  • Consider which annotation to use.

1.1.4. Read in jokes and laugh heartily ⭐⭐

Bonny Brain also laughs at simple jokes, and she can never get enough of them. On the Internet she finds the site https://sv443.net/jokeapi/v2/joke/Any?format=xml, which always provides her with new jokes.

The format is XML, which is good for transporting data, but we are Java developers and want everything in objects! With JAXB we want to read the XML files and convert them into Java objects, so we can develop custom output later.

The first step is to automatically generate JAXB beans from an XML schema file. The schema for the Joke page is as follows — don’t worry, you don’t have to understand it.

<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="data">
    <xs:complexType>
      <xs:sequence>
        <xs:element type="xs:string" name="category" />
        <xs:element type="xs:string" name="type" />
        <xs:element name="flags">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:boolean" name="nsfw" />
              <xs:element type="xs:boolean" name="religious" />
              <xs:element type="xs:boolean" name="political" />
              <xs:element type="xs:boolean" name="racist" />
              <xs:element type="xs:boolean" name="sexist" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element type="xs:string" name="setup" />
        <xs:element type="xs:string" name="delivery" />
        <xs:element type="xs:int" name="id" />
        <xs:element type="xs:string" name="error" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

The provider does not supply a schema, so this one was generated from the XML using https://www.freeformatter.com/xsd-generator.html.

Task:

  • Download the XML schema definition from http://tutego.de/download/jokes.xsd and place the file in the Maven directory /src/main/resources.

  • Add the following element to the POM file:

    <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>jaxb2-maven-plugin</artifactId>
        <version>2.5.0</version>
        <executions>
          <execution>
            <id>xjc</id>
            <goals>
              <goal>xjc</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <packageName>com.tutego.exercise.xml.joke</packageName>
          <sources>
            <source>src/main/resources/jokes.xsd</source>
          </sources>
          <generateEpisode>false</generateEpisode>
          <outputDirectory>${basedir}/src/main/java</outputDirectory>
          <clearOutputDir>false</clearOutputDir>
          <noGeneratedHeaderComments>true</noGeneratedHeaderComments>
          <locale>en</locale>
        </configuration>
      </plugin>
    </plugins>
    </build>

    The plugin section includes org.codehaus.mojo:jaxb2-maven-plugin and configures it; all options are explained at https://www.mojohaus.org/jaxb2-maven-plugin/Documentation/v2.5.0/index.html.

  • From the command line, launch mvn generate-sources. This will generate two classes in the com.tutego.exercise.xml.joke package:

    • Data

    • ObjectFactory

  • Use JAXB to get a joke from the URL https://sv443.net/jokeapi/v2/joke/Any?format=xml and convert it to an object.

1.2. JSON

Java SE does not include any support for JSON, but Jakarta EE does. A popular library is Jackson, which can also map Java objects to JSON and reconstruct Java objects from JSON objects.

Jackson is well modularized. There are three core modules (Streaming, Annotations and Databind) and several third-party modules, for example for more specialized data types such as those from Guava or javax.money, and for performance optimization, where bytecode is generated instead of relying on reflection.

The Databind module has a dependency on streaming and annotations, so developers get all the core functionality with it. Include the following dependency in the Maven POM:

<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.12.3</version>
</dependency>

1.2.1. Hacker News JSON exploit ⭐

The page Hacker News (https://news.ycombinator.com/) was briefly introduced in chapter "Network Programming".

The URL https://hacker-news.firebaseio.com/v0/item/24857356.json returns a JSON object of the message with ID 24857356. The response looks (formatted and slightly shortened for the kids) like this:

{
   "by":"luu",
   "descendants":257,
   "id":24857356,
   "kids":[
      24858151,
      24857761,
      24858192,
      24858887
   ],
   "score":353,
   "time":1603370419,
   "title":"The physiological effects of slow breathing in the healthy human",
   "type":"story",
   "url":"https://breathe.ersjournals.com/content/13/4/298"
}

Jackson can be used to convert this JSON into a Map:

ObjectMapper mapper = new ObjectMapper();
Map map = mapper.readValue( src, Map.class );

The source src can be of different types, for example String, File, Reader, InputStream, URL …

Task:

  • Write a new method Map<?, ?> news(long id) that, using Jackson, obtains the JSON document at "https://hacker-news.firebaseio.com/v0/item/" + id + ".json" and converts it to a Map and returns it.

Example:

  • news(24857356).get("title") returns "The physiological effects of slow breathing in the healthy human".

  • news(111111).get("title") returns null.

1.2.2. Read and write editor configurations as JSON ⭐⭐

The developers are working on a new editor for Captain CiaoCiao, and the configurations should be saved in a JSON file.

Task:

  • Write a class Settings so that the following configurations can be mapped:

    {
      "editor" : {
        "cursorStyle" : "line",
        "folding" : true,
        "fontFamily" : [ "Consolas, 'Courier New', monospace" ],
        "fontSize" : 22,
        "fontWeight" : "normal"
      },
      "workbench" : {
        "colorTheme" : "Default Dark+"
      },
      "terminal" : {
        "integrated.unicodeVersion" : "11"
      }
    }
  • The JSON file gives a good indication of the data types:

    • cursorStyle is String, folding is boolean, fontFamily is an array or List.

  • If an attribute is not set, which means null, it should not be written.

  • For terminal the contained keys and values are unknown; they shall be stored in a Map<String, String>.

1.3. HTML

HTML is an important markup language. The Java standard library does not provide support for HTML documents, except for what the javax.swing.JEditorPane can do, which is to render HTML 3.2 and a subset of CSS 1.0.

For Java programs to be able to write and read HTML documents correctly and validly, and to be able to read nodes, we have to turn to (open source) libraries.

1.3.1. Load Wikipedia images with jsoup ⭐⭐

The popular open source library jsoup (https://jsoup.org/) loads the content of web pages and represents the content in a tree in memory.

Include the following dependency in the POM:

<dependency>
  <groupId>org.jsoup</groupId>
  <artifactId>jsoup</artifactId>
  <version>1.13.1</version>
</dependency>

Task:

1.4. Office documents

Microsoft Office continues to be at the top when it comes to word processing and spreadsheets. For many years, the binary file format has been well known, and there are Java libraries for reading and writing. Meanwhile, processing Microsoft Office documents has become much easier since the documents are, at their core, XML documents that are combined into a ZIP archive. Java support is very good.

1.4.1. Generate Word files with screenshots ⭐⭐

Read the Wikipedia entry for POI: https://de.wikipedia.org/wiki/Apache_POI.

Task:

  1. Add the following for Maven in the POM to include Apache POI and the necessary dependencies for DOCX:

    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
      <version>4.1.2</version>
    </dependency>
  2. Study the source code of SimpleImages.java.

  3. Java allows you to capture screenshots, like this:

    private static byte[] getScreenCapture() throws AWTException, IOException {
      BufferedImage screenCapture = new Robot().createScreenCapture( SCREEN_SIZE );
      ByteArrayOutputStream os = new ByteArrayOutputStream();
      ImageIO.write( screenCapture, "jpeg", os );
      return os.toByteArray();
    }
  4. Write a Java program that takes a screenshot every 5 seconds for 20 seconds and attaches the image to the Word document.

1.5. Archives

Files with metadata are collected in archives. A well-known and popular archive format is ZIP, which not only combines the data, but also compresses it. Many archive formats can also store the files encrypted and store checksums, so that errors in the transfer can be detected later.

Java offers two ways to work with ZIP archives: since Java 7 there is a ZIP file system provider, and since Java 1.1 there have been the classes ZipFile and ZipEntry.
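Both approaches can be compared side by side. The following sketch uses only the JDK, not TrueZIP; the class name ZipListingSketch, its method names and the entry names ant.wav and bee.wav are invented for illustration. It first creates a small archive in a temporary file so that it does not depend on an existing ZIP.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipListingSketch {

  // Creates a small sample archive with two made-up entries.
  static Path createSampleZip() {
    try {
      Path zip = Files.createTempFile( "sounds", ".zip" );
      try ( ZipOutputStream out = new ZipOutputStream( Files.newOutputStream( zip ) ) ) {
        for ( String name : new String[]{ "ant.wav", "bee.wav" } ) {
          out.putNextEntry( new ZipEntry( name ) );
          out.write( name.getBytes( StandardCharsets.UTF_8 ) );  // dummy content
          out.closeEntry();
        }
      }
      return zip;
    }
    catch ( IOException e ) {
      throw new UncheckedIOException( e );
    }
  }

  // Variant 1: the classic java.util.zip API.
  static List<String> namesViaZipFile( Path zip ) {
    try ( ZipFile zipFile = new ZipFile( zip.toFile() ) ) {
      List<String> names = new ArrayList<>();
      for ( ZipEntry entry : Collections.list( zipFile.entries() ) )
        names.add( entry.getName() );
      Collections.sort( names );
      return names;
    }
    catch ( IOException e ) {
      throw new UncheckedIOException( e );
    }
  }

  // Variant 2: the ZIP file system provider; the archive is treated like a directory.
  static List<String> namesViaFileSystem( Path zip ) {
    // The cast selects the (Path, ClassLoader) overload, which exists since Java 7.
    try ( FileSystem fs = FileSystems.newFileSystem( zip, (ClassLoader) null );
          DirectoryStream<Path> entries = Files.newDirectoryStream( fs.getPath( "/" ) ) ) {
      List<String> names = new ArrayList<>();
      for ( Path entry : entries )
        names.add( entry.getFileName().toString() );
      Collections.sort( names );
      return names;
    }
    catch ( IOException e ) {
      throw new UncheckedIOException( e );
    }
  }

  public static void main( String[] args ) throws IOException {
    Path zip = createSampleZip();
    System.out.println( namesViaZipFile( zip ) );
    System.out.println( namesViaFileSystem( zip ) );
    Files.deleteIfExists( zip );
  }
}
```

Both methods sort their result because the order of a directory stream is not guaranteed. With the file system provider, the usual Files methods such as Files.newInputStream(…) also work on the entries.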

1.5.1. Play insect sounds from ZIP archive ⭐⭐

Bonny Brain likes to listen to the sounds of insects and uses the WAV collection of https://catalog.data.gov/dataset/bug-bytes-sound-library-stored-product-insect-pest-sounds, where various audio files are offered for download in a ZIP.

Task:

  • Study the documentation at https://christian-schlichtherle.bitbucket.io/truezip/truezip-path/.

  • Include two dependencies in the Maven POM:

    <dependency>
      <groupId>de.schlichtherle.truezip</groupId>
      <artifactId>truezip-path</artifactId>
      <version>7.7.10</version>
    </dependency>
    
    <dependency>
      <groupId>de.schlichtherle.truezip</groupId>
      <artifactId>truezip-driver-zip</artifactId>
      <version>7.7.10</version>
    </dependency>
  • Download the ZIP with the insect sounds, but do not unpack it.

  • Build a TPath object for the ZIP file.

  • Transfer all filenames from the ZIP file into a list: Files.newDirectoryStream(…​) helps here.

  • Write an infinite loop, and

    • select a random WAV file,

    • open the random file with Files.newInputStream(…), decorate it with a BufferedInputStream and obtain an AudioInputStream with AudioSystem.getAudioInputStream(…). Play the WAV file using the following code, where ais is the AudioInputStream.

      Clip clip = AudioSystem.getClip();
      clip.open( ais );
      clip.start();
      TimeUnit.MICROSECONDS.sleep( clip.getMicrosecondLength() + 50 );
      clip.close();

      In chapter "Exceptions" we had worked with the javax.sound API before.

1.6. Suggested solutions

1.6.1. Write XML file with recipe

The proposed solution starts with two records for the data:

com/tutego/exercise/xml/RecipeMLwriterDemo.java
record Recipe(
    String head$title,
    List<String> head$categories,
    String head$yield,
    List<Ingredient> ingredients,
    List<String> directions) {

  record Ingredient(
      String ing$amt$qty,
      String ing$amt$unit,
      String ing$item
  ) {}
}

A recipe is represented by the two types Recipe and Ingredient. Ingredient is a nested type, which expresses well the relationship between the two types Recipe and Ingredient. In principle, one could declare a separate record (or class) for each subelement, but this would be too much for the proposed solution. Therefore, the variable names with the dollar express the type hierarchy.

Before we start with our own program, one observation: many elements are written, which entails many statements of the following type:

writer.writeStartElement( ... );
...
writer.writeEndElement();

For this kind of problem, the execute-around pattern is useful. A thought experiment:

write.element( "my-tag", () -> {
  ...
} );

We can pass the tag, a block representing the body, and at the end we want to write the end tag. Since the Java library does not provide such a feature, the proposed solution introduces a separate helper class HierarchicalXmlWriter, a facade around the XMLStreamWriter:

com/tutego/exercise/xml/RecipeMLwriterDemo.java
class HierarchicalXmlWriter implements AutoCloseable {

  private final OutputStream outputStream;

  interface XMLStreamWriterBlock {
    void write() throws XMLStreamException;
  }

  private final XMLStreamWriter writer;

  HierarchicalXmlWriter( OutputStream outputStream ) throws XMLStreamException {
    this.outputStream = outputStream;
    XMLOutputFactory outputFactory = XMLOutputFactory.newFactory();
    this.writer = outputFactory.createXMLStreamWriter(
        outputStream, StandardCharsets.UTF_8.name() );
    writer.writeStartDocument( "utf-8", "1.0" );
  }

  @Override public void close() throws XMLStreamException, IOException {
    try {
      writer.writeEndDocument();
      writer.close();
    }
    finally {
      outputStream.close();
    }
  }

  void element( String tag, XMLStreamWriterBlock block ) throws XMLStreamException {
    writer.writeStartElement( tag );
    block.write();
    writer.writeEndElement();
  }

  void string( String tag, String text ) throws XMLStreamException {
    element( tag, () -> writer.writeCharacters( text ) );
  }
}

The constructor takes the sink where the XML document will be written. The passed OutputStream is stored in an object variable so that it can be closed later as a resource. Furthermore, the XMLStreamWriter is requested and saved via XMLOutputFactory so that the XMLStreamWriter can also be closed in close(). Finally, the constructor writes the XML prolog.

HierarchicalXmlWriter is AutoCloseable, so it can be used as a resource in try-with-resources. The close() method ends the XML document and closes first the XMLStreamWriter and then the OutputStream. Important: unlike the usual input/output decorators, the XMLStreamWriter does not pass close() on to the underlying resource. The finally block ensures that the OutputStream is also closed if one of the two XMLStreamWriter methods throws an exception.
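The note on close() can be checked with a small experiment. In this sketch, WriterCloseSketch and TrackingStream are hypothetical helper names; TrackingStream is an in-memory sink that only remembers whether close() was called on it.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class WriterCloseSketch {

  // Hypothetical helper: an in-memory sink that records a call to close().
  static class TrackingStream extends ByteArrayOutputStream {
    boolean closed;

    @Override public void close() {
      closed = true;
    }
  }

  // Returns true if closing the XMLStreamWriter also closed the stream underneath.
  static boolean writerClosesStream() {
    TrackingStream sink = new TrackingStream();
    try {
      XMLStreamWriter writer =
          XMLOutputFactory.newFactory().createXMLStreamWriter( sink, "UTF-8" );
      writer.writeStartDocument( "UTF-8", "1.0" );
      writer.writeEmptyElement( "recipe" );
      writer.writeEndDocument();
      writer.close();
    }
    catch ( XMLStreamException e ) {
      throw new IllegalStateException( e );
    }
    return sink.closed;
  }

  public static void main( String[] args ) {
    System.out.println( "stream closed by writer: " + writerClosesStream() );
  }
}
```

With the StAX implementation bundled with the JDK the method returns false, which is why HierarchicalXmlWriter has to close the OutputStream itself.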

The first helper method element(String tag, XMLStreamWriterBlock block) writes the start tag, executes the block and writes the end tag. The second helper method string(String tag, String text) writes the start tag, the text inside and the end tag.

The main class RecipeMLwriterDemo can access HierarchicalXmlWriter and now build the XML blocks as desired:

com/tutego/exercise/xml/RecipeMLwriterDemo.java
var ingredient1 = new Recipe.Ingredient( "30", "cups", "fat" );
var ingredient2 = new Recipe.Ingredient( "1", "kg", "sugar" );
var recipe = new Recipe( "Fat Jam", List.of( "Canning", "Preserves" ), "8",
                         List.of( ingredient1, ingredient2 ),
                         List.of( "Start", "End" ) );
try ( var write = new HierarchicalXmlWriter( System.out ) ) {
  write.element( "recipe", () -> {
    write.element( "head", () -> {
      write.string( "title", recipe.head$title() );
      write.element( "categories", () -> {
        for ( String cat : recipe.head$categories() )
          write.string( "cat", cat );
      } );
      write.string( "yield", recipe.head$yield() );
    } );
    write.element( "ingredients", () -> {
      for ( Recipe.Ingredient ingredient : recipe.ingredients() ) {
        write.element( "ing", () -> {
          write.element( "amt", () -> {
            write.string( "qty", ingredient.ing$amt$qty() );
            write.string( "unit", ingredient.ing$amt$unit() );
          } );
          write.string( "item", ingredient.ing$item() );
        } );
      }
    } );
    write.element( "directions", () -> {
      for ( String step : recipe.directions() )
        write.string( "step", step );
    } );
  } );
}
catch ( XMLStreamException | IOException e ) {
  e.printStackTrace();
}

For those interested in RecipeML’s XML schema: https://github.com/tranchis/xsd2thrift/blob/master/contrib/recipeml.xsd. However, the format has gone quiet.

Java 8 Backport

The static List.of(…) method can be replaced by Arrays.asList(…).
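One caveat for the backport: the two factory methods do not behave identically, although the difference does not matter for the proposed solution, which only reads the lists. The following sketch, with the invented class name ListFactorySketch and invented method names, probes two of the differences.

```java
import java.util.Arrays;
import java.util.List;

class ListFactorySketch {

  // True if an element can be replaced in place via set(...).
  static boolean allowsSet( List<String> list ) {
    try {
      list.set( 0, list.get( 0 ) );
      return true;
    }
    catch ( UnsupportedOperationException e ) {
      return false;
    }
  }

  // True if contains(null) answers instead of throwing an exception.
  static boolean toleratesNullLookup( List<String> list ) {
    try {
      list.contains( null );
      return true;
    }
    catch ( NullPointerException e ) {
      return false;
    }
  }

  public static void main( String[] args ) {
    List<String> fixedSize = Arrays.asList( "Canning", "Preserves" );
    List<String> immutable = List.of( "Canning", "Preserves" );
    System.out.println( allowsSet( fixedSize ) + " " + toleratesNullLookup( fixedSize ) );
    System.out.println( allowsSet( immutable ) + " " + toleratesNullLookup( immutable ) );
  }
}
```

Arrays.asList(…) returns a fixed-size list whose elements can still be replaced and which answers a lookup of null; the lists from List.of(…) are immutable and reject null.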

1.6.2. Check if all images have an alt attribute

com/tutego/exercise/xml/XhtmlHasImgTagWithAltAttribute.java
static void reportMissingAltElements( Path path ) {
  try ( InputStream is = Files.newInputStream( path ) ) {
    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    xmlInputFactory.setProperty( XMLConstants.ACCESS_EXTERNAL_DTD, "" );
    xmlInputFactory.setProperty( XMLConstants.ACCESS_EXTERNAL_SCHEMA, "" );
    XMLStreamReader parser = xmlInputFactory.createXMLStreamReader( is );
    while ( parser.hasNext() ) {
      parser.next();
      boolean isStartElement =
          parser.getEventType() == XMLStreamConstants.START_ELEMENT;
      if ( isStartElement ) {
        boolean isImgTag = "img".equalsIgnoreCase( parser.getLocalName() );
        if ( isImgTag && !containsAltAttribute( parser ) )
          System.err.printf( "img does not contain alt attribute:%n%s%n",
                             parser.getLocation() );
      }
    }
  }
  catch ( IOException | XMLStreamException e ) {
    throw new RuntimeException( e );
  }
}

private static boolean containsAltAttribute( XMLStreamReader parser ) {
  return IntStream.range( 0, parser.getAttributeCount() )
      .mapToObj( parser::getAttributeLocalName )
      .anyMatch( "alt"::equalsIgnoreCase );
}

The Path object passed to reportMissingAltElements(…) is the basis for an InputStream, which we pass to createXMLStreamReader(…) to get an XMLStreamReader for this input stream. Unfortunately, to date (as of Java 17), an XMLStreamReader is not AutoCloseable, so it cannot be closed in try-with-resources. When reading, this is not a problem, because the try-with-resources closes the InputStream of the file.

Processing data with an XMLStreamReader always looks the same: hasNext() tells whether there are still tokens in the data stream, and if so, next() fetches the next token. This is similar to Scanner and Iterator. The call to next() changes the state of the XMLStreamReader, and getEventType() returns an integer that identifies the incoming data, for example the start of the document, a processing instruction, a comment, text or a start element. Instead of integer literals we use constants; they are declared in XMLStreamConstants, which the interface XMLStreamReader extends. When an element starts, it could be an img element, so getLocalName() asks the parser for the element name, which we compare to img, ignoring case. If it matches, we have found an img tag. The remaining question is whether the alt attribute is set; our own method containsAltAttribute(…) answers it. If the img tag has no alt attribute, a message goes to the standard error channel, and getLocation() supplies the exact position for the error message.

containsAltAttribute(…) gets the XMLStreamReader as parameter and iterates over all attribute indices from 0 up to, but excluding, getAttributeCount(). If an attribute alt exists, regardless of its value, the method returns true, otherwise false.
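To try the check without a file, the same logic can run against a string. This is a reduced sketch under assumptions: the class AltCheckSketch and the method countImagesWithoutAlt(…) are invented names, the input must be well-formed XHTML, and the restrictions on external DTDs and schemas from the solution are left out because the input is a trusted literal.

```java
import java.io.StringReader;
import java.util.stream.IntStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class AltCheckSketch {

  // Counts the img start tags that carry no alt attribute.
  static int countImagesWithoutAlt( String xhtml ) {
    try {
      XMLStreamReader parser =
          XMLInputFactory.newFactory().createXMLStreamReader( new StringReader( xhtml ) );
      int missing = 0;
      while ( parser.hasNext() ) {
        if ( parser.next() == XMLStreamConstants.START_ELEMENT
             && "img".equalsIgnoreCase( parser.getLocalName() )
             && !hasAlt( parser ) )
          missing++;
      }
      return missing;
    }
    catch ( XMLStreamException e ) {
      throw new IllegalArgumentException( "input is not well-formed XML", e );
    }
  }

  // Same idea as containsAltAttribute(...): look at every attribute index.
  private static boolean hasAlt( XMLStreamReader parser ) {
    return IntStream.range( 0, parser.getAttributeCount() )
                    .mapToObj( parser::getAttributeLocalName )
                    .anyMatch( "alt"::equalsIgnoreCase );
  }

  public static void main( String[] args ) {
    String xhtml = "<html><body>"
                   + "<img src=\"a.png\" alt=\"A\"/>"
                   + "<img src=\"b.png\"/>"
                   + "<IMG src=\"c.png\" ALT=\"\"/>"
                   + "</body></html>";
    System.out.println( countImagesWithoutAlt( xhtml ) );
  }
}
```

For the document in main(…) the sketch prints 1: only the second image lacks the attribute, while the third counts as having one because an empty value is still an attribute.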

Java 8 backport

The solution uses the static of(…​) methods to build lists. An alternative for Java 8 is: Arrays.asList(…​).

1.6.3. Writing Java objects with JAXB

com/tutego/exercise/xml/JaxbRecipeML.java
@XmlRootElement
class Ingredients {
  public Ing[] ing;
}

class Ing {
  public Amt amt;
  public String item;
}

class Amt {
  public int qty;
  public String unit;
}

com/tutego/exercise/xml/JaxbRecipeML.java
Ing ing1 = new Ing();
Amt amt1 = new Amt();
amt1.qty = 3;
amt1.unit = "cups";
ing1.amt = amt1;
ing1.item = "Strawberries";

Ing ing2 = new Ing();
Amt amt2 = new Amt();
amt2.qty = 3;
amt2.unit = "cups";
ing2.amt = amt2;
ing2.item = "Sugar";

Ingredients ingredients = new Ingredients();
ingredients.ing = new Ing[]{ ing1, ing2 };

JAXB.marshal( ingredients, System.out );

Working with JAXB is easy:

  1. Write classes with a parameterless constructor and use either setters/getters or public object variables for the data.

  2. Build an object graph and write it with JAXB.marshal(ingredients, System.out) to an output stream, for example to the console.

The proposed solution puts the @XmlRootElement annotation on the root type Ingredients. Current JAXB implementations no longer require it, but the annotation keeps the solution working under Java 8, which contains a slightly older JAXB version (JAXB RI 2.2.8; at the time of writing, the current version is 2.4.0).

1.6.4. Read in jokes and laugh heartily

JAXB focuses on JavaBeans that use annotations to tell the JAXB framework how to map objects to XML or how to map XML to objects. We can write and annotate these JAXB beans by hand, or we can have them generated from a schema. This variant was asked for in the task, and the generated class Data starts like this:

@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "", propOrder = {
    "category", "type", "flags", "setup", "delivery", "id", "error"
})
@XmlRootElement(name = "data")
public class Data {

    @XmlElement(required = true)
    protected String category;
    @XmlElement(required = true)
    protected String type;
    ...
}

To the client:

com/tutego/exercise/xml/JaxbJokeReceiver.java
try {
  URL url = new URL( "https://sv443.net/jokeapi/v2/joke/Any?format=xml" );
  Data data = JAXB.unmarshal( url, Data.class );
  System.out.println( data.getSetup() );
  System.out.println( data.getDelivery() );
  System.out.printf( "Not Safe for Work? %s%n", data.getFlags().isNsfw() );
  System.out.printf( "Religious? %s%n", data.getFlags().isReligious() );
  System.out.printf( "Political? %s%n", data.getFlags().isPolitical() );
  System.out.printf( "Racist? %s%n", data.getFlags().isRacist() );
  System.out.printf( "Sexist? %s%n", data.getFlags().isSexist() );
}
catch ( MalformedURLException e ) {
  System.err.println( "malformed URL has occurred" );
  e.printStackTrace();
}
catch ( DataBindingException e ) {
  System.err.println( "failure in a JAXB operation" );
  e.printStackTrace();
}

JAXB.unmarshal(…​) allows constructing Java objects from an XML stream from various data sources, including a URL. So if we build a URL object and put it on the endpoint of the joke, unmarshal(…​) directly returns a Data object. The Data object then provides different getters, and the data can be read. Two exceptions can occur: The format of the URL could be invalid, which gives us a MalformedURLException, or the XML format cannot be mapped to the JavaBean, in which case the result is a DataBindingException.

1.6.5. Hacker News JSON exploit

The Jackson type ObjectMapper provides readValue(…​) which returns the JSON document in a Map of nested key-value pairs. Two suggested solutions:

com/tutego/exercise/json/HackerNewsJackson.java
public static Map<?, ?> news( long id ) {
  try {
    String url = "https://hacker-news.firebaseio.com/v0/item/"+id+".json";
    return new ObjectMapper().readValue( new URL( url ), Map.class );
  }
  catch ( IOException e ) {
    return Collections.emptyMap();
  }
}

The readValue(…​) method of the ObjectMapper object can take a URL in the first parameter and thus obtain the JSON document directly from the network and convert it into a Map. This means that only one line is needed. The program catches exceptions and returns an empty Map in that case.

A look at the Jackson implementation shows that url.openStream() is used. Since Java 11, however, the HTTP client API is more powerful: it allows configuring an authenticator, thread pool, proxy, SSL context and more.

com/tutego/exercise/json/HackerNewsJackson.java
public static Map<?, ?> news( long id ) {
  HttpClient client = HttpClient.newHttpClient();
  ObjectMapper mapper = new ObjectMapper();

  String url = "https://hacker-news.firebaseio.com/v0/item/" + id + ".json";
  HttpRequest request = HttpRequest
      .newBuilder( URI.create( url ) )
      .timeout( Duration.ofSeconds( 5 ) )
      .build();

  try {
    InputStream body =
        client.send( request, HttpResponse.BodyHandlers.ofInputStream() ).body();
    return mapper.readValue( body, Map.class );
  }
  catch ( IOException | InterruptedException e ) {
    return Collections.emptyMap();
  }
}

For HttpClient we fall back to a default configuration with HttpClient.newHttpClient(). The program also creates the ObjectMapper without any extras. The HttpRequest is directed to the URL, and a timeout is set. The response body should arrive in a format that readValue(…) accepts; InputStream, byte array or String are suitable, for example. The InputStream is advantageous because, in the best case, it needs the least memory when reading.
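The HTTP client part can be tried out without depending on the Hacker News service. The following sketch is an assumption-laden experiment, not part of the proposed solution: it starts the JDK's built-in com.sun.net.httpserver.HttpServer on a free local port, serves one JSON string and reads it back as an InputStream. The class name LocalHttpSketch, the method roundTrip(…) and the path /item.json are invented, and the JSON parsing with Jackson is left out.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

class LocalHttpSketch {

  // Serves the given JSON from a throwaway local server and fetches it
  // with the Java 11 HTTP client.
  static String roundTrip( String json ) {
    byte[] payload = json.getBytes( StandardCharsets.UTF_8 );
    HttpServer server = null;
    try {
      // Port 0 lets the operating system pick a free port.
      server = HttpServer.create( new InetSocketAddress( "127.0.0.1", 0 ), 0 );
      server.createContext( "/item.json", exchange -> {
        exchange.sendResponseHeaders( 200, payload.length );
        try ( OutputStream out = exchange.getResponseBody() ) {
          out.write( payload );
        }
      } );
      server.start();

      URI uri = URI.create(
          "http://127.0.0.1:" + server.getAddress().getPort() + "/item.json" );
      HttpRequest request = HttpRequest.newBuilder( uri )
                                       .timeout( Duration.ofSeconds( 5 ) )
                                       .build();

      // As in the solution, the body is requested as an InputStream.
      try ( InputStream body = HttpClient.newHttpClient()
          .send( request, HttpResponse.BodyHandlers.ofInputStream() ).body() ) {
        return new String( body.readAllBytes(), StandardCharsets.UTF_8 );
      }
    }
    catch ( IOException e ) {
      throw new UncheckedIOException( e );
    }
    catch ( InterruptedException e ) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException( e );
    }
    finally {
      if ( server != null )
        server.stop( 0 );
    }
  }

  public static void main( String[] args ) {
    System.out.println( roundTrip( "{\"id\":24857356}" ) );
  }
}
```

In the real solution, the InputStream goes straight into readValue(…) instead of being turned into a String, so the document never has to be held in memory twice.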

1.6.6. Read and write editor configurations as JSON

The JSON hierarchy is automatically derived from the fact that Settings references an Editor and a Workbench.

com/tutego/exercise/json/Settings.java
import java.util.*;

public class Settings {

  enum FontWeight {
    normal, bold
  }

  public static class Editor {
    public String cursorStyle = "line";
    public boolean folding = true;
    public List<String> fontFamily =
        Arrays.asList( "Consolas, 'Courier New', monospace" );
    public int fontSize = 14;
    public FontWeight fontWeight = FontWeight.normal;
  }

  public static class Workbench {
    public String colorTheme = "Default Dark+";
    public String iconTheme;
  }

  public Editor editor = new Editor();
  public Workbench workbench = new Workbench();
  public Map<String, String> terminal = new HashMap<>();
}

Jackson accesses the public object variables directly and uses their identifiers as the names in the JSON object. Lists are mapped to JSON arrays, and enumerations are written by name and can be read back in. With Map<String, String> terminal, arbitrary key-value pairs of strings can be used; they are not bound to particular object variables but end up in the map. A special annotation for the root of the document (like @XmlRootElement in JAXB) is not necessary. Jackson places far fewer requirements on types than JAXB.

com/tutego/exercise/json/EditorPreferences.java
public class EditorPreferences {

  private static final Path FILENAME = Paths.get(
      /*System.getProperty( "user.home" ),*/ ".editor-configuration.json" );

  private final ObjectMapper jsonMapper = new ObjectMapper();

  private Settings settings = new Settings();

  public EditorPreferences() {
    jsonMapper.enable( SerializationFeature.INDENT_OUTPUT );
    jsonMapper.setSerializationInclusion( JsonInclude.Include.NON_NULL );
  }

  public Settings settings() {
    return settings;
  }

  public Settings load() {
    try ( InputStream is = Files.newInputStream( FILENAME ) ) {
      settings = jsonMapper.readValue( is, Settings.class );
      return settings;
    }
    catch ( IOException e ) {
      return settings;
    }
  }

  public void save() {
    try ( OutputStream os = Files.newOutputStream( FILENAME ) ) {
      jsonMapper.writeValue( os, settings );
    }
    catch ( IOException e ) {
      throw new IllegalStateException( e );
    }
  }
}

com/tutego/exercise/json/EditorPreferencesDemo.java
EditorPreferences preferences = new EditorPreferences();
preferences.save();

Settings settings = preferences.load();
settings.editor.fontSize = 22;
settings.terminal.put( "integrated.unicodeVersion", "11" );
preferences.save();

EditorPreferences has a constructor that configures two things on the ObjectMapper:

  • First, the output should be formatted. Configuration files are made for users, so the file should not be as compact as possible and save every space, but contain line breaks and indentation.

  • By default, Jackson also writes reference variables that are null, for example a string that was never assigned (such as iconTheme). In our example, writing null values is not desired; setSerializationInclusion(JsonInclude.Include.NON_NULL) suppresses them. This can be configured once globally on the ObjectMapper or locally via annotations.

Internally, EditorPreferences holds a Settings object, which load() reconstructs from the JSON file and save() writes back to it. The readValue(…) and writeValue(…) methods of the ObjectMapper are responsible for the actual mapping.

1.6.7. Load Wikipedia images with jsoup

com/tutego/exercise/net/WikipediaImageLoader.java
  String url = "https://de.wikipedia.org/wiki/Wikipedia:Hauptseite";
  Document doc = Jsoup.parse( new URL( url ), 1000 /* ms */ );

  for ( Element img : doc.select( "img[src~=(?i)\\.(png|gif|jpg)]" ) ) {
    String imgUrl = img.absUrl( "src" );
    String filename = imgUrl.replaceAll( "[^a-zA-Z0-9_.-]", "_" );
    try ( InputStream imgStream = new URL( imgUrl ).openStream() ) {
      Files.copy( imgStream, Paths.get( filename ),
                  StandardCopyOption.REPLACE_EXISTING );
    }
  }

The class Jsoup has the static method parse(…), which can build the HTML document from different sources. In our case we pass the URL object directly. For network access, Jsoup requires a timeout, which we set to 1000 milliseconds. The parse(…) method returns an org.jsoup.nodes.Document object. There are two ways to access this Document and extract elements:

  • DOM methods like getElementById(String id) or child(int index).

  • selector expressions as known from CSS

The proposed solution works with the select(…) method. img[…] stands for all img tags, while src~= specifies via a regular expression what must hold for the src attribute, namely that its value contains a match for \.(png|gif|jpg), i.e. the file extension .png, .gif or .jpg. The (?i) flag makes the match case-insensitive.
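
The effect of the regular expression can be tried out without jsoup. The following is a minimal sketch using only java.util.regex; it assumes that the ~= selector searches for the pattern anywhere in the attribute value (as Matcher.find() does). The class name and the sample values are made up for illustration.

```java
import java.util.regex.Pattern;

class ImageSrcPatternDemo {

  // Same regular expression as in the selector; (?i) makes it case-insensitive
  static final Pattern IMAGE_SRC = Pattern.compile( "(?i)\\.(png|gif|jpg)" );

  // find() succeeds if the pattern occurs anywhere in the string
  static boolean looksLikeImage( String src ) {
    return IMAGE_SRC.matcher( src ).find();
  }

  public static void main( String[] args ) {
    // Hypothetical src attribute values
    String[] samples = { "logo.PNG", "photo.jpg?width=120", "icon.svg", "banner.jpeg" };
    for ( String src : samples )
      System.out.println( src + " -> " + looksLikeImage( src ) );
  }
}
```

Note that banner.jpeg is not accepted, because the expression only lists jpg; extending the alternation to jpe?g would cover both spellings.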

The result of the select(…) method is of type Elements, a subclass of ArrayList. A List is Iterable and can be conveniently traversed with an enhanced for loop. Each element in this list is of type Element. We could call attr("src") to get the URL of the image exactly as it is written in the document, but the absUrl(…) method is more useful because it resolves the URL to an absolute one.

When we download the images, we cannot use this URL directly as the filename, because it contains characters that cause problems in the file system. The String method replaceAll(…) returns a new, cleaned-up string that we can use as the filename. The next step is to build a URL object, open an input stream to the image and copy it to the local file system via Files.copy(…); we have written this code before in the ImageDownloader task.
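
The clean-up step can be looked at in isolation. The sketch below applies the same replaceAll(…) call to a made-up URL; the class name, method name and URL are assumptions for illustration only.

```java
class FilenameSanitizer {

  // Replaces every character that is not a letter, digit, '_', '.' or '-' with '_'
  static String toFilename( String url ) {
    return url.replaceAll( "[^a-zA-Z0-9_.-]", "_" );
  }

  public static void main( String[] args ) {
    // Hypothetical image URL
    String url = "https://example.org/img/Logo 1.png?x=2";
    System.out.println( toFilename( url ) );  // https___example.org_img_Logo_1.png_x_2
  }
}
```

Different URLs can collapse to the same filename (for example, a/b.png and a_b.png), which is acceptable for this task but worth keeping in mind.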

1.6.8. Generate Word files with screenshots

com/tutego/exercise/fileformat/ScreenCapturesInDocx.java
private static final int TOTAL_NUMBER_OF_SCREEN_CAPTURES  = 3;
private static final int DURATION_BETWEEN_SCREEN_CAPTURES = 5;
private static final Rectangle SCREEN_SIZE =
    new Rectangle( Toolkit.getDefaultToolkit().getScreenSize() );

private static byte[] getScreenCapture() throws AWTException, IOException {
  BufferedImage screenCapture = new Robot().createScreenCapture( SCREEN_SIZE );
  ByteArrayOutputStream os = new ByteArrayOutputStream();
  ImageIO.write( screenCapture, "jpeg", os );
  return os.toByteArray();
}

private static void appendImage( XWPFDocument doc, byte[] imageBytes )
    throws IOException, InvalidFormatException {
  // every picture gets its own paragraph; the picture is added to a run inside it
  XWPFRun run = doc.createParagraph().createRun();
  run.addPicture( new ByteArrayInputStream( imageBytes ),
                  Document.PICTURE_TYPE_JPEG,
                  UUID.randomUUID().toString(),
                  Units.toEMU( SCREEN_SIZE.width / 100. * 20 ),
                  Units.toEMU( SCREEN_SIZE.height / 100. * 20 ) );
  run.addBreak();
}

public static void main( String[] args ) throws Exception {
  try ( XWPFDocument xwpfDocument = new XWPFDocument() ) {
    for ( int i = 0; i < TOTAL_NUMBER_OF_SCREEN_CAPTURES; i++ ) {
      appendImage( xwpfDocument, getScreenCapture() );
      TimeUnit.SECONDS.sleep( DURATION_BETWEEN_SCREEN_CAPTURES );
    }

    Path tempFile = Files.createTempFile( "screen-captures", ".docx" );
    try ( OutputStream out = Files.newOutputStream( tempFile ) ) {
      xwpfDocument.write( out );
    }
    System.out.println( "Written to " + tempFile );
  }
}

The solution consists of three methods. The first, getScreenCapture(), returns a byte[] with the screen content as JPEG. Java can do this via the Robot class, which is intended for automation (it can also move the cursor and send keystrokes). The result of createScreenCapture(…) for the whole screen is of type BufferedImage, Java's in-memory image representation. To convert it to JPEG, the program calls ImageIO.write(…), which encodes the BufferedImage into a ByteArrayOutputStream; toByteArray() then returns the bytes as an array.
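
The encoding step also works without a screen. The following sketch replaces the Robot capture with an empty BufferedImage so that it runs in a headless environment; the class and method names are assumptions, and ImageIO.setUseCache(false) is an optional extra that is not part of the original solution.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

class JpegEncodingDemo {

  // Encodes the image as JPEG and returns the encoded bytes
  static byte[] toJpeg( BufferedImage image ) {
    ImageIO.setUseCache( false );  // keep everything in memory, no temporary cache file
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    try {
      // write(...) returns false if no writer for the format is available
      if ( !ImageIO.write( image, "jpeg", os ) )
        throw new IllegalStateException( "No JPEG writer available" );
    }
    catch ( IOException e ) {
      throw new UncheckedIOException( e );
    }
    return os.toByteArray();
  }

  public static void main( String[] args ) {
    // Stand-in for a screen capture: a black 64 x 48 RGB image
    BufferedImage image = new BufferedImage( 64, 48, BufferedImage.TYPE_INT_RGB );
    byte[] jpeg = toJpeg( image );
    // JPEG data always begins with the marker bytes FF D8
    System.out.printf( "%d bytes, first bytes: %02X %02X%n",
                       jpeg.length, jpeg[ 0 ] & 0xFF, jpeg[ 1 ] & 0xFF );
  }
}
```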

The second method is appendImage(…); it appends an image to an existing XWPFDocument. Since each image is placed in its own paragraph, a paragraph with a run is created first, and then the image is added to the run via addPicture(…). The method expects an input stream with the picture data, the picture type, a file name (here a random UUID serves as a unique name), and the width and height. The image is scaled down: both dimensions are set to 20 percent of the screen dimensions.
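
The size arithmetic can be followed with plain numbers. The sketch below assumes the usual conversion factor of 12,700 EMU (English Metric Units) per point, which follows from 914,400 EMU per inch and 72 points per inch; it is a stand-alone illustration, not the library's own code, and the 1920 x 1080 screen is a made-up example.

```java
class EmuArithmetic {

  // 914,400 EMU per inch divided by 72 points per inch
  static final int EMU_PER_POINT = 12_700;

  // Converts a length in points to EMU, rounded to the nearest integer
  static int toEmu( double points ) {
    return (int) Math.rint( EMU_PER_POINT * points );
  }

  // Takes the given percentage of a pixel length and treats the result as points
  static int scaledEmu( int pixels, int percent ) {
    return toEmu( pixels / 100. * percent );
  }

  public static void main( String[] args ) {
    // Hypothetical screen of 1920 x 1080 pixels, scaled to 20 percent
    System.out.println( scaledEmu( 1920, 20 ) );  // 384 points -> 4876800 EMU
    System.out.println( scaledEmu( 1080, 20 ) );  // 216 points -> 2743200 EMU
  }
}
```

The solution passes pixel values where points are expected, so the picture in the document is 384 x 216 points in this example, regardless of the actual pixel density of the screen.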

The main(…) method creates a new XWPFDocument, then repeatedly takes a screen capture, appends it to the document and waits five seconds (DURATION_BETWEEN_SCREEN_CAPTURES) until the desired number of captures is reached. Up to this point the document exists only in memory. Files.createTempFile(…) creates a file in the temporary directory, and the program writes the Office document to this file.

1.6.9. Play insect sounds from ZIP archive

com/tutego/exercise/io/TrueZipDemo.java
Path path = new TPath( filename );

List<Path> wavFiles = new ArrayList<>();
try ( DirectoryStream<Path> entries = Files.newDirectoryStream( path ) ) {
  entries.forEach( wavFiles::add );
}

while ( true ) {
  int randomIndex = ThreadLocalRandom.current().nextInt( wavFiles.size() );
  Path randomWavFile = wavFiles.get( randomIndex );
  try ( InputStream fis = Files.newInputStream( randomWavFile );
        // for mark/reset support we need a BufferedInputStream
        BufferedInputStream bis = new BufferedInputStream( fis );
        AudioInputStream ais = AudioSystem.getAudioInputStream( bis ) ) {
    Clip clip = AudioSystem.getClip();
    clip.open( ais );
    clip.start();
    TimeUnit.MICROSECONDS.sleep( clip.getMicrosecondLength() + 50 );
    clip.close();
  }
}

The solution consists of the following parts:

  1. Reading all audio files from the archive filename

  2. Selecting a random audio file

  3. Playing the audio file

TrueZIP uses its own Path implementation, TPath, for its work. The constructor can be passed a String, Path, URI or File object. Once the TPath for our ZIP file has been constructed, newDirectoryStream(…) returns all directory contents, which we collect in a list of Path objects.
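
For comparison, a similar directory listing is possible with the ZIP file system provider that ships with the JDK. This is a different technique from the TrueZIP solution above, shown only as a sketch; the class name, the method names and the entry names are made up. Unlike the TPath variant, the Path objects are only valid while the ZIP file system is open, so the sketch returns the entry names as strings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipListingDemo {

  // Returns the sorted names of all entries in the root directory of the archive
  static List<String> listRootEntries( Path zipFile ) {
    List<String> names = new ArrayList<>();
    try ( FileSystem zipFs = FileSystems.newFileSystem( zipFile, (ClassLoader) null );
          DirectoryStream<Path> entries = Files.newDirectoryStream( zipFs.getPath( "/" ) ) ) {
      for ( Path entry : entries )
        names.add( entry.getFileName().toString() );
    }
    catch ( IOException e ) {
      throw new UncheckedIOException( e );
    }
    Collections.sort( names );
    return names;
  }

  // Creates a temporary sample archive; each entry only contains its own name as dummy content
  static Path createSampleZip( String... entryNames ) {
    try {
      Path zip = Files.createTempFile( "insects", ".zip" );
      try ( ZipOutputStream out = new ZipOutputStream( Files.newOutputStream( zip ) ) ) {
        for ( String name : entryNames ) {
          out.putNextEntry( new ZipEntry( name ) );
          out.write( name.getBytes( StandardCharsets.UTF_8 ) );
          out.closeEntry();
        }
      }
      return zip;
    }
    catch ( IOException e ) {
      throw new UncheckedIOException( e );
    }
  }

  public static void main( String[] args ) throws IOException {
    Path zip = createSampleZip( "cricket.wav", "bee.wav" );  // hypothetical entry names
    System.out.println( listRootEntries( zip ) );
    Files.delete( zip );
  }
}
```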

The infinite loop then selects a random file from the list. An input stream is opened and decorated with a BufferedInputStream. This is necessary because the audio system needs input streams that support mark/reset, so that it can inspect the beginning of the stream and then rewind. The input stream of TrueZIP does not support this, at least in the current version.
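
The role of BufferedInputStream can be shown with the JDK alone. In the following sketch an anonymous FilterInputStream acts as a stand-in for a stream without mark/reset support (it is not the TrueZIP stream), and peek(…) is a made-up helper that reads a few bytes and rewinds, similar to what a format detector has to do.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class MarkSupportDemo {

  // Stand-in for a stream that reports no mark/reset support
  static InputStream withoutMarkSupport( byte[] data ) {
    return new FilterInputStream( new ByteArrayInputStream( data ) ) {
      @Override public boolean markSupported() { return false; }
    };
  }

  // Reads up to 'length' bytes and rewinds the stream to where it was before
  static String peek( InputStream in, int length ) {
    if ( !in.markSupported() )
      throw new IllegalArgumentException( "mark/reset not supported" );
    try {
      in.mark( length );
      byte[] buffer = new byte[ length ];
      int read = in.read( buffer, 0, length );
      in.reset();
      return new String( buffer, 0, Math.max( read, 0 ), StandardCharsets.US_ASCII );
    }
    catch ( IOException e ) {
      throw new UncheckedIOException( e );
    }
  }

  public static void main( String[] args ) {
    byte[] data = "RIFF....WAVE".getBytes( StandardCharsets.US_ASCII );
    InputStream raw = withoutMarkSupport( data );
    InputStream buffered = new BufferedInputStream( raw );

    System.out.println( raw.markSupported() );       // false
    System.out.println( buffered.markSupported() );  // true
    System.out.println( peek( buffered, 4 ) );       // RIFF
    System.out.println( peek( buffered, 4 ) );       // RIFF again, because of reset()
  }
}
```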

After that, the AudioInputStream can be opened and the Clip can be played. The duration of the clip can be queried via getMicrosecondLength(), and this is how long we wait after starting playback, plus a small buffer of 50 microseconds. Then the clip is closed, and the next loop iteration follows.