1. Files, Directories and File Access

Even though much is moving to the cloud and into databases, the file system remains an important place to store and organize documents. Even Bonny Brain and Captain CiaoCiao still store a lot locally; there is enough that must not get into the open.


In this chapter you will:

  • get to know the basics of the File class, the Path interface, and the Files class

  • be able to create temporary files

  • be able to retrieve metadata from files and directories

  • be able to list and filter directory contents

  • read and write complete files

  • know RandomAccessFile

Data types used in this chapter:

1.1. Path and Files

As with so many things in Java, there is the "old" way and the "new" way when it comes to file processing. In many examples you still see code with the types java.io.File, FileInputStream, FileOutputStream, FileReader and FileWriter, but these types are no longer up-to-date, so in this chapter we will deal exclusively with Path and Files, because these types allow the use of virtual file systems, like a ZIP archive. File is now only required when actually dealing with files or directories of the local file system; examples would be opening files with the programs associated by the operating system or redirecting data streams from externally started programs.

1.1.1. Display saying of the day ⭐

Every now and then Captain CiaoCiao can’t quite get motivated. A motivational saying or aphorism for the day gets the grump thinking new thoughts. The task is to program an application that generates an HTML file with a saying and then opens the browser to display this text. The exercise can be solved with two methods of java.nio.file.Files.


  • Create a temporary file ending with the file suffix .html using an appropriate Files method.

  • Write HTML code in the new temporary file, such as the following:

    <!DOCTYPE html><html><body>
    'The things we steal tell us who we are.'
    - Thomas of Tew
    </body></html>
  • Find a method from the java.awt.Desktop class that opens the default browser and displays the HTML file as a result.

1.1.2. Merge hiding places ⭐

With certain Files methods, an entire file can be read and rewritten line by line in a single step.

Captain CiaoCiao collects potential hiding places in a large text file. But often he spontaneously thinks of more hiding places and quickly writes them into a new file. Now he takes his time and cleans up and merges everything; the small text files are to be merged with the large file. It is important that the order of the entries in the large file is not changed and only the entries from the small files are included if they do not appear in the large file, because it may be that the main file already contains the hiding places.


  • Write a method mergeFiles(Path main, Path... temp) that opens the main file, adds the contents of all temporary files, and then writes the main file back.

1.1.3. Create copies of a file ⭐⭐

If you copy a file to the same folder in Windows Explorer, a copy is created. This copy automatically gets a new name. We are looking for a Java program that replicates this behavior.


  • Write a Java method cloneFile(Path path) that creates copies of files, generating the file names systematically. Suppose <name> symbolizes the file name, then the first copy will be Copy of <name> and thereafter the file names should be Copy (<number>) of <name>.

  • If you call the methods on directories or there are other errors, the method may throw an IOException.


  • Suppose a file is called Top Secret UFO Files.txt. Then the new file names should look like this:

    • Copy of Top Secret UFO Files.txt

    • Copy (2) of Top Secret UFO Files.txt

    • Copy (3) of Top Secret UFO Files.txt

    • etc.

1.1.4. Generate a directory listing ⭐

On the command line, the user can display directory contents and metadata, just as a file selection dialog displays files to the user.


  • Using Files and the newDirectoryStream(…​) method, write a program that lists the directory contents for the current directory.

  • Take the dir command under DOS/Windows as a model, and reproduce the output of the directory listing completely. The header and footer are not necessary.

1.1.5. Search for large GIF files ⭐

There’s a mess on Bonny Brain’s hard drive, partly because she stores all her images in exactly one directory. Now the pictures from the last treasure hunt are untraceable! All she remembers is that the images were saved in GIF format and they were over 1024 pixels wide.


  • Given is any directory. Search in this directory (not recursively!) for all images that are of type GIF and have a minimum width of 1024 pixels.

Access the following code to read the widths and perform GIF checking:

private static final byte[] GIF87aGIF89a = "GIF87aGIF89a".getBytes();

private static boolean isGifAndWidthGreaterThan1024( Path entry ) {
  if ( ! Files.isRegularFile( entry ) || ! Files.isReadable( entry ) )
    return false;

  try ( RandomAccessFile raf = new RandomAccessFile( entry.toFile(), "r" ) ) {
    byte[] bytes = new byte[ 8 ];
    if ( raf.read( bytes ) < bytes.length )
      return false;

    if ( ! Arrays.equals( bytes, 0, 6, GIF87aGIF89a, 0, 6 ) &&
         ! Arrays.equals( bytes, 0, 6, GIF87aGIF89a, 6, 12 ) )
      return false;

    // Width is stored little-endian in bytes 6 and 7; mask to avoid sign extension
    int width = ( bytes[ 6 ] & 0xFF ) | ( ( bytes[ 7 ] & 0xFF ) << 8 );
    return width > 1024;
  }
  catch ( IOException e ) {
    throw new UncheckedIOException( e );
  }
}

The method reads the first bytes and checks if the first 6 bytes match either the string GIF87a or GIF89a. In principle, this test can also be implemented with ! new String(bytes, 0, 6).matches("GIF87a|GIF89a"), but that would cause some temporary objects in memory.

After the check, the program reads 2 bytes for the width and converts them, in little-endian order, to an unsigned 16-bit integer.
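
The little-endian decoding can be shown in isolation. This is a small sketch; the class name GifWidthDemo and the sample byte values are ours, not from the task:

```java
public class GifWidthDemo {

  // Decodes the GIF logical screen width from two bytes (little-endian, unsigned)
  static int width( byte lo, byte hi ) {
    // Mask with 0xFF so that bytes >= 0x80 are not sign-extended to negative ints
    return ( lo & 0xFF ) | ( ( hi & 0xFF ) << 8 );
  }

  public static void main( String[] args ) {
    // 0x00 0x05 little-endian is 0x0500 = 1280 pixels
    System.out.println( width( (byte) 0x00, (byte) 0x05 ) );   // 1280
    // Without the mask, (byte) 0x90 would be -112 and corrupt the result
    System.out.println( width( (byte) 0x90, (byte) 0x01 ) );   // 400
  }
}
```

The masking is the important detail: Java bytes are signed, so a width byte of 0x90 would otherwise be sign-extended to a negative number.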

1.1.6. Descend directories recursively and find empty text files ⭐

There is still a big mess on Bonny Brain’s hard drive. For some inexplicable reason, it has many text files with 0 bytes.


  • Using a FileVisitor, run recursively from a chosen starting directory through all subdirectories, looking for empty text files.

  • Text files are files that have the file extension .txt (case-insensitive).

  • If found, show the absolute path of the file on the console.

1.1.7. Develop your own utility library for file filters ⭐⭐⭐

The Files class provides three static methods to query all entries in a directory:

  • newDirectoryStream(Path dir)

  • newDirectoryStream(Path dir, String glob)

  • newDirectoryStream(Path dir, DirectoryStream.Filter<? super Path> filter)

The result is always a DirectoryStream<Path>. The first method does not filter the results, the second method allows a glob string such as *.txt, and the third method allows any filter.

java.nio.file.DirectoryStream.Filter<T> is an interface that filters must implement. The method is boolean accept(T entry) and is like a predicate.

The Java library declares the interface but no implementation.
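
Because the interface has a single abstract method, a lambda expression is the simplest implementation. A minimal sketch (the class name TxtFilterDemo and its helper method are ours):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class TxtFilterDemo {

  // DirectoryStream.Filter has a single accept(T) method, so a lambda will do
  static final DirectoryStream.Filter<Path> TXT_ONLY =
      entry -> Files.isRegularFile( entry )
               && entry.toString().toLowerCase().endsWith( ".txt" );

  // Collects the file names accepted by the filter in the given directory
  static List<String> list( Path dir ) throws IOException {
    List<String> names = new ArrayList<>();
    try ( DirectoryStream<Path> entries =
              Files.newDirectoryStream( dir, TXT_ONLY ) ) {
      for ( Path entry : entries )
        names.add( entry.getFileName().toString() );
    }
    return names;
  }
}
```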


  • Write various implementations of DirectoryStream.Filter that can check files for

    • attributes (like readable, writable)

    • the length

    • the file extensions

    • the filename via regular expressions

    • magic initial identifiers

Ideally, the API allows all filters to be concatenated, something like this:

DirectoryStream.Filter<Path> filter =
    regularFile.and( readable )
               .and( largerThan( 100_000 ) )
               .and( magicNumber( 0x89, 'P', 'N', 'G' ) )
               .and( globMatches( "*.png" ) )
               .and( regexContains( "[-]" ) );

try ( DirectoryStream<Path> entries = Files.newDirectoryStream( dir, filter ) ) {
  entries.forEach( System.out::println );
}

1.2. Random access to file contents

For files, an input/output stream can be obtained and read or written from beginning to end. Another API allows random access, i.e. a position pointer.

1.2.1. Output last line of a text file ⭐⭐

Crew members write all actions in an electronic logbook, with new entries appended at the end. No entry is longer than 100 characters, the texts are written in UTF-8.

Now Captain CiaoCiao is interested in the last entry. What does a Java program look like if only the last line is to be read from a file? Since there are already a lot of entries in the log, it is not possible to read the file completely.


  • Write a program that returns the last line of a text file.

  • Find a solution that does not need unnecessary memory.

Consider whether ([^\r\n]*)$ can be used in a meaningful way.

1.3. Suggested solutions

1.3.1. Display saying of the day

try {
  String html = "<!DOCTYPE html><html><body>" +
      "›The things we steal tell us who we are.‹ - Thomas of Tew" +
      "</body></html>";
  Path tmpPath = Files.createTempFile( "wisdom", ".html" );
  Files.writeString( tmpPath, html );
  Desktop.getDesktop().open( tmpPath.toFile() );
}
catch ( IOException e ) {
  System.err.println( "Couldn't write HTML file in temp folder or open file" );
}

In the proposed solution, we are dealing with three central statements. At the beginning there is the creation of the file in the temporary directory. In the method createTempFile(…​) we can specify a part of the name as well as a suffix, and we choose the extension .html, so that later the operating system can select the appropriate viewer via this file extension. createTempFile(…​) returns the generated Path, which we use to write the string into this file.

For writing a string there is the method writeString(Path path, CharSequence csq, OpenOption…​ options). The class String implements CharSequence. We do not need options in our case; they would matter if, for example, existing content should be appended to rather than overwritten.

open(…​) is one of the few methods that require a File object. From the Path we generate a File object and use it to open the browser, which should be associated with rendering HTML files. Alternatively, for web pages we can use browse(URI) and get the URI from the path via toUri().

Java 8 Backport

The writeString(…​) method was added in Java 11. For Java 8, we have to take a workaround, such as Files.write(tmpPath, Collections.singleton(html)). The chosen write(…​) method is used a bit strangely at first sight — because the second parameter must be an Iterable<? extends CharSequence>. But we have only one String. Therefore we generate a Collection from exactly one single element and thus fulfill the signature of the method.
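
The backport can be sketched as follows; the class and method names are ours, not from the task:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

public class Java8WriteDemo {

  // Java 8-compatible replacement for Files.writeString(tmpPath, html):
  // write(Path, Iterable<? extends CharSequence>, ...) accepts a collection
  // of lines, so we wrap the single string in a one-element collection.
  static Path writeHtml( String html ) throws IOException {
    Path tmp = Files.createTempFile( "wisdom", ".html" );
    Files.write( tmp, Collections.singleton( html ), StandardCharsets.UTF_8 );
    return tmp;
  }
}
```

One small difference to writeString(…​): the line-oriented write(…​) appends a line separator after each element, which is harmless for an HTML file.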

1.3.2. Merge hiding places

public static void mergeFiles( Path main, Path... temp ) throws IOException {
  Iterable<Path> paths =
      Stream.concat( Stream.of( main ), Stream.of( temp ) )::iterator;
  Collection<String> words = new LinkedHashSet<>();

  for ( Path path : paths )
    try ( Stream<String> lines = Files.lines( path ) ) {
      lines.forEach( words::add );
    }

  Files.write( main, words );
}

For our task, the LinkedHashSet data structure is ideally suited, because as a set it contains each element only once, and it preserves the order of the inserted elements. We only have to take care that the lines of the first file come into the data structure first, followed by the lines of the remaining files.

For reading the lines and inserting them into the data structure, the first file should be treated in the same way as the rest of the files. But this unification requires a workaround, because the first parameter is a single Path variable, followed by a vararg, i.e. a Path array. The proposed solution first puts the first element into a Stream and concatenates that with a second stream of the elements in the vararg array; the result is a Stream<Path>. We only have to run over this stream. In theory, forEach(…​) could be used here, but there is a problem: input/output operations throw checked exceptions, and these do not mix well with lambda expressions. The Stream is therefore converted to an Iterable so that we can use the extended for loop. The ::iterator method reference yields an expression of type Iterable; a neat trick, since Stream itself does not implement Iterable.
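
The ::iterator trick can be shown in isolation. A minimal sketch (class and method names are ours):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class StreamIterableDemo {

  static List<String> collect( Stream<String> stream ) {
    // Stream does not implement Iterable, but stream::iterator has exactly
    // the shape of Iterable's single abstract method iterator(), so the
    // method reference can stand in for an Iterable implementation.
    Iterable<String> iterable = stream::iterator;

    List<String> result = new ArrayList<>();
    for ( String element : iterable )
      result.add( element );
    return result;
  }
}
```

Note that such an Iterable is one-shot: a stream can be traversed only once, so the extended for loop must not run over it twice.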

The extended for loop runs over the files, reads in all the lines, puts them into the data structure, and finally writes all the lines back to the first file.

1.3.3. Create copies of a file

private static final String COPY_OF          = "Copy of %s";
private static final String NUMBERED_COPY_OF = "Copy (%d) of %s";

public static void cloneFile( Path path ) throws IOException {

  if ( Files.isDirectory( path ) )
    throw new IllegalArgumentException(
        "Path has to be a file but was a directory" );

  Path parent   = path.getParent();
  Path filename = path.getFileName();

  Path copyPath = parent.resolve( String.format( COPY_OF, filename ) );

  for ( int i = 2; Files.exists( copyPath ); i++ )
    copyPath = parent.resolve( String.format( NUMBERED_COPY_OF, i, filename ) );

  Files.copy( path, copyPath );
}

The algorithm from the proposed solution proceeds by generating possible filenames in order and testing until a free filename is found. In round brackets there is a counter starting at 2. There is no Copy (1) of <Name>.

The method cloneFile(Path path) starts by checking whether a directory was passed as path by mistake, and throws an exception in that case; we cannot clone directories. If it is a file, we extract the directory of the file and the filename.

The first sample for a possible new filename starts with Copy of and does not yet contain a counter. We can test this filename for existence with Files.exists(…​). If the file exists, we have to continue with a counter. Therefore we set this existence test as a condition in a for loop and use a counter variable i, which we initialize with 2 at the beginning to be able to represent the counter in round brackets. In the body of the loop, the variable copyPath is reassigned, always with the loop counter in round brackets. We run through the loop until we find a copyPath that does not exist. Then the loop terminates, and Files.copy(…​) creates a copy of the file with the path specification of copyPath.
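
The probing loop can be tried out on its own. A small sketch, with a hypothetical helper name firstFreeCopyName(…​) that only computes the target path without copying:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class CopyNameDemo {

  // Returns the first copy name that does not exist yet:
  // "Copy of <name>", then "Copy (2) of <name>", "Copy (3) of <name>", ...
  static Path firstFreeCopyName( Path file ) {
    Path parent   = file.toAbsolutePath().getParent();
    Path filename = file.getFileName();

    Path candidate = parent.resolve( String.format( "Copy of %s", filename ) );
    for ( int i = 2; Files.exists( candidate ); i++ )
      candidate = parent.resolve( String.format( "Copy (%d) of %s", i, filename ) );
    return candidate;
  }
}
```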

To make it easier to change the strings for different languages, they are extracted as constants.

1.3.4. Generate a directory listing

private static final DateTimeFormatter ddMMyyyy_HHmm =
    DateTimeFormatter.ofPattern( "dd.MM.yyyy  HH:mm" );

static void listDirectory( Path dir ) throws IOException {
  try ( DirectoryStream<Path> entries = Files.newDirectoryStream( dir ) ) {
    for ( Path path : entries ) {
      Instant instant = Files.getLastModifiedTime( path ).toInstant();
      LocalDateTime dateTime = LocalDateTime.ofInstant( instant,
                                                        ZoneId.systemDefault() );
      String formattedDateTime = dateTime.format( ddMMyyyy_HHmm );
      String dirLength = Files.isDirectory( path )
                         ? "<DIR>         "
                         : String.format( "%,14d", Files.size( path ) );
      String filename = path.getFileName().toString();
      System.out.printf( "%s   %s %s%n", formattedDateTime, dirLength, filename );
    }
  }
}

To solve the task, different APIs come together. We need the class Files, the type Path, date/time calculations and format strings.

Since the file operations can throw potential exceptions, but we cannot handle them, our method will pass possible exceptions to the caller. Nevertheless, a try-with-resources comes into play — the resource is the DirectoryStream. If you program quick and dirty, you will often see the DirectoryStream as an Iterable, located to the right of the colon of the extended for loop. But the DirectoryStream is a resource that needs to be closed. So we find the extended for loop to loop through all entries in the directory in the next step.

The variable path now contains a path, which can stand for a file or a directory. In any case we want to get the time of the last modification. Although Files.getLastModifiedTime(…​) returns the necessary FileTime object, its toString() method does not return anything appealing. Therefore a little detour is necessary to get a nice output: First, the FileTime is converted to an Instant, which is then converted to a LocalDateTime, and this allows us to use a DateTimeFormatter with the pattern "dd.MM.yyyy  HH:mm", for whose format we have introduced a separate constant. (Note HH for the hour: the lowercase hh pattern letter stands for the 12-hour clock and would be ambiguous without an AM/PM marker.)
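
The conversion chain FileTime → Instant → LocalDateTime can be extracted into a small helper. A sketch with names of our choosing:

```java
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class FileTimeDemo {

  // FileTime -> Instant -> LocalDateTime in the system time zone,
  // then formatted like a dir listing entry
  static String format( FileTime fileTime ) {
    Instant instant = fileTime.toInstant();
    LocalDateTime dateTime =
        LocalDateTime.ofInstant( instant, ZoneId.systemDefault() );
    return dateTime.format( DateTimeFormatter.ofPattern( "dd.MM.yyyy  HH:mm" ) );
  }
}
```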

We have the date and time, now the other segments follow. Depending on whether the path is a directory or a file, we have to set <DIR> or, in the case of a file, ask for the file length; String.format(…​) brings the number of bytes to an appropriate length.

In the last step we ask for the name of the file or directory and put everything together in one line. This line starts with the formatted date and time, then the indication if it is a directory, otherwise the file length, and finally the file name or directory name.

1.3.5. Search for large GIF files

For the solution we fall back on a Files method:

DirectoryStream<Path> newDirectoryStream(Path dir, DirectoryStream.Filter<? super Path> filter).

For us, DirectoryStream.Filter<? super Path> filter is relevant because it can be used to implement a criterion for limiting the results. For context, let’s look at the UML diagram:

DirectoryStream UML
Figure 1. UML diagram of DirectoryStream and dependent types

The Filter is a nested type of DirectoryStream. If we need to pass a filter, we need to implement the Filter interface and the accept(…​) method. The implementation can be done by a class, by a lambda expression or by a method reference. And "luckily" boolean isGifAndWidthGreaterThan1024(Path entry) matches boolean accept(Path entry), which suggests a method reference.

Path directory = Paths.get( name );
try ( DirectoryStream<Path> files =
          Files.newDirectoryStream( directory,
                                    FindBigGifImages::isGifAndWidthGreaterThan1024 ) ) {
  files.forEach( System.out::println );
}
catch ( IOException e ) {
  e.printStackTrace();
}

The Files.newDirectoryStream(…​) method returns the DirectoryStream, which is AutoCloseable and must be closed again at the end. This is conveniently done by try-with-resources. The DirectoryStream is also an Iterable, so we can run it in an extended for loop or with a forEach(…​) and an appropriate Consumer.

1.3.6. Descend directories recursively and find empty text files

public static void findEmptyTextFiles( Path base, Consumer<Path> callback )
    throws IOException {
  class PrintingFileVisitor extends SimpleFileVisitor<Path> {
    @Override
    public FileVisitResult visitFile( Path visitedFile,
                                      BasicFileAttributes fileAttributes ) {
      if ( visitedFile.toString().toLowerCase().endsWith( ".txt" )
           && fileAttributes.size() == 0L )
        callback.accept( visitedFile );
      return FileVisitResult.CONTINUE;
    }
  }
  Files.walkFileTree( base, new PrintingFileVisitor() );
}

The solution implements the findEmptyTextFiles(…​) method with two parameters: the first for the base directory and the second for a consumer that is called when a path is found.

The static method Files.walkFileTree(…​) recursively walks the file system from a directory to a desired depth. In our case, we do not limit the depth and do not provide any other options. The method must be passed an implementation of FileVisitor. This is not a functional interface, but an interface with four methods. We could implement the interface ourselves, but the Java library provides a simple implementation with SimpleFileVisitor.

FileVisitor UML
Figure 2. UML diagram of FileVisitor, the subclass SimpleFileVisitor and the enumeration FileVisitResult for the returns

From this class we subclass and override the visitFile(…​) method relevant to us, which is called whenever walkFileTree(…​) finds a file. Unfortunately, walkFileTree(…​) does not accept a filter the way newDirectoryStream(…​) does. Consequently, we have to implement the criteria ourselves: the filename ends with .txt and the file is 0 bytes in size. Thankfully, visitFile(…​) gives us

  1. the path, so we can check if the filename ends in .txt, and

  2. the attributes, so we can test if the file is empty.

If both criteria are correct, we call the callback function and pass the path to the file. Then we want to continue the search in the directory, and for this the method returns FileVisitResult.CONTINUE.
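
Since Java 8 there is also a stream-based alternative to the FileVisitor: Files.walk(…​) traverses the tree recursively and returns a Stream<Path> that can be filtered. This is a sketch of that alternative, not the solution above; the class and method names are ours:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class EmptyTxtWalkDemo {

  // Recursively collects all empty .txt files below base
  static List<Path> emptyTextFiles( Path base ) throws IOException {
    try ( Stream<Path> entries = Files.walk( base ) ) {
      return entries
          .filter( p -> p.toString().toLowerCase().endsWith( ".txt" ) )
          .filter( p -> {
            // Files.size throws a checked IOException, which a Predicate
            // cannot declare, hence the wrapping in UncheckedIOException
            try { return Files.size( p ) == 0L; }
            catch ( IOException e ) { throw new UncheckedIOException( e ); }
          } )
          .collect( Collectors.toList() );
    }
  }
}
```

The checked-exception wrapping inside the lambda shows why the FileVisitor variant, whose visitFile(…​) may throw IOException directly, can be the cleaner fit here.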

1.3.7. Develop your own utility library for file filters

class FileFilters {

  public interface AbstractFilter extends DirectoryStream.Filter<Path> {
    default AbstractFilter and( AbstractFilter other ) {
      return path -> accept( path ) && other.accept( path );
    }

    default AbstractFilter negate() {
      return path -> ! accept( path );
    }

    static AbstractFilter not( AbstractFilter target ) {
      return target.negate();
    }
  }

  /** Tests if a {@code Path} is readable. */
  public static final AbstractFilter readable = Files::isReadable;

  /** Tests if a {@code Path} is writable. */
  public static final AbstractFilter writable = Files::isWritable;

  /** Tests if a {@code Path} is a directory. */
  public static final AbstractFilter directory = Files::isDirectory;

  /** Tests if a {@code Path} is a regular file. */
  public static final AbstractFilter regularFile = Files::isRegularFile;

  /** Tests if a {@code Path} is hidden. */
  public static final AbstractFilter hidden = Files::isHidden;

  /** Tests if the file size of a {@code Path} is zero. */
  public static final AbstractFilter empty = path -> Files.size( path ) == 0L;

  /** Tests if the file size of a {@code Path} is larger than the specified size. */
  public static AbstractFilter largerThan( long size ) {
    return path -> Files.size( path ) > size;
  }

  /** Tests if the file size of a {@code Path} is smaller than the specified size. */
  public static AbstractFilter smallerThan( long size ) {
    return path -> Files.size( path ) < size;
  }

  /** Tests if a {@code Path} is older than the specified {@code FileTime}. */
  public static AbstractFilter olderThan( FileTime other ) {
    return path -> Files.getLastModifiedTime( path ).compareTo( other ) > 0;
  }

  /** Tests if a {@code Path} has one of the specified suffixes, ignoring case, e.g. ".txt". */
  public static AbstractFilter suffix( String suffix, String... more ) {
    return path ->
        Stream.concat( Stream.of( suffix ), Stream.of( more ) )
            .anyMatch( aSuffix -> {
              String filename  = path.toString();
              int suffixLen    = aSuffix.length();
              int suffixOffset = filename.length() - suffixLen;
              return filename.regionMatches( /* ignore case */ true,
                                             suffixOffset, aSuffix, 0, suffixLen );
            } );
  }

  /** Tests if the content of a {@code Path} starts with a specified sequence of bytes. */
  public static AbstractFilter magicNumber( int... bytes ) {
    ByteBuffer byteBuffer = ByteBuffer.allocate( bytes.length );
    for ( int b : bytes ) byteBuffer.put( (byte) b );
    return magicNumber( byteBuffer.array() );
  }

  /** Tests if the content of a {@code Path} starts with a specified sequence of bytes. */
  public static AbstractFilter magicNumber( byte... bytes ) {
    return path -> {
      try ( InputStream in = Files.newInputStream( path ) ) {
        byte[] buffer = new byte[ bytes.length ];
        in.read( buffer );
        // If the file is smaller than bytes.length, the result is false
        return Arrays.equals( bytes, buffer );
      }
    };
  }

  /** Tests if the string form of a {@code Path} contains a match for the specified regex. */
  public static AbstractFilter regexContains( String regex ) {
    return path -> Pattern.compile( regex ).matcher( path.toString() ).find();
  }

  /** Tests if the filename of a {@code Path} matches a given glob string. */
  public static AbstractFilter globMatches( String glob ) {
    return path -> path.getFileSystem().getPathMatcher( "glob:" + glob )
                       .matches( path.getFileName() );
  }
}

The method Files.newDirectoryStream(…​) lists the entries of a single directory (it does not recurse). To restrict the result, it expects an implementation of the functional interface DirectoryStream.Filter with its accept(…​) method:

boolean accept(T entry) throws IOException

The interface has no additional default or static methods, and furthermore the type argument is always Path in our case, so we create a new interface AbstractFilter as a subtype of Filter with two additional default methods and one static method. The and(…​) method combines two AbstractFilters with a logical AND, the negate() method negates the result of its own accept(…​) method, and the static not(…​) method returns a new AbstractFilter object that also negates the result.

The FileFilters class declares various constants and methods. Whenever something has to be parameterized, a method is used; if no parameterization is necessary, a constant is sufficient. All following constants are of the datatype AbstractFilter and the methods return AbstractFilter. The constants are initialized with a corresponding method from the Files class via the method reference.

Let’s focus on the more exciting methods.

  • suffix(…​) joins all possible file extensions into a Stream and then queries whether the path has one of the passed file extensions. To do this, the path is first converted to a string, and regionMatches(…​) is used to test the suffix, case-insensitively. Checking via regionMatches(…​) is a bit more cluttered in code than working with toLowerCase(…​) and endsWith(…​), but regionMatches(…​) does not build temporary objects and is a little more performant.

  • magicNumber(byte...) takes a variable number of bytes, throwing a NullPointerException if the parameter variable is null. Otherwise

    1. an input stream is opened,

    2. exactly as many bytes are read in as the parameter array is large,

    3. the two arrays are compared with Arrays.equals(…​). If the file is smaller than the passed number of bytes, the unread part of the buffer keeps its zero initialization, so equals(…​) returns false as long as the magic number itself does not end in zero bytes.

  • magicNumber(int...) is the overloaded variant of magicNumber(byte...), because bytes in the parameter list are inflexible due to the value range -128 to +127; developers would have to write, for example:

    magicNumber( (byte) 0x89, (byte) 'P', (byte) 'N', (byte) 'G' )

    The int…​ data type is more convenient for callers; so it could be:

    magicNumber( 0x89, 'P', 'N', 'G' )

    Therefore, magicNumber(int..) converts the int... to a byte[] and delegates to the main method.

  • regexContains(…​) takes a regular expression, compiles it, and applies it to the string form of the path; if the find() method returns true, the path contains a match of the regular expression.

  • globMatches(…​) matches the filename against a glob string; these are simple expressions like *.txt or ??-??-1976. getPathMatcher(…​) expects a pattern with the prefix glob: or regex: and returns a matching PathMatcher. The implementation of regexContains(…​) does not fall back on a PathMatcher with a regular expression, because the PathMatcher tests for a full match, while the solution requires a partial search.
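
The full-match behavior of a PathMatcher is easy to demonstrate. A small sketch (class and method names are ours):

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class PathMatcherDemo {

  // Matches a bare filename against a glob pattern; the matcher performs
  // a full match, not a partial search
  static boolean globMatches( String glob, String filename ) {
    PathMatcher matcher =
        FileSystems.getDefault().getPathMatcher( "glob:" + glob );
    return matcher.matches( Paths.get( filename ) );
  }

  public static void main( String[] args ) {
    System.out.println( globMatches( "*.png", "image.png" ) );      // true
    System.out.println( globMatches( "??-??-1976", "24-12-1976" ) ); // true
    System.out.println( globMatches( "*.png", "notes.txt" ) );      // false
  }
}
```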

1.3.8. Output last line of a text file

private static final int MAX_LINE_LENGTH = 100;
private static final int MAX_NUMBER_OF_BYTES_PER_UTF_8_CHAR = 4;
private static final int BLOCK_SIZE =
    MAX_LINE_LENGTH * MAX_NUMBER_OF_BYTES_PER_UTF_8_CHAR;

private static void printLastLine( String filename ) throws IOException {
  try ( RandomAccessFile file = new RandomAccessFile( filename, "r" ) ) {
    long start = Math.max( 0, file.length() - BLOCK_SIZE );
    file.seek( start );
    byte[] bytes = new byte[ (int) ( file.length() - start ) ];
    file.readFully( bytes );

    String string = new String( bytes, StandardCharsets.UTF_8 );
    Matcher matcher = Pattern.compile( "([^\\r\\n]*)$" ).matcher( string );
    if ( matcher.find() )
      System.out.println( matcher.group( 1 ) );
  }
}

The solution has two steps: First, a block is read at the end of the file and then the last line is extracted from this block.

To determine the correct block size, we multiply the maximum line length known from the task (100) by the maximum number of bytes per UTF-8 character; in UTF-8 encoding, a single character can occupy up to four bytes. The task becomes more difficult if the maximum line length is unknown, but here the product MAX_LINE_LENGTH * MAX_NUMBER_OF_BYTES_PER_UTF_8_CHAR guarantees that the last block of this size in the file also contains the last line.
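
The variable byte lengths of UTF-8 can be verified directly. A small sketch with sample characters of our choosing:

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {

  // Number of bytes the string occupies in UTF-8 encoding
  static int utf8Length( String s ) {
    return s.getBytes( StandardCharsets.UTF_8 ).length;
  }

  public static void main( String[] args ) {
    System.out.println( utf8Length( "A" ) );    // 1 byte  (ASCII)
    System.out.println( utf8Length( "ä" ) );    // 2 bytes
    System.out.println( utf8Length( "€" ) );    // 3 bytes
    System.out.println( utf8Length( "𝄞" ) );   // 4 bytes (outside the BMP)
  }
}
```

This is why the block size must be the line length times four: only then is the block guaranteed to cover the last line in the worst case.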

In the next step, we set the file pointer to the beginning of the last block. We read in a byte array and convert it to a string using UTF-8 decoding. If the last line does not have the maximum length, then the read block contains the remains of the previous line or lines, including end-of-line characters, as well as the last line. For extracting the last line we could use lastIndexOf(…​), but the regular expression ([^\r\n]*)$ extracts the last line more elegantly. The regular expression is composed as follows:

  1. The dollar sign at the end signals the end of the input.

  2. The group in round brackets represents what we want to extract.

  3. In the character class [^\r\n], the caret stands for negation: we want all characters that are not end-of-line characters. [^\r\n]* with the asterisk matches a sequence of such non-line-end characters.

If there is a match, we output it, otherwise there is no output.
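
The extraction step can be isolated for testing. A small sketch; the class and method names are ours:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastLineDemo {

  // Extracts the last line of a text block: [^\r\n]* matches a run of
  // non-line-terminator characters, and $ anchors it at the end of input
  static String lastLine( String block ) {
    Matcher matcher = Pattern.compile( "([^\\r\\n]*)$" ).matcher( block );
    return matcher.find() ? matcher.group( 1 ) : "";
  }
}
```

Without the MULTILINE flag, $ matches only at the end of the input (or just before a final line terminator), so find() locates exactly the last line.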