Flows

About manipulation of flows (including byte streams, character flows, files) in Java.

Note
The reader should understand Relative and absolute paths. The reader should know the Varargs construct. (The last exercise uses Maven.)

Overview

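There are four essential kinds of objects that represent flows of information in Java: InputStream and OutputStream, which represent flows of bytes (to read from and to write to, respectively), and Reader and Writer, which represent flows of characters.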

It is typical for a flow of characters to be encoded as a flow of bytes. For example, to store the string Hello, world in a text file on your hard disk, using the UTF-8 encoding standard, the string is first encoded as the bytes 48, 65, 6C, 6C, 6F, 2C, 20, 77, 6F, 72, 6C, 64 in hexadecimal (or equivalently in decimal, 72, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100), then written (as a bit stream) to the disk. In Java, a mapping between characters and bytes is called a character set. You should always opt for the UTF-8 character set when you have a choice, as it is the standard general purpose character set.
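The encoding described above can be checked directly in memory. The following is a minimal sketch (the class name EncodeHello and the method toHex are made up for this illustration) that encodes a string with UTF-8 and prints the resulting bytes in hexadecimal.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

// Illustrative class, not part of the course material.
class EncodeHello {
    /** Returns the UTF-8 bytes of the given text in upper-case hexadecimal, separated by ", ". */
    static String toHex(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringJoiner joiner = new StringJoiner(", ");
        for (byte b : bytes) {
            // Mask with 0xFF so that bytes above 7F are not treated as negative numbers.
            joiner.add(String.format("%02X", b & 0xFF));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex("Hello, world"));
        // prints 48, 65, 6C, 6C, 6F, 2C, 20, 77, 6F, 72, 6C, 64
    }
}
```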

The most common way of obtaining a flow is to read from a file. To designate the file, you typically use a path relative to your project directory (the place where pom.xml is stored, when you use Maven).

Two essential classes help you deal with files: Path and Files. For example, use Files#newBufferedWriter() to obtain a Writer that writes to a file.

Classes that represent flows (such as the four above) implement Closeable. This signals that they must be closed after use. (Closing releases resources that might have been reserved by the operating system, and lets the object proceed with finalizing operations, such as actually writing to disk by flushing its buffer.) Do not forget to close your flows after use by calling close(). Use the try-with-resources statement for this, as it is a simple way of making sure close() is duly called.
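As an illustration of the try-with-resources statement, here is a minimal sketch that writes a string to a file through a Writer obtained from Files#newBufferedWriter(). The class name WriteAndClose and the use of a temporary file are choices made for this example only.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative class, not part of the course material.
class WriteAndClose {
    /** Writes the text to the given file, encoding the characters with UTF-8. */
    static void write(Path file, String text) throws IOException {
        try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            writer.write(text);
        } // close() is called here automatically, even if write() threw an exception
    }

    public static void main(String[] args) throws IOException {
        // A temporary file, so that this example does not depend on your project layout.
        Path file = Files.createTempFile("flows-demo", ".txt");
        write(file, "Hello, world");
        System.out.println(Files.readString(file, StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

Without the try-with-resources statement, a forgotten close() could leave the text sitting in the writer's buffer, never reaching the file.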

Note
Many examples on the internet still use the File class. This approach should now be considered outdated (although the class is not formally deprecated). Using Path and Files, as illustrated here, should be favored: it is a simpler API altogether, and readily adapts to other kinds of file systems such as in-memory file systems, as explained below.

Exercise: Byte flows

  • Create a file in.txt in your project directory, containing Hello, world.

  • In Java, create a Path object designating that file.

  • Use the appropriate static method from the Files class to obtain an InputStream representing the content of that file.

  • Read and print the bytes from that stream using a loop, one by one.

  • Check that the bytes are as expected.

Here is a solution. Don’t cheat! Try it for yourself before looking.

  • Move the file around on your hard disk (for example, put it into the src/main sub-directory, or put it into a directory higher up the hierarchy, one that is not included in your project). Adapt the code.

In this example, the specific encoding used for encoding characters to bytes does not matter much, because most encodings used in practice (including UTF-8 and ISO-8859-1) encode the basic Latin alphabet in the same way. Problems can arise with characters outside that range. For example, the character é is encoded as the bytes C3, A9 under UTF-8, but as E9 under ISO-8859-1.
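The difference between the two encodings can be observed without any file. In the sketch below (the names EncodeAccent and toHex are made up for this illustration), the character é is written as the escape \u00E9 so that the example does not depend on the encoding of the Java source file itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative class, not part of the course material.
class EncodeAccent {
    /** Returns the bytes of the text under the given character set, in hexadecimal, separated by spaces. */
    static String toHex(String text, Charset charset) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // the character é
        System.out.println(toHex(eAcute, StandardCharsets.UTF_8));      // prints C3 A9
        System.out.println(toHex(eAcute, StandardCharsets.ISO_8859_1)); // prints E9
        // A basic Latin letter is encoded identically under both character sets.
        System.out.println(toHex("H", StandardCharsets.UTF_8));         // prints 48
        System.out.println(toHex("H", StandardCharsets.ISO_8859_1));    // prints 48
    }
}
```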

Exercise: Determine encoding

Create a text file containing Hé ! using your favorite text editor, and print its byte content using a stream, as previously. From this, determine whether your editor possibly used the UTF-8 encoding.

Reading from character flows

Character flows should be used in preference to byte flows when reading or writing text data (as in the exercise above). Such a flow usually reads from (or writes to) a byte flow, decoding from or encoding to bytes as needed. Note that this conversion depends on the character set given to the object representing the character flow.
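To see how the character set given to a Reader changes the decoded text, here is a minimal sketch that decodes the same bytes twice. It reads from a byte array in memory rather than from a file; the class name DecodeBytes and the method decode are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative class, not part of the course material.
class DecodeBytes {
    /** Decodes the bytes with the given character set, reading one character at a time. */
    static String decode(byte[] bytes, Charset charset) throws IOException {
        StringBuilder result = new StringBuilder();
        // The Reader wraps a byte flow and decodes it as characters are requested.
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) { // read() returns -1 at the end of the flow
                result.append((char) c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        // The bytes of "Hé" when encoded with UTF-8.
        byte[] bytes = {0x48, (byte) 0xC3, (byte) 0xA9};
        System.out.println(decode(bytes, StandardCharsets.UTF_8).length());      // prints 2
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1).length()); // prints 3
    }
}
```

Decoded with ISO-8859-1, the two bytes C3 and A9 each become one character (Ã and ©), which is why text written as UTF-8 looks garbled when read with the wrong character set.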

Exercise: Character flows

Adapt the strategy used in the first exercise in order to use a Reader to print the content of your file containing Hé !. Use various character sets. Check that it prints Hé ! or prints something else, depending on the character set your editor uses to encode the characters, and the character set your Reader object uses to decode the bytes.

A more realistic exercise

  • Create (manually) a text file containing multiple persons. Each person is described by two lines: her first name then her last name. Thus the file contains n times 2 lines for describing n persons. Create a class Person with a first name and a last name. Create a method readPersons that accepts a Reader as parameter, uses it to read such a file, and returns a list of persons.

We now want to test this method. We could create a file, let the method read from that file, and then delete the file, but we prefer to avoid creating a file unnecessarily. This is possible thanks to the abstraction represented by Reader: remember that such an object represents a flow of characters, and this may come from a file, from memory, from the network, …

  • Create a unit test that creates a String equal to firstname\nlastname. The test gives the string to your readPersons method created in the previous exercise. To do this, use a StringReader. The test asserts that the created person has the right first and last names.
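The idea that a Reader hides where the characters come from can be sketched on a simpler task than the exercise. In the example below, the hypothetical method countLines (invented for this illustration) accepts any Reader; here it is fed from a string held in memory, but it would work unchanged with a Reader obtained from a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Illustrative class, not part of the course material.
class LineCounter {
    /** Counts the lines available from the given source. The caller remains responsible for closing it. */
    static int countLines(Reader source) throws IOException {
        BufferedReader buffered = new BufferedReader(source);
        int count = 0;
        while (buffered.readLine() != null) { // readLine() returns null at the end of the flow
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // The characters come from memory: no file is created.
        try (Reader source = new StringReader("first\nsecond\nthird")) {
            System.out.println(countLines(source)); // prints 3
        }
    }
}
```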

Exercise: Write two files

In this exercise we will write code that writes two files, hello.txt and subfolder/bye.txt.

  • Define a method helloBye(Path) that accepts a Path (considered to represent a folder). Define another path instance from that path, representing the file hello.txt sitting in the folder represented by that path (thus, for example, if given a path representing /home/user/afolder/, your new path instance should represent /home/user/afolder/hello.txt). Write the string Hello, world in that file. Check that it works.

  • Extend your method so that it creates a path representing the folder subfolder as a child folder of the path received as argument, create that folder, and in that folder, create a file bye.txt containing Bye bye!. Check that this works.

  • Create somewhere (manually, with your file browser) some folders test1/ and test2/asubfolder/. Define a main method that calls helloBye() twice, giving it paths representing test1/ and then test2/asubfolder/. Check that your code has created the four expected files.

Path operations

A Path represents a path in a tree, and that tree represents a “file” system. From a given Path instance (representing, thus, a path in a file system), you can obtain other Path instances, representing relative or absolute paths that relate to that first path. See here for the essentials about this.
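A few of these operations are sketched below: resolve (to obtain a child), getParent, and relativize. The class name PathOperations and the method child are made up for this illustration, and the paths are only examples; nothing is read from or written to the disk. The outputs shown in comments assume a Unix-like system (on Windows, the separators would differ).

```java
import java.nio.file.Path;

// Illustrative class, not part of the course material.
class PathOperations {
    /** Returns the path designating the given child name inside the given folder. */
    static Path child(Path folder, String name) {
        return folder.resolve(name);
    }

    public static void main(String[] args) {
        Path folder = Path.of("/home/user/afolder");
        Path hello = child(folder, "hello.txt");
        System.out.println(hello);             // /home/user/afolder/hello.txt on a Unix-like system
        System.out.println(hello.getParent()); // /home/user/afolder on a Unix-like system
        // The path that leads from /home/user to hello.txt.
        System.out.println(Path.of("/home/user").relativize(hello)); // afolder/hello.txt on a Unix-like system
    }
}
```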

File system providers

Each Path instance is bound to a FileSystemProvider. The default file system provider gives access to the “normal” file system, thus, the files and folders sitting on the hard disk where your code runs. When building a path from String instances, using the Path.of(String, String…) method, the returned instance is bound to that (default) file system provider. For example, Path.of("/home/user/stuff.txt") represents the file stuff.txt sitting in the folder /home/user.
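The binding between a path and its file system can be inspected, as in the following sketch (the class name DefaultProvider and the method usesDefaultFileSystem are made up for this illustration). It only builds a Path object; the file does not need to exist.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;

// Illustrative class, not part of the course material.
class DefaultProvider {
    /** Tells whether the given path belongs to the default ("normal") file system. */
    static boolean usesDefaultFileSystem(Path path) {
        return path.getFileSystem().equals(FileSystems.getDefault());
    }

    public static void main(String[] args) {
        // Uses the varargs form Path.of(String, String...).
        Path stuff = Path.of("/home/user", "stuff.txt");
        System.out.println(stuff.getFileName());          // prints stuff.txt
        System.out.println(usesDefaultFileSystem(stuff)); // prints true
        // The provider of the default file system uses the URI scheme "file".
        System.out.println(stuff.getFileSystem().provider().getScheme()); // prints file
    }
}
```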

The Path mechanism in Java is actually more general than this, as it can represent a path in a file system that is not the “normal” file system. Such Path instances are bound to other (non default) FileSystemProvider instances. For example, an in-memory file system, or a file system that can access the content of a zip (or a jar) file.

Constructing paths from other paths

There are two important ways of obtaining a Path instance. One is to build it from String instances, as illustrated here above. Another is to obtain a path from another path. The crucial difference is that when you obtain your instance from another path, your instance is bound to the same provider as that other path. Thus, if you receive a Path instance that is bound to an in-memory file system, and obtain, say, a child of that path (by calling resolve(String) on that path), you obtain another Path instance bound to this same in-memory file system. This is very handy: in this way, you can write general code that deals with any file system given by your user, even if you know nothing about the specifics of those other file systems.
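The following sketch illustrates this with a zip file system, assuming that the zip file system provider shipped with the standard JDK (module jdk.zipfs) is available. The class name SameProvider, the method childSharesFileSystem, and the temporary locations are choices made for this example only.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Illustrative class, not part of the course material.
class SameProvider {
    /** Tells whether a child obtained with resolve() lives in the same file system as its parent. */
    static boolean childSharesFileSystem(Path parent) {
        Path child = parent.resolve("hello.txt");
        return child.getFileSystem() == parent.getFileSystem();
    }

    public static void main(String[] args) throws IOException {
        // The zip file must not exist yet: the option "create" asks the provider to create it.
        Path dir = Files.createTempDirectory("flows-demo");
        Path zipFile = dir.resolve("demo.zip");
        URI zipUri = URI.create("jar:" + zipFile.toUri());
        try (FileSystem zipFs = FileSystems.newFileSystem(zipUri, Map.of("create", "true"))) {
            Path inZip = zipFs.getPath("myfolder");
            System.out.println(childSharesFileSystem(inZip));    // prints true
            System.out.println(zipFs.provider().getScheme());    // prints jar
        }
        // Clean up the temporary locations used by this example.
        Files.deleteIfExists(zipFile);
        Files.delete(dir);
    }
}
```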

Exercise: abstract path concept

Here we will reuse our previous code to write at different places in our default file system and to a zip file.

  • Create a new zip file and obtain the FileSystem instance that represents it, thanks to this sample code. Obtain the root path of this file system with getPath(""). Pass this path to helloBye(). Check (manually) that you have successfully created a zip file containing the expected files. Modify the code of the main method (but not the one of helloBye) so that the zip file will contain files myfolder/hello.txt and myfolder/subfolder/bye.txt.

  • Assume that you modify helloBye so that it creates the paths it needs using Path.of (and using strings, converting the path given as argument to a string by using toString()) instead of creating them from the path instance given as argument. What would continue to work, among what we did so far, and what would fail? Explain, using the explanations given in the section here above, why it is impossible to make this whole exercise work when creating path instances using Path.of.

Make sure your methods that write data can write anywhere

A method that writes data (for example, converts some object to an XML representation) should be able to write not only to a file sitting on a hard disk, but also to memory, or generally to any instance of Path: this makes it more general, at no cost.

For example, instead of asXml(String fileName), design your class with a method asXml(): String (if the expected data size is small) and asXml(Path outputPath).
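Such a design could look like the following sketch. The class Greeting and its XML format are invented for this illustration, and the XML is produced naively (no escaping of special characters), which is an assumption acceptable only for a demonstration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative class, not part of the course material.
class Greeting {
    private final String text;

    Greeting(String text) {
        this.text = text;
    }

    /** Returns the XML representation as a string (suitable when the data is small). */
    String asXml() {
        // Naive XML production: special characters in the text are NOT escaped.
        return "<greeting>" + text + "</greeting>";
    }

    /** Writes the XML representation to the given path, whatever file system it belongs to. */
    void asXml(Path outputPath) throws IOException {
        Files.writeString(outputPath, asXml(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Greeting greeting = new Greeting("Hello");
        System.out.println(greeting.asXml()); // prints <greeting>Hello</greeting>
        Path file = Files.createTempFile("greeting", ".xml");
        greeting.asXml(file);
        Files.delete(file);
    }
}
```

Because asXml(Path) only uses the Path it receives, it works equally well with a path into a zip file or into an in-memory file system.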

In particular, it often happens that some unit test needs to call a method that writes data, then needs to re-read the data just written, to check for correctness. In that case, writing to a physical file is inelegant: creating a physical file just to read it and delete it afterwards is a waste of time and resources, and practically speaking, it requires finding some place on the hard disk where your code has write access, then making sure that the file gets deleted afterwards. It is much better to write in memory.
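One possible way of writing in memory, using only the standard library, is to let the method under test accept a Writer and to give it a StringWriter. (Another way, not shown here, is to give a Path bound to an in-memory file system.) In the sketch below, the class InMemoryRoundTrip and the method writeNames are hypothetical stand-ins for the code being tested.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Illustrative class, not part of the course material.
class InMemoryRoundTrip {
    /** Hypothetical method under test: writes one name per line to the given target. */
    static void writeNames(Writer target, String... names) throws IOException {
        for (String name : names) {
            target.write(name);
            target.write('\n');
        }
    }

    /** Writes the names in memory, then re-reads what was written and returns its first line. */
    static String firstLineWritten(String... names) throws IOException {
        StringWriter memory = new StringWriter(); // accumulates the characters in memory
        writeNames(memory, names);
        try (BufferedReader reader = new BufferedReader(new StringReader(memory.toString()))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLineWritten("Ada", "Lovelace")); // prints Ada
    }
}
```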

References

See Oracle’s Basic I/O tutorial.