- Overview
- Exercise: Byte flows
- Exercise: Determine encoding
- Reading from character flows
- Exercise: Character flows
- A more realistic exercise
- Exercise: Write two files
- Path operations
- File system providers
- Constructing paths from other paths
- Exercise: abstract path concept
- Make sure your methods that write data can write anywhere
- References
About manipulation of flows (including byte streams, character flows, files) in Java.
Note: The reader should understand relative and absolute paths, and should know the varargs construct. (The last exercise uses Maven.)
There are four essential kinds of objects that represent flows of information in Java.
- InputStream: to read flows of bytes.
- OutputStream: to write flows of bytes.
- Reader: to read flows of characters.
- Writer: to write flows of characters.
It is typical for a flow of characters to be encoded as a flow of bytes. For example, to store the string Hello, world in a text file on your hard disk using the UTF-8 encoding standard, the string is first encoded as the bytes 48, 65, 6C, 6C, 6F, 2C, 20, 77, 6F, 72, 6C, 64 in hexadecimal (or equivalently, in decimal, 72, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100), then written (as a bit stream) to the disk. In Java, a mapping between characters and bytes is called a character set. You should always opt for the UTF-8 character set when you have a choice, as it is the standard general-purpose character set.
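The byte values above can be checked with a few lines of Java. This is only a minimal sketch; the class name Utf8BytesDemo and the helper toHex are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf8BytesDemo {
    /** Encodes the text with UTF-8 and formats each byte as two uppercase hex digits. */
    static String toHex(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            // Java bytes are signed: mask with 0xFF so that values above 7F print as 80..FF.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: 48, 65, 6C, 6C, 6F, 2C, 20, 77, 6F, 72, 6C, 64
        System.out.println(toHex("Hello, world"));
    }
}
```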
The most common way of obtaining a flow is to read from a file. To designate the file, you typically use a path relative to your project directory (the place where pom.xml is stored, when you use Maven).
Two essential classes help you deal with files: Path and Files. For example, use Files#newBufferedWriter() to obtain a Writer that writes to a file.
Classes that represent flows (such as the four above) implement Closeable. This signals that they must be closed after use. (Closing releases resources that the operating system might have reserved, and lets finalizing operations proceed, such as flushing the buffer so that the data is effectively written to disk.) Do not forget to close your flows after use by calling close(). Use the try-with-resources statement for this, as it is a simple way of making sure that close() is duly called.
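As an illustration of the two previous paragraphs, here is a minimal sketch that obtains a Writer with Files#newBufferedWriter() and lets try-with-resources close it. The class name, the method writeGreeting and the use of a temporary file are assumptions made for the example; Files.readString requires Java 11 or later.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WriteWithResourcesDemo {
    /** Writes a greeting to the given file; the writer is closed (and flushed) automatically. */
    static void writeGreeting(Path target) throws IOException {
        try (Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write("Hello, world");
        } // close() is called here, even if write() throws
    }

    public static void main(String[] args) throws IOException {
        // A temporary file is used so that the sketch does not depend on your project layout.
        Path target = Files.createTempFile("greeting", ".txt");
        writeGreeting(target);
        System.out.println(Files.readString(target, StandardCharsets.UTF_8));
        Files.delete(target);
    }
}
```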
Note: Many examples on the internet still use the File class. This approach should now be considered outdated. Using Path and Files, as illustrated here, should be favored: it is a simpler API altogether, and it readily adapts to other kinds of file systems such as in-memory file systems, as explained below.
- Create a file in.txt in your project directory, containing Hello, world.
- In Java, create a Path object designating that file.
- Use the appropriate static method from the Files class to obtain an InputStream representing the content of that file.
- Read and print the bytes from that stream using a loop, one by one.
- Check that the bytes are as expected.
Here is a solution. Don’t cheat! Try it for yourself before looking.
- Move the file around on your hard disk (for example, put it into the src/main sub-directory, or put it into a directory higher up the hierarchy, one that is not included in your project). Adapt the code.
In this example, the specific encoding used for encoding characters to bytes does not matter much, because the ASCII-compatible encodings commonly used in practice (such as UTF-8 and ISO-8859-1) encode the basic Latin alphabet in the same way. Problems can arise when using “more exotic” characters. For example, the character é is encoded as the bytes C3, A9 under UTF-8, but as E9 under ISO-8859-1.
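The difference between the two character sets can be observed directly. The following is a small sketch with hypothetical names (AccentEncodingDemo, hex); the character é is written as a Unicode escape so that the encoding of the Java source file itself plays no role.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AccentEncodingDemo {
    /** Encodes the text with the given character set and returns the bytes in hexadecimal. */
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // the character é
        System.out.println("UTF-8:      " + hex(eAcute, StandardCharsets.UTF_8));      // C3 A9
        System.out.println("ISO-8859-1: " + hex(eAcute, StandardCharsets.ISO_8859_1)); // E9
    }
}
```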
Create a text file containing Hé ! using your favorite text editor, and print its byte content using a stream, as previously. From this, determine whether your editor possibly used the UTF-8 encoding.
Character flows should be preferred to byte flows when reading or writing text data (as in the exercise above). Such a flow usually reads from (or writes to) a byte flow, decoding from or encoding to bytes as needed. Note that this conversion depends on the character set given to the object representing the character flow.
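To make the role of the character set concrete, here is a sketch that decodes the same five bytes twice through a Reader, once per character set. It works on an in-memory byte array rather than on a file; the names DecodingDemo and decode are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodingDemo {
    /** Reads all characters from the bytes, decoding them with the given character set. */
    static String decode(byte[] bytes, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        // The Reader wraps a byte flow and converts bytes to characters on demand.
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The UTF-8 encoding of "Hé !": 48, C3, A9, 20, 21.
        byte[] bytes = {0x48, (byte) 0xC3, (byte) 0xA9, 0x20, 0x21};
        System.out.println(decode(bytes, StandardCharsets.UTF_8));      // 4 characters
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1)); // 5 characters: C3 and A9 are decoded separately
    }
}
```

What appears on your console also depends on the encoding the console itself uses, so comparing string lengths or code points is more reliable than eyeballing the output.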
Adapt the strategy used in the first exercise in order to use a Reader to print the content of your file containing Hé !. Use various character sets. Check that it prints Hé ! or prints something else, depending on the character set your editor uses to encode the characters and the character set your Reader object uses to decode the bytes.
- Create (manually) a text file describing several persons. Each person is described by two lines: the first name, then the last name. Thus the file contains 2n lines for describing n persons. Create a class Person with a first name and a last name. Create a method readPersons that accepts a Reader as parameter, uses it to read such a file, and returns a list of persons.
We now want to test this method. We could create a file, let the method read from that file, and then delete the file, but we prefer to avoid creating a file unnecessarily. This is possible thanks to the abstraction represented by Reader: remember that such an object represents a flow of characters, and this may come from a file, from memory, from the network, …
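The idea can be sketched on a simpler, hypothetical method, countLines, that accepts any Reader. Here it is fed from memory with a StringReader; the same method would work unchanged with a Reader obtained from a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReaderAbstractionDemo {
    /** Counts the lines delivered by the reader, wherever its characters come from. */
    static int countLines(Reader source) throws IOException {
        // The caller created the reader, so the caller stays responsible for closing it.
        BufferedReader reader = new BufferedReader(source);
        int count = 0;
        while (reader.readLine() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        try (Reader inMemory = new StringReader("first\nsecond\nthird")) {
            System.out.println(countLines(inMemory)); // 3
        }
    }
}
```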
- Create a unit test that creates a String equal to firstname\nlastname. The test gives the string to the readPersons method that you created in the previous exercise. To do this, use a StringReader. The test asserts that the created person has the right first and last names.
In this exercise we will write code that writes two files, hello.txt and subfolder/bye.txt.
- Define a method helloBye(Path) that accepts a Path (considered to represent a folder). Define another path instance from that path, representing the file hello.txt sitting in the folder represented by that path (thus, for example, if given a path representing /home/user/afolder/, your new path instance should represent /home/user/afolder/hello.txt). Write the string Hello, world in that file. Check that it works.
- Extend your method so that it creates a path representing the folder subfolder as a child folder of the path received as argument, creates that folder, and, in that folder, creates a file bye.txt containing Bye bye!. Check that this works.
- Create somewhere (manually, with your file browser) some folders test1/ and test2/asubfolder/. Define a main method that calls helloBye() twice, giving it paths representing test1/ and then test2/asubfolder/. Check that your code has created the four expected files.
A Path represents a path in a tree, and that tree represents a “file” system. From a given Path instance (representing, thus, a path in a file system), you can obtain other Path instances, representing relative or absolute paths that relate to that first path. See here for the essentials about this.
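A few of the operations that derive one path from another are sketched below. The folder and file names are hypothetical, none of the files needs to exist, and the printed separators depend on your operating system.

```java
import java.nio.file.Path;

class PathOperationsDemo {
    /** Returns the path of bye.txt inside subfolder, itself inside the given folder. */
    static Path nestedFile(Path folder) {
        return folder.resolve("subfolder").resolve("bye.txt");
    }

    public static void main(String[] args) {
        Path folder = Path.of("afolder");          // a relative path
        Path hello = folder.resolve("hello.txt");  // a child of that folder
        Path nested = nestedFile(folder);

        System.out.println(hello);                      // afolder/hello.txt on Unix-like systems
        System.out.println(hello.getParent());          // the folder again
        System.out.println(hello.getFileName());        // the last name element only
        System.out.println(folder.relativize(nested));  // how to go from folder to nested
        System.out.println(hello.toAbsolutePath());     // resolved against the current directory
        System.out.println(Path.of("afolder/../afolder/hello.txt").normalize()); // redundant elements removed
    }
}
```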
Each Path instance is bound to a FileSystemProvider. The default file system provider gives access to the “normal” file system, thus, to the files and folders sitting on the hard disk where your code runs. When building a path from String instances, using the Path.of(String, String…) method, the returned instance is bound to that (default) file system provider. For example, Path.of("/home/user/stuff.txt") represents the file stuff.txt sitting in the folder /home/user.
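You can check this binding yourself: every Path knows its FileSystem, which in turn knows its provider. The sketch below uses hypothetical names (DefaultProviderDemo, schemeOf) and relies on the fact that the default provider is identified by the URI scheme file.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;

class DefaultProviderDemo {
    /** Returns the URI scheme that identifies the provider the path is bound to. */
    static String schemeOf(Path path) {
        return path.getFileSystem().provider().getScheme();
    }

    public static void main(String[] args) {
        Path stuff = Path.of("/home/user/stuff.txt"); // the file does not need to exist
        System.out.println(stuff.getFileName());      // stuff.txt
        System.out.println(stuff.getParent());        // the folder /home/user
        System.out.println(schemeOf(stuff));          // file
        // Path.of always uses the default file system.
        System.out.println(stuff.getFileSystem() == FileSystems.getDefault()); // true
    }
}
```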
The Path mechanism in Java is actually more general than this, as it can represent a path in a file system that is not the “normal” file system. Such Path instances are bound to other (non-default) FileSystemProvider instances: for example, one that gives access to an in-memory file system, or to the content of a zip (or a jar) file.
There are two important ways of obtaining a Path instance. One is to build it from String instances, as illustrated above. Another is to obtain a path from another path. The crucial difference is that when you obtain your instance from another path, your instance is bound to the same provider as that other path. Thus, if you receive a Path instance that is bound to an in-memory file system, and obtain, say, a child of that path (by calling resolve(String) on that path), you obtain another Path instance bound to this same in-memory file system. This is very handy: in this way, you can write general code that deals with any file system given by your user, even if you know nothing about the specifics of those other file systems.
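The following sketch illustrates this with a zip file system instead of an in-memory one, because the JDK normally ships a zip provider (identified by the scheme jar). The names SameProviderDemo and schemeOfChild are hypothetical, and the zip file is created in a temporary directory purely for the demonstration.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

class SameProviderDemo {
    /** Resolves a child of the given path and reports which provider that child is bound to. */
    static String schemeOfChild(Path parent, String childName) {
        Path child = parent.resolve(childName); // bound to the same file system as parent
        return child.getFileSystem().provider().getScheme();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("zipdemo");
        Path zipFile = dir.resolve("demo.zip");
        URI zipUri = URI.create("jar:" + zipFile.toUri());

        // "create" asks the zip provider to create the archive if it does not exist yet.
        try (FileSystem zipFs = FileSystems.newFileSystem(zipUri, Map.of("create", "true"))) {
            Path rootInZip = zipFs.getPath("/");
            System.out.println(schemeOfChild(rootInZip, "hello.txt")); // jar
        }
        System.out.println(schemeOfChild(dir, "hello.txt")); // file

        Files.deleteIfExists(zipFile);
        Files.delete(dir);
    }
}
```

Note that schemeOfChild never mentions zip files: it only calls resolve on whatever path it receives, which is why the same code serves both file systems.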
Here we will reuse our previous code to write at different places in our default file system and to a zip file.
- Create a new zip file and obtain the FileSystem instance that represents it, thanks to this sample code. Obtain the root path of this file system with getPath(""). Pass this path to helloBye(). Check (manually) that you have successfully created a zip file containing the expected files. Modify the code of the main method (but not the one of helloBye) so that the zip file will contain the files myfolder/hello.txt and myfolder/subfolder/bye.txt.
- Assume that you modify helloBye so that it creates the paths it needs using Path.of (and using strings, converting the path given as argument to a string by using toString()) instead of creating the paths it needs from the path instance given as argument. What would continue to work, among what we did so far, and what would fail? Explain, using the explanations given in the section above, why it is impossible to make this whole exercise work when creating path instances using Path.of.
A method that writes data (for example, one that converts some object to an XML representation) should be able to write not only to a file sitting on a hard disk, but also to memory, or generally to any instance of Path: this makes it more general, at no cost.
For example, instead of asXml(String fileName), design your class with a method asXml(): String (if the expected data size is small) and asXml(Path outputPath).
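Such a pair of methods could look as follows. This is only a sketch of the design: the class Message and its content are hypothetical, and the XML is built naively (no escaping of special characters), which a real implementation would have to handle.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Message {
    private final String text;

    Message(String text) {
        this.text = text;
    }

    /** Returns the XML representation as a string; suitable when the data is small. */
    String asXml() {
        // Naive conversion for illustration only: special characters are not escaped.
        return "<message>" + text + "</message>";
    }

    /** Writes the XML representation to any path, whatever file system it is bound to. */
    void asXml(Path outputPath) throws IOException {
        Files.writeString(outputPath, asXml(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Message message = new Message("Hello, world");
        System.out.println(message.asXml());

        Path output = Files.createTempFile("message", ".xml");
        message.asXml(output);
        System.out.println(Files.readString(output, StandardCharsets.UTF_8));
        Files.delete(output);
    }
}
```

Because asXml(Path) accepts a Path rather than a file name, the caller decides where the data goes; and a test that only needs to inspect the content can call asXml() and never touch the disk.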
In particular, it often happens that a unit test needs to call a method that writes data, then needs to re-read the data just written to check for correctness. In that case, writing to a physical file is inelegant: creating a physical file just to read it and delete it afterwards is a waste of time and resources. Practically speaking, it also requires finding some place on the hard disk where your code has write access, then making sure that the file gets deleted afterwards. It is much better to write in memory.
See Oracle’s Basic I/O tutorial.