RFC: Modular Filesystems C API #101

Merged (16 commits into tensorflow:master), Jun 7, 2019

Conversation

@mihaimaruseac (Contributor) commented May 6, 2019

Feedback period will be open until 2019-05-27

Modular Filesystems C API

| Status        | Accepted                                   |
| :------------ | :----------------------------------------- |
| **Author(s)** | Mihai Maruseac (mihaimaruseac@google.com)  |
| **Sponsor**   | Gunhan Gulsoy (gunan@google.com)           |
| **Updated**   | 2019-05-06                                 |

Summary

Define the API for the plugin-based implementation of filesystems under modular TensorFlow (as proposed in #77).

A major module of modular TensorFlow covers all operations regarding files and the filesystem. Saving models, checkpointing, and reading files as input to models (e.g., images, audio, etc.), among many other file manipulations, are all tasks that TensorFlow has to support.

While users can rely on the underlying operating system's API (such as calling fopen and friends in POSIX environments), a cross-platform, cross-environment API that works everywhere is clearly preferable. TensorFlow provides such an API at the moment, but it compiles support for all known filesystems into the final binary. We propose a new design where all filesystems are implemented as plugins and only the required ones are loaded at runtime. To do so, we first need to design a C API for the filesystems and then integrate it with the rest of the modularization effort.
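
To make the plugin idea concrete, here is a minimal sketch of what runtime registration could look like; the names (`TF_FilesystemOps`, `TF_RegisterFilesystemPlugin`) are hypothetical placeholders, not the API defined by this RFC.

```cpp
// Hypothetical sketch only; names are illustrative placeholders.
// A filesystem plugin is a shared object that, when loaded at runtime,
// hands core TensorFlow a table of function pointers for one URI scheme.
typedef struct TF_FilesystemOps TF_FilesystemOps;  // table of operations

// Exported by core; called by each plugin after it is dlopen()-ed, so only
// the filesystems actually needed end up registered in the process.
void TF_RegisterFilesystemPlugin(const char* scheme,
                                 const TF_FilesystemOps* ops);
```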

@mihaimaruseac mihaimaruseac changed the title Propose an RFC for filesystem plugins under modular TensorFlow Filesystem plugins under modular TensorFlow May 6, 2019
@alextp (Contributor) left a comment:

Thanks for sending the RFC!

Overall I think two things are confusing:

  1. it's not clear which parts of the existing system users of the APIs proposed here need to understand, and you go into a lot of detail about the file APIs in TF. I think you should fork that documentation off into a separate doc.
  2. the key detail of how to handle versioning and upgrading is buried, and you show more than one implementation of the core structs. I think it'd be simpler if you only showed the intended final implementation of each struct (assuming you go with making the structs public, which I advise you to avoid; instead, expose them only through functions that allocate/free and set/get the individual fields).

@ewilderj ewilderj changed the title Filesystem plugins under modular TensorFlow RFC: Filesystem plugins under modular TensorFlow May 6, 2019
@ewilderj ewilderj changed the title RFC: Filesystem plugins under modular TensorFlow RFC: Modular Filesystems C API May 6, 2019
@ewilderj ewilderj added the RFC: Proposed RFC Design Document label May 6, 2019
mihaimaruseac and others added 2 commits May 6, 2019 13:38
The RFC files got moved to a new directory, but the text was still using the old path.
```cpp
// Operations on a TF_RandomAccessFile
typedef struct TF_RandomAccessFileOps {
  size_t (*const Read)(const TF_RandomAccessFile*, uint64, size_t, char*,
                       TF_Status*);
  // ... (diff excerpt truncated; the comments below anchor on Read)
} TF_RandomAccessFileOps;
```
A member commented:

@mihaimaruseac One very useful method for a random access file would be Size or GetSize, returning the size of the file. At the moment, the file size is not visible from RandomAccessFile, which is really inconvenient in many situations.

The reason is that many file formats actually keep their metadata at the end of the file (not at the beginning). One good (and important) example is the ZIP file: its metadata is stored at the end of the file (the central directory), so any parser opening a ZIP file has to find the file size before processing.

With a GetSize (or Size) API, RandomAccessFile could be greatly optimized.

Note that technically it is possible to find the file size by probing with Read until EOF is reached, but this is really not efficient.

Of course, some filesystems may not be able to provide a size beforehand (e.g., streaming ones). I think by default GetSize could simply return NotImplemented, so any file-size-based optimization would only be "best effort".

For optimization, if a GetSize/Size API is exposed, reading files could try GetSize first. If NotImplemented is returned, fall back to the normal approach of probing through to the end. Otherwise the size of the file can be used directly.

Note: ZIP is the format mentioned here, but a lot of common files are actually ZIP archives with a [Content_Types].xml or META-INF entry inside.

For example, Microsoft Office files (pptx, docx, etc.) are all ZIP files, Java jar files are ZIP files, and Python whl files are ZIP files as well.

You can actually just run unzip on a pptx, jar, or whl and see the individual content files.
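
A minimal sketch of the "best effort" fallback described in this comment; `GetSize` is the proposed (hypothetical) addition, and the helper below otherwise assumes the existing `RandomAccessFile::Read` signature.

```cpp
// Sketch: try the (hypothetical) GetSize fast path, else probe to EOF.
size_t FileSize(RandomAccessFile* file) {
  size_t size = 0;
  if (file->GetSize(&size).ok()) return size;  // filesystem knows the size
  // NotImplemented (or any error): fall back to counting bytes until EOF.
  char buf[4096];
  StringPiece chunk;
  Status read_status;
  do {
    read_status = file->Read(size, sizeof(buf), &chunk, buf);
    size += chunk.size();  // count even a partial final chunk
  } while (read_status.ok() && !chunk.empty());
  return size;
}
```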

@mihaimaruseac (Contributor, Author) replied:

I agree with this request. I only included the API calls that are present in the existing classes, but I guess we can expand these. I'm going to add a new section on proposed additions to the op tables.

@mihaimaruseac (Contributor, Author) commented May 7, 2019:

Actually, one of the arguments of Read is the offset from where you want to read and another is the number of bytes to read. So, if reading a zip file, you can go directly to the end of the file and read the size section from there.

Sure, we can add it as a convenience function, but we only identify the filesystem plugin based on the scheme part of the URI, not the extension (hdfs://path/to/file.zip and /path/to/file.zip will use different implementations, but /path/to/file.zip and /path/to/another/file.txt will use the same ops).
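
For illustration, a small sketch of the scheme-based dispatch described in this reply; the helper name is hypothetical.

```cpp
#include <string>

// Hypothetical helper: dispatch is keyed on the URI scheme only, never on
// the file extension.
std::string SchemeOf(const std::string& uri) {
  auto pos = uri.find("://");
  return pos == std::string::npos ? "" /* local filesystem */
                                  : uri.substr(0, pos);
}

// SchemeOf("hdfs://path/to/file.zip")   == "hdfs" -> HDFS plugin ops
// SchemeOf("/path/to/file.zip")         == ""     -> local filesystem ops
// SchemeOf("/path/to/another/file.txt") == ""     -> same local ops
```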

The member replied:

@mihaimaruseac while it is possible to read the content with an offset, the Read API does not give you the option to specify offset = -10 to read from 10 bytes before the end of the file (EOF). In order to achieve that, we need to find out the length of the file (say, 100) and subtract 10 (100 - 10 = 90) to read from offset 90 (vs. -10).

The filesystem plugin will be consumed by a client calling it (a function inside TensorFlow, or another plugin/custom op outside the TensorFlow repo).

For example, inside TensorFlow one function might want to parse a file (not necessarily a ZIP file) named hdfs://path/to/special/file. Outside of TensorFlow, another plugin may want to access TensorFlow's filesystem scheme URI gcs://path/to/another/file.

Eventually the function (inside or outside TensorFlow) will need to access Read (or the proposed GetSize/GetFileSize) through the plugin.

If GetSize/GetFileSize is not exposed, the size is unknown to this function (whether inside TensorFlow or in another plugin using TensorFlow), and it will have to probe through to find the EOF.

@mihaimaruseac (Contributor, Author) commented May 7, 2019:

I wasn't clear enough, sorry. Let me try with an example.

Suppose we live in a world where there are only 3 types of files: *.start, which have the size metadata at the beginning; *.end, which have it at the end; and *.none, which have no size metadata.

Suppose we also have only two filesystems: A (a:///path/to/file) and B (b:///path/to/file). Let's assume that A allows rewinds (so you can read the file and then come back to its start) while B is only streaming (once you read some part of the file, you can't go back, unless you open the file again).

Suppose we have a piece of user code which takes a filepath as input and returns the size of the file; let's call this ExternalFileSize(const string& path).

There are 4 possibilities:

  1. We keep the API as it is. In this case, ExternalFileSize(path) will call Env::Default()->NewRandomAccessFile(...) to get a new RandomAccessFile object and will have to call Read on it until the end, incrementing a size variable, and then return it. This, I agree with you, is not that good.

    // This code is outside TensorFlow
    size_t ExternalFileSize(const string& path) {
      std::unique_ptr<RandomAccessFile> file;
      Env::Default()->NewRandomAccessFile(path, &file);
      return ReadFromStartToEndAndGetSize(file.get());
    }
    
    // Extracting this as a separate function as I refer to it below
    // This is still outside TensorFlow
    size_t ReadFromStartToEndAndGetSize(RandomAccessFile* file) {
      size_t size = 0;
      // ignore casts from size_t to uint64 and from Status to bool
      while (file->Read(size, amount_to_read, output, result)) {
        // assume no partial reads for now as it doesn't matter here
        size += amount_to_read;
      }
      return size;
    }
  2. We add a GetSize method in the plugin API and in the RandomAccessFile class. To implement ExternalFileSize(path), we now call NewRandomAccessFile and GetSize on the returned object. However, NewRandomAccessFile dispatches on the scheme, so both plugins need to handle *.start, *.none, and *.end files inside the GetSize equivalent, resulting in code duplication as well as behavior that is not common across plugins. This is what I was mentioning above, in the second paragraph.

    // This code is outside TensorFlow
    size_t ExternalFileSize(const string& path) {
      std::unique_ptr<RandomAccessFile> file;
      Env::Default()->NewRandomAccessFile(path, &file);
      return file->GetSize();
    }
    
    // This code is in plugin (both A and B)
    size_t GetFileSize(const TF_RandomAccessFile *file) {
      const char *filename = file->filename;
    
      // SizeAtStart, SizeAtEnd, ReadFirstBlock, ReadLastBlock, ReadSizeFromMetadata, ReadFromStartToEndAndGetSize
      // are all functions implemented in each plugin, code duplication
      if (SizeAtStart(filename)) {
        char* metadata = ReadFirstBlock(file);
        return ReadSizeFromMetadata(metadata);
      } else if (SizeAtEnd(filename)) {
        char* metadata = ReadLastBlock(file);
        return ReadSizeFromMetadata(metadata);
      } else {
        // This calls the plugin function, not the external one!
        return ReadFromStartToEndAndGetSize(file);
      }
    }
  3. We allow Read to receive a negative argument for the offset. In this case, ExternalFileSize(path) first looks at the path extension. If it is .none, it calls Read and increments a size variable until end of file, returning this size. If it is .end, it calls Read with the proper negative offset, reads until EOF, and returns the size from the metadata. If it is .start, it reads only the metadata from the start and returns the size field. Since the extension handling now happens outside the plugins, each plugin only needs to implement Read with support for negative offsets. If negative offsets are not supported (i.e., streaming filesystems), Read with a negative offset fails and the code falls back to the same case as the .none extension above.

    // This code is outside TensorFlow
    size_t ExternalFileSize(const string& path) {
      std::unique_ptr<RandomAccessFile> file;
      Env::Default()->NewRandomAccessFile(path, &file);
    
      // SizeAtStart, SizeAtEnd, ReadSizeFromMetadata, ReadFromStartToEndAndGetSize
      // implemented only once in the external code
      if (SizeAtStart(path)) {
        file->Read(0, amount_to_read, output, metadata);
        return ReadSizeFromMetadata(metadata);
      } else if (SizeAtEnd(path)) {
        if (file->Read(-amount_to_read, amount_to_read, output, metadata)) {
          return ReadSizeFromMetadata(metadata);
        } else {
          // filesystem doesn't support negative offsets (filesystem B)
          // so fall back to the default
          return ReadFromStartToEndAndGetSize(file.get());
        }
      } else {
        return ReadFromStartToEndAndGetSize(file.get());
      }
    }
    
    // This code is in plugin A
    size_t Read(const TF_RandomAccessFile *file, int64 offset, ...) {
      if (offset < 0) {
        // position from the end and read
        ...
      } else {
        // read directly
        ...
      }
    }
    
    // This code is in plugin B
    size_t Read(const TF_RandomAccessFile *file, int64 offset, ...) {
      if (offset < 0) {
        // error, as seeking from the end is not supported
      } else {
        // read directly
        ...
      }
    }
  4. We change Read to receive only the number of bytes to read and add a Seek method to specify where the read should start. Seek takes an argument for the relative position (negative or positive offset) and an enum argument specifying whether the position change is relative to the current position, the start of the file, or the end of the file.

    // This code is outside TensorFlow
    size_t ExternalFileSize(const string& path) {
      std::unique_ptr<RandomAccessFile> file;
      Env::Default()->NewRandomAccessFile(path, &file);
    
      // SizeAtStart, SizeAtEnd, ReadSizeFromMetadata, ReadFromStartToEndAndGetSize
      // implemented only once in the external code
      if (SizeAtStart(path)) {
        file->Read(amount_to_read, output, metadata);
        return ReadSizeFromMetadata(metadata);
      } else if (SizeAtEnd(path) && file->Seek(-amount_to_read, FILE_END)) {
        file->Read(amount_to_read, output, metadata);
        return ReadSizeFromMetadata(metadata);
      } else {
        return ReadFromStartToEndAndGetSize(file.get());
      }
    }

    This is similar to 3. above, but has a clearer API.

I would go with 4 or 3 instead of 2 as you propose.
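
For concreteness, one hypothetical shape for the option-4 entry points; the enum name and signatures below are illustrative, not part of the proposal.

```cpp
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct TF_RandomAccessFile TF_RandomAccessFile;
typedef struct TF_Status TF_Status;

typedef enum TF_Whence { TF_SEEK_SET, TF_SEEK_CUR, TF_SEEK_END } TF_Whence;

// Option-4 variant of the ops table: Read loses its offset parameter and a
// Seek entry positions the file instead. Seek returns false (filling the
// status) on streaming filesystems, letting callers fall back to a probe.
typedef struct TF_RandomAccessFileOps {
  bool (*const Seek)(TF_RandomAccessFile*, int64_t offset, TF_Whence whence,
                     TF_Status*);
  size_t (*const Read)(const TF_RandomAccessFile*, size_t bytes_to_read,
                       char* buffer, TF_Status*);
} TF_RandomAccessFileOps;
```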

The member replied:

@mihaimaruseac Thanks. I think 3 or 4 makes sense 👍. For 3, maybe the offset should be int64 instead of uint64 (to allow negative numbers)?

@mihaimaruseac (Contributor, Author) replied:

It has to be int64, indeed.

Another reviewer commented:

Option (4) would prevent concurrent reads from the RandomAccessFile (and require additional locking); while this functionality is currently underused, I expect that to change.

I would really like to see a GetSize() method (option 2) for RandomAccessFile; currently GetFileSize() is used as a substitute, either before or after NewRandomAccessFile().
On most filesystems, GetSize() could ensure that both operations actually refer to the same file object, even in the presence of rename/delete operations.
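
A short sketch of the race this comment is pointing at; `RandomAccessFile::GetSize()` is the hypothetical addition, and the rest mirrors the existing `Env` API.

```cpp
Status SizeOfOpenFile(Env* env, const string& path, uint64* size) {
  std::unique_ptr<RandomAccessFile> file;
  TF_RETURN_IF_ERROR(env->NewRandomAccessFile(path, &file));
  // Racy: `path` can be renamed or replaced between the two calls, so the
  // size below may describe a different file than the handle we hold.
  return env->GetFileSize(path, size);
  // With the proposed method, the open handle answers for itself:
  //   return file->GetSize(size);
}
```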

@mihaimaruseac (Contributor, Author) replied:

So that leaves option 3 to read the size after decompression.

Indeed, GetFileSize() returns the size of the file on disk. We can probably also implement GetSize(), which does the same thing, returning the size of the file at the moment it was read (and filesystems can decide not to implement it, in which case we'll default to calling GetFileSize() from the filesystem object itself).

@mihaimaruseac (Contributor, Author) commented:

Thank you @alextp for all the review. I incorporated all of your changes, either by inline editing, extracting to an appendix, or deferring to an open questions section (including which way I'm leaning) at the end of the document.

@ewilderj ewilderj merged commit f8212cf into tensorflow:master Jun 7, 2019
@ewilderj ewilderj added RFC: Accepted RFC Design Document: Accepted by Review and removed RFC: Proposed RFC Design Document labels Jun 7, 2019
@mihaimaruseac mihaimaruseac deleted the filesystem-rfc branch June 9, 2019 22:27
or the core TensorFlow. As such, here are a few requirements that we think are
needed:

1. Once a plugin is registered, the function pointer to plugin provided
A reviewer commented:

Wording: either "to a plugin provided implementation" or "to plugin-provided implementations"?


1. Furthermore, it is ideal to have compilation failures when trying to compile
plugins against an old API interface which will surely fail the version
metadata tests. For example, if a method's signature changes this will break
A reviewer commented:
Generally, if you want ABI compatibility, you cannot change function signatures, right? You can only ever add new functions. This is what we see in a lot of ABI-compat systems that create foo(), foo2(), foo3(), etc.

@mihaimaruseac (Contributor, Author) replied:

Yes, that's exactly the case.
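
A generic illustration of that pattern (the names are made up):

```cpp
extern "C" {
// v1: shipped, so its signature is frozen forever.
int foo(int x);
// v2 needed an extra parameter; instead of changing foo() and breaking the
// ABI, a new symbol is added alongside it (foo() often becomes a shim).
int foo2(int x, int flags);
}
```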

integration tests but will also be caught during compilation of the plugin.
However, something that might not be caught at compile time is when an
optional method becomes required or a new method is added. We can maintain
_source compatibility_ by always increasing version numbers when a method
A reviewer commented:
A different solution is to make newer entry points always optional, in the sense that if foo3() is not implemented, you get an easy-to-detect error that you can propagate to the caller.
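
Sketched below under the assumption that the ops table gains an optional `GetSize` slot (hypothetical): core checks the pointer before calling, so plugins built against an older table keep working.

```cpp
Status ModularRandomAccessFile::GetSize(uint64* size) const {
  // A plugin compiled against an older ops table leaves this slot null.
  if (ops_->GetSize == nullptr)
    return errors::Unimplemented("GetSize not provided by this plugin");
  // ... otherwise call ops_->GetSize(file_, size, plugin_status) and
  // convert the plugin-filled status into a tensorflow::Status.
  return Status::OK();
}
```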


## Existing implementation

The implementation of a filesystem support in TensorFlow is hairy, due to the
A reviewer commented:
Wording: s/a //

1. Although this design is mostly about C++ implementation, we should ensure
Python code and other language bindings will be able to continue working with
the modular design.

A reviewer commented:
Have you thought about debugging, and specifically performance debugging? Given that input pipelines are often at the core of ML bottlenecks, how can we ensure that folks will understand how time is spent within a plugin?

class PluginFilesystemIOTest : public ::testing::Test {
 protected:
  void SetUp() override {
    filesystem_ = new TF_Filesystem;
A reviewer commented:
In C++ code, use a unique_ptr instead?
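
The reviewer's suggestion, sketched (assuming `TF_Filesystem` is default-constructible here):

```cpp
class PluginFilesystemIOTest : public ::testing::Test {
 protected:
  void SetUp() override {
    // unique_ptr owns the filesystem, so it cannot leak on test failure.
    filesystem_ = std::make_unique<TF_Filesystem>();
  }
  std::unique_ptr<TF_Filesystem> filesystem_;
};
```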

```cpp
  // ... (diff excerpt: end of the TF_FilesystemOps table)
} TF_FilesystemOps;
```

We also provide some convenience calls that plugins can use to fill in these
A reviewer commented:
What part of the code is "we" here? A common library folks will depend on, or is this going to be calling back into the main TF code? If so, the complexity of creating a .lib file for Windows to make this work is going to play into the decision.

@mihaimaruseac (Contributor, Author) replied:

"we" here is more like "we, the authors of the plugin interface".

I'll have to reconsider the Windows issue; thanks for pointing it out.

      : filename_(filename), file_(file), ops_(ops) {}
  virtual ~ModularRandomAccessFile() {
    ops_->Cleanup(const_cast<TF_RandomAccessFile*>(file_));
    delete file_;
A reviewer commented:
Why does this take ownership of file? Generally that seems like an anti-pattern; if you have a reason, I'd call that out in the comments.

};

Status ModularRandomAccessFile::Read(uint64 offset, size_t n,
                                     StringPiece* result, char* scratch) const {
  TF_Status* plugin_status = TF_NewStatus();
A reviewer commented:
If TF_Status is just a struct, why not put it on the stack?

@mihaimaruseac (Contributor, Author) replied:

I guess it can also go on the stack. Will have to see how it is used elsewhere and also will have to learn modern C++ :)

@mihaimaruseac (Contributor, Author) added:

Actually, it cannot go on the stack as it needs to be private to plugins for ABI compatibility reasons.
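
The usage pattern that follows from keeping `TF_Status` opaque; the four calls below exist today in the TensorFlow C API (`tensorflow/c/tf_status.h`):

```cpp
TF_Status* s = TF_NewStatus();  // only core knows sizeof(TF_Status)
// ... hand `s` across the C boundary to a plugin operation ...
if (TF_GetCode(s) != TF_OK) {
  fprintf(stderr, "plugin call failed: %s\n", TF_Message(s));
}
TF_DeleteStatus(s);
```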

1. **Should we pass plugin structs by value during registration?** Doing so
makes ownership clear and prevents memory allocation bugs. Passing them by
value results in a copy but we need a copy for security reasons anyway. We're
leaning towards **Yes**.
A reviewer commented:
Is this about the structs that just wrap the void*? In that case, compilers will handle them exactly like they would a void* (so no extra copy).

@mihaimaruseac (Contributor, Author) replied:

This is about the structs containing the function pointers.
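
A hypothetical signature for the by-value option under discussion:

```cpp
// Core receives its own copy of the function-pointer table at
// registration, so ownership is unambiguous and the plugin cannot
// mutate core's copy afterwards. (Signature is illustrative only.)
void TF_RegisterFilesystem(const char* scheme, TF_FilesystemOps ops);
```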

@mihaimaruseac (Contributor, Author) commented:

Thanks @r4nt for the extra review and suggestions. Will consider these too while implementing.

@ziliangpeng commented:
Is there a place to track the implementation status of this RFC?

@byronyi (Contributor) commented Jan 21, 2020:

> Is there a place to track the implementation status of this RFC?

https://github.com/tensorflow/tensorflow/search?q=modular+filesystem&type=Commits

@mihaimaruseac (Contributor, Author) commented:

For now all the implementation will reside in tensorflow/c/experimental/filesystem.

POSIX support should be complete. I'm working on Windows now; then I will update this and send an email with instructions on how to test the experimental filesystems instead of the non-modular ones. At that point, we will start converting all other filesystems, and then we will switch TF to use the modular ones (by moving the code to the proper locations).
