-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace message Tree with a topologically sorted varint delimited stream of Directory messages #229
Comments
It looks like I've been able to solve (3) and (4) on the Buildbarn side without making any protocol changes, as it turns out that Protobuf's wire format has some interesting properties:
So if you look at an REv2 Tree's binary representation, it's typically little more than this:
Basically the same as what I proposed above, except for the presence of field tags and a lack of topological sorting. Anyway, I've successfully managed to patch up Buildbarn to deal with Tree objects in a streaming manner as follows:
With these changes in place, I'm capable of efficiently working with Tree objects that are hundreds of megabytes in size. I'm still going to leave this issue open, as issues (1) and (2) remain. Furthermore, I'm not too happy about maintaining my own Protobuf message parser. My solution would break as soon as someone adds a simple integer field to the Tree message, as my parser is only capable of processing 'bytes' fields. |
I'm currently trying to improve the performance of handling of large output directories (Tree messages), having sizes in the order of hundreds of megabytes. In the process, I have realised that there is a lot of value in enforcing that the Directory messages contained in them are topologically sorted. Two practical use cases: - When instantiating the contents of a Tree on a local file system, having the Tree be topologically sorted allows you to immediately create files and directories in the right place. - When needing to resolve the properties of a single file by path, a topologically sorted Tree permits resolution by doing a simple forward scan. Especially when other features like compression are taken into account, it's useful if Tree messages can be processed in a streaming manner. One practical issue is that most Protobuf libraries don't offer APIs for processing messages in a streaming manner. This means that implementors who want to achive these optimisations will need to write their own message parsers; at least for the Tree tree itself. To make this as painless as possible, we also require that the Tree is stored in some normal form. Fixes: bazelbuild#229
I'm currently trying to improve the performance of handling of large output directories (Tree messages), having sizes in the order of hundreds of megabytes. In the process, I have realised that there is a lot of value in enforcing that the Directory messages contained in them are topologically sorted. Two practical use cases: - When instantiating the contents of a Tree on a local file system, having the Tree be topologically sorted allows you to immediately create files and directories in the right place. - When needing to resolve the properties of a single file by path, a topologically sorted Tree permits resolution by doing a simple forward scan. Especially when other features like compression are taken into account, it's useful if Tree messages can be processed in a streaming manner. One practical issue is that most Protobuf libraries don't offer APIs for processing messages in a streaming manner. This means that implementors who want to achive these optimisations will need to write their own message parsers; at least for the Tree tree itself. To make this as painless as possible, we also require that the Tree is stored in some normal form. Fixes: bazelbuild#229
I'm currently trying to improve the performance of handling of large output directories (Tree messages), having sizes in the order of hundreds of megabytes. In the process, I have realised that there is a lot of value in enforcing that the Directory messages contained in them are topologically sorted. Two practical use cases: - When instantiating the contents of a Tree on a local file system, having the Tree be topologically sorted allows you to immediately create files and directories in the right place. - When needing to resolve the properties of a single file by path, a topologically sorted Tree permits resolution by doing a simple forward scan. Especially when other features like compression are taken into account, it's useful if Tree messages can be processed in a streaming manner. One practical issue is that most Protobuf libraries don't offer APIs for processing messages in a streaming manner. This means that implementors who want to achive these optimisations will need to write their own message parsers; at least for the Tree tree itself. To make this as painless as possible, we also require that the Tree is stored in some normal form. Fixes: bazelbuild#229
I'm currently trying to improve the performance of handling of large output directories (Tree messages), having sizes in the order of hundreds of megabytes. In the process, I have realised that there is a lot of value in enforcing that the Directory messages contained in them are topologically sorted. Two practical use cases: - When instantiating the contents of a Tree on a local file system, having the Tree be topologically sorted allows you to immediately create files and directories in the right place. - When needing to resolve the properties of a single file by path, a topologically sorted Tree permits resolution by doing a simple forward scan. Especially when other features like compression are taken into account, it's useful if Tree messages can be processed in a streaming manner. One practical issue is that most Protobuf libraries don't offer APIs for processing messages in a streaming manner. This means that implementors who want to achive these optimisations will need to write their own message parsers; at least for the Tree tree itself. To make this as painless as possible, we also require that the Tree is stored in some normal form. Fixes: bazelbuild#229
I'm currently trying to improve the performance of handling of large output directories (Tree messages), having sizes in the order of hundreds of megabytes. In the process, I have realised that there is a lot of value in enforcing that the Directory messages contained in them are topologically sorted. Two practical use cases: - When instantiating the contents of a Tree on a local file system, having the Tree be topologically sorted allows you to immediately create files and directories in the right place. - When needing to resolve the properties of a single file by path, a topologically sorted Tree permits resolution by doing a simple forward scan. Especially when other features like compression are taken into account, it's useful if Tree messages can be processed in a streaming manner. One practical issue is that most Protobuf libraries don't offer APIs for processing messages in a streaming manner. This means that implementors who want to achive these optimisations will need to write their own message parsers; at least for the Tree tree itself. To make this as painless as possible, we also require that the Tree is stored in some normal form. Fixes: bazelbuild#229
I'm currently trying to improve the performance of handling of large output directories (Tree messages), having sizes in the order of hundreds of megabytes. In the process, I have realised that there is a lot of value in enforcing that the Directory messages contained in them are topologically sorted. Two practical use cases: - When instantiating the contents of a Tree on a local file system, having the Tree be topologically sorted allows you to immediately create files and directories in the right place. - When needing to resolve the properties of a single file by path, a topologically sorted Tree permits resolution by doing a simple forward scan. Especially when other features like compression are taken into account, it's useful if Tree messages can be processed in a streaming manner. One practical issue is that most Protobuf libraries don't offer APIs for processing messages in a streaming manner. This means that implementors who want to achive these optimisations will need to write their own message parsers; at least for the Tree tree itself. To make this as painless as possible, we also require that the Tree is stored in some normal form. Fixes: bazelbuild#229
I'm currently trying to improve the performance of handling of large output directories (Tree messages), having sizes in the order of hundreds of megabytes. In the process, I have realised that there is a lot of value in enforcing that the Directory messages contained in them are topologically sorted. Two practical use cases: - When instantiating the contents of a Tree on a local file system, having the Tree be topologically sorted allows you to immediately create files and directories in the right place. - When needing to resolve the properties of a single file by path, a topologically sorted Tree permits resolution by doing a simple forward scan. Especially when other features like compression are taken into account, it's useful if Tree messages can be processed in a streaming manner. One practical issue is that most Protobuf libraries don't offer APIs for processing messages in a streaming manner. This means that implementors who want to achive these optimisations will need to write their own message parsers; at least for the Tree tree itself. To make this as painless as possible, we also require that the Tree is stored in some normal form. Fixes: bazelbuild#229
I'm currently trying to improve the performance of handling of large output directories (Tree messages), having sizes in the order of hundreds of megabytes. In the process, I have realised that there is a lot of value in enforcing that the Directory messages contained in them are topologically sorted. Two practical use cases: - When instantiating the contents of a Tree on a local file system, having the Tree be topologically sorted allows you to immediately create files and directories in the right place. - When needing to resolve the properties of a single file by path, a topologically sorted Tree permits resolution by doing a simple forward scan. Especially when other features like compression are taken into account, it's useful if Tree messages can be processed in a streaming manner. One practical issue is that most Protobuf libraries don't offer APIs for processing messages in a streaming manner. This means that implementors who want to achive these optimisations will need to write their own message parsers; at least for the Tree tree itself. To make this as painless as possible, we also require that the Tree is stored in some normal form. Fixes: bazelbuild#229
remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230
* Regenerate the Go source code for the Remote Execution protocol * Add a hint for indicating that a Tree is topologically sorted I'm currently trying to improve the performance of handling of large output directories (Tree messages), having sizes in the order of hundreds of megabytes. In the process, I have realised that there is a lot of value in enforcing that the Directory messages contained in them are topologically sorted. Two practical use cases: - When instantiating the contents of a Tree on a local file system, having the Tree be topologically sorted allows you to immediately create files and directories in the right place. - When needing to resolve the properties of a single file by path, a topologically sorted Tree permits resolution by doing a simple forward scan. Especially when other features like compression are taken into account, it's useful if Tree messages can be processed in a streaming manner. One practical issue is that most Protobuf libraries don't offer APIs for processing messages in a streaming manner. This means that implementors who want to achive these optimisations will need to write their own message parsers; at least for the Tree tree itself. To make this as painless as possible, we also require that the Tree is stored in some normal form. Fixes: #229
remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230
remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230
remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230 Partial commit for third_party/*, see #16463. Signed-off-by: Sunil Gowroji <sgowroji@google.com>
remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230 Closes #16463. PiperOrigin-RevId: 487196375 Change-Id: Iafcfd617fc101fec7bfa943552113ce57ab8041b
remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230 Closes #16463. PiperOrigin-RevId: 487196375 Change-Id: Iafcfd617fc101fec7bfa943552113ce57ab8041b
remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230 Partial commit for third_party/*, see #16463. Signed-off-by: Sunil Gowroji <sgowroji@google.com>
* Emit Tree objects in topological order remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230 Closes #16463. PiperOrigin-RevId: 487196375 Change-Id: Iafcfd617fc101fec7bfa943552113ce57ab8041b * Emit Tree objects in topological order remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230 Partial commit for third_party/*, see #16463. Signed-off-by: Sunil Gowroji <sgowroji@google.com> Signed-off-by: Sunil Gowroji <sgowroji@google.com> Co-authored-by: Ed Schouten <eschouten@apple.com>
I know that on numerous occasions we've had discussions within the working group on whether the Tree vs. Directory dichotomy is truly necessary, and whether there are ways to eliminate it. Let me make it clear up front that this issue is filed under the assumption that we do want to keep these separate like we have today, which is why I'm not posting this under #159 or #170. It's just low hanging fruit.
The last couple of days I've worked on optimising handling of large Tree objects in bb-clientd. Whereas some of the other Protobuf messages can sometimes become hundreds of kilobytes in size (e.g., Command), it's not impossible for Trees to become dozens of megabytes. While working on optimising this area, I realised that my life would have been a lot easier if Trees were stored as a varint delimited stream of Directory messages (just like Bazel
--build_event_binary_file
). My suggestion would be to make it topologically sorted (parents before children), so that the root of the Tree comes first. Some advantages:The text was updated successfully, but these errors were encountered: