DataLoaders 8: docs, guide, etc (#4567)
Some docs for the whole thing.

It heavily relies on linking to the examples for exact details and to
avoid rot.

It's probably extremely bad. Help welcome.


---

Part of a series of PRs to make it possible to load _any_ file from the
local filesystem, by any means, on web and native:
- #4516
- #4517 
- #4518 
- #4519 
- #4520 
- #4521 
- #4565
- #4566
- #4567

Co-authored-by: Nikolaus West <niko@rerun.io>
teh-cmc and nikolausWest authored Dec 22, 2023
1 parent 803a1cd commit d72a4c9
Showing 8 changed files with 219 additions and 126 deletions.
5 changes: 4 additions & 1 deletion crates/re_data_source/src/data_loader/loader_external.rs
@@ -62,11 +62,14 @@ pub fn iter_external_loaders() -> impl ExactSizeIterator<Item = std::path::PathB
// ---

/// A [`crate::DataLoader`] that forwards the path to load to all executables present in
/// the user's `PATH` with a name that starts with `EXTERNAL_DATA_LOADER_PREFIX`.
/// the user's `PATH` with a name that starts with [`EXTERNAL_DATA_LOADER_PREFIX`].
///
/// The external loaders are expected to log rrd data to their standard output.
///
/// Refer to our `external_data_loader` example for more information.
///
/// Check out our [guide](https://www.rerun.io/docs/howto/open-any-file?speculative-link) on
/// how to implement external loaders.
pub struct ExternalLoader;

impl crate::DataLoader for ExternalLoader {
19 changes: 13 additions & 6 deletions crates/re_data_source/src/data_loader/mod.rs
@@ -33,14 +33,21 @@ use re_log_types::{ArrowMsg, DataRow, LogMsg};
///
/// ## Registering custom loaders
///
/// TODO(cmc): web guide in upcoming PR
/// Check out our [guide](https://www.rerun.io/docs/howto/open-any-file?speculative-link).
///
/// ## Execution
///
/// **All** registered [`DataLoader`]s get called when a user tries to open a file, unconditionally.
/// **All** known [`DataLoader`]s get called when a user tries to open a file, unconditionally.
/// This gives [`DataLoader`]s maximum flexibility to decide what files they are interested in, as
/// opposed to e.g. only being able to look at files' extensions.
///
/// If a [`DataLoader`] has no interest in the given file, it should fail as soon as possible
/// with a [`DataLoaderError::Incompatible`] error.
///
/// Iff all [`DataLoader`]s (including custom and external ones) return with a [`DataLoaderError::Incompatible`]
/// error, the Viewer will show an error message to the user indicating that the file type is not
/// supported.
///
/// On native, [`DataLoader`]s are executed in parallel.
///
/// [Rerun files]: crate::SUPPORTED_RERUN_EXTENSIONS
@@ -78,8 +85,8 @@ pub trait DataLoader: Send + Sync {
/// possible (e.g. didn't even manage to open the file).
/// Otherwise, they should log errors that happen in an asynchronous context.
///
/// If a [`DataLoader`] has no interest in the given file, it should successfully return
/// without pushing any data into `tx`.
/// If a [`DataLoader`] has no interest in the given file, it should fail as soon as possible
/// with a [`DataLoaderError::Incompatible`] error.
#[cfg(not(target_arch = "wasm32"))]
fn load_from_path(
&self,
@@ -111,8 +118,8 @@ pub trait DataLoader: Send + Sync {
/// possible (e.g. didn't even manage to open the file).
/// Otherwise, they should log errors that happen in an asynchronous context.
///
/// If a [`DataLoader`] has no interest in the given file, it should successfully return
/// without pushing any data into `tx`.
/// If a [`DataLoader`] has no interest in the given file, it should fail as soon as possible
/// with a [`DataLoaderError::Incompatible`] error.
fn load_from_file_contents(
&self,
store_id: re_log_types::StoreId,
3 changes: 2 additions & 1 deletion docs/content/getting-started/installing-viewer.md
@@ -1,6 +1,6 @@
---
title: Installing the Rerun Viewer
order: 0
order: -1
---

The [Rerun Viewer](../reference/viewer/overview.md) can be installed independent of the SDK language you're using.
@@ -20,6 +20,7 @@ In any case you should be able to run `rerun` afterwards to start the Viewer.
You'll be welcomed by an overview page that allows you to jump into some examples.
If you're facing any difficulties, don't hesitate to [open an issue](https://github.com/rerun-io/rerun/issues/new/choose) or [join the Discord server](https://discord.gg/PXtCgFBSmH).

The Rerun Viewer has built-in support for opening many kinds of files, and can be [extended to open any other file type](../howto/open-any-file.md) without needing to modify the Rerun codebase itself.

To start getting your own data logged & visualized in the viewer check one of the respective getting started guides:
* [Python](python.md)
62 changes: 62 additions & 0 deletions docs/content/howto/open-any-file.md
@@ -0,0 +1,62 @@
---
title: Open any file
order: -10
---

The Rerun Viewer has built-in support for opening many kinds of files, and can be extended to open any other file type without needing to modify the Rerun codebase itself.

The viewer can load files in 3 different ways:
- via CLI arguments (e.g. `rerun myfile.jpeg`),
- using drag-and-drop,
- using the open dialog in the Rerun Viewer.

All these file loading methods support loading a single file, many files at once (e.g. `rerun myfiles/*`), or even folders.

⚠ Drag-and-drop of folders does [not yet work](https://github.com/rerun-io/rerun/issues/4528) on the web version of the Rerun Viewer ⚠

The following file types have built-in support in the Rerun Viewer:
- Native Rerun files: `rrd`.
- 3D models: `gltf`, `glb`, `obj`.
- Images: `avif`, `bmp`, `dds`, `exr`, `farbfeld`, `ff`, `gif`, `hdr`, `ico`, `jpeg`, `jpg`, `pam`, `pbm`, `pgm`, `png`, `ppm`, `tga`, `tif`, `tiff`, `webp`.
- Point clouds: `ply`.
- Text files: `md`, `txt`.

With the exception of `rrd` files that can be streamed from an HTTP URL (e.g. `rerun https://demo.rerun.io/version/latest/examples/dna/data.rrd`), we only support loading files from the local filesystem for now, with [plans to make this generic over any URI and protocol in the future](https://github.com/rerun-io/rerun/issues/4525).

## Adding support for arbitrary filetypes

Internally, the [`DataLoader`](https://docs.rs/re_data_source/latest/re_data_source/trait.DataLoader.html?speculative-link) trait takes care of loading files into the Viewer.

There are 3 broad kinds of `DataLoader`s: _builtin_, _external_ and _custom_.
_External_ and _custom_ are the two ways of extending the file loading system that we'll describe below.

When a user attempts to open a file in the Viewer, **all** known `DataLoader`s are notified of the path to be opened, unconditionally.
This gives `DataLoader`s maximum flexibility to decide what files they are interested in, as opposed to e.g. only being able to look at a file's extension.

Once notified, a `DataLoader` can return a [`DataLoaderError::Incompatible`](https://docs.rs/re_data_source/latest/re_data_source/enum.DataLoaderError.html?speculative-link#variant.Incompatible) error to indicate that it doesn't support a given file type.
If, and only if, all loaders known to the Viewer return an `Incompatible` error code, then an error message is shown to the user indicating that this file type is not (_yet_) supported.
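
The dispatch rule described above can be sketched roughly as follows. This is a minimal illustration only: the `Loader` trait, the `LoadError` enum, `ExtensionLoader` and `is_unsupported` are hypothetical, simplified stand-ins, not the actual `DataLoader` API, and the real Viewer may run loaders in parallel.

```rust
use std::path::Path;

/// Hypothetical, simplified stand-in for `DataLoaderError`.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum LoadError {
    /// The loader has no interest in this file.
    Incompatible,
    /// The loader wanted the file but failed while loading it.
    Other(String),
}

/// Hypothetical, simplified stand-in for the `DataLoader` trait.
trait Loader {
    fn load(&self, path: &Path) -> Result<(), LoadError>;
}

/// Offers `path` to every loader, unconditionally.
/// Returns `true` only if every loader reported `Incompatible`, which is
/// the case where the user would be told the file type is not supported.
fn is_unsupported(loaders: &[Box<dyn Loader>], path: &Path) -> bool {
    // Collect first so that every loader is called, with no early exit.
    let results: Vec<Result<(), LoadError>> =
        loaders.iter().map(|loader| loader.load(path)).collect();
    results
        .iter()
        .all(|result| *result == Err(LoadError::Incompatible))
}

/// Example loader that only accepts files with one specific extension.
struct ExtensionLoader {
    extension: &'static str,
}

impl Loader for ExtensionLoader {
    fn load(&self, path: &Path) -> Result<(), LoadError> {
        match path.extension().and_then(|ext| ext.to_str()) {
            Some(ext) if ext == self.extension => Ok(()),
            // Fail as early as possible for files we do not care about.
            _ => Err(LoadError::Incompatible),
        }
    }
}
```

Note that a loader failing with any error other than `Incompatible` does not count towards the "unsupported file type" case in this sketch: that loader did claim the file, it just could not load it.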

For such unsupported file types, there are two ways to implement and register your own `DataLoader`, both explained below.

### External data-loaders

The easiest way to create your own `DataLoader` is by implementing what we call an "external loader": a standalone executable written in any language that the Rerun SDK ships for. Any executable on your `$PATH` with a name that starts with `rerun-loader-` will be treated as a `DataLoader`.

This executable takes a file path as a command-line argument and outputs Rerun logs on `stdout`.
It will be called by the Rerun Viewer when the user opens a file, and be passed the path to that file.
From there, it can log data as usual, using the [`stdout` logging sink](../reference/sdk-operating-modes?speculative-link#standard-inputoutput).

The Rerun Viewer will then automatically load the data streamed to the external loader's standard output.
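
As a rough illustration of the discovery rule (any executable on `$PATH` whose name starts with `rerun-loader-`), here is a minimal sketch using only the standard library. The helper names `has_loader_prefix` and `find_external_loaders` are hypothetical, and the sketch does not check the executable permission bit, so it is not necessarily how the Viewer does it.

```rust
use std::ffi::OsStr;
use std::path::PathBuf;

/// Name prefix that marks an executable as an external loader.
const LOADER_PREFIX: &str = "rerun-loader-";

/// Returns `true` if a file name carries the external-loader prefix.
fn has_loader_prefix(file_name: &OsStr) -> bool {
    file_name
        .to_str()
        .map_or(false, |name| name.starts_with(LOADER_PREFIX))
}

/// Lists the files found in the directories of a `PATH`-style variable
/// whose names start with the loader prefix.
/// Directories that cannot be read are silently skipped.
fn find_external_loaders(path_var: &OsStr) -> Vec<PathBuf> {
    let mut found = Vec::new();
    for dir in std::env::split_paths(path_var) {
        if let Ok(entries) = std::fs::read_dir(&dir) {
            for entry in entries.flatten() {
                let path = entry.path();
                if path.is_file() && has_loader_prefix(&entry.file_name()) {
                    found.push(path);
                }
            }
        }
    }
    found
}
```

It could be called as `find_external_loaders(&std::env::var_os("PATH").unwrap_or_default())`.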

Like any other `DataLoader`, an external loader will be notified of all file openings, unconditionally.
To indicate that it does not support a given file, the loader has to exit with a [dedicated status code](https://docs.rs/rerun/latest/rerun/constant.EXTERNAL_DATA_LOADER_INCOMPATIBLE_EXIT_CODE.html?speculative-link).
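
The exit-status convention might look like the sketch below. The `.xyz` extension, the `exit_code_for` helper and the numeric value of `INCOMPATIBLE_EXIT_CODE` are assumptions made for illustration; a real loader should use the constant exported by the SDK (linked above) and log actual data to `stdout`.

```rust
use std::path::Path;

/// Placeholder for the dedicated "incompatible" exit status.
/// ASSUMPTION: this value is illustrative; prefer the SDK's
/// `EXTERNAL_DATA_LOADER_INCOMPATIBLE_EXIT_CODE` constant in real code.
const INCOMPATIBLE_EXIT_CODE: i32 = 66;

/// Hypothetical file extension handled by this loader.
const SUPPORTED_EXTENSION: &str = "xyz";

/// Decides the exit status for the file the Viewer asked us to open.
fn exit_code_for(path: &str) -> i32 {
    let extension = Path::new(path).extension().and_then(|ext| ext.to_str());
    if extension != Some(SUPPORTED_EXTENSION) {
        // Not our file type: tell the Viewer right away.
        return INCOMPATIBLE_EXIT_CODE;
    }
    // A real loader would read the file here and log its contents
    // to standard output through the Rerun SDK.
    0
}
```

The loader's `main` would then pass the received path to `exit_code_for` and hand the result to `std::process::exit`.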

Check out our examples for [C++](https://github.com/rerun-io/rerun/tree/main/examples/cpp/external_data_loader), [Python](https://github.com/rerun-io/rerun/tree/main/examples/python/external_data_loader) and [Rust](https://github.com/rerun-io/rerun/tree/main/examples/rust/external_data_loader) that cover every step in detail.

### Custom data-loaders

Another approach, specific to Rust, is to implement the `DataLoader` trait yourself and register it in the Rerun Viewer.

To do so, you'll need to import `rerun` as a library, register your `DataLoader` and then start the viewer from code.

Check out our [example](https://github.com/rerun-io/rerun/tree/main/examples/rust/custom_data_loader) that covers all these steps in detail.