Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mid-Level File API Proposal #50

Closed
rylev opened this issue Mar 25, 2019 · 21 comments
Closed

Mid-Level File API Proposal #50

rylev opened this issue Mar 25, 2019 · 21 comments

Comments

@rylev
Copy link
Collaborator

rylev commented Mar 25, 2019

Addresses #47 in part by proposing a mid-level API. The possibility for a higher level API is left open.

File API

The mid-level file API aims to implement File I/O on top of the raw JS apis found here: https://w3c.github.io/FileAPI/ which includes listing files, creating files, and reading files.

General Differences

In general this API will mostly differ from the web_sys interface in the following ways:

No js_sys Types

js_sys tends to be fairly "stringly" typed, so where we have known enums, the crate will provide bespoke types.

This is also true for errors which are usually exposed as JsValues. In this crate they will be exposed as well-formed Rust types.

Callbacks

All callbacks will be of the appropriate closure type (e.g., FnOnce).

The API

FileList

JS provides a basic collection type over files (usually from an tag). In general FileList will be an opaque struct that offers a rich interface to the collection through an Iterator implementation.

Blob

Blob is a "file like" interface. The crate would expose this as a Rust interface from which the specific types File and RawData inherit. These types would be roughly 1 to 1 translations of the File and Blob objects just with less "stringly" typed interfaces. For example, RawData would accept a known list of mime-types

FileReader

The FileReader API is a bit wonky and can easily be used in the incorrect way. For instance, if the user calls readAsArrayBuffer or other "read" APIs before the data is loaded, the operation won't return data. The crate will expose an API that more properly models the "read" state machine.

This will be done through a futures API which takes the user through the state machine:

let file_reader = FileReader::new();
file_reader.read_as_text(&blob).map(move |contents| {
  element.set_inner_html(contents);
})

The various "read" methods will take ownership of self so that users can't try to read multiple files at once. These methods return an InProgressRead struct that is a future and also contains an abort method for aborting the upload.

The events "error", "load", and "loadend" are all representable through the future and thus aren't exposed on the file reader. The events "progress", "loadstart", and "abort" are still exposed as methods that take the appropriate Rust closure.

FileReaderSync

This API would mirror the future based API just in a different namespace.

Drawbacks, Rationale, and Alternatives

FileReader exposes a very different API from its JS counterpart. We could do a more 1 to 1 translation of the API and leave the improved API for a higher-level crate. This would make this crate only a very marginal improvement on what is already exposed in js_sys.

Unresolved Questions

Is there an ergonomic way to expose FileReader "progress", "loadstart", and "abort" events as part of the future API?
Is there a better way to model the relationship between Blob and File? Possibilities include an enum and a File type which contains a Blob type.

@fitzgen
Copy link
Member

fitzgen commented Mar 26, 2019

So FileList will be something like this?

pub struct FileList { ... }

impl FileList {
    // Returns `None` if `input` does not have `type="file"`?
    pub fn new(input: &web_sys::InputElement) -> Option<Self> { ... }
}

impl Iterator for FileList {
    type Item = gloo_file::Blob;
    fn next(&mut self) -> Option<Self::Item> { ... }
}

This makes sense to me.

Does it make sense to also give it a method to get the ith file? I think so, but it is also an addition that can be made in a semver compatible way.


Blob is a "file like" interface. The crate would expose this as a Rust interface from which the specific types File and RawData inherit. These types would be roughly 1 to 1 translations of the File and Blob objects just with less "stringly" typed interfaces. For example, RawData would accept a known list of mime-types.

So gloo_file::File corresponds to the Web's File, and gloo_file::RawData corresponds to the Web's Blob?

Can we re-use the mime crate instead of defining our own enum of known mime types? My one concern would be code size footprint.

I'm not super familiar with the file API, so: as far as the file API is concerned (ignoring other API's potential Blob usage), is there a meaningful difference between File and Blob? Does it make sense to have a single type for both, some whose accessors return an Option, and if it is backed by a Blob and not a file, return None?


FileReader

...

This will be done through a futures API which takes the user through the state machine:

It probably makes sense for us to manually define the state machine future here, but the state_machine_future crate is worth taking a look at (specifically the shape of the code it generates for you) for inspiration on how to define nice state machines in Rust that leverage the type system: https://github.com/fitzgen/state_machine_future

The events "error", "load", and "loadend" are all representable through the future and thus aren't exposed on the file reader. The events "progress", "loadstart", and "abort" are still exposed as methods that take the appropriate Rust closure.

At this level of abstraction API (sort of mid-to-high), I would expect that the progress events are exposed as a stream (or eventually a Signal<PercentProgress>).


FileReaderSync

Can we maybe punt on the sync file reader until after we get the normal async FileReader API designed and implemented? It seems like something we can add post facto in a semver compatible way, is easier to design than the async version (follows the "solve hard problems first" principle), and I think (correct me if I'm wrong!) it is typically used less often.


My biggest question, coming away from reading this proposal, is whether it makes sense to jump directly to a state machine futures API, or whether we should cut out a mid-level API between web-sys and the state machine futures. It isn't totally clear to me, and I suspect you might have more insight here, given that you've started prototyping this design. How much of the code is dealing with the state machine stuff, and how much is wrangling -sys types? If it is almost all state machine stuff, then maybe it doesn't make sense to try and spin out a mid-level API. But if there is significant -sys type wrangling and clean up going on, then perhaps we should build the state machine on top of a mid-level API.

@rylev
Copy link
Collaborator Author

rylev commented Mar 26, 2019

@fitzgen awesome feedback thanks. The web-sys wrangling for everything but the FileReader api is pretty minimal. Using closures directly with web-sys is fairly painful and so it might be worth it to have a more 1 to 1 API at the mid level (though the API is extremely non-rustic).

After writing code today I'm starting to think we should have that 1 to 1 API, but I would hope most people wouldn't use it.

@rylev
Copy link
Collaborator Author

rylev commented Mar 26, 2019

Can we re-use the mime crate instead of defining our own enum of known mime types? My one concern would be code size footprint.

Yes, the code as it currently stands uses the mime crate.

is there a meaningful difference between File and Blob? Does it make sense to have a single type for both, some whose accessors return an Option, and if it is backed by a Blob and not a file, return None?

So, there's not a huge difference. Files have additional metadata like a name, but that's about it (at least as far as I'm aware of). We could bring them together in one type, but I'm not sure what to call it. Calling it a File is wrong because some things are just blobs of data. Calling everything a Blob feels weird because files when you're actually dealing with files, it's nice to call them that.

for inspiration on how to define nice state machines in Rust that leverage the type system: https://github.com/fitzgen/state_machine_future

Cool crate! I will take a look

I would expect that the progress events are exposed as a stream (or eventually a Signal).

We should definitely have this. The only real question is will there be two crates (mid and high level crates) or just one high level crate?

Can we maybe punt on the sync file reader until after we get the normal async FileReader API designed and implemented?

Yep. The only interesting question here is can we enforce at compile time that these APIs only get used in web workers?

@Pauan
Copy link
Contributor

Pauan commented Mar 27, 2019

We should definitely have this. The only real question is will there be two crates (mid and high level crates) or just one high level crate?

We've more-or-less decided that we should have one crate, with submodules: callbacks for the mid-level callback API, and futures for the high-level Futures API.

@fitzgen
Copy link
Member

fitzgen commented Mar 27, 2019

After going through the draft/WIP PR I don't think it makes sense to have different submodules for the whole API of callbacks and futures in this case, since it is only a few methods on the FileReader that would change.

Instead, I think we should just separate the read results into the two different modules, and have FileReader in the root module:

pub mod callbacks {
    // Thin wrapper around `EventListener` to enable cancellation of reads via `drop`.
    pub struct ReadAsString {
        listener: gloo_events::EventListener,
    }
    impl ReadAsString {
        pub fn forget(self) {
            self.listener.forget();
        }
    }
}

#[cfg(feature = "futures")]
pub mod futures {
    // Thin futures-based wrapper over the event listener and oneshot receiver.
    pub struct ReadAsString {
        receiver: oneshot::Receiver<String>,
        listener: gloo_events::EventListener,
    }
    impl Future for ReadAsString {
        ...
    }
}

impl FileReader {
    pub fn with_read_as_string<F>(f: F) -> self::callbacks::ReadAsString
    where
        F: FnOnce(String)
    { ... }

    #[cfg(feature = "futures")]
    pub fn read_as_string() -> self::futures::ReadAsString { ... }
}

Thoughts?

@rylev
Copy link
Collaborator Author

rylev commented Mar 27, 2019

It's true that it's only a few methods for the various reads, but how we handle more fine grained control (e.g., loadprogress, loadstart, etc)? Do we only want to expose these as a stream? What are the functions that return this stream called?

@rylev
Copy link
Collaborator Author

rylev commented Apr 2, 2019

@fitzgen @Pauan This proposal predates the proposal flow from #65 and as such there is already a pull request with the majority of the implementation done (#51). Do we need to modify this proposal before accepting it? I think there might be a few elements that were first agreed upon in the implementation PR and are not reflected here.

@Pauan
Copy link
Contributor

Pauan commented Apr 3, 2019

At the very least, I think it would be good to post a sketch in here of the public API (just the types, excluding the implementation).

@rylev
Copy link
Collaborator Author

rylev commented Apr 5, 2019

Ok here's how the proposed API will look like:

First we have FileList which is simply a list of Files. This is an opaque struct that offers a way to iterate over Files and get individual files by index:

struct FileList { ... }

impl FileList {
  fn from_raw_input(input: &web_sys::HtmlInputElement) -> Option<Self>  { }

  fn from_raw(inner: web_sys::FileList) -> Self { }

  fn get(index: usize) -> Option<File> { }

  fn len(&self) -> usize { }

  fn iter(&self) -> FileListIter { }

  fn into_iter(self) -> FileListIntoIter { }

  fn to_vec(&self) -> Vec<File> { }
}

There is no FileList::new since creating a raw web_sys::FileList is not possible without going through a web_sys::HtmlInputElement.

Next we have a blob module that includes the trait BlobLike

trait BlobLike {
    fn size(&self) -> usize { }

    #[cfg(feature = "mime")]
    fn mime_type(&self) -> Result<mime::Mime, mime::FromStrError> { }

    fn raw_mime_type(&self) -> String { }

    fn as_raw(&self) -> &web_sys::Blob;
}

There are two structs that implement this trait: Blob and File.

#[derive(Debug, Clone)]
struct Blob { ... }

impl Blob {
  fn new<T>(content: Option<T>, mime_type: Option<String>) -> Blob
    where
        T: std::convert::Into<BlobContents> // We'll look at BlobContents below
{ ... }

  fn from_raw(inner: web_sys::Blob) -> Blob  { }

  fn slice(&self, start: usize, end: usize) -> Blob { }
}

#[derive(Debug, Clone)]
pub struct File { ... }

impl File {
  fn new<T>(
        name: String,
        contents: Option<T>,
        mime_type: Option<String>,
        last_modified_date: Option<u64>,
    ) -> File
    where
        T: std::convert::Into<BlobContents>,
    { ... }

  fn from_raw(inner: web_sys::File) -> File { }

  fn name(&self) -> String { }

  fn last_modified_date(&self) -> u64 { }

  fn size(&self) -> u64 { }

  fn slice(&self, start: usize, end: usize) -> File { }
  
  fn as_blob(self) -> Blob { }
}

Both Blob and File come with builders that allow for creating new Blobs and Files in the builder style and not having to use new directly.

BlobContents is simply a new-type around wasm_bindgen::JsValues that can be used as the content of Blobs and Files:

#[derive(Debug, Clone)]
pub struct BlobContents {
    inner: wasm_bindgen::JsValue,
}

There are there conversions from types into BlobContents only for the types that make sense:

impl std::convert::Into<BlobContents> for &str
impl std::convert::Into<BlobContents> for &[u8] 
impl std::convert::Into<BlobContents> for Blob 
impl std::convert::Into<BlobContents> for js_sys::ArrayBuffer

Lastly there's the FileReader which allows reading from BlobLike objects:

#[derive(Clone, Debug)]
pub struct FileReader { }

impl FileReader {
  fn new() -> FileReader { }

  fn read_as_string(self, blob: &impl BlobLike) -> ReadAs<String> { }

  fn read_as_data_url(self, blob: &impl BlobLike) -> ReadAs<String> { }

  fn read_as_array_buffer(self, blob: &impl BlobLike) -> ReadAs<js_sys::ArrayBuffer>

  fn on_abort<F>(&mut self, callback: F) 
    where F: FnMut(AbortEvent) + 'static { }

  fn on_progress<F>(&mut self, callback: F) 
    where F: FnMut(ProgressEvent) + 'static { }

  fn on_load_start<F>(&mut self, callback: F) 
    where F: FnMut(LoadStartEvent) + 'static { }

  fn on_load_end<F>(&mut self, callback: F) 
    where F: FnMut(LoadEndEvent) + 'static { }
}

pub struct ReadAs<T> { ... }
impl<T> Future for ReadAs<T> {
    type Item = T;
    type Error = FileReaderError;
    ... 
}

// Make sure that dropping the Future properly aborts the reading
impl<T> std::ops::Drop for ReadAs<T> {
    fn drop(&mut self) {
        if self.inner.ready_state() < 2 {
            self.inner.abort();
        }
    }
}

We do not expose explicit error and load callbacks since those are exposed through the futures interface.

Questions:

Should we expose error and load as callbacks and have a separate feature for futures? An issue with this is what should the behavior be if a user sets a load callback and then uses the futures backed API? There can only be one callback called on load.

Should we include a Stream backed state machine API for all the different states a file reader can be in? This also has the question of what happens if the user sets explicit callbacks and uses the stream API?

Does the naming all make sense? FileReaders are used from reading from BlobLike things. I kept the naming similar to how they are in the raw bindings, but that might not make sense.

@Pauan @fitzgen I'm going to stop any work on #51 until this proposal is accepted since that's the newly agreed upon process. Let me know what you think!

@fitzgen
Copy link
Member

fitzgen commented Apr 5, 2019

Overall this looks really great and I think we are very close :)

FileList

Looks great!

I'd also add:

impl FileList {
    fn as_raw(&self) -> &web_sys::FileList { ... }
}

impl AsRef<web_sys::FileList> for FileList { ... }

FileReader

I don't think we should derive Clone since closures aren't really clone-able.

I think starting with read_as_string and read_as_array_buffer is great; all
other functionality (eg read_as_u8_vec or something that does the copy for
you) can be built on top of that and added in semver compatible ways. And we
probably want to add that functionality almost immediately (or even in the
initial impl; up to you if you'd like to do a little more design now or delay
until later).

However, I think we need to be slightly careful about the ReadAs<T> future,
to make sure that we know it will allow ReadAs<Vec<u8>> and ReadAs<Vec<u32>>
etc. We could do that with some trait stuff, but it would have to be a public
trait since ReadAs<T> is publicly visible. But now I'm wondering: is it worth
it / what are we getting from this? What if we just had different types?
ReadAsString, ReadAsArrayBuffer, ReadAsVecU8, etc. I think this would be
simpler.

  fn on_abort<F>(&mut self, callback: F)
    where F: FnMut(AbortEvent) + 'static { }

  // ...

  fn on_load_start<F>(&mut self, callback: F)
    where F: FnMut(LoadStartEvent) + 'static { }

  fn on_load_end<F>(&mut self, callback: F)
    where F: FnMut(LoadEndEvent) + 'static { }

All of these on_* callbacks other than prgoress can be FnOnce, right?


Should we expose error and load as callbacks and have a separate feature for
futures? An issue with this is what should the behavior be if a user sets a
load callback and then uses the futures backed API? There can only be one
callback called on load.

If we were to expose a callbacks interface, I think we would want to have
distinct callbacks::FileReader and futures::FileReader, which would resolve
this issue.

Also we technically can avoid the "only one callback" issue if we use
addEventListener instead of the on_* properties, and store a set of
Event Listeners inside the FileReader to keep them alive (or force users to
manage their lifetimes). I don't see a path to a super great API here though.

Should we include a Stream backed state machine API for all the different
states a file reader can be in? This also has the question of what happens if
the user sets explicit callbacks and uses the stream API?

This is sort of the same question as above, and I think these issues are arising
due to a mixing of callbacks and futures in one layer. Earlier we concluded that
it didn't make sense to have separate callbacks and futures layers for files,
but these questions are making me strongly reconsider that.

What do you think of this:

mod callbacks {
    #[derive(Clone, Debug)]
    pub struct FileReader { }

    impl FileReader {
      fn new() -> FileReader { }

      fn read_as_string<F>(self, blob: &impl BlobLike, callback: F)
        where F: FnOnce(String);

      fn read_as_data_url<F>(self, blob: &impl BlobLike, callback: F)
        where F: FnOnce(String);

      fn read_as_array_buffer<F>(self, blob: &impl BlobLike, callback: F)
        where F: FnOnce(&web_sys::ArrayBuffer);

      fn on_abort<F>(&mut self, callback: F)
        where F: FnMut(AbortEvent) + 'static { }

      fn on_progress<F>(&mut self, callback: F)
        where F: FnMut(ProgressEvent) + 'static { }

      fn on_load_start<F>(&mut self, callback: F)
        where F: FnMut(LoadStartEvent) + 'static { }

      fn on_load_end<F>(&mut self, callback: F)
        where F: FnMut(LoadEndEvent) + 'static { }
    }
}

mod futures {
    #[derive(Clone, Debug)]
    pub struct FileReader { }

    impl FileReader {
      fn new() -> FileReader { }

      fn read_as_string(self, blob: &impl BlobLike) -> ReadAs<String> { }

      fn read_as_data_url(self, blob: &impl BlobLike) -> ReadAs<String> { }

      fn read_as_array_buffer(self, blob: &impl BlobLike) -> ReadAs<js_sys::ArrayBuffer> { }

      // Either expose a stream of `start, progress*, (end | abort)` enum...
      fn lifecycle(&self) -> LifecycleStream { }

      // ... or futures and streams for each of start, progress, end, and abort:
      fn on_abort(&self) -> OnAbortFuture { }
      fn on_end(&self) -> OnEndFuture { }
      fn on_progress(&self) -> OnProgressStream { }

      // Or both?
    }
}

While the API surface is doubled, the implementation of the futures version
should be very straightforward to build on top of the callbacks version, so the
effort of implementation and maintenance should not be doubled.

I think this resolves all of the dissonance between callbacks and futures in the
same API, as well.

Does the naming all make sense? FileReaders are used from reading from
BlobLike things. I kept the naming similar to how they are in the raw
bindings, but that might not make sense.

I must admit I am not in love with BlobLike and BlobContents but I also have
zero better suggestions :)

@rylev
Copy link
Collaborator Author

rylev commented Apr 8, 2019

@fitzgen +1 points on FileList and FileReader - I can update the proposal with that. Having a generic ReadAs type was for implementation convenience but you're right it might push us into a corner we can't get out of. I'll seperate them.

I think the separation of FileReader into two implementations (futures based and callbacks based), gets us the "correctness" we want while still providing flexibility. Let's go down that route.

Either expose a stream of start, progress*, (end | abort) enum... ... or futures and streams for each of start, progress, end, and abort

This I'm on sure of. I think having both would be good, but perhaps we want to stick with just one for now. I need to think this over a bit.

I must admit I am not in love with BlobLike and BlobContents but I also have zero better suggestions :)

Me neither :-( MDN refers to BlobLike as just Blob since it's not problem for them to refer to interfaces and concrete options by the same name. We don't have that luxury. BlobContents is referred to by many different names none of which are very good: BlobParts, Bits, Array. We could just change it to Contents since it will be inside the blob module... Thoughts?

@rylev
Copy link
Collaborator Author

rylev commented Apr 8, 2019

One open question is if the read_as family of methods should take ownership self. The original reason for doing this was to properly enforce the order the API should be used in. If you want to set various callbacks (e.g., progress), you need to do this before kicking off the read.

However, as it currently stands, this means that the FileReader can only be used to read once even though this is not the case normally. The current proposal for XMLHttpRequest (#37) takes the opposite view and allows reuse of the the object. This opens up for the possibility of using the API wrong (e.g., accidentally initializing a request before setting the appropriate callbacks), but it also allows reuse.

Since Stream and Future based APIs remove a lot of these footguns, I think it makes sense for the callback APIs to be more of a 1 to 1 mapping of how the APIs are in the browser even if it opens up for the possibility of misuse. Therefore I propose that we don't take ownership of self when reading.

@fitzgen
Copy link
Member

fitzgen commented Apr 8, 2019

However, as it currently stands, this means that the FileReader can only be used to read once even though this is not the case normally. The current proposal for XMLHttpRequest (#37) takes the opposite view and allows reuse of the the object. This opens up for the possibility of using the API wrong (e.g., accidentally initializing a request before setting the appropriate callbacks), but it also allows reuse.

Since Stream and Future based APIs remove a lot of these footguns, I think it makes sense for the callback APIs to be more of a 1 to 1 mapping of how the APIs are in the browser even if it opens up for the possibility of misuse. Therefore I propose that we don't take ownership of self when reading.

It seems to me that one can always attach progress et al listeners after the read has been kicked off (but you won't get called for earlier events, of course) so it doesn't seem like that big of a deal to not consume self. Note that if we don't consume self, then the callbacks for on_end and on_abort cannot be FnOnce anymore.

The other thing is that we can allow re-use for the futures methods via giving the FileReader back as an argument to either the on_end and on_abort futures or by making the ReadAsWhatever futures yield a (Whatever, FileReader).

I don't have strong opinions either way on this whole topic, so I defer to you / others.

@Pauan
Copy link
Contributor

Pauan commented Apr 9, 2019

Thanks for your work on this, here is my thorough review:

FileList::from_raw_input

I don't like this method. I can't quite put my finger on it, but somehow it feels not right.

I guess it's because this feels like it should be a files() method on Input, not a method on FileList.

Maybe we can leave this off, since we can add it in later in a backwards compatible way?

FileList::from_raw

Should this be From<web_sys::FileList> instead?

FileList::get

In addition to this method, should it also implement the Index trait?

Also, it needs to accept u32, not usize.

FileList::len

This needs to return u32, not usize.

FileList::len

I think it'd be really cool if we could somehow Deref to a slice, then we'd get all the slice methods for free.

I think the only way we could do that is if we represented FileList internally as a Vec<File>, which might actually be reasonable.

FileList::into_iter

What's the use case for this?

FileList::to_vec

Isn't this just files.iter().collect()?


BlobLike::mime_type

Are we sure that the mime crate matches perfectly with the web's mime behavior?

BlobLike::size

This one is really tricky. According to the spec this should be u64, but JS doesn't have support for that.

So the correct thing to do would be to return f64, but that's really not great, since that implies that it supports non-integers, but that's not true.

On the other hand, u32 isn't great either, because if the size is more than 32-bits but less than 53-bits, it means it will overflow...

I don't have any good suggestions for this one. I just know that it definitely shouldn't be usize.


Blob::new

What's with the Option<T>? As far as I know, the contents of a Blob cannot be null.

Also, rather than making mime_type an Option, I think it would be a lot nicer to split it into two methods: Blob::new and Blob::new_with_mime_type.

Blob::from_raw

Similarly, should this be From<web_sys::Blob> instead?

Blob::slice

Just like size, this should technically be i64, but we can't do that.

I don't have any good suggestions for this, but it also shouldn't be usize.


File::new

Just like Blob::new, why is the contents Option<T>?

For the options, I think we should create an actual struct (similar to EventListenerOptions in gloo-events) which contains the options. And then we should have two methods: File::new and File::new_with_options.

File::from_raw

Similarly, should this be From<web_sys::File>?

File::last_modified_date

I'm not sure, but maybe this should be called File::date_last_modified?

File::size

Same comments as BlobLike::size. Also, since File implements BlobLike, why does it need a separate size method?

File::slice

Same comments as Blob::slice. Also, since File and Blob both implement slice, shouldn't slice be on BlobLike instead?

File::as_blob

This should accept &self, not self. Also, we should probably get rid of this method and instead just implement Deref<Target = Blob>.

Actually... if we implement Deref<Target = Blob>, then we don't have any need for BlobLike anymore, so we can get rid of it (and move its methods into Blob).


FileReader::read_as_string / FileReader::read_as_data_url / FileReader::read_as_array_buffer

Why does this accept &impl BlobLike and not A where A: BlobLike?

Also, just a side note, I really like that ReadAs is generic, good job on that.

@Pauan
Copy link
Contributor

Pauan commented Apr 9, 2019

@fitzgen All of these on_* callbacks other than prgoress can be FnOnce, right?

That depends on if we want to allow for reading multiple files with a single FileReader or not.

Personally, I'm fine with only reading one file per FileReader.

In certain conditions it might mean some extra allocations, but in practice I think that should be rare: you usually want one FileReader per file anyways (to handle progress/errors correctly).

If we're going with the "one FileReader per file" model, I think that the API will need some significant rethinking. I think we can simplify it a lot.

P.S. I'm not liking this whole "do design in GitHub issues" approach. It makes it really hard to do proper reviews of the API designs (see my post above for an example of how awkward it is).

Can we move to a system where we do API design in pull requests instead (like how RFCs are handled)? Maybe with an [RFC] in the title to separate them from the implementation PRs.

@fitzgen
Copy link
Member

fitzgen commented Apr 11, 2019

P.S. I'm not liking this whole "do design in GitHub issues" approach. It makes it really hard to do proper reviews of the API designs (see my post above for an example of how awkward it is).

Can we move to a system where we do API design in pull requests instead (like how RFCs are handled)? Maybe with an [RFC] in the title to separate them from the implementation PRs.

Yeah I'm +1 on this -- want to make a PR to CONTRIBUTING.md adjusting the workflow? We can discuss the finer points of this in that thread.

@rylev
Copy link
Collaborator Author

rylev commented Apr 13, 2019

@fitzgen @Pauan I put the latest updates in a gist - hopefully this will make it easier while we figure out an RFC process: https://gist.github.com/rylev/b4991af28b2d1b6bed769b19c1611182

I've gone with FileReader reading functions consuming self. This simplifies things and still allows for the predominate use case (reading one blob).

I've also split the FileReader into two modules one for callbacks and one for Futures/Streams.

@fitzgen I've kept the ReadAs struct generic. I don't fully understand what issue we might run into with this. Can you expand on that a bit?

@Pauan here's the feedback to your feedback:

  • Re: raw functions - I've got rid of almost all of them and replaced them just with From impls.
  • I've made changes to the len functions, but I'm not 100% sure why getting an item should require that the index is a u32. Can you expand on this?
  • "Are we sure that the mime crate matches perfectly with the web's mime behavior?" - Yes - they both conform to the same RFC
  • "I'm not sure, but maybe this should be called File::date_last_modified?" - I'm not sure. It's actually just last_modified in JS. I think we should change it to that.
  • I don't think we can implement slices for blobs or files. That requires reading the file which needs to be done through the reader and cannot be synchronous.
  • I feel uncomfortable implementing Deref<Target = Blob>. I know this pattern is used in the various wasm-bindgen crates, but I think generally it's an antipattern. I think we should have a longer discussion how we, in general, want to handle inheritance hierarchies at this level.

@Pauan
Copy link
Contributor

Pauan commented Apr 15, 2019

I've made changes to the len functions, but I'm not 100% sure why getting an item should require that the index is a u32. Can you expand on this?

With JS Array, the index is always u32 (trying to go higher triggers errors). If wasm eventually gets wasm64, then usize will become u64, which is incorrect. So they have to be fixed at u32.

I assume that things work similarly for FileList, Blob, and File, but I'm not actually sure... they might have higher limits than Array.

But even if they had higher limits, that still leaves us with only two choices: u32 or f64. Using usize would be really weird, and u64 doesn't work (it causes errors in JS).

I don't think we can implement slices for blobs or files. That requires reading the file which needs to be done through the reader and cannot be synchronous.

My suggestion was for FileList, not Blob or File.

I know this pattern is used in the various wasm-bindgen crates, but I think generally it's an antipattern.

Yup, I know that feeling. I originally proposed using traits, but we had a long discussion where they convinced me that Deref is the right approach. It's long, but worth reading.

Since in this case Blob and File map directly to the underlying JS types (which already have an inheritance relationship), I think it makes sense to use Deref.

Pauan added a commit that referenced this issue Apr 15, 2019
@fitzgen
Copy link
Member

fitzgen commented Apr 15, 2019

@fitzgen I've kept the ReadAs struct generic. I don't fully understand what issue we might run into with this. Can you expand on that a bit?

Sure thing.

This implementation:

  impl<T> Future for ReadAs<T> { ... }

Doesn't make sense as written. It will either

  • need to have a trait bound that describes how we get can get a specific T from the file reader / callback (and specific implementations of that helper trait)
    trait ReadFileAs {
        fn read_file_as(?) -> Self;
    }
    impl ReadFileAs for Vec<u8> { ... }
    impl ReadFileAs for String { ... }
    // etc...
    
    impl<T> Future for ReadAs<T>
    where
        T: ReadFileAs,
    { ... }
  • or it will need to be broken up into multiple Future implementations like
    impl Future for ReadAs<Vec<u8>> { ... }
    impl Future for ReadAs<String> { ... }
    // etc...

But the larger question is: why make a generic struct unless API consumers are going to leverage that genericism? Generics in public APIs have a learning/complexity cost, and unless we are gaining something in return, we should avoid it.

@rylev
Copy link
Collaborator Author

rylev commented Apr 17, 2019

@fitzgen I've updated the proposal to no longer have a generic ReadAs future.

Closing this in favor of #76.

@rylev rylev closed this as completed Apr 17, 2019
@Pauan
Copy link
Contributor

Pauan commented Apr 18, 2019

@fitzgen But the larger question is: why make a generic struct unless API consumers are going to leverage that genericism? Generics in public APIs have a learning/complexity cost, and unless we are gaining something in return, we should avoid it.

The reason I like the generic is because it makes it really clear to the consumer what the return type will be:

fn read_as_string(self, blob: &impl BlobLike) -> ReadAs<String> { }
fn read_as_data_url(self, blob: &impl BlobLike) -> ReadAs<String> { }
fn read_as_array_buffer(self, blob: &impl BlobLike) -> ReadAs<js_sys::ArrayBuffer> { }
fn read_as_string(self, blob: &impl BlobLike) -> ReadAsString { }
fn read_as_data_url(self, blob: &impl BlobLike) -> ReadAsString { }
fn read_as_array_buffer(self, blob: &impl BlobLike) -> ReadAsArrayBuffer { }

The first seems a lot clearer to me, since it explicitly tells you what the type will be. It's similar to something like Into<String>.

The second is more opaque, requiring you to actually look at the docs for ReadAsString to figure out the return type.

And no, I don't think it's correct to assume that ReadAsString returns a String: it could return a JsString, or maybe even something else, so you have to actually read the docs. Whereas ReadAs<String> is quite unambiguous, even without reading the docs.

I'm not going to block on that though, since the ReadAsString style is fairly idiomatic for Rust.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants