Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce allocation #94

Closed
kazimuth opened this issue Mar 9, 2019 · 22 comments
Closed

Reduce allocation #94

kazimuth opened this issue Mar 9, 2019 · 22 comments
Labels
crate:fluent-bundle Issues related to fluent-bundle crate design-decision Issues pending design decision, usually Rust specific enhancement help wanted We need help making decisions or writing PRs for this.
Milestone

Comments

@kazimuth
Copy link
Contributor

kazimuth commented Mar 9, 2019

The fluent API is currently very allocation-intensive. It could be possible to reduce allocations by e.g. changing the arguments of format from HashMap<&'a str, FluentValue> to something like &'a [&'a str, FluentValue<'a>], and letting FluentValue store references instead of owned strings.

@zbraniecki
Copy link
Collaborator

Hi, thanks for the feedback!

I'm currently working on #93 which is targeting fluent-rs 0.6 and hopefully will address this issue as well :)

@zbraniecki zbraniecki added this to the 0.6 milestone Mar 11, 2019
@zbraniecki
Copy link
Collaborator

The patch just landed. @kazimuth - can you verify that this issue is fixed now?

@kazimuth
Copy link
Contributor Author

kazimuth commented Apr 10, 2019

Looks like progress! Those benchmark numbers are exciting. It looks like simple formatting is crazy fast now.

Unfortunately the API still looks pretty allocation-heavy for complex formatting. Ideally, it would look something like:

pub fn format(
    &self,
    writer: &mut std::io::Writer,
    path: &str,
    args: Option<&[(&str, FluentValue)]>
) -> Option<Vec<FluentError>>

and FluentValue would look like:

enum FluentValue<'a> {
    String(Cow<'a>),
    Number(Cow<'a>)
}

or even:

enum FluentValue<'a> {
    String(&'a str),
    Number(&'a str)
}

With this API, the library would only have to allocate in the error case.

By taking a std::io::Writer, you don't have to allocate strings to return; the user will shove your bytes wherever they want them. It also makes the library way more interoperable with all the other libraries that use Writer. You could of course add a helper format_to_string that allocates and returns a fresh string for convenience. (Or you could leave the format API as-is and add a format_to_writer method, either way is fine.)

By taking a &[(&str, FluentValue)], you free the user from having to create a HashMap for every call. Note that your benchmarks currently initialize the argument HashMap outside the core benchmark loop; moving that inside might give you a better sense of the true cost of calling format().

(If you want to still use a HashMap internally, you could use a thread_local! RefCell<HashMap> and insert the argument slice into it during each call to format + clear it after each call. This would amortize the allocation cost over many calls; and locking a thread-local RefCell is basically free, there's very little cost there. It shouldn't ever cause errors either, as long as the library doesn't call format recursively.)

Finally, by making FluentValue not need to own its contents, you free the user from having to copy things into FluentValue to make calls.

Does this all make sense?

@zbraniecki
Copy link
Collaborator

I'm not sure about the Writer, but I'll read more about it, thanks!

As for FluentValue - I think we're in a pretty good place - https://github.com/projectfluent/fluent-rs/blob/master/fluent-bundle/src/types.rs#L112-L119 - it does take Cow!

@zbraniecki
Copy link
Collaborator

@Manishearth - do you have any thoughts on this proposal?

@Manishearth
Copy link
Collaborator

Worth looking into the Writer thing, might be good to use W: Writer instead of &mut Writer to avoid the overhead of dynamic dispatch. But also, dynamic dispatch may not be that expensive anyway.

The HashMap.... that's trickier, since this would probably slow down argument lookups. In case we expect there to not be many arguments in the common case, a SmallVec makes more sense (perhaps with some trait magic so that you can just pass down a vec or a single argument and it gets converted).

This also avoids having to introduce an extra 'args lifetime to Scope, since I don't think we can assume the arguments have a lifetime of 'bundle.

(Is it worth making the argument name in args a Cow? Is there a use case for the argument name not coming from the bundle? That lifetime seems out of place)

@kazimuth
Copy link
Contributor Author

As for FluentValue - I think we're in a pretty good place - https://github.com/projectfluent/fluent-rs/blob/master/fluent-bundle/src/types.rs#L112-L119 - it does take Cow!

Oh nice! I didn't see that, guess i was looking at an older branch.

Worth looking into the Writer thing, might be good to use W: Writer instead of &mut Writer to avoid the overhead of dynamic dispatch. But also, dynamic dispatch may not be that expensive anyway.

Yeah, either way.

The HashMap.... that's trickier, since this would probably slow down argument lookups. In case we expect there to not be many arguments in the common case, a SmallVec makes more sense (perhaps with some trait magic so that you can just pass down a vec or a single argument and it gets converted).

Hmm, yeah, that's sorta a tough question. Measurement would help. One way to address that is the internal thread-local HashMap i described.

Other question: have you thought about interning strings? Not the translation strings, but variable names and such. There's some crates on crates.io for it, string-interner is a good one. You might be able to get some decent speedups out of that. On the downside, it would probably complicate the implementation, and might end up complicating the API as well.

@Manishearth
Copy link
Collaborator

Interned strings are definitely useful for compiler-ish applications. Rust and Servo both do this.

But yeah, you basically end up moving most of your String code to dealing with Idents. On the plus side, string comparisons are fast.

@Manishearth
Copy link
Collaborator

(I'd caution against using thread local scratch space for stuff like this, it doesn't usually pan out well)

@kazimuth
Copy link
Contributor Author

Huh, could you elaborate some? I'm just curious, I haven't worked enough with thread locals to be familiar with their downsides

@Manishearth
Copy link
Collaborator

TLS is pretty rare in Rust, and the pattern of reusing scratch space via globals is also quite rare. It might be fine, but it's basically not considered good practice unless absolutely necessary.

@zbraniecki zbraniecki added design-decision Issues pending design decision, usually Rust specific enhancement crate:fluent-bundle Issues related to fluent-bundle crate help wanted We need help making decisions or writing PRs for this. labels Apr 17, 2019
@zbraniecki
Copy link
Collaborator

I looked into it some more, in particular, I turned some internal types to use Writer and it looks quite good! I'm particularly happy that in cases of concatenation I don't need to clone owned Cows.

I'll look further into this (and the args) once we update the API to match the latest proposal in JS - projectfluent/fluent.js#380 )

@zbraniecki
Copy link
Collaborator

With the recent changes, I think we're in a pretty good spot.

I'm happy to continue looking into this, in particular the Write and further optimizations around FluentArgs as list of tuples instead of a hashmap, but I'll move it to 0.8 for now.

@zbraniecki zbraniecki modified the milestones: 0.7, 0.8 Jul 29, 2019
@cmyr
Copy link
Contributor

cmyr commented Aug 15, 2019

I'm digging into fluent for use in druid, and was thinking a bit about the ergonomics of the current API.

I'd earlier been sketching out the API I wanted to expose from druid, and in my imagining, this API allowed the args to be provided via closure. Here's a very rough sketch of what I was imagining (this code is bad, but I hope you get the idea):

// strings.flt
// app-menu-about-title = About { $app_name }

/// Some wrapper type that includes some `BundleResource`
struct I18nBundle { .. }

impl I18nBundle {
    /// ignoring the lifetimes for now
    fn localize(&self, message: &str, attrs: impl Fn(&str) -> Option<FluentValue>) -> String {
        // as a convenience we will return the key as the value if it's missing?
        self.inner.resolve(message, attrs).unwrap_or_else(|| message.to_owned())
    }
}

let bundle = I18nBundle::new();
let menu_title: String = bundle.localize("app-menu-about-title", |key| {
    match key {
        "app_name" => "MyApp".into(),
        _ => "".into(),
    }
});

assert_eq!(menu_title.as_str(), "About MyApp");

An advantage of this is that the current API could be implemented on top of it, by just moving a map into the closure.

If I were thinking about implementing this, I might also consider something like,

trait FluentArgs {
    fn get(&self, key: &str) -> Option<FluentValue>;
}

Which could have impls for the current hashmap, for a generic closure with the correct signature, or for a custom type.

This would require some changes to types and lifetimes, but it would greatly increase the amount of flexibility available to people who will be designing APIs on top of fluent.

Forgive me if I'm missing some detail that makes this unworkable; it is very much a rough sketch, and I haven't dug too deeply into fluent so far, so it's possible i'm overlooking something obvious. Very happy to hear any thoughts!

@zbraniecki
Copy link
Collaborator

@cmyr thanks for the suggestion! I'm excited to explore the right API for this feature!

@Pike what do you think?

@Pike
Copy link
Contributor

Pike commented Aug 16, 2019

Generally speaking, I think that fluent-rs should use APIs that work well for Rust.

In this particular case, though, I'm concerned that we're putting error recovery in the hand of the application developer. Notably, in

let menu_title: String = bundle.localize("app-menu-about-title", |key| {
    match key {
        "app_name" => "MyApp".into(),
        _ => "".into(),
    }
});

the error is "", instead of None.

The other more vague concern I have is that this enables programmers to compute values based on the variables they're asked for. Like $lang-de could be resolved to Deutsch. Which leads us into the territory where the programmer understands what she did, but the localizers start doing random stuff.

This API looks like quite a foot gun, tbh.

@cmyr
Copy link
Contributor

cmyr commented Aug 16, 2019

Forgive my bad code; it would be just as easy for this to return Option<FluentValue>. I want to re-affirm that the code provided is very much a sketch, and there's certainly a lot of room for improvement.

How to handle runtime failures is an important question, but I think that is relevant to any higher-level API.

@Pike I don't understand the $lang-de example. Is it that the programmer is returning a non-localized value for something that should be localized?

Digression:

Something I would really like is some sort of static check that (at least) messages that are referenced in code exist in included FTL files. One thing I was thinking about is a proc macro that you use to annotate message identifiers (say ftl_msg!("hello-world")), and which during compilation builds a registry of all the messages in a project, which can then be checked against various bundles.

@zbraniecki
Copy link
Collaborator

In this particular case, though, I'm concerned that we're putting error recovery in the hand of the application developer. Notably, in

In your example, we use FluentValue enum which can be FluentValue::None or FluentValue::Error even.
That allows us to return error scenarios from functions and I guess it would make args act similarly.

I also would like to understand better your $lang-de example.

From my perspective, the biggest challenge of the proposal from @cmyr is that it reverses the equation, and makes certain things dynamic which may be a pro or a cons.

For example, it enables non-deterministic arguments:

let menu_title: String = bundle.localize("app-menu-about-title", |key| {
    match random() {
        0 => FluentValue::Number("5"),
        1 => FluentValue::String("App 2")
    }
});

I imagine that flexibility may be tempting, but I'm not sure if it actually leads to a good API that separates concerns between developers and localizers.

@cmyr - can you elaborate on the advantage of that? I'd also prefer it in a separate issue (this is just about reducing allocations, yours is a proposal for the API change).

Something I would really like is some sort of static check that (at least) messages that are referenced in code exist in included FTL files.

Yeah, we'll definitely want a procedural macro for that. I've been recently playing with proc macros for unic-langid and got it working quite nicely.
I expect to get to something like this in fluent-rs 0.9 tho, because first we need to stabilize the core API before we dive into helpers.

@cmyr
Copy link
Contributor

cmyr commented Aug 17, 2019

I think that perhaps the issue you're surfacing is also a separate one: the issue here is that messages are not strongly typed. Maybe in our ideal world, each message is converted to a struct, and that struct has a custom constructor that requires the user pass in the correct arguments.

For instance, there's nothing to stop me from reproducing your example using the fluent_args! macro.

I agree that I'm derailing this issue a bit, but the most basic idea of this proposal — using a closure instead of a HashMap to provide args — is fundamentally about reducing allocation; the closure can be on the stack, and can capture references to the environment.

Glad to hear you've been experimenting with proc macros already, I think there are some potentially major features that can might unlock.

@Pike
Copy link
Contributor

Pike commented Aug 22, 2019

It also allows the programmer to call back into the localization stack when resolving arguments, and create infinite loops if the localizer finds a way to use different variables than the programmer used in English, for example.

@zbraniecki
Copy link
Collaborator

hi all.

I landed a bunch of changes as part of #193 that should help with user experience while also lowering the overhead - I moved a number of HashMaps to Vec<(key, value)>.

Please, let me know if you think this is helpful for this issue!

@zbraniecki zbraniecki modified the milestones: 0.13, 0.14 Sep 25, 2020
@zbraniecki
Copy link
Collaborator

closing for now. reopen if needed please!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crate:fluent-bundle Issues related to fluent-bundle crate design-decision Issues pending design decision, usually Rust specific enhancement help wanted We need help making decisions or writing PRs for this.
Projects
None yet
Development

No branches or pull requests

5 participants