Add custom formatter for Fire result #345

Merged on Apr 16, 2022 (6 commits)

Contributor

@beasteers beasteers commented Jul 7, 2021

Fixes #344 (see issue for more details)

This lets you define a function that will take the result from the Fire component and allows the user to alter it before fire looks at it to render it.

Why? Because you want to define a global way of formatting your data for your CLI across a family of functions/classes where it is impractical (and a major pain) to wrap them all.

Outputting tabular data:

import random

import fire


def data():
    # Return eight rows of random sample data as a list of dicts.
    return [
        {
            'is_up': random.choice([False, True]),
            'value_A': random.random(),
            'value_B': random.random() + 100,
            'id': random.randint(1000, 5000),
        }
        for _ in range(8)
    ]


fire.Fire(data)

Outputs this:

{"is_up": true, "value_A": 0.6004291859538904, "value_B": 100.77910907893889, "id": 474}
{"is_up": false, "value_A": 0.1406617230697117, "value_B": 100.35721554966845, "id": 740}
{"is_up": true, "value_A": 0.3612392830626744, "value_B": 100.60814663568802, "id": 509}
{"is_up": false, "value_A": 0.11247653550250092, "value_B": 100.2673181440675, "id": 305}
{"is_up": false, "value_A": 0.9505598630828166, "value_B": 100.84615141986525, "id": 85}
{"is_up": true, "value_A": 0.17544933002396768, "value_B": 100.66062056951291, "id": 385}
{"is_up": false, "value_A": 0.25245927587860695, "value_B": 100.75492369068093, "id": 923}
{"is_up": true, "value_A": 0.9237200249249168, "value_B": 100.94228120845642, "id": 702}

But we can define a formatter:

import tabulate

def fancy_table(result):
    if not result:  # show a message instead of an empty response
        return 'nada. sorry.'
    
    # display a list of dicts as a table
    if isinstance(result, (list, tuple)) and all(isinstance(x, dict) for x in result):
        return tabulate.tabulate([
            {col: cell_format(value) for col, value in row.items()}
            for row in result
        ], headers="keys")

    return result  # otherwise, let fire handle it

def cell_format(value, decimals=3, bool=('🌹', '🥀')):
    if value is True:
        return bool[0]
    if value is False:
        return bool[1]
    if isinstance(value, float):
        return '{:.{}f}'.format(value, decimals)
    return value

fire.Fire(data, formatter=fancy_table)

Outputs this:

is_up      value_A    value_B    id
-------  ---------  ---------  ----
🌹           0.115    100.013  1821
🌹           0.439    100.167  4242
🥀           0.68     100.345  2937
🥀           0.074    100.119  4675
🌹           0.189    100.462  4571
🌹           0.221    100.342  1522
🌹           0.02     100.452  2363
🥀           0.023    100.812  2433

Using a formatter means that:

  • if you don't provide a formatter, or if your formatter returns the original value (lambda x: x) then nothing is changed
  • if you want to change how some value is displayed, you can just return what you want it to render as
    e.g. lambda x: {'idk why, but': x} if isinstance(x, list) else x
  • if you are unable to make a class __str__ representation look like you want it to, you can handle it in the formatter instead
    e.g. lambda x: custom_str(x) if isinstance(x, some_class) else x
  • if you want to display nested dictionaries as yaml, you can use yaml.dump as a formatter
  • you can suppress output e.g. lambda x: None
  • you can handle printing yourself by printing inside the formatter and returning None to suppress fire formatting
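
A minimal sketch of the pass-through behaviour listed above, written as a plain function so it can be tried without fire. `Job`, `describe_job`, and `formatter` are hypothetical names, and `json.dumps` stands in for `yaml.dump` only so that the example needs nothing beyond the standard library.

```python
import json


class Job:
    """Hypothetical class whose default string output is not CLI-friendly."""

    def __init__(self, name, done):
        self.name = name
        self.done = done


def describe_job(job):
    # Custom rendering used instead of the class's __str__.
    return '{} [{}]'.format(job.name, 'done' if job.done else 'pending')


def formatter(result):
    # Override the rendering for one specific type.
    if isinstance(result, Job):
        return describe_job(result)
    # Render dictionaries as indented JSON (yaml.dump could be used the same way).
    if isinstance(result, dict):
        return json.dumps(result, indent=2, sort_keys=True)
    # Anything else is returned unchanged, so the default rendering applies.
    return result
```

Because every branch either returns a replacement or hands the value back untouched, only the types you care about are affected.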

@google-cla google-cla bot added the cla: yes Author has signed CLA label Jul 7, 2021
@luxiaba

luxiaba commented Dec 3, 2021

Can't agree more. As mentioned here, after each iteration of parsing the component is replaced by a new object, so there is no way to uniformly process the final result (for example, to remove some keys from the final dict or to beautify the final result).

Actually, I think it might be better to let users provide a customized print method instead of a formatter. At present, _PrintResult mainly uses `print` without any extra parameters, and there is no way to change the behavior of print (such as removing the trailing newline with end='').

@beasteers
Contributor Author

Glad other people find this useful too!

That is an interesting idea. I still like the idea of a formatter because it allows you to do a minimal override and pre-process the message if you want to, but still rely on the built in output mechanisms for everything else - so if you override it for one type, builtin formatting for everything else will still work as normal.

def my_formatter(x):
    # handle a class that doesn't have a nice string output
    if isinstance(x, tf.keras.Model):  # like a keras model
        x.summary()  # prints a nice output
        return  # return empty so the default object rendering doesn't happen
    return x  # everything else is handled normally, e.g. dicts, lists, objects

But what you're talking about is possible with the formatter argument in this PR. You can just print from inside the formatter function and then return None (or don't return anything) and it'll work just like your _PrintResult suggestion.

def my_formatter(x):
    print(x, end=' ')  # print instead of return and you can add print arguments

@luxiaba

luxiaba commented Dec 3, 2021

Aha, here my_formatter works like a customized print method (at least it does the job of printing, instead of always returning something or influencing other objects like a formatter would). As for _PrintResult, maybe we can use it as the default printer:

before:

# The command succeeded normally; print the result.
_PrintResult(component_trace, verbose=component_trace.verbose)
result = component_trace.GetResult()
return result

after:

# The command succeeded normally; print the result.
result = component_trace.GetResult()

# show result
if has_customized_printer():
  customized_printer(result, component_trace.verbose)
else:
  # here we can separate the helper display from `_PrintResult`, let it keep the simple display function.
  _PrintResultWithoutHelper(result, verbose=component_trace.verbose)

# show helper
if should_show_help():
  show_help(component_trace, component_trace.verbose)
return result

Anyway, these are just my personal thoughts. I think your unified formatter is already a great idea, and I hope it gets merged ASAP☺️

@beasteers
Contributor Author

beasteers commented Dec 3, 2021

Hmm not sure if I understand what your suggestion is 🤔

Ya the custom formatter is meant to be really flexible and can already handle both of our use-cases. (You don't have to change anything in this PR for your after: code to work the way you intended. By returning nothing from the formatter, you will be bypassing fire's outputting, just like you wanted.)

But to clarify - in 95% of cases, you will not need to print inside of a formatter function, you just return the result. If you want it to print out a string, you can just return a string. But if you need to control how something is printed, you are free to do it using the way described above.

Yeah I hope it gets merged soon too! Thanks for your input!

@dbieber
Member

dbieber commented Dec 3, 2021

Yes, this is a welcome change!
Note unfortunately the time I have to review and merge is limited now; it'll likely not be until early next year that we get this checked in.

Have a look at #188 (comment)
for an older discussion about how we want to do this.

Summary of top-level thoughts:

  • Let's call the argument display rather than formatter, for consistency with other arguments we might add in the future (e.g. serialize)
  • I like the approach of allowing the display function to "pass through" to the default by returning a value.
    • However, if the display function doesn't return a value, we still want result to be set correctly in interactive mode.
  • Doesn't need to be in the first pass, but I want to support multiple possible signatures for the argument to display so that the five use-cases at the start of the comment all work.

@beasteers
Contributor Author

beasteers commented Dec 3, 2021

> Note unfortunately the time I have to review and merge is limited now; it'll likely not be until early next year that we get this checked in.

yep no worries! I probably wouldn't be able to work on it until then anyways either.

> Let's call the argument display rather than formatter, for consistency with other arguments we might add in the future (e.g. serialize)

love it, sounds much better ✨

> However, if the display function doesn't return a value, we still want result to be set correctly in interactive mode.

I've never used interactive mode so I'll need to dig in to see what this means, but the display function doesn't replace the result value used in chaining. So regardless of what the display function returns, the fire chaining will still work the same way because it still uses the original value. (the format call happens inside the _PrintResult call which has no return value).

(let me know if I misunderstood haha)
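
To make the point above concrete, here is a toy model of the flow, not fire's actual code: `print_result` and `run` are hypothetical stand-ins showing that the formatter only affects what is printed, while the value handed back to the caller is the original one.

```python
def print_result(result, formatter=None):
    # Toy stand-in for the printing step: the formatted value is only displayed.
    shown = formatter(result) if formatter else result
    if shown is not None:
        print(shown)
    # Deliberately returns nothing, as described for _PrintResult above.


def run(component, formatter=None):
    result = component()
    print_result(result, formatter)
    return result  # the original, unformatted value
```

Even with a formatter that returns None (suppressing output), `run` still returns the component's result.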

> Doesn't need to be in the first pass, but I want to support multiple possible signatures for the argument to display so that the five use-cases at the start of the comment all work.

Hmm maybe we could have format=fancy_table and display=print? It may be easier to decouple the output mode (less/print) from the result processing so you can mix and match.

@beasteers
Contributor Author

beasteers commented Dec 3, 2021

Wait actually reading thru that other thread more, this PR is actually providing serialize as you are describing it there (not display), because I designed it with this use-case in mind:

import json

def json_format(x):
    return json.dumps(x, indent=4)

fire.Fire(..., format=json_format)

as well as csv, yaml, (fancy_table is also just a serialization) etc...

Like this was never meant as a display function that needs to manage printing out anything (though you are free to do it as a side effect if you want)

It's primarily just to provide serialization overrides, with the ability to fallback to default serialization implicitly (by returning something other than None/str).

Also with this method, if you want to combine multiple formatters that work for different types, all you'd have to do is:

fire.Fire(..., format=lambda x: (
        list_of_dicts_as_table(
            list_of_lists_as_csv(
                dicts_as_yaml(x)))
))

which should work out of the box currently (provided you define those formatters). But this composition would not be possible without the pass-through mechanism.
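
A sketch of what such composable formatters could look like, using only the standard library. These bodies are assumptions for illustration: `dicts_as_json` is a stand-in for `dicts_as_yaml`, and the table is a bare tab-separated layout rather than anything fancy.

```python
import csv
import io
import json


def dicts_as_json(x):
    # Hypothetical stand-in for dicts_as_yaml, using the standard library.
    if isinstance(x, dict):
        return json.dumps(x, indent=2)
    return x  # pass through anything that is not a dict


def list_of_lists_as_csv(x):
    if isinstance(x, list) and x and all(isinstance(row, list) for row in x):
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator='\n').writerows(x)
        return buffer.getvalue().rstrip('\n')
    return x  # pass through


def list_of_dicts_as_table(x):
    if isinstance(x, list) and x and all(isinstance(row, dict) for row in x):
        headers = list(x[0])
        lines = ['\t'.join(headers)]
        lines += ['\t'.join(str(row.get(h, '')) for h in headers) for row in x]
        return '\n'.join(lines)
    return x  # pass through


def combined(x):
    # Each formatter handles its own type and returns everything else unchanged.
    return list_of_dicts_as_table(list_of_lists_as_csv(dicts_as_json(x)))
```

Once one formatter has produced a string, the outer ones no longer match it, so the string reaches the end of the chain untouched.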

@dbieber
Member

dbieber commented Dec 3, 2021

Yes! That's exactly the direction I was thinking in:

The user can specify either (or both) of two arguments (as you saw, I'm thinking to call them serialize and display):

The rough signature of serialize is serialize(result) -> output (no side-effects).
The rough signature of display is display(output) -> None (with the side-effect of displaying the output).

For your use-case, you could in principle use either serialize or display, but like you say I think serialize is more appropriate. If you use serialize, the recommended approach would be to return the table as a string. If you use display, then you would have to print the string yourself rather than returning it.
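
The two rough signatures can be sketched as a small pipeline. This is only an illustration of the split described above: `emit` and `rows_to_text` are hypothetical names, and fire's real wiring may look different (the display argument is not part of this PR).

```python
def rows_to_text(rows):
    # serialize: result -> output, with no side effects.
    return '\n'.join(' | '.join(str(cell) for cell in row) for row in rows)


def emit(result, serialize=None, display=print):
    # Hypothetical glue code; not how fire wires the two arguments together.
    output = serialize(result) if serialize else result
    # display: output -> None, called only for its side effect.
    display(output)


# Collect the output in a list instead of printing it.
shown = []
emit([[1, 2], [3, 4]], serialize=rows_to_text, display=shown.append)
```

Keeping the two steps separate means the same serializer can be reused with different output mechanisms.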

@beasteers
Contributor Author

Ok cool! In that case I don't think there's much more to do other than rename formatter= to serialize= and maybe add a couple builtin serializers.

Idk if we should do display= as a separate PR? I'm less familiar with the nuances around output and paging that ppl are requesting.

(Obvs we can decide that down the road when we both have time)

@dbieber
Member

dbieber commented Dec 4, 2021

Yes, adding them in separate PRs would be fine.

(Edit: And it goes without saying you don't need to do both of them. Thank you for your contribution so far. If you want to help out adding more, that's wonderful and welcome, but of course entirely up to you. :))

@beasteers
Contributor Author

sounds good! yeah I'll probably let someone more motivated by the display overriding handle it since I don't have much of an opinion about how it should be dealt with.

Cool this is exciting tho!

# Allow users to modify the return value of the component and provide
# custom formatting.
if callable(serialize):
  result = serialize(result)

Member

We'll want to handle the not-callable case too.
I think if we cannot call serialize, we raise a FireError.

Contributor Author

@beasteers beasteers Dec 6, 2021

Maybe we just do:

if serialize:
    result = serialize(result)

which will just raise whatever error the object raises when you try to call it (probably TypeError)

And this would allow any falsey value to disable serialize (None, False, etc)

Contributor Author

@beasteers beasteers Dec 6, 2021

addressed - it will now raise FireError("serialize argument {} must be empty or callable.".format(serialize))
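
One way the resulting check could look, combining the falsy shortcut with the error message quoted above. This is a sketch under assumptions: `FireError` here is a local stand-in class and `apply_serialize` is a hypothetical helper name, not the function in fire/core.py.

```python
class FireError(Exception):
    """Local stand-in for fire's error type, so the sketch runs on its own."""


def apply_serialize(result, serialize=None):
    # Any falsy value (None, False, ...) leaves the result untouched.
    if not serialize:
        return result
    # A truthy value that cannot be called is a usage error.
    if not callable(serialize):
        raise FireError(
            'serialize argument {} must be empty or callable.'.format(serialize))
    return serialize(result)
```

Raising a dedicated error gives a clearer message than the bare TypeError you would get from calling a non-callable.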

if isinstance(x, dict):
  return ', '.join('{}={!r}'.format(k, v) for k, v in x.items())
if x == 'special':
  return ['SURPRISE!!', "I'm a list!"]

Member

The name "serialize" implies that the output of serialize is either bytes or a string.
But that's not what we're requiring.
I wonder if there's a better name (maybe that's "format" after all.)
I'll think on this, but curious for your thoughts.

Member

Maybe "process" or "finalize"...
I'll keep thinking.

Member

Forgive my unpolished thinking aloud:

Maybe "serialize" is fine, and the logic is: "if you don't finish serializing it, we'll continue serializing it for you using the default serializer."
And then "display" will only ever be given bytes or a string, rather than being directly given the output of serialize.
One challenge with this is that it disallows different serialization for different display modes.
We could (in a subsequent PR) allow serialize to accept the "kind" parameter too to reenable this capability.
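
The "we'll continue serializing it for you" logic might be sketched like this. Both function names are hypothetical, and `str()` is only a placeholder for whatever the default serializer really does.

```python
def default_serialize(value):
    # Placeholder for the built-in rendering; here simply str().
    return str(value)


def finish_serializing(result, serialize=None):
    output = serialize(result) if serialize else result
    # A str or bytes value counts as fully serialized and is left alone.
    if isinstance(output, (str, bytes)):
        return output
    # Otherwise the default serializer finishes the job.
    return default_serialize(output)
```

Under this reading, display would only ever receive a str or bytes value.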

Contributor Author

@beasteers beasteers Dec 6, 2021

Hmm I mean I personally like format too, but serialize can also work and I think that rationale is fine

> One challenge with this is that it disallows different serialization for different display modes

I'm not sure I understand the motivation behind this. Do you mean that you'd want the display mode to have its own default serializer to fall back on? If that's the case, then maybe there could be another mode of serializer override tied to the display object.

# could also be defined as a class if you want
def mydisplay(x):
    print(x)
def serialize(x):
    return x
mydisplay.serialize = serialize

# then internally _Fire(..., display=mydisplay)
default_serialize = getattr(display, 'serialize', None)
if default_serialize:
    result = default_serialize(result)

Contributor Author

side note: one thing that would be nice is if we could have some sort of override via the CLI where the user could do --format json or --format csv or something like that
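
A rough idea of how such a --format override could be backed by a lookup table. Everything here is hypothetical (no such flag exists in this PR); `SERIALIZERS`, `pick_serializer`, `to_json`, and `to_csv` are made-up names.

```python
import csv
import io
import json


def to_json(result):
    return json.dumps(result, indent=2)


def to_csv(rows):
    # Expects an iterable of rows, each row an iterable of cells.
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator='\n').writerows(rows)
    return buffer.getvalue().rstrip('\n')


# Hypothetical registry mapping a --format value to a serializer.
SERIALIZERS = {'json': to_json, 'csv': to_csv}


def pick_serializer(name):
    try:
        return SERIALIZERS[name]
    except KeyError:
        choices = ', '.join(sorted(SERIALIZERS))
        raise ValueError(
            'unknown format {!r}; choose from: {}'.format(name, choices)) from None
```

The function returned by `pick_serializer` could then be passed along as the serialize argument.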

Contributor Author

Some notes about the naming - I like serialize and format because they both imply a string result. I think there might be a problem with process or finalize cuz they could imply that you have the ability to modify the result after each iteration (after each getattr / getitem / call).

I can't think of any other words better than those two tho.

> Maybe "serialize" is fine, and the logic is: "if you don't finish serializing it, we'll continue serializing it for you using the default serializer."

I think this is fine because even if the user doesn't realize that they don't have to return a string from it, it's not going to break anything, it might just make a little more work for them.

@dbieber
Copy link
Member

dbieber commented Apr 16, 2022

This looks good! I'm going to squash-and-merge it in now.

The next steps are:

  • Document this new feature!
  • Confirm interactive mode works as expected even when serialize= is set.
  • Add a default serializer function as in #188 (Option to print help to std out)
  • Add the display= argument next

Thanks again for the PR!

@dbieber dbieber merged commit 8bddeec into google:master Apr 16, 2022
dbieber added a commit that referenced this pull request Apr 16, 2022
dbieber added a commit that referenced this pull request Apr 16, 2022
* Lint error cleanup following #345
* Makes new serialize= test deterministic
@beasteers
Contributor Author

Out of curiosity @dbieber, do you have an idea of when you think this will make it to a public release? Not a huge deal, I'm just doing some planning

@dbieber
Member

dbieber commented May 16, 2022

I'd guess early September, but it's a high variance estimate.

@dbieber
Member

dbieber commented Dec 12, 2022

The release went out today. This is the most significant change in the release and is highlighted in the release notes.
Thanks for your patience, and of course thanks for the contribution as well!

@beasteers
Contributor Author

Amazing thanks for the release!! One thought that I've had since adding this functionality is that it'd be nice to be able to customize formatting based on the method called. For example

class CLI:
    def ls(self): pass
    def get(self, id): pass
    
def serialize(result, calls):
    if calls[-1].func == CLI.get:  print("do something else")
    ...

But that would break being able to do simple things like serialize=yaml.dump so I'm not sure how you'd handle both cases (aside from signature inspection)
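
For what it's worth, the signature-inspection route could be sketched roughly like this. The helper names and the shape of `calls` are assumptions for illustration (here just a list of method names), and `json.dumps` plays the role of a simple one-argument serializer.

```python
import inspect
import json


def required_positional_count(func):
    # Count positional parameters that have no default value.
    params = inspect.signature(func).parameters.values()
    return sum(
        1 for p in params
        if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
        and p.default is p.empty
    )


def call_serializer(serialize, result, calls):
    # Two-argument serializers also receive the call history.
    if required_positional_count(serialize) >= 2:
        return serialize(result, calls)
    # One-argument serializers keep working unchanged.
    return serialize(result)


def by_method(result, calls):
    # Example serializer that varies its output by the last method called.
    return '{}:{}'.format(calls[-1], result)
```

One caveat: `inspect.signature` can raise ValueError for some built-in callables, so a real implementation would probably need a fallback to the one-argument call.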

Successfully merging this pull request may close these issues.

Override Result Formatting for pretty and non-intrusive CLIs 🌈✨