
Add cli for eval #32

Merged: 11 commits merged into autogluon:master from add_cli_for_eval on Jul 7, 2023
Conversation

suzhoum
Collaborator

@suzhoum suzhoum commented Jul 1, 2023

Issue #, if available:

Description of changes:
Adds CLI support for eval logic. Made a few changes to naming, and fixed a couple of bugs.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@suzhoum suzhoum requested a review from gidler July 1, 2023 01:42
@gidler
Collaborator

gidler commented Jul 3, 2023

In general these changes look good to me. One concern is that I believe Nick is planning to run evaluation tests by importing logic from this module, like aggregate_amlb_results, run_evaluation_openml and run_generate_clean_openml. I'm afraid that if we change the function names and signatures as we port it to typer, we could break his workflow. I think we should either not change the function names/signatures and just improve the CLI, or get Nick's feedback on whether these changes are acceptable to him.

@suzhoum
Collaborator Author

suzhoum commented Jul 3, 2023

> In general these changes look good to me. One concern is that I believe Nick is planning to run evaluation tests by importing logic from this module, like aggregate_amlb_results, run_evaluation_openml and run_generate_clean_openml. I'm afraid that if we change the function names and signatures as we port it to typer, we could break his workflow. I think we should either not change the function names/signatures and just improve the CLI, or get Nick's feedback on whether these changes are acceptable to him.

That's a very good point, @suzhoum. @Innixma, do you think the updated function names and signatures make sense? If you feel otherwise or have better suggestions, I can update the PR.

Contributor

@Innixma Innixma left a comment

Looks good, but I had a comment about whether this now prevents easy use of some functions from other Python code, now that they are decorated with @app.command().

src/autogluon/bench/eval/evaluation/evaluate_results.py: 4 outdated review comments, resolved
Comment on lines +8 to +9
@app.command()
def aggregate_amlb_results(
Contributor

What if I want to run this file directly? Should we have an if __name__ == "__main__": block?

Collaborator Author

That's right. I've updated it so the file can be run directly with

python src/autogluon/bench/eval/scripts/aggregate_amlb_results.py

This is handled by typer rather than argparse, and it works the same way in the terminal.

The typer CLI is also available for people who don't want to clone the source code:

agbench aggregate-amlb-results
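A minimal sketch of the pattern, assuming typer; the s3_bucket option below is illustrative only, not the exact signature in this PR:

```python
import typer

app = typer.Typer()

@app.command()
def aggregate_amlb_results(
    s3_bucket: str = typer.Option(..., help="S3 bucket holding the raw AMLB results"),  # hypothetical option
):
    print(f"Aggregating results from {s3_bucket}")

if __name__ == "__main__":
    # Lets `python aggregate_amlb_results.py --s3-bucket my-bucket` work directly,
    # in addition to the installed `agbench aggregate-amlb-results` entry point.
    app()
```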

Comment on lines 15 to 36
@app.command()
def clean_amlb_results(
benchmark_name: str = typer.Option(
..., help="Benchmark name populated by benchmark run, in format <benchmark_name>_<timestamp>"
),
results_dir: str = typer.Option("data/results/", help="Root directory of raw and prepared results."),
results_input_dir: str = typer.Option(
None,
help="Directory of the results file '<file_prefix><constraint_str><benchmark_name_str>.csv' getting cleaned. Can be an S3 URI. If not provided, it defaults to '<results_dir>input/raw/'",
),
results_output_dir: str = typer.Option(
None,
help="Output directory of cleaned file. Can be an S3 URI. If not provided, it defaults to '<results_dir>input/prepared/openml/'",
),
file_prefix: str = typer.Option("results_automlbenchmark", help="File prefix of the input results files."),
benchmark_name_in_input_path: bool = False,
constraints: str = typer.Option(
None,
help="List of AMLB constraints, refer to https://github.com/openml/automlbenchmark/blob/master/resources/constraints.yaml",
),
out_path_prefix: str = typer.Option("openml_ag_", help="Prefix of result file."),
out_path_suffix: str = typer.Option("", help="suffix of result file."),
Contributor

Can we still call this as a function in Python code after this change? If not, that might be problematic.

Collaborator Author

Yes we can, though the only tricky thing is that we need to supply None or default values for all of the function arguments. I'm thinking about another solution: writing a typer wrapper function around these functions. That might solve the problem of providing both a CLI interface and normal function usage.

Contributor

I would like to preserve normal function usage if at all possible; otherwise it will become awkward to chain these function calls in higher-level code.

Contributor

Alternatively, if there is a design pattern that suggests something different from what I am describing, I'd be interested to learn more.

Comment on lines 53 to 54
else:
constraints = constraints.split(",")
Contributor

This is problematic. So I can't call this function from pure Python anymore and pass a list?

Collaborator Author

I kept the original function and added a typer wrapper that takes constraints as a list via --constraints constraint_1 --constraints constraint_2.
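A minimal sketch of that wrapper pattern, assuming typer; the names and options below are illustrative, not the exact code in this PR:

```python
from typing import List, Optional

import typer

app = typer.Typer()

def clean_amlb_results(benchmark_name: str, constraints: Optional[List[str]] = None) -> None:
    """Plain function: still importable and callable from Python with a real list."""
    constraints = constraints or []
    print(f"Cleaning {benchmark_name} with constraints={constraints}")

@app.command("clean-amlb-results")
def clean_amlb_results_cli(
    benchmark_name: str = typer.Option(..., help="Benchmark name"),
    constraints: List[str] = typer.Option(None, help="Repeat --constraints for each value"),
) -> None:
    # typer collects repeated --constraints flags into a sequence; forward it to
    # the plain function so CLI and Python callers share one implementation.
    clean_amlb_results(
        benchmark_name=benchmark_name,
        constraints=list(constraints) if constraints else None,
    )
```

From Python this would be called as clean_amlb_results("some_benchmark", constraints=["constraint_1", "constraint_2"]), while the CLI uses repeated --constraints flags.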

@suzhoum suzhoum force-pushed the add_cli_for_eval branch 2 times, most recently from 266d4c0 to c076d3e on July 7, 2023 18:25
Contributor

@Innixma Innixma left a comment

LGTM assuming the remaining comments are addressed

agbench evaluate-amlb-results --frameworks_run framework_1 --frameworks_run framework_2 --paths openml_ag_ag_bench_20230707T070230.csv --results-dir-input data/results/input/prepared/openml --no-clean-data
"""
evaluate(
frameworks_run=frameworks_run if frameworks_run else None,
Contributor

Why A if A else None? What would A be in the case where we hit the else?

Collaborator Author

@suzhoum suzhoum Jul 7, 2023

This is because typer defaults a List option to [] instead of None, so I'm forcing [] to None in order to align with the evaluate() signature.

Contributor

Ok, long-term we may want to figure out a way to make it actually default to None to avoid this hack. There is also an edge case where the user actually provides [] but it is incorrectly treated as None with the current logic.
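A tiny illustration of the behavior and the edge case being discussed (hypothetical helper, not the PR code):

```python
def _empty_to_none(frameworks_run):
    # typer yields [] when the option is omitted, but evaluate() expects None.
    return frameworks_run if frameworks_run else None

assert _empty_to_none([]) is None                      # omitted option -> None (the intent)
assert _empty_to_none(["f1", "f2"]) == ["f1", "f2"]    # provided values pass through
# The edge case: an explicitly provided empty list is also collapsed to None.
```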

Comment on lines 25 to 122
framework_nan_fill: str | None = None,
problem_type: List[str] | str | None = None,
folds_to_keep: List[int] | None = None,
framework_nan_fill: Optional[str] = None,
problem_type: Union[List[str], str, None] = None,
folds_to_keep: Optional[List[int]] = None,
Contributor

from __future__ import annotations allows using | in type hints to represent a union, eliminating the need for Optional[FOO] in favor of FOO | None, which is easier to read.
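A minimal sketch of the suggestion (the function and parameters below are hypothetical):

```python
from __future__ import annotations  # enables the PEP 604 `X | Y` syntax in annotations on older Pythons

from typing import List

def evaluate(
    framework_nan_fill: str | None = None,          # instead of Optional[str]
    problem_type: List[str] | str | None = None,    # instead of Union[List[str], str, None]
    folds_to_keep: List[int] | None = None,         # instead of Optional[List[int]]
) -> None:
    ...
```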

Collaborator Author

Good to know! Will change back

Comment on lines -157 to +260
if len(folds_to_keep) > 1:
if len(frameworks_run) > 1:
Contributor

It should be this:

if len(folds_to_keep) > 1 and len(frameworks_run) > 1:

Collaborator Author

@suzhoum suzhoum Jul 7, 2023

Thanks for pointing this out! compute_win_rate_per_dataset() has been updated to allow folds==1; was that intended?

# if num_folds <= 1:
# raise AssertionError('Not enough folds to calculate stderr')

Contributor

Oh, if it works with folds==1, then you can ignore this.

compute_win_rate_per_dataset(
f1=frameworks_run[0], f2=frameworks_run[1], results_raw=results_raw, folds=folds_to_keep
)
if compute_z_score and len(folds_to_keep) > 1:
if compute_z_score and len(frameworks_run) > 1:
Contributor

if compute_z_score and len(folds_to_keep) > 1 and len(frameworks_run) > 1:

Collaborator Author

Good catch, will update!

@suzhoum suzhoum merged commit 00cb94e into autogluon:master Jul 7, 2023
2 checks passed