
Add cli for eval #32

Merged: 11 commits merged into autogluon:master from add_cli_for_eval on Jul 7, 2023
Conversation

suzhoum
Collaborator

@suzhoum suzhoum commented Jul 1, 2023

Issue #, if available:

Description of changes:
Adds CLI support for eval logic. Made a few changes to naming, and fixed a couple of bugs.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@suzhoum suzhoum requested a review from gidler July 1, 2023 01:42
@gidler
Collaborator

gidler commented Jul 3, 2023

In general these changes look good to me. One concern is that I believe Nick is planning to run evaluation tests by importing logic from this module, like aggregate_amlb_results, run_evaluation_openml and run_generate_clean_openml. I'm afraid that if we change the function names and signatures as we port it to typer, we could break his workflow. I think we should either not change the function names/signatures and just improve the CLI, or get Nick's feedback on whether these changes are acceptable to him.

@suzhoum
Collaborator Author

suzhoum commented Jul 3, 2023

> In general these changes look good to me. One concern is that I believe Nick is planning to run evaluation tests by importing logic from this module, like aggregate_amlb_results, run_evaluation_openml and run_generate_clean_openml. I'm afraid that if we change the function names and signatures as we port it to typer, we could break his workflow. I think we should either not change the function names/signatures and just improve the CLI, or get Nick's feedback on whether these changes are acceptable to him.

That's a very good point, @suzhoum. @Innixma, do you think the updated function names and signatures make sense? If you feel otherwise or have better suggestions, I can update the PR.

Contributor

@Innixma Innixma left a comment

Looks good, but I had a comment about whether this now prevents easy use of some functions from other Python code, now that they are decorated with @app.command().

src/autogluon/bench/eval/evaluation/evaluate_results.py: 4 outdated review comments, resolved
Comment on lines +8 to +9
@app.command()
def aggregate_amlb_results(
Contributor

What if I want to run this file directly? Should we have an if __name__ == "__main__": block?

Collaborator Author

That's right. I've updated it so the file can be run directly with

python src/autogluon/bench/eval/scripts/aggregate_amlb_results.py

This is handled by typer rather than argparse, and it works the same way in the terminal.

The typer CLI is also available for people who don't want to clone the source code:

agbench aggregate-amlb-results
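A minimal sketch of the pattern, assuming typer; the s3_bucket option below is illustrative only, not the exact signature in this PR:

```python
import typer

app = typer.Typer()

@app.command()
def aggregate_amlb_results(
    s3_bucket: str = typer.Option(..., help="S3 bucket holding the raw AMLB results"),  # hypothetical option
):
    print(f"Aggregating results from {s3_bucket}")

if __name__ == "__main__":
    # Lets `python aggregate_amlb_results.py --s3-bucket my-bucket` work directly,
    # in addition to the installed `agbench aggregate-amlb-results` entry point.
    app()
```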

Comment on lines 15 to 36
@app.command()
def clean_amlb_results(
benchmark_name: str = typer.Option(
..., help="Benchmark name populated by benchmark run, in format <benchmark_name>_<timestamp>"
),
results_dir: str = typer.Option("data/results/", help="Root directory of raw and prepared results."),
results_input_dir: str = typer.Option(
None,
help="Directory of the results file '<file_prefix><constraint_str><benchmark_name_str>.csv' getting cleaned. Can be an S3 URI. If not provided, it defaults to '<results_dir>input/raw/'",
),
results_output_dir: str = typer.Option(
None,
help="Output directory of cleaned file. Can be an S3 URI. If not provided, it defaults to '<results_dir>input/prepared/openml/'",
),
file_prefix: str = typer.Option("results_automlbenchmark", help="File prefix of the input results files."),
benchmark_name_in_input_path: bool = False,
constraints: str = typer.Option(
None,
help="List of AMLB constraints, refer to https://github.com/openml/automlbenchmark/blob/master/resources/constraints.yaml",
),
out_path_prefix: str = typer.Option("openml_ag_", help="Prefix of result file."),
out_path_suffix: str = typer.Option("", help="suffix of result file."),
Contributor

Can we still call this as a function in Python code after this change? If not, that might be problematic.

Collaborator Author

Yes we can, though the only tricky thing is that we need to supply None or default values for all of the function arguments. I'm thinking about another solution: writing a typer wrapper function around these functions. That might solve the problem of providing both a CLI interface and normal function usage.

Contributor

I would like to preserve normal function usage if at all possible; otherwise it will become awkward to chain these function calls in higher-level code.

Contributor

Alternatively, if there is a design pattern that suggests something different from what I am describing, I'd be interested to learn more.

Comment on lines 53 to 54
else:
constraints = constraints.split(",")
Contributor

This is problematic. So I can't call this function from pure Python anymore and pass a list?

Collaborator Author

I kept the original function and added a typer wrapper that takes constraints as a list via --constraints constraint_1 --constraints constraint_2.
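A minimal sketch of that wrapper pattern, assuming typer; the names and options below are illustrative, not the exact code in this PR:

```python
from typing import List, Optional

import typer

app = typer.Typer()

def clean_amlb_results(benchmark_name: str, constraints: Optional[List[str]] = None) -> None:
    """Plain function: still importable and callable from Python with a real list."""
    constraints = constraints or []
    print(f"Cleaning {benchmark_name} with constraints={constraints}")

@app.command("clean-amlb-results")
def clean_amlb_results_cli(
    benchmark_name: str = typer.Option(..., help="Benchmark name"),
    constraints: List[str] = typer.Option(None, help="Repeat --constraints for each value"),
) -> None:
    # typer collects repeated --constraints flags into a sequence; forward it to
    # the plain function so CLI and Python callers share one implementation.
    clean_amlb_results(
        benchmark_name=benchmark_name,
        constraints=list(constraints) if constraints else None,
    )
```

From Python this would be called as clean_amlb_results("some_benchmark", constraints=["constraint_1", "constraint_2"]), while the CLI uses repeated --constraints flags.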

@suzhoum suzhoum force-pushed the add_cli_for_eval branch 2 times, most recently from 266d4c0 to c076d3e on July 7, 2023 18:25
Contributor

@Innixma Innixma left a comment

LGTM assuming the remaining comments are addressed

agbench evaluate-amlb-results --frameworks_run framework_1 --frameworks_run framework_2 --paths openml_ag_ag_bench_20230707T070230.csv --results-dir-input data/results/input/prepared/openml --no-clean-data
"""
evaluate(
frameworks_run=frameworks_run if frameworks_run else None,
Contributor

Why A if A else None? What would A be in the case where we hit the else?

Collaborator Author

@suzhoum suzhoum Jul 7, 2023

This is because typer defaults a List option to [] instead of None, so I'm forcing [] to None in order to align with the evaluate() signature.

Contributor

Ok, long-term we may want to figure out a way to make it actually default to None to avoid this hack. There is also an edge case where the user actually provides [] but it is incorrectly treated as None with the current logic.
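A tiny illustration of the behavior and the edge case being discussed (hypothetical helper, not the PR code):

```python
def _empty_to_none(frameworks_run):
    # typer yields [] when the option is omitted, but evaluate() expects None.
    return frameworks_run if frameworks_run else None

assert _empty_to_none([]) is None                      # omitted option -> None (the intent)
assert _empty_to_none(["f1", "f2"]) == ["f1", "f2"]    # provided values pass through
# The edge case: an explicitly provided empty list is also collapsed to None.
```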

Comment on lines 25 to 122
framework_nan_fill: str | None = None,
problem_type: List[str] | str | None = None,
folds_to_keep: List[int] | None = None,
framework_nan_fill: Optional[str] = None,
problem_type: Union[List[str], str, None] = None,
folds_to_keep: Optional[List[int]] = None,
Contributor

from __future__ import annotations allows using | in type hints to represent a union, eliminating the need for Optional[FOO] in favor of FOO | None, which is easier to read.
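A minimal sketch of the suggestion (the function and parameters below are hypothetical):

```python
from __future__ import annotations  # enables the PEP 604 `X | Y` syntax in annotations on older Pythons

from typing import List

def evaluate(
    framework_nan_fill: str | None = None,          # instead of Optional[str]
    problem_type: List[str] | str | None = None,    # instead of Union[List[str], str, None]
    folds_to_keep: List[int] | None = None,         # instead of Optional[List[int]]
) -> None:
    ...
```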

Collaborator Author

Good to know! Will change back

Comment on lines -157 to +260
if len(folds_to_keep) > 1:
if len(frameworks_run) > 1:
Contributor

It should be this:

if len(folds_to_keep) > 1 and len(frameworks_run) > 1:

Collaborator Author

@suzhoum suzhoum Jul 7, 2023

Thanks for pointing this out! compute_win_rate_per_dataset() has been updated to allow folds==1; was that intended?

# if num_folds <= 1:
# raise AssertionError('Not enough folds to calculate stderr')

Contributor

Oh, if it works with folds==1, then you can ignore this.

compute_win_rate_per_dataset(
f1=frameworks_run[0], f2=frameworks_run[1], results_raw=results_raw, folds=folds_to_keep
)
if compute_z_score and len(folds_to_keep) > 1:
if compute_z_score and len(frameworks_run) > 1:
Contributor

if compute_z_score and len(folds_to_keep) > 1 and len(frameworks_run) > 1:

Collaborator Author

Good catch, will update!

@suzhoum suzhoum merged commit 00cb94e into autogluon:master Jul 7, 2023
2 checks passed