-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallel tabular tasks #41
Conversation
2203f12
to
9b73052
Compare
f38bf32
to
0ca757b
Compare
For tabular, we also support custom framework defined by amlb_user_dir. By mounting the amlb_user_dir to custom_configs/amlb_configs and lambda dir src/autogluon/bench/cloud/aws/batch_stack/lambdas, we are able to make user_dir visible to the package.
use default and custom amlb config to determine amlb tasks and specific fold to run (capped by `folds`)
c3a0a92
to
b05595e
Compare
b05595e
to
250de48
Compare
250de48
to
7e3878f
Compare
@@ -89,7 +121,85 @@ def save_configs(configs: dict, uid: str): | |||
return config_file_path | |||
|
|||
|
|||
def process_combination(combination, keys, metrics_bucket, batch_job_queue, batch_job_definition): | |||
def clone_automlbenchmark_repo(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this function clone the whole repo or only the resource folder? Seems like it's later. If so, consider renaming the function and provide a comment on why we are doing this. Currently, it's kind of confusing
with open(file, "r") as f: | ||
amlb_benchmark_configs = yaml.safe_load(f) | ||
for item in amlb_benchmark_configs: | ||
folds = min(item.get("folds", default_max_folds), default_max_folds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? If the user specified folds=20, we will only get the default_max_folds, which is 10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default_max_folds
is actually the amlb config folds
in contraints.yaml
. For example, if constraint == 'test', the folds
, aka default_max_folds
will be set to 2, and even if the user specified folds=20
for individual task
, we will only run max 2 folds.
# Iterate through the combinations and the second set of keys | ||
for combo in specific_key_combinations: | ||
for benchmark, tasks in config["module_configs"]["tabular"]["fold_to_run"].items(): | ||
for task, fold_numbers in tasks.items(): | ||
for fold_num in fold_numbers: | ||
new_config = {key: config[key] for key in common_keys} | ||
new_config.update(dict(zip(specific_keys, combo))) | ||
new_config["amlb_benchmark"] = benchmark | ||
new_config["amlb_task"] = task | ||
new_config["fold_to_run"] = fold_num | ||
job_id, config_s3_path = process_combination( | ||
new_config, metrics_bucket, batch_job_queue, batch_job_definition | ||
) | ||
job_configs[job_id] = config_s3_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this basically is generating a config for each fold in each task in each benchmark? And the config would only contain keys that are presented in the original config? It takes some time to understand these code, maybe consider adding some high level comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. What you described was mostly correct. Since only amlb_benchmark
was required, if amlb_task
does not have keys, we assume all tasks in the amlb_benchmark
will be run; similarly, if folds_to_run
does not have any key, we assume all folds in amlb_task
will be run. If the aforementioned have keys, we only populate folds_to_run
for the existing keys.
job_configs = {} | ||
|
||
# Generate combinations for the first set of keys | ||
specific_key_combinations = list( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are actually values instead of keys right? I saw on line 302 it was zipped with the keys. This naming is kind of confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
0dd6020
to
3783112
Compare
3783112
to
91e7840
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Issue #, if available:
Description of changes:
This PR adds a few feature improvements to tabular module
AutoGluon:stable
) insetup.sh
so that docker container is built with the required version of framework.amlb_user_dir
or the default amlbresources/
, and generate config combination for Batch jobs based on individual fold inaws
mode.amlb_user_dir
to pre-configured directories was required to use theamlb_user_dir
content during docker build, and in lambda function.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.