Add postprocessing helpers for run labelling #21

Merged
merged 2 commits into from
Apr 26, 2023

Conversation

simonbowly
Member

Adds the run_label function in the postprocessing module which builds a label based on version and changed parameters.

Example usage:

import grblogtools as glt
from grblogtools.postprocessing import run_label
summary = glt.get_dataframe(["data/*.log"])
run_label(summary)

The output is a Series that matches the current "Log" column for the glass4 examples, but it is built from the parameter values, so it does not rely on the file naming convention.

0     912-MIPFocus2-Presolve1
1     912-Cuts2-Heuristics0.1
2     912-Cuts2-Heuristics0.1
3     912-MIPFocus2-Presolve1
4     912-Cuts2-Heuristics0.1
               ...           
59    912-MIPFocus1-Presolve2
60                  912-Cuts1
61          912-Heuristics0.1
62          912-Heuristics0.0
63    912-Cuts2-Heuristics0.0
Length: 64, dtype: object 

@simonbowly
Member Author

@ronaldvdv would this do what you are after for #13?

I left omit_params empty by default - I guess either MIPGap, TimeLimit or both might be sensible to exclude, but it depends on what you are analyzing, so I think it would be better for the user to set this explicitly.

mattmilten pushed a commit that referenced this pull request Mar 31, 2022
* replace parameter_change with check_parameter_change

* make the summary line one according to google style

* remove super and make the docstring according to google style

* fix the isort error

* add simplex parser

* fix the bug (typo)

* simplify the simplex parser

* adjust the tests

* add barrier parser

* remove the extra log example

* add lp examples

* add the barrier parser

* add simplex parser

* fix the imports

* remove tests for simplex

* add tests for continuous parser

* add two more comments
@ronaldvdv

@simonbowly I would love to have this merged, with one change suggested. The run label now includes three components (version, changed parameters, seed). Can we also have the changed parameters, formatted slightly differently (CutPasses=1, Presolve=2 or Default as we often do internally) as a separate column? Happy to discuss and adjust the PR if you agree.

@simonbowly
Member Author

@ronaldvdv yes, I was playing with this last week. The code is in a completely different state from when this PR/issue was last touched, so it had to be rewritten. It no longer needs to be in post-processing, since the data is in a structured state before being extracted as a dataframe. Here's the behaviour:

>>> (
    glt.parse("data/*.log").summary()
    [['Version', 'Label', 'Seed', 'ChangedParams']]
    .sort_values(['Seed', 'ChangedParams']).head(20)
)
   Version                Label  Seed  ChangedParams
60   9.1.2              Default     0              0
6    9.1.2                Cuts0     0              1
15   9.1.2                Cuts1     0              1
24   9.1.2                Cuts2     0              1
27   9.1.2          Heuristics0     0              1
30   9.1.2        Heuristics0.1     0              1
39   9.1.2            MIPFocus1     0              1
45   9.1.2            MIPFocus2     0              1
51   9.1.2            MIPFocus3     0              1
54   9.1.2            Presolve1     0              1
57   9.1.2            Presolve2     0              1
0    9.1.2    Cuts0-Heuristics0     0              2
3    9.1.2  Cuts0-Heuristics0.1     0              2
9    9.1.2    Cuts1-Heuristics0     0              2
12   9.1.2  Cuts1-Heuristics0.1     0              2
18   9.1.2    Cuts2-Heuristics0     0              2
21   9.1.2  Cuts2-Heuristics0.1     0              2
33   9.1.2  MIPFocus1-Presolve1     0              2
36   9.1.2  MIPFocus1-Presolve2     0              2
42   9.1.2  MIPFocus2-Presolve1     0              2

'Label' lists the non-default parameters, ignoring Seed, LogFile, and TimeLimit (Default indicates no changes). We could let the user customize this through an argument to the summary() method, for example to omit a different list of parameters from the labels or to include the version in the label.

Does that do the trick? Do the names make sense? You just want an = sign added in the params list?

@ronaldvdv

Cool, I was already expecting some more hidden work behind the scenes :-D Looks good!

  • For the argument to customize: what if we allow the user to provide a callback that takes a dict (parameter->value mapping) as an argument, potentially other things like the version and returns a string? The default value for that would be a function that mimics your formatting (MIPFocus1-Presolve1) and leaves out seed/logfile/timelimit, but all of that can be completely customized. My preferred formatting (MIPFocus=1, Presolve=1) could just be another function that's included and can be passed as the callback. If needed we could provide a helper function that takes some settings (separator, parameters to ignore, version or not) and returns a callback.
  • For the column names, I feel it would make sense to make Label and ChangedParams more aligned. What about ChangedParams and NumChangedParams?
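
A minimal sketch of the helper idea from the first bullet: a factory that takes formatting settings and returns a label callback. The name `make_label_callback` and all of its arguments are hypothetical and not part of grblogtools; parameters are sorted by name here only to make the output deterministic.

```python
from typing import Callable, Iterable, Mapping, Optional, Union

ParamValue = Union[str, float, int]


def make_label_callback(
    separator: str = "-",
    assign: str = "",
    ignore: Iterable[str] = ("Seed", "LogFile", "TimeLimit"),
    include_version: bool = False,
) -> Callable[..., str]:
    """Build a callback that formats a {parameter: value} mapping as a label.

    Hypothetical helper for illustration; not an existing grblogtools API.
    """
    ignored = frozenset(ignore)

    def callback(params: Mapping[str, ParamValue], version: Optional[str] = None) -> str:
        # Keep only the parameters that should appear in the label.
        parts = [
            f"{name}{assign}{value}"
            for name, value in sorted(params.items())
            if name not in ignored
        ]
        # An empty list means nothing relevant was changed.
        if not parts:
            parts = ["Default"]
        # Optionally prefix the version with its dots removed (9.1.2 -> 912).
        if include_version and version is not None:
            parts.insert(0, version.replace(".", ""))
        return separator.join(parts)

    return callback


# Two ready-made formatters: the compact style and the "Name=value" style.
compact = make_label_callback()
verbose = make_label_callback(separator=", ", assign="=")
```

With these settings, `compact` would produce labels in the MIPFocus1-Presolve1 style and `verbose` in the MIPFocus=1, Presolve=1 style, so either could be passed wherever a label callback is accepted.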

@simonbowly
Member Author

Sure, a callback makes sense. I would not go with 'ChangedParams' if it might contain anything including the version though. The use of the field is for labelling in plots, so to me Label makes sense. However, we could have the callback return a dict so you can add whatever fields you want. e.g.

from pathlib import Path
from typing import Dict, Union

def label_callback(params: Dict[str, Union[str, float, int]], model_name: str, model_path: Path, version: str, seed: int):
    ...
    return {'Label': some_label, 'NumChangedParams': some_count}

and you get the fields you returned in the summary dataframe?
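
To make the proposal concrete, here is a rough sketch of a dict-returning callback and of how its fields could be merged into one row of the summary. The `add_fields` helper and the run record's keys (`Params`, `ModelName`, `ModelPath`) are assumptions made up for illustration; nothing here reflects how grblogtools stores runs internally.

```python
from pathlib import Path
from typing import Any, Dict, Union

ParamValue = Union[str, float, int]

# Parameters left out of the label (assumed list, matching the discussion above).
IGNORED = ("Seed", "LogFile", "TimeLimit")


def label_callback(
    params: Dict[str, ParamValue],
    model_name: str,
    model_path: Path,
    version: str,
    seed: int,
) -> Dict[str, Any]:
    """Return extra summary fields for one run (illustrative only)."""
    # Assumption: params holds only the non-default parameters of the run.
    changed = {k: v for k, v in params.items() if k not in IGNORED}
    label = "-".join(f"{k}{v}" for k, v in changed.items()) or "Default"
    return {"Label": label, "NumChangedParams": len(changed)}


def add_fields(run: Dict[str, Any]) -> Dict[str, Any]:
    """Merge the callback's fields into a copy of a run record (hypothetical)."""
    extra = label_callback(
        run["Params"],
        run["ModelName"],
        Path(run["ModelPath"]),
        run["Version"],
        run["Seed"],
    )
    return {**run, **extra}
```

Each key returned by the callback becomes a column of the resulting row, so the user decides both the column names and their contents.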

@simonbowly
Member Author

Discussed with @mattmilten, and we think this is becoming a bit complex. A simpler alternative: we provide a column "ChangedParams" which contains the parameter dictionary. Then, the user can use pandas functions like .apply to turn this into a string easily, with whatever column name they like. We can add some docs to show simple examples.

@simonbowly
Member Author

Essentially, it would allow you to do this:

>>> (
    glt.parse("data/*.log").summary()
    ['ChangedParams'].apply(lambda d: "-".join(f"{k}={v}" for k, v in d.items() if k != 'TimeLimit'))
    .head()
)
0      Heuristics=0-Cuts=0
1      Heuristics=0-Cuts=0
2      Heuristics=0-Cuts=0
3    Heuristics=0.1-Cuts=0
4    Heuristics=0.1-Cuts=0
Name: ChangedParams, dtype: object

which is exactly equivalent to providing a callback to summary(). @ronaldvdv what do you think?

@ronaldvdv

Yes, love it! Much better to have the original data (parameter dictionary) available to the end user.

Adds a 'ChangedParams' column to the summary dataframe, containing a
dictionary of non-default parameter values for the run.
@simonbowly
Member Author

@mattmilten can you please review? This would close #13.

I think a 2.1.0 can be released after merging this, there are a few unreleased changes waiting.

@mattmilten
Member

Looks great, thanks!

@mattmilten mattmilten merged commit bf26a3c into Gurobi:master Apr 26, 2023
@simonbowly simonbowly deleted the postprocessing branch April 26, 2023 12:53
@Gurobi Gurobi deleted a comment from gbkgwyneth Apr 26, 2023

3 participants