Add postprocessing helpers for run labelling #21

simonbowly · 2022-01-13T22:02:49Z

Adds the run_label function to the postprocessing module. It builds a label for each run from the version and the changed parameters.

Example usage:

import grblogtools as glt
from grblogtools.postprocessing import run_label
summary = glt.get_dataframe(["data/*.log"])
run_label(summary)

Output is a Series matching the current "Log" column for the glass4 examples, but built from the parameter values, so it does not rely on the file naming convention.

0     912-MIPFocus2-Presolve1
1     912-Cuts2-Heuristics0.1
2     912-Cuts2-Heuristics0.1
3     912-MIPFocus2-Presolve1
4     912-Cuts2-Heuristics0.1
               ...           
59    912-MIPFocus1-Presolve2
60                  912-Cuts1
61          912-Heuristics0.1
62          912-Heuristics0.0
63    912-Cuts2-Heuristics0.0
Length: 64, dtype: object

simonbowly · 2022-01-13T22:06:20Z

@ronaldvdv would this do what you are after for #13?

I left omit_params empty by default - I guess either MIPGap, TimeLimit or both might be sensible to exclude, but it depends on what you are analyzing, so I think it would be better for the user to set this explicitly.
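A minimal sketch of the labelling described above, assuming each run is available as a version string plus a dict of changed parameters. This is not the PR's actual run_label implementation: the helper name `build_label`, its arguments, and the rule for compacting the version string are assumptions made for illustration.

```python
from typing import Dict, Iterable, Union

ParamValue = Union[str, float, int]


def build_label(
    version: str,
    changed: Dict[str, ParamValue],
    omit_params: Iterable[str] = (),
) -> str:
    """Build a label such as '912-MIPFocus2-Presolve1' (hypothetical helper)."""
    omitted = set(omit_params)
    # Compact the version, '9.1.2' -> '912' (assumed convention, chosen to
    # match the labels shown in the output above).
    parts = [version.replace(".", "")]
    # Sort by parameter name so the label does not depend on dict ordering.
    parts += [
        f"{name}{value}"
        for name, value in sorted(changed.items())
        if name not in omitted
    ]
    return "-".join(parts)


print(build_label("9.1.2", {"Presolve": 1, "MIPFocus": 2}))
print(
    build_label(
        "9.1.2",
        {"Cuts": 2, "Heuristics": 0.1, "TimeLimit": 60},
        omit_params=["TimeLimit"],
    )
)
```

Leaving `omit_params` empty by default keeps every changed parameter in the label unless the caller excludes it explicitly, which is the behaviour argued for above.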

* replace parameter_change with check_parameter_change
* make the summary line one according to google style
* remove super and make the docstring according to google style
* fix the isort error
* add simplex parser
* fix the bug (typo)
* simplify the simplex parser
* adjust the tests
* add barrier parser
* remove the extra log example
* add lp examples
* add the barrier parser
* add simplex parser
* fix the imports
* remove tests for simplex
* add tests for continuous parser
* add two more comments

ronaldvdv · 2023-03-08T15:14:03Z

@simonbowly I would love to have this merged, with one change suggested. The run label now includes three components (version, changed parameters, seed). Can we also have the changed parameters, formatted slightly differently (CutPasses=1, Presolve=2 or Default as we often do internally) as a separate column? Happy to discuss and adjust the PR if you agree.

simonbowly · 2023-03-08T23:07:03Z

@ronaldvdv yes, I was playing with this last week. The code is in a completely different state from when this PR/issue was last touched, so it had to be rewritten. It no longer needs to be in post-processing, since the data is in a structured state before being extracted as a dataframe. Here's the behaviour:

>>> (
    glt.parse("data/*.log").summary()
    [['Version', 'Label', 'Seed', 'ChangedParams']]
    .sort_values(['Seed', 'ChangedParams']).head(20)
)
   Version                Label  Seed  ChangedParams
60   9.1.2              Default     0              0
6    9.1.2                Cuts0     0              1
15   9.1.2                Cuts1     0              1
24   9.1.2                Cuts2     0              1
27   9.1.2          Heuristics0     0              1
30   9.1.2        Heuristics0.1     0              1
39   9.1.2            MIPFocus1     0              1
45   9.1.2            MIPFocus2     0              1
51   9.1.2            MIPFocus3     0              1
54   9.1.2            Presolve1     0              1
57   9.1.2            Presolve2     0              1
0    9.1.2    Cuts0-Heuristics0     0              2
3    9.1.2  Cuts0-Heuristics0.1     0              2
9    9.1.2    Cuts1-Heuristics0     0              2
12   9.1.2  Cuts1-Heuristics0.1     0              2
18   9.1.2    Cuts2-Heuristics0     0              2
21   9.1.2  Cuts2-Heuristics0.1     0              2
33   9.1.2  MIPFocus1-Presolve1     0              2
36   9.1.2  MIPFocus1-Presolve2     0              2
42   9.1.2  MIPFocus2-Presolve1     0              2

'Label' lists non-default parameters, ignoring seed, logfile, and timelimit (Default indicates no changes). We could let the user customize that through an argument to the summary() method, e.g. to omit a different list of parameters from the labels or to include the version in the label.

Does that do the trick? Do the names make sense? You just want an = sign added in the params list?

ronaldvdv · 2023-03-09T14:55:15Z

Cool, I was already expecting some more hidden work behind the scenes :-D Looks good!

For the argument to customize: what if we allow the user to provide a callback that takes a dict (parameter->value mapping), and potentially other things like the version, and returns a string? The default value for that would be a function that mimics your formatting (MIPFocus1-Presolve1) and leaves out seed/logfile/timelimit, but all of that can be completely customized. My preferred formatting (MIPFocus=1, Presolve=1) could just be another function that's included and can be passed as the callback. If needed, we could provide a helper function that takes some settings (separator, parameters to ignore, version or not) and returns a callback.
For the column names, I feel it would make sense to make Label and ChangedParams more aligned. What about ChangedParams and NumChangedParams?
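The helper suggested above (settings in, callback out) could look roughly like the sketch below. Everything here is hypothetical: `make_label_callback`, its keyword arguments, and the exact parameter names in the default ignore list are illustrative choices, not part of grblogtools.

```python
from typing import Callable, Dict, Iterable, Union

ParamValue = Union[str, float, int]
LabelCallback = Callable[[Dict[str, ParamValue], str], str]


def make_label_callback(
    separator: str = "-",
    assign: str = "",
    ignore: Iterable[str] = ("Seed", "LogFile", "TimeLimit"),
    include_version: bool = False,
) -> LabelCallback:
    """Return a callback mapping (params, version) to a label string."""
    ignored = set(ignore)

    def callback(params: Dict[str, ParamValue], version: str) -> str:
        parts = [
            f"{name}{assign}{value}"
            for name, value in sorted(params.items())
            if name not in ignored
        ]
        # No remaining parameters means the run used default settings.
        label = separator.join(parts) if parts else "Default"
        return f"{version}{separator}{label}" if include_version else label

    return callback


compact = make_label_callback()  # MIPFocus1-Presolve1 style
verbose = make_label_callback(separator=", ", assign="=")  # MIPFocus=1, Presolve=1 style

params = {"Presolve": 1, "MIPFocus": 1, "Seed": 3}
print(compact(params, "9.1.2"))
print(verbose(params, "9.1.2"))
print(compact({"TimeLimit": 60}, "9.1.2"))
```

Both formatting styles mentioned in the thread come out of the same factory, so only the settings differ between them.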

simonbowly · 2023-03-10T08:51:56Z

Sure, a callback makes sense. I would not go with 'ChangedParams' if it might contain anything including the version though. The use of the field is for labelling in plots, so to me Label makes sense. However, we could have the callback return a dict so you can add whatever fields you want. e.g.

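from pathlib import Path
from typing import Dict, Union

def label_callback(params: Dict[str, Union[str, float, int]], model_name: str, model_path: Path, version: str, seed: int):
    ...  # compute some_label and some_count from the arguments
    return {'Label': some_label, 'NumChangedParams': some_count}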

and you get the fields you returned in the summary dataframe?
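To make the "fields you returned end up in the summary" idea concrete, here is a rough sketch using plain dicts in place of dataframe rows. The function `add_callback_fields`, the row layout, and the reduced callback signature (params, version, seed only) are assumptions for illustration and do not reflect how summary() is implemented.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def count_and_label(params: Dict[str, Any], version: str, seed: int) -> Row:
    """Example callback: returns the extra fields to add to a summary row."""
    label = "-".join(f"{k}{v}" for k, v in sorted(params.items())) or "Default"
    return {"Label": label, "NumChangedParams": len(params)}


def add_callback_fields(runs: List[Row], callback: Callable[..., Row]) -> List[Row]:
    """Merge whatever fields the callback returns into each run's row."""
    rows = []
    for run in runs:
        extra = callback(
            params=run["Params"], version=run["Version"], seed=run["Seed"]
        )
        # Fields returned by the callback are added alongside the existing ones.
        rows.append({**run, **extra})
    return rows


runs = [
    {"Version": "9.1.2", "Seed": 0, "Params": {}},
    {"Version": "9.1.2", "Seed": 0, "Params": {"Cuts": 2, "Heuristics": 0.1}},
]
for row in add_callback_fields(runs, count_and_label):
    print(row["Label"], row["NumChangedParams"])
```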

simonbowly · 2023-03-10T09:20:49Z

Discussed with @mattmilten, and we think this is becoming a bit complex. A simpler alternative: we provide a column "ChangedParams" which contains the parameter dictionary. Then, the user can use pandas functions like .apply to turn this into a string easily, with whatever column name they like. We can add some docs to show simple examples.

simonbowly · 2023-03-10T09:29:10Z

Essentially, it would allow you to do this:

>>> (
    glt.parse("data/*.log").summary()
    ['ChangedParams'].apply(lambda d: "-".join(f"{k}={v}" for k, v in d.items() if k != 'TimeLimit'))
    .head()
)
0      Heuristics=0-Cuts=0
1      Heuristics=0-Cuts=0
2      Heuristics=0-Cuts=0
3    Heuristics=0.1-Cuts=0
4    Heuristics=0.1-Cuts=0
Name: ChangedParams, dtype: object

which is exactly equivalent to providing a callback to summary(). @ronaldvdv what do you think?
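The same formatting can be written as a named function, which makes two details easier to see: joining over an empty dict yields an empty string, and the number of changed parameters is just the dict's length. The sketch below uses a plain list of dicts as stand-in data (the values are made up for illustration); `format_changed` is a hypothetical name.

```python
from typing import Dict, Union

ParamValue = Union[str, float, int]


def format_changed(d: Dict[str, ParamValue]) -> str:
    """Format a ChangedParams dict as 'Heuristics=0-Cuts=0', skipping TimeLimit."""
    label = "-".join(f"{k}={v}" for k, v in d.items() if k != "TimeLimit")
    # An empty join gives "", so fall back to an explicit marker for default runs.
    return label or "Default"


# Stand-in for the contents of a 'ChangedParams' column (illustrative values).
changed_params = [
    {"Heuristics": 0, "Cuts": 0, "TimeLimit": 60},
    {"Heuristics": 0.1, "Cuts": 0},
    {"TimeLimit": 60},
]
labels = [format_changed(d) for d in changed_params]
counts = [len(d) for d in changed_params]
print(labels)
print(counts)
```

With pandas, the function could be passed to `.apply` on the 'ChangedParams' column in place of the lambda, and the result assigned to whatever column name suits the analysis.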

ronaldvdv · 2023-03-10T09:40:42Z

Yes, love it! Much better to have the original data (parameter dictionary) available to the end user.

Adds a 'ChangedParams' column to the summary dataframe, containing a dictionary of non-default parameter values for the run.

simonbowly · 2023-04-26T05:35:50Z

@mattmilten can you please review? This would close #13.

I think a 2.1.0 can be released after merging this, there are a few unreleased changes waiting.

mattmilten · 2023-04-26T11:06:43Z

Looks great, thanks!

simonbowly force-pushed the postprocessing branch from 9380b6c to 2605c6a Compare March 8, 2023 23:09

simonbowly force-pushed the postprocessing branch from 2605c6a to 449ceb4 Compare April 26, 2023 05:08

simonbowly requested a review from mattmilten April 26, 2023 05:11

simonbowly added 2 commits April 26, 2023 15:32

Add ChangedParams column

24f8fe7

Adds a 'ChangedParams' column to the summary dataframe, containing a dictionary of non-default parameter values for the run.

Update changelog

aa53003

simonbowly force-pushed the postprocessing branch from e52d3ec to aa53003 Compare April 26, 2023 05:34

mattmilten merged commit bf26a3c into Gurobi:master Apr 26, 2023

simonbowly deleted the postprocessing branch April 26, 2023 12:53

Gurobi deleted a comment from gbkgwyneth Apr 26, 2023
