Add a search filter to merge models #1986

cjmcgill · 2020-06-26T00:52:25Z

Motivation or Problem

The mergeModels.py script runs very slowly for large models, taking hours or days for moderately large models. This appears to be driven by the number of isomorphic checks that must be run, with every species from model2 having to be compared against most species in model1 and the same for reactions.

Description of Changes

Added a molecular weight filter for the species search, where species from model2 will only be compared against species in model1 with the same molecular weight. Similarly for reaction comparisons, reactions will only be compared against reactions that have the same sum of molecular weights for their reactants. This is done by placing the species and reactions in search dictionaries keyed by their molecular weights.
After a species (or reaction) from model2 is matched to one from model1, that species is removed from the model1 search dictionary. After a match, no subsequent species in model2 should be expected to match that species again. This is a secondary speed up function, only becoming significant when the model is very large and has many species or reactions with the same weight.

Testing

Passed the unit test for mergeModelsTest.py.

Tested a combination of the outputs from rmg examples ethane-oxidation and ch3no2. Same number of species and reactions in the result for this branch as for the master.

Merged a model for dodecane pyrolysis with itself (438 species, 12510 reactions). No erroneous nonmatches were added. The run time on a laptop was reduced from 35 minutes to 28 seconds. This speed increase was achieved with just the first change. A model large enough to observe benefits from the second change has not been attempted.

Reviewer Tips

The second change would affect how duplicate reactions are handled by the merge function. As of now, the merge function would block any number of reactions in the merging model as long as there was at least 1 identical reaction in the original. This change would allow through a number of reactions in excess of what was in the original (I believe chosen at random). This would be a change for when the merging model has more versions of a reaction than the original, but it's not entirely clear that duplicate reactions are being handled well as it stands now. This has the potential to provide a further speed increase with uncertain implications for the edge case where there are duplicate reactions.

I'm not advocating for this necessarily if this is any kind of regularly encountered situation, but the existing handling of duplicates is somewhat strange so I don't know if this is disqualifying.

Example for A reactions in original merged with B reactions in new model.
Current form:

	0	B1	B1B2	B1B2B3
0	0	B1	B1B2	B1B2B3
A1	A1	A1	A1	A1
A1A2	A1A2	A1A2	A1A2	A1A2

With Change:

	0	B1	B1B2	B1B2B3
0	0	B1	B1B2	B1B2B3
A1	A1	A1	A1B2	A1B2B3
A1A2	A1A2	A1A2	A1A2	A1A2B3

codecov · 2020-06-26T01:49:15Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (300c782) 55.32% compared to head (4a35b7c) 55.33%.
Report is 66 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1986      +/-   ##
==========================================
+ Coverage   55.32%   55.33%   +0.01%     
==========================================
  Files         125      125              
  Lines       37143    37158      +15     
==========================================
+ Hits        20550    20563      +13     
- Misses      16593    16595       +2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mjohnson541 · 2020-07-01T20:34:09Z

Once you've done the species comparison you can map names between models and you know what species are only in model 1 and what species are only in model 2. When comparing reactions can't you just compare whether not there's a reaction with the same appropriately mapped reactant and product names? For example for a reaction from model 1 can't you just convert the names to model 2 (if you can't it's unique) and then compare the names with the reactant and product names for each reaction in model 2?

cjmcgill · 2020-07-03T18:36:10Z

Once you've done the species comparison you can map names between models and you know what species are only in model 1 and what species are only in model 2. When comparing reactions can't you just compare whether not there's a reaction with the same appropriately mapped reactant and product names? For example for a reaction from model 1 can't you just convert the names to model 2 (if you can't it's unique) and then compare the names with the reactant and product names for each reaction in model 2?

Absolutely. I'll take another spin at doing the comparison that way. The name comparison is a lot faster than the isomorphism check so that would speed things up a lot. Without the filter and the dictionary key check, it will still be doing this check a lot of times, we'll see what speed looks like. Maybe I can find a way to key a dictionary with a name set for a filtered search.

Seems like the molecular weight check is still something that will work well for the species search even if I take it off of the reaction search.

This approach does rest on the assumption that the original model will not have species that are isomorphic matches with each other. This is probably an assumption we are comfortable with.

The question about how to deal with duplicates remains. Though I think I'm coming back around to the position that we shouldn't add any number of instances of a reaction if there are any versions of it in the original model.

github-actions · 2023-06-21T22:48:44Z

This pull request is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant pull request, otherwise it will automatically be closed in 30 days.

github-actions · 2023-09-20T08:07:41Z

This pull request is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant pull request, otherwise it will automatically be closed in 30 days.

github-actions · 2023-09-28T22:13:44Z

Regression Testing Results

⚠️ One or more regression tests failed.
Please download the failed results and run the tests locally or check the log to see why.

Detailed regression test results.

Regression test aromatics:

Reference: Execution time (DD:HH:MM:SS): 00:00:01:24
Current: Execution time (DD:HH:MM:SS): 00:00:01:29
Reference: Memory used: 2109.12 MB
Current: Memory used: 2103.01 MB

aromatics Passed Core Comparison ✅

Original model has 15 species.
Test model has 15 species. ✅
Original model has 11 reactions.
Test model has 11 reactions. ✅

aromatics Passed Edge Comparison ✅

Original model has 106 species.
Test model has 106 species. ✅
Original model has 358 reactions.
Test model has 358 reactions. ✅

Observables Test Case: Aromatics Comparison

✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions!

aromatics Passed Observable Testing ✅

Regression test liquid_oxidation:

Reference: Execution time (DD:HH:MM:SS): 00:00:02:52
Current: Execution time (DD:HH:MM:SS): 00:00:02:59
Reference: Memory used: 2219.38 MB
Current: Memory used: 2212.44 MB

liquid_oxidation Failed Core Comparison ❌

Original model has 37 species.
Test model has 37 species. ✅
Original model has 215 reactions.
Test model has 215 reactions. ✅

Non-identical kinetics! ❌
original:
rxn: CCCC(C)O[O](20) + CCCCCO[O](103) <=> oxygen(1) + CCCC(C)[O](64) + CCCCC[O](128) origin: Peroxyl_Disproportionation
tested:
rxn: CCCC(C)O[O](20) + CCCCCO[O](103) <=> oxygen(1) + CCCC(C)[O](64) + CCCCC[O](128) origin: Peroxyl_Disproportionation

k(1bar)	300K	400K	500K	600K	800K	1000K	1500K	2000K
k(T):	7.83	7.49	7.23	7.02	6.68	6.42	5.95	5.61
k(T):	3.77	4.45	4.86	5.14	5.48	5.68	5.96	6.09

kinetics: Arrhenius(A=(3.18266e+20,'cm^3/(mol*s)'), n=-2.694, Ea=(0,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing""")
kinetics: Arrhenius(A=(3.2e+12,'cm^3/(mol*s)'), n=0, Ea=(3.756,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing_Ext-5R-R""")
kinetics: Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing
kinetics: Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing_Ext-5R-R

liquid_oxidation Failed Edge Comparison ❌

Original model has 202 species.
Test model has 202 species. ✅
Original model has 1613 reactions.
Test model has 1610 reactions. ❌
The original model has 4 reactions that the tested model does not have. ❌
rxn: C[CH]CCCO(157) + CCCCCO[O](103) <=> CC=CCCO(183) + CCCCCOO(105) origin: Disproportionation
rxn: C[CH]CCCO(157) + CCCCCO[O](103) <=> C=CCCCO(184) + CCCCCOO(105) origin: Disproportionation
rxn: C[CH]CCCO(157) + C[CH]CCCO(157) <=> CC=CCCO(183) + CCCCCO(130) origin: Disproportionation
rxn: C[CH]CCCO(157) + C[CH]CCCO(157) <=> C=CCCCO(184) + CCCCCO(130) origin: Disproportionation
The tested model has 1 reactions that the original model does not have. ❌
rxn: CCCCCO[O](103) + CCCCCO[O](103) <=> oxygen(1) + CCCCC=O(120) + CCCCCO(130) origin: Peroxyl_Termination

Non-identical kinetics! ❌
original:
rxn: CCCC(C)O[O](20) + CCCCCO[O](103) <=> oxygen(1) + CCCC(C)[O](64) + CCCCC[O](128) origin: Peroxyl_Disproportionation
tested:
rxn: CCCC(C)O[O](20) + CCCCCO[O](103) <=> oxygen(1) + CCCC(C)[O](64) + CCCCC[O](128) origin: Peroxyl_Disproportionation

k(1bar)	300K	400K	500K	600K	800K	1000K	1500K	2000K
k(T):	7.83	7.49	7.23	7.02	6.68	6.42	5.95	5.61
k(T):	3.77	4.45	4.86	5.14	5.48	5.68	5.96	6.09

kinetics: Arrhenius(A=(3.18266e+20,'cm^3/(mol*s)'), n=-2.694, Ea=(0,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing""")
kinetics: Arrhenius(A=(3.2e+12,'cm^3/(mol*s)'), n=0, Ea=(3.756,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing_Ext-5R-R""")
kinetics: Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing
kinetics: Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing_Ext-5R-R

Observables Test Case: liquid_oxidation Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

liquid_oxidation Passed Observable Testing ✅

Regression test nitrogen:

Reference: Execution time (DD:HH:MM:SS): 00:00:01:47
Current: Execution time (DD:HH:MM:SS): 00:00:02:04
Reference: Memory used: 2236.55 MB
Current: Memory used: 2213.81 MB

nitrogen Failed Core Comparison ❌

Original model has 41 species.
Test model has 41 species. ✅
Original model has 360 reactions.
Test model has 359 reactions. ❌
The original model has 1 reactions that the tested model does not have. ❌
rxn: HNO(48) + HCO(13) <=> NO(38) + CH2O(18) origin: H_Abstraction

nitrogen Failed Edge Comparison ❌

Original model has 132 species.
Test model has 132 species. ✅
Original model has 997 reactions.
Test model has 995 reactions. ❌
The original model has 2 reactions that the tested model does not have. ❌
rxn: HNO(48) + HCO(13) <=> NO(38) + CH2O(18) origin: H_Abstraction
rxn: HON(T)(83) + HCO(13) <=> NO(38) + CH2O(18) origin: Disproportionation

Observables Test Case: NC Comparison

✅ All Observables varied by less than 0.200 on average between old model and new model in all conditions!

nitrogen Passed Observable Testing ✅

Regression test oxidation:

Reference: Execution time (DD:HH:MM:SS): 00:00:03:17
Current: Execution time (DD:HH:MM:SS): 00:00:03:52
Reference: Memory used: 2089.97 MB
Current: Memory used: 2086.11 MB

oxidation Passed Core Comparison ✅

Original model has 59 species.
Test model has 59 species. ✅
Original model has 694 reactions.
Test model has 694 reactions. ✅

oxidation Passed Edge Comparison ✅

Original model has 230 species.
Test model has 230 species. ✅
Original model has 1526 reactions.
Test model has 1526 reactions. ✅

Observables Test Case: Oxidation Comparison

✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions!

oxidation Passed Observable Testing ✅

Regression test sulfur:

Reference: Execution time (DD:HH:MM:SS): 00:00:01:08
Current: Execution time (DD:HH:MM:SS): 00:00:01:14
Reference: Memory used: 2183.42 MB
Current: Memory used: 2182.35 MB

sulfur Passed Core Comparison ✅

Original model has 27 species.
Test model has 27 species. ✅
Original model has 74 reactions.
Test model has 74 reactions. ✅

sulfur Failed Edge Comparison ❌

Original model has 89 species.
Test model has 89 species. ✅
Original model has 227 reactions.
Test model has 227 reactions. ✅
The original model has 1 reactions that the tested model does not have. ❌
rxn: O(4) + SO2(15) (+N2) <=> SO3(16) (+N2) origin: primarySulfurLibrary
The tested model has 1 reactions that the original model does not have. ❌
rxn: O(4) + SO2(15) (+N2) <=> SO3(16) (+N2) origin: primarySulfurLibrary

Observables Test Case: SO2 Comparison

The following observables did not match:

❌ Observable species O=S=O varied by more than 0.100 on average between old model SO2(15) and new model SO2(15) in condition 1.

⚠️ The following reaction conditions had some discrepancies:
Condition 1:
Reactor Type: IdealGasReactor
Reaction Time: 0.01 s
T0: 900 K
P0: 30 bar
Initial Mole Fractions: {'S': 0.000756, '[O][O]': 0.00129, 'N#N': 0.997954}

sulfur Failed Observable Testing ❌

Regression test superminimal:

Reference: Execution time (DD:HH:MM:SS): 00:00:00:43
Current: Execution time (DD:HH:MM:SS): 00:00:00:49
Reference: Memory used: 2351.65 MB
Current: Memory used: 2346.19 MB

superminimal Passed Core Comparison ✅

Original model has 13 species.
Test model has 13 species. ✅
Original model has 21 reactions.
Test model has 21 reactions. ✅

superminimal Failed Edge Comparison ❌

Original model has 18 species.
Test model has 18 species. ✅
Original model has 28 reactions.
Test model has 28 reactions. ✅

Non-identical kinetics! ❌
original:
rxn: O2(2) + [O]O(4) => [O]OO[O](17) + [H](3) origin: PDepNetwork #5
tested:
rxn: O2(2) + [O]O(4) => [O]OO[O](17) + [H](3) origin: PDepNetwork #5

k(1bar)	300K	400K	500K	600K	800K	1000K	1500K	2000K
k(T):	-83.19	-61.32	-48.07	-39.17	-27.88	-20.98	-11.59	-6.79
k(T):	-82.04	-60.38	-47.27	-38.46	-27.30	-20.50	-11.24	-6.53

kinetics: Chebyshev(coeffs=[[-39.45,-3.91e-05,-2.721e-05,-1.511e-05],[38.1,-3.967e-05,-2.761e-05,-1.533e-05],[0.4317,-1.786e-05,-1.243e-05,-6.901e-06],[0.1078,-7.486e-06,-5.21e-06,-2.893e-06],[0.02697,-2.978e-06,-2.072e-06,-1.15e-06],[-0.006763,-1.152e-06,-8.017e-07,-4.45e-07]], kunits='cm^3/(mol*s)', Tmin=(300,'K'), Tmax=(2000,'K'), Pmin=(0.01,'atm'), Pmax=(98.692,'atm'))
kinetics: Chebyshev(coeffs=[[-38.7,-3.614e-05,-2.516e-05,-1.397e-05],[37.66,-3.773e-05,-2.626e-05,-1.458e-05],[0.396,-1.739e-05,-1.21e-05,-6.721e-06],[0.09623,-7.406e-06,-5.155e-06,-2.862e-06],[0.01887,-2.94e-06,-2.046e-06,-1.136e-06],[-0.003473,-1.16e-06,-8.076e-07,-4.483e-07]], kunits='cm^3/(mol*s)', Tmin=(300,'K'), Tmax=(2000,'K'), Pmin=(0.01,'atm'), Pmax=(98.692,'atm'))
Identical kinetics comments:
kinetics:

Regression test RMS_constantVIdealGasReactor_superminimal:

Reference: Execution time (DD:HH:MM:SS): 00:00:03:17
Current: Execution time (DD:HH:MM:SS): 00:00:03:24
Reference: Memory used: 2736.23 MB
Current: Memory used: 2729.36 MB

RMS_constantVIdealGasReactor_superminimal Passed Core Comparison ✅

Original model has 13 species.
Test model has 13 species. ✅
Original model has 19 reactions.
Test model has 19 reactions. ✅

RMS_constantVIdealGasReactor_superminimal Passed Edge Comparison ✅

Original model has 13 species.
Test model has 13 species. ✅
Original model has 19 reactions.
Test model has 19 reactions. ✅

Observables Test Case: RMS_constantVIdealGasReactor_superminimal Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

RMS_constantVIdealGasReactor_superminimal Passed Observable Testing ✅

Regression test RMS_CSTR_liquid_oxidation:

Reference: Execution time (DD:HH:MM:SS): 00:00:08:09
Current: Execution time (DD:HH:MM:SS): 00:00:08:28
Reference: Memory used: 2730.66 MB
Current: Memory used: 2705.98 MB

RMS_CSTR_liquid_oxidation Failed Core Comparison ❌

Original model has 37 species.
Test model has 37 species. ✅
Original model has 232 reactions.
Test model has 233 reactions. ❌
The tested model has 1 reactions that the original model does not have. ❌
rxn: CCO[O](36) <=> [OH](22) + CC=O(62) origin: intra_H_migration

RMS_CSTR_liquid_oxidation Failed Edge Comparison ❌

Original model has 206 species.
Test model has 206 species. ✅
Original model has 1508 reactions.
Test model has 1508 reactions. ✅
The original model has 2 reactions that the tested model does not have. ❌
rxn: CCCO[O](36) <=> CC[CH]OO(45) origin: intra_H_migration
rxn: CCO[O](34) <=> C[CH]OO(62) origin: intra_H_migration
The tested model has 2 reactions that the original model does not have. ❌
rxn: CCO[O](36) <=> [OH](22) + CC=O(62) origin: intra_H_migration
rxn: CCCO[O](34) <=> [OH](22) + CCC=O(44) origin: intra_H_migration

Non-identical kinetics! ❌
original:
rxn: CCCO[O](36) + CCCC(C)O[O](33) <=> oxygen(1) + CCC[O](96) + CCCC(C)[O](61) origin: Peroxyl_Disproportionation
tested:
rxn: CCCO[O](34) + CCCC(C)O[O](33) <=> oxygen(1) + CCC[O](98) + CCCC(C)[O](65) origin: Peroxyl_Disproportionation

k(1bar)	300K	400K	500K	600K	800K	1000K	1500K	2000K
k(T):	3.69	4.39	4.82	5.10	5.45	5.66	5.94	6.08
k(T):	7.83	7.49	7.23	7.02	6.68	6.42	5.95	5.61

kinetics: Arrhenius(A=(3.2e+12,'cm^3/(mol*s)'), n=0, Ea=(3.866,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing_Ext-5R-R""")
kinetics: Arrhenius(A=(3.18266e+20,'cm^3/(mol*s)'), n=-2.694, Ea=(0,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing""")
kinetics: Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing_Ext-5R-R
kinetics: Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing

Observables Test Case: RMS_CSTR_liquid_oxidation Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

RMS_CSTR_liquid_oxidation Passed Observable Testing ✅

beep boop this comment was written by a bot 🤖

github-actions · 2023-12-28T08:07:47Z

This pull request is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant pull request, otherwise it will automatically be closed in 30 days.

JacksonBurns · 2024-01-07T22:44:10Z

All parties originally involved are otherwise engaged, and this is a nice to have not a need to have - closing this.

cjmcgill added 4 commits June 17, 2020 22:22

Added molecular weight filter for merge model search

8b7083d

Remove search targets from merge model after matched once

eba43bb

Removed an unnecessary type check

fd6e10b

Added comments and clarified a variable name

893bc9e

cjmcgill requested review from mjohnson541 and amarkpayne June 26, 2020 00:52

cjmcgill self-assigned this Jun 26, 2020

amarkpayne changed the base branch from master to main October 20, 2021 03:55

github-actions bot added the stale stale issue/PR as determined by actions bot label Jun 21, 2023

amarkpayne removed their request for review June 22, 2023 00:21

github-actions bot removed the stale stale issue/PR as determined by actions bot label Jun 22, 2023

github-actions bot added the stale stale issue/PR as determined by actions bot label Sep 20, 2023

Merge branch 'main' into merge_models

4a35b7c

github-actions bot removed the stale stale issue/PR as determined by actions bot label Sep 29, 2023

github-actions bot added stale stale issue/PR as determined by actions bot and removed stale stale issue/PR as determined by actions bot labels Dec 28, 2023

JacksonBurns closed this Jan 7, 2024

JacksonBurns deleted the merge_models branch January 7, 2024 22:44

JacksonBurns added the abandoned abandoned issue/PR as determined by actions bot label Jan 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a search filter to merge models #1986

Add a search filter to merge models #1986

cjmcgill commented Jun 26, 2020 •

edited

Loading

codecov bot commented Jun 26, 2020 •

edited

Loading

mjohnson541 commented Jul 1, 2020

cjmcgill commented Jul 3, 2020

github-actions bot commented Jun 21, 2023

github-actions bot commented Sep 20, 2023

github-actions bot commented Sep 28, 2023

Regression test aromatics:

Regression test liquid_oxidation:

Regression test nitrogen:

Regression test oxidation:

Regression test sulfur:

Regression test superminimal:

Regression test RMS_constantVIdealGasReactor_superminimal:

Regression test RMS_CSTR_liquid_oxidation:

github-actions bot commented Dec 28, 2023

JacksonBurns commented Jan 7, 2024

Add a search filter to merge models #1986

Add a search filter to merge models #1986

Conversation

cjmcgill commented Jun 26, 2020 • edited Loading

Motivation or Problem

Description of Changes

Testing

Reviewer Tips

codecov bot commented Jun 26, 2020 • edited Loading

Codecov Report

mjohnson541 commented Jul 1, 2020

cjmcgill commented Jul 3, 2020

github-actions bot commented Jun 21, 2023

github-actions bot commented Sep 20, 2023

github-actions bot commented Sep 28, 2023

Regression Testing Results

Regression test aromatics:

Regression test liquid_oxidation:

Regression test nitrogen:

Regression test oxidation:

Regression test sulfur:

Regression test superminimal:

Regression test RMS_constantVIdealGasReactor_superminimal:

Regression test RMS_CSTR_liquid_oxidation:

github-actions bot commented Dec 28, 2023

JacksonBurns commented Jan 7, 2024

cjmcgill commented Jun 26, 2020 •

edited

Loading

codecov bot commented Jun 26, 2020 •

edited

Loading