fix: addressing indeterminate Map orderings in tests #220

kaiyaok2 · 2022-03-04T12:48:01Z

Description

Using Nondex when running command mvn edu.illinois:nondex-maven-plugin:1.1.2:nondex in the tests directory after installing all dependencies, several tests in CSVSaverWithMultiOutputsTest.java and EnsembleExportTest.java produce flaky results (nondeterministically pass or fail). The reason is that the outputs in the tests are dependent on iterators for hashmaps, which does not guarantee a fixed order by default. (From the Javadoc: "HashMap class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.") For example, in the function save(Path csvPath, Dataset<T> dataset, Set<String> responseNames) in CSVSaver.java, the code is using the iterator defined for ImmutableFeatureMap that deals with a Collection of values in a HashMap. Therefore, there is no guaranteed order for the entries in headerLine[], which leads to possible test flakiness.

To resolve the problem, I changed the code to construct a sorted ArrayList for all keys in the referenced HashMap to ensure determinate iteration order.

Motivation

The change is necessary as related functionalities / tests may fail when, for example, the Java version upgrades in the future, or when the code is run in a different environment.

OCA

https://oca.opensource.oracle.com/api/v1/oca-requests/6710/documents/6730/download

Signed-off-by: Kaiyao Ke <kaiyaok2@illinois.edu>

Craigacp · 2022-03-15T14:31:20Z

Unfortunately switching the label maps over to LinkedHashMap isn't a serialization compatible change as it doesn't fix any issues with existing models. The CSVSaver change is useful though as we shouldn't have been depending on the variable ordering there.

There are a few more tests which have issues when run under nondex, but I've not figured out the root cause of all of them yet. Some of them are due to a lack of label ordering in the evaluations which is straightforward to fix (and mostly is due to the toString based tests needing an exact string match), but others have picked up some order dependence from somewhere I can't quite identify.

kaiyaok2 · 2022-03-15T17:31:32Z

Hi there, thanks for the response!
The LinkedHashMap approach is to deal with 2 flaky tests in EnsembleExportTest.java: testHeterogeneousClassificationExport and testHeterogeneousMultiLabelExport. Currently the constructors for ImmutableMultiLabelInfo and ImmutableLabelInfo are using Map.Entry<String,MutableLong> e : labelCounts.entrySet() to setup the 2 HashMaps idLabelMap and LabelIDMap. We need to make these two maps order deterministic to avoid flakiness in the 2 tests.
(e.g. line 226 in EnsembleExportTest.java calls ensemble.BaggingTrainer.train(), which then calls MutableDataset.getOutputIDInfo(). In that function the code calls multilabel.MultiLabelInfo.generateImmutableOutputInfo(), which finally calls the constructor for ImmutableMultiLabelInfo.)
Currently nondex is not reporting flakiness in the tests directory other than CSVSaverWithMultiOutputsTest.java and EnsembleExportTest.java, so are the other flaky tests you mentioned in modules other than tests?
Also if the LinkedHashMap approach is problematic I can definitely first update the PR without including those changes~

Craigacp · 2022-03-15T19:40:35Z

Sure, but I'd prefer the code not to rely upon iteration order in the output infos, as it isn't an invariant guaranteed by the output info contract (and we had a bug in v4.0 and v4.1 due to issues where the regression code was depending on iteration order when it shouldn't have been). I ran nondex on an internal repo we use for validating Tribuo releases which has all the public Tribuo tests in a single project, and hit the errors you found as well as some others. This branch fixes some of the flakyness, but I've not had chance to run down all of it (Craigacp@d9e5825). There were some failures in the way that the single label tests were run, K-Means reproducibility test, and those in the ensemble export you found which haven't been fixed that way, which suggests there is a deeper bug or we're depending on label order when we shouldn't be.

kaiyaok2 · 2022-03-16T07:05:45Z

I updated a commit with a fix without switching to LinkedHashMap.
I used some breakpoints to find the cause of flakiness in EnsembleExportTest. Bascially the bug shall just be label order issues (rather than something deep):

the setup of idLabelMap(line 63 in ImmutableMultiLabelInfo.java and line 64 in ImmutableLabelInfo.java) used values retrieved from an iterator from entrySet() and then used it as the value corresponding to the integer key counter. Then as entrySet() allows permutation, idLabelMap can be something like [1 -> a, 2 -> b] in one run and [1 -> b, 2 -> a] in another run. We need to fix iteration order to avoid this.
in line 177, the test calls WeightedEnsembleModel.createEnsembleFromExistingModels(). Then in lines 197 and 203 in WeightedEnsembleModel.java, the current code calls the iterator of ImmutableOutputInfo<T> to add elements into a list. However, outputInfo is an instance of ImmutableInfoIterator (which implements the ImmutableOutputInfo interface), and its defined iterator is inherited from an iterator of an unordered entryset of a hashmap (line 198 in ImmutableMultiLabelInfo.java). Therefore, the iterator does not guarantee any order, so there can be order permutations in the lists firstList and comparisonList, leading to the exception thrown in line 207 of WeightedEnsembleModel.java. This issue can be resolved if we change the lists into sets to allow order permutation.

Also I updated the solution to flakiness in CSVSaver.

Craigacp · 2022-03-16T15:26:15Z

WeightedEnsembleModel and ImmutableMultiLabelInfo/ImmutableLabelInfo shouldn't need the models which use them to have the same indices. I think it's probably a problem in the onnx export which is making an assumption that the indices are all the same when this is in general not true. There should be a permute operation as part of the export of the combiner, which is probably necessary for all the outputs.

kaiyaok2 · 2022-03-16T16:16:27Z

I wonder if I understand your message correctly - is it undesirable to sort the keyset?

Craigacp · 2022-03-16T16:21:41Z

The indices for both features and output dimensions are considered an internal implementation detail of a model, and we try pretty hard to not have them leak out of a model, even to other models within an ensemble. The feature ids are guaranteed (and documented) to be in the lexicographic order of the feature names, though even this might be worth reconsidering in a future version of Tribuo. There's no documented constraint on how the output ids are generated, and so anything that depends on them being generated in a specific order is a bug. Fixing the problem by adding a constraint on the ordering is going to make more bits of Tribuo depend on that ordering, rather than less, and that will make it harder to evolve the system in the future.

kaiyaok2 · 2022-03-16T17:42:13Z

Thanks for your clarification! I understand the problem now.
I am pretty sure that if we do not enforce any order on output ids, then line 207 of WeightedEnsembleModel.java will always return an exception upon permutations of iteration order. If this line is commented out, then assertion errors occur at lines 94 and 160 in OnnxTestUtils.java. tribuo.getOutput().getLabel() and external.getOutput().getLabel() shall both return [(Y_2,1.0), (Y_0,1.0)], but external.getOutput().getLabel() returns [(Y_1,0.699999988079071), (Y_2,1.0)]. So i think the issue is indeed in onnx though.

Craigacp · 2022-03-16T18:00:11Z

Sorting inside WeightedEnsembleModel.createFromExistingModels to check that the output domains are equal is completely fine, it's just that it can't depend on a direct comparison. To be honest it may be better to add an domainEquals method to the OutputInfo interface which does this check, rather than having it be externalised in WeightedEnsembleModel, but I've not thought through what the exact contract is there.

After looking at the ONNX export of WeightedEnsembleModel that's definitely not robust to the ensemble members having different output indices (or feature indices for that matter). It needs a permute operation added to the input and output of each ensemble member (or an identity node if we know that they are all the same). That's a fairly complicated and involved change so I'll put it on our todo list.

kaiyaok2 · 2022-03-16T18:21:25Z

I agree that changes to ONNX shall be nontrivial, so I updated this pr to only include changes for CSVSaver. Appreciate your time and patience looking into this!

Craigacp

LGTM thanks.

Craigacp · 2022-03-22T13:51:19Z

Thanks for flagging the iteration order issues. I've got a fix for the ensemble export, and some label ordering flakiness in the other tests which I'll make a PR for shortly.

kaiyaok2 · 2022-03-23T10:27:45Z

Thanks for flagging the iteration order issues. I've got a fix for the ensemble export, and some label ordering flakiness in the other tests which I'll make a PR for shortly.

No problem! Glad it works out

* Fixed Flakiness Signed-off-by: Kaiyao Ke <kaiyaok2@illinois.edu> * Fixed Flakiness Signed-off-by: Kaiyao Ke <kaiyaok2@illinois.edu> * removed need for linkedhashmaps * removed print message * include only csvsaver changes * removed print message

kaiyaok2 added 2 commits March 4, 2022 20:19

Fixed Flakiness

81c97f8

Signed-off-by: Kaiyao Ke <kaiyaok2@illinois.edu>

Fixed Flakiness

2d89318

Signed-off-by: Kaiyao Ke <kaiyaok2@illinois.edu>

kaiyaok2 and others added 2 commits March 16, 2022 14:58

removed need for linkedhashmaps

a526552

Merge branch 'oracle:main' into main

04d94f5

kaiyaok2 added 2 commits March 16, 2022 15:12

removed print message

f6a551d

Merge branch 'main' of https://github.com/kaiyaok2/tribuo into main

b9b7d83

kaiyaok2 added 2 commits March 17, 2022 02:16

include only csvsaver changes

3b52d4c

removed print message

dcaf760

Craigacp added the OCA signed This PR is from a person/organisation who has signed the OCA label Mar 22, 2022

Craigacp approved these changes Mar 22, 2022

View reviewed changes

Craigacp merged commit 7da1456 into oracle:main Mar 22, 2022

Craigacp mentioned this pull request Mar 22, 2022

Fixes for HashSet iteration order problems #225

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: addressing indeterminate Map orderings in tests #220

fix: addressing indeterminate Map orderings in tests #220

kaiyaok2 commented Mar 4, 2022 •

edited

Loading

Craigacp commented Mar 15, 2022

kaiyaok2 commented Mar 15, 2022 •

edited

Loading

Craigacp commented Mar 15, 2022

kaiyaok2 commented Mar 16, 2022 •

edited

Loading

Craigacp commented Mar 16, 2022

kaiyaok2 commented Mar 16, 2022

Craigacp commented Mar 16, 2022

kaiyaok2 commented Mar 16, 2022

Craigacp commented Mar 16, 2022

kaiyaok2 commented Mar 16, 2022

Craigacp left a comment

Craigacp commented Mar 22, 2022

kaiyaok2 commented Mar 23, 2022

fix: addressing indeterminate Map orderings in tests #220

fix: addressing indeterminate Map orderings in tests #220

Conversation

kaiyaok2 commented Mar 4, 2022 • edited Loading

Description

Motivation

OCA

Craigacp commented Mar 15, 2022

kaiyaok2 commented Mar 15, 2022 • edited Loading

Craigacp commented Mar 15, 2022

kaiyaok2 commented Mar 16, 2022 • edited Loading

Craigacp commented Mar 16, 2022

kaiyaok2 commented Mar 16, 2022

Craigacp commented Mar 16, 2022

kaiyaok2 commented Mar 16, 2022

Craigacp commented Mar 16, 2022

kaiyaok2 commented Mar 16, 2022

Craigacp left a comment

Choose a reason for hiding this comment

Craigacp commented Mar 22, 2022

kaiyaok2 commented Mar 23, 2022

kaiyaok2 commented Mar 4, 2022 •

edited

Loading

kaiyaok2 commented Mar 15, 2022 •

edited

Loading

kaiyaok2 commented Mar 16, 2022 •

edited

Loading