Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: addressing indeterminate Map orderings in tests #220

Merged
merged 8 commits into from
Mar 22, 2022

Conversation

kaiyaok2
Copy link
Contributor

@kaiyaok2 kaiyaok2 commented Mar 4, 2022

Description

Using Nondex when running command mvn edu.illinois:nondex-maven-plugin:1.1.2:nondex in the tests directory after installing all dependencies, several tests in CSVSaverWithMultiOutputsTest.java and EnsembleExportTest.java produce flaky results (nondeterministically pass or fail). The reason is that the outputs in the tests are dependent on iterators for hashmaps, which does not guarantee a fixed order by default. (From the Javadoc: "HashMap class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.") For example, in the function save(Path csvPath, Dataset<T> dataset, Set<String> responseNames) in CSVSaver.java, the code is using the iterator defined for ImmutableFeatureMap that deals with a Collection of values in a HashMap. Therefore, there is no guaranteed order for the entries in headerLine[], which leads to possible test flakiness.

To resolve the problem, I changed the code to construct a sorted ArrayList for all keys in the referenced HashMap to ensure determinate iteration order.

Motivation

The change is necessary as related functionalities / tests may fail when, for example, the Java version upgrades in the future, or when the code is run in a different environment.

OCA

https://oca.opensource.oracle.com/api/v1/oca-requests/6710/documents/6730/download

kaiyaok2 added 2 commits March 4, 2022 20:19
Signed-off-by: Kaiyao Ke <kaiyaok2@illinois.edu>
Signed-off-by: Kaiyao Ke <kaiyaok2@illinois.edu>
@Craigacp
Copy link
Member

Unfortunately switching the label maps over to LinkedHashMap isn't a serialization compatible change as it doesn't fix any issues with existing models. The CSVSaver change is useful though as we shouldn't have been depending on the variable ordering there.

There are a few more tests which have issues when run under nondex, but I've not figured out the root cause of all of them yet. Some of them are due to a lack of label ordering in the evaluations which is straightforward to fix (and mostly is due to the toString based tests needing an exact string match), but others have picked up some order dependence from somewhere I can't quite identify.

@kaiyaok2
Copy link
Contributor Author

kaiyaok2 commented Mar 15, 2022

Hi there, thanks for the response!
The LinkedHashMap approach is to deal with 2 flaky tests in EnsembleExportTest.java: testHeterogeneousClassificationExport and testHeterogeneousMultiLabelExport. Currently the constructors for ImmutableMultiLabelInfo and ImmutableLabelInfo are using Map.Entry<String,MutableLong> e : labelCounts.entrySet() to setup the 2 HashMaps idLabelMap and LabelIDMap. We need to make these two maps order deterministic to avoid flakiness in the 2 tests.
(e.g. line 226 in EnsembleExportTest.java calls ensemble.BaggingTrainer.train(), which then calls MutableDataset.getOutputIDInfo(). In that function the code calls multilabel.MultiLabelInfo.generateImmutableOutputInfo(), which finally calls the constructor for ImmutableMultiLabelInfo.)
Currently nondex is not reporting flakiness in the tests directory other than CSVSaverWithMultiOutputsTest.java and EnsembleExportTest.java, so are the other flaky tests you mentioned in modules other than tests?
Also if the LinkedHashMap approach is problematic I can definitely first update the PR without including those changes~

@Craigacp
Copy link
Member

Sure, but I'd prefer the code not to rely upon iteration order in the output infos, as it isn't an invariant guaranteed by the output info contract (and we had a bug in v4.0 and v4.1 due to issues where the regression code was depending on iteration order when it shouldn't have been). I ran nondex on an internal repo we use for validating Tribuo releases which has all the public Tribuo tests in a single project, and hit the errors you found as well as some others. This branch fixes some of the flakyness, but I've not had chance to run down all of it (Craigacp@d9e5825). There were some failures in the way that the single label tests were run, K-Means reproducibility test, and those in the ensemble export you found which haven't been fixed that way, which suggests there is a deeper bug or we're depending on label order when we shouldn't be.

@kaiyaok2
Copy link
Contributor Author

kaiyaok2 commented Mar 16, 2022

I updated a commit with a fix without switching to LinkedHashMap.
I used some breakpoints to find the cause of flakiness in EnsembleExportTest. Bascially the bug shall just be label order issues (rather than something deep):

  • the setup of idLabelMap(line 63 in ImmutableMultiLabelInfo.java and line 64 in ImmutableLabelInfo.java) used values retrieved from an iterator from entrySet() and then used it as the value corresponding to the integer key counter. Then as entrySet() allows permutation, idLabelMap can be something like [1 -> a, 2 -> b] in one run and [1 -> b, 2 -> a] in another run. We need to fix iteration order to avoid this.
  • in line 177, the test calls WeightedEnsembleModel.createEnsembleFromExistingModels(). Then in lines 197 and 203 in WeightedEnsembleModel.java, the current code calls the iterator of ImmutableOutputInfo<T> to add elements into a list. However, outputInfo is an instance of ImmutableInfoIterator (which implements the ImmutableOutputInfo interface), and its defined iterator is inherited from an iterator of an unordered entryset of a hashmap (line 198 in ImmutableMultiLabelInfo.java). Therefore, the iterator does not guarantee any order, so there can be order permutations in the lists firstList and comparisonList, leading to the exception thrown in line 207 of WeightedEnsembleModel.java. This issue can be resolved if we change the lists into sets to allow order permutation.

Also I updated the solution to flakiness in CSVSaver.

@Craigacp
Copy link
Member

WeightedEnsembleModel and ImmutableMultiLabelInfo/ImmutableLabelInfo shouldn't need the models which use them to have the same indices. I think it's probably a problem in the onnx export which is making an assumption that the indices are all the same when this is in general not true. There should be a permute operation as part of the export of the combiner, which is probably necessary for all the outputs.

@kaiyaok2
Copy link
Contributor Author

I wonder if I understand your message correctly - is it undesirable to sort the keyset?

@Craigacp
Copy link
Member

The indices for both features and output dimensions are considered an internal implementation detail of a model, and we try pretty hard to not have them leak out of a model, even to other models within an ensemble. The feature ids are guaranteed (and documented) to be in the lexicographic order of the feature names, though even this might be worth reconsidering in a future version of Tribuo. There's no documented constraint on how the output ids are generated, and so anything that depends on them being generated in a specific order is a bug. Fixing the problem by adding a constraint on the ordering is going to make more bits of Tribuo depend on that ordering, rather than less, and that will make it harder to evolve the system in the future.

@kaiyaok2
Copy link
Contributor Author

Thanks for your clarification! I understand the problem now.
I am pretty sure that if we do not enforce any order on output ids, then line 207 of WeightedEnsembleModel.java will always return an exception upon permutations of iteration order. If this line is commented out, then assertion errors occur at lines 94 and 160 in OnnxTestUtils.java. tribuo.getOutput().getLabel() and external.getOutput().getLabel() shall both return [(Y_2,1.0), (Y_0,1.0)], but external.getOutput().getLabel() returns [(Y_1,0.699999988079071), (Y_2,1.0)]. So i think the issue is indeed in onnx though.

@Craigacp
Copy link
Member

Sorting inside WeightedEnsembleModel.createFromExistingModels to check that the output domains are equal is completely fine, it's just that it can't depend on a direct comparison. To be honest it may be better to add an domainEquals method to the OutputInfo interface which does this check, rather than having it be externalised in WeightedEnsembleModel, but I've not thought through what the exact contract is there.

After looking at the ONNX export of WeightedEnsembleModel that's definitely not robust to the ensemble members having different output indices (or feature indices for that matter). It needs a permute operation added to the input and output of each ensemble member (or an identity node if we know that they are all the same). That's a fairly complicated and involved change so I'll put it on our todo list.

@kaiyaok2
Copy link
Contributor Author

I agree that changes to ONNX shall be nontrivial, so I updated this pr to only include changes for CSVSaver. Appreciate your time and patience looking into this!

@Craigacp Craigacp added the OCA signed This PR is from a person/organisation who has signed the OCA label Mar 22, 2022
Copy link
Member

@Craigacp Craigacp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks.

@Craigacp Craigacp merged commit 7da1456 into oracle:main Mar 22, 2022
@Craigacp
Copy link
Member

Thanks for flagging the iteration order issues. I've got a fix for the ensemble export, and some label ordering flakiness in the other tests which I'll make a PR for shortly.

@kaiyaok2
Copy link
Contributor Author

Thanks for flagging the iteration order issues. I've got a fix for the ensemble export, and some label ordering flakiness in the other tests which I'll make a PR for shortly.

No problem! Glad it works out

Craigacp pushed a commit that referenced this pull request Mar 25, 2022
* Fixed Flakiness

Signed-off-by: Kaiyao Ke <kaiyaok2@illinois.edu>

* Fixed Flakiness

Signed-off-by: Kaiyao Ke <kaiyaok2@illinois.edu>

* removed need for linkedhashmaps

* removed print message

* include only csvsaver changes

* removed print message
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA signed This PR is from a person/organisation who has signed the OCA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants