Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve collection types #523

Merged
merged 51 commits into from
Jan 16, 2020
Merged

Improve collection types #523

merged 51 commits into from
Jan 16, 2020

Conversation

ggalmazor
Copy link
Contributor

@ggalmazor ggalmazor commented Dec 18, 2019

Closes #458

This PR is built on top of #526 to bring a review of the collection types involved in the DAG. The first commit is Migrate Triggerable.targets to Set 5d20c1a

The main goal would be to use the most appropriate collection type for the task in hand e.g. using sets when we don't want to have duplicates, and linked list supported collections when the order of elements is important.

Important topics:

  • Makes part of the DAG building algorithm immutable and side-effect free to make it easier to understand.

    It's important that we understand what are the memory use implications and decide whether we want to undo some of these side-effects as a tradeoff for consuming less memory.

    We could also study ways to keep everything easy to understand (immutable & side-effect free) while programming in a way that the garbage collector helps keep memory use in line.

  • Refactors the DAG building process to take advantage of specific collection types

    The key point here is that we're able to keep the insertion order and avoid having to sort the DAG, which would bring performance improvements.

What has been done to verify that this works as intended?

Added automated tests

Why is this the best possible solution? Were any other approaches considered?

I think this question doesn't apply. Maybe reviewers can make specific questions.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

This PR should have no behavior changes, although due to the changes to the DAG building code, some change propagation events might appear in a different order. This shouldn't be a problem because triggerable chain is always guaranteed to happen in the right order according to whether they have dependencies or not.

Do we need any specific form for testing your changes? If so, please attach one.

No.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

No.

@ggalmazor ggalmazor requested a review from lognaturel December 18, 2019 12:04
@ggalmazor ggalmazor marked this pull request as ready for review December 20, 2019 07:58
@codecov-io
Copy link

codecov-io commented Dec 20, 2019

Codecov Report

Merging #523 into master will decrease coverage by 0.24%.
The diff coverage is 87.87%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #523      +/-   ##
============================================
- Coverage     53.11%   52.87%   -0.25%     
+ Complexity     3127     3104      -23     
============================================
  Files           245      245              
  Lines         13374    13340      -34     
  Branches       2573     2566       -7     
============================================
- Hits           7104     7053      -51     
- Misses         5443     5467      +24     
+ Partials        827      820       -7
Impacted Files Coverage Δ Complexity Δ
...org/javarosa/core/model/condition/Recalculate.java 70% <ø> (-10%) 9 <0> (-2)
...a/org/javarosa/core/model/condition/Condition.java 32.43% <ø> (-5.41%) 9 <0> (-2)
...a/xform/parse/StandardBindAttributesProcessor.java 80.21% <100%> (-1.6%) 25 <0> (ø)
src/main/java/org/javarosa/core/model/FormDef.java 69.65% <100%> (ø) 155 <1> (ø) ⬇️
...java/org/javarosa/core/model/QuickTriggerable.java 70.37% <100%> (-18.52%) 14 <1> (-6)
...org/javarosa/core/model/condition/Triggerable.java 81.25% <100%> (-1.47%) 22 <4> (-2)
...a/org/javarosa/xform/parse/FormInstanceParser.java 70.94% <82.14%> (-1.1%) 114 <4> (+2)
...n/java/org/javarosa/core/model/TriggerableDag.java 71.81% <90.62%> (+0.57%) 74 <26> (-6) ⬇️
...avarosa/core/model/QuickTriggerableComparator.java 0% <0%> (-73.34%) 0% <0%> (-5%)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02fb9e0...ac646eb. Read the comment docs.

@ggalmazor
Copy link
Contributor Author

ggalmazor commented Dec 21, 2019

Christmas is here and I know it's going to be hard to push this one forward, especially since it requires some degree of agreement about the correctness and completeness of the DAG test cases.

Since I'd like to continue working on the codebase on top of this PR, It would be great to at least have a sanity check to the code changes themselves, focusing on how memory usage has changed compared to the current implementation that reuses objects by passing them as input args to methods and mutating them (side-effects) inside them.

If we determine that now JR would use more memory than before, it would be great to find a way to maintain the immutable & side-effect free implementation in this PR, and, only if this is not possible, change it back to a side-effect based, object reusing style.

@dcbriccetti, I was hoping that you could help us with this ;)

@ggalmazor
Copy link
Contributor Author

ggalmazor commented Jan 8, 2020

I've rebased the branch of this PR on top of #526 to make it more concise. I've also taken the chance to divide some commits into smaller chunks to make them easier to review and give more context.

@ggalmazor
Copy link
Contributor Author

ggalmazor commented Jan 8, 2020

I've also checked that no object cloning is done during the DAG build process, which would support our current understanding that the immutable, side-effect free code in this PR shouldn't increase noticeably memory usage.

Looking forward @dcbriccetti's take on this ;)

This removes the need for duplicate checking and makes explicit that this set is dependant on (insertion) order.
We need the code to be easier to understand when it comes to recursive algorithms. Mutation of input arguments makes it super hard, and this commit tries to solve that by making those methods return something instead.
It turns out that the newDestinationSet variable was redundant and it looks like the real output of the method is the deps set (to be removed in the next commit)
Remove the output arg. The block where elements get added can be further simplified using Map.getOrDefault or the alternative code that is compatible with our target Android API level (next commit)
Simplify adding elements into the result set
Now the algorithm is easier to understand with no overhead (same number of iterations in the inner loop)
We want to extract some methods to structure the process a bit more
There's no need for clearing the collection if we're reassigning it
Copy link
Member

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive work.

Overall, I am convinced that this is safe and that the changes are easy to follow. After our conversation about the impact of creating additional collections at each recursive step, I am convinced that the memory impact will be minimal. The garbage collector will presumably be working harder but hopefully that wouldn't have much impact even on underpowered devices.

There are two commits I'd like to spend a little more time with when I have a fresh head: 73e6364 and 86e2745. I still wanted to get you some comments in the mean time.

As a result, some methods can be made static. Also review visibility and set it to private where possible
@ggalmazor
Copy link
Contributor Author

ggalmazor commented Jan 9, 2020

I reworded the commit messages to fit the subject under 72 chars and I didn't realize that the links to commits you were hoping to study better would change as a result. Sorry, @lognaturel!

73e6364 is still there, but the one about avoiding to hit the main instance is in 2bd11f6 now

The inlined method was confusing and misguiding because the nested loop didn't make much sense, and because the naming suggested that the method would return all descendants recursively, which neither was needed or actually happening.

Then, the naming wasn't helping to understand what was going on either. By using a language closer to DAG terminology, I hope it's easier now to understand what the buildDagEdges method does.
@ggalmazor
Copy link
Contributor Author

We need a test that explores deeply nested computations and repeat groups before 2bd11f6

If source can't cascade, then the targets set will be empty.
Avoid confusion by contextualizing the word "target" in the comment, which can be related to the DAG edge target and the triggerable target tree reference.
The DAG edge is a set because there can't be duplicates.
@ggalmazor
Copy link
Contributor Author

ggalmazor commented Jan 13, 2020

I think I've gone through your latest comments in the PR and now I'm trying to focus exclusively on the changes regarding the mutually exclusive recursive methods to get all descendant refs from 73e6364 and 2bd11f6

In order to start from a safe place, I'm reverting the related changes from 73e6364 (restore getChildrenOfReference, and getChildrenRefsOfElement from 73e6364)

Then, I've evaluated the move of those methods from TriggerablesDag to FormInstanceParser, and I've assessed that it's safe to keep it because we can establish that they will only get called when the triggerable can cascade which, in turn, it's only true for groups with a relevance condition. The if (triggerable.isCascadingToChildren()) made sure about that in the original code at TriggerablesDag.getDependantTriggerables

This should make it safe to replace the descendant target computation from TriggerablesDag.getDependantTriggerables to FormInstanceParser.applyInstanceProperties. If you're not confident about this, we should make it a target for new tests.

With this setup, my plan is to dissect FormInstance.getTemplatePath and write tests that exercise both forks of the if block at FormInstanceParser.getDescendantRefs.

@ggalmazor
Copy link
Contributor Author

ggalmazor commented Jan 14, 2020

I've been playing with this test to understand how the check in FormInstanceParse.getDescendantRefs works:

@Test
public void name2() {
    TreeElement root = new TreeElement("data");
    TreeElement group = new TreeElement("group");
    TreeElement field = new TreeElement("field");
    root.addChild(group);
    group.addChild(field);
    FormInstance formInstance = new FormInstance(root);

    TreeReference ref = TreeReference.rootRef()
        .extendRef("data", 0)
        .extendRef("group", 0);
    TreeElement templatePath = formInstance.getTemplatePath(ref);

    Set<TreeReference> descendantRefs = FormInstanceParser.getDescendantRefs(formInstance, new EvaluationContext(formInstance), ref);
    System.out.println(descendantRefs);
}

The breakthrough comes when declaring the group element to have a multiplicity to 1, which forces the getDescendantRefs method to follow the else path in the check, which uses the EvaluationContext to expand the tree reference and get the corresponding elements.

Unfortunately, there's no way the parser would reproduce this scenario when parsing a form's XML document. We can't describe in XML a group at multiplicity 1 without declaring another group at multiplicity 0. This is key because the check at getDescendantRefs will revert to the zeroth group when resolving the mainInstance.getTemplatePath(original); if no group with jr:template="" is declared.

These findings confirm our theory that the else branch is not really needed because, even though we can artificially exercise it from our tests by manipulating the contents of the instance, no form can actually recreate that scenario.

Note that this method was originally declared in TriggerableDag
The methods are recursive, and will return all descendants, not only children
@lognaturel
Copy link
Member

I have a couple of new questions above. Other than that, I have completed my review. It looks like a lot of things can now be private in TriggerableDag. Worth doing here?

@lognaturel
Copy link
Member

Thanks, @ggalmazor! Have you had a chance to compare benchmarks from before and after this change? It seems like there could be some positive performance implications that come with some of the simplifications and O(1) collection changes.

After more interactive review, I feel more confident that it's unlikely that this will have significant negative memory implications. So @dcbriccetti, I think we're ok without additional review on this one. Thanks!

@lognaturel lognaturel merged commit bf184a7 into getodk:master Jan 16, 2020
@ggalmazor ggalmazor deleted the improve_collection_types branch February 5, 2020 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't include self-references in dependency graph
3 participants