
Benchmarks for Parsing and caching auto generated XForms #454

Open
wants to merge 8 commits into master

Conversation

@JohnTheBeloved (Contributor) commented Jul 5, 2019

Closes #

This is an improvement of the recent benchmarking PR, intended to run benchmarks on synthetic/dynamically generated forms instead of static forms.

What has been done to verify that this works as intended?

Benchmarks have been run on Travis CI; the results are included in the comments below.

Why is this the best possible solution? Were any other approaches considered?

It's still a work in progress, so suggestions are most welcome.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

The changes aren't part of the main source set, so no user impact is expected.

Do we need any specific form for testing your changes? If so, please attach one.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

@JohnTheBeloved JohnTheBeloved changed the title Benchmark refactor WIP: Benchmark refactor Jul 5, 2019
@lognaturel (Member)

Wow, there's a lot here, @JohnTheBeloved! Can you please describe your high-level approach and what decisions you made along the way? For example, why did you choose to generate main instances with the full range of possible controls and types? Do you expect there to be a performance difference between them? If so, then how can results between runs be compared?

Broadly, it seems to me that in this context, the forms that benchmarks are run over need to be known and stable for results to be comparable. That is, dynamically generating a certain number of elements of a predictable structure seems very valuable. I'm less sure about what the randomness brings and am interested in hearing more about why you chose that design.

@JohnTheBeloved (Contributor, Author) commented Jul 8, 2019

Representing the whole XForm structure in code can be a complex process, as it comprises various parts of the model and body as described in the XForms specification.

This PR does not cover every part of the specification, only the major parts of the XForm structure, which is enough to substantively verify some aspects of JavaRosa performance - in this case 1. creating the FormDef and 2. caching the FormDef.

However, the synthetic XForm builder I implemented wasn't designed with only our current benchmarks in mind (which is why you see other control types defined - they can be removed). Only two control types were used for these benchmarks:

  1. input text (string)
  2. select control (used because of the time taken to populate the options of a question)

I don't claim to be very experienced with most parts of the XForm structure, nor that this PR covers all possible parts of the XForm definition (XForm spec).

The implementation was rather meant to quickly analyse benchmarks for different complexities of an XForm - complexity in terms of

  1. No. of questions
  2. No. of question groups
  3. No. of internal secondary instances
  4. No. of external secondary instances
  5. No. of elements in each secondary instance

The Builder design pattern was employed to create the XML strings for the different parts of the XForm structure:

  1. Main Instance
  2. Secondary Instance
  3. Bindings
  4. Controls

Some of the areas not implemented in the XFormBuilder class include

  1. Itext translations
  2. Control types other than select list and input text

The snippet below summarises the XFormBuilder build flow, which is structured this way for easier maintainability:

buildHtml()
        .buildHead()
        .buildBody()
        .buildTitle()
        .buildModel()
        .buildPrimaryInstance()
        .buildInternalSecondaryInstances()
        .buildExternalSecondaryInstances()
        .buildBind()
        .buildControls()
        .toString();

Each buildX() method abstracts the logic for creating the corresponding part of the XForm structure.
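
For illustration, here is a minimal, hypothetical sketch of what a fluent builder in this style could look like; the class, constants and method bodies below are simplified stand-ins, not the actual XFormBuilder in this PR:

    // Hypothetical, simplified sketch of a fluent XForm builder.
    // The real XFormBuilder in this PR builds the full model, bindings and controls.
    public class XFormBuilderSketch {
        private final StringBuilder xml = new StringBuilder();
        private final int noOfQuestions;

        public XFormBuilderSketch(int noOfQuestions) {
            this.noOfQuestions = noOfQuestions;
        }

        public XFormBuilderSketch buildHtml() {
            xml.append("<h:html xmlns=\"http://www.w3.org/2002/xforms\" ")
               .append("xmlns:h=\"http://www.w3.org/1999/xhtml\">");
            return this;
        }

        public XFormBuilderSketch buildPrimaryInstance() {
            xml.append("<instance><data id=\"generated-form\">");
            for (int i = 0; i < noOfQuestions; i++) {
                xml.append("<q").append(i).append("/>");
            }
            xml.append("</data></instance>");
            return this;
        }

        // buildHead(), buildBody(), buildModel(), buildBind(), buildControls(), etc.
        // follow the same shape: append their XML fragment and return this.

        @Override
        public String toString() {
            return xml.toString() + "</h:html>";
        }
    }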

To answer your questions directly:

Why did you choose to generate main instances with the full range of possible controls and types?

Other control types were defined in case there is a future need for them; they are not currently used, apart from the select and input text.

The select was specifically used because the time taken to populate the options of a question was taken into consideration both when running the benchmarks and during manual testing of the forms.
All odd indexes are text input controls while even indexes are select controls.
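
For illustration, a minimal sketch of that alternation (the element names are simplified stand-ins for what the builder actually emits):

    // Simplified sketch of the odd/even alternation described above.
    static String buildControls(int noOfQuestions) {
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < noOfQuestions; i++) {
            if (i % 2 == 0) {
                body.append("<select1 ref=\"/data/q").append(i).append("\"/>"); // even index: select
            } else {
                body.append("<input ref=\"/data/q").append(i).append("\"/>");   // odd index: text input
            }
        }
        return body.toString();
    }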

Do you expect there to be a performance difference between them?

Other than considering the time to populate the options, no. The relative performance between control types is not what was intended to be measured.

I'm less sure about what the randomness brings and am interested in hearing more about why you chose that design

By random, I assume you mean the options of the select control; randomness is used to pick one of the internal instances at random.

@JohnTheBeloved (Contributor, Author) commented Jul 8, 2019

The snippet below defines the benchmark parameters used to generate the possible combinations that define an XForm. In this case the benchmark parameters are:

         @Param({"10", "500"})
        public int noOfQuestions = 500;
        @Param({"10", "50"})
        public int noOfInternalSecondaryInstances = 10;
        @Param({"50", "1000"})
        public int noOf2ndryInstanceElements = 1000;
        @Param({"0"})
        public int noOfQuestionGroups = 0;
        @Param({"0","50"})
        public int noOfExternalSecondaryInstances = 1;

The benchmark is then run for every combination in the cross product of these parameter values; a minimal JMH sketch of how this works follows the results below.

FormDef2CacheBenchmark.runBenchmark     50  0   1  0   10  avgt  2   0.004  s/op
FormDef2CacheBenchmark.runBenchmark     50  0   1  0  200  avgt  2   0.022  s/op
FormDef2CacheBenchmark.runBenchmark     50  0   1  0  500  avgt  2   0.049  s/op
FormDef2CacheBenchmark.runBenchmark     50  0  10  0   10  avgt  2   0.031  s/op
FormDef2CacheBenchmark.runBenchmark     50  0  10  0  200  avgt  2   0.048  s/op
FormDef2CacheBenchmark.runBenchmark     50  0  10  0  500  avgt  2   0.075  s/op
FormDef2CacheBenchmark.runBenchmark    500  0   1  0   10  avgt  2   0.031  s/op
FormDef2CacheBenchmark.runBenchmark    500  0   1  0  200  avgt  2   0.048  s/op
FormDef2CacheBenchmark.runBenchmark    500  0   1  0  500  avgt  2   0.075  s/op
FormDef2CacheBenchmark.runBenchmark    500  0  10  0   10  avgt  2   0.283  s/op
FormDef2CacheBenchmark.runBenchmark    500  0  10  0  200  avgt  2   0.310  s/op
FormDef2CacheBenchmark.runBenchmark    500  0  10  0  500  avgt  2   0.337  s/op
FormDef2CacheBenchmark.runBenchmark   5000  0   1  0   10  avgt  2   0.294  s/op
FormDef2CacheBenchmark.runBenchmark   5000  0   1  0  200  avgt  2   0.311  s/op
FormDef2CacheBenchmark.runBenchmark   5000  0   1  0  500  avgt  2   0.344  s/op
FormDef2CacheBenchmark.runBenchmark   5000  0  10  0   10  avgt  2   2.810  s/op
FormDef2CacheBenchmark.runBenchmark   5000  0  10  0  200  avgt  2   2.846  s/op
FormDef2CacheBenchmark.runBenchmark   5000  0  10  0  500  avgt  2   2.894  s/op
FormDef2CacheBenchmark.runBenchmark  50000  0   1  0   10  avgt  2   2.810  s/op
FormDef2CacheBenchmark.runBenchmark  50000  0   1  0  200  avgt  2   2.867  s/op
FormDef2CacheBenchmark.runBenchmark  50000  0   1  0  500  avgt  2   2.903  s/op
FormDef2CacheBenchmark.runBenchmark  50000  0  10  0   10  avgt  2  29.020  s/op
FormDef2CacheBenchmark.runBenchmark  50000  0  10  0  200  avgt  2  30.210  s/op
FormDef2CacheBenchmark.runBenchmark  50000  0  10  0  500  avgt  2  29.340  s/op
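
For readers less familiar with JMH, this is a minimal, illustrative sketch of how @Param fields drive the cross product of runs; the class name, field defaults and parse step below are simplified stand-ins, not the exact benchmark classes in this PR:

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Param;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;

    public class XFormParseBenchmarkSketch {

        @State(Scope.Benchmark)
        public static class FormState {
            // JMH injects every listed value; the benchmark below runs once per
            // combination, i.e. 2 x 2 x 3 = 12 combinations here.
            @Param({"50", "500"})
            public int noOfQuestions;
            @Param({"1", "10"})
            public int noOfInternalSecondaryInstances;
            @Param({"10", "200", "500"})
            public int noOf2ndryInstanceElements;

            String xml;

            @Setup
            public void generateForm() {
                // In the PR this is where the synthetic XForm is generated from the parameters.
                xml = "<h:html>...</h:html>";
            }
        }

        @Benchmark
        public int parseForm(FormState state) {
            // Stand-in for the real work (e.g. parsing the generated form into a FormDef).
            return state.xml.length();
        }
    }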

@lognaturel (Member)

Thank you for the extra context. In general, the simpler the approach and code, the more likely it is someone else can give meaningful feedback and eventually build on it, especially when there are no comments or context in commits. I completely understand wanting to plan ahead and it's really cool that you dove deep into generating forms. That said, it would really have helped to have as narrowly-scoped of a PR as possible.

At a high level, the Cache2FormDefBenchmark, FormDef2CacheBenchmark and XForm2FormDefBenchmark look like exactly the implementations to evaluate form opening experience. Building fake secondary instances that have values and labels that are sequential numbers looks great.

We talked about this briefly last week but I want to confirm one more time. The results you are getting out of these benchmarks are correlated with the results you have gotten in profiling sessions on real Android devices running Collect, right? That is, if these benchmarks show a linear relationship between number of elements in a secondary instance and time taken to load from cache, it's also roughly linear in profiling? I say roughly because the benchmarking environment is much more controlled but if somehow the shape of the curve is different, there's a problem. I can't imagine what would lead to that but stranger things have happened.

The select was specifically used because the time taken to populate the options of a question was taken into consideration both when running the benchmarks and during manual testing of the forms

Are you saying you actually measured a difference between having selects and other question types in the primary instance? That is surprising to me because I don't believe choices are populated from secondary instances until the question is displayed. Either way, this assumption and its implications should be documented explicitly in the code or in a commit message.

Focusing on the three benchmark classes listed above, could you please write a little about how you picked the parameter values you used? Do you feel like all combinations in the cross product add value? What relationships are you trying to identify, and would it be possible to identify them with fewer cases? The case of 10 instances with 50,000 elements each in particular seems unrealistic and time-consuming to run. Is it possible to give a CSV of input values, as with JUnit, to avoid the full cross product?
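
As an aside on limiting the cross product: JMH allows @Param values to be overridden when a run is launched, which restricts execution to a single combination. A minimal sketch using the programmatic runner (the parameter names match this PR's snippet; the wiring itself is only a suggested illustration, not something the PR implements):

    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    public class SingleCombinationRunner {
        public static void main(String[] args) throws RunnerException {
            // Overriding @Param values here replaces the annotated value lists,
            // so only this one combination is benchmarked.
            Options opts = new OptionsBuilder()
                    .include("FormDef2CacheBenchmark")
                    .param("noOfQuestions", "500")
                    .param("noOfInternalSecondaryInstances", "10")
                    .param("noOf2ndryInstanceElements", "50")
                    .forks(1)
                    .build();
            new Runner(opts).run();
        }
    }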

build.gradle Outdated
@@ -77,15 +79,15 @@ jmh {
exclude = "(BenchmarkTemplate)"
threads = 1
fork = 1

What led to these changes? In particular, I thought you ended up with a high number of warmup iterations because the results weren't stable otherwise. Is that not the case?

@Param({"0"})
public int noOfQuestionGroups = 1;
@Param({"0"})
public int noOfExternalSecondaryInstances = 50;
@lognaturel (Member) commented Jul 9, 2019
It looks like the explicit values that the parameters are set to are ignored in favor of the injected values. Is there a reason to have those explicit values? It would be clearer to just declare the fields without initializing them.

@JohnTheBeloved (Contributor, Author)
This was set in order to be able to debug and test it as a plain Java class without running the benchmarks.
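
For context, a small hypothetical sketch of the kind of plain-Java debugging run that explicit defaults allow (the class and method names are invented for illustration; JMH itself ignores these field initializers and injects the @Param values during a real run):

    public class DebugFormGeneration {
        public static void main(String[] args) {
            // Hypothetical: run the generator directly with the field defaults,
            // without going through the JMH runner at all.
            int noOfQuestions = 500;        // mirrors the default on the @Param field
            int noOfQuestionGroups = 1;
            String xml = generateXForm(noOfQuestions, noOfQuestionGroups);
            System.out.println("Generated form length: " + xml.length());
        }

        // Stand-in for the PR's XFormBuilder call chain.
        private static String generateXForm(int noOfQuestions, int noOfQuestionGroups) {
            return "<h:html>...</h:html>";
        }
    }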

@codecov-io

Codecov Report

Merging #454 into master will decrease coverage by 0.01%.
The diff coverage is n/a.


@@             Coverage Diff              @@
##             master     #454      +/-   ##
============================================
- Coverage     49.03%   49.02%   -0.02%     
+ Complexity     2940     2939       -1     
============================================
  Files           246      246              
  Lines         13698    13698              
  Branches       2650     2650              
============================================
- Hits           6717     6715       -2     
  Misses         6132     6132              
- Partials        849      851       +2
Impacted Files                                          Coverage Δ            Complexity Δ
...a/org/javarosa/core/services/PrototypeManager.java   79.16% <0%> (-8.34%)  8% <0%> (-1%)
...org/javarosa/core/model/condition/Triggerable.java   67.76% <0%> (ø)       25% <0%> (ø) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 196cd18...8808a60.

@JohnTheBeloved (Contributor, Author)

Hi @lognaturel

I initially created the draft to give an idea of how I implemented the mock XForm creator.

I have made a couple of clean-ups which should make things a bit clearer.

As for the comparison between the benchmarks and the profiling sessions: yes, they are commensurate. Do you need some data to back that up?

I had assumed the options were used during question creation in some way. I have removed the select list control, so input text is now the only control type being used.

The parameters that were used categorise the questions and options into

  1. Small
  2. Medium
  3. Large
  4. Extra large

I don't think we have to benchmark only real-world parameters at all times. The more outrageous parameters also serve as a stress test for the methods and show where efficiency begins to drop.

@JohnTheBeloved JohnTheBeloved marked this pull request as ready for review July 10, 2019 09:09
@JohnTheBeloved JohnTheBeloved changed the title WIP: Benchmark refactor Benchmark for Parsing and caching auto generated XForms Jul 10, 2019
@JohnTheBeloved JohnTheBeloved changed the title Benchmark for Parsing and caching auto generated XForms Benchmarks for Parsing and caching auto generated XForms Jul 10, 2019
@lognaturel (Member)

As for the comparison between the benchmarks and the profiling sessions: yes, they are commensurate. Do you need some data to back that up?

That's great. If you've taken a look and confirmed it, nothing more needed, I think.

I'll take another pass through soon. Overall my goal is not to be too picky about this now. We can look at individual benchmarks in more detail as they're being used to evaluate possible changes.
