Implement Parallelized map and optimize Database search API #2669

Merged: 10 commits, Oct 7, 2024

@ndegwamartin ndegwamartin commented Sep 7, 2024

IMPORTANT: All PRs must be linked to an issue (except for extremely trivial and straightforward changes).

Fixes #2668

Description
Optimizes the DatabaseImpl search API's serialized FHIR Resource to HAPI FHIR Structure mapping block by introducing a parallelized implementation that uses async coroutines for each mapping iteration.
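
A minimal sketch of the kind of parallelized map described above, assuming `kotlinx.coroutines` is on the classpath. The name `parallelMap`, its signature, and the stand-in transform are hypothetical illustrations, not necessarily what this PR adds to `DatabaseImpl`.

```kotlin
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.runBlocking

// Hypothetical helper: starts one coroutine per element on the given
// dispatcher and returns the results in the original order.
suspend fun <A, B> Iterable<A>.parallelMap(
    dispatcher: CoroutineDispatcher = Dispatchers.Default,
    transform: suspend (A) -> B
): List<B> = coroutineScope {
    map { item -> async(dispatcher) { transform(item) } }.awaitAll()
}

fun main(): Unit = runBlocking {
    // Stand-in for deserializing serialized resources; a real caller would
    // parse each string into a FHIR structure instead of taking its length.
    val serialized = listOf("""{"id":"1"}""", """{"id":"2"}""", """{"id":"3"}""")
    val lengths = serialized.parallelMap { it.length }
    println(lengths)
}
```

Because `coroutineScope` waits for all children, a failure in any single mapping cancels the remaining ones and is rethrown to the caller.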

Alternative(s) considered
Have you considered any alternatives? And if so, why have you chosen the approach in this PR?

Type
Enhancement

Screenshots (if applicable)

Checklist

  • I have read and acknowledged the Code of conduct.
  • I have read the Contributing page.
  • I have signed the Google Individual CLA, or I am covered by my company's Corporate CLA.
  • I have discussed my proposed solution with code owners in the linked issue(s) and we have agreed upon the general approach.
  • I have run ./gradlew spotlessApply and ./gradlew spotlessCheck to check my code follows the style guide of this project.
  • I have run ./gradlew check and ./gradlew connectedCheck to test my changes locally.
  • I have built and run the demo app(s) to verify my change fixes the issue and/or does not break the demo app(s).

@ndegwamartin ndegwamartin requested a review from a team as a code owner September 7, 2024 10:43
- Optimize Database search API

@FikriMilano FikriMilano left a comment

The change looks great!

Additionally, could you provide a performance comparison between the old and new code? That would be good to know.


@jingtang10 jingtang10 left a comment

great work thanks @ndegwamartin!

ndegwamartin added a commit to opensrp/android-fhir that referenced this pull request Sep 10, 2024
FORK
         - With unmerged PR #9
            - WUP  #13

SDK
            - WUP google#2178
            - WUP google#2650
            - WUP google#2663
PERF
- WUP google#2669
- WUP google#2565
- WUP google#2561
- WUP google#2535
@jingtang10

To summarise our discussion yesterday: I think there's still work to be done in this PR. @ndegwamartin will investigate the thread pool etc. Please comment when this is ready for the next round of review. @FikriMilano @aditya-07 @yigit @stevenckngaa @vorburger @kevinmost please also take a look at this.

@ndegwamartin

Device: Physical, Samsung Galaxy Tab Active2
Mode: Benchmarking with Kotlin's measureTimeMillis (kotlin.system)
Scope: Database search API method search

Optimization: None

Run 1

| Resource type | Total records | Time taken (s) | DB query (s) |
|---|---|---|---|
| Group | ~1K | 8 | ~0.2 |
| Task | ~17K | ~24 | ~1.8 |
| Patient | ~11K | ~456 | ~1.3 |

Run 2

| Resource type | Total records | Time taken (s) | DB query (s) |
|---|---|---|---|
| Group | ~1K | ~2 | ~0.1 |
| Task | ~17K | ~22 | ~1.7 |
| Patient | ~11K | ~472 | ~1.2 |

Optimization: Using async with parent context (usually Dispatchers.IO)

Run 1

| Resource type | Total records | Time taken (s) | DB query (s) |
|---|---|---|---|
| Group | ~1K | 4.8 | ~0.2 |
| Task | ~17K | ~24 | ~1.7 |
| Patient | ~11K | ~450 | ~1.3 |

Run 2

| Resource type | Total records | Time taken (s) | DB query (s) |
|---|---|---|---|
| Group | ~1K | ~2 | ~0.1 |
| Task | ~17K | ~24 | ~1.7 |
| Patient | ~11K | ~455 | ~1.3 |

Optimization: Using async with Dispatchers.Default
(Note: Thread safety of the FHIR JsonParser is achieved by creating a new instance for each loop iteration.)

Run 1

| Resource type | Total records | Time taken (s) | DB query (s) |
|---|---|---|---|
| Group | ~1K | ~5 | ~0.2 |
| Task | ~17K | ~5.4 | ~1.8 |
| Patient | ~11K | ~208 | ~1.4 |

Run 2

| Resource type | Total records | Time taken (s) | DB query (s) |
|---|---|---|---|
| Group | ~1K | ~0.5 | ~0.1 |
| Task | ~17K | ~5 | ~1.7 |
| Patient | ~11K | ~204 | ~1.3 |
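
The thread-safety note for this configuration (a new parser instance for each iteration) could be sketched roughly as below. `StatefulParser` is a hypothetical stand-in for a parser that keeps mutable state while parsing. It is not the HAPI FHIR JsonParser, and `parseAll` is an illustrative name only.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for a parser with internal mutable state, so one
// instance must not be used from several threads at the same time.
class StatefulParser {
    private val buffer = StringBuilder()

    // "Parses" by reversing the input through the shared buffer.
    fun parse(input: String): String {
        buffer.setLength(0)
        for (ch in input) buffer.append(ch)
        return buffer.reverse().toString()
    }
}

// Each coroutine obtains its own parser from the factory, so no parser
// instance is ever shared between threads.
suspend fun parseAll(
    inputs: List<String>,
    newParser: () -> StatefulParser = ::StatefulParser
): List<String> = coroutineScope {
    inputs.map { input ->
        async(Dispatchers.Default) { newParser().parse(input) }
    }.awaitAll()
}

fun main(): Unit = runBlocking {
    println(parseAll(listOf("abc", "def")))
}
```

The trade-off is one extra allocation per record in exchange for not needing any locking around the parser.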

Note: The tests were carried out in a QA test environment. In the real world there would be more Patients than Groups (roughly 10 Patients per Group) and even more Tasks than Patients (roughly 30 Tasks per Patient).
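
To put the figures in perspective, the small sketch below computes the speedup of the `Dispatchers.Default` configuration over the unoptimized baseline, using the Run 1 "time taken" values reported above. The `Timing` class and `speedup` function are hypothetical helpers introduced only for this arithmetic.

```kotlin
// Time taken in seconds, copied from the Run 1 tables:
// baseline = "Optimization: None", optimized = async with Dispatchers.Default.
data class Timing(
    val resourceType: String,
    val baselineSeconds: Double,
    val optimizedSeconds: Double
)

// Speedup factor: how many times faster the optimized run is.
fun speedup(t: Timing): Double = t.baselineSeconds / t.optimizedSeconds

val run1 = listOf(
    Timing("Group", 8.0, 5.0),
    Timing("Task", 24.0, 5.4),
    Timing("Patient", 456.0, 208.0)
)

fun main() {
    for (t in run1) {
        println("${t.resourceType}: ${speedup(t)}x")
    }
}
```

On these numbers the speedup is roughly 1.6x for Group, 4.4x for Task, and 2.2x for Patient. The source values are approximate, so the factors are indicative only.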

@ndegwamartin

Full specs of the device:

  • Samsung Galaxy Tab Active2
  • Android 9 (API level 28)
  • 3 GB memory

@ndegwamartin ndegwamartin marked this pull request as ready for review September 18, 2024 10:46

@FikriMilano FikriMilano left a comment

Looks good!


@MJ1998 MJ1998 left a comment

What happens when you don't parallelize the parsing but just run it on Dispatchers.Default?

A new function:

// withContext is a suspending call, so the extension must be a suspend function.
suspend fun IParser.parseResourceInCPU(resourceString: String) =
    withContext(Dispatchers.Default) { parseResource(resourceString) }

and replace all parseResource calls with this function.

I am surprised that parallelizing with the parent context (the second configuration benchmarked) did not improve the search API performance. If parallelizing is not improving performance, do we really need it?
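
A minimal sketch of the difference between the two variants under discussion, assuming `kotlinx.coroutines`. `fakeParse` is a hypothetical stand-in for `IParser.parseResource`, and `ConcurrencyProbe` is an illustrative helper that records how many parse calls are in flight at once.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Records the peak number of tracked blocks running at the same time.
class ConcurrencyProbe {
    private val inFlight = AtomicInteger(0)
    private val peak = AtomicInteger(0)
    val maxObserved: Int get() = peak.get()

    fun <T> track(block: () -> T): T {
        val now = inFlight.incrementAndGet()
        peak.updateAndGet { current -> maxOf(current, now) }
        try {
            return block()
        } finally {
            inFlight.decrementAndGet()
        }
    }
}

// Hypothetical stand-in for IParser.parseResource.
fun fakeParse(s: String): Int = s.length

// Variant A: every call hops to Dispatchers.Default, but the loop suspends
// until one item is done before starting the next, so work stays sequential.
suspend fun parseSequentially(items: List<String>, probe: ConcurrencyProbe): List<Int> =
    items.map { item ->
        withContext(Dispatchers.Default) { probe.track { fakeParse(item) } }
    }

// Variant B: one async coroutine per item, so items may be parsed
// concurrently on the dispatcher's threads.
suspend fun parseInParallel(items: List<String>, probe: ConcurrencyProbe): List<Int> =
    coroutineScope {
        items.map { item ->
            async(Dispatchers.Default) { probe.track { fakeParse(item) } }
        }.awaitAll()
    }

fun main(): Unit = runBlocking {
    val items = List(1_000) { "x".repeat(it % 50) }
    val a = ConcurrencyProbe().also { parseSequentially(items, it) }
    val b = ConcurrencyProbe().also { parseInParallel(items, it) }
    println("sequential peak=${a.maxObserved}, parallel peak=${b.maxObserved}")
}
```

In variant A the observed peak is always 1. In variant B it can exceed 1, depending on the number of cores and on scheduling, which is where any wall-clock gain would come from.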

@ndegwamartin

Device: Physical, Samsung Galaxy Tab Active2
Mode: Benchmarking with Kotlin's measureTimeMillis (kotlin.system)
Scope: Database search API method search (deserialization/mapping)

Optimization: Using IParser.parseResourceInCPU() with context Dispatchers.Default

Run 1

| Resource type | Total records | Time taken (s) |
|---|---|---|
| Group | ~1K | ~5 |
| Task | ~17K | ~29 |
| Patient | ~11K | ~457 |

Run 2

| Resource type | Total records | Time taken (s) |
|---|---|---|
| Group | ~1K | ~2 |
| Task | ~17K | ~28 |
| Patient | ~11K | ~461 |

cc @MJ1998

@MJ1998

MJ1998 commented Sep 30, 2024

So it's only when we do both (parallelize and use the Default dispatcher) that we see a performance improvement, correct?
@ndegwamartin

@ndegwamartin

> So it's only when we do both (parallelize and use the Default dispatcher) that we see a performance improvement, correct? @ndegwamartin

Correct.

@MJ1998

MJ1998 commented Oct 1, 2024

Okay, so I think using Dispatchers.IO for CPU-intensive tasks might be less efficient because its thread pool is intended for blocking I/O rather than CPU-bound work.


@jingtang10 jingtang10 left a comment

thanks @ndegwamartin for all the work! great results!

ndegwamartin added a commit to opensrp/android-fhir that referenced this pull request Oct 2, 2024
    FORK
             - With unmerged PR #9
                - WUP  #13

    SDK
                - WUP google#2178
                - WUP google#2650
                - WUP google#2663
    PERF
    - WUP google#2669
    - WUP google#2565
    - WUP google#2561
    - WUP google#2535

@MJ1998 MJ1998 left a comment

LGTM.

@MJ1998 MJ1998 merged commit 424ab83 into google:master Oct 7, 2024
6 checks passed
@jingtang10 jingtang10 deleted the issue2668-opt-dbsearch branch October 7, 2024 10:04
Labels
None yet
Projects
Status: Complete
Development

Successfully merging this pull request may close these issues.

Optimize the Database Search API
5 participants