Implement Parallelized map and optimize Database search API #2669

ndegwamartin · 2024-09-07T10:43:27Z

IMPORTANT: All PRs must be linked to an issue (except for extremely trivial and straightforward changes).

Description
Optimizes the DatabaseImpl search API's serialized FHIR Resource to HAPI FHIR Structure mapping block by introducing a parallelized implementation that uses async coroutines for each mapping iteration.
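For readers who have not seen the pattern, a parallelized map over coroutines can be sketched roughly as below. This is a minimal illustration and not necessarily the code in this PR: the name parallelMap, its default dispatcher, and the string-length stand-in for resource parsing are all assumptions, and it requires the kotlinx.coroutines library.

```kotlin
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.runBlocking

// Hypothetical helper: runs `transform` for each element in its own coroutine
// on `dispatcher` and returns the results in the original order.
suspend fun <A, B> Iterable<A>.parallelMap(
  dispatcher: CoroutineDispatcher = Dispatchers.Default,
  transform: suspend (A) -> B,
): List<B> = coroutineScope {
  // `map` starts one async per element; awaitAll suspends until all finish.
  map { item -> async(dispatcher) { transform(item) } }.awaitAll()
}

fun main() = runBlocking {
  // Stand-in for serialized resources; a real caller would parse JSON here.
  val serialized = listOf("a", "bb", "ccc")
  val lengths = serialized.parallelMap { it.length }
  println(lengths) // order matches the input list
}
```

Because coroutineScope waits for all children, a failure in any one mapping cancels the rest and is rethrown to the caller.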

Alternative(s) considered
Have you considered any alternatives? And if so, why have you chosen the approach in this PR?

Type
Enhancement

Screenshots (if applicable)

Checklist

I have read and acknowledged the Code of conduct.
I have read the Contributing page.
I have signed the Google Individual CLA, or I am covered by my company's Corporate CLA.
I have discussed my proposed solution with code owners in the linked issue(s) and we have agreed upon the general approach.
I have run ./gradlew spotlessApply and ./gradlew spotlessCheck to check my code follows the style guide of this project.
I have run ./gradlew check and ./gradlew connectedCheck to test my changes locally.
I have built and run the demo app(s) to verify my change fixes the issue and/or does not break the demo app(s).

- Optimize Database search API

FikriMilano

The change looks great!

Additionally, could you provide a performance comparison between the old and new code? That would be good to know.

jingtang10

great work thanks @ndegwamartin!

FORK - With unmerged PR #9 - WUP #13 SDK - WUP google#2178 - WUP google#2650 - WUP google#2663 PERF - WUP google#2669 - WUP google#2565 - WUP google#2561 - WUP google#2535

jingtang10 · 2024-09-11T14:18:15Z

To summarise our discussion yesterday: I think there's still work to be done in this PR. @ndegwamartin will investigate the thread pool etc. Please comment when this is ready for the next round of review. @FikriMilano @aditya-07 @yigit @stevenckngaa @vorburger @kevinmost please also take a look at this.

ndegwamartin · 2024-09-17T16:15:02Z

Device: Physical, Samsung Galaxy Tab Active2
Mode: Benchmarking with measureTimeMillis from Kotlin's kotlin.system package
Scope: The search method of the Database search API
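As a rough illustration of the measurement approach, the sketch below times two stages separately with measureTimeMillis. The functions queryStandIn and mapStandIn are hypothetical placeholders for the DB query and the mapping step; they are not SDK functions.

```kotlin
import kotlin.system.measureTimeMillis

// Hypothetical stand-in for the raw database query.
fun queryStandIn(): List<String> = (1..1_000).map { "resource-$it" }

// Hypothetical stand-in for mapping serialized rows to structures.
fun mapStandIn(rows: List<String>): List<Int> = rows.map { it.length }

fun main() {
  var rows: List<String> = emptyList()
  var mapped: List<Int> = emptyList()

  // Time each stage on its own so the query cost can be separated from mapping.
  val queryMs = measureTimeMillis { rows = queryStandIn() }
  val mappingMs = measureTimeMillis { mapped = mapStandIn(rows) }

  println("Records: ${mapped.size}")
  println("DB query: ${queryMs / 1000.0} s, total: ${(queryMs + mappingMs) / 1000.0} s")
}
```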

Optimization: None

Run 1

Resource Type	Total Records	Time taken (seconds)	DB query (seconds)
Group	~1K	8	~0.2
Task	~17K	~24	~1.8
Patient	~11K	~456	~1.3

Run 2

Resource Type	Total Records	Time taken (seconds)	DB query (seconds)
Group	~1K	~2	~0.1
Task	~17K	~22	~1.7
Patient	~11K	~472	~1.2

Optimization: Using async with parent context (usually Dispatchers.IO)

Run 1

Resource Type	Total Records	Time taken (seconds)	DB query (seconds)
Group	~1K	4.8	~0.2
Task	~17K	~24	~1.7
Patient	~11K	~450	~1.3

Run 2

Resource Type	Total Records	Time taken (seconds)	DB query (seconds)
Group	~1K	~2	~0.1
Task	~17K	~24	~1.7
Patient	~11K	~455	~1.3

Optimization: Using async with Dispatchers.Default.
(Note: thread safety of the FHIR JsonParser is achieved by creating a new instance for each loop iteration.)
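The per-iteration instance idea in the note above can be sketched as follows. StatefulParser is a hypothetical stand-in for a parser with mutable internal state, used here so the example does not depend on the HAPI FHIR library; it only illustrates the pattern of one instance per coroutine and assumes kotlinx.coroutines is available.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for a parser that keeps mutable state and is
// therefore unsafe to share between threads.
class StatefulParser {
  private val buffer = StringBuilder()

  fun parse(input: String): String {
    buffer.setLength(0)
    buffer.append(input.trim())
    return buffer.toString().uppercase()
  }
}

// Each coroutine builds its own parser, so no mutable state is shared.
suspend fun parseAll(inputs: List<String>): List<String> = coroutineScope {
  inputs.map { raw -> async(Dispatchers.Default) { StatefulParser().parse(raw) } }.awaitAll()
}

fun main() = runBlocking {
  println(parseAll(listOf(" patient ", " task ", " group ")))
}
```

The trade-off is one extra allocation per item, which is only reasonable when constructing the parser is cheap relative to the parsing itself.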

Run 1

Resource Type	Total Records	Time taken (seconds)	DB query (seconds)
Group	~1K	~5	~0.2
Task	~17K	~5.4	~1.8
Patient	~11K	~208	~1.4

Run 2

Resource Type	Total Records	Time taken (seconds)	DB query (seconds)
Group	~1K	~0.5	~0.1
Task	~17K	~5	~1.7
Patient	~11K	~204	~1.3

Note: The tests were carried out in a QA test environment. In the real world there would be more Patients than Groups (roughly 10 x the number of Groups) and even more Tasks than Patients (roughly 30 x the number of Patients).

ndegwamartin · 2024-09-17T16:24:32Z

Full specs of the device:

Samsung Galaxy Tab Active2
Android 9 (28)
3GB Memory

engine/src/main/java/com/google/android/fhir/db/impl/DatabaseImpl.kt

FikriMilano

Looks good!

MJ1998

What happens when you don't parallelize the parsing but just run it on the Default dispatcher?

A new function:

// withContext is a suspending call, so the function must be marked suspend.
suspend fun IParser.parseResourceInCPU(resourceString: String) =
    withContext(Dispatchers.Default) { parseResource(resourceString) }

and replace all parseResource calls with this function.

I am surprised that parallelizing (second optimization) did not improve the search API performance. If parallelizing does not improve it, do we really need it?

ndegwamartin · 2024-09-30T12:49:37Z

Device: Physical, Samsung Galaxy Tab Active2
Mode: Benchmarking with measureTimeMillis from Kotlin's kotlin.system package
Scope: The search method of the Database search API (deserialization/mapping)

Optimization: Using IParser.parseResourceInCPU() with context Dispatchers.Default

Run 1

Resource Type	Total Records	Time taken (seconds)
Group	~1K	~5
Task	~17K	~29
Patient	~11K	~457

Run 2

Resource Type	Total Records	Time taken (seconds)
Group	~1K	~2
Task	~17K	~28
Patient	~11K	~461

cc @MJ1998

MJ1998 · 2024-09-30T12:55:08Z

So it's only when we do both - parallelize and use the Default dispatcher - that we see a performance improvement, correct?
@ndegwamartin

ndegwamartin · 2024-09-30T12:55:30Z

So it's only when we do both - parallelize and use the Default dispatcher - that we see a performance improvement, correct? @ndegwamartin

Correct.
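One way to see why switching dispatcher alone does not help: a withContext call inside a plain map still processes one item at a time, whereas async starts all items before awaiting any. The sketch below is an illustration under assumptions, not code from this PR; ConcurrencyProbe and busyWork are hypothetical helpers, and kotlinx.coroutines is required.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical probe that records how many blocks ran at the same time.
class ConcurrencyProbe {
  private val active = AtomicInteger(0)
  private val peak = AtomicInteger(0)

  fun <T> track(block: () -> T): T {
    val now = active.incrementAndGet()
    peak.updateAndGet { previous -> maxOf(previous, now) }
    try {
      return block()
    } finally {
      active.decrementAndGet()
    }
  }

  fun peakConcurrency(): Int = peak.get()
}

// Stand-in for CPU-bound parsing work.
fun busyWork(seed: Int): Int {
  var acc = seed
  repeat(200_000) { i -> acc = (acc + i) % 1_000_003 }
  return acc
}

// Variant A: each item hops to Dispatchers.Default, but the loop still waits
// for one item to finish before starting the next.
suspend fun sequentialOnDefault(items: List<Int>, probe: ConcurrencyProbe): List<Int> =
  items.map { item -> withContext(Dispatchers.Default) { probe.track { busyWork(item) } } }

// Variant B: every item gets its own coroutine, so items may run concurrently.
suspend fun parallelOnDefault(items: List<Int>, probe: ConcurrencyProbe): List<Int> =
  coroutineScope {
    items.map { item -> async(Dispatchers.Default) { probe.track { busyWork(item) } } }.awaitAll()
  }

fun main() = runBlocking {
  val items = (1..64).toList()
  val sequentialProbe = ConcurrencyProbe()
  val parallelProbe = ConcurrencyProbe()
  val a = sequentialOnDefault(items, sequentialProbe)
  val b = parallelOnDefault(items, parallelProbe)
  println("same results: ${a == b}")
  println("sequential peak concurrency: ${sequentialProbe.peakConcurrency()}")
  println("parallel peak concurrency: ${parallelProbe.peakConcurrency()}")
}
```

Variant A never has more than one block in flight. The peak for variant B depends on the number of cores and on scheduling, so it is not guaranteed to exceed one on every device.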

MJ1998 · 2024-10-01T06:54:10Z

Okay, so I think using Dispatchers.IO for CPU-intensive tasks might be less efficient because its thread pool is intended for blocking I/O rather than CPU-bound work.

engine/src/main/java/com/google/android/fhir/Util.kt

jingtang10

thanks @ndegwamartin for all the work! great results!

engine/src/test/java/com/google/android/fhir/UtilTest.kt

MJ1998

LGTM.

ndegwamartin requested a review from a team as a code owner September 7, 2024 10:43

ndegwamartin requested a review from jingtang10 September 7, 2024 10:43

Implement Parallelized Map

2fc407c

- Optimize Database search API

ndegwamartin force-pushed the issue2668-opt-dbsearch branch from 0eb0187 to 2fc407c September 7, 2024 10:46

FikriMilano reviewed Sep 9, 2024

jingtang10 reviewed Sep 10, 2024

Merge branch 'master' into issue2668-opt-dbsearch

edded37

ndegwamartin mentioned this pull request Sep 17, 2024

Optimize the Database Search API #2668

Closed

ndegwamartin marked this pull request as draft September 17, 2024 16:41

Search API perfomance DB optimization - Default Dispatcher

209da12

ndegwamartin marked this pull request as ready for review September 18, 2024 10:46

MJ1998 reviewed Sep 19, 2024

engine/src/main/java/com/google/android/fhir/db/impl/DatabaseImpl.kt (outdated, resolved)

FikriMilano approved these changes Sep 23, 2024

ndegwamartin and others added 2 commits September 24, 2024 11:01

Merge branch 'master' into issue2668-opt-dbsearch

5f8dae9

Clean up

9d8ca72

MJ1998 requested changes Sep 27, 2024

ellykits reviewed Oct 1, 2024

engine/src/main/java/com/google/android/fhir/Util.kt (outdated, resolved)

jingtang10 approved these changes Oct 1, 2024

ndegwamartin and others added 2 commits October 1, 2024 15:00

Merge branch 'master' into issue2668-opt-dbsearch

927f845

Clean up PR

0ba866c

ndegwamartin mentioned this pull request Oct 1, 2024

Update documentation with search API best practice #2684

Open

jingtang10 approved these changes Oct 1, 2024

engine/src/test/java/com/google/android/fhir/UtilTest.kt (outdated, resolved)

ndegwamartin and others added 3 commits October 5, 2024 20:42

Merge remote-tracking branch 'origin/master' into issue2668-opt-dbsearch

a9d3603

Remove Concurrency Unit Test

659d677

Merge branch 'master' into issue2668-opt-dbsearch

2271e97

MJ1998 approved these changes Oct 7, 2024

MJ1998 merged commit 424ab83 into google:master Oct 7, 2024
6 checks passed

jingtang10 deleted the issue2668-opt-dbsearch branch October 7, 2024 10:04
