Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up CnaEvent lookup during import #25

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sheridancbio
Copy link
Contributor

  • store existing events from the database in a HashMap instead of HashSet
  • retrieval from HashMap uses CnaEvent.Event equals() and hashMap() semantics
  • retrieval is neccessary in order to obtain the associated event_id from the db record
  • this avoids a linear search through the set of all CnaEvent.Events in the database

- store existing events from the database in a HashMap instead of HashSet
- retrieval from HashMap uses CnaEvent.Event equals() and hashMap() semantics
- retrieval is neccessary in order to obtain the associated event_id from the db record
- this avoids a linear search through the set of all CnaEvent.Events in the database
@sheridancbio sheridancbio added enhancement New feature or request DO NOT MERGE This is not yet ready for merge labels Mar 15, 2024
@sheridancbio
Copy link
Contributor Author

sheridancbio commented Mar 15, 2024

This is an in-progress attempt to implement an efficient lookup of existing CnaEvent.Event references as intended by #1
That PR had a bug, in that it obtained the event_id value from the newly created CnaEvent as parsed from the data_cna.txt file. However, the constructed CnaEvent objects coming from that parsing will not have any event_id set (they will default to 0). Instead, the event_id must be retrieved from an existing CnaEvent.Event object retrieved from the database through the DaoCnaEvent.getAllCnaEvents() call.

Instead of storing the CnaEvent.Events in a java Set, this PR stores them in a Map, mapping each event to itself. This allows retrieval from the Hashmap using .get(), which is efficient (more efficient than the linear search). In order for that to work correctly, the equals() function and hashCode() function of CnaEvent.Event has been overridden (previously) to function based only on gene and alteration fields (not event_id). The previous Map was demoted to a Set in this PR: cBioPortal/cbioportal#9847
This PR reverses that and restores the previous HashMap lookup.

@sheridancbio
Copy link
Contributor Author

Note : the validity of using the HashMap representation to retrieve the event_id populated objects based on a non-event_id argument to the Map.get(key) function was tested. However, a built importer with these code changes failed to obtain the event_id values as expected. Some additional debugging is needed ... probably related to the equals() and hashCode() functions of the other types contained inside of the CnaEvent.Event type.

@forus forus mentioned this pull request Mar 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DO NOT MERGE This is not yet ready for merge enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant