Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ClickHouse-Only cBioPortal Demo based on standard schema #10839

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

sheridancbio
Copy link
Contributor

This branch adapts the cBioPortal production codebase for direct use with a ClickHouse deployed database. This is to demonstrate and explore the feasibility of eliminating the mySQL dependency entirely as part of our adoption of OLAP based tools.

Describe changes proposed in this pull request:

  • mysql database driver replaced with clickhouse java driver (v0.6.0)
  • all table names converted to lower case (underscore delimited)
  • all field names converted to lower case (underscore delimited)
  • long IN(...) clauses contracted by eliminating unneeded whitespace
  • debugging of previous development branch attempted at 2024 hackathon (in progress)

Known issues and remaining work

  • substantial functionality still needs to be debugged
    • the sample counts on the home page
    • the results page
    • the patient view page
    • the alterationCounts endpoints for the study view page
  • performance is still unacceptable for the application of large study (genie) filters, with durations of > 10 minutes. This seems to determine that we will need to also deploy additional ClickHouse (RFC-80) strategies for optimizing performance, primarily through the avoidance of passing long sample lists into persistence layer IN (...) clauses
  • as part of the rollout of OLAP, new import functionality must be put in place. If the MySql database is to be dropped, a new process is needed for migrating, validating, and importing database updates into ClickHouse and our schema definition (cgds.sql) needs to be reworked, as well as making available new seed databases in the new format.
  • documentation needs extensive revision

Checks

Any screenshots or GIFs?

If this is a new visual feature please add a before/after screenshot or gif
here with e.g. Giphy CAPTURE or Peek

Notify reviewers

Read our Pull request merging
policy
. It can help to figure out who worked on the
file before you. Please use git blame <filename> to determine that
and notify them either through slack or by assigning them as a reviewer on the PR

pvannierop and others added 5 commits April 18, 2024 19:42
- table names in mybatis mappers standardized to lower case only
- SQL format bugs fixed (such as 'ONSAMPLE.PATIENT_ID' -> 'ON sample.PATIENT_ID'
- SQL keywords captitalized where noticed
- comment out password setting to allow use of local redis
- some whitespace standardization included in AlterationCountsMapper.xml
- whitespace changes to AlterationCountsMapper.xml not separate from lowercasing of fields
- also partial progress (to rendering of home page) fixing remaining GROUP BY issues
- format SQL IN list clauses with minimal whitespace
- attempt at debugging AlterationCountMapper (incomplete)
- fix some missed case conversions (to all lower case fields and tables)
@sheridancbio sheridancbio force-pushed the demo-clickhouse-poc-uppercase-columnnames branch 4 times, most recently from 9bbb691 to f5cad33 Compare August 22, 2024 00:50
put placeholder in migration.sql and make db version consistent
@sheridancbio sheridancbio force-pushed the demo-clickhouse-poc-uppercase-columnnames branch from f5cad33 to eab18ce Compare August 22, 2024 00:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants