
Performance

Eloi Durant edited this page Mar 1, 2022 · 9 revisions

Dataset limitations

Panache has limits with regard to dataset size. The Vuex store keeps the full dataset in RAM, so these limits depend on the capacity of the client machine. We tried real datasets with 15, 40, 50, 84, 163 and 491 genomes respectively, and encountered memory issues while loading the largest one.

More precisely, we ran tests with simulated datasets of various sizes (number of genomes, chromosomes and blocks per chromosome) to better assess their impact on loading times.

In our tests, Panache proved functional with about 300 individuals and 2000 blocks per chromosome (~15 MB), or with 163 rice genomes and 2983 blocks per chromosome (~13 MB). Its loading limit seems to be around 16 MB; future developments will address this issue.

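The table below lists the tests carried out for files of various sizes (file sizes in KB, times in ms). A y in the Displayed column marks the datasets that Panache managed to display; for the others, no storage time was recorded (NA).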

| Displayed | FileName | NbOfGenomes | NbOfChrom | MedianNbOfBlocksPerChrom | FileSize (KB) | TimeForLoading (ms) | TimeForParsing (ms) | TimeForGrouping (ms) | TimeForStorage (ms) |
|---|---|---|---|---|---|---|---|---|---|
| y | fake200G_10C_1000B.tsv | 200 | 10 | 1000 | 4217 | 9 | 2204 | 39 | 5455 |
| y | matrixOryRuf.pav | 40 | 12 | 2970 | 4380 | 25 | 1908 | 328 | 5756 |
| y | fake210G_12C_1000B.tsv | 210 | 12 | 1000 | 5297 | 47 | 6070 | 129 | 24669 |
| y | matrixOryBar.pav | 84 | 12 | 2983 | 7739 | 19 | 2882 | 559 | 12530 |
| y | fake210G_12C_2000B.tsv | 210 | 12 | 2000 | 10607 | 53 | 15817 | 343 | 52074 |
| y | fake240G_12C_2000B.tsv | 240 | 12 | 2000 | 12014 | 46 | 19766 | 548 | 77462 |
| y | fake250G_5C_5000B.tsv | 250 | 5 | 5000 | 13009 | 61 | 8410 | 115 | 60965 |
| y | matrixOryGla.pav | 163 | 12 | 2983 | 13687 | 15 | 13177 | 362 | 66175 |
| y | fake310G_11C_2000B.tsv | 310 | 11 | 2000 | 14020 | 71 | 17613 | 630 | 71415 |
| y | fake370G_10C_2000B.tsv | 370 | 10 | 2000 | 15088 | 30 | 11601 | 143 | 80669 |
|  | fake310G_13C_2000B.tsv | 310 | 13 | 2000 | 16572 | 34 | 22437 | 420 | NA |
|  | fake200G_10C_5000B.tsv | 200 | 10 | 5000 | 21131 | 12 | 9559 | 217 | NA |
|  | matrixOryGla_x2G.pav | 326 | 12 | 2983 | 25970 | 19 | 15005 | 286 | NA |
|  | fake210G_12C_5000B.tsv | 210 | 12 | 5000 | 26539 | 24 | 19612 | 708 | NA |
|  | matrixOrySat.pav | 491 | 12 | 2970 | 37617 | 17 | 24769 | 352 | NA |
|  | fake200G_10C_10000B.tsv | 200 | 10 | 10000 | 42351 | 20 | 21417 | 636 | NA |
|  | fake200G_10C_20000B.tsv | 200 | 10 | 20000 | 84831 | 20 | 99502 | 16999 | NA |
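
To get a feel for where the time goes, the timings of a few rows can be added up. The sketch below is only an illustration: the numbers are copied from three displayed rows of the table, and it assumes that the four measured steps run one after the other, so that their sum approximates the total waiting time. The names `TIMINGS_MS` and `storage_share` are hypothetical and not part of Panache.

```python
# Timings in ms, copied from three displayed rows of the table:
# (TimeForLoading, TimeForParsing, TimeForGrouping, TimeForStorage)
TIMINGS_MS = {
    "fake200G_10C_1000B.tsv": (9, 2204, 39, 5455),
    "matrixOryBar.pav": (19, 2882, 559, 12530),
    "matrixOryGla.pav": (15, 13177, 362, 66175),
}


def storage_share(timings):
    """Return the fraction of the summed step times spent in the storage step."""
    loading, parsing, grouping, storage = timings
    # Assumption: the steps are sequential, so their sum stands for the total.
    total = loading + parsing + grouping + storage
    return storage / total


for name, timings in TIMINGS_MS.items():
    total_s = sum(timings) / 1000  # ms to s
    print(f"{name}: total {total_s:.1f} s, storage {storage_share(timings):.0%}")
```

For these three rows, the storage step accounts for more than two thirds of the summed time, which is consistent with the loading limit being tied to storing the dataset in memory.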

Display management

From a display point of view, there is no maximum number of genomes: only a small portion of the Presence/Absence matrix is visible at any given time, and the focus is on the summary information tracks, which are always visible. The number of blocks visible at any moment depends on the zoom level and on the user's screen size. We compute a 'nucleotide-to-pixel' ratio so that between 10 and 200 pangenomic blocks can be seen at once.

More about the calculation of the zoom level:

The zoom level defines a range for the width of a nucleotide in pixels, bounded by a minimum and a maximum value. The minimum is calculated with the following equation:

minNtWidthInPixel = (displayWindowWidth * nbOfBlocksInCurrentChrom) / (maxNbOfBlocksToDisplay * lastNtOfChrom)

With

  • minNtWidthInPixel: the minimum display size of a nucleotide, in pixels
  • displayWindowWidth: the width in pixels available for the display of the Presence/Absence matrix
  • nbOfBlocksInCurrentChrom: the number of blocks in the panchromosome currently being displayed
  • maxNbOfBlocksToDisplay: the maximum number of pangenomic blocks that should be visible on screen
  • lastNtOfChrom: the last nucleotide coordinate of the panchromosome currently being displayed

Zoom values are therefore linked to the mean width of the pangenomic blocks, and adapted to the screen size so that a certain number of blocks is visible. Depending on the local distribution of blocks, this number may be exceeded or not reached. minNbOfBlocksToDisplay and maxNbOfBlocksToDisplay are arbitrarily set to 10 and 200 respectively. The default zoom value is the mean of minNtWidthInPixel and maxNtWidthInPixel. We chose to limit the maximum number of features visible at any time for performance reasons, as displaying many SVG elements at once is resource-intensive.
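
The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not Panache's actual code: the function names are hypothetical, and the formula for maxNtWidthInPixel is an assumption, obtained by replacing maxNbOfBlocksToDisplay with minNbOfBlocksToDisplay in the equation given for minNtWidthInPixel. The screen width and chromosome length in the example are made-up values.

```python
MIN_NB_OF_BLOCKS_TO_DISPLAY = 10   # value given in the text
MAX_NB_OF_BLOCKS_TO_DISPLAY = 200  # value given in the text


def nt_width_in_pixel(display_window_width, nb_of_blocks_in_current_chrom,
                      nb_of_blocks_to_display, last_nt_of_chrom):
    """Width of one nucleotide (px) so that, on average,
    nb_of_blocks_to_display blocks fit in the display window."""
    return (display_window_width * nb_of_blocks_in_current_chrom) / (
        nb_of_blocks_to_display * last_nt_of_chrom
    )


def zoom_range(display_window_width, nb_of_blocks_in_current_chrom,
               last_nt_of_chrom):
    """Return (min, max, default) nucleotide widths in pixels."""
    # Equation from the text: the most zoomed-out level shows the max number of blocks.
    min_width = nt_width_in_pixel(display_window_width,
                                  nb_of_blocks_in_current_chrom,
                                  MAX_NB_OF_BLOCKS_TO_DISPLAY, last_nt_of_chrom)
    # ASSUMPTION: the max width uses the same equation with the min number of blocks.
    max_width = nt_width_in_pixel(display_window_width,
                                  nb_of_blocks_in_current_chrom,
                                  MIN_NB_OF_BLOCKS_TO_DISPLAY, last_nt_of_chrom)
    # From the text: the default zoom is the mean of the two bounds.
    default_width = (min_width + max_width) / 2
    return min_width, max_width, default_width


# Hypothetical example: 1000 px wide window, 2000 blocks, 40 Mb panchromosome.
min_w, max_w, default_w = zoom_range(1000, 2000, 40_000_000)
print(f"min={min_w:.6f} px/nt, max={max_w:.6f} px/nt, default={default_w:.6f} px/nt")
```

With these example values the mean block width is 20,000 nucleotides, so the window spans about 200 average-sized blocks at the minimum width and about 10 at the assumed maximum width.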
