Performance
Panache does show some limits with regard to dataset size. The Vuex store keeps the full dataset in RAM, so the limits depend on the capacity of the client computer. We tried real datasets with 15, 40, 50, 84, 163 and 491 genomes, respectively, and encountered memory issues when loading the largest one.
To be more precise, we ran tests with simulated datasets of various sizes (number of genomes, chromosomes and blocks per chromosome) to better assess the impact of each of these parameters on loading times.
Our tests show that Panache is functional with about 300 individuals and 2000 blocks per chromosome (~15 MB), or with 163 rice genomes and 2983 blocks per chromosome (~13 MB). Its loading limit appears to be around 16 MB; future developments will address this issue.
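One way a client could act on this limit is to warn the user before loading a file above it. The sketch below is hypothetical: `LOAD_LIMIT_KB` and `checkFileSize` are not part of Panache, and the 16 000 KB threshold is simply read off the tests (the largest dataset that was displayed is 15 088 KB, the smallest one that was not is 16 572 KB). Since memory is the bottleneck, the actual limit will vary with the client machine.

```typescript
// Hypothetical threshold in KB, read off the performance tests; not a Panache setting.
const LOAD_LIMIT_KB = 16000;

type SizeCheck =
  | { ok: true }
  | { ok: false; message: string };

// Returns a warning when a file is larger than the loading limit observed in tests.
function checkFileSize(fileSizeKB: number): SizeCheck {
  if (fileSizeKB <= LOAD_LIMIT_KB) {
    return { ok: true };
  }
  return {
    ok: false,
    message: `File is ${fileSizeKB} KB, above the ~${LOAD_LIMIT_KB} KB limit observed in tests; loading may fail.`,
  };
}
```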
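The table below lists the tests carried out on files of various sizes (sizes in KB, times in ms). "Displayed" indicates whether the dataset could be displayed; NA marks runs for which no storage time was recorded.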
| Displayed | File name | Genomes | Chromosomes | Median blocks per chromosome | File size (KB) | Loading (ms) | Parsing (ms) | Grouping (ms) | Storage (ms) |
|---|---|---|---|---|---|---|---|---|---|
| y | fake200G_10C_1000B.tsv | 200 | 10 | 1000 | 4217 | 9 | 2204 | 39 | 5455 |
| y | matrixOryRuf.pav | 40 | 12 | 2970 | 4380 | 25 | 1908 | 328 | 5756 |
| y | fake210G_12C_1000B.tsv | 210 | 12 | 1000 | 5297 | 47 | 6070 | 129 | 24669 |
| y | matrixOryBar.pav | 84 | 12 | 2983 | 7739 | 19 | 2882 | 559 | 12530 |
| y | fake210G_12C_2000B.tsv | 210 | 12 | 2000 | 10607 | 53 | 15817 | 343 | 52074 |
| y | fake240G_12C_2000B.tsv | 240 | 12 | 2000 | 12014 | 46 | 19766 | 548 | 77462 |
| y | fake250G_5C_5000B.tsv | 250 | 5 | 5000 | 13009 | 61 | 8410 | 115 | 60965 |
| y | matrixOryGla.pav | 163 | 12 | 2983 | 13687 | 15 | 13177 | 362 | 66175 |
| y | fake310G_11C_2000B.tsv | 310 | 11 | 2000 | 14020 | 71 | 17613 | 630 | 71415 |
| y | fake370G_10C_2000B.tsv | 370 | 10 | 2000 | 15088 | 30 | 11601 | 143 | 80669 |
| n | fake310G_13C_2000B.tsv | 310 | 13 | 2000 | 16572 | 34 | 22437 | 420 | NA |
| n | fake200G_10C_5000B.tsv | 200 | 10 | 5000 | 21131 | 12 | 9559 | 217 | NA |
| n | matrixOryGla_x2G.pav | 326 | 12 | 2983 | 25970 | 19 | 15005 | 286 | NA |
| n | fake210G_12C_5000B.tsv | 210 | 12 | 5000 | 26539 | 24 | 19612 | 708 | NA |
| n | matrixOrySat.pav | 491 | 12 | 2970 | 37617 | 17 | 24769 | 352 | NA |
| n | fake200G_10C_10000B.tsv | 200 | 10 | 10000 | 42351 | 20 | 21417 | 636 | NA |
| n | fake200G_10C_20000B.tsv | 200 | 10 | 20000 | 84831 | 20 | 99502 | 16999 | NA |
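Summing the four stages makes it easier to see where loading time goes. The sketch below copies a few rows from the table and computes the total time and the share taken by the storage step; the `Run` type and helper names are illustrative only, and `null` stands for NA.

```typescript
// A few rows copied from the performance table (sizes in KB, times in ms).
interface Run {
  fileName: string;
  fileSizeKB: number;
  loadingMs: number;
  parsingMs: number;
  groupingMs: number;
  storageMs: number | null; // null where the table reports NA
}

const runs: Run[] = [
  { fileName: "fake200G_10C_1000B.tsv", fileSizeKB: 4217, loadingMs: 9, parsingMs: 2204, groupingMs: 39, storageMs: 5455 },
  { fileName: "matrixOryGla.pav", fileSizeKB: 13687, loadingMs: 15, parsingMs: 13177, groupingMs: 362, storageMs: 66175 },
  { fileName: "fake370G_10C_2000B.tsv", fileSizeKB: 15088, loadingMs: 30, parsingMs: 11601, groupingMs: 143, storageMs: 80669 },
  { fileName: "fake310G_13C_2000B.tsv", fileSizeKB: 16572, loadingMs: 34, parsingMs: 22437, groupingMs: 420, storageMs: null },
];

// Total time across the four stages, or null if storage has no recorded time.
function totalMs(run: Run): number | null {
  if (run.storageMs === null) return null;
  return run.loadingMs + run.parsingMs + run.groupingMs + run.storageMs;
}

// Fraction of the total time spent in the storage step.
function storageShare(run: Run): number | null {
  const total = totalMs(run);
  if (total === null || run.storageMs === null) return null;
  return run.storageMs / total;
}

// Largest file size among runs whose storage step has a recorded time.
function largestStoredFileKB(rs: Run[]): number {
  return Math.max(...rs.filter((r) => r.storageMs !== null).map((r) => r.fileSizeKB));
}
```

On these rows, storage accounts for roughly 71% to 87% of the total time, so for the datasets that could be displayed it is the storage step, rather than parsing, that dominates.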
From a display point of view, there is no maximum number of genomes: only a small portion of the Presence/Absence matrix is visible at any given time, while the display focuses on the summary information tracks, which are always visible. The number of blocks visible at any moment depends on the zoom level and on the user's screen size. We compute a 'nucleotide-to-pixel' ratio so that between 10 and 200 pangenomic blocks can be seen at once.
More about the calculation of the zoom level:
The zoom level defines a range of values for the width, in pixels, that a nucleotide should take on screen. Its minimum and maximum values are calculated with the following equation:
minNtWidthInPixel = (displayWindowWidth * nbOfBlocksInCurrentChrom) / (maxNbOfBlocksToDisplay * lastNtOfChrom)
where:
- minNtWidthInPixel: the minimum display width of a nucleotide, in pixels
- displayWindowWidth: the width, in pixels, available to display the Presence/Absence matrix
- nbOfBlocksInCurrentChrom: the number of blocks in the panchromosome currently displayed
- maxNbOfBlocksToDisplay: the maximum number of pangenomic blocks that should be visible on screen
- lastNtOfChrom: the last nucleotide coordinate of the panchromosome currently displayed
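The equation can be read as a bound on the number of visible blocks, assuming every block has the mean block width. In the derivation below, $p$ is the width of one nucleotide in pixels. This is our reading of the formula; the expression for maxNtWidthInPixel is not written out on this page and is inferred by analogy, using minNbOfBlocksToDisplay in place of maxNbOfBlocksToDisplay.

```math
\begin{aligned}
\bar{w} &= \frac{\text{lastNtOfChrom}}{\text{nbOfBlocksInCurrentChrom}} \quad \text{(mean block width, in nucleotides)}\\
n_{\text{visible}}(p) &= \frac{\text{displayWindowWidth}}{p\,\bar{w}} = \frac{\text{displayWindowWidth}\cdot\text{nbOfBlocksInCurrentChrom}}{p\cdot\text{lastNtOfChrom}}\\
n_{\text{visible}}(p) \le \text{maxNbOfBlocksToDisplay} &\iff p \ge \frac{\text{displayWindowWidth}\cdot\text{nbOfBlocksInCurrentChrom}}{\text{maxNbOfBlocksToDisplay}\cdot\text{lastNtOfChrom}} = \text{minNtWidthInPixel}\\
n_{\text{visible}}(p) \ge \text{minNbOfBlocksToDisplay} &\iff p \le \frac{\text{displayWindowWidth}\cdot\text{nbOfBlocksInCurrentChrom}}{\text{minNbOfBlocksToDisplay}\cdot\text{lastNtOfChrom}} = \text{maxNtWidthInPixel}
\end{aligned}
```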
Zoom values are therefore tied to the mean width of the pangenomic blocks and adapted to the screen size so that a given number of blocks is visible. That number may be exceeded or not reached, depending on the local distribution of blocks. The minNbOfBlocksToDisplay and maxNbOfBlocksToDisplay parameters are arbitrarily set to 10 and 200, respectively. The default zoom value is the mean of minNtWidthInPixel and maxNtWidthInPixel. We chose to limit the maximum number of features visible at any time for performance reasons, as displaying many SVG elements at once is resource-intensive.
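The zoom range described above can be sketched as follows. The function names (`computeZoomRange`, `clampZoom`, `estimateVisibleBlocks`) are hypothetical and not taken from Panache's code; the maxNtWidthInPixel formula is assumed to mirror the minNtWidthInPixel one, with minNbOfBlocksToDisplay in place of maxNbOfBlocksToDisplay.

```typescript
interface ZoomRange {
  minNtWidthInPixel: number;
  maxNtWidthInPixel: number;
  defaultNtWidthInPixel: number;
}

// Values stated on this page.
const MIN_NB_OF_BLOCKS_TO_DISPLAY = 10;
const MAX_NB_OF_BLOCKS_TO_DISPLAY = 200;

function computeZoomRange(
  displayWindowWidth: number,
  nbOfBlocksInCurrentChrom: number,
  lastNtOfChrom: number,
): ZoomRange {
  if (displayWindowWidth <= 0 || nbOfBlocksInCurrentChrom <= 0 || lastNtOfChrom <= 0) {
    throw new RangeError("window width, block count and chromosome length must be positive");
  }
  // Shared numerator over chromosome length: displayWindowWidth * nbOfBlocks / lastNt.
  const scale = (displayWindowWidth * nbOfBlocksInCurrentChrom) / lastNtOfChrom;
  // Showing at most 200 blocks gives the smallest nucleotide width...
  const minNtWidthInPixel = scale / MAX_NB_OF_BLOCKS_TO_DISPLAY;
  // ...and showing at least 10 blocks gives the largest one (assumed symmetric formula).
  const maxNtWidthInPixel = scale / MIN_NB_OF_BLOCKS_TO_DISPLAY;
  return {
    minNtWidthInPixel,
    maxNtWidthInPixel,
    // Default zoom: arithmetic mean of the two bounds.
    defaultNtWidthInPixel: (minNtWidthInPixel + maxNtWidthInPixel) / 2,
  };
}

// Keeps a requested nucleotide width within the allowed zoom range.
function clampZoom(requestedNtWidthInPixel: number, range: ZoomRange): number {
  return Math.min(range.maxNtWidthInPixel, Math.max(range.minNtWidthInPixel, requestedNtWidthInPixel));
}

// Approximate number of blocks on screen, assuming every block has the mean block width.
function estimateVisibleBlocks(
  ntWidthInPixel: number,
  displayWindowWidth: number,
  nbOfBlocksInCurrentChrom: number,
  lastNtOfChrom: number,
): number {
  const meanBlockWidth = lastNtOfChrom / nbOfBlocksInCurrentChrom;
  return displayWindowWidth / (ntWidthInPixel * meanBlockWidth);
}
```

For a hypothetical 1000-pixel window and a panchromosome of 3000 blocks ending at nucleotide 30,000,000, the range runs from 0.0005 to 0.01 pixel per nucleotide. Because the default zoom averages the two widths rather than the two block counts, it shows about 2·10·200/(10+200) ≈ 19 blocks of mean width, which is much closer to the lower bound than to the midpoint between 10 and 200.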