Performance

Dataset limitations

Panache has limits on the size of the dataset it can handle. The Vuex store holds the full dataset in memory, so the limits depend on the capacity of the client machine. We tested real datasets of 15, 40, 50, 84, 163 and 491 genomes, and encountered memory issues while loading the largest one.

To assess these limits more precisely, we ran tests with simulated datasets of various sizes (number of genomes, chromosomes and blocks per chromosome) and measured the impact of each on loading times.

In our tests, Panache worked with about 300 individuals displaying 2000 blocks per chromosome (~15 MB), or with 163 rice genomes with 2983 blocks per chromosome (~13 MB). The loading limit appears to be around 16 MB; future developments will address this issue.
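As a rough illustration of the limit described above, a file can be checked against the approximate threshold before it is loaded. This is a minimal sketch, not part of Panache: the function names and the `APPROX_LIMIT_KB` constant are hypothetical, the threshold is only the empirical value observed in these tests, and 1 KB is assumed to be 1024 bytes.

```python
import os

# Empirical threshold from the tests on this page (~16 MB).
# Assumption: this is an observed value, not a guaranteed limit;
# the real limit depends on the client machine.
APPROX_LIMIT_KB = 16_000


def size_in_kb(path):
    """Return the size of a file in KB (assuming 1 KB = 1024 bytes)."""
    return os.path.getsize(path) / 1024


def likely_loadable(size_kb, limit_kb=APPROX_LIMIT_KB):
    """Return True if a file of this size is expected to load."""
    return size_kb < limit_kb
```

For example, `likely_loadable(size_in_kb("my_matrix.pav"))` gives a quick indication before opening a file, where `my_matrix.pav` is a placeholder name.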

The table below lists the tests carried out for files of various sizes (file size in KB, times in ms)...

Displayed	FileName	NbOfGenomes	NbOfChrom	MedianNbOfBlocksPerChrom	FileSize	TimeForLoading	TimeForParsing	TimeForGrouping	TimeForStorage
y	fake200G_10C_1000B.tsv	200	10	1000	4217	9	2204	39	5455
y	matrixOryRuf.pav	40	12	2970	4380	25	1908	328	5756
y	fake210G_12C_1000B.tsv	210	12	1000	5297	47	6070	129	24669
y	matrixOryBar.pav	84	12	2983	7739	19	2882	559	12530
y	fake210G_12C_2000B.tsv	210	12	2000	10607	53	15817	343	52074
y	fake240G_12C_2000B.tsv	240	12	2000	12014	46	19766	548	77462
y	fake250G_5C_5000B.tsv	250	5	5000	13009	61	8410	115	60965
y	matrixOryGla.pav	163	12	2983	13687	15	13177	362	66175
y	fake310G_11C_2000B.tsv	310	11	2000	14020	71	17613	630	71415
y	fake370G_10C_2000B.tsv	370	10	2000	15088	30	11601	143	80669
	fake310G_13C_2000B.tsv	310	13	2000	16572	34	22437	420	NA
	fake200G_10C_5000B.tsv	200	10	5000	21131	12	9559	217	NA
	matrixOryGla_x2G.pav	326	12	2983	25970	19	15005	286	NA
	fake210G_12C_5000B.tsv	210	12	5000	26539	24	19612	708	NA
	matrixOrySat.pav	491	12	2970	37617	17	24769	352	NA
	fake200G_10C_10000B.tsv	200	10	10000	42351	20	21417	636	NA
	fake200G_10C_20000B.tsv	200	10	20000	84831	20	99502	16999	NA
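The table above can be processed as tab-separated text, for instance to sum the four timing columns of each row. The following is a sketch under stated assumptions: the helper names are hypothetical, only two rows are copied from the table as an excerpt, and a row containing `NA` is treated as having no total.

```python
import csv
import io

# Two rows copied from the table above, kept tab-separated.
EXCERPT = (
    "Displayed\tFileName\tNbOfGenomes\tNbOfChrom\tMedianNbOfBlocksPerChrom\t"
    "FileSize\tTimeForLoading\tTimeForParsing\tTimeForGrouping\tTimeForStorage\n"
    "y\tmatrixOryGla.pav\t163\t12\t2983\t13687\t15\t13177\t362\t66175\n"
    "\tmatrixOrySat.pav\t491\t12\t2970\t37617\t17\t24769\t352\tNA\n"
)

TIME_COLUMNS = [
    "TimeForLoading",
    "TimeForParsing",
    "TimeForGrouping",
    "TimeForStorage",
]


def total_time_ms(row):
    """Sum the timing columns of one row; None if any value is NA."""
    values = [row[column] for column in TIME_COLUMNS]
    if "NA" in values:
        return None
    return sum(int(value) for value in values)


def summarize(tsv_text):
    """Map each file name to its total time in ms (or None)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["FileName"]: total_time_ms(row) for row in reader}
```

Calling `summarize(EXCERPT)` returns one entry per file, with `None` for rows where a step has no recorded time.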

Display management

From a display point of view, there is no maximum number of genomes, since only a small portion of the Presence/Absence matrix is visible at any given time; the focus is on the summary information tracks, which are always visible. The number of blocks visible at any moment depends on the zoom level and on the user's screen size. We compute a 'nucleotide-to-pixel' ratio so that between 10 and 200 pangenomic blocks can be seen at once.

More about the calculation of the zoom level here...

The zoom level defines a range of values for how much space a nucleotide should take, in pixels, with minimum and maximum values calculated with the following equation (shown here for the minimum):

minNtWidthInPixel = (displayWindowWidth * nbOfBlocksInCurrentChrom) / (maxNbOfBlocksToDisplay * lastNtOfChrom)

With:

minNtWidthInPixel: the minimum display size, in pixels, of a nucleotide
displayWindowWidth: the width, in pixels, available for the display of the Presence/Absence matrix
nbOfBlocksInCurrentChrom: the number of blocks in the panchromosome currently being displayed
maxNbOfBlocksToDisplay: the maximum number of pangenomic blocks that should be visible on screen
lastNtOfChrom: the last nucleotide coordinate of the panchromosome currently being displayed

Zoom values are therefore linked to the mean width of the pangenomic blocks, and adapted to the screen size so that a certain number of blocks is visible. Depending on the local distribution of blocks, the actual number may fall above or below this target. minNbOfBlocksToDisplay and maxNbOfBlocksToDisplay are arbitrarily set to 10 and 200, respectively. The default zoom value is calculated as the mean of minNtWidthInPixel and maxNtWidthInPixel. We chose to limit the maximum number of features visible at any time for performance reasons, as displaying many SVG elements at once is resource-intensive.
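The calculation above can be illustrated with a short worked example. This is a minimal sketch, not Panache's actual code: the function names are hypothetical, and the formula for maxNtWidthInPixel is an assumption, obtained by replacing maxNbOfBlocksToDisplay with minNbOfBlocksToDisplay in the equation given for minNtWidthInPixel.

```python
def nt_width_in_pixel(display_window_width, nb_of_blocks_in_chrom,
                      nb_of_blocks_to_display, last_nt_of_chrom):
    """Pixel width of one nucleotide so that, on average,
    nb_of_blocks_to_display blocks fit in the display window."""
    return (display_window_width * nb_of_blocks_in_chrom) / (
        nb_of_blocks_to_display * last_nt_of_chrom
    )


def zoom_range(display_window_width, nb_of_blocks_in_chrom, last_nt_of_chrom,
               min_nb_of_blocks=10, max_nb_of_blocks=200):
    """Return (min, max, default) nucleotide widths in pixels."""
    # Most zoomed out: up to max_nb_of_blocks visible (equation above).
    min_width = nt_width_in_pixel(display_window_width, nb_of_blocks_in_chrom,
                                  max_nb_of_blocks, last_nt_of_chrom)
    # Most zoomed in: assumed symmetric formula using min_nb_of_blocks.
    max_width = nt_width_in_pixel(display_window_width, nb_of_blocks_in_chrom,
                                  min_nb_of_blocks, last_nt_of_chrom)
    # Default zoom: mean of the two bounds, as stated in the text.
    default_width = (min_width + max_width) / 2
    return min_width, max_width, default_width
```

For instance, with a 1000-pixel display window and a panchromosome of 2000 blocks ending at coordinate 40,000,000, the mean block width is 20,000 nucleotides. At the minimum ratio the window then spans 4,000,000 nucleotides, which corresponds to 200 blocks of mean width.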
