List: Optimisations and improvements #19

axefrog · 2017-02-11T17:54:21Z

Performance testing shows superior performance with many bulk and sequential operations, and inferior performance with random groups of adhoc, interleaved operations that jump around the list in no particular order. Currently it's hard to predict what the most common usage will be, but performance needs to be strong across the board, as real-world usage isn't going to cater to the "preferred" set of operations for which the List structure works best.

The following optimisations are worth having a look at when there's time, and should not only help to significantly improve performance in adhoc scenarios, but may also help to reduce code complexity, which is currently very high.

Use the existing left and right views as write targets as is currently the case, but only use them for reading if they already point at the correct location. Do not relocate them except for writing. By doing this, the number of checks and balances surrounding reading will drop significantly.
Keep a separate reader view and move it around independently of the write views.
Uncommitted ranges for intermediate views should be included in this. Caching this information will also reduce the complexity of managing the anchoring of the left/write views.
The reader view should keep a direct array of all slots in its path, and their ranges. There is no need to string multiple views together for this purpose. Traversal operations will be minimised by doing this. The array should be able to be treated like a super-simple skip list.
The reader is always subservient to write state, and does not affect data. Therefore, it only needs to be copied for changes to the list. For multiple reads against the same list, it is completely mutable. If two application modules are heavily using the same list in different ways and need to optimise performance further, the second module can simply clone the list in order to acquire an isolated reader.
As an extension of the above, the reader will have a direct reference to the root, allowing for quick random access reads.
Cache the write locations and ranges as list fields to eliminate the need for traversal-based range checking. The reader should be able to consult these to very quickly decide whether a commit first needs to be made. If writes are infrequent and reads are frequent, the amortized cost of reads will be almost the same as if uncommitted writes were not a concern at all. Caching these values may also allow a number of areas of code to be simplified or eliminated entirely.

Further Thinking

Use a single writer instead of dual left/right writers. By eliminating the second writer, a massive amount of complex logic can likely be removed. The left/right anchoring technique could remain in place as-is, but seeing as it applies to only a single write path, numerous checks and verifications would be able to be discarded.
Having one reader and one writer removes the need for view-selection logic.
Having the writer use a chain of lazily-committed views, as is the case right now, allows write speed to remain very fast. Rather than maintaining them as a chain though, keeping them in a short array instead allows much faster switching between different locations for writing, because, with a cached value indicating the lowest depth containing uncommitted changes, upward traversal can be skipped entirely in many cases by jumping straight to the correct level. Also, as the array will rarely be larger than 2-4 elements (one per depth level), the cost of cloning the array should be negligible, and the management of the view chain is easier to reduce to simple loops.
By having the reader and the writer maintain structural commonalities, the code footprint will likely be reduced, given that traversal logic will be shared.
The thought had crossed my mind that multiple writers may be useful for times when heavy modifications are being performed, such as during a sort operation. Instead of that, however, start by making sure the writer is committed, then perform the bulk operation outside of the management layer, then just reset the writers for subsequent adhoc operations.
The writer is the source of truth. The reader can be reconstructed from the writer (and root node discovery), but not vice versa.

axefrog self-assigned this Feb 11, 2017

axefrog changed the title ~~Optimising/improving Collectable.List~~ List: Optimisations and improvements Mar 11, 2017

axefrog added @collectable/list refac perf labels Mar 11, 2017

axefrog removed their assignment Mar 16, 2017

axefrog modified the milestone: 1.0 Release Mar 20, 2017

axefrog mentioned this issue Jul 28, 2017

fantasy-land compliance #49

Open

15 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

List: Optimisations and improvements #19

List: Optimisations and improvements #19

axefrog commented Feb 11, 2017 •

edited

Loading

List: Optimisations and improvements #19

List: Optimisations and improvements #19

Comments

axefrog commented Feb 11, 2017 • edited Loading

axefrog commented Feb 11, 2017 •

edited

Loading