Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List: Optimisations and improvements #19

Open
axefrog opened this issue Feb 11, 2017 · 0 comments
Open

List: Optimisations and improvements #19

axefrog opened this issue Feb 11, 2017 · 0 comments

Comments

@axefrog
Copy link
Member

axefrog commented Feb 11, 2017

Performance testing shows superior performance with many bulk and sequential operations, and inferior performance with random groups of adhoc, interleaved operations that jump around the list in no particular order. Currently it's hard to predict what the most common usage will be, but performance needs to be strong across the board, as real-world usage isn't going to cater to the "preferred" set of operations for which the List structure works best.

The following optimisations are worth having a look at when there's time, and should not only help to significantly improve performance in adhoc scenarios, but may also help to reduce code complexity, which is currently very high.

  • Use the existing left and right views as write targets as is currently the case, but only use them for reading if they already point at the correct location. Do not relocate them except for writing. By doing this, the number of checks and balances surrounding reading will drop significantly.
  • Keep a separate reader view and move it around independently of the write views.
    Uncommitted ranges for intermediate views should be included in this. Caching this information will also reduce the complexity of managing the anchoring of the left/write views.
  • The reader view should keep a direct array of all slots in its path, and their ranges. There is no need to string multiple views together for this purpose. Traversal operations will be minimised by doing this. The array should be able to be treated like a super-simple skip list.
  • The reader is always subservient to write state, and does not affect data. Therefore, it only needs to be copied for changes to the list. For multiple reads against the same list, it is completely mutable. If two application modules are heavily using the same list in different ways and need to optimise performance further, the second module can simply clone the list in order to acquire an isolated reader.
  • As an extension of the above, the reader will have a direct reference to the root, allowing for quick random access reads.
  • Cache the write locations and ranges as list fields to eliminate the need for traversal-based range checking. The reader should be able to consult these to very quickly decide whether a commit first needs to be made. If writes are infrequent and reads are frequent, the amortized cost of reads will be almost the same as if uncommitted writes were not a concern at all. Caching these values may also allow a number of areas of code to be simplified or eliminated entirely.

Further Thinking

  • Use a single writer instead of dual left/right writers. By eliminating the second writer, a massive amount of complex logic can likely be removed. The left/right anchoring technique could remain in place as-is, but seeing as it applies to only a single write path, numerous checks and verifications would be able to be discarded.
  • Having one reader and one writer removes the need for view-selection logic.
  • Having the writer use a chain of lazily-committed views, as is the case right now, allows write speed to remain very fast. Rather than maintaining them as a chain though, keeping them in a short array instead allows much faster switching between different locations for writing, because, with a cached value indicating the lowest depth containing uncommitted changes, upward traversal can be skipped entirely in many cases by jumping straight to the correct level. Also, as the array will rarely be larger than 2-4 elements (one per depth level), the cost of cloning the array should be negligible, and the management of the view chain is easier to reduce to simple loops.
  • By having the reader and the writer maintain structural commonalities, the code footprint will likely be reduced, given that traversal logic will be shared.
  • The thought had crossed my mind that multiple writers may be useful for times when heavy modifications are being performed, such as during a sort operation. Instead of that, however, start by making sure the writer is committed, then perform the bulk operation outside of the management layer, then just reset the writers for subsequent adhoc operations.
  • The writer is the source of truth. The reader can be reconstructed from the writer (and root node discovery), but not vice versa.
@axefrog axefrog self-assigned this Feb 11, 2017
@axefrog axefrog changed the title Optimising/improving Collectable.List List: Optimisations and improvements Mar 11, 2017
@axefrog axefrog removed their assignment Mar 16, 2017
@axefrog axefrog modified the milestone: 1.0 Release Mar 20, 2017
@axefrog axefrog mentioned this issue Jul 28, 2017
15 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant