Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize for performance, relationship resolver improvements #3917

Merged
merged 22 commits into from
Oct 11, 2023

Conversation

HebaruSan
Copy link
Member

@HebaruSan HebaruSan commented Sep 29, 2023

Motivation

Somehow I found the documentation for Parallel LINQ and got excited:

Parallel processing is powerful but tricky, so it is nice to have a framework that takes care of the difficult parts for you. Some of the slower parts of CKAN happen to be what that documentation calls "delightfully parallel", so a survey of opportunities to apply PLINQ is in order.

Problems

While working on this, I found several issues:

  • CmdLine supports an identifier=version syntax for specifying , but even though neither ConsoleUI nor GUI nor NetKAN uses it, it's implemented in Core. This really belongs in CmdLine.
  • InconsistentKraken uses a property called InconsistenciesPretty to return a text description of the problems it represents, meanwhile not even setting the Exception.Message property at all even though it serves the same purpose. When standard mechanisms like Exception.Message are available, we should use them rather than creating something redundant.
  • RelationshipResolver.ConflictList included key-value pairs with the value set to null, which meant that when we tried to use it to generate a description of the conflicts, we got a spurious conflict with "an unmanaged DLL or DLC"
  • If you select to install two mods that have conflicting dependencies, the mods' rows aren't highlighted. The conflict description text doesn't explain how the conflicting mods relate to your chosen mods, and it mentions each conflict twice, once as "A conflicts with B", followed by "B conflicts with A".
  • If you select to install mods with conflicts, a description of the conflict appears in the status bar, but if you deselect them, it doesn't go away.
  • If you choose to uninstall a mod that other mods depend on, it shows up in the changeset as "Requested by user", which is not strictly true.
  • If you turn on the "Hide epochs" setting and then load recommendations for a changeset, then click a row to view the corresponding mod info, the Relationships tab will have a stop sign representing conflicts for every mod, because a parameter to GUIMod is missing and the HideEpochs setting is passed in the incompatible parameter.
  • The new suppress_recommendations metadata property doesn't do anything for CmdLine. Arguably it should, especially given its behavior of automatically pulling in all recommendations by default.
  • The recommendations tab unintentionally triggers the checkbox-changed event of its ListView for every row added while loading, running and re-running its conflicts check, which can take a long time for a large changeset. A similar problem happens when you click the checkbox at the lower-left; checkboxes are changed one by one, and conflicts are re-checked after every such change.

Changes

Each change is nicely split up into its own independent commit, which compiles and passes all tests.

  • Since the first step in improving performance is always measurement, Logging.WithTimeElapsed is now a warpper around System.Diagnostics.Stopwatch for easily printing how long a particular piece of code takes to run. This was used to measure all parallelization changes to confirm that they improved performance.
    (The first thing I tried to parallelize was the test from Add test to check GUI thread safety #3914, which consistently ran slower instead. Figuring out the reasons for that was very helpful in getting familiar with PLINQ.)
  • Now some of the loading of Dictionary<string, T> objects in Registry and RepositoryData is parallelized by a new JsonParallelDictionaryConverter. The actual loading from the file has to be done sequentially because of how disks work, but if we take a JObject as the output of that, we can turn the values of its properties into AvailableModule or InstalledModule objects in parallel. Preserving the progress bar updates during loading was tricky, but I ended up using a new WithProgress function to inject that capability into the PLINQ pipeline, and treating the JObject step and the parallelized step as about 50% each seemed to give the smoothest continuity.
  • The Versions tab already used a background thread for loading its row colors, but it was only one thread and stil somewhat slow. Now it is fully parallelized courtesy of PLINQ.
  • The CompatibilitySorter now performs its compatible-providers search, initial simple compatibility checks, and dependency checks in parallel.
  • Creation of GUIMod objects from repo/registry data is now done in parallel
  • Creation of grid rows from GUIMod objects is now done in parallel
  • Filtering (setting .Visible property of) grid rows is now done in parallel
  • The changeset sorting algorithm from Install dependencies first #3667 is replaced with a more robust, less heuristic-based algorithm that's also faster thanks to being parallelized. To sum up, it treats the changeset as a graph (in the computer science sense) and assigns each mod in the changeset a number based on how many nodes (mods) can be visited by tracing a breadth-first search from dependency to depending mod, all the way to the user's original choices. This has many desirable properties, such as ensuring that dependencies always appear before their depending mods, putting ModuleManager at the very top, and handling dependency loops as well as they can be.
  • Similar to the cached data deserialization, the loading of modules from .tar.gz files is also now parallelized. Again the file contents have to be loaded sequentially, but once that's done, we hand off to multiple threads to turn them into CkanModule objects.
  • Now the recommendations tab loads much more quickly, partly due to being more parallelized and partly because it no longer checks every row for conflicts during loading.
  • Since Validate downloaded files against metadata before adding to cache #2243 we have been validating ZIP files before storing them to the cache, but up till now we still validated cached ZIPs again before installing. This takes quite a while for large ZIPs, and isn't necessary because ZIPs can't get into the cache without already being valid. It made some sense to leave it like this back then since people might still have had invalid files in the cache, but nowadays it's redundant. Now this extra re-validation is removed, which will speed up installs.
  • Now all the supporting code for identifier=syntax is moved from Core to CmdLine. This relieves Core from the burden of providing it, and makes it easier to understand in the context where it applies. A test for the old code is removed, and tests for the new code are added.
  • Now InconsistentKraken.InconsistenciesPretty is replaced by Exception.Message.
  • Now RelationshipResolver.ConflictList returns the user's choices as the keys of its pairs, so every mod that has a conflict or a conflicting dependency will be highlighted.
  • Now RelationshipResolver.ConflictDescriptions is added to give a full, readable description of the conflicts.
    • Each conflict is shown once instead of twice.
    • When a conflicting mod is a dependency rather than chosen by the user, it has (needed for: ModZ 1.1) appended to it so the user can understand which mods brought the conflict into the changeset.
    • The conflict description disappears if you deselect the conflicting mods.
  • Now if you remove a dependency, its depending mods say "Dependency removed" in the changeset
  • Now the spurious stop signs in mod info are fixed, and the parameters that caused the confusion are no longer optional and so won't cause this again.
  • Now dependencies with suppress_recommendations set to true in the metadata will also have their recommendations and suggestions skipped when installing via ckan install modname.

@HebaruSan HebaruSan added Bug Something is not working as intended Enhancement New features or functionality GUI Issues affecting the interactive GUI Cmdline Issues affecting the command line Core (ckan.dll) Issues affecting the core part of CKAN Tests Issues affecting the internal tests Relationships Issues affecting depends, recommends, etc. labels Sep 29, 2023
@HebaruSan HebaruSan changed the title Parallize for performance improvements, relationship resolver improvements Parallelize for performance improvements, relationship resolver improvements Sep 29, 2023
@HebaruSan HebaruSan changed the title Parallelize for performance improvements, relationship resolver improvements Parallelize for performance, relationship resolver improvements Sep 29, 2023
@HebaruSan HebaruSan force-pushed the feature/PLINQ branch 2 times, most recently from a83391f to 1f67431 Compare September 29, 2023 01:48
@HebaruSan

This comment was marked as outdated.

@HebaruSan HebaruSan added the In progress We're still working on this label Sep 30, 2023
@HebaruSan

This comment was marked as resolved.

@HebaruSan HebaruSan removed the In progress We're still working on this label Oct 3, 2023
@HebaruSan
Copy link
Member Author

OK, the recommendations stuff is in pretty good shape now. ✔️

Copy link
Member

@techman83 techman83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've given it a read over, nothing stands out across the change sets. I did make one comment, but I'll leave that up to your judgement @HebaruSan

Really cool set of work, and can see a bunch of tests included 👏

Core/Logging.cs Show resolved Hide resolved
@HebaruSan HebaruSan merged commit 46c4dec into KSP-CKAN:master Oct 11, 2023
10 checks passed
@HebaruSan HebaruSan deleted the feature/PLINQ branch October 11, 2023 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something is not working as intended Cmdline Issues affecting the command line Core (ckan.dll) Issues affecting the core part of CKAN Enhancement New features or functionality GUI Issues affecting the interactive GUI Relationships Issues affecting depends, recommends, etc. Tests Issues affecting the internal tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants