Validate downloaded files against metadata before adding to cache #2243
Problems
Despite storing the download size, SHA1, and SHA256 for every download in the registry, CKAN currently performs no validation on downloaded files before adding them to the download cache. This means that an HTML file or a ZIP for some other mod can be stored in the cache as if it were valid, and CKAN will attempt to use it later if installation is requested.
Now that KSP-CKAN/xKAN-meta_testing#32 has enabled caching of downloads across NetKAN and CKAN-meta pull requests (rather than just within the same request as I initially assumed), caching matters more. KSP-CKAN/NetKAN#6135 nearly got stuck: the author initially uploaded a ZIP in LZMA format, which CKAN (and Windows Explorer) cannot read, and that PR errored out. However, the validation log (now overwritten) revealed that CKAN only noticed there was a problem when it tried to install the "(cached)" download! In other words, the code in charge of downloading and adding to the cache considers the file valid and won't re-download it, while the code in charge of installing considers it invalid and can't install it. Luckily the author uploaded a new version; if he had simply updated the existing version, then a "#rebuild" comment would only have re-run the validation on the already "cached" file, and there would have been no way to fix that PR.
A download cache should provide assurances of data integrity.
Causes
All caching is managed by NetFileCache, which operates at a per-URL level. Files downloaded by other components are passed to NetFileCache.Store along with the source URL, and the URL is used to generate the name for the file in the cache so it can be found later.
Notably, this class has no access to any information that could be used to validate the files, and it fails to use even the one check it does have access to, the one that detects the LZMA format (that's how the above Catch-22 came about).
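To make the limitation concrete, here is a minimal sketch of a URL-keyed cache. It is written in Python for brevity rather than CKAN's C#, and it is not CKAN's actual code: the class UrlKeyedCache, its methods, and the idea of naming files by a truncated hash of the URL are all assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Optional


class UrlKeyedCache:
    """Hypothetical stand-in for a per-URL file cache (not CKAN's code)."""

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    @staticmethod
    def url_key(url: str) -> str:
        # Assumption: the cached file name is derived from a hash of the URL.
        return hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]

    def store(self, url: str, downloaded: Path) -> Path:
        # Only a URL and a path are known here: no expected size and no
        # expected hashes, so nothing can be verified before storing.
        target = self.cache_dir / f"{self.url_key(url)}-{downloaded.name}"
        shutil.copyfile(downloaded, target)
        return target

    def lookup(self, url: str) -> Optional[Path]:
        # Find the cached file again using nothing but the URL.
        matches = sorted(self.cache_dir.glob(f"{self.url_key(url)}-*"))
        return matches[0] if matches else None
```

With an interface like this, an HTML error page saved as mod.zip is stored and later returned by lookup exactly as a valid download would be, because the cache never sees the metadata it would need to tell the difference.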
Changes
A new NetModuleCache class is added as a wrapper around NetFileCache to safeguard cache consistency at a per-module level. Its public interface is defined in terms of CkanModule objects rather than Uris, so it always has the information it needs to check whether a given file is valid. Such checks are now performed whenever a file is added to the cache.
All code that previously used NetFileCache is updated to use NetModuleCache instead, with one exception: Netkan needs to be able to add things to the cache before their size and hash are known, so it continues to use a raw file cache. In most (all?) cases this was simple as there was already a CkanModule floating around ready to be used, and we just had to remove ".download".
If NetFileCache finds an existing invalid file in the cache (added before this PR), it will delete it. Note that this applies only to problems that are independent of metadata such as the LZMA format issue.
The code that calculates SHA1 and SHA256 is moved from Netkan into Core to be shared by whatever code needs it, and the private duplicate copy in ConsoleUI is removed.
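The validate-before-store idea behind the module-level wrapper can be sketched as follows. This is an illustration in Python rather than CKAN's C#, under stated assumptions: ModuleMetadata, InvalidModuleFileError (a rough analogue of an exception describing a bad download), file_digest, validate_download, and store_validated are all hypothetical names, and the cached file naming is invented for the example.

```python
import hashlib
import shutil
from dataclasses import dataclass
from pathlib import Path


class InvalidModuleFileError(Exception):
    """Hypothetical error for a download that does not match its metadata."""


@dataclass(frozen=True)
class ModuleMetadata:
    # Hypothetical subset of what the registry stores for each download.
    identifier: str
    download_size: int
    sha1: str
    sha256: str


def file_digest(path: Path, algorithm: str) -> str:
    """Hash a file in chunks so large downloads are not read in one go."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as stream:
        for chunk in iter(lambda: stream.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_download(module: ModuleMetadata, path: Path) -> None:
    """Raise InvalidModuleFileError unless size and both hashes match."""
    actual_size = path.stat().st_size
    if actual_size != module.download_size:
        raise InvalidModuleFileError(
            f"{module.identifier}: size {actual_size}, "
            f"expected {module.download_size}"
        )
    for algorithm, expected in (("sha1", module.sha1), ("sha256", module.sha256)):
        actual = file_digest(path, algorithm)
        # Compare case-insensitively, since hex digests may be stored in
        # either case.
        if actual.lower() != expected.lower():
            raise InvalidModuleFileError(
                f"{module.identifier}: {algorithm} {actual}, expected {expected}"
            )


def store_validated(module: ModuleMetadata, path: Path, cache_dir: Path) -> Path:
    """Copy the file into the cache only after it passes validation."""
    validate_download(module, path)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{module.identifier}.zip"
    shutil.copyfile(path, target)
    return target
```

Because validation runs before anything is copied, a mismatched file never reaches the cache directory, so a later retry downloads it again instead of reusing a bad copy.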
NetFileCache.GetCachedZip no longer has two different validation modes controlled by an optional bool; it now always checks ZIP files' CRCs.
A new InvalidModuleFileKraken class is created to describe problems with downloaded files that CKAN attempted to add to the cache.
Since the cache is now a bit more reliable, ConsoleUI is updated to use IsMaybeCachedZip when checking if a module is cached for improved response times.
Several existing tests are updated to populate download_hash in metadata representing the locally stored test ZIPs.
New tests are added:
Fixes #62.
Fixes #1927.
Fixes #2100.