Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate downloaded files against metadata before adding to cache #2243

Merged
merged 1 commit into from
Jan 7, 2018

Conversation

HebaruSan
Copy link
Member

@HebaruSan HebaruSan commented Jan 6, 2018

Problems

Despite storing the download file size, SHA1, and SHA256 for every download in the registry, CKAN currently performs no validation checks on downloaded files before it adds them to the download cache. This means that an HTML file or a ZIP for some other mod can be stored in the cache as if it was valid, and CKAN will attempt to use it later if installation is requested.

Now that KSP-CKAN/xKAN-meta_testing#32 has enabled caching of downloads across NetKAN and CKAN-meta pull requests (rather than just within the same request as I initially assumed), caching matters more. KSP-CKAN/NetKAN#6135 nearly got stuck; the author initially uploaded a ZIP in LZMA format, a format which CKAN (and Windows Explorer) cannot read, and that PR errored out. However, the validation log (now overwritten) revealed that CKAN only noticed there was a problem when it tried to install the "(cached)" download! So the code in charge of downloading and adding to the cache considers the file valid, so it won't be re-downloaded, but the code in charge of installing does not, so it can't be installed. Luckily the author uploaded a new version; if he had simply updated the existing version, then a "#rebuild" comment would only have re-run the validation on the already "cached" file, and there would have been no way to fix that PR.

A download cache should provide assurances of data integrity.

Causes

All caching is managed by NetFileCache, which operates at a per-URL level. Files downloaded by other components are passed to NetFileCache.Store along with the source URL, and the URL is used to generate the name for the file in the cache so it can be found later.

Notably, this class has no access to any information that could be used to validate the files, and it fails to use even the one check it does have access to, the one that detects the LZMA format (that's how the above Catch-22 came about).

Changes

A new NetModuleCache class is added as a wrapper around NetFileCache to safeguard cache consistency at a per-module level. Its public interface is defined in terms of CkanModule objects rather than Uris, so it always has the information it needs to check whether a given file is valid. Such checks are now performed whenever a file is added to the cache.

All code that previously used NetFileCache is updated to use NetModuleCache instead, with one exception: Netkan needs to be able to add things to the cache before their size and hash are known, so it continues to use a raw file cache. In most (all?) cases this was simple as there was already a CkanModule floating around ready to be used, and we just had to remove ".download".

If NetFileCache finds an existing invalid file in the cache (added before this PR), it will delete it. Note that this applies only to problems that are independent of metadata such as the LZMA format issue.

The code that calculates SHA1 and SHA256 is moved from Netkan into Core to be shared by whatever code needs it, and the private duplicate copy in ConsoleUI is removed.

NetFileCache.GetCachedZip no longer has two different validation modes controlled by an optional bool; it now always checks ZIP files' CRCs.

A new InvalidModuleFileKraken class is created to describe problems with downloaded files that CKAN attempted to add to the cache.

Since the cache is now a bit more reliable, ConsoleUI is updated to use IsMaybeCachedZip when checking if a module is cached for improved response times.

Several existing tests are updated to populate download_hash in metadata representing the locally stored test ZIPs.

New Tests are added:

  1. Make sure an LZMA format ZIP is not allowed into the cache
  2. Make sure a ZIP with a mismatched file size is not allowed into the cache
  3. Make sure a ZIP with the correct file size but incorrect SHA1 and SHA256 is not allowed into the cache

Fixes #62.
Fixes #1927.
Fixes #2100.

@HebaruSan HebaruSan added Bug Something is not working as intended Network Issues affecting internet connections of CKAN Pull request Core (ckan.dll) Issues affecting the core part of CKAN and removed Bug Something is not working as intended Network Issues affecting internet connections of CKAN labels Jan 6, 2018
@politas politas merged commit 002d797 into KSP-CKAN:master Jan 7, 2018
politas added a commit that referenced this pull request Jan 7, 2018
@politas politas removed Bug Something is not working as intended Network Issues affecting internet connections of CKAN Pull request labels Jan 7, 2018
@HebaruSan HebaruSan deleted the fix/cache-validation branch January 7, 2018 16:54
@techman83
Copy link
Member

Yeah, I only learned enough C# to write generate the hashes, thanks for implementing this fully :)

@HebaruSan
Copy link
Member Author

Ahh, I see. Well, having the hashes in the existing metadata is tremendously helpful, so thanks for that!

@Ruedii
Copy link

Ruedii commented Apr 1, 2018

LZMA is not proprietary.

It is entirely open source.

PKZIP is proprietary. It just happens to have it's reverse engineered design widely available.|

Here is the information on PKZIP's file compression options:
https://en.wikipedia.org/wiki/Zip_(file_format)#Compression_methods

Window's File Explorer, and their internal .Net library is badly broken and should not be used. It ignores the compression type field and presumes deflate. In other words, it implements only the Zip 1.x standard. not the 2.x standard.

This still doesn't change the problems, just the source of the issue. Either way the issues with atypical Zip file variants is upstream in the runtime and it's dependents, and thus far beyond our control so long as we use the runtime handling of Zip files. It would be nice to have a fallback that better handled these formats in the event the default method failed.

@HebaruSan
Copy link
Member Author

@Ruedii, CKAN uses ICSharpCode.SharpZipLib.Zip, not .NET's built-in library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Core (ckan.dll) Issues affecting the core part of CKAN
Projects
None yet
4 participants