Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve 145: add option to ignore URL host in cache key #147

Closed
wants to merge 5 commits into from

Conversation

yringler
Copy link

@yringler yringler commented Mar 23, 2021

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

@CLAassistant
Copy link

CLAassistant commented Mar 23, 2021

CLA assistant check
All committers have signed the CLA.

@yringler
Copy link
Author

I wasn't able to find tests for the ImageSharpMiddleware, so I didn't add a test.
I did modify the sample Startup.cs (just locally) to verify manually that

  1. By default, different hosts have a different cache folder (I added a .Use() to set the Request.Host)
  2. When the option is set, different hosts still use the same cache folder.

@codecov
Copy link

codecov bot commented Mar 23, 2021

Codecov Report

Merging #147 (378178f) into master (52bcb16) will increase coverage by 0.30%.
The diff coverage is 100.00%.

❗ Current head 378178f differs from pull request most recent head ca93a8c. Consider uploading reports for the commit ca93a8c to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #147      +/-   ##
==========================================
+ Coverage   84.50%   84.80%   +0.30%     
==========================================
  Files          50       46       -4     
  Lines        1426     1369      -57     
  Branches      190      178      -12     
==========================================
- Hits         1205     1161      -44     
+ Misses        166      160       -6     
+ Partials       55       48       -7     
Flag Coverage Δ
unittests 84.80% <100.00%> (+0.30%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../ImageSharp.Web/Middleware/ImageSharpMiddleware.cs 87.87% <100.00%> (+1.95%) ⬆️
...harp.Web/Middleware/ImageSharpMiddlewareOptions.cs 90.19% <100.00%> (+0.19%) ⬆️
...DependencyInjection/ImageSharpBuilderExtensions.cs 80.00% <0.00%> (-0.31%) ⬇️
...s/PresetOnlyQueryCollectionRequestParserOptions.cs
src/ImageSharp.Web/Middleware/ImageWorkerResult.cs
...Commands/PresetOnlyQueryCollectionRequestParser.cs
...p.Web/Middleware/ConcurrentDictionaryExtensions.cs

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 52bcb16...ca93a8c. Read the comment docs.

@yringler
Copy link
Author

I see that code coverage is down, but it's tricky for me to find the tests responsible because I can't run all the tests.
PhysicalFileSystemCacheServerTests (And AzureBlob) need a localhost server to be running, but there doesn't seem to be a test server?

@JimBobSquarePants
Copy link
Member

Thanks @yringler !!

For the tests you need to be running the Azure Storage Emulator
https://github.com/SixLabors/ImageSharp.Web#running-the-tests

If you are working on a Windows machine with Visual Studio with the Azure SDK you've likely got an emulator installed which you can find via Start + typing Azure...

The current cache tests are here.
https://github.com/SixLabors/ImageSharp.Web/tree/d0559520245dffc484cd7fc77ae35fd20f0ff3b9/tests/ImageSharp.Web.Tests/Processing

@@ -492,9 +492,9 @@ private ValueTask StreamDisposeAsync(Stream stream)
}
}

private static string GetUri(HttpContext context, IDictionary<string, string> commands)
private static string GetUri(HttpContext context, IDictionary<string, string> commands, bool ignoreHost)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick note that this code can be much improved. We know the size of all strings, so we can allocate the right buffer size instead of growing it in the SB.
Also using the Format("{0}/") is too much for a simple concatenation.
Only QueryString.Create(commands) needs to be done before to know what will be the size.
Should I file an issue or not worth it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's a valid point. Please file an issue. I'm up for any optimizations we can through at the library.

@yringler
Copy link
Author

I'll be out for the next week and a bit (Passover), so I won't get to the unit tests before that.
If you choose to take care of the tests, I'd be much obliged!

@deanmarcussen
Copy link
Collaborator

Can I just offer a thought as well. Not sure if this would be best suited via a setting, or by putting this method into an interface, and making it a replaceable component to meet any variety of use cases.

In any kind of multi tenant scenario, where the host name actually means a different tenant / image source it's a little dangerous, easy to set a setting and have two tenants serving the same cached image, when the image source is actually different.

I'm 50/50 on the fence about it though, so will leave design decisions up to @JimBobSquarePants

@JimBobSquarePants
Copy link
Member

In any kind of multi tenant scenario, where the host name actually means a different tenant / image source it's a little dangerous, easy to set a setting and have two tenants serving the same cached image, when the image source is actually different.

I get your point here @deanmarcussen a setting is a little too easy to abuse.

I'll have a look at rewriting the PR to introduce two implementations of a as yet unnamed interface that can be injected to allow this. It's a fair amount of work but worth the effort.

@yringler
Copy link
Author

yringler commented Apr 7, 2021

I added tests for the new option...
But this may need some touching up by a maintainer to get it landed.

Perhaps it shouldn't be necessary to set up a whole test server to unit test a config?
I think that gets back to

I'll have a look at rewriting the PR to introduce two implementations of a as yet unnamed interface that can be injected to allow this. It's a fair amount of work but worth the effort.

In the meantime, this is what I'm able to put together, in lieu of / until the major refactor that you're suggesting

@JimBobSquarePants
Copy link
Member

It’ll be a few days til I can look at this. Sick at the moment with oral shingles.

@JimBobSquarePants
Copy link
Member

@yringler I haven't forgotten about this. Had a bout of ill health over the last 6 weeks so having to carefully choose my activities in a limited time frame.

@yringler
Copy link
Author

Oy. Feel better soon!

@ronaldbarendse
Copy link
Contributor

This issue kind of surprised me, as it's very common to have your site served from both the www. and root domain (and use canonical HTML tags to point to the preferred one, instead of redirecting). This will therefore also result in duplicate cached images, which I confirmed by testing on a local site myself (just swapping localhost with your local PC name or IP)...

Wouldn't it make a lot more sense to only use the path, as the default setup/middleware configuration doesn't support a multi-tenant setup anyway? If you want to add that support, you already need to change the default configuration, right?

Also note that the Request.Host property might contain the port, so it's a bit ambiguous what the new ignoreHost parameter might actually ignore.


The easy/quickest fix would be to change ICacheHash.Create(string value, uint length) to ICacheHash.Create(Uri uri, uint length) and let the ImageSharpMiddleware generate the absolute URI based on the HTTP context and parsed/known commands. However, this assumes both the IRequestParser and IImageProvider only use the URI to parse/provide the returned value. If you want to return a different image based on whether e.g. a user is logged in or not, this wouldn't work!

So I would suggest adding a new interface that generates the cache key, so this can be separated from the existing ICacheHash implementation/responsibility:

using Microsoft.AspNetCore.Http;
using SixLabors.ImageSharp.Web.Commands;

namespace SixLabors.ImageSharp.Web.Caching
{
    public interface ICacheKey
    {
        string Create(HttpContext context, CommandCollection commands);
    }
}

And a default implementation that uses the relative path:

using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Http.Extensions;
using SixLabors.ImageSharp.Web.Commands;

namespace SixLabors.ImageSharp.Web.Caching
{
    public class RelativePathLowercaseCacheKey : ICacheKey
    {
        public string Create(HttpContext context, CommandCollection commands)
            => UriHelper.BuildRelative(context.Request.PathBase, context.Request.Path, QueryString.Create(commands)).ToLowerInvariant();
    }
}

Note: I'm already using the new CommandCollection that's part of another PR. And the implementation name should also indicate it's using a lowercased value, as some servers/OSes are case sensitive.

@JimBobSquarePants
Copy link
Member

JimBobSquarePants commented Jan 27, 2022

If you just use the path that can (will absolutely because people are bad at naming things) increase the possibility of collision.

I’m curious as to why you say the default setup/middleware doesn’t support multi tenancy because I’m using it out of the box with a multi tenant website and a custom provider.

I do like your idea though just wouldn't ever use the relative path as the default implementation since that would invalidate the cache for any current installs.

@ronaldbarendse
Copy link
Contributor

If you just use the path that can (will absolutely because people are bad at naming things) increase the possibility of collision.

ICacheKey should only be concerted about what makes the request unique (only the relative path, include hostname, username, etc.). ICacheHash is still responsible for ensuring that value doesn't have a high possibility of collision.

I’m curious as to why you say the default setup/middleware doesn’t support multi tenancy because I’m using it out of the box with a multi tenant website and a custom provider.

Yes, so you had to use a custom provider to support multi tenancy (the default setup, using just AddImageSharp() doesn't allow that OOTB, e.g. by enabling some settings). If your custom provider uses something other than the URI to resolve to an image, that should be reflected in the cache key (which currently isn't the case).

In that sense, maybe the responsibility of returning the correct cache key should be added to IImageProvider? This implementation doesn't have to be aware of the commands though, so it would only return a unique cache key for the source image (e.g. GetAsync(HttpContext context, out string cacheKey)). The ImageSharpMiddleware should then add the commands to this key to make it unique for the processed image... I think that would make even more sense and remove the need to add/set an additional ICacheKey implementation.

@JimBobSquarePants
Copy link
Member

You're more on the right track with your new interface idea, I really like it.

Although I think you're misunderstanding what I mean by collisions.

ICacheHash can and will only hash the input value. Let's consider a scenario where a naïve multitenancy implementation doesn't ensure that relative urls are unique even when the image should be (the inverse of the scenario in #145).

[protocol][domainA]/path/image.jpg
[protocol][domainB]/path/image.jpg

This will generate the same hash since the relative paths are not unique.

There are implementations like this, I've seen them, they are shockingly numerous, and they are hideous.

IImageProvider should never be responsible for returning a url. That would be a major violation of concern crossing.

In my working implementation media item request use guids not slugified filenames to prevent collision and queries are prefiltered using EF Core query filters so I return true/false based ion the result of a query. It's a quick and easy way to handle multitenancy done right.

So back to the interface. I wouldn't change ICacheHash because that introduces a string/uri/string conversion I'd like to avoid and as pointed out above we can heavily optimize our string creation here using string.Create since we know the exact size of the string we require.

I would then create two implementations of ICacheKey.

  • AbsolutePathCacheKey (default so we don't break current cache implementations)
  • RelativePathCacheKey

If I could turn back time I'd definitely use a relative path as the default (naïve implementations be damned!) but unfortunately I think we're stuck. There's breaking changes and then there's breaking changes. (However, I could maybe be convinced as I don't believe anyone is paying for me to maintain this library)

@ronaldbarendse
Copy link
Contributor

Creating different cache keys and thereby invalidating all existing cached files would indeed be a functional change, especially because the cached files aren't automatically cleaned up! It wouldn't really break sites that upgrade though...

I've abstracted/implemented the existing cache key generation in PR #206 and also added some example/reference implementations.

If this gets released as part of v2, changing the default cache key implementation to use a lowercased relative URI wouldn't be a big issue, right? Upgrading will probably already require some code changes and adding a release note to advise users to manually clean up their image cache (or switch back to the v1 implementation) should be enough. Otherwise it will just re-generate the images and use up a bit more disk space (and potentially result in colliding with previously generated cache hashes for different images).

Regarding that last line: users should already always manually clean up their image cache after changing either the cache hash implementation anyway! So the same applies for changing the cache key implementation.

If I could turn back time I'd definitely use a relative path as the default (naïve implementations be damned!) but unfortunately I think we're stuck. There's breaking changes and then there's breaking changes. (However, I could maybe be convinced as I don't believe anyone is paying for me to maintain this library)

Yeah, would be nice if you could go back and fix things before they bit you in the *ss 😆 Umbraco is sponsoring SixLabors on a monthly basis, although I guess that only pays for the coffee for the whole team 😅

@JimBobSquarePants
Copy link
Member

Newly merged #206 replaces this. Thanks everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants