WIP: Add image cache metadata #50

kroymann · 2018-12-30T00:39:20Z

This is still a work-in-progress. I'm creating a PR now so we can discuss the design and make sure this is a reasonable direction to pursue.

Prerequisites

I have written a descriptive pull-request title
I have verified that there are no overlapping pull-requests open
I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
I have provided test coverage for my change (where applicable)

Description

This adds a mechanism to store additional metadata alongside cached images. The first use of this metadata cache is to store the Content-Type of the image so that we don't have to rely on making assumptions based on the filename, but can instead use the actual image format information obtained when originally processing the image.

This adds a mechanism to store additional metadata alongside cached images. The first use of this metadata cache is to store the Content-Type of the image so that we don't have to rely on making assumptions based on the filename, but can instead use the ACTUAL image format information obtained when originally processing the image.

JimBobSquarePants · 2019-01-01T11:55:13Z

src/ImageSharp.Web/Caching/PhysicalFileSystemCache.cs

@@ -146,5 +196,40 @@ public async Task<DateTimeOffset> SetAsync(string key, Stream stream)
        /// <param name="key">The cache key.</param>
        /// <returns>The <see cref="string"/>.</returns>
        private string ToFilePath(string key) => $"{this.Settings[Folder]}/{string.Join("/", key.Substring(0, (int)this.options.CachedNameLength).ToCharArray())}/{key}";
+
+        private async Task<Dictionary<string, string>> ReadMetadataFile(Stream stream)


This would have to be implementation agnostic.

I'm thinking we create a custom struct that can be easily serialized deserialized into raw bytes that can be used for storing both the content type and last modified date.

CacheMeta
8 bytes | ModifiedDate
2 bytes | Content Type string length
n bytes | Content Type

My thinking with this implementation was to use a file format that provides easy forward/backward compatibility so that additional metadata fields could be easily added in the future (I already have one such use that I would like to discuss if you end up deciding to proceed with this metadata concept). I would have used JSON, but didn't want to take a dependency on a JSON library, so I went with a really simple text-based key/value pair file format. I wasn't too worried about wasting a few extra bytes by having the data stored as text as the size should be trivial relative to the actual image data that it is annotating.

But thinking about it now, I do like the idea of having the metadata contained in a struct, as that would make it possible to add new metadata fields without needing to add any additional methods or change function signatures. Something like:

public struct ImageMetadata { public DateTime LastModifiedUtc; public string ContentType; } public interface IImageResolver { Task<Stream> OpenReadAsync(); Task<ImageMetadata> GetImageMetadata(); } public interface IImageCache { Task<IImageResolver> GetAsync(string key); Task<DateTimeOffset> SetAsync(string key, Stream stream, ImageMetadata metadata); }

Refactor IImageResolver so that the LastModified timestamp and the ContentType string are exposed via a new ImageMetadata class that can be asynchronously retrieved from a resolver. Similarly, when storing an image into an IImageCache, an ImageMetadata object is supplied along with the raw image stream. As part of this change, the logic for handling expired cache entries was shuffled around a little bit. The ImageSharpMiddleware now handles the logic for determining if the image available from the cache is still valid for use or not, and the IImageCache simply acts as a factory for IImageResolver instances.

kroymann · 2019-01-02T07:57:29Z

I pushed another commit that refactors the ImageMetadata into it's own simple class. For now I kept the simple text-based key/value pair serialization format as it has the benefit of being forward/backward compatible.

One thing to note: While making this change I shuffled around the logic for dealing with expired cache entries. The IImageCache is now purely responsible for generating IImageResolver instances, which in turn expose ImageMetadata containing the LastModified timestamp, and the ImageSharpMiddleware is now responsible for comparing timestamps and determining whether it can use a cached image or not.

JimBobSquarePants · 2019-01-02T10:03:05Z

I like the expires changes and the overall shape. I've got some ideas though and am gonna push them directly to your PR. Hopefully you'll see where I'm going here.

JimBobSquarePants · 2019-01-02T13:58:23Z

@kroymann Cached response type and extension is now solely the responsibility of the cache. I think this is starting to look good now.

I've gone ahead with the byte aligned struct approach to reduce ImageMetaData file size (we really need this to be as small as possible) we're maxing out at 18 bytes now.

I did notice that I was getting an odd AccessViolationException in ImageMetaData when moving between debug and release mode but I'm assuming that must be something to do with a bug in the CLR somewhere as deleting the cache seemed to fix it. Would certainly like a second opinion though.

@sebastienros What do you think of all this. It certainly seems more robust than before, should be much faster too on network based providers/caches.

JimBobSquarePants · 2019-01-02T14:02:18Z

Bugger. System.AccessViolationException looks like the real deal now, I'm getting failed tests on Travis too. I'll try and figure out what I broke in the last commit.

JimBobSquarePants · 2019-01-02T14:53:54Z

Ok... Symptoms are as follows.

If the cache is empty I can start the sample application and it will generate images plus metadata. ctrl+F5 works no problem. If I then close the tab and rebuild the application loading all images will fail with a AccessViolationException when parsing the metadata file stream. I'm stumped for now as to why and it's 2am so gonna stop.

kroymann · 2019-01-02T17:06:29Z

@JimBobSquarePants I'm curious about your rationale for wanting to serialize the metadata into the smallest possible packed binary format. Doing so causes the file format to become tightly coupled to the version of the code which means that if we want to add a new field in the future (and I already have such a use case in mind to discuss after we sort this out), the parsing code will get increasingly complex unless we're going to simply ignore files that were cached using previous versions of the metadata file format. Using a format like Json, or ProtoBuf, or the simple key/value-pair format from the beginning of this PR, the file becomes both forward and backward compatible. And relative to the file size of the image data itself, we're talking about a miniscule difference in disk space usage, right?

JimBobSquarePants · 2019-01-02T20:28:46Z

@kroyman what’s this additional data? I really do not want these files to become a dumping ground and do not see any scenario where I would want to add further information to the files.

The parsing code would not be complex as long as we use alignment properly.

My rationale for the layout is performance. I want the files to both be tiny over the wire and fast to load/save without allocations. Cache file comparison should be instant. After a couple of million files any difference will start to add up.

kroymann · 2019-01-02T22:23:09Z

I was thinking about making it possible to have the Cache-Control Max-Age be something that could vary by image (this is something I already do in my service that uses ImageSharp). In my case, the source images that come from Azure Blob Storage already have Cache-Control values that I need to preserve (some images are immutable and can be cached for a year, others could in theory change and thus should only cache for a day or two).

JimBobSquarePants · 2019-01-04T06:45:41Z

@kroymann You got me there, that's a damn fine idea which I have implemented.

I think this is ready to go so please have a good dig through to see if I have missed anything.

kroymann · 2019-01-04T19:18:39Z

@JimBobSquarePants I reviewed the cache-control change. Looks exactly like what I was envisioning. I gave everything a second read through and it all looks good to me.

JimBobSquarePants reviewed Jan 1, 2019

View reviewed changes

JimBobSquarePants added 2 commits January 2, 2019 23:20

Cleanup + use bytes for smallest possible meta file.

4aca727

Cache now generates extension based on correct content type.

8f9e56b

JimBobSquarePants added 2 commits January 3, 2019 01:08

Try removing inlining from Parse

2ff19f2

Use correct string size and reset stream.

9cd3d1b

JimBobSquarePants added 2 commits January 3, 2019 14:34

Use string based metadata

09c6b2a

Allow variable Chache-Control Max-Age

57e7650

JimBobSquarePants merged commit 78e352e into SixLabors:master Jan 7, 2019

kroymann deleted the cache_content_type branch January 31, 2019 19:30

deanmarcussen mentioned this pull request Apr 26, 2019

Delete image and add another one with same name OrchardCMS/OrchardCore#2228

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Add image cache metadata #50

WIP: Add image cache metadata #50

kroymann commented Dec 30, 2018

JimBobSquarePants Jan 1, 2019

kroymann Jan 2, 2019

kroymann Jan 2, 2019

kroymann commented Jan 2, 2019

JimBobSquarePants commented Jan 2, 2019

JimBobSquarePants commented Jan 2, 2019

JimBobSquarePants commented Jan 2, 2019

JimBobSquarePants commented Jan 2, 2019

kroymann commented Jan 2, 2019 •

edited

Loading

JimBobSquarePants commented Jan 2, 2019 •

edited

Loading

kroymann commented Jan 2, 2019

JimBobSquarePants commented Jan 4, 2019

kroymann commented Jan 4, 2019

WIP: Add image cache metadata #50

WIP: Add image cache metadata #50

Conversation

kroymann commented Dec 30, 2018

Prerequisites

Description

JimBobSquarePants Jan 1, 2019

Choose a reason for hiding this comment

kroymann Jan 2, 2019

Choose a reason for hiding this comment

kroymann Jan 2, 2019

Choose a reason for hiding this comment

kroymann commented Jan 2, 2019

JimBobSquarePants commented Jan 2, 2019

JimBobSquarePants commented Jan 2, 2019

JimBobSquarePants commented Jan 2, 2019

JimBobSquarePants commented Jan 2, 2019

kroymann commented Jan 2, 2019 • edited Loading

JimBobSquarePants commented Jan 2, 2019 • edited Loading

kroymann commented Jan 2, 2019

JimBobSquarePants commented Jan 4, 2019

kroymann commented Jan 4, 2019

kroymann commented Jan 2, 2019 •

edited

Loading

JimBobSquarePants commented Jan 2, 2019 •

edited

Loading