-
-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Add image cache metadata #50
WIP: Add image cache metadata #50
Conversation
This adds a mechanism to store additional metadata alongside cached images. The first use of this metadata cache is to store the Content-Type of the image so that we don't have to rely on making assumptions based on the filename, but can instead use the ACTUAL image format information obtained when originally processing the image.
@@ -146,5 +196,40 @@ public async Task<DateTimeOffset> SetAsync(string key, Stream stream) | |||
/// <param name="key">The cache key.</param> | |||
/// <returns>The <see cref="string"/>.</returns> | |||
private string ToFilePath(string key) => $"{this.Settings[Folder]}/{string.Join("/", key.Substring(0, (int)this.options.CachedNameLength).ToCharArray())}/{key}"; | |||
|
|||
private async Task<Dictionary<string, string>> ReadMetadataFile(Stream stream) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would have to be implementation agnostic.
I'm thinking we create a custom struct that can be easily serialized deserialized into raw bytes that can be used for storing both the content type and last modified date.
CacheMeta
8 bytes | ModifiedDate
2 bytes | Content Type string length
n bytes | Content Type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking with this implementation was to use a file format that provides easy forward/backward compatibility so that additional metadata fields could be easily added in the future (I already have one such use that I would like to discuss if you end up deciding to proceed with this metadata concept). I would have used JSON, but didn't want to take a dependency on a JSON library, so I went with a really simple text-based key/value pair file format. I wasn't too worried about wasting a few extra bytes by having the data stored as text as the size should be trivial relative to the actual image data that it is annotating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But thinking about it now, I do like the idea of having the metadata contained in a struct, as that would make it possible to add new metadata fields without needing to add any additional methods or change function signatures. Something like:
public struct ImageMetadata
{
public DateTime LastModifiedUtc;
public string ContentType;
}
public interface IImageResolver
{
Task<Stream> OpenReadAsync();
Task<ImageMetadata> GetImageMetadata();
}
public interface IImageCache
{
Task<IImageResolver> GetAsync(string key);
Task<DateTimeOffset> SetAsync(string key, Stream stream, ImageMetadata metadata);
}
Refactor IImageResolver so that the LastModified timestamp and the ContentType string are exposed via a new ImageMetadata class that can be asynchronously retrieved from a resolver. Similarly, when storing an image into an IImageCache, an ImageMetadata object is supplied along with the raw image stream. As part of this change, the logic for handling expired cache entries was shuffled around a little bit. The ImageSharpMiddleware now handles the logic for determining if the image available from the cache is still valid for use or not, and the IImageCache simply acts as a factory for IImageResolver instances.
I pushed another commit that refactors the ImageMetadata into it's own simple class. For now I kept the simple text-based key/value pair serialization format as it has the benefit of being forward/backward compatible. One thing to note: While making this change I shuffled around the logic for dealing with expired cache entries. The IImageCache is now purely responsible for generating IImageResolver instances, which in turn expose ImageMetadata containing the LastModified timestamp, and the ImageSharpMiddleware is now responsible for comparing timestamps and determining whether it can use a cached image or not. |
I like the expires changes and the overall shape. I've got some ideas though and am gonna push them directly to your PR. Hopefully you'll see where I'm going here. |
@kroymann Cached response type and extension is now solely the responsibility of the cache. I think this is starting to look good now. I've gone ahead with the byte aligned struct approach to reduce I did notice that I was getting an odd AccessViolationException in @sebastienros What do you think of all this. It certainly seems more robust than before, should be much faster too on network based providers/caches. |
Bugger. |
Ok... Symptoms are as follows. If the cache is empty I can start the sample application and it will generate images plus metadata. ctrl+F5 works no problem. If I then close the tab and rebuild the application loading all images will fail with a |
@JimBobSquarePants I'm curious about your rationale for wanting to serialize the metadata into the smallest possible packed binary format. Doing so causes the file format to become tightly coupled to the version of the code which means that if we want to add a new field in the future (and I already have such a use case in mind to discuss after we sort this out), the parsing code will get increasingly complex unless we're going to simply ignore files that were cached using previous versions of the metadata file format. Using a format like Json, or ProtoBuf, or the simple key/value-pair format from the beginning of this PR, the file becomes both forward and backward compatible. And relative to the file size of the image data itself, we're talking about a miniscule difference in disk space usage, right? |
@kroyman what’s this additional data? I really do not want these files to become a dumping ground and do not see any scenario where I would want to add further information to the files. The parsing code would not be complex as long as we use alignment properly. My rationale for the layout is performance. I want the files to both be tiny over the wire and fast to load/save without allocations. Cache file comparison should be instant. After a couple of million files any difference will start to add up. |
I was thinking about making it possible to have the Cache-Control Max-Age be something that could vary by image (this is something I already do in my service that uses ImageSharp). In my case, the source images that come from Azure Blob Storage already have Cache-Control values that I need to preserve (some images are immutable and can be cached for a year, others could in theory change and thus should only cache for a day or two). |
@kroymann You got me there, that's a damn fine idea which I have implemented. I think this is ready to go so please have a good dig through to see if I have missed anything. |
@JimBobSquarePants I reviewed the cache-control change. Looks exactly like what I was envisioning. I gave everything a second read through and it all looks good to me. |
This is still a work-in-progress. I'm creating a PR now so we can discuss the design and make sure this is a reasonable direction to pursue.
Prerequisites
Description
This adds a mechanism to store additional metadata alongside cached images. The first use of this metadata cache is to store the Content-Type of the image so that we don't have to rely on making assumptions based on the filename, but can instead use the actual image format information obtained when originally processing the image.