Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add image cache metadata #50

Merged
merged 8 commits into from
Jan 7, 2019

Conversation

kroymann
Copy link
Contributor

This is still a work-in-progress. I'm creating a PR now so we can discuss the design and make sure this is a reasonable direction to pursue.

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

This adds a mechanism to store additional metadata alongside cached images. The first use of this metadata cache is to store the Content-Type of the image so that we don't have to rely on making assumptions based on the filename, but can instead use the actual image format information obtained when originally processing the image.

This adds a mechanism to store additional metadata alongside cached images.  The first use of this metadata cache is to store the Content-Type of the image so that we don't have to rely on making assumptions based on the filename, but can instead use the ACTUAL image format information obtained when originally processing the image.
@@ -146,5 +196,40 @@ public async Task<DateTimeOffset> SetAsync(string key, Stream stream)
/// <param name="key">The cache key.</param>
/// <returns>The <see cref="string"/>.</returns>
private string ToFilePath(string key) => $"{this.Settings[Folder]}/{string.Join("/", key.Substring(0, (int)this.options.CachedNameLength).ToCharArray())}/{key}";

private async Task<Dictionary<string, string>> ReadMetadataFile(Stream stream)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would have to be implementation agnostic.

I'm thinking we create a custom struct that can be easily serialized deserialized into raw bytes that can be used for storing both the content type and last modified date.

CacheMeta
8 bytes | ModifiedDate
2 bytes | Content Type string length
n bytes | Content Type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking with this implementation was to use a file format that provides easy forward/backward compatibility so that additional metadata fields could be easily added in the future (I already have one such use that I would like to discuss if you end up deciding to proceed with this metadata concept). I would have used JSON, but didn't want to take a dependency on a JSON library, so I went with a really simple text-based key/value pair file format. I wasn't too worried about wasting a few extra bytes by having the data stored as text as the size should be trivial relative to the actual image data that it is annotating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But thinking about it now, I do like the idea of having the metadata contained in a struct, as that would make it possible to add new metadata fields without needing to add any additional methods or change function signatures. Something like:

public struct ImageMetadata
{
    public DateTime LastModifiedUtc;
    public string ContentType;
}

public interface IImageResolver
{
    Task<Stream> OpenReadAsync();
    Task<ImageMetadata> GetImageMetadata();
}

public interface IImageCache
{
    Task<IImageResolver> GetAsync(string key);
    Task<DateTimeOffset> SetAsync(string key, Stream stream, ImageMetadata metadata);
}

Refactor IImageResolver so that the LastModified timestamp and the ContentType string are exposed via a new ImageMetadata class that can be asynchronously retrieved from a resolver.  Similarly, when storing an image into an IImageCache, an ImageMetadata object is supplied along with the raw image stream.

As part of this change, the logic for handling expired cache entries was shuffled around a little bit.  The ImageSharpMiddleware now handles the logic for determining if the image available from the cache is still valid for use or not, and the IImageCache simply acts as a factory for IImageResolver instances.
@kroymann
Copy link
Contributor Author

kroymann commented Jan 2, 2019

I pushed another commit that refactors the ImageMetadata into it's own simple class. For now I kept the simple text-based key/value pair serialization format as it has the benefit of being forward/backward compatible.

One thing to note: While making this change I shuffled around the logic for dealing with expired cache entries. The IImageCache is now purely responsible for generating IImageResolver instances, which in turn expose ImageMetadata containing the LastModified timestamp, and the ImageSharpMiddleware is now responsible for comparing timestamps and determining whether it can use a cached image or not.

@JimBobSquarePants
Copy link
Member

I like the expires changes and the overall shape. I've got some ideas though and am gonna push them directly to your PR. Hopefully you'll see where I'm going here.

@JimBobSquarePants
Copy link
Member

@kroymann Cached response type and extension is now solely the responsibility of the cache. I think this is starting to look good now.

I've gone ahead with the byte aligned struct approach to reduce ImageMetaData file size (we really need this to be as small as possible) we're maxing out at 18 bytes now.

I did notice that I was getting an odd AccessViolationException in ImageMetaData when moving between debug and release mode but I'm assuming that must be something to do with a bug in the CLR somewhere as deleting the cache seemed to fix it. Would certainly like a second opinion though.

@sebastienros What do you think of all this. It certainly seems more robust than before, should be much faster too on network based providers/caches.

@JimBobSquarePants
Copy link
Member

Bugger. System.AccessViolationException looks like the real deal now, I'm getting failed tests on Travis too. I'll try and figure out what I broke in the last commit.

@JimBobSquarePants
Copy link
Member

Ok... Symptoms are as follows.

If the cache is empty I can start the sample application and it will generate images plus metadata. ctrl+F5 works no problem. If I then close the tab and rebuild the application loading all images will fail with a AccessViolationException when parsing the metadata file stream. I'm stumped for now as to why and it's 2am so gonna stop.

@kroymann
Copy link
Contributor Author

kroymann commented Jan 2, 2019

@JimBobSquarePants I'm curious about your rationale for wanting to serialize the metadata into the smallest possible packed binary format. Doing so causes the file format to become tightly coupled to the version of the code which means that if we want to add a new field in the future (and I already have such a use case in mind to discuss after we sort this out), the parsing code will get increasingly complex unless we're going to simply ignore files that were cached using previous versions of the metadata file format. Using a format like Json, or ProtoBuf, or the simple key/value-pair format from the beginning of this PR, the file becomes both forward and backward compatible. And relative to the file size of the image data itself, we're talking about a miniscule difference in disk space usage, right?

@JimBobSquarePants
Copy link
Member

JimBobSquarePants commented Jan 2, 2019

@kroyman what’s this additional data? I really do not want these files to become a dumping ground and do not see any scenario where I would want to add further information to the files.

The parsing code would not be complex as long as we use alignment properly.

My rationale for the layout is performance. I want the files to both be tiny over the wire and fast to load/save without allocations. Cache file comparison should be instant. After a couple of million files any difference will start to add up.

@kroymann
Copy link
Contributor Author

kroymann commented Jan 2, 2019

I was thinking about making it possible to have the Cache-Control Max-Age be something that could vary by image (this is something I already do in my service that uses ImageSharp). In my case, the source images that come from Azure Blob Storage already have Cache-Control values that I need to preserve (some images are immutable and can be cached for a year, others could in theory change and thus should only cache for a day or two).

@JimBobSquarePants
Copy link
Member

@kroymann You got me there, that's a damn fine idea which I have implemented.

I think this is ready to go so please have a good dig through to see if I have missed anything.

@kroymann
Copy link
Contributor Author

kroymann commented Jan 4, 2019

@JimBobSquarePants I reviewed the cache-control change. Looks exactly like what I was envisioning. I gave everything a second read through and it all looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants