Memory cache off-by-one error #396

Merged (4 commits, Feb 13, 2018)

Conversation

bnbalsamo
Contributor

The current handling of the in-memory cache in InfoCache has an off-by-one error in its while loop condition. In cases where size > 0, the cache would (harmlessly, I believe) just be one entry smaller than stated in the size parameter. In cases where size == 0, we were getting StopIteration exceptions in production, though in my own testing I could only reproduce KeyError: 'dictionary is empty'.

This PR changes the while loop condition to strictly >, rather than >=
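
For context, a minimal sketch of the kind of trimming loop being described (names and structure are illustrative only, not the actual InfoCache code):

```python
def trim(cache: dict, size: int) -> None:
    """Evict the oldest entries until the cache holds at most `size` items.

    Illustrative sketch only. With the old `>=` comparison and size == 0,
    the loop would empty the dict and then try to evict from an empty dict,
    producing the errors described above.
    """
    while len(cache) > size:            # previously: >= size
        oldest_key = next(iter(cache))  # assumes an insertion-ordered dict
        del cache[oldest_key]
```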

@bcail
Contributor

bcail commented Feb 12, 2018

@bnbalsamo thanks for the PR. Could you add a test for your fix?

brian added 2 commits February 12, 2018 14:19
…nd actually remedied the issues with the RAM cache size == 0 case, as I thought I had in the previous commit. Added tests for these cases.
…here none of the info is cached in RAM. Still does two disk reads in all cases where RAM cache size > 0
@@ -372,8 +372,7 @@ def get(self, request):
             info_and_lastmod = (info, lastmod)
             logger.debug('Info for %s read from file system', request)
-            # into mem:
-            self._dict[request.url] = info_and_lastmod
+            self.__setitem__(request, info)
Contributor
__setitem__ writes the content to the file system as well as memory, doesn't it? Don't we only want this information to go into memory in this get method?

Maybe we should split __setitem__ up into two methods, one for writing to disk and one for putting it in memory?

Contributor Author

Hmm, I see your reasoning here. Writing on every get() would be a lot of writing to disk.

In this case I would be more inclined to tack a kwarg onto __setitem__ for this purpose. The __setitem__ and __getitem__ dict-style interface is what is meant to be presented to "the outside world" from this class, correct?

If that is the case (and the cache is meant to be swappable in the wider code base with purpose-built implementations that expose the same interface), I would think this is the only case where something that already exists on the filesystem needs to be bootstrapped into RAM; all other cases, both within the class and in calling code, would require writing to disk and into RAM.
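
A rough sketch of what that kwarg could look like (the `to_fs` parameter name and the `_write_to_fs` helper are invented here for illustration; this is not the actual loris implementation):

```python
from collections import OrderedDict

class InfoCacheSketch:
    """Simplified stand-in, used only to illustrate the kwarg idea above."""

    def __init__(self, size=500):
        self.size = size
        self._dict = OrderedDict()

    def _write_to_fs(self, url, info):
        pass  # placeholder for the real file-system write

    def __setitem__(self, url, info, to_fs=True):
        self._dict[url] = info          # always cache in RAM (trimming elided)
        if to_fs:
            self._write_to_fs(url, info)

# Normal dict-style use writes to disk and RAM:
#     cache[request.url] = info
# The getter could bootstrap a disk hit into RAM without re-writing the file:
#     cache.__setitem__(request.url, info, to_fs=False)
```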

@bnbalsamo
Contributor Author

Sorry for multiple commits here - let me know if you'd prefer I squash them and re-open a PR.

My initial assumption was incorrect: there wasn't an off-by-one error in the loop condition, but the case where RAM cache size == 0 was broken.

To fix this, the logic for RAM cache addition is now:

add to cache --> trim cache
instead of
trim cache --> add to cache
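
For illustration, a simplified sketch of why the ordering matters when size == 0 (not the actual InfoCache code):

```python
from collections import OrderedDict

def old_order(cache: OrderedDict, size: int, key, value):
    # trim, then add: with size == 0 the `>=` loop keeps evicting until the
    # dict is empty and then fails trying to evict from an empty dict
    while len(cache) >= size:
        cache.popitem(last=False)
    cache[key] = value

def new_order(cache: OrderedDict, size: int, key, value):
    # add, then trim: with size == 0 the entry is added and immediately
    # evicted, leaving the cache empty instead of raising
    cache[key] = value
    while len(cache) > size:
        cache.popitem(last=False)
```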

There was also a particularly pesky bypass of the RAM cache size logic in the getter; it now calls __setitem__ internally instead of handling ._dict itself.

The final commit (which occurred to me in the middle of writing this comment) minimizes disk reads in the event there is no RAM cache.

There are also now tests for limited RAM cache sizes and no RAM cache.

One potential byproduct of this change: previously, a call to InfoCache.get() would always deposit the request record in RAM, making subsequent calls to InfoCache.get() fast; now, calls to .get() can repeatedly read from disk (even within the same thread) if the cache size is set to 0.

I'm not familiar enough with the entirety of the code base to guess whether anyone relies on repeated calls to InfoCache.get() being "RAM-fast" rather than "disk-fast", but I figured I should give a heads-up, just in case.

bcail merged commit 80b707a into loris-imageserver:development on Feb 13, 2018
@bcail
Contributor

bcail commented Feb 13, 2018

@bnbalsamo thanks. Merging.
