Memory cache off-by-one error #396

Merged (4 commits, Feb 13, 2018)

Conversation

bnbalsamo
Contributor

The current handling of the in-memory cache in InfoCache has an off-by-one error in its while loop condition. In cases where size > 0, the cache would (harmlessly, I believe) just be one entry smaller than stated in the size parameter. In cases where size == 0, we were getting StopIteration exceptions in production, though in my own testing I could only reproduce KeyError: 'dictionary is empty'.

This PR changes the while loop condition to strictly >, rather than >=
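
For context, a minimal sketch of the kind of trimming loop being described (names and structure are illustrative only, not the actual InfoCache code):

```python
def trim(cache: dict, size: int) -> None:
    """Evict the oldest entries until the cache holds at most `size` items.

    Illustrative sketch only. With the old `>=` comparison and size == 0,
    the loop would empty the dict and then try to evict from an empty dict,
    producing the errors described above.
    """
    while len(cache) > size:            # previously: >= size
        oldest_key = next(iter(cache))  # assumes an insertion-ordered dict
        del cache[oldest_key]
```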

@bcail
Contributor

bcail commented Feb 12, 2018

@bnbalsamo thanks for the PR. Could you add a test for your fix?

brian added 2 commits February 12, 2018 14:19
…nd actually remedied the issues with the RAM cache size == 0 case, as I thought I had in the previous commit. Added tests for these cases.
…here none of the info is cached in RAM. Still does two disk reads in all cases where RAM cache size > 0
@@ -372,8 +372,7 @@ def get(self, request):
             info_and_lastmod = (info, lastmod)
             logger.debug('Info for %s read from file system', request)
-            # into mem:
-            self._dict[request.url] = info_and_lastmod
+            self.__setitem__(request, info)
Contributor
__setitem__ writes the content to the file system as well as memory, doesn't it? Don't we only want this information to go into memory in this get method?

Maybe we should split __setitem__ up into two methods, one for writing to disk and one for putting it in memory?

Contributor Author

Hmm, I see your reasoning here. Writing on every get() would be a lot of writing to disk.

In this case I would be more inclined to tack a kwarg onto __setitem__ for this purpose. The __setitem__ and __getitem__ dict-style interface is what is meant to be presented to "the outside world" from this class, correct?

If that is the case (and the cache is meant to be swappable in the wider code base with purpose-built implementations that expose the same interface), I would think this is the only case where something that already exists on the filesystem needs to be bootstrapped into RAM; all other cases, both within the class and in calling code, would require writing to disk and into RAM.
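
A rough sketch of what that kwarg could look like (the `to_fs` parameter name and the `_write_to_fs` helper are invented here for illustration; this is not the actual loris implementation):

```python
from collections import OrderedDict

class InfoCacheSketch:
    """Simplified stand-in, used only to illustrate the kwarg idea above."""

    def __init__(self, size=500):
        self.size = size
        self._dict = OrderedDict()

    def _write_to_fs(self, url, info):
        pass  # placeholder for the real file-system write

    def __setitem__(self, url, info, to_fs=True):
        self._dict[url] = info          # always cache in RAM (trimming elided)
        if to_fs:
            self._write_to_fs(url, info)

# Normal dict-style use writes to disk and RAM:
#     cache[request.url] = info
# The getter could bootstrap a disk hit into RAM without re-writing the file:
#     cache.__setitem__(request.url, info, to_fs=False)
```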

@bnbalsamo
Contributor Author

Sorry for multiple commits here - let me know if you'd prefer I squash them and re-open a PR.

My initial assumption was incorrect: there wasn't an off-by-one error in the loop condition, but the case where RAM cache size == 0 was broken.

To fix this, the logic for RAM cache addition is now:

add to cache --> trim cache
instead of
trim cache --> add to cache
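
For illustration, a simplified sketch of why the ordering matters when size == 0 (not the actual InfoCache code):

```python
from collections import OrderedDict

def old_order(cache: OrderedDict, size: int, key, value):
    # trim, then add: with size == 0 the `>=` loop keeps evicting until the
    # dict is empty and then fails trying to evict from an empty dict
    while len(cache) >= size:
        cache.popitem(last=False)
    cache[key] = value

def new_order(cache: OrderedDict, size: int, key, value):
    # add, then trim: with size == 0 the entry is added and immediately
    # evicted, leaving the cache empty instead of raising
    cache[key] = value
    while len(cache) > size:
        cache.popitem(last=False)
```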

There was also a particularly pesky bypass of the RAM cache size logic in the getter; it now calls __setitem__ internally instead of handling ._dict itself.

The final commit (which occurred to me in the middle of writing this comment) minimizes disk reads in the event there is no RAM cache.

There are also now tests for limited RAM cache sizes and no RAM cache.

One potential byproduct of this change: previously, a call to InfoCache.get() would always deposit the request record in RAM, making subsequent calls to InfoCache.get() fast; now, calls to .get() can repeatedly read from disk (even within the same thread) if the cache size is set to 0.

I'm not familiar enough with the entirety of the code base to guess whether anyone relies on repeated calls to InfoCache.get() being "RAM-fast" rather than "disk-fast", but I figured I should give a heads-up, just in case.

bcail merged commit 80b707a into loris-imageserver:development on Feb 13, 2018
@bcail
Contributor

bcail commented Feb 13, 2018

@bnbalsamo thanks. Merging.
