Inconsistent local/remote cache hits #534

zeiler · 2024-04-18T00:13:07Z

We have been experimenting with rueidis and noticing odd behaviour that doesn't seem to be captured by the test cases.

In some if our tests we're simply doing a set followed by a get immediately.

builder := rc.rueidisClient.B().Set().Key(key).Value(rueidis.BinaryString(b))
var built rueidis.Completed
// Not sure if this is needed, using the redis client is much easier for expirations.
if expiration >= time.Second {
  built = builder.Ex(expiration).Build()
} else if expiration > 0 && expiration < time.Second {
  built = builder.Px(expiration).Build()
} else {
  built = builder.Build()
}
goErr := rc.rueidisClient.Do(ctx, built).Error()

// Then in another function immediately afterwards we try to get from the cache:

redisResult := rc.rueidisClient.DoCache(ctx, rc.rueidisClient.B().Get().Key(key).Cache(), ttl)
// Note: this scenario happens often and randomly where it says it's a local cache hit but 
// it's a Nil error (ie. redis miss) and redisResults.AsBytes() is also empty.
// So we had to hack in falling back to making the redis call directly. 
if redisResult.IsCacheHit() && redisResult.Error() == rueidis.Nil {
  redisResult = rc.rueidisClient.Do(ctx, rc.rueidisClient.B().Get().Key(key).Build())
}
byteVal, goErr := redisResult.AsBytes()

So a few things are confusing us:

We find that IsCacheHit() and rueidis.Nil are both being returned from DoCache's result, which seems like a scenario that should never happen. To make tests more reliable we've had to fall back to calling redis (which we though DoCache was supposed to do for us). How is this happening?
Additionally, for test cases that have that Set() and Get() immediately afterwards to make them reliable I'm having to add a time.Sleep(time.Second) kind of pause, is that to be expected? When we have sleeps, the above scenario seems to happen less or not at all, but the above scenario shouldn't happen I would think ever regardless of sleeping.
Iin order to invalidate the local caches, does a Del() in rueidis do that or does only Set() do it? In trying to get tests to pass reliably it's not clear if we need a long sleep after Del() and Set() for the invalidation messages to propagate or if you need a Set() command in order to send the invalidation messages.
I looked through many tests cases in rueidis and they have Set() followed by Get() and have sensible hit and miss counters, but that doesn't seem to be the case when using it in practice so I'm wondering if the mocking in the tests is not representative.
Finally, it would be great to have examples to do simple stuff like above. The README is only showing a couple things and it took extensive amounts of time to look through the codebase to see the commands that are possible and how to use them.

Thanks for the help!

The text was updated successfully, but these errors were encountered:

rueian · 2024-04-18T13:14:41Z

Hi @zeiler,

Yes, you may need to wait a little bit for the invalidation to clear the local cache. This is caused by a change in Redis: redis/redis#9422. After that change, all invalidation messages are delivered to rueidis after command replies, resulting in a client.Do(ctx, client.B().Set()...) call to return first and leave the local cache staled.

We find that IsCacheHit() and rueidis.Nil are both being returned from DoCache's result, which seems like a scenario that should never happen.

This is actually a valid scenario. That simply means that you have quired a non-existing key before by using the client.DoCache(...). For example:

result1 := client.DoCache(context.Background(), client.B().Get().Key("key_not_exists").Cache(), time.Second)
result2 := client.DoCache(context.Background(), client.B().Get().Key("key_not_exists").Cache(), time.Second)
result1.IsCacheHit() == false
result1.Error() == rueidis.Nil
result2.IsCacheHit() == true
result2.Error() == rueidis.Nil

Iin order to invalidate the local caches, does a Del() in rueidis do that or does only Set() do it? In trying to get tests to pass reliably it's not clear if we need a long sleep after Del() and Set() for the invalidation messages to propagate or if you need a Set() command in order to send the invalidation messages.

Both will do it. Actually, every write command will do it, even if the command writes to a previously non-existing key. And yes, it takes some time for the invalidation to be propagated back to rueidis due to redis/redis#9422.

I looked through many tests cases in rueidis and they have Set() followed by Get() and have sensible hit and miss counters, but that doesn't seem to be the case when using it in practice so I'm wondering if the mocking in the tests is not representative.

I believe you are looking at the case that keys have never been retrieved before. For example, the following script SETs a newly generated key and GETs it immediately with DoCache() in each iteration. No pause is needed.

for i := 0; i < 100000; i++ {
	key := strconv.Itoa(rand.Int())
	if err := client.Do(context.Background(), client.B().Set().Key(key).Value(key).Build()).Error(); err != nil {
		panic(err)
	}
	result := client.DoCache(context.Background(), client.B().Get().Key(key).Cache(), time.Second)
	if result.IsCacheHit() {
		panic("should not hit")
	}
	if val, err := result.ToString(); err != nil {
		panic(err)
	} else if val != key {
		panic("key not equal")
	}
}

Finally, it would be great to have examples to do simple stuff like above. The README is only showing a couple things and it took extensive amounts of time to look through the codebase to see the commands that are possible and how to use them.

That is a good idea. Maybe the following example is helpful:

package main

import (
	"context"
	"fmt"
	"math/rand"
	"strconv"
	"time"

	"github.com/redis/rueidis"
)

func main() {
	client, err := rueidis.NewClient(rueidis.ClientOption{InitAddress: []string{"127.0.0.1:6379"}})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	for i := 0; i < 10000; i++ {
		key := strconv.Itoa(rand.Int())
		val := key

		fetch1 := client.DoCache(context.Background(), client.B().Get().Key(key).Cache(), time.Second)
		if !rueidis.IsRedisNil(fetch1.Error()) {
			panic("duplicated random key")
		}
		if fetch1.IsCacheHit() {
			panic("the first fetch should not hit the local cache")
		}
		fetch2 := client.DoCache(context.Background(), client.B().Get().Key(key).Cache(), time.Second)
		if !fetch2.IsCacheHit() || !rueidis.IsRedisNil(fetch1.Error()) {
			panic("the second fetch should hit the local cache and it should return redis nil")
		}
		if err := client.Do(context.Background(), client.B().Set().Key(key).Value(key).Build()).Error(); err != nil {
			panic("failed to set a previously non-existing key: " + err.Error())
		}
		for {
			time.Sleep(time.Millisecond)
			fetch3 := client.DoCache(context.Background(), client.B().Get().Key(key).Cache(), time.Second)
			if fetch3.IsCacheHit() {
				fmt.Println("got staled value from the local cache of key", key)
				continue
			}
			if v, err := fetch3.ToString(); err != nil {
				panic(err)
			} else if v != val {
				panic("val and v are not equal")
			}
			break
		}
	}
}

zeiler · 2024-04-19T03:24:21Z

This is super helpful. I think I now understand what is happening. We are using this as a cache around a DB call. So we have a wrapper class that has a Get() method that basically does:

func (w *wrapper) Get(ctx, key, dbMissFunc) {
  fetch1 := client.DoCache(ctx, client.B().Get().Key(key).Cache(), time.Second)
  if fetch1.IsCacheHit() && fetch1.Error() == rueidis.Nil {
     // Note: 1, try fetching from redis directly, added this to combat the race condition.
     fetch1 = client.Do(ctx, client.B().Get().Key(key).Build())
  }
  if fetch1.Error == rueidis.Nil { // missed both locally and in redis
    dbVal := dbMissFunc(cox, key)
    set1 := client.Do(ctx, client.B().Set().Key(key).Value(key).Build())
    return dbVal
  } else {
    return rueidis value
  }
}

And in tests were calling wrapper.Get() multiple times which I think confirms you scenario of DoCache being called first with a non-existing key (in which case that client.Do at Note: 1 would not be called and also return Nil), then Set happens and the wrapper.Get() is called again in which case sometimes that Note: 1 section happens and other times it doesn’t. I think that’s just a race condition in the test between the invalidation message reaching the client so that the DoCache() is not a hit anymore. Adding sleep between the two wrapper.Get() calls helped the tests become stable. Do you recommend we keep that second redis Do() call in this scenario? I'm leaning towards yes since we shouldn’t really have an invalid key more than once, so when it comes back from local cache as a hit and Nil it’s likely available in redis directly by that point? This would prevent hitting the DB as well since Set() should have put the DB's information into redis. I think for our use case the might be better than the for {} loop around fetch3 with a continue like you have in the example below.

I’m sure the aside implementation is better than this but it seems to fail on redis 6.

Super helpful to see the example!

rueian · 2024-04-19T12:42:49Z

Normally, I will just do:

func (w *wrapper) Get(ctx, key, dbMissFunc) {
  val, err := client.DoCache(ctx, client.B().Get().Key(key).Cache(), time.Second).ToString()
  if err == rueidis.Nil { // missed both locally and in redis
    dbVal := dbMissFunc(ctx, key)
    set1 := client.Do(ctx, client.B().Set().Key(key).Value(key).Build())
    return dbVal, nil
  }
  return val, err
}

The above approach may have some stampedes on the dbMissFunc(), but it is usually fine for trading simplicity. If stampedes are problems, then rueidisaside can help. We could make rueidisaside work with Redis 6 as well.

On the other hand, I think your approach is also good. Although the staled local cache will usually be invalidated after a millisecond, a direct client.Do() would not cost too much as well.

zeiler · 2024-04-24T03:30:56Z

Ah ok makes sense, thanks!

zeiler closed this as completed Apr 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent local/remote cache hits #534

Inconsistent local/remote cache hits #534

zeiler commented Apr 18, 2024

rueian commented Apr 18, 2024

zeiler commented Apr 19, 2024

rueian commented Apr 19, 2024

zeiler commented Apr 24, 2024

Inconsistent local/remote cache hits #534

Inconsistent local/remote cache hits #534

Comments

zeiler commented Apr 18, 2024

rueian commented Apr 18, 2024

zeiler commented Apr 19, 2024

rueian commented Apr 19, 2024

zeiler commented Apr 24, 2024