Inconsistent reads for R3 KV #4710

Closed
cbrewster opened this issue Oct 26, 2023 · 15 comments
Labels
defect Suspected defect such as a bug or regression

Comments

@cbrewster commented Oct 26, 2023

Observed behavior

We have an R3 KV and we recently upgraded from NATS 2.9 to 2.10.3. We often use CAS operations to make updates to keys in the bucket: we read a key, make modifications, then update the key with the revision that was originally read. This worked great until we upgraded to 2.10, when we started to see errors like nats: wrong last sequence: 835882. We also started to notice other inconsistent behavior in systems that read from the KV.
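
In nats.go terms, such a read-modify-update flow looks roughly like the sketch below (a minimal example using the nats.go KV API; the modify helper is a placeholder and error handling is reduced to log.Fatal):

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

// modify is a stand-in for the application-specific change to the value.
func modify(v []byte) []byte { return v }

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Bind to the bucket backing the KV_deployments stream.
	kv, err := js.KeyValue("deployments")
	if err != nil {
		log.Fatal(err)
	}

	// Read the current entry and remember its revision.
	entry, err := kv.Get("6fa09715-c135-4cb0-8b7b-b7e5e49075b1")
	if err != nil {
		log.Fatal(err)
	}

	// CAS: the update only succeeds if the key is still at the revision we read.
	// A stale read surfaces here as "nats: wrong last sequence: NNN".
	if _, err := kv.Update(entry.Key(), modify(entry.Value()), entry.Revision()); err != nil {
		log.Printf("CAS failed, retrying with a fresh read: %v", err)
	}
}
```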

When investigating further, we were able to see inconsistent reads from the same NATS server for the same KV key. Here's an example of what I was observing:

$ nats req '$JS.API.DIRECT.GET.KV_deployments.$KV.deployments.6fa09715-c135-4cb0-8b7b-b7e5e49075b1' ''
19:58:23 Sending request on "$JS.API.DIRECT.GET.KV_deployments.$KV.deployments.6fa09715-c135-4cb0-8b7b-b7e5e49075b1"
19:58:24 Received with rtt 43.167285ms
19:58:24 Nats-Stream: KV_deployments
19:58:24 Nats-Subject: $KV.deployments.6fa09715-c135-4cb0-8b7b-b7e5e49075b1
19:58:24 Nats-Sequence: 835937
19:58:24 Nats-Time-Stamp: 2023-10-25T21:06:45.163089148Z
19:58:24 Nats-Expected-Last-Subject-Sequence: 832643

<message redacted>

$ nats req '$JS.API.DIRECT.GET.KV_deployments.$KV.deployments.6fa09715-c135-4cb0-8b7b-b7e5e49075b1' ''
19:58:27 Sending request on "$JS.API.DIRECT.GET.KV_deployments.$KV.deployments.6fa09715-c135-4cb0-8b7b-b7e5e49075b1"
19:58:27 Received with rtt 41.753095ms
19:58:27 Nats-Time-Stamp: 2023-10-25T23:17:08.937511603Z
19:58:27 Nats-Expected-Last-Subject-Sequence: 836168
19:58:27 Nats-Stream: KV_deployments
19:58:27 Nats-Subject: $KV.deployments.6fa09715-c135-4cb0-8b7b-b7e5e49075b1
19:58:27 Nats-Sequence: 836176

<message redacted>

While this view doesn't show which server responded, we have another tool that provides a UI on top of NATS for our use case, and it was able to demonstrate this inconsistency within a single server.

$ deployctl deployment get 6fa09715-c135-4cb0-8b7b-b7e5e49075b1
server: nats-0
revision: 835937 2023-10-25 16:06:45.163089148 -0500 CDT
...

$ deployctl deployment get 6fa09715-c135-4cb0-8b7b-b7e5e49075b1
server: nats-0
revision: 836176 2023-10-25 18:17:08.937511603 -0500 CDT
...

This only started after we upgraded to 2.10.x earlier this week, so we think this is likely a regression.

We tried to roll back to 2.9.23, but it seems there are some data format inconsistencies that prevented us from doing so:

[18] 2023/10/26 01:11:01.715960 [WRN] Error applying entries to 'A... > KV_deployments': invalid character '*' looking for beginning of value
[18] 2023/10/26 01:11:01.716092 [WRN] Error applying entries to 'A... > KV_deployments': invalid character '*' looking for beginning of value

This has been causing a lot of weird errors for us, since the data can be stale for many hours. A NATS server restart helps, but it doesn't prevent the issue from happening again.

KV Config:

Information for Stream KV_deployments created 2023-03-30 09:38:22

             Subjects: $KV.deployments.>
             Replicas: 3
              Storage: File

Options:

            Retention: Limits
     Acknowledgements: true
       Discard Policy: New
     Duplicate Window: 2m0s
           Direct Get: true
    Allows Msg Delete: false
         Allows Purge: true
       Allows Rollups: true

Limits:

     Maximum Messages: unlimited
  Maximum Per Subject: 1
        Maximum Bytes: unlimited
          Maximum Age: unlimited
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited


Cluster Information:

                 Name: deployments-us-central1
               Leader: nats-2
              Replica: nats-0, current, seen 0.21s ago
              Replica: nats-1, current, seen 0.21s ago

State:

             Messages: 10,040
                Bytes: 2.4 MiB
             FirstSeq: 367 @ 2023-03-30T15:06:59 UTC
              LastSeq: 836,276 @ 2023-10-26T02:18:41 UTC
     Deleted Messages: <redacted>
     Active Consumers: 4
   Number of Subjects: <redacted>

Expected behavior

We'd expect the same NATS server to respond with consistent values after a key is written, and not to return stale values minutes or hours after it was updated.

Server and client version

Server: 2.10.3

Host environment

We're running NATS on GKE using the 0.1.x Helm chart.

Steps to reproduce

Unfortunately we do not have exact steps to reproduce at the moment, but we will attempt to find a reproduction soon. Help before then would be appreciated, either to fix forward on 2.10.x or to roll back to 2.9.x until a fix is in place.

cbrewster added the defect (Suspected defect such as a bug or regression) label Oct 26, 2023
cbrewster changed the title from "Inconsistent values for R3 KV" to "Inconsistent reads for R3 KV" Oct 26, 2023
@derekcollison (Member)

One option to sync them all to the current leader would be to scale to 1 and back up to 3 for the KV itself.

nats stream update KV_deployments --replicas=1

Then scale back up.

nats stream update KV_deployments --replicas=3

@jnmoyne (Contributor) commented Oct 26, 2023

If the issue is due to the use of direct gets, one quick and easy workaround would be to edit the stream's config to turn direct gets off.

In any case, if you know there's a likelihood of a single key being modified by two clients at the same time, and therefore want to use CAS to control concurrent writes to the key, then there's a general argument that you do not want to use direct gets: you want to make sure you always get the latest possible value for the key when doing your get before the update.
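
For example, a rough sketch of turning direct gets off with the nats.go client (fetch the stream config, clear AllowDirect, update the stream; the stream name is taken from the report above):

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

// disableDirectGets turns off direct gets on the stream backing a KV bucket,
// so reads go through the stream leader again.
func disableDirectGets(nc *nats.Conn, stream string) error {
	js, err := nc.JetStream()
	if err != nil {
		return err
	}
	info, err := js.StreamInfo(stream)
	if err != nil {
		return err
	}
	cfg := info.Config
	cfg.AllowDirect = false
	_, err = js.UpdateStream(&cfg)
	return err
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	if err := disableDirectGets(nc, "KV_deployments"); err != nil {
		log.Fatal(err)
	}
}
```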

@cbrewster (Author)

One option to sync them all to the current leader would be to scale to 1 and back up to 3 for the KV itself.

Initially I thought this was a replication issue, but I was able to see the same node flip back and forth between the two different responses (one current and one stale).

If the issue is due to the use of direct gets, one quick and easy workaround would be to edit the stream's config to turn direct gets off.

In any case, if you know there's a likelihood of a single key being modified by two clients at the same time, and therefore want to use CAS to control concurrent writes to the key, then there's a general argument that you do not want to use direct gets: you want to make sure you always get the latest possible value for the key when doing your get before the update.

We also have a service that subscribes to the KV using a Watcher and keeps its own version of the table in memory. We noticed that after the upgrade, these could also have stale records (hours old). We do have some retries around CAS operations, but we can try disabling direct get. It just seems odd to me that the stale records are so old; I wouldn't expect direct get to be causing the issue I'm seeing, but maybe I misunderstand how it works?
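
For reference, a watcher that mirrors a bucket into memory looks roughly like the sketch below (minimal nats.go example; the bucket name and the in-memory map are illustrative):

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}
	kv, err := js.KeyValue("deployments")
	if err != nil {
		log.Fatal(err)
	}

	// Watch every key and mirror the bucket into an in-memory map.
	w, err := kv.WatchAll()
	if err != nil {
		log.Fatal(err)
	}
	defer w.Stop()

	cache := map[string][]byte{}
	for entry := range w.Updates() {
		if entry == nil {
			// A nil entry marks the end of the initial replay of current values.
			continue
		}
		switch entry.Operation() {
		case nats.KeyValueDelete, nats.KeyValuePurge:
			delete(cache, entry.Key())
		default:
			cache[entry.Key()] = entry.Value()
		}
	}
}
```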

@mgross-ebner commented Oct 26, 2023

I'm not sure if it is related, but we are seeing error messages when updating values (nats-server 2.10.2):

~ nats kv add demo
Information for Key-Value Store Bucket demo created 2023-10-26T10:54:39+02:00

Configuration:

          Bucket Name: demo
         History Kept: 1
        Values Stored: 0
   Backing Store Kind: JetStream
          Bucket Size: 0 B
  Maximum Bucket Size: unlimited
   Maximum Value Size: unlimited
          Maximum Age: unlimited
     JetStream Stream: KV_demo
              Storage: File

Cluster Information:

                 Name: C1
               Leader: n1-c1

~ nats kv create demo k1 v1
v1
~ nats kv update demo k1 v2
nats: error: nats: wrong last sequence: 1

@ripienaar (Contributor)

The last command needs to be nats kv update demo k1 v2 1; the 1 here is the revision of the previous value.

[rip@p1-lon]% nats kv create demo k1 v1
v1
~
[rip@p1-lon]% nats kv update demo k1 v2
nats: error: nats: wrong last sequence: 1

[rip@p1-lon]% nats kv update demo k1 v2 1
v2

I'll improve the error there.

@cbrewster (Author)

For some more info, we've discovered the 3 nodes have different data for these streams and are not self-healing:

(via curl "nats-{1,2,3}.nats:8222/jsz?streams=true")

nats-0

        {
          "name": "KV_deployments",
          "created": "2023-03-30T14:38:22.515826553Z",
          "cluster": {
            "name": "deployments-us-central1",
            "leader": "nats-2",
            "replicas": [
              {
                "name": "nats-1",
                "current": false,
                "active": 0,
                "lag": 803083,
                "peer": "yrzKKRBu"
              },
              {
                "name": "nats-2",
                "current": true,
                "active": 91107098,
                "lag": 803083,
                "peer": "cnrtt3eg"
              }
            ]
          },
          "state": {
            "messages": 10108,
            "bytes": 2549966,
            "first_seq": 367,
            "first_ts": "2023-03-30T15:06:59.251496804Z",
            "last_seq": 836453,
            "last_ts": "2023-10-26T09:08:23.456285522Z",
            "num_subjects": 10108,
            "num_deleted": 825979,
            "consumer_count": 1
          }
        },

nats-1

        {
          "name": "KV_deployments",
          "created": "2023-03-30T14:38:22.515826553Z",
          "cluster": {
            "name": "deployments-us-central1",
            "leader": "nats-2",
            "replicas": [
              {
                "name": "nats-0",
                "current": false,
                "active": 0,
                "lag": 803083,
                "peer": "S1Nunr6R"
              },
              {
                "name": "nats-2",
                "current": true,
                "active": 719890186,
                "lag": 803083,
                "peer": "cnrtt3eg"
              }
            ]
          },
          "state": {
            "messages": 10100,
            "bytes": 2541574,
            "first_seq": 367,
            "first_ts": "2023-03-30T15:06:59.251496804Z",
            "last_seq": 836361,
            "last_ts": "2023-10-26T09:08:23.324863022Z",
            "num_subjects": 10100,
            "num_deleted": 825895,
            "consumer_count": 1
          }
        },

nats-2

        {
          "name": "KV_deployments",
          "created": "2023-03-30T14:38:22.515826553Z",
          "cluster": {
            "name": "deployments-us-central1",
            "leader": "nats-2",
            "replicas": [
              {
                "name": "nats-0",
                "current": true,
                "active": 287742819,
                "peer": "S1Nunr6R"
              },
              {
                "name": "nats-1",
                "current": true,
                "active": 287704603,
                "peer": "yrzKKRBu"
              }
            ]
          },
          "state": {
            "messages": 10107,
            "bytes": 2549849,
            "first_seq": 367,
            "first_ts": "2023-03-30T15:06:59.251496804Z",
            "last_seq": 836453,
            "last_ts": "2023-10-26T09:08:23.456285522Z",
            "num_subjects": 10107,
            "num_deleted": 825980,
            "consumer_count": 2
          }
        },

@cbrewster (Author) commented Oct 26, 2023

From our metrics we do observe that our replicas got out of sync on Monday. I did as @derekcollison mentioned and scaled down to R1 and then back to R3, and the replicas were able to re-synchronize. It is a bit concerning, though, that we can enter this state and not recover without manual intervention.

nats_stream_total_messages{stream_name="KV_deployments"}
[image: graph of the metric above]

Around this time there were some node restarts and some logs about resetting of WAL state
[image: log excerpt]

@derekcollison (Member)

Agree on drifting state, was trying to get you guys unblocked.

Did the scale down and up help?

@cbrewster (Author)

Agree on drifting state, was trying to get you guys unblocked.

Did the scale down and up help?

Yup, all the replicas are tracking the primary as expected now.

@derekcollison (Member)

Ok, good. The other option is to snapshot and restore, but that involves some minor downtime that I did not want you to incur.

We cannot guarantee the drift will not happen again, but we feel that with 2.10.3, and with 2.10.4 coming today, we are in a good spot and have fixed quite a few issues. My sense is the issue was there before the upgrade and persisted, hence my recommendation.

@cbrewster (Author)

We cannot guarantee the drift will not happen again, but we feel that with 2.10.3, and with 2.10.4 coming today, we are in a good spot and have fixed quite a few issues. My sense is the issue was there before the upgrade and persisted, hence my recommendation.

We were fully on 2.10.3 at the time the desync happened, as shown in the graph above. Are there known issues that will be fixed by 2.10.4?

@tommyjcarpenter

We have observed a different problem, on 2.10.11, where the solution was exactly the same:

nats stream update KV_x --replicas=1

nats stream update KV_x --replicas=3

The issue we observed is that a 3-node deployment would report inconsistent KV counts; it turned out one of the 3 pods was not syncing:

nats -s "infra-nats.xxx:4222" kv ls  active-controllers --creds tommy.creds | wc -l 
246
nats -s "infra-nats.xxx:4222" kv ls  active-controllers --creds tommy.creds | wc -l 
246
nats -s "infra-nats.xxx:4222" kv ls  active-controllers --creds tommy.creds | wc -l 
1

So the same call made multiple times would return different results (or no data) depending on which of the 3 pods you hit. Scaling down the stream replicas to 1, and then back up to 3, forces a resync.

Sorry for piggybacking; I didn't want to create a new issue just yet, because the solution given above worked.

@wallyqs (Member) commented Aug 28, 2024

Fixed via #5821 in v2.10.19.

@wallyqs wallyqs closed this as completed Aug 28, 2024
@tommyjcarpenter

Thanks much!
Do you know when these releases typically make their way to the official helm repo?

@wallyqs (Member) commented Aug 28, 2024

@tommyjcarpenter the helm chart with the updated version is now available: https://github.com/nats-io/k8s/releases/tag/nats-1.2.3
