Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log unexpected $ENDOFF responses in dual channel replication #839

Merged
merged 1 commit into from
Aug 15, 2024

Conversation

secwall
Copy link
Contributor

@secwall secwall commented Jul 29, 2024

I've tried to test a dual channel replication but forgot to add +sync for my replication user. As a result replica entered silent cycle like this:

* Connecting to PRIMARY 127.0.0.1:6379
* PRIMARY <-> REPLICA sync started
* Non blocking connect for SYNC fired the event.
* Primary replied to PING, replication can continue...
* Trying a partial resynchronization (request ...)
* PSYNC is not possible, initialize RDB channel.
* Aborting dual channel sync

And primary got endless cycle like this:

* Replica 127.0.0.1:6380 asks for synchronization
* Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '...', my replication IDs are '...' and '...')
* Replica 127.0.0.1:6380 is capable of dual channel synchronization, and partial sync isn't possible. Full sync will continue with dedicated RDB channel.

There was no way to understand that replication user is missing +sync acl on notice log level. With this one-line change we get a warning message in our replica log.


Signed-off-by: secwall secwall@yandex-team.ru

I've tried to test a dual channel replication but forgot to add +copy
for my replication user. As a result replica entered silent cycle like
this:
```
* Connecting to PRIMARY 127.0.0.1:6379
* PRIMARY <-> REPLICA sync started
* Non blocking connect for SYNC fired the event.
* Primary replied to PING, replication can continue...
* Trying a partial resynchronization (request ...)
* PSYNC is not possible, initialize RDB channel.
* Aborting dual channel sync
```

And primary got endless cycle like this:
```
* Replica 127.0.0.1:6380 asks for synchronization
* Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '...', my replication IDs are '...' and '...')
* Replica 127.0.0.1:6380 is capable of dual channel synchronization, and partial sync isn't possible. Full sync will continue with dedicated RDB channel.
```

There was no way to understand that replication user is missing +copy acl on notice log level.
With this one-line change we get a warning message in our replica log.

---------

Signed-off-by: secwall <secwall@yandex-team.ru>
@madolson
Copy link
Member

@secwall Can I ask what your use case was to manually be setting up the dual channel?

@secwall
Copy link
Contributor Author

secwall commented Aug 13, 2024

@secwall Can I ask what your use case was to manually be setting up the dual channel?

I've been trying to test how the new features in version 8.0 work in our environment.

Copy link

codecov bot commented Aug 14, 2024

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 70.36%. Comparing base (b4d96ca) to head (ab13533).
Report is 25 commits behind head on unstable.

Files Patch % Lines
src/replication.c 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #839      +/-   ##
============================================
- Coverage     70.38%   70.36%   -0.02%     
============================================
  Files           112      112              
  Lines         61462    61463       +1     
============================================
- Hits          43261    43250      -11     
- Misses        18201    18213      +12     
Files Coverage Δ
src/replication.c 87.13% <0.00%> (-0.04%) ⬇️

... and 12 files with indirect coverage changes

@madolson madolson merged commit 103365f into valkey-io:unstable Aug 15, 2024
47 checks passed
@madolson madolson added the release-notes This issue should get a line item in the release notes label Aug 15, 2024
mapleFU pushed a commit to mapleFU/valkey that referenced this pull request Aug 21, 2024
…io#839)

I've tried to test a dual channel replication but forgot to add +sync
for my replication user. As a result replica entered silent cycle like
this:
```
* Connecting to PRIMARY 127.0.0.1:6379
* PRIMARY <-> REPLICA sync started
* Non blocking connect for SYNC fired the event.
* Primary replied to PING, replication can continue...
* Trying a partial resynchronization (request ...)
* PSYNC is not possible, initialize RDB channel.
* Aborting dual channel sync
```

And primary got endless cycle like this:
```
* Replica 127.0.0.1:6380 asks for synchronization
* Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '...', my replication IDs are '...' and '...')
* Replica 127.0.0.1:6380 is capable of dual channel synchronization, and partial sync isn't possible. Full sync will continue with dedicated RDB channel.
```

There was no way to understand that replication user is missing +sync
acl on notice log level. With this one-line change we get a warning
message in our replica log.

---------

Signed-off-by: secwall <secwall@yandex-team.ru>
Signed-off-by: mwish <maplewish117@gmail.com>
mapleFU pushed a commit to mapleFU/valkey that referenced this pull request Aug 22, 2024
…io#839)

I've tried to test a dual channel replication but forgot to add +sync
for my replication user. As a result replica entered silent cycle like
this:
```
* Connecting to PRIMARY 127.0.0.1:6379
* PRIMARY <-> REPLICA sync started
* Non blocking connect for SYNC fired the event.
* Primary replied to PING, replication can continue...
* Trying a partial resynchronization (request ...)
* PSYNC is not possible, initialize RDB channel.
* Aborting dual channel sync
```

And primary got endless cycle like this:
```
* Replica 127.0.0.1:6380 asks for synchronization
* Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '...', my replication IDs are '...' and '...')
* Replica 127.0.0.1:6380 is capable of dual channel synchronization, and partial sync isn't possible. Full sync will continue with dedicated RDB channel.
```

There was no way to understand that replication user is missing +sync
acl on notice log level. With this one-line change we get a warning
message in our replica log.

---------

Signed-off-by: secwall <secwall@yandex-team.ru>
Signed-off-by: mwish <maplewish117@gmail.com>
madolson pushed a commit that referenced this pull request Sep 2, 2024
I've tried to test a dual channel replication but forgot to add +sync
for my replication user. As a result replica entered silent cycle like
this:
```
* Connecting to PRIMARY 127.0.0.1:6379
* PRIMARY <-> REPLICA sync started
* Non blocking connect for SYNC fired the event.
* Primary replied to PING, replication can continue...
* Trying a partial resynchronization (request ...)
* PSYNC is not possible, initialize RDB channel.
* Aborting dual channel sync
```

And primary got endless cycle like this:
```
* Replica 127.0.0.1:6380 asks for synchronization
* Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '...', my replication IDs are '...' and '...')
* Replica 127.0.0.1:6380 is capable of dual channel synchronization, and partial sync isn't possible. Full sync will continue with dedicated RDB channel.
```

There was no way to understand that replication user is missing +sync
acl on notice log level. With this one-line change we get a warning
message in our replica log.

---------

Signed-off-by: secwall <secwall@yandex-team.ru>
madolson pushed a commit that referenced this pull request Sep 3, 2024
I've tried to test a dual channel replication but forgot to add +sync
for my replication user. As a result replica entered silent cycle like
this:
```
* Connecting to PRIMARY 127.0.0.1:6379
* PRIMARY <-> REPLICA sync started
* Non blocking connect for SYNC fired the event.
* Primary replied to PING, replication can continue...
* Trying a partial resynchronization (request ...)
* PSYNC is not possible, initialize RDB channel.
* Aborting dual channel sync
```

And primary got endless cycle like this:
```
* Replica 127.0.0.1:6380 asks for synchronization
* Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '...', my replication IDs are '...' and '...')
* Replica 127.0.0.1:6380 is capable of dual channel synchronization, and partial sync isn't possible. Full sync will continue with dedicated RDB channel.
```

There was no way to understand that replication user is missing +sync
acl on notice log level. With this one-line change we get a warning
message in our replica log.

---------

Signed-off-by: secwall <secwall@yandex-team.ru>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes This issue should get a line item in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants