Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the issue of ignoring callback calls for removed keys. #789

Merged
merged 4 commits into from
Jul 10, 2023

Conversation

oleksandrivantsiv
Copy link
Collaborator

What I did
Fix the issue of ignoring callback calls for removed keys.

Why I did it
ConfigDBConnector.listen method has a caching mechanism (added in #587 PR) that preloads the DB state before starting. When the notification about the changed key is received the listen method gets key data from the DB (in all cases when the key was added, updated, or removed) and compares the data with the cache. It fires the callback only if data differ from the cache. Otherwise, the callback is ignored.

If the client.hgetall(key) is called for the removed key it returns an empty dictionary ({}). This can be confused with the data of the key with no attributes. For example: {"TABLE": {"KEY": {}}}.

So if preloaded data contains a key with no attributes and the listener method receives a notification about the removal of such key the callback is not called. The listener will simply remove the key from the cache without calling the callback. This leads to the situation when the removal of the key is not handled.

The solution is to get data for the added or updated keys, and for the removed keys use None instead. This will ensure that the comparison with the cache will work as expected.

How I verified it
Compile the package and run the unit test. Unit tests are extended to cover the expected behavior.

Details if related

@xumia
Copy link
Collaborator

xumia commented Jun 4, 2023

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@oleksandrivantsiv
Copy link
Collaborator Author

@qiluo-msft kindly reminder to review

@qiluo-msft qiluo-msft requested a review from liuh-80 June 7, 2023 22:01
data = None
else:
client = self.get_redis_client(self.db_name)
data = self.raw_to_typed(client.hgetall(key))
Copy link
Contributor

@qiluo-msft qiluo-msft Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hgetall

The bug is introduced by hgetall implementation. For non-existing key (like just deleted), the return value should be None instread of {}. #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value of hgetall is aligned with the Redis HGETALL:
https://redis.io/commands/hgetall/

Return
Array reply: list of fields and their values stored in the hash, or an empty list when key does not exist.

From my POV it works as expected.

There is no reason to query data for the key if the key is removed. We can consider this also as a performance optimization.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiluo-msft, @liuh-80 kindly reminder to review

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with you on HGETALL behavior.

@@ -105,8 +105,11 @@ class ConfigDBConnector_Native : public SonicV2Connector_Native
try:
(table, row) = key.split(self.TABLE_NAME_SEPARATOR, 1)
if table in self.handlers:
client = self.get_redis_client(self.db_name)
data = self.raw_to_typed(client.hgetall(key))
if item['data'] == 'del':
Copy link
Contributor

@qiluo-msft qiluo-msft Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

del

I experimented and found del is not the only value. It may be hdel.

I am thinking the root cause is in raw_to_typed(). If raw_data == {}, the return value should be None. It is general fix, but seems good for all the usage.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get raw_data == {} in two cases:

  1. When the key is removed.
  2. If the key is created without attributes. An example is NTP configuration:
    "NTP_SERVER": {
        "10.210.25.32": {},
        "10.75.202.2": {}
    },

So, we can't rely only on the data itself but also need to take into account operation type.
If the operation type is 'del' we clearly understand that the key was removed and there is no need to query its data, so we can pass None instead.
Otherwise, the key was added or changed, and raw_data == {} is valid data in that case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for mistake. I update above comment hset -> hdel.
What I mean is that del is not the only possible item['data'] for a deleted key. Please consider at least hdel.

This is my experiment:
Session 1:

$ redis-cli -n 4
127.0.0.1:6379[4]> hset a b c
(integer) 1
127.0.0.1:6379[4]> hdel a b
(integer) 1
127.0.0.1:6379[4]>

Session 2:

$ redis-cli
127.0.0.1:6379> select 4
OK
127.0.0.1:6379[4]> psubscribe '__keyspace@4*__:*'
Reading messages... (press Ctrl-C to quit)
1) "psubscribe"
2) "__keyspace@4*__:*"
3) (integer) 1
1) "pmessage"
2) "__keyspace@4*__:*"
3) "__keyspace@4__:a"
4) "hset"
1) "pmessage"
2) "__keyspace@4*__:*"
3) "__keyspace@4__:a"
4) "hdel"
1) "pmessage"
2) "__keyspace@4*__:*"
3) "__keyspace@4__:a"
4) "del"
1) "pmessage"
2) "__keyspace@4*__:*"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct, but we are interested in only del messages in this particular case.
hdel is generated when the field under the key is removed:

root@r-r740-08-bf3-sonic-01:/home/admin# redis-cli -n 4
127.0.0.1:6379[4]> HSET key_1 a 1
(integer) 1
127.0.0.1:6379[4]> HSET key_1 b 1
(integer) 1
127.0.0.1:6379[4]> HDEL key_1 b 1
(integer) 1
127.0.0.1:6379[4]>

127.0.0.1:6379[4]> psubscribe '__keyspace@4*__:*'
Reading messages... (press Ctrl-C to quit)
1) "psubscribe"
2) "__keyspace@4*__:*"
3) (integer) 1
1) "pmessage"
2) "__keyspace@4*__:*"
3) "__keyspace@4__:key_1"
4) "hset"
1) "pmessage"
2) "__keyspace@4*__:*"
3) "__keyspace@4__:key_1"
4) "hset"
1) "pmessage"
2) "__keyspace@4*__:*"
3) "__keyspace@4__:key_1"
4) "hdel"

So, when 'hdel' is received we need to get key data to handle the removed field.

'del' message is generated when the key is removed, including all its data. In this case, there is no data to get from DB as they are already gone:

root@r-r740-08-bf3-sonic-01:/home/admin# redis-cli -n 4
127.0.0.1:6379[4]> HSET key_1 a 1
(integer) 1
127.0.0.1:6379[4]> HSET key_1 b 1
(integer) 1
127.0.0.1:6379[4]> DEL key_1
(integer) 1
127.0.0.1:6379[4]>

root@r-r740-08-bf3-sonic-01:/home/admin# redis-cli -n 4
127.0.0.1:6379[4]> psubscribe '__keyspace@4*__:*'
Reading messages... (press Ctrl-C to quit)
1) "psubscribe"
2) "__keyspace@4*__:*"
3) (integer) 1
1) "pmessage"
2) "__keyspace@4*__:*"
3) "__keyspace@4__:key_1"
4) "hset"
1) "pmessage"
2) "__keyspace@4*__:*"
3) "__keyspace@4__:key_1"
4) "hset"
1) "pmessage"
2) "__keyspace@4*__:*"
3) "__keyspace@4__:key_1"
4) "del"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiluo-msft kindly reminder to review

Copy link
Contributor

@qiluo-msft qiluo-msft Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you that hdel does not always mean key deleted.

I have second thought on previous discussion

We can get raw_data == {} in two cases:

  1. ...
  2. If the key is created without attributes. An example is NTP configuration.

In this case, the underlying Redis data is actually

127.0.0.1:6379[4]> hgetall "NTP_SERVER|10.20.8.129"
1) "NULL"
2) "NULL"

So my suggestion is still applicable:

The root cause is in raw_to_typed(). If raw_data == {}, the return value should be None. It is general fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oleksandrivantsiv Please check my last comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiluo-msft the suggestion will work. The only concern I have is that it will consume more CPU resources and time to execute from the performance point of view. Because we will need to get the data for the not existing key. This operation is redundant as we already know that the key is removed.

@liat-grozovik
Copy link
Collaborator

liat-grozovik commented Jun 14, 2023

@qiluo-msft @ganglyu any additional comments or we can move on and merge this one?

@oleksandrivantsiv
Copy link
Collaborator Author

@qiluo-msft kindly reminder to review

@oleksandrivantsiv
Copy link
Collaborator Author

/azp run

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 789 in repo sonic-net/sonic-swss-common

@xumia
Copy link
Collaborator

xumia commented Jul 7, 2023

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@qiluo-msft qiluo-msft merged commit e0f394c into sonic-net:master Jul 10, 2023
@liuh-80
Copy link
Contributor

liuh-80 commented Jul 18, 2023

@oleksandrivantsiv , this PR will break all sonic cli, because it changes the get_entry() behavior:

before this change, if an entry does not exist, return {}
After this change, if an entry does not exist, return None.

for example, following command will break:

admin@vlab-01:$ sudo config vlan add 1234
Traceback (most recent call last):
File "/usr/local/bin/config", line 8, in
sys.exit(config())
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 764, in call
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/click/decorators.py", line 64, in new_func
return ctx.invoke(f, obj, *args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/config/vlan.py", line 50, in add_vlan
if clicommon.check_if_vlanid_exist(db.cfgdb, vlan): # TODO: MISSING CONSTRAINT IN YANG MODEL
File "/usr/local/lib/python3.9/dist-packages/utilities_common/cli.py", line 257, in check_if_vlanid_exist
if len(config_db.get_entry(table_name, vlan)) != 0:
TypeError: object of type 'NoneType' has no len()
admin@vlab-01:$

@qiluo-msft
Copy link
Contributor

@oleksandrivantsiv Maybe we can change the implementation of get_entry, get_table, get_config to be backward-compatible, even if the old behivor make less sense.

@oleksandrivantsiv
Copy link
Collaborator Author

@qiluo-msft I think that we need to close

@oleksandrivantsiv Maybe we can change the implementation of get_entry, get_table, get_config to be backward-compatible, even if the old behivor make less sense.

ACK. I will check and provide feedback.

@oleksandrivantsiv
Copy link
Collaborator Author

@qiluo-msft I think that we need to close

@oleksandrivantsiv Maybe we can change the implementation of get_entry, get_table, get_config to be backward-compatible, even if the old behivor make less sense.

ACK. I will check and provide feedback.

@qiluo-msft I think that we need to return to the suggestion I proposed by checking the message type and returning None only when the key is removed. This should solve the problem with VLAN command and won't require changing of get_entry, get_table, get_config methods implementation.

@qiluo-msft
Copy link
Contributor

e need to return to the suggestion I proposed by checking the message type and returning None only when the key is removed. This should solve the problem with VLAN command and won't require changing of get_entry, get_table, get_config methods implementation.

Please go ahead to raise another PR.

qiluo-msft added a commit that referenced this pull request Jul 21, 2023
qiluo-msft added a commit that referenced this pull request Aug 1, 2023
)" (#804)

This reverts commit e0f394c.

The reason is unexpected behavior change of `get_entry`. ref: #789 (comment)
SviatoslavBoichuk pushed a commit to SviatoslavBoichuk/sonic-swss-common that referenced this pull request Sep 7, 2023
…#789)

**What I did**
Fix the issue of ignoring callback calls for removed keys.

**Why I did it**
ConfigDBConnector.listen method has a caching mechanism (added in sonic-net#587 PR) that preloads the DB state before starting. When the notification about the changed key is received the listen method gets key data from the DB (in all cases when the key was added, updated, or removed) and compares the data with the cache. It fires the callback only if data differ from the cache. Otherwise, the callback is ignored. 

If the `client.hgetall(key)` is called for the removed key it returns an empty dictionary (`{}`). This can be confused with the data of the key with no attributes. For example: `{"TABLE": {"KEY": {}}}`. 

So if preloaded data contains a key with no attributes and the listener method receives a notification about the removal of such key the callback is not called. The listener will simply remove the key from the cache without calling the callback. This leads to the situation when the removal of the key is not handled.

The solution is to get data for the added or updated keys, and for the removed keys use `None` instead. This will ensure that the comparison with the cache will work as expected.

**How I verified it**
Compile the package and run the unit test. Unit tests are extended to cover the expected behavior.
SviatoslavBoichuk pushed a commit to SviatoslavBoichuk/sonic-swss-common that referenced this pull request Sep 7, 2023
…onic-net#789)" (sonic-net#804)

This reverts commit e0f394c.

The reason is unexpected behavior change of `get_entry`. ref: sonic-net#789 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants