-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added tests for MQTT retained messages in various cluster/domain conf… #5082
Conversation
server/mqtt_ex_test_test.go
Outdated
target := topo.makef(t) | ||
t.Cleanup(target.Shutdown) | ||
|
||
// TODO (levb): if we do not pre-initialize the streams and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NS: cc @derekcollison @kozlovic This test uncovered a rather inconsistent behavior in multi-domain scenarios. If there are 2 domains, A and B, and the MQTT JetStream assets are first initialized in A, B will only receive and replay retained messages after its own retained message assets are created. Touching the servers with an MQTT connection here ensures the success of the tests; otherwise A<->B fail 1/2 of the time.
Not quite sure if this is "by design".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think that if they have different domains, messages should not even be visible in the other domain, but I have to say I don't really understand that concept. What I notice is that you don't specify js_domain:
in the mqtt{}
block of the config, but if I try to add those, then the test fails early since the lookup of a stream fails with a timeout since the JS APIs are rewritten with the domain name, like $JS.HUBD.API.STREAM.INFO.$MQTT_sess
and for some reason, the server does not listen to those, even though you have specified domain
in the JS config too. I don't really see in JS code where the server listens to that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't specify js_domain: in the mqtt{} block of the config
👍 I did not (and still do not) fully understand the intended functionality of this option, would love to add tests for it. Will read the code and try to understand better. The customer that was having issues did not have it set, so I did not prioritize it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments, but unfortunately regarding the domains, I am not sure how that works and can't really help there.
server/mqtt_ex_test_test.go
Outdated
for _, c := range t.clusters { | ||
c.shutdown() | ||
} | ||
for _, s := range t.servers { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment in Reload()
tb.Helper() | ||
|
||
for _, c := range t.clusters { | ||
c.restartAllSamePorts() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This a mistake that I use to make: restartAllSamePorts()
does NOTHING if the cluster has not been shutdown first. Basically that code goes through servers in the cluster and restart IF they are not currently running. So you need to add c.shutdown()
prior to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And adding that shows that TestMQTTExRetainedMessages/cluster
are now failing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, should call c.stopAll(), not c.shutdown() since in that case the config/store would be wiped out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need after the restart to add something like:
for _, stream := range []string{mqttSessStreamName, mqttStreamName, mqttQoS2IncomingMsgsStreamName, mqttOutStreamName, mqttRetainedMsgsStreamName} {
c.waitOnStreamLeader("ONE", stream)
}
Without something like that, then it is possible that a connect fails because one of the stream lookup fails with a timeout (since after a restart, it may take some time to elect a leader for those streams).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch... fixing.
|
||
for i, s := range t.servers { | ||
o := s.getOpts() | ||
s.Shutdown() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it that mqttExTarget will either have t.clusters or t.servers set but not both? Or at least that t.servers will not be servers that are in t.clusters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the case at the moment, but you're right that it may not be the case in the future (a cluster with a single leaf node immediately comes to mind).
TBH, this was just a "minimal viable" implementation for the current requirements, I didn't want to cut/paste nor to go too abstract here (it's just an "extra" test for the time being). If we keep using MQTTEx going forward more then yes, the "target" implementation should be refactored.
server/mqtt_ex_test_test.go
Outdated
sub []mqttExDial | ||
} | ||
|
||
func TestMQTTExCompliance(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to be able to exclude this file from running the normal tests, say from server repo something like:
go test -race -v -run=TestMQTT ./server
Now fails if you don't have all the things that you do in GithubActions setup. So you may want to use some build flags or something to exclude this file. Maybe something like:
//go:build !skip_mqtt_tests && run_mqtt_ex
// +build !skip_mqtt_tests,run_mqtt_ex
Of course, that means adding -tags=run_mqtt_ex
in the go
command to run those tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it, but passed on a separate build tag. All the MQTTEx tests (should) do a t.Skip
if mqtt-test
is not installed in CI, so it's ok to invoke them in the main tag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I did not realize that this is what was happening. I installed mqtt-test, etc.. and added it to my path, but then when I was trying to run the "normal" MQTT tests, those 2 would run and I thought that was new.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will separate.
server/mqtt_ex_test_test.go
Outdated
target := topo.makef(t) | ||
t.Cleanup(target.Shutdown) | ||
|
||
// TODO (levb): if we do not pre-initialize the streams and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think that if they have different domains, messages should not even be visible in the other domain, but I have to say I don't really understand that concept. What I notice is that you don't specify js_domain:
in the mqtt{}
block of the config, but if I try to add those, then the test fails early since the lookup of a stream fails with a timeout since the JS APIs are rewritten with the domain name, like $JS.HUBD.API.STREAM.INFO.$MQTT_sess
and for some reason, the server does not listen to those, even though you have specified domain
in the JS config too. I don't really see in JS code where the server listens to that...
tb.Helper() | ||
|
||
for _, c := range t.clusters { | ||
c.restartAllSamePorts() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, should call c.stopAll(), not c.shutdown() since in that case the config/store would be wiped out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to make sure that the cluster is really restarted. And I have found then that tests are quite flappy. I think you need to have to maybe use testMQTTConnectRetry() with a clean session and dummy ID (or no ID specified) until it succeeds before having the real test in mqttexRunTest(). Just waiting on the cluster to be ready proved not enough in tests I was running last week.
server/mqtt_ex_test_test.go
Outdated
sub []mqttExDial | ||
} | ||
|
||
func TestMQTTExCompliance(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I did not realize that this is what was happening. I installed mqtt-test, etc.. and added it to my path, but then when I was trying to run the "normal" MQTT tests, those 2 would run and I thought that was new.
@kozlovic sorry my commit message was non-descriptive, I squashed from another branch. Anyway, I am still working on the restart a another potential issue I ran into, will mark you for review when done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
#5082) This is a Test- and CI-only PR. No server changes. See also ConnectEverything/mqtt-test#2.
#5082) This is a Test- and CI-only PR. No server changes. See also ConnectEverything/mqtt-test#2.
This is a Test- and CI-only PR. No server changes.
See also ConnectEverything/mqtt-test#2.