Mas i1804 peerdiscovery #1812

martinsumner · 2022-03-03T11:41:39Z

The heart of the problem is how to avoid needing configuration changes on sink clusters when source clusters are being changed. This PR allows new nodes to be discovered automatically, starting from the configured nodes.

The default behaviour is always to fall back to the configured behaviour.

Worker counts and per-peer limits need to be set with an understanding of whether discovery will be enabled. If the per-peer limit is left at its default, however, the worker count will be distributed evenly across the peers (independently by each node). Note that if WorkerCount mod SrcNodeCount =/= 0, the excess workers will not be balanced across the sink nodes.
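To make the remainder arithmetic concrete, here is a minimal Erlang sketch. It assumes, purely for illustration, that each sink node splits its own worker count over the same ordered list of source peers, giving one extra worker to each of the first `WorkerCount rem length(Peers)` peers. The module `worker_split` and its functions are hypothetical; this is not the riak_kv implementation.

```erlang
-module(worker_split).
-export([allocate/2, cluster_load/3]).

%% Hypothetical sketch: split WorkerCount workers across Peers as evenly as
%% possible. Every peer gets WorkerCount div N workers, and the first
%% (WorkerCount rem N) peers in list order get one extra.
-spec allocate(non_neg_integer(), [term()]) -> [{term(), non_neg_integer()}].
allocate(_WorkerCount, []) ->
    [];
allocate(WorkerCount, Peers) ->
    N = length(Peers),
    Base = WorkerCount div N,
    Excess = WorkerCount rem N,
    {Alloc, _NextIdx} =
        lists:mapfoldl(
            fun(Peer, Idx) ->
                Extra = case Idx < Excess of
                            true -> 1;
                            false -> 0
                        end,
                {{Peer, Base + Extra}, Idx + 1}
            end,
            0,
            Peers),
    Alloc.

%% Total workers each source peer receives when SinkNodeCount sink nodes
%% each run allocate/2 independently over the same ordered peer list.
-spec cluster_load(non_neg_integer(), pos_integer(), [term()]) ->
          [{term(), non_neg_integer()}].
cluster_load(WorkerCount, SinkNodeCount, Peers) ->
    [{Peer, Count * SinkNodeCount} || {Peer, Count} <- allocate(WorkerCount, Peers)].
```

With 5 workers and three source peers, `allocate(5, [a, b, c])` gives `[{a,2},{b,2},{c,1}]`. If four sink nodes each make the same independent choice, `cluster_load(5, 4, [a, b, c])` gives `[{a,8},{b,8},{c,4}]`: the excess always lands on the same source nodes, which is one way the imbalance noted above can show up.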

Refactoring of riak_kv_replrtq to allow sharing of code and interaction between snk and peer

Adds some further logging. Also corrects the comparison between current and discovered peers to avoid unnecessary resets.

Adds operator riak_client functions

ThomasArts · 2022-03-23T08:46:40Z

src/riak_kv_replrtq_peer.erl

+%% Prompt for the discovery of peers
+-spec update_discovery(riak_kv_replrtq_snk:queue_name()) -> boolean().
+update_discovery(QueueName) ->
+    gen_server:call(?MODULE, {update_discovery, QueueName}, 60 * 1000).


Do you really want 60 hardcoded here, or would you rather use the macro ?DISCOVERY_TIMEOUT_SECONDS?
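A minimal sketch of what the suggestion might look like. The value 60 for `?DISCOVERY_TIMEOUT_SECONDS` is an assumption chosen to match the hardcoded literal; the real macro's value may differ. The module name and the `discovery_timeout_ms/0` helper are hypothetical.

```erlang
-module(discovery_timeout_sketch).
-export([discovery_timeout_ms/0, update_discovery/1]).

%% Assumed value, chosen to match the literal 60 in the diff.
-define(DISCOVERY_TIMEOUT_SECONDS, 60).

%% Convert once, so call sites never repeat the "* 1000".
-spec discovery_timeout_ms() -> pos_integer().
discovery_timeout_ms() ->
    ?DISCOVERY_TIMEOUT_SECONDS * 1000.

%% Prompt for the discovery of peers, taking the call timeout from the macro
%% rather than a literal. Calling this needs a gen_server registered as
%% ?MODULE; without one, gen_server:call/3 exits with noproc.
-spec update_discovery(term()) -> boolean().
update_discovery(QueueName) ->
    gen_server:call(?MODULE, {update_discovery, QueueName}, discovery_timeout_ms()).
```

Keeping the seconds value in one macro means the call timeout cannot silently drift from whatever discovery timeout the module uses elsewhere.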

ThomasArts · 2022-03-23T08:51:50Z

src/riak_kv_replrtq_peer.erl

+                fun({QueueName, _PeerInfo}) -> 
+                    erlang:send_after(MinDelay * 1000,
+                        self(),
+                        {prompt_discovery, QueueName})


It takes a few moments to realize that there are two interfaces for prompting discovery: this one goes via handle_info and takes only one argument, looking up the PeerInfo at the time it executes, whereas the cast version receives the PeerInfo through its interface.

This difference should probably be commented more clearly, in particular why the cast version does not need to do a state lookup.

ThomasArts · 2022-03-23T08:54:14Z

src/riak_kv_replrtq_peer.erl

+handle_info({prompt_discovery, QueueName}, State) ->
+    {QueueName, PeerInfo} =
+        lists:keyfind(QueueName, 1, State#state.discovery_peers),
+    ok = prompt_discovery({QueueName, PeerInfo}),


I somewhat dislike this: you leave the context of the gen_server in the cast, only to return to it later. I wonder whether it would be better to call prompt_discovery(QueueName, PeerInfo, regular) directly at this point.
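A minimal sketch of the reviewer's suggestion, assuming a simplified state record: when the timer message arrives, look up the PeerInfo and run discovery with type `regular` directly inside the callback, instead of casting back into the server. Only `discovery_peers`, the `prompt_discovery` message and the `regular` type come from the thread; `run_discovery/4`, `new/1`, `history/1` and the `history` field are hypothetical stand-ins so the sketch can run on its own.

```erlang
-module(discovery_sketch).
-export([new/1, handle_info/2, history/1]).

%% Hypothetical state: discovery_peers mirrors the field in the diff;
%% history exists only so this sketch can show which discoveries ran.
-record(state, {discovery_peers = [] :: [{atom(), term()}],
                history = [] :: [{atom(), term(), atom()}]}).

new(DiscoveryPeers) ->
    #state{discovery_peers = DiscoveryPeers}.

history(#state{history = History}) ->
    lists:reverse(History).

%% Timer-driven path: PeerInfo is looked up when the timer fires, and the
%% discovery work runs here, within the gen_server callback.
handle_info({prompt_discovery, QueueName}, State) ->
    case lists:keyfind(QueueName, 1, State#state.discovery_peers) of
        {QueueName, PeerInfo} ->
            {noreply, run_discovery(QueueName, PeerInfo, regular, State)};
        false ->
            %% Queue no longer configured for discovery: ignore the tick.
            {noreply, State}
    end;
handle_info(_Other, State) ->
    {noreply, State}.

%% Stand-in for the real discovery work; it only records the request.
run_discovery(QueueName, PeerInfo, Type, State) ->
    State#state{history = [{QueueName, PeerInfo, Type} | State#state.history]}.
```

Unlike the diff, which matches `{QueueName, PeerInfo} = lists:keyfind(...)`, this version also tolerates a queue that was removed after the timer was set, rather than crashing with a badmatch.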

ThomasArts · 2022-03-23T09:15:37Z

src/riak_kv_replrtq_peer.erl

+                        riak_kv_replrtq_snk:current_peers(QueueName)))
+        end,
+    case discover_peers(PeerInfo, StartDelayMS) of
+        CurrentPeers ->


I guess this is by design, but if you run with Type count_change, then CurrentPeers is the empty list. So if discover_peers returns an empty list, you don't know which of the two cases you are in.

Both cases return false, but the side effects are different.

I have tried to avoid this confusion by no longer relying on a mismatch against the empty list, and instead using a specific mismatch between a list and an atom. This is commented as well. I don't think it is super-clean still, but perhaps it is improved.
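The "list versus atom" idea from the reply can be sketched as follows. This is a hypothetical illustration, not the code from the PR: it assumes the type is `regular | count_change` (the two names used in the thread), and the module and function names are invented. For `count_change` the expected value is an atom, so no discovered list, not even `[]`, can ever match it.

```erlang
-module(peer_compare).
-export([check/3]).

%% For count_change there is no meaningful current peer list, so an atom is
%% used as the expected value; a list can never be equal to it.
expected(count_change, _CurrentPeers) -> count_change;
expected(regular, CurrentPeers) -> lists:usort(CurrentPeers).

-spec check(regular | count_change, [term()], [term()]) ->
          unchanged | {changed, [term()]}.
check(Type, CurrentPeers, DiscoveredPeers) ->
    Expected = expected(Type, CurrentPeers),
    %% Sort and de-duplicate so a difference in ordering alone does not
    %% count as a change (and so does not trigger a reset).
    case lists:usort(DiscoveredPeers) of
        Expected -> unchanged;
        NewPeers -> {changed, NewPeers}
    end.
```

Here `check(regular, [], [])` returns `unchanged`, while `check(count_change, [], [])` returns `{changed, []}`, so an empty discovery result no longer leaves the two cases indistinguishable.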

ThomasArts · 2022-03-23T09:23:33Z

src/riak_client.erl

+    non_neg_integer()) -> list(node()).
+replrtq_reset_all_workercounts(WorkerC, PerPeerL) ->
+    UpNodes = riak_core_node_watcher:nodes(riak_kv),
+    lists:foldl(replrtq_resetcount_fun(WorkerC, PerPeerL), [], UpNodes).


Personally I find the code clearer if you inline this function.

ThomasArts · 2022-03-23T09:28:23Z

src/riak_kv_replrtq_snk.erl

@@ -409,13 +428,45 @@ handle_info({prompt_requeue, WorkItem}, State) ->
    requeue_work(WorkItem),
    {noreply, State}.

-terminate(_Reason, _State) ->
+terminate(_Reason, State) ->


This terminate function may now take considerably more time.

If the supervisor terminates this server, how likely is it that this will take too long, compared with how quickly you want it restarted?

ThomasArts · 2022-03-23T09:29:53Z

src/riak_kv_replrtq_snk.erl

+    {SnkWorkerCount, PerPeerLimit}.
+
+set_worker_counts(SnkWorkerCount, PerPeerLimit) ->
+    application:set_env(riak_kv, replrtq_sinkworkers, SnkWorkerCount),


Why use app_helper for getting and application for setting? Is this an artefact of OTP 18 or so?

I never thought about this much; I was just following its use elsewhere. Looking at the docs, application:get_env/3 didn't exist until R16B, and Riak was initially written before that, so I guess it is just a legacy of this.

It might be something to add to the list for the OTP 24+ version of Riak: refactor throughout to take this out.
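The difference that matters for such a refactor is the return shape: in OTP, `application:get_env/2` returns `{ok, Value} | undefined`, whereas `application:get_env/3` returns the value itself or the supplied default. A minimal sketch using the `replrtq_sinkworkers` key from the diff; the module name and the caller-supplied default are hypothetical.

```erlang
-module(replrtq_env_sketch).
-export([sink_workers/1, sink_workers_legacy/1]).

%% application:get_env/3 folds the default in and returns a bare value.
-spec sink_workers(term()) -> term().
sink_workers(Default) ->
    application:get_env(riak_kv, replrtq_sinkworkers, Default).

%% Equivalent using only application:get_env/2, which returns
%% {ok, Value} | undefined and so needs an explicit match.
-spec sink_workers_legacy(term()) -> term().
sink_workers_legacy(Default) ->
    case application:get_env(riak_kv, replrtq_sinkworkers) of
        {ok, Value} -> Value;
        undefined -> Default
    end.
```

Because the `/2` variants return different shapes, swapping one `get_env/2` call for another is not a drop-in change; moving to `application:get_env/3` avoids the `{ok, V} | undefined` match entirely.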

ThomasArts

Some smaller comments, nothing structural, just making sure you did things on purpose.


Merge in changes from Riak 3.0.10. Includes PRs: - #1809 - #1812 - #1814 - #1816 - #1829 - #1830

See basho#1804. The heart of the problem is how to avoid needing configuration changes on sink clusters when source clusters are being changed. This allows for new nodes to be discovered automatically, from configured nodes. Default behaviour is to always fall back to configured behaviour. Worker Counts and Per Peer Limits need to be set based on an understanding of whether this will be enabled. Although, if per peer limit is left to default, the consequence will be the worker count will be evenly distributed (independently by each node). Note, if Worker Count mod (Src Node Count) =/= 0, then there will be no balancing of the excess workers across the sink nodes.

# Conflicts:
# rebar.config
# src/riak_kv_replrtq_peer.erl
# src/riak_kv_replrtq_snk.erl
# src/riak_kv_replrtq_src.erl

martinsumner added 9 commits February 28, 2022 14:33

Initial peer discovery code

718edd3

Refactoring of riak_kv_replrtq to allow sharing of code and interaction between snk and peer

Add membership_request (pb)

0ee3b7b

Update riak_client.erl

c3e7c64

Update following test

0a90a2a

Adds some further logging. Also corrects the comparison between current and discovered peers to avoid unnecessary resets.

Updates following extension of test

746c6bb

Adds operator riak_client functions

Add HTTP API for membership_request

4a68c7c

Close clients in work queue when terminating

5656ce9

Update riak_kv_replrtq_snk.erl

23dd69f

Add configuration items

4c39146

Update riak_kv_replrtq_peer.erl

0d9b124

ThomasArts reviewed Mar 23, 2022


ThomasArts approved these changes Mar 23, 2022


martinsumner added 6 commits March 23, 2022 10:33

Updates following code review

aea514f

Rename _discovery functions to reduce confusion over multiple things called prompt_discovery doing different things

Update to avoid https/git issue

1181596

Convert app_helper -> application in replrtq

44b5567

app_helper:get/2 =/= application:get_env/2

95c5e4b

Saves some pain with pattern matching {ok, V}|undefined

Revert back as app_helper:get_env/2 =/= application:get_env/2

31dc8db

Update riak_kv_replrtq_peer.erl

179a543

martinsumner merged commit aeca1ca into develop-3.0 May 11, 2022

martinsumner deleted the mas-i1804-peerdiscovery branch May 12, 2022 14:28

martinsumner added a commit that referenced this pull request May 31, 2022

Mas 310 merge (#1832)

a632efe

Merge in changes from Riak 3.0.10. Includes PRs: - #1809 - #1812 - #1814 - #1816 - #1829 - #1830
