Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NET-3865] [Supportability] Additional Information in the output of 'consul operator raft list-peers' #17582

Merged
merged 30 commits into from
Jun 14, 2023
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
2f94024
init
absolutelightning Jun 6, 2023
7626d09
fix tests
absolutelightning Jun 6, 2023
79aabc9
added -detailed in docs
absolutelightning Jun 6, 2023
44eee41
added change log
absolutelightning Jun 6, 2023
1390353
Merge branch 'main' into asheshvidyut/NET-3865
absolutelightning Jun 6, 2023
d118518
fix doc
absolutelightning Jun 6, 2023
d46a074
checking for entry in map
absolutelightning Jun 7, 2023
539dbc7
fix tests
absolutelightning Jun 7, 2023
604b616
removed detailed flag
absolutelightning Jun 7, 2023
3a63b3d
removed detailed flag
absolutelightning Jun 7, 2023
008946c
revert unwanted changes
absolutelightning Jun 7, 2023
509062e
removed unwanted changes
absolutelightning Jun 7, 2023
cf22ea5
updated change log
absolutelightning Jun 7, 2023
eb69358
Merge branch 'main' into asheshvidyut/NET-3865
absolutelightning Jun 7, 2023
5c2a587
pr review comment changes
absolutelightning Jun 8, 2023
8e1f728
pr comment changes single API instead of two
absolutelightning Jun 8, 2023
360ab9e
Merge branch 'asheshvidyut/NET-3865' of github.com:hashicorp/consul i…
absolutelightning Jun 8, 2023
dc25344
fix change log
absolutelightning Jun 8, 2023
c461f59
Merge branch 'main' into asheshvidyut/NET-3865
absolutelightning Jun 8, 2023
3631708
fix tests
absolutelightning Jun 8, 2023
5d479e8
Merge branch 'asheshvidyut/NET-3865' of github.com:hashicorp/consul i…
absolutelightning Jun 8, 2023
40dc1e1
fix tests
absolutelightning Jun 8, 2023
e7afa34
fix test operator raft endpoint test
absolutelightning Jun 9, 2023
8c1bab4
Update .changelog/17582.txt
absolutelightning Jun 9, 2023
6b76d59
nits
absolutelightning Jun 9, 2023
c001e59
Merge branch 'asheshvidyut/NET-3865' of github.com:hashicorp/consul i…
absolutelightning Jun 9, 2023
64e3b0a
Merge branch 'main' into asheshvidyut/NET-3865
absolutelightning Jun 14, 2023
7f8d9d8
updated docs
absolutelightning Jun 14, 2023
fdad3bb
Merge branch 'asheshvidyut/NET-3865' of github.com:hashicorp/consul i…
absolutelightning Jun 14, 2023
1adcb15
Merge branch 'main' into asheshvidyut/NET-3865
absolutelightning Jun 14, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .changelog/17582.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
```release-note:feature
improved consul operator raft list-peers command added CommitIndex as well as Trails Leader By in the output table
Commit Index is the LastIndex from /v1/operator/autopilot/health and Trails Leader By is the difference from leader
last index and followers last index
```
Copy link
Collaborator

@analogue analogue Jun 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe short and sweet would be preferable here given the audience are customers interested in changes and how they benefit/affect them.

e.g. consul operator raft list-peers shows the number of commits each follower is trailing the leader by to aid in troubleshooting.

17 changes: 17 additions & 0 deletions api/operator_raft.go
Original file line number Diff line number Diff line change
Expand Up @@ -122,3 +122,20 @@ func (op *Operator) RaftRemovePeerByID(id string, q *WriteOptions) error {
}
return nil
}

// GetAutoPilotHealth is used to query the autopilot health.
func (op *Operator) GetAutoPilotHealth(q *QueryOptions) (*OperatorHealthReply, error) {
r := op.c.newRequest("GET", "/v1/operator/autopilot/health")
r.setQueryOptions(q)
_, resp, err := op.c.doRequest(r)
if err != nil {
return nil, err
}
defer closeResponseBody(resp)

var out OperatorHealthReply
if err := decodeBody(resp, &out); err != nil {
return nil, err
}
return &out, nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you're duplicating the code for AutoPilotServerHealth instead of calling it directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I did not see the code before. Thanks!

19 changes: 19 additions & 0 deletions api/operator_raft_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -57,3 +57,22 @@ func TestAPI_OperatorRaftLeaderTransfer(t *testing.T) {
t.Fatalf("err:%v", transfer)
}
}

func TestAPI_GetAutoPilotHealth(t *testing.T) {
t.Parallel()
c, s := makeClient(t)
defer s.Stop()

operator := c.Operator()
out, err := operator.GetAutoPilotHealth(nil)
if err != nil {
t.Fatalf("err: %v", err)
}

if len(out.Servers) != 1 ||
!out.Servers[0].Leader ||
!out.Servers[0].Voter ||
out.Servers[0].LastIndex <= 0 {
t.Fatalf("bad: %v", out)
}
}
41 changes: 38 additions & 3 deletions command/operator/raft/listpeers/operator_raft_list.go
Original file line number Diff line number Diff line change
Expand Up @@ -70,8 +70,29 @@ func raftListPeers(client *api.Client, stale bool) (string, error) {
return "", fmt.Errorf("Failed to retrieve raft configuration: %v", err)
}

autoPilotReply, err := client.Operator().GetAutoPilotHealth(q)
if err != nil {
return "", fmt.Errorf("Failed to retrieve autopilot health: %v", err)
}

serverHealthDataMap := make(map[string]api.ServerHealth)
leaderLastCommitIndex := uint64(0)

for _, serverHealthData := range autoPilotReply.Servers {
serverHealthDataMap[serverHealthData.ID] = serverHealthData
}

for _, s := range reply.Servers {
if s.Leader {
serverHealthDataLeader, ok := serverHealthDataMap[s.ID]
if ok {
leaderLastCommitIndex = serverHealthDataLeader.LastIndex
}
}
}

Copy link
Collaborator

@analogue analogue Jun 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of making a second API request to get a single value (LastIndex) per server, how feasible would it be to just add LastIndex to the existing RaftConfiguration struct returned by the server side implementation of RaftGetConfiguration(...). It seems the data is readily accessible via consul.Server.autopilot.GetState().Servers[].Stats.LastIndex.

@dhiaayachi or @boxofrad Are there prior established patterns for this type of change that gets an additional piece of data or are multiple APIs calls the norm in the implemenation of commands?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @analogue extending the RaftConfiguration to add the LastIndex seem to be a cleaner approach in this case.

I don't think multiple API calls is a norm here and I would vote for having all the data needed included in a single API call as much as possible. It will also give the possibility to users to use the API to perform some automation, like watching the servers before going forward to the next step of a rolling upgrade.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

// Format it as a nice table.
result := []string{"Node\x1fID\x1fAddress\x1fState\x1fVoter\x1fRaftProtocol"}
result := []string{"Node\x1fID\x1fAddress\x1fState\x1fVoter\x1fRaftProtocol\x1fCommit Index\x1fTrails Leader By"}
for _, s := range reply.Servers {
raftProtocol := s.ProtocolVersion

Expand All @@ -82,8 +103,22 @@ func raftListPeers(client *api.Client, stale bool) (string, error) {
if s.Leader {
state = "leader"
}
result = append(result, fmt.Sprintf("%s\x1f%s\x1f%s\x1f%s\x1f%v\x1f%s",
s.Node, s.ID, s.Address, state, s.Voter, raftProtocol))

serverHealthData, ok := serverHealthDataMap[s.ID]
if ok {
trailsLeaderBy := leaderLastCommitIndex - serverHealthData.LastIndex
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@analogue - Will leader last commit index be always greater than follower? or should I take abs(leaderLastCommitIndex - serverHealthData.LastIndex)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe the leader's last commit index should always be greater than or equal to the follower's last commit index.

trailsLeaderByText := fmt.Sprintf("%d Commits", trailsLeaderBy)
if s.Leader {
trailsLeaderByText = "_"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer - (dash) over _ (underscore) since a dash conveys "not applicable" or "skip" when generally seen in a table cell.

} else if trailsLeaderBy <= 1 {
trailsLeaderByText = fmt.Sprintf("%d Commit", trailsLeaderBy)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer lowercase commit over Commit for consistency.

}
result = append(result, fmt.Sprintf("%s\x1f%s\x1f%s\x1f%s\x1f%v\x1f%s\x1f%v\x1f%s",
s.Node, s.ID, s.Address, state, s.Voter, raftProtocol, serverHealthData.LastIndex, trailsLeaderByText))
} else {
result = append(result, fmt.Sprintf("%s\x1f%s\x1f%s\x1f%s\x1f%v\x1f%s\x1f%v",
s.Node, s.ID, s.Address, state, s.Voter, raftProtocol, "_"))
}
}

return columnize.Format(result, &columnize.Config{Delim: string([]byte{0x1f})}), nil
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ func TestOperatorRaftListPeersCommand(t *testing.T) {
a := agent.NewTestAgent(t, ``)
defer a.Shutdown()

expected := fmt.Sprintf("%s %s 127.0.0.1:%d leader true 3",
expected := fmt.Sprintf("%s %s 127.0.0.1:%d leader true 3 1 _",
a.Config.NodeName, a.Config.NodeID, a.Config.ServerPort)

// Test the list-peers subcommand directly
Expand Down
2 changes: 1 addition & 1 deletion website/content/commands/operator/raft.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,7 @@ configuration.

- `-stale` - Enables non-leader servers to provide cluster state information.
If the cluster is in an outage state without a leader,
we recommend setting this option to `true.
we recommend setting this option to `true`.
Default is `false`.

## remove-peer
Expand Down