Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation on some anti-patterns #6034

Merged
merged 12 commits into from
Apr 7, 2023
Merged

Conversation

ulascansenturk
Copy link
Contributor

@ulascansenturk ulascansenturk commented Feb 17, 2023

Created error handling documentation,

To rely on RPC errors instead of

grpc.FailOnNonTempDialError

grpc.WithBlock

grpc.WithReturnConnectionError

#6028

RELEASE NOTES: N/A

@ulascansenturk ulascansenturk changed the title Create documentation for better error handling Add documentation on some anti-patterns Feb 17, 2023
@easwars easwars self-assigned this Feb 21, 2023
@easwars easwars added the Type: Documentation Documentation or examples label Feb 24, 2023
@easwars easwars added this to the 1.54 Release milestone Feb 24, 2023
@easwars
Copy link
Contributor

easwars commented Feb 24, 2023

@ulascansenturk : Could you please wrap the lines at 80-cols. Thank you.

Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to begin a document that explains why these options are not recommended by explaining that grpc.Dial is not actually a "dialing" function at all. It creates a ClientConn, which is a persistent, managed pool of connections that automatically reconnects as needed. The reason these options are not useful is that errors encountered during the first Dial are no different from those that occur 1 second after the initial connection succeeds, and your application needs to handle both of those cases. It's for that reason that we don't recommend specially handling the initial connection error vs. any other one (vs. "WithBlock...may end up waiting indefinitely" and "miss opportunities to recover").

@dfawley
Copy link
Member

dfawley commented Mar 2, 2023

Please also wrap all lines at 80 columns when you can, which will make it easier for us to read & review.

@dfawley dfawley assigned easwars and dfawley and unassigned ulascansenturk Mar 2, 2023
@ulascansenturk
Copy link
Contributor Author

Please also wrap all lines at 80 columns when you can, which will make it easier for us to read & review.

Done

Documentation/error-handling.md Outdated Show resolved Hide resolved
Documentation/error-handling.md Outdated Show resolved Hide resolved
Documentation/error-handling.md Outdated Show resolved Hide resolved
Documentation/error-handling.md Outdated Show resolved Hide resolved
Documentation/error-handling.md Outdated Show resolved Hide resolved
Documentation/error-handling.md Outdated Show resolved Hide resolved
Documentation/error-handling.md Outdated Show resolved Hide resolved
Documentation/error-handling.md Outdated Show resolved Hide resolved
Documentation/error-handling.md Outdated Show resolved Hide resolved
Documentation/error-handling.md Outdated Show resolved Hide resolved
Documentation/error-handling.md Outdated Show resolved Hide resolved
Documentation/error-handling.md Outdated Show resolved Hide resolved
Documentation/error-handling.md Outdated Show resolved Hide resolved
Documentation/error-handling.md Outdated Show resolved Hide resolved
Documentation/error-handling.md Outdated Show resolved Hide resolved
Documentation/error-handling.md Outdated Show resolved Hide resolved
Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry again for the late reply. This looks really good, just a few more comments.

Documentation/anti-patterns.md Outdated Show resolved Hide resolved
Documentation/anti-patterns.md Show resolved Hide resolved
Documentation/anti-patterns.md Show resolved Hide resolved
Documentation/anti-patterns.md Show resolved Hide resolved
Documentation/anti-patterns.md Outdated Show resolved Hide resolved
Documentation/anti-patterns.md Outdated Show resolved Hide resolved

## Conclusion

The [`FailOnNonTempDialError`](https://pkg.go.dev/google.golang.org/grpc#FailOnNonTempDialError), [`WithBlock`](https://pkg.go.dev/google.golang.org/grpc#WithBlock), and [`WithReturnConnectionError`](https://pkg.go.dev/google.golang.org/grpc#WithReturnConnectionError)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added to both the first references at the beginning and the references at the very end.
@dfawley

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add the references in the other direction? It would be nice if the godoc for those functions pointed to this document.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better understanding, do you want me to add godoc references to all other docstrings?

Copy link
Member

@dfawley dfawley Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like a link to this document from

// connecting the server happens in background.
(etc). I recognize this document has no current link, so it will look broken in the PR, but it will work when it's merged. (https://github.com/grpc/grpc-go/blob/master/Documentation/anti-patterns.md)

Copy link
Contributor Author

@ulascansenturk ulascansenturk Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, i'll prepare new PR for that, or should I do it in this pull request?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do it in this one, thanks.

@ulascansenturk ulascansenturk requested a review from dfawley April 6, 2023 22:33
Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other last minor thing. Otherwise LGTM!

Documentation/anti-patterns.md Outdated Show resolved Hide resolved
@ulascansenturk ulascansenturk requested a review from dfawley April 7, 2023 00:26
Comment on lines 90 to 91
- When retrying failed RPCs, consider using the built-in retry mechanism
provided by gRPC-Go,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems some of the wrapping got messed up here. Please take a final pass and make sure it's all lined up correctly, thanks.

dialoptions.go Outdated
Comment on lines 298 to 299
// For more information about this anti-pattern
// see https://github.com/grpc/grpc-go/blob/master/Documentation/anti-patterns.md.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

//
// Use of this feature is not recommended.  For more information, please see: <link>

?

dialoptions.go Outdated
Comment on lines 315 to 316
// For more information about this anti-pattern
// see https://github.com/grpc/grpc-go/blob/master/Documentation/anti-patterns.md.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this above the // # Experimental line and use the same text as above. (and similar request below)

@ulascansenturk ulascansenturk requested a review from dfawley April 7, 2023 00:39
@dfawley
Copy link
Member

dfawley commented Apr 7, 2023

@easwars can you review the changes & re-approve?

@easwars easwars merged commit 01f8b86 into grpc:master Apr 7, 2023
1 check passed
@oguzyildizz
Copy link

The reason these options are not useful is that errors encountered during the first Dial are no different from those that occur 1 second after the initial connection succeeds

@dfawley what if we want to catch configuration mistakes and don't care about the transient ones? Then WithBlock becomes useful before marking the container ready, no? What would you suggest instead?

@dfawley
Copy link
Member

dfawley commented Apr 27, 2023

@dfawley what if we want to catch configuration mistakes and don't care about the transient ones? Then WithBlock becomes useful before marking the container ready, no?

So in this case, WithBlock may actually cause you to fail in the event of a transient problem as well. If the dependent service happens to be down or unreachable at startup due to a transient problem with that service or your network, you will timeout and fail despite there being no configuration error.

For configuration errors, there are many other kinds of configuration problems and other errors that you couldn't catch this way anyway. E.g. you may connect to the wrong service, so the connection would succeed, but all RPCs would fail with NOT_IMPLEMENTED.

What would you suggest instead?

Yeah, as I said in #6034 (comment): "Rather than fixate on "don't do these things", we should ideally be explaining "here's how to do the things you need to do"", so I fully expected these sorts of questions. 😬

One way to detect and mitigate configuration errors is to use a phased rollout/canary process, and instrument monitoring in your applications. You can always just perform your RPCs using the channel, and the RPCs will fail with UNAVAILABLE when there are connection problems, and the error message should indicate the reason for the connection failure. Any dispatched RPCs will block until there is a READY connection or until the initial connection attempt has errored. (You should also typically not use the confusingly-named WaitForReady call option, as this will make the RPC block even after connection errors occur, until the RPC's deadline is reached.)

@dfawley
Copy link
Member

dfawley commented May 16, 2023

cc @temawi

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Type: Documentation Documentation or examples
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants