Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: testing/cmp: add new package #45200

Closed
dsnet opened this issue Mar 24, 2021 · 32 comments
Closed

proposal: testing/cmp: add new package #45200

dsnet opened this issue Mar 24, 2021 · 32 comments

Comments

@dsnet
Copy link
Member

dsnet commented Mar 24, 2021

TL;DR, I propose adding github.com/google/go-cmp/cmp to the standard library as testing/cmp.

Determining equality of two values is one of the most common operations needed for a unit test where the test wants to know whether some computed value matches some pre-determined expected value.

Using the Go language and standard library alone, there are two primary options:

For simple cases, these work fine, but are insufficient for more complex cases:

  • The == operator only works on comparable types, which means that it doesn't work for Go slices and maps, which are two common kinds that tests want to compare.
  • The reflect.DeepEqual function is a "recursive relaxation of Go's == operator", but has several short-comings:
    • It provides no ability to customize the comparison, which is increasingly more useful for more complex types.
    • It does not explain why two values are different, which becomes increasingly more helpful for larger values.
    • It blindly compares unexported fields, which causes a test to unfortunately have an implicit dependency on unexported artifacts of types from a module's dependencies (and therefore on undefined behavior).

For many users, the standard library is insufficient, so they turn to third-party packages to perform equality. According to the module proxy, the most common comparison module used is github.com/google/go-cmp with 7264 module dependents (and 25th most imported module). I propose including cmp in the standard library itself.

Why include cmp in the standard library?

The most widely used comparison function in Go is currently reflect.DeepEqual (with ~34k usages of reflect.DeepEqual compared to ~6k usages of cmp.Equal). Inclusion of cmp to the standard library would provide better visibility to it and allow more tests to be written that would have been better off using cmp.Equal rather than reflect.DeepEqual.

A problem with reflect.DeepEqual is that it hampers module authors from making changes to their types that otherwise should be safe to perform. Since reflect.DeepEqual blindly compares unexported fields, it causes a test to have an implicit dependency on the value of unexported fields in types that come from a module's dependency. When the author of one those types changes an unexported field, it surprisingly breaks many brittle tests. Examples of this occurring was when Go1.9 added monotonic timestamp support and the change to the internal representation of time.Time broke hundreds of test at Google. Similar problems occurred with adding/modifying unexported fields to protocol buffer messages. reflect.DeepEqual is a comparison function that looks like it works wells, but may cause the test to be brittle. Furthermore, reflect.DeepEqual does not tell you why two values are different, making it even more challenging for users to diagnose such brittle tests.

As a contributor to the standard library, there are a number of times that I would have liked to use cmp when writing tests in the standard library itself.

How is cmp.Equal similar or different than reflect.DeepEqual?

cmp.Equal is designed to be more expressive than reflect.DeepEqual. It accepts any number of user-specified options that may affect how the comparison is performed. While reflect.DeepEqual is more performant, cmp.Equal is more flexible. Also, cmp.Diff provides a humanly readable report for why two values differ, rather than simply providing a boolean result for whether they differ.

One significant difference from reflect.DeepEqual is that cmp.Equal panics by default when trying to compare unexported fields unless the user explicitly permits it with an cmp.Exporter option. This design decision was to avoid the problems observed with using reflect.DeepEqual at scale mentioned earlier.

Without any options specified, cmp.Equal is identical to reflect.DeepEqual except:

  • cmp.Equal panics when trying to comparing unexported fields
  • cmp.Equal uses a type's Equal method if it has one
  • cmp.Equal has an arguably more correct comparison for graphs
    • reflect.DeepEqual's cycle detection primarily aims to avoid infinite recursion, but does not necessarily verify whether the two graphs have the same topology, while cmp.Equal checks that the graph topology are the same.

(Fun fact) Package reflect is the 15th most imported Go package, but ~77% of the imports (for test files) only do so to use DeepEqual. In 2011, Rob Pike explained that "[reflection] is a powerful tool that should be used with care and avoided unless strictly necessary." I find it ironic that most usages of reflect is to access a function that is arguably not even about Go reflection (i.e., it doesn't provide functionality that mirrors the Go language itself).

How does cmp compare to other comparison packages?

The only other comparison module (within the top 250 most widely used modules) is github.com/go-test/deep with 845 dependents (compared to 7264 dependents for cmp). Package deep is not as flexible as cmp and relies on globals to configure how comparisons are performed.

@gopherbot gopherbot added this to the Proposal milestone Mar 24, 2021
@mvdan
Copy link
Member

mvdan commented Mar 24, 2021

I wonder if the package path testing/cmp is appropriate. It's true that many uses will be in tests, but many others will not. And there's nothing about the API that ties it to testing or test packages. Perhaps reflect/cmp?

I generally agree with this proposal; the testing package has been borrowing bits and bops from https://github.com/frankban/quicktest recently such as its Mkdir and Setenv, and I'd love for us to incorporate more of its good ideas. The elephant in the room is its powerful and easy to use comparisons, powered by go-cmp.

Also, have you thought about size? In terms of API and packages (there's also cmpopts), but also when weighing the size of Go in its source and binary forms. A more powerful reflect.DeepEqual doesn't sound particularly heavy at first glance, but the repository contains 50 files containing ~400KiB in total.

@mvdan
Copy link
Member

mvdan commented Mar 24, 2021

Also cc @rogpeppe @frankban @myitcv

@rogpeppe
Copy link
Contributor

As a big fan of the cmp package, this seems like a good idea to me. The API is stable and well tested.

One thought: with generics only a year or so away, I wonder if we should hold on until we can make the signatures generic:

func Diff[T any](x, y T, opts ...Option) string
func Equal[T any](x, y T, opts ...Option) bool

Although the implementation would still be using reflection under the hood, using a type parameter
would make it clear that we're expecting to compare values of the same type, and avoid common mistakes
where values of incompatible types are compared, only for the
comparison to fail at runtime.

The current behaviour would still be available by using Equal[interface{}] or Diff[interface{}].

I'd also support folding the cmpopts package into the root cmp package (I always have difficulty remembering which functionality is in which place).

@dsnet
Copy link
Member Author

dsnet commented Mar 24, 2021

I wonder if the package path testing/cmp is appropriate. It's true that many uses will be in tests, but many others will not. And there's nothing about the API that ties it to testing or test packages. Perhaps reflect/cmp?

When cmp was first created, making it a general purpose package was one of the sub-goals. However, since cmp is predominantly used for testing purposes, it chose to use panics for situations where two types just could not be compared. The package's propensity to panicking (and it's relatively slow speed compared to reflect.DeepEqual) makes it unsuitable for non-testing usages.

@dsnet
Copy link
Member Author

dsnet commented Mar 24, 2021

Also, have you thought about size? In terms of API and packages (there's also cmpopts), but also when weighing the size of Go in its source and binary forms. A more powerful reflect.DeepEqual doesn't sound particularly heavy at first glance, but the repository contains 50 files containing ~400KiB in total.

Nope. Thanks for raising up the thought. There's some files and functionality that can be removed I believe:

@dsnet
Copy link
Member Author

dsnet commented Mar 24, 2021

One thought: with generics only a year or so away, I wonder if we should hold on until we can make the signatures generic:

Interesting thought. I should also note that generics would help with:

I'd also support folding the cmpopts package into the root cmp package (I always have difficulty remembering which functionality is in which place).

Yea, that's been one of my longer term goals. The main reason I haven't done so yet is because adding them in causes the godoc page to be less readable, which is another motivating reason for #44447.

@josharian
Copy link
Contributor

I'd be happy for github.com/pkg/diff to morph into a useful diff package for std. I'd rather it be in golang.org/x, so that it can be used by all and sundry. (I'm frustrated by various gems locked up in std internal.) I cannot dedicate much time to it in the near future, but would be happy to be involved in API design (and contribute the existing code as a starting point). For that, it's probably time for someone (not me) to file a proposal where we can start discussing what we want out of it.

@rsc
Copy link
Contributor

rsc commented Mar 24, 2021

@josharian I've been doing various diff investigations off and on for a few years (sic) and had some time last week to hook a bunch of stuff together. I have rather a lot of thoughts, and an implementation of the the O(N)-space Myers algorithm, among others. I completely agree that we should have a standard diff package separate from testing/cmp. I need to take care of a few higher priority things but I intend to write more next month. I've retitled #23113 to make that clear.

@rsc
Copy link
Contributor

rsc commented Mar 24, 2021

@dsnet My main concern with adding testing/cmp is that it is a very large amount of code and API, especially compared to package testing itself. Just the sheer number of internal packages you listed in #45200 (comment) scares me.

Is it possible to take this opportunity to simplify at all?

@josharian
Copy link
Contributor

@rsc works for me. I've been thinking about diffs on and off too. :) I look forward to chatting about it when you are ready.

@dsnet
Copy link
Member Author

dsnet commented Mar 25, 2021

Is it possible to take this opportunity to simplify at all?

@rsc, my main point from #45200 (comment) was that most of those internal packages can be eliminated since the needed functionality has either been added to the standard library already (e.g., reflect.Value.IsZero) or already exists as an internal package (e.g., internal/fmtsort). The most significant internal package is internal/diff and ideally cmp does not need to implement its own but make use of another general purpose diffing package. I'll continue the discussion on #23113 on how to generalize the diffing algorithm there to be suitable for cmp's needs.

@dsnet
Copy link
Member Author

dsnet commented Mar 25, 2021

My main concern with adding testing/cmp is that it is a very large amount of code

@rsc. If it's any consolation, the logic that's concerned with the semantics of whether two values are equal is around ~900 LOC. The remaining complexity comes from the logic to pretty print the difference, which accounts for ~1800 LOC. Most of that complexity is because the the reporter is a relatively large set of heuristics for how to present the difference in a way that is easiest for humans to interpret. The reporter logic was fine tuned over the past years based on user feedback and has been fairly stable for the past year or so.

@rogpeppe
Copy link
Contributor

I'd also like to mention another possibility: the cmp package has excellent functionality for deep-printing
values, but that's buried in the code and only accessible via cmp.Diff. Printing a value recursively
is often of great value, but neither of the widely used implementations I'm aware of (spew and pretty)
are great, because it's common to need some customisation to make the output readable (for example
by omitting some fields or truncating certain byte slices).

I wonder if it might be a good idea to consider including some deep-printing functionality in the standard library.
The style of the cmp API potentially sets a nice precedent for what a deep printing API might look like.

It would probably be best if it was testing-oriented: like cmp, such a package would probably
need to use unsafe to allow it to call methods on values from unexported fields.

Aside: like @dsnet, I also think that cmp would most happily live at testing/cmp, partly for the above reason.

@komuw
Copy link
Contributor

komuw commented Mar 25, 2021

but neither of the widely used implementations I'm aware of (spew and pretty)
are great, because it's common to need some customisation to make the output readable (for example
by omitting some fields or truncating certain byte slices).

@rogpeppe I believe sanity-io/litter can do that, if I'm not mistaken.

I completely agree that we should have a standard diff package separate from testing/cmp

@rsc There's also; #41980 which I think would be related.

@smasher164
Copy link
Member

smasher164 commented Mar 28, 2021

I wonder if it might be a good idea to consider including some deep-printing functionality in the standard library.
The style of the cmp API potentially sets a nice precedent for what a deep printing API might look like.

I brought this up on the mailing list at one point: https://groups.google.com/g/golang-nuts/c/Tn0QeDv6fU8/m/ukcrSF6BCwAJ

Deep printing was seen as requiring possibly too much customization, leading to a complex API. And a general mechanism to traverse data structures could live outside the standard library.

Edit: related #28141

myitcv added a commit to rogpeppe/go-internal that referenced this issue Mar 29, 2021
The current version of github.com/pkg/diff.Diff has quadratic behaviour.
This means that when we attempt a diff between relatively modest sized
files, it's easy to find yourself out of memory.

Therefore, when we see a diff between two large files (large defined in
terms of number of lines), tersely report that as if the two were binary
files, i.e. do not try to render the diff.

When github.com/pkg/diff or similar supports a linear algorithm:

golang/go#45200 (comment)

we can revert this change.
myitcv added a commit to rogpeppe/go-internal that referenced this issue Mar 29, 2021
The current version of github.com/pkg/diff.Diff has quadratic behaviour.
This means that when we attempt a diff between relatively modest sized
files, it's easy to find yourself out of memory.

Therefore, when we see a diff between two large files (large defined in
terms of number of lines), tersely report that as if the two were binary
files, i.e. do not try to render the diff.

When github.com/pkg/diff or similar supports a linear algorithm:

golang/go#45200 (comment)

we can revert this change.
myitcv added a commit to rogpeppe/go-internal that referenced this issue Mar 29, 2021
)

The current version of github.com/pkg/diff.Diff has quadratic behaviour.
This means that when we attempt a diff between relatively modest sized
files, it's easy to find yourself out of memory.

Therefore, when we see a diff between two large files (large defined in
terms of number of lines), tersely report that as if the two were binary
files, i.e. do not try to render the diff.

When github.com/pkg/diff or similar supports a linear algorithm:

golang/go#45200 (comment)

we can revert this change.
@rsc
Copy link
Contributor

rsc commented Apr 1, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc changed the title proposal: add testing/cmp proposal: testing/cmp: add new package Apr 1, 2021
@peterbourgon
Copy link

peterbourgon commented Apr 12, 2021

I use cmp in many of my projects, and would love to see its capabilities become part of the stdlib. But the current package API is unnecessarily complex and esoteric; IMO it would need to be substantially simplified (read: reduced) before it could be realistically considered for inclusion.

@rogpeppe
Copy link
Contributor

But the current package API is unnecessarily complex and esoteric; IMO it would need to be substantially simplified (read: reduced) before it could be realistically considered for inclusion.

I'm interested in what aspects of the API you see as possibilities for simplification. Personally I've see it as an example of a nice composable API and can't see how it could be simplified much without removing a lot of its usefulness.

As I said above, generics could make the API more obvious. Another example, Transformer could have a generic signature rather than using interface{}:

func Transformer[T, R any](name string, f func(T) R) Option

@dsnet
Copy link
Member Author

dsnet commented Apr 12, 2021

I've found the diffing capabilities lacking, especially for multiline strings

@leighmcculloch, what version of cmp are you using? Higher fidelity printing of multiline strings has been added since v0.5.0. See this example output.

@Merovius
Copy link
Contributor

Does that customization need to be performed by code that lives in the stdlib?

FWIW, as it currently stands, the Option interface is opaque. So any 3rd party implementations must be implemented based on the Options that are already there.

Personally, I think this is a bit unfortunate, but I'm not sure how to fix it. In any case, one obvious answer to "the API surface is too big" would be to make it possible/easy to factor out the lesser used options into x/ or 3rd party packages, as @peterbourgon suggests. I think for that, we'd either need to make Option a functionally complete interface that allows 3rd party implementations, or find a minimum orthogonal set of Options that can serve as a basis (the existing ones are obviously a basis, but they don't seem minimal - especially if you include cmpopts).

Personally I'm not convinced cmp should be in the stdlib either. IMO it works just fine as a 3rd party module and I generally agree that the API surface it provides seems a bit unwieldy. I think it might live well in x/ and I think something like it, but a lot smaller could fit into the stdlib. At that point, we're not really talking about "putting go-cmp into the stdlib", but we are talking about "provide a package for better generic comparisons, inspired by go-cmp" and I think we'd need to actually see how that would look, before talking about if it "works" or not.

@Merovius
Copy link
Contributor

Oh, also, FWIW: When I used go-cmp in the past, but needed more than cmpopts.SortSlices or something of that sort, I found it easier to implement the comparison function directly and use that, instead of trying to fit it into cmp.Equal, which tended to cause a bunch of confusing dynamic type-errors for me. cmp.Diff provides more bang for your buck (as implementing diff is harder), of course. But I found it telling that the API seemingly makes some things more difficult to do with the package, than without.

@seebs
Copy link
Contributor

seebs commented Apr 20, 2021

In addition to the other issues, I'd note: I frequently want to compare slices such that i consider a nil slice to be equivalent to a non-nil slice of len 0, and reflect.DeepEqual has bitten me on that several times, and worse, done so such that a naive printf of the slices (using %d if they're []int) doesn't show any difference...

Writing a function that compares two things and tells me whether or how they differ has usually been a great return on time spent in improving test quality and outputs, but I've written that function for []int far more often than I should.

@dsnet
Copy link
Member Author

dsnet commented Apr 20, 2021

When I used go-cmp in the past, but needed more than cmpopts.SortSlices or something of that sort, I found it easier to implement the comparison function directly

@Merovius. Yea, while cmpopts.SortSlices allows the ability to sort any slice type, I agree that it's not always easy to use. In the case of a simple type (e.g., int), it's unfortunate needing to provide an explicit less function when it's trivial. In the case of a more complex type, the comparison can get complicated and may not be aware of any ignored fields. There's been some discussion about having a generic less function, but there are some non-trivial problems to figure out. In most cases, the user doesn't care about the order, but wants to functionally treat a slice as a multi-set.

#45200 (comment)

@seebs, it's not clear to me whether your comment is in support or opposition towards the proposal. Since cmp.Equal has it's heritage in reflect.DeepEqual it treats []T(nil) and []T{} as inequal. That said, cmp.Diff does clearly show when the difference is due to nil-ness. Also, the cmpopts.EquateEmpty option can be used to treat such cases as equal.

@myitcv
Copy link
Member

myitcv commented Nov 23, 2021

@rsc any update on this since #45200 (comment)? Asking in order to avoid duplication of effort with respect to pkg/diff#26. Thanks

@rsc
Copy link
Contributor

rsc commented Apr 19, 2023

This proposal issue has been lingering, in large part because we don't really know how to move forward on it.

go-cmp is widely used but also very complex. I agree with the comments above that if we were to move forward with something in the standard library, we'd want a significantly trimmed down version. I have read the go-cmp package docs a few times over the past few years and each time, I'm left feeling like I don't fully understand what the package does. Obviously the basic functionality is easy to understand, but all the complications are not. And then you also have to read the cmpopts package. At first glance it's not even obvious that cmpopts can be written in terms of cmp. I initially thought that cmpopts was using some internal tricks to return cmp.Options that cmp knew about but weren't exposed in cmp's API itself. Eventually I figured out that cmpopts is entirely layered on top of cmp.Transformer, cmp.FilterValues, and cmp.Comparer, but it's weird that you have to reach for a different package to get the "simple" helpers. Of course, if it was all one package that's even more API to digest.

I believe I met with Joe or Damien or both at one point long ago to discuss potential ways to make cmp fit better into the standard library, but my memory is that there were interface reasons that make it essentially impossible to change any details of cmp without breaking existing uses. I forget the exact details, and maybe I am misremembering.

There may well be some reduced form of google/go-cmp that should become testing/cmp. I don't think an unmodified google/go-cmp is that form. And I don't see any clear path forward as far as what exactly to remove.

It seems like maybe we should move this proposal toward a decline, which would give clarity to the issue and perhaps open a space for other proposals of simpler APIs that might better fit the standard library. Of course, google/go-cmp will remain for anyone who wants it, same as always.

@rsc rsc moved this from Active to Likely Decline in Proposals May 3, 2023
@rsc
Copy link
Contributor

rsc commented May 3, 2023

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented May 10, 2023

No change in consensus, so declined.
— rsc for the proposal review group

@rsc rsc closed this as completed May 10, 2023
@golang golang locked and limited conversation to collaborators May 9, 2024
@rsc rsc removed this from Proposals May 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests