cmd/compile: pre-allocate a slice of provable length #64595

Open
dsnet opened this issue Dec 7, 2023 · 10 comments
Labels
compiler/runtime (Issues related to the Go compiler and/or runtime), NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one), Performance
Milestone
Unplanned
Comments

@dsnet
Member

dsnet commented Dec 7, 2023

Go version

go1.21.2

What did you do?

Compile the following snippet:

func transform(a []int) []uint {
	var b []uint
	for _, x := range a {
		b = append(b, uint(x))
	}
	return b
}

What did you expect to see?

I would expect it to compile to something like:

func transform(a []int) []uint {
	b := make([]uint, len(a), len(a) + ...) // where ... is some extra capacity to reproduce above behavior
	for i, x := range a {
		b[i] = uint(x)
	}
	return b
}

I expect to see the following:

  • An implicit call to makeslice of a given length and capacity. The elements within the length need not be explicitly zeroed (since they are provably overwritten).
  • The append call in the loop essentially rewritten as a plain index assignment, avoiding a call to runtime.growslice on every iteration of the loop.

The conversion of int to uint is for demonstration, but the above optimization should be doable for any conversion that does not panic.

What did you see instead?

Right now, this compiles more or less to what you see in the original code snippet.
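For concreteness, here is a rough benchmark sketch (function and benchmark names are made up for illustration) contrasting the current append shape with a hand-preallocated version, which is roughly what the proposed rewrite would produce:

package transform_test

import "testing"

// transformAppend is the shape the compiler sees today: append with no
// preallocation, so the backing array may be regrown (runtime.growslice)
// several times as the loop runs.
func transformAppend(a []int) []uint {
	var b []uint
	for _, x := range a {
		b = append(b, uint(x))
	}
	return b
}

// transformPrealloc is the hand-written equivalent of the proposed rewrite:
// one makeslice of known length, then plain index assignments.
func transformPrealloc(a []int) []uint {
	b := make([]uint, len(a))
	for i, x := range a {
		b[i] = uint(x)
	}
	return b
}

var src = make([]int, 1<<12)

func BenchmarkAppend(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = transformAppend(src)
	}
}

func BenchmarkPrealloc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = transformPrealloc(src)
	}
}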

@gopherbot gopherbot added the compiler/runtime (Issues related to the Go compiler and/or runtime) label Dec 7, 2023
@randall77 randall77 added this to the Unplanned milestone Dec 7, 2023
@cagedmantis cagedmantis added the NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one) label Dec 7, 2023
@cagedmantis
Contributor

cc @golang/compiler

@Jorropo
Member

Jorropo commented Dec 8, 2023

I implemented this some time ago, and it was less interesting than it looks.

If your loop has return statements or branches where the slice is not used (like panics), it is not obvious that you want to preallocate the slice.
For example:

func transform(a []int, f func(int) uint) (r []uint) {
	for _, x := range a {
		r = append(r, f(x))
	}
	return
}

Here f might panic, and in that case you could allocate more than you would have by growing incrementally.

Maybe you say panics are cold, so let's optimize for the cases where they don't happen.
OK, what about:

func transform(a []int, f func(int) (uint, error)) (r []uint, _ error) {
	for _, x := range a {
		v, err := f(x)
		if err != nil {
			return nil, err
		}
		r = append(r, v)
	}
	return
}

Do you also want to say that returning a non-nil error should be considered cold by the compiler?

Given these conservative considerations, this matched few cases,
and the speedups weren't even that great compared to the heavy code complexity added to the compiler.

@Jorropo
Member

Jorropo commented Dec 8, 2023

There is also a linter which helps with this: https://github.com/alexkohler/prealloc. It's not good enough to do machine fixes, but it's plenty good enough for primates to look at its output and fix the interesting cases.
Primates easily have the missing context about which panic, error, and other branches are worth disregarding and which are worth optimizing.
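As an illustration (a hypothetical example, not taken from the linter's docs), this is the kind of loop prealloc flags and the manual fix a reviewer might apply after confirming there are no early exits worth worrying about:

package example

type user struct{ Name string }

// collectNames is the kind of loop the prealloc linter flags: the bound is
// known up front, but the slice starts with zero capacity.
func collectNames(users []user) []string {
	var names []string
	for _, u := range users {
		names = append(names, u.Name)
	}
	return names
}

// collectNamesFixed is the manual fix: size the capacity once and keep the
// append so the shape of the code stays the same.
func collectNamesFixed(users []user) []string {
	names := make([]string, 0, len(users))
	for _, u := range users {
		names = append(names, u.Name)
	}
	return names
}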

@AlexanderYastrebov
Contributor

There is also a linter which helps with this https://github.com/alexkohler/prealloc

Thanks for the link. I've checked our code base, and the most common matches are those that transform one slice into another.

Should we consider adding to slices something like:

func AppendFunc[T ~[]R, E, R any](t T, mapper func(element E) R, s ...E) T {
	newslice := Grow(t, len(s))
	for _, v := range s {
		newslice = append(newslice, mapper(v))
	}
	return newslice
}

func MapFunc[R any, T []R, E any](mapper func(element E) R, s ...E) T {
	return AppendFunc([]R(nil), mapper, s...)
}

to cover trivial but common cases?
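If something like that were added, call sites for the slice-to-slice transforms above might look like this (purely illustrative; neither MapFunc nor AppendFunc exists in the slices package today):

// Hypothetical call sites, assuming the AppendFunc/MapFunc sketch above is in scope.
func toUints(a []int) []uint {
	return MapFunc(func(x int) uint { return uint(x) }, a...)
}

func appendUints(dst []uint, a []int) []uint {
	return AppendFunc(dst, func(x int) uint { return uint(x) }, a...)
}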

@randall77
Contributor

I believe that use case is covered by upcoming iterator proposals, in particular iter.Map from #61898 and slices.Append from #61899.
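A rough sketch of what that composition might look like (names and argument order are taken from the proposal discussions and may differ from whatever ships, e.g. xiter.Map and slices.Collect / slices.AppendSeq):

// Sketch only: Map stands for the adapter proposed in #61898 (iter.Map /
// xiter.Map) and Collect for the slice builder proposed in #61899; neither
// name is final, and the shipped signatures may differ.
func transform(a []int) []uint {
	seq := xiter.Map(func(x int) uint { return uint(x) }, slices.Values(a))
	return slices.Collect(seq)
}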

@AlexanderYastrebov
Contributor

@randall77 Thank you. Those revolve around iter.Seq, which does not provide a size, so they will not be able to pre-allocate the result, AFAICT.

@randall77
Contributor

True. The size could be passed somehow using an unexported method, although I don't immediately see an obvious way to do that (we could return a named iter.Seq with its own methods, but because iter.Seq must be a function type, maybe there's no place to store the length to return? Except hackily in the closure somehow).

@dsnet
Member Author

dsnet commented Dec 12, 2023

There is also a linter which helps

The issue at hand is that I generally don't want to think about it when writing code.
On the coding side of this, I want to focus on the business logic, rather than optimizations.
On the optimization side, writing the loop naturally with append lets PGO decide whether preallocating is worth it, if it observes that the loop is continually re-allocating in practice.

@prattmic
Member

Several of us have recently been brainstorming online slice/map pre-sizing (#64703). To a degree, this is a special case of that, where the ideal [1] size can be determined statically.

[1] The exception is that transform returns b. If the caller continues to append to b, then the ideal initial capacity is even higher. Online tracking could capture that.

@adonovan
Member

adonovan commented Dec 22, 2023

I wonder how often this optimization makes things slower: If a is a very large slice of empty slices, then the new algorithm makes two passes over it, causing twice as many cache misses. Rare, I suspect.
