cmd/gc: optimize switch statements with string literal cases #10000

jonhoo · 2015-02-25T18:31:14Z

Consider the following benchmark:

package main

import (
    "fmt"
    "testing"
)

var words = []string{"super", "califragi", "listic", "expiali", "docius"}
var wlen = len(words)

func BenchmarkSwitch(b *testing.B) {
    m := 0
    for i := 0; i < b.N; i++ {
        switch words[i%wlen] {
        case "super":
            m++
        case "califragi":
            m++
        case "listic":
            m++
        case "expiali":
            m++
        case "docius":
            m++
        }
    }
    fmt.Println(m)
}

func BenchmarkIf(b *testing.B) {
    m := 0
    for i := 0; i < b.N; i++ {
        w := words[i%wlen]
        if w == "super" {
            m++
        } else if w == "califragi" {
            m++
        } else if w == "listic" {
            m++
        } else if w == "expiali" {
            m++
        } else if w == "docius" {
            m++
        }
    }
    fmt.Println(m)
}

On my machine, this gives

$ go test -bench . -benchtime 30s
BenchmarkSwitch
2000000000          29.2 ns/op
BenchmarkIf
2000000000          22.9 ns/op
ok      x   109.375s

Despite the switch statement having more opportunity for optimization, it is consistently slower than the if variant. After a brief discussion on golang-nuts, it was pointed out that Go will move to a binary search for switch statements with more than three constant cases. Firstly, this threshold seems too low, as runtime.eqstring (because of length checks) is often an order of magnitude faster than runtime.cmpstring. Second, other optimizations can probably also be implemented here by the compiler. @josharian mentions a few in the golang-nuts thread:

I think that there can be significant improvement here, e.g. using diverge indices and lengths for the binary search followed by a final call to eqstring.

The text was updated successfully, but these errors were encountered:

josharian · 2015-02-26T00:41:32Z

To expand on my last comment:

I think that we should first sort (and binary search) by string length. It is readily available in the string header, registerizable, and cheap to compare.

If we wanted to optimize further, instead of sorting (and searching) strings of the same length using cmpstring, we could instead further sort (and binary search) by the indices at which the strings vary--effectively doing something like walking a trie.

I think that using the string length is worth doing right way. It should be pretty simple and a quick win. The fancier approach I'm less sure about, at least initially, since cmpstring is written in pretty heavily optimized assembly.

josharian · 2015-03-02T22:02:31Z

Started, but waiting for switch code cleanup before mailing to avoid churn.

josharian · 2015-03-17T23:04:36Z

The benchmark here has some quirks:

The % operator is a significant portion of the execution time of the benchmark.
It is averaging the time taken by different paths into the switch statement, which hides variability by index.
By using string literals throughout, eqstring gets to short-circuit by doing a pointer equality check rather than having to actually compare bytes. The effect is not significant here, because the strings are short, but it is present. (This suggests an optimization opportunity for cmpstring. I will investigate separately whether doing that is worthwhile.)

Here is a set of benchmarks that provides more visibility into exactly what is going on:

package main

import "testing"

var words = [...]string{"super", "califragi", "listic", "expiali", "docius"}

func benchmarkSwitch(b *testing.B, s string) {
    m := 0
    for i := 0; i < b.N; i++ {
        switch s {
        case "super":
            m++
        case "califragi":
            m++
        case "listic":
            m++
        case "expiali":
            m++
        case "docius":
            m++
        }
    }
}

func BenchmarkSwitch0(b *testing.B) { benchmarkSwitch(b, words[0]) }
func BenchmarkSwitch1(b *testing.B) { benchmarkSwitch(b, words[1]) }
func BenchmarkSwitch2(b *testing.B) { benchmarkSwitch(b, words[2]) }
func BenchmarkSwitch3(b *testing.B) { benchmarkSwitch(b, words[3]) }
func BenchmarkSwitch4(b *testing.B) { benchmarkSwitch(b, words[4]) }

func BenchmarkSwitchNewStr0(b *testing.B) { benchmarkSwitch(b, string([]byte(words[0]))) }
func BenchmarkSwitchNewStr1(b *testing.B) { benchmarkSwitch(b, string([]byte(words[1]))) }
func BenchmarkSwitchNewStr2(b *testing.B) { benchmarkSwitch(b, string([]byte(words[2]))) }
func BenchmarkSwitchNewStr3(b *testing.B) { benchmarkSwitch(b, string([]byte(words[3]))) }
func BenchmarkSwitchNewStr4(b *testing.B) { benchmarkSwitch(b, string([]byte(words[4]))) }

func benchmarkIf(b *testing.B, s string) {
    m := 0
    for i := 0; i < b.N; i++ {
        if s == "super" {
            m++
        } else if s == "califragi" {
            m++
        } else if s == "listic" {
            m++
        } else if s == "expiali" {
            m++
        } else if s == "docius" {
            m++
        }
    }
}

func BenchmarkIf0(b *testing.B) { benchmarkIf(b, words[0]) }
func BenchmarkIf1(b *testing.B) { benchmarkIf(b, words[1]) }
func BenchmarkIf2(b *testing.B) { benchmarkIf(b, words[2]) }
func BenchmarkIf3(b *testing.B) { benchmarkIf(b, words[3]) }
func BenchmarkIf4(b *testing.B) { benchmarkIf(b, words[4]) }

func BenchmarkIfNewStr0(b *testing.B) { benchmarkIf(b, string([]byte(words[0]))) }
func BenchmarkIfNewStr1(b *testing.B) { benchmarkIf(b, string([]byte(words[1]))) }
func BenchmarkIfNewStr2(b *testing.B) { benchmarkIf(b, string([]byte(words[2]))) }
func BenchmarkIfNewStr3(b *testing.B) { benchmarkIf(b, string([]byte(words[3]))) }
func BenchmarkIfNewStr4(b *testing.B) { benchmarkIf(b, string([]byte(words[4]))) }

Here's the result of running these benchmarks:

benchmark                  old ns/op

BenchmarkIf0               3.36
BenchmarkIf1               4.45
BenchmarkIf2               5.22
BenchmarkIf3               5.56
BenchmarkIf4               10.5

BenchmarkIfNewStr0         5.26
BenchmarkIfNewStr1         7.19
BenchmarkIfNewStr2         7.23
BenchmarkIfNewStr3         7.47
BenchmarkIfNewStr4         12.4

BenchmarkSwitch0           9.56
BenchmarkSwitch1           8.73
BenchmarkSwitch2           9.38
BenchmarkSwitch3           8.66
BenchmarkSwitch4           7.99

BenchmarkSwitchNewStr0     11.3
BenchmarkSwitchNewStr1     11.1
BenchmarkSwitchNewStr2     11.0
BenchmarkSwitchNewStr3     10.3
BenchmarkSwitchNewStr4     11.0

josharian · 2015-03-17T23:17:54Z

Mailed golang.org/cl/7698

josharian self-assigned this Feb 26, 2015

josharian mentioned this issue Feb 26, 2015

runtime: add asm version of cmpstring and bytes·Compare for ppc64 #10007

Closed

bradfitz added the optimization label Feb 26, 2015

josharian changed the title ~~runtime: optimizing switch statements with constant cases~~ cmd/gc: optimize switch statements with string literal cases Feb 26, 2015

josharian closed this as completed in 4bc9bad Mar 18, 2015

golang locked and limited conversation to collaborators Jun 25, 2016

gopherbot added the FrozenDueToAge label Jun 25, 2016

rsc unassigned josharian Jun 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/gc: optimize switch statements with string literal cases #10000

cmd/gc: optimize switch statements with string literal cases #10000

jonhoo commented Feb 25, 2015

josharian commented Feb 26, 2015

josharian commented Mar 2, 2015

josharian commented Mar 17, 2015

josharian commented Mar 17, 2015

cmd/gc: optimize switch statements with string literal cases #10000

cmd/gc: optimize switch statements with string literal cases #10000

Comments

jonhoo commented Feb 25, 2015

josharian commented Feb 26, 2015

josharian commented Mar 2, 2015

josharian commented Mar 17, 2015

josharian commented Mar 17, 2015