A version of bufio.Scanner that works with lines of arbitrary length.
If you're getting a "bufio.Scanner: token too long" error, this may be what you want.
If your code used to look like this:
import "bufio"
s := bufio.NewScanner(myIoReader)
for s.Scan() {
// Do work
}
You can now handle very long lines without errors by changing to:
import "github.com/turtlemonvh/altscanner"
s := altscanner.NewAltScanner(myIoReader)
for s.Scan() {
// Do work
}
- Only breaks on newlines.
- Just appends bytes to a byte slice instead of using a real buffer.
If you have a good idea about the size of your data and are running Go 1.6 or later (the release that introduced the Scanner.Buffer
method), you probably just want to change the size of the buffer used by the scanner. For example:
// Create a scanner and raise its maximum token size to 10x the default (640 KB instead of 64 KB)
scanner := bufio.NewScanner(file)
scanner.Buffer(make([]byte, bufio.MaxScanTokenSize), bufio.MaxScanTokenSize*10)
However, if you need to be compatible with Go versions before 1.6, or you really have no idea about the size of your data, AltScanner works pretty well.
It is robust, but not very fast. The benchmark results below show the performance of reading in 5 lines of content. The lines used in the tests are either 30 bytes (short) or 300K bytes (long).
$ go test -test.bench=Scanner -test.run=^$ -test.benchmem
BenchmarkBufioScannerSmall-8 1000000 1061 ns/op 4128 B/op 2 allocs/op
BenchmarkBufferedBufioScannerSmall-8 1000000 1059 ns/op 4128 B/op 2 allocs/op
BenchmarkAltScannerSmall-8 1000000 1779 ns/op 5824 B/op 8 allocs/op
BenchmarkBufferedBufioScannerLong-8 50000 28077 ns/op 127008 B/op 6 allocs/op
BenchmarkAltScannerLong-8 2000 1142195 ns/op 7032704 B/op 78 allocs/op
PASS
ok github.com/turtlemonvh/altscanner 13.458s
AltScanner is significantly slower, makes many more allocations, and uses significantly more bytes per operation than the buffered bufio.Scanner. On long lines in the results above, it is roughly 40x slower and allocates about 55x more memory per operation. In short: if you are using Go 1.6+ and you are confident about the maximum possible size of a line, it is faster to use Scanner.Buffer to adjust the size of the buffer.
MIT