Bad Go: pointer returns
Mmmm, pointy
As an old C programmer I struggle with this one: it feels completely normal for functions to return pointers to structs. But I’ve a feeling this is bad Go, and that we’re normally better off returning struct values. I’m going to see if I can prove that returning struct values is just plain better, and that returning pointers is bad Go.
I’m going to define a struct that I can vary in size easily. The struct’s only content is an array: I can change the size of the struct simply by changing the length of the array.
const bigStructSize = 10
type bigStruct struct {
	a [bigStructSize]int
}
Next I’ll create a couple of routines to build a new version of this struct. One will return it as a pointer, the other as a value.
func newBigStruct() bigStruct {
	var b bigStruct
	for i := 0; i < bigStructSize; i++ {
		b.a[i] = i
	}
	return b
}
func newBigStructPtr() *bigStruct {
	var b bigStruct
	for i := 0; i < bigStructSize; i++ {
		b.a[i] = i
	}
	return &b
}
Finally I’ll write a couple of benchmarks to measure how long it takes to get and use these structs. I’m going to do a simple calculation on the values in the struct so the compiler doesn’t just optimise the whole thing away.
func BenchmarkStructReturnValue(b *testing.B) {
	b.ReportAllocs()
	t := 0
	for i := 0; i < b.N; i++ {
		v := newBigStruct()
		t += v.a[0]
	}
}
func BenchmarkStructReturnPointer(b *testing.B) {
	b.ReportAllocs()
	t := 0
	for i := 0; i < b.N; i++ {
		v := newBigStructPtr()
		t += v.a[0]
	}
}
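One caveat on the benchmark bodies: `t` is never read after the loop, so in principle a sufficiently clever compiler could discard the additions entirely. It doesn’t seem to bite here, but a defensive variant (my own sketch, not part of the original benchmarks) stores the total in a package-level sink variable, which the compiler can’t prove is dead:

```go
package main

import "fmt"

const bigStructSize = 10

type bigStruct struct {
	a [bigStructSize]int
}

func newBigStruct() bigStruct {
	var b bigStruct
	for i := 0; i < bigStructSize; i++ {
		b.a[i] = i
	}
	return b
}

// sink is package-level, so writes to it can't be optimised away,
// and neither can the work that feeds it.
var sink int

func main() {
	t := 0
	for i := 0; i < 1000; i++ {
		v := newBigStruct()
		t += v.a[1] // a[1] is always 1
	}
	sink = t
	fmt.Println(sink) // 1000
}
```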
With bigStructSize set to 10 returning by value is about twice as fast as returning the pointer. In the pointer case the memory has to be allocated on the heap, which will take about 25ns; then the data is set up (which should take about the same time in both cases); then the pointer is written to the stack to return it to the caller. In the value case there’s no allocation, but the whole struct has to be copied onto the stack to return it to the caller.
At this size of struct the overhead of copying the data on the stack is less than the overhead of allocating the memory.
BenchmarkStructReturnValue-8 100000000 15.4 ns/op 0 B/op 0 allocs/op
BenchmarkStructReturnPointer-8 50000000 36.5 ns/op 80 B/op 1 allocs/op
When we change bigStructSize to 100, so the struct now contains 100 ints, the gap increases in absolute terms, although the relative penalty for returning a pointer shrinks.
BenchmarkStructReturnValue-8 20000000 105 ns/op 0 B/op 0 allocs/op
BenchmarkStructReturnPointer-8 10000000 185 ns/op 896 B/op 1 allocs/op
Surely if we try 1000 ints in the struct then returning the pointer will be faster?
BenchmarkStructReturnValue-8 2000000 830 ns/op 0 B/op 0 allocs/op
BenchmarkStructReturnPointer-8 1000000 1401 ns/op 8192 B/op 1 allocs/op
Nope, still much worse. How about 10,000?
BenchmarkStructReturnValue-8 100000 13332 ns/op 0 B/op 0 allocs/op
BenchmarkStructReturnPointer-8 200000 11032 ns/op 81920 B/op 1 allocs/op
Finally, with 10,000 ints in our struct, returning a pointer to the struct is faster. With some further investigation it seems like the tipping point for me on my laptop is 2700. At this point I’ve very little idea why there’s such a large difference at 1000 ints. Let’s profile the benchmark!
go test -bench BenchmarkStructReturnValue -run ^$ -cpuprofile cpu2.prof
go tool pprof post.test cpu2.prof
(pprof) top
Showing nodes accounting for 2.25s, 100% of 2.25s total
flat flat% sum% cum cum%
2.09s 92.89% 92.89% 2.23s 99.11% github.com/philpearl/blog/content/post.newBigStruct
0.14s 6.22% 99.11% 0.14s 6.22% runtime.newstack
0.02s 0.89% 100% 0.02s 0.89% runtime.nanotime
0 0% 100% 2.23s 99.11% github.com/philpearl/blog/content/post.BenchmarkStructReturnValue
0 0% 100% 0.02s 0.89% runtime.mstart
0 0% 100% 0.02s 0.89% runtime.mstart1
0 0% 100% 0.02s 0.89% runtime.sysmon
0 0% 100% 2.23s 99.11% testing.(*B).launch
0 0% 100% 2.23s 99.11% testing.(*B).runN
In the value case nearly all the work is happening in newBigStruct. It’s all very straightforward. What if we profile the pointer test?
go test -bench BenchmarkStructReturnPointer -run ^$ -cpuprofile cpu.prof
go tool pprof post.test cpu.prof
(pprof) top
Showing nodes accounting for 2690ms, 93.08% of 2890ms total
Dropped 28 nodes (cum <= 14.45ms)
Showing top 10 nodes out of 67
flat flat% sum% cum cum%
1110ms 38.41% 38.41% 1110ms 38.41% runtime.pthread_cond_signal
790ms 27.34% 65.74% 790ms 27.34% runtime.pthread_cond_wait
300ms 10.38% 76.12% 300ms 10.38% runtime.usleep
200ms 6.92% 83.04% 200ms 6.92% runtime.pthread_cond_timedwait_relative_np
80ms 2.77% 85.81% 80ms 2.77% runtime.nanotime
60ms 2.08% 87.89% 140ms 4.84% runtime.sweepone
50ms 1.73% 89.62% 50ms 1.73% runtime.pthread_mutex_lock
40ms 1.38% 91.00% 150ms 5.19% github.com/philpearl/blog/content/post.newBigStructPtr
30ms 1.04% 92.04% 40ms 1.38% runtime.gcMarkDone
30ms 1.04% 93.08% 40ms 1.38% runtime.scanobject
In the newBigStructPtr case the picture is much more complex and there are many more functions that use significant CPU. Only ~5% of the time is spent in newBigStructPtr setting up the struct. Instead, there’s lots of time in the Go runtime dealing with threads, locks and garbage collection. The function itself is fast, but the heap allocation behind the returned pointer brings a huge amount of baggage with it.
Now this scenario is very simplistic. The data is created and then immediately thrown away, so there will be a huge burden on the garbage collector. If the lifetime of the returned data was longer the results could be very different. But perhaps this is an indication that returning pointers to structs that have a short lifetime is bad Go.
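As a rough sketch of that longer-lifetime scenario (my own variation, not from the original post), you can keep every returned pointer reachable for the whole benchmark run, so the garbage collector has to retain and scan the data rather than reclaiming it almost immediately:

```go
package main

import (
	"fmt"
	"runtime"
	"testing"
)

const bigStructSize = 10

type bigStruct struct {
	a [bigStructSize]int
}

func newBigStructPtr() *bigStruct {
	var b bigStruct
	for i := 0; i < bigStructSize; i++ {
		b.a[i] = i
	}
	return &b
}

func main() {
	// testing.Benchmark lets us run a benchmark outside `go test`.
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		// Hold on to every pointer, so nothing can be collected
		// during the run: this models data with a longer lifetime.
		keep := make([]*bigStruct, 0, b.N)
		for i := 0; i < b.N; i++ {
			keep = append(keep, newBigStructPtr())
		}
		runtime.KeepAlive(keep)
	})
	fmt.Println(res, res.MemString())
}
```

The numbers this produces aren’t directly comparable to the ones above, but it’s a starting point for exploring how lifetime changes the picture.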