Bad Go: pointer returns

Mmmm, pointy

As an old C programmer, I struggle with this one: it feels completely normal for functions to return pointers to structs. But I’ve a feeling this is bad Go, and that we’re normally better off returning struct values. I’m going to see if I can prove that returning struct values is just plain better, and that returning pointers is bad Go.

I’m going to define a struct whose size I can vary easily. It contains a single array, so I can change the size of the struct simply by changing the length of the array.

```go
const bigStructSize = 10

type bigStruct struct {
	a [bigStructSize]int
}
```

Next I’ll create a couple of routines to build a new instance of this struct. One will return it as a pointer, the other as a value.

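```go
// newBigStruct builds a bigStruct and returns it by value.
func newBigStruct() bigStruct {
	var b bigStruct
	for i := 0; i < bigStructSize; i++ {
		b.a[i] = i
	}
	return b
}

// newBigStructPtr builds a bigStruct and returns a pointer to it.
func newBigStructPtr() *bigStruct {
	var b bigStruct
	for i := 0; i < bigStructSize; i++ {
		b.a[i] = i
	}
	return &b
}
```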

Finally I’ll write a couple of benchmarks to measure how long it takes to get and use these structs. I’m going to do a simple calculation on the values in the struct so the compiler doesn’t just optimise the whole thing away.

```go
import "testing"

func BenchmarkStructReturnValue(b *testing.B) {
	b.ReportAllocs()

	t := 0
	for i := 0; i < b.N; i++ {
		v := newBigStruct()
		t += v.a[0]
	}
}

func BenchmarkStructReturnPointer(b *testing.B) {
	b.ReportAllocs()

	t := 0
	for i := 0; i < b.N; i++ {
		v := newBigStructPtr()
		t += v.a[0]
	}
}
```
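Accumulating into a local t works here, but a common alternative for stopping the compiler from discarding the result is to store it in a package-level “sink” variable. The sketch below does that, and runs both benchmarks from an ordinary main via testing.Benchmark rather than go test. The sink variables and the //go:noinline directives are my additions, not part of the original code: newer Go compilers can inline small functions that contain loops, and if newBigStructPtr were inlined the compiler might be able to keep b on the stack, which would change what is being measured. Building with -gcflags=-m shows the compiler’s inlining and escape decisions if you want to check this.

```go
package main

import (
	"fmt"
	"testing"
)

const bigStructSize = 10

type bigStruct struct {
	a [bigStructSize]int
}

// Package-level sinks: the compiler can't prove these are unused,
// so it can't optimise away the calls that feed them.
var (
	sinkValue bigStruct
	sinkPtr   *bigStruct
)

//go:noinline
func newBigStruct() bigStruct {
	var b bigStruct
	for i := 0; i < bigStructSize; i++ {
		b.a[i] = i
	}
	return b // the whole struct is copied back to the caller
}

//go:noinline
func newBigStructPtr() *bigStruct {
	var b bigStruct
	for i := 0; i < bigStructSize; i++ {
		b.a[i] = i
	}
	return &b // b outlives the call, so it escapes to the heap
}

func benchValue(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sinkValue = newBigStruct()
	}
}

func benchPointer(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sinkPtr = newBigStructPtr()
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	rv := testing.Benchmark(benchValue)
	rp := testing.Benchmark(benchPointer)
	fmt.Println("value:  ", rv, rv.MemString())
	fmt.Println("pointer:", rp, rp.MemString())
}
```

The timings will vary by machine, but the allocation counts should match the pattern below: none for the value version and one per call for the pointer version.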

With bigStructSize set to 10, returning by value is about twice as fast as returning a pointer. In the pointer case, b outlives the call that created it, so the compiler’s escape analysis places it on the heap. Allocating that memory takes about 25ns, then the data is set up (which should take about the same time in both cases), and finally just the pointer is written to the stack to return the struct to the caller. In the value case there’s no allocation, but the whole struct has to be copied onto the stack to return it to the caller.

At this size of struct the overhead of copying the data on the stack is less than the overhead of allocating the memory.

```
BenchmarkStructReturnValue-8  	100000000	15.4 ns/op	 0 B/op	0 allocs/op
BenchmarkStructReturnPointer-8	50000000	36.5 ns/op	80 B/op	1 allocs/op
```

When we change bigStructSize to 100, so the struct now contains 100 ints, the absolute gap grows, although the pointer version’s running time increases by a smaller percentage than the value version’s.

```
BenchmarkStructReturnValue-8  	20000000	105 ns/op	  0 B/op	0 allocs/op
BenchmarkStructReturnPointer-8	10000000	185 ns/op	896 B/op	1 allocs/op
```

Surely if we try 1000 ints in the struct then returning the pointer will be faster?

```
BenchmarkStructReturnValue-8  	2000000	 830 ns/op	   0 B/op	0 allocs/op
BenchmarkStructReturnPointer-8	1000000	1401 ns/op	8192 B/op	1 allocs/op
```

Nope, still much worse. How about 10,000?

```
BenchmarkStructReturnValue-8  	100000	13332 ns/op	    0 B/op	0 allocs/op
BenchmarkStructReturnPointer-8	200000	11032 ns/op	81920 B/op	1 allocs/op
```

Finally, with 10,000 ints in the struct, returning a pointer is faster. Some further investigation suggests that the tipping point on my laptop is 2,700 ints. At this point I have very little idea why there’s still such a large difference at 1,000 ints. Let’s profile the benchmark!

```
go test -bench BenchmarkStructReturnValue -run ^$ -cpuprofile cpu2.prof
go tool pprof post.test cpu2.prof
(pprof) top
Showing nodes accounting for 2.25s, 100% of 2.25s total
      flat  flat%   sum%        cum   cum%
     2.09s 92.89% 92.89%      2.23s 99.11%  github.com/philpearl/blog/content/post.newBigStruct
     0.14s  6.22% 99.11%      0.14s  6.22%  runtime.newstack
     0.02s  0.89%   100%      0.02s  0.89%  runtime.nanotime
         0     0%   100%      2.23s 99.11%  github.com/philpearl/blog/content/post.BenchmarkStructReturnValue
         0     0%   100%      0.02s  0.89%  runtime.mstart
         0     0%   100%      0.02s  0.89%  runtime.mstart1
         0     0%   100%      0.02s  0.89%  runtime.sysmon
         0     0%   100%      2.23s 99.11%  testing.(*B).launch
         0     0%   100%      2.23s 99.11%  testing.(*B).runN
```

In the value case nearly all the work is happening in newBigStruct. It’s all very straightforward. What if we profile the pointer test?

```
go test -bench BenchmarkStructReturnPointer -run ^$ -cpuprofile cpu.prof
go tool pprof post.test cpu.prof
(pprof) top
Showing nodes accounting for 2690ms, 93.08% of 2890ms total
Dropped 28 nodes (cum <= 14.45ms)
Showing top 10 nodes out of 67
      flat  flat%   sum%        cum   cum%
    1110ms 38.41% 38.41%     1110ms 38.41%  runtime.pthread_cond_signal
     790ms 27.34% 65.74%      790ms 27.34%  runtime.pthread_cond_wait
     300ms 10.38% 76.12%      300ms 10.38%  runtime.usleep
     200ms  6.92% 83.04%      200ms  6.92%  runtime.pthread_cond_timedwait_relative_np
      80ms  2.77% 85.81%       80ms  2.77%  runtime.nanotime
      60ms  2.08% 87.89%      140ms  4.84%  runtime.sweepone
      50ms  1.73% 89.62%       50ms  1.73%  runtime.pthread_mutex_lock
      40ms  1.38% 91.00%      150ms  5.19%  github.com/philpearl/blog/content/post.newBigStructPtr
      30ms  1.04% 92.04%       40ms  1.38%  runtime.gcMarkDone
      30ms  1.04% 93.08%       40ms  1.38%  runtime.scanobject
```

In the newBigStructPtr case the picture is much more complex, and many more functions use significant CPU. Only about 5% of the time is spent in newBigStructPtr setting up the struct. Instead, lots of time goes to the Go runtime dealing with threads, locks and garbage collection. The function returning the pointer is itself fast, but the baggage that comes with allocating the struct on the heap is a huge overhead.
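One way to see that garbage-collection baggage more directly is to count completed GC cycles with runtime.ReadMemStats while calling each constructor in a loop. This is a rough sketch, assuming a 1,000-int struct, 100,000 iterations and the default GOGC setting. The helper gcCycles and the sink variables are hypothetical names for illustration, and //go:noinline is there so that inlining can’t change where b lives.

```go
package main

import (
	"fmt"
	"runtime"
)

const bigStructSize = 1000

type bigStruct struct {
	a [bigStructSize]int
}

// Package-level sinks keep the results alive so the calls aren't optimised away.
var (
	sinkValue bigStruct
	sinkPtr   *bigStruct
)

//go:noinline
func newBigStruct() bigStruct {
	var b bigStruct
	for i := 0; i < bigStructSize; i++ {
		b.a[i] = i
	}
	return b
}

//go:noinline
func newBigStructPtr() *bigStruct {
	var b bigStruct
	for i := 0; i < bigStructSize; i++ {
		b.a[i] = i
	}
	return &b // heap allocated: each call leaves garbage for the GC
}

// gcCycles (hypothetical helper) runs f n times and returns how many
// garbage collection cycles completed in the meantime.
func gcCycles(n int, f func()) uint32 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		f()
	}
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	const n = 100000
	fmt.Println("GC cycles, value:  ", gcCycles(n, func() { sinkValue = newBigStruct() }))
	fmt.Println("GC cycles, pointer:", gcCycles(n, func() { sinkPtr = newBigStructPtr() }))
}
```

The exact counts depend on the Go version and machine, but the value loop allocates nothing on the heap, so you’d expect it to trigger few or no collections, while the pointer loop throws away hundreds of megabytes and should trigger many.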

Now, this scenario is very simplistic. The data is created and then immediately thrown away, so there will be a huge burden on the garbage collector. If the returned data lived longer, the results could be very different. But perhaps this is an indication that returning pointers to short-lived structs is bad Go.