A very unsafe house by Cindy Tang via Unsplash

Breaking Printf

Sun, Nov 7, 2021

Acute impostor syndrome. When something you’ve built, something you’re proud of, built against all advice, full of unsafe, goes deeply, horribly wrong. Unspeakably wrong. You’ve built your company’s data pipeline around this code. If it is wrong then everything might be wrong. How could it go so wrong?

Your belief in yourself crumbles away. The ghost of Rob Pike haunts your every waking moment, shrieking “premature optimisation is the root of all evil”.

How wrong exactly? Well, code that you know should print zero, or at worst some reasonably small number, insists on printing a number that matches the time in nanoseconds since 1 January 1970, as of around 9 in the morning on 30th September 2021.

Panic grips as you hear Robert Griesemer whispering “memory corruption”. Ken Thompson is not angry, just disappointed.

But you keep the panic under control and begin to investigate. In the debugger the variable appears to be zero, but it still prints as 163298695000000000. You change the code so it will only print the variable if it is zero. It prints 163298695000000000.

	if limited == 0 {
		fmt.Printf("limited is zero. %d\n", limited) // Prints 163298695000000000
	}

Your despair lifts. Even if you’ve done something very stupid, at least it’s interesting.

Here’s a minimal reproduction scenario. For what I hope are obvious reasons I’ve used 42 in place of 163298695000000000.

package badgo

import (
	"fmt"
	"unsafe"
)

func run() {
	var limited int
	var out int
	doThing(out, func(out interface{}) {
		if limited == 0 {
			fmt.Printf("limited is zero. %d\n", limited) // Prints 42
		}
		limited++
	})
}

//go:noinline
func doThing(out interface{}, f func(out interface{})) {
	p := (*eface)(unsafe.Pointer(&out)).data
	*(*int)(p) = 42
	f(out)
}

type eface struct {
	rtype unsafe.Pointer
	data  unsafe.Pointer
}

Here’s the output.

limited is zero. 42

The flaw is this bit. Just look at it. It certainly doesn’t seem to be rational or understandable. It’s wrong, but it isn’t quite as wrong as it looks.

	p := (*eface)(unsafe.Pointer(&out)).data
	*(*int)(p) = 42

What it is doing is using knowledge of the internals of interface types to gain access to a pointer to the value that the interface represents. I’ve talked about the internals of interface types before, but let’s go over what’s going on.

An interface variable is made up of two pointers. You can think of it as syntactic sugar for a struct like the following.

type eface struct {
	rtype unsafe.Pointer
	data  unsafe.Pointer
}

The first pointer points to some information about the type of the value contained in the interface. The second pointer points to the value of the interface.

When you put a pointer type into an interface variable, then the data pointer in the interface variable can simply be the pointer.

var i int
var a interface{}

i = 7
a = &i

In the code above, the data pointer within a can simply be a pointer to i.
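To check that claim, here’s a minimal sketch. It assumes the two-word layout described above, which is an internal detail of the standard Go toolchain rather than anything the language promises. The helpers dataPointer and pointsAt are names I’ve made up for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// eface mirrors the assumed two-word layout of an empty interface.
// This is runtime-internal and not part of any public API.
type eface struct {
	rtype unsafe.Pointer
	data  unsafe.Pointer
}

// dataPointer (hypothetical helper) returns the data word of an interface value.
func dataPointer(v interface{}) unsafe.Pointer {
	return (*eface)(unsafe.Pointer(&v)).data
}

// pointsAt (hypothetical helper) reports whether the data word of v is exactly p.
func pointsAt(v interface{}, p *int) bool {
	return dataPointer(v) == unsafe.Pointer(p)
}

func main() {
	i := 7
	var a interface{} = &i
	// With the standard toolchain the data word is the pointer itself.
	fmt.Println(pointsAt(a, &i))
}
```

Reading the data word like this is harmless. The trouble only starts when you write through it.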

If you put a non-pointer type in an interface variable then what do you use for the data pointer? One choice would be to use a pointer to the original variable. But what would then happen if you changed the original variable?

var i int
var a interface{}

i = 7
a = i
i++

fmt.Println(a)

What would happen to a in this case? If a contained a pointer to i, then when we change i the contents of a would change, and the code above would print 8, not 7. This is not what we want, and not what happens if you run the code above.

So Go does not put a pointer to the original value in the interface variable in this case. Instead it allocates some memory, copies the value into it, then uses the pointer to this newly allocated memory in the interface variable.

That way you can change the original value (i in the case above), and the value in the interface variable (a above) does not change.
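The difference between storing a value and storing a pointer can be seen without any unsafe at all. This is a small sketch; storeValueAndPointer is a made-up name for illustration.

```go
package main

import "fmt"

// storeValueAndPointer (hypothetical helper) puts the same variable into two
// interface values, once by value and once by pointer, then changes the variable.
func storeValueAndPointer() (byValue, byPointer interface{}) {
	i := 7
	byValue = i    // the interface holds a copy of i
	byPointer = &i // the interface holds a pointer to i itself
	i++
	return byValue, byPointer
}

func main() {
	byValue, byPointer := storeValueAndPointer()
	fmt.Println(byValue)           // 7: the copy did not change
	fmt.Println(*byPointer.(*int)) // 8: the pointer sees the increment
}
```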

So let’s go back and look at that dodgy code again.

	p := (*eface)(unsafe.Pointer(&out)).data
	*(*int)(p) = 42

out is an interface variable containing an int. All those brackets and unsafe things are retrieving that data pointer from the interface variable, which in this case is a pointer to an int. And then it writes the value 42 into it. That should be fine, right? As long as we don’t care about the original value changing?

Ah, actually, no, it isn’t.

As I’ve described above, putting a non-pointer value in an interface variable causes an allocation so Go can copy the value. Allocations can be bad for performance. So Go has some optimisations to avoid some of these allocations.

In particular, if you put a small integer value (0 to 255) in an interface variable, Go does not allocate and instead uses a pointer into a statically defined table containing the integers 0 to 255.

So when this code runs the data pointer in the out interface variable contains a pointer to the zero entry in this static table.

	p := (*eface)(unsafe.Pointer(&out)).data
	*(*int)(p) = 42

And we overwrite it with 42. :facepalm:

Ah, that’s not good. The next time someone puts an integer variable containing zero into an interface variable, Go will use that entry in the static table again. Except it now contains 42. We’ve made 0 == 42. Sometimes.
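Here’s a toy model of the problem, with no unsafe and no dependence on runtime internals. It is not the runtime’s code: smallInts and box are hypothetical stand-ins for the static table and for whatever Go does when it puts an integer into an interface variable.

```go
package main

import "fmt"

// smallInts is a hypothetical stand-in for the static table of integers 0 to 255.
var smallInts = func() (t [256]uint64) {
	for i := range t {
		t[i] = uint64(i)
	}
	return t
}()

// box models putting an integer into an interface: small values share an
// entry in the table, larger values get a fresh allocation.
func box(v uint64) *uint64 {
	if v < uint64(len(smallInts)) {
		return &smallInts[v]
	}
	p := new(uint64)
	*p = v
	return p
}

func main() {
	p := box(0)
	*p = 42              // the write from doThing, in miniature
	fmt.Println(*box(0)) // prints 42: every later box(0) sees the damage
}
```

The shared entry is what makes the real bug so confusing. The write happens in one place and the wrong value turns up somewhere entirely unrelated.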

Winding back a little, I said the flawed code was not 100% wrong.

The utterly wrong mistake was using this trick on int types.
The mostly wrong mistake was using this trick on non-pointer types at all.
The rather foolish mistake was using any trick in this very particular case as there’s very little upside. We’re saving just one allocation in a relatively rare operation.
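For contrast, here’s a sketch of the boring version, assuming the caller is able to pass a pointer. setAnswer is a made-up name and is not how the avro library does it. Because the interface holds a *int, writing through it only touches memory the caller owns.

```go
package main

import "fmt"

// setAnswer (hypothetical) writes 42 through out if out holds a *int.
// It refuses anything else rather than guessing at the memory layout.
func setAnswer(out interface{}) bool {
	p, ok := out.(*int)
	if !ok {
		return false
	}
	*p = 42
	return true
}

func main() {
	var out int

	ok := setAnswer(&out)
	fmt.Println(ok, out) // true 42

	ok = setAnswer(out) // a plain int is rejected, not overwritten
	fmt.Println(ok)     // false
}
```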

But I’ll not say using tricks like this is completely wrong. The flawed code is in github.com/philpearl/avro (I’ve not fixed it yet), and that library is riddled with tricks with unsafe. The unsafe tricks reduced data-processing runs of ~5 hours on 60-plus cores with 95% of the time in GC, to ~24 minutes.

There may be other flaws. Future changes to Go may cause problems. The library may be hard to use correctly. But the risk is worth the reward, despite the heart-stopping moments.