Good Go: Contributing to encoding/json
In my last post I whinged endlessly about encoding/json and in particular about the poor performance of anything with a custom marshaler. I decided to try to do something about it and I raised a proposal for a new marshaler interface.
As I half expected I wasn’t immediately welcomed as the new Go Messiah and my proposal has been effectively rejected (stop press - actually rejected). There’s very little inclination amongst the Go maintainers to change any of the interfaces in the standard libraries, and for some reason changing encoding/json or encoding/xml is viewed with particular horror.
So I could cry and sulk a bit (and of course I will), but I could also see what else I could do to improve things. When I looked at this before there were 3 allocations for every field with a custom JSON marshaler. One is imposed by the poor interface. I wondered what I could do about the other two?
Now, I happen to have been to a Go contributor’s workshop, and I have made a simple contribution to Go before, so I do have the Go git repo on my laptop. So step one was to pull master, build the Go toolchain and run the encoding/json tests and benchmarks.
After a bit of poking about I found there was no benchmark for marshaling a struct with a custom marshaler. Following the principle that you cannot improve what you do not measure, adding a new benchmark was step one.
This is the new benchmark. It uses an Encoder as this can be reused, which reduces the noise from the test and shows the overhead due to the custom marshaler in stark relief.
func BenchmarkEncodeMarshaler(b *testing.B) {
	b.ReportAllocs()
	m := struct {
		A int
		B RawMessage
	}{}
	b.RunParallel(func(pb *testing.PB) {
		enc := NewEncoder(ioutil.Discard)
		for pb.Next() {
			if err := enc.Encode(&m); err != nil {
				b.Fatal("Encode:", err)
			}
		}
	})
}
And here are the results running on Go 1.13-ish. As expected from my previous post there are 3 allocations on every run.
BenchmarkEncodeMarshaler-8 8169886 130 ns/op 100 B/op 3 allocs/op
Next we run a profile to see where those allocations are. We run the benchmark again with memory profiling enabled, then start the profiling tool and select the alloc_objects index. This index counts every time an object is allocated over the run, so it is useful for showing us where most of the allocations occur. The -run ^$ on the test command ensures we don't run any tests which might pollute the profiling data. ^$ is a regular expression: ^ matches the beginning of the test name and $ matches the end, so -run ^$ won't match any tests, and just our benchmark will run.
go test -run ^$ -bench BenchmarkEncodeMarshaler -memprofile mem.prof
go tool pprof json.test mem.prof
(pprof) sample_index=alloc_objects
Let’s look for the functions that allocate the most objects. The top command does this for us.
(pprof) top
Showing nodes accounting for 21529479, 100% of 21529577 total
Dropped 4 nodes (cum <= 107647)
Showing top 10 nodes out of 13
flat flat% sum% cum cum%
9437472 43.83% 43.83% 9437472 43.83% reflect.packEface
9404990 43.68% 87.52% 9404990 43.68% encoding/json.compact
2687017 12.48% 100% 2687017 12.48% encoding/json.RawMessage.MarshalJSON
0 0% 100% 21529479 100% encoding/json.(*Encoder).Encode
0 0% 100% 21529479 100% encoding/json.(*encodeState).marshal
0 0% 100% 21529479 100% encoding/json.(*encodeState).reflectValue
There are only 3 functions with significant allocations. One is within the RawMessage.MarshalJSON call and is the one we know is forced by the interface. One is in the compact function in the json library, and one is within the reflect package.
Let’s start with compact. We can look at an annotated source listing to show us where the allocations are in that function.
(pprof) list encoding/json.compact
Total: 21529577
ROUTINE ======================== encoding/json.compact in /Users/phil/Projects/go/src/encoding/json/indent.go
9404990 9404990 (flat, cum) 43.68% of Total
. . 12: return compact(dst, src, false)
. . 13:}
. . 14:
. . 15:func compact(dst *bytes.Buffer, src []byte, escape bool) error {
. . 16: origLen := dst.Len()
9404990 9404990 17: var scan scanner
. . 18: scan.reset()
. . 19: start := 0
. . 20: for i, c := range src {
. . 21: if escape && (c == '<' || c == '>' || c == '&') {
. . 22: if start < i {
So all the allocations here are creating the scan variable. This must escape to the heap and hence cause an allocation. (I’m not going to go into the details of why it escapes - suffice it to say that some shit goes down)
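(If you do want a peek, the compiler will report its escape analysis decisions if you ask. The command below is just one way to see them - the exact messages vary from one Go version to the next.)
go build -gcflags='-m' encoding/json 2>&1 | grep indent.go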
Why are we calling compact? The code below shows it is called after each call to a MarshalJSON method, and the comment implies it is there because the library doesn’t trust marshalers to produce valid, maximally compact JSON. Whatever the reasoning, the call to compact will happen for every field in the JSON with a custom marshaler, and each call will cause an allocation - so it’s worth addressing.
b, err := m.MarshalJSON()
if err == nil {
	// copy JSON into buffer, checking validity.
	err = compact(&e.Buffer, b, opts.escapeHTML)
}
What can we do about this? Well, our problem is not that we get a few allocations when we marshal some JSON, it’s that our programs do this marshaling over and over again, and the allocations occur over and over again. If we find some way to re-use these scanners from one Marshal call to the next then we’ll have addressed the problem.
The encoding/json API doesn’t provide a mechanism for a program to maintain any state from one marshaling call to the next, so within encoding/json we need to maintain some scanners for re-use without help from the application. sync.Pool exists for this precise purpose. We’ll get scanners from the pool and return them to the pool when we’re done. sync.Pool is very fast, so we should make savings over allocating a new scanner each time and having it garbage collected. We do need to ensure the scanner is fully reset each time - nothing will zero its fields for us.
We’ll create a pool for scanners, and some helper functions to get and release them which will also ensure the scanner is reset.
var scannerPool = sync.Pool{
	New: func() interface{} {
		return &scanner{}
	},
}

func newScanner() *scanner {
	scan := scannerPool.Get().(*scanner)
	// scan.reset by design doesn't set bytes to zero
	scan.bytes = 0
	scan.reset()
	return scan
}

func freeScanner(scan *scanner) {
	// Avoid hanging on to too much memory in extreme cases.
	if len(scan.parseState) > 1024 {
		scan.parseState = nil
	}
	scannerPool.Put(scan)
}
We’ll then change compact to take a scanner from the pool rather than declare a scanner value.
func compact(dst *bytes.Buffer, src []byte, escape bool) error {
	origLen := dst.Len()
	scan := newScanner()
	defer freeScanner(scan)
Note that defer performance will greatly improve in Go 1.14, so we can happily use it here without worrying about the overhead. (Also note that there are some other small changes involved in this PR, excluded here for brevity.)
If we run the benchmark again at this point and run the old and new results through benchstat, this is what we get. Quite a nice speed improvement, and importantly we’ve removed the allocation.
name old time/op new time/op delta
EncodeMarshaler-8 118ns ± 2% 104ns ± 1% -12.21% (p=0.001 n=7+7)
name old alloc/op new alloc/op delta
EncodeMarshaler-8 100B ± 0% 36B ± 0% -64.00% (p=0.000 n=8+8)
name old allocs/op new allocs/op delta
EncodeMarshaler-8 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=8+8)
A word of advice about running benchmarks. When measuring performance, shut down absolutely everything you can on your machine to stop it interfering with the running time of the benchmark. This should reduce the variation between runs. Oh, and run the benchmark multiple times (I used -count 8), and use benchstat to summarise and compare the results.
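For reference, the whole loop looks something like this (the file names are just what I happened to use, and benchstat lives in golang.org/x/perf):
go get golang.org/x/perf/cmd/benchstat
go test -run ^$ -bench BenchmarkEncodeMarshaler -count 8 > old.txt
# ...make the change, then...
go test -run ^$ -bench BenchmarkEncodeMarshaler -count 8 > new.txt
benchstat old.txt new.txt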
If we re-run our profiling, top shows compact is no longer causing allocations.
(pprof) top
Showing nodes accounting for 12075339, 100% of 12075851 total
Dropped 3 nodes (cum <= 60379)
Showing top 10 nodes out of 12
flat flat% sum% cum cum%
9650470 79.92% 79.92% 9650470 79.92% reflect.packEface
2424869 20.08% 100% 2424869 20.08% encoding/json.RawMessage.MarshalJSON
0 0% 100% 12075851 100% encoding/json.(*Encoder).Encode
0 0% 100% 12075339 100% encoding/json.(*encodeState).marshal
0 0% 100% 12075339 100% encoding/json.(*encodeState).reflectValue
This change (and one other that gets rid of the other allocation) will be part of Go 1.14! How exciting!
How was the experience of contributing these changes? Well, it’s satisfying to get the changes in, but the process takes some getting used to if you’re an emotional weakling like me. Particularly if you are used to slapping changes into git then pushing them to production the same day after a cursory review from a busy colleague. Expect lots of picky code review and very high standards for every aspect of each change. And quite a few rounds of feedback as you try to feel out what’s good for the Go codebase. Some sequences have certainly been shortened in this explanation of how these changes came about! You can see the full extent of the process and my many mis-steps here. Many thanks to Daniel for his patient & expert review, and particularly for his enthusiasm when the changes were finally ready to be merged.