diff --git a/examples/pclnttab/main.go b/examples/pclnttab/main.go index 60db0fe..67fa8eb 100644 --- a/examples/pclnttab/main.go +++ b/examples/pclnttab/main.go @@ -48,7 +48,7 @@ func debugSymtab(gopclntab []byte) error { for _, pc := range callers() { file, line, fn := table.PCToLine(uint64(pc)) - fmt.Printf("%s() %s:%d\n", fn.Name, file, line) + fmt.Printf("%x: %s() %s:%d\n", pc, fn.Name, file, line) } return nil } diff --git a/stack-traces.md b/stack-traces.md index fa8e218..1021fee 100644 --- a/stack-traces.md +++ b/stack-traces.md @@ -1,6 +1,6 @@ 🚧 This note is still work in progress, please come back later! 🚧 -This document was last updated for `go1.16` but probably still applies to older/newer versions for the most parts. +This document was last updated for `go1.16.3` but probably still applies to older/newer versions for the most parts. # Stack Traces in Go @@ -32,19 +32,21 @@ This text format has been [described elsewhere](https://www.ardanlabs.com/blog/2 As the name implies, stack traces originate from "the stack". Even so the details vary, most programming languages have a concept of a stack and use it to store things like local variables, arguments, return values and return addresses. Generating a stack trace usually involves navigating the stack in a process known as [Unwinding](#unwinding) that will be described in more detail later on. -Platforms like `x86-64` define a [stack layout](https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64) and calling convention for C and encourage other programming languages to adopt it for interoperability. Go doesn't follow these conventions, and instead uses its own idiosyncratic [calling convention](https://dr-knz.net/go-calling-convention-x86-64.html). Future versions of Go (1.17?) will adopt a more traditional [register-based](https://go.googlesource.com/proposal/+/refs/changes/78/248178/1/design/40724-register-calling.md) convention that will improve performance . However compatibility with platform conventions is not planned as it would negatively impact goroutine scalability. +Platforms like `x86-64` define a [stack layout](https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64) and [calling convention](https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf) for C and encourage other programming languages to adopt it for interoperability. Go doesn't follow these conventions, and instead uses its own idiosyncratic [calling convention](https://dr-knz.net/go-calling-convention-x86-64.html). Future versions of Go (1.17?) will adopt a more traditional [register-based](https://go.googlesource.com/proposal/+/refs/changes/78/248178/1/design/40724-register-calling.md) convention that will improve performance . However even the new convention won't be platform-compatible as that would negatively impact goroutine scalability. -Even today, Go's stack layout is slightly different on different platforms. To keep things manageable, we'll assume that we're on `x86-64` for the remainder of this note. +Go's stack layout is slightly different on different platforms. To keep things manageable, we'll assume that we're on `x86-64` for the remainder of this note. ### Stack Layout -Now let's take a closer look at the stack. Every goroutine has its own stack that is at least [2 KiB](https://sourcegraph.com/search?q=repo:golang/go+repo:%5Egithub%5C.com/golang/go%24+_StackMin+%3D&patternType=literal) and grows from a high memory address towards lower memory addresses. This can be a bit confusing and is mostly a historical convention from a time when memory was so limited that one had to worry about the stack colliding with other memory regions used by the program. +Now let's take a closer look at the stack. Every goroutine has its own stack that is at least [2 KiB](https://sourcegraph.com/search?q=repo:golang/go+repo:%5Egithub%5C.com/golang/go%24+_StackMin+%3D&patternType=literal) and grows from a high memory address towards lower memory addresses. This can be a bit confusing and is mostly a historical convention from a time when the address space was so limited that one had to worry about the stack colliding with other memory regions used by the program. + +The picture below shows the stack of a sample goroutine that is currently calling `main.foo()` like our example above: ![](./goroutine-stack.png) -There is a lot going on in the picture above, but for now let's focus on the things highlighted in red. To get a stack trace, the first thing we need is the current program counter (pc) which identifies the function that is currently being executed. This is found in a CPU register called `rip` (instruction pointer register) that points to another region of memory that holds the executable machine code of our program. If you're not familiar with registers, you can think of them as special CPU variables that are incredibly fast to access. +There is a lot going on in this picture, but for now let's focus on the things highlighted in red. To get a stack trace, the first thing we need is the current program counter (pc). This is found in a CPU register called `rip` (instruction pointer register) and points to another region of memory that holds the executable machine code of our program. Since we're currently calling `main.foo()` `rip` is pointing to an instruction within that function. If you're not familiar with registers, you can think of them as special CPU variables that are incredibly fast to access. Some of them, like `rip`, `rsp` or `rbp` have special purposes, while others can be used by compilers as they see fit. -The next step is to find the program counters of all the callers of the current function, i.e. all the `return address (pc)` values that are also highlighted in red. There are various techniques for doing this, which are described in the [Unwinding](#unwinding) section. The end result is a list of program counters that represent a stack trace. In fact, it's exactly the same list you can get from [`runtime.Callers()`](https://golang.org/pkg/runtime/#Callers) within your program. Last but not least, these `pc` values are translated into human readable file/line/function names as described in the [Symbolization](#symbolization) section below. +Now that we know the program counter of the current function, it's time to find pc values of our callers, i.e. all the `return address (pc)` values that are also highlighted in red. There are various techniques for doing this, which are described in the [Unwinding](#unwinding) section. The end result is a list of program counters that represent a stack trace just like the one you can get from [`runtime.Callers()`](https://golang.org/pkg/runtime/#Callers). Last but not least, these `pc` values are usually translated into human readable file/line/function names as described in the [Symbolization](#symbolization) section below. In Go itself you can simply calll [`runtime.CallerFramers()`](https://golang.org/pkg/runtime/#CallersFrames) to symbolize a list of `pc` values. ### Real Example @@ -110,28 +112,42 @@ If you want to try it out yourself, perhaps modify the example program to spawn ## Unwinding -Unwinding (or stack walking) is the process of collecting all the return addresses (see red elements in [Stack Layout](#stack-layout)) from the stack. Together with the current instruction pointer register (`rip`) they form a list of program counter (pc) values that can be turned into a human readable stack trace via [Symbolization](#symbolization). +Unwinding (or stack walking) is the process of collecting all the return addresses (see red elements in [Stack Layout](#stack-layout)) from the stack. Together with the current instruction pointer register (`rip`) they form a list of program counter (`pc`) values that can be turned into a human readable stack trace via [Symbolization](#symbolization). -The Go runtime, including the builtin profilers, exclusively use [.gopclntab](#gopclntab) for unwinding. However, we'll start with describing [Frame Pointer](#frame-pointer) unwinding first, because it is much easier to understand and might [become the default in the future](https://github.com/golang/go/issues/16638). +The Go runtime, including the builtin profilers, exclusively use [.gopclntab](#gopclntab) for unwinding. However, we'll start with describing [Frame Pointer](#frame-pointer) unwinding first, because it is much easier to understand and might [become supported in the future](https://github.com/golang/go/issues/16638). ### Frame Pointer -Frame pointer unwinding is the simple process of following the base pointer register (`rbp`) to the first frame pointer on the stack which points to the next frame pointer and so on. In other words, it is following the orange lines in the [Stack Layout](#stack-layout) graphic. For each visited frame pointer, the return address (pc) sitting 8 bytes above the frame pointer is collected along the way. +Frame pointer unwinding is the simple process of following the base pointer register (`rbp`) to the first frame pointer on the stack which points to the next frame pointer and so on. In other words, it is following the orange lines in the [Stack Layout](#stack-layout) graphic. For each visited frame pointer, the return address (pc) sitting 8 bytes above the frame pointer is collected along the way. That's it : ). -The main downside to frame pointers is that they add some performance overhead to every function call during normal program execution. Because of this compilers such as `gcc` offer options such as `-fomit-frame-pointers` to omit them for better performance. However, it's a devil's bargain: It gives you small performance win right away, but it reduces your ability to debug and diagnose performance issues in the future. Because of this the general advice is: +The main downside to frame pointers is that pushing them onto the stack adds some performance overhead to every function call during normal program execution. The Go authors estimated an average 2% execution overhead for an average program in the [Go 1.7 release notes](https://golang.org/doc/go1.7). Because of this compilers such as `gcc` offer options such as `-fomit-frame-pointers` to omit them for better performance. However, it's a devil's bargain: It gives you small performance win right away, but it reduces your ability to debug and diagnose performance issues in the future. Because of this the general advice is: > Always compile with frame pointers. Omitting frame pointers is an evil compiler optimization that breaks debuggers, and sadly, is often the default. > – [Brendan Gregg](http://www.brendangregg.com/perf.html) -Despite this Go used to omit frame pointers which caused interoperability issues with third party debuggers and profilers such as [Linux perf](http://www.brendangregg.com/perf.html). Fortunately the Go developers recognized the issue and since [Go 1.7](https://golang.org/doc/go1.7) frame pointers are always included for [64bit binaries ](https://sourcegraph.com/search?q=framepointer_enabled+repo:%5Egithub%5C.com/golang/go%24+&patternType=literal). And unlike `gcc`, the Go compiler offers no option to disable frame pointers, which is a good thing in my opinion. +In Go you don't even need this advice. Since Go 1.7 frame pointers are enabled by default for 64 bit binaries, and there is no `-fomit-frame-pointers` footgun available. This allows Go to be compatible with third party debuggers and profilers such as [Linux perf](http://www.brendangregg.com/perf.html) out of the box. -Anyway, if all of this is too abstract for you, and you'd like to see some code, [here](https://github.com/felixge/gounwind/blob/5cc8505361807a22169595999689bd793ed6d391/gounwind.go) is an alternative `runtime.Callers()` implementation that uses frame pointer unwinding instead of [.gopclntab](#gopclntab). The simplicity should speak for itself when compared to the alternative methods described below. It should also be clear that frame pointer unwinding has `O(N)` time complexity where `N` is the number of stack frames that need to be traversed. +If you'd like to see frame pointer unwinding in action, you can check out [this tiny code snippet](https://github.com/felixge/gounwind/blob/5cc8505361807a22169595999689bd793ed6d391/gounwind.go) which is a fast alternative to the official `runtime.Callers()` implementation. The simplicity should speak for itself when compared to the other unwinding methods described below. It should also be clear that frame pointer unwinding has `O(N)` time complexity where `N` is the number of stack frames that need to be traversed. + +Despite the apparent simplicity, frame pointer unwinding is no panacea. Frame pointers are pushed to the stack by the callee, so for interrupt based profiling there is a race condition that might cause you to miss the caller of the current function in your stack trace. Additionally frame pointer unwinding can't unwind inlined function calls. So at least some of the complexity of [.gopclntab](#gopclntab) or [DWARF](#dwarf) is essential to enable accurate unwinding. ### .gopclntab -Since frame pointers are only available on 64bit platforms, Go has to use a more complicated approach to enable stack tracing in a cross-platform manner [for now](https://github.com/golang/go/issues/16638). The nitty-gritty of this has been described by Russ Cox in his [Go 1.2 Runtime Symbol Information](https://golang.org/s/go12symtab) document, but I'll try to summarize the key idea. +Despite frame pointers being available on 64bit platforms, Go is not leveraging them for unwinding ([this might change](https://github.com/golang/go/issues/16638)). Instead Go ships with its own idiosyncratic unwinding tables that are embedded in the `.gopclntab` section of any Go binary. `.gopclntab` stands for "go program counter line table", but this is a bit of a misnomer as it contains various tables and meta data required for unwinding and symbolization. For unwinding, the general idea is to embed a table that maps every program counter (`pc`) to the current distance (delta) of the stack pointer (`rsp`) from the nearest `return address (pc)` above it. The initially lookup uses the `pc` from the `rip` instruction pointer register and then uses the `return address (pc)` for the next lookup and so on. -Simply speaking, the idea is to take the current program count (pc) of a goroutine and look it up in the sorted `` table in the `.gopclntab` section embedded in the binary. If you want, you can get access to this table using the [debug/elf](./examples/pclnttab/linux.go) or ([debug/macho](./examples/pclnttab/darwin.go)) package yourself. +Russ Cox initially described some of the involved data structured in his [Go 1.2 Runtime Symbol Information](https://golang.org/s/go12symtab) document, but it's very outdated by now and it's probably better to look at the current implementation directly. The relevant files are [traceback.go](https://github.com/golang/go/blob/go1.16.3/src/runtime/traceback.go) and [symtab.go](https://github.com/golang/go/blob/go1.16.3/src/runtime/symtab.go), so let's dive in. + +There are various use cases for stack traces in Go, but they all end up hitting the [`gentraceback()`](https://github.com/golang/go/blob/go1.16.3/src/runtime/traceback.go#L76-L86) function. If the caller is e.g. `runtime.Callers()` the function only needs to do unwinding, but e.g. `panic()` wants text output, which requires symbolization as well. Additionally the code has to deal with the difference between [link register architectures](https://en.wikipedia.org/wiki/Link_register) such as ARM that work a little different from x86. This combination of unwinding, symbolization, support for different architectures and bespoke data structures might just be a regular day in the shop for the system developers on the Go team, but it's definitely been tricky for me, so please watch out for potential inaccuracies in my description below. + +Each frame lookup begins with the current `pc` which is passed to [`findfunc()`](https://github.com/golang/go/blob/go1.16.3/src/runtime/symtab.go#L671) which looks up the meta data for the function that contains the `pc`. Historically this was done using `O(log N)` binary search, but [nowadays](https://go-review.googlesource.com/c/go/+/2097/) there is a hash-map-like index of [`findfuncbucket`](https://github.com/golang/go/blob/go1.16.3/src/runtime/symtab.go#L671) structs that usually directly guides us to the right entry using an `O(1)` algorithm. So at this point the overall complexity is still the same as frame pointer unwinding, but it's worth noting that the constant overheads are already significantly higher. + +The [_func](https://github.com/golang/go/blob/9baddd3f21230c55f0ad2a10f5f20579dcf0a0bb/src/runtime/runtime2.go#L825) meta data that we just retrieved contains a `pcsp` offset into the `pctab` table that maps program counters to stack pointer deltas. To decode this information, we call [`funcspdelta()`](https://github.com/golang/go/blob/go1.16.3/src/runtime/symtab.go#L903) which does a `O(N)` linear search over all program counters of the function until it finds the (`pc`, `sp delta`) pair were are looking for. For stacks with recursive call cycles, a tiny program counter cache is used to avoids doing lots of duplicated work. + +Now that that we have the stack pointer delta, we can use it to locate the next `return address (pc)` value of the caller and do the same lookup for it as before until we reach the "bottom" of the stack. + +So for non-recursive call stacks, the complexity for `gopclntab` unwinding is `O(N*M)` where `N` is the number of frames on the stack, and `M` is the average size of the generated machine code per function. I was able verify the impact of both factors [experimentally](https://github.com/DataDog/go-profiler-notes/tree/main/examples/stack-unwind-overhead), but in reality I'd assume both `N` and `M` to be fairly similar for most non-trivial Go applications. That being said, naive frame pointer unwinding appears to be [50x faster](https://github.com/felixge/gounwind) so high-resolution profiling and tracing use cases would certainly benefit from seeing [support for it](https://github.com/golang/go/issues/16638). + +One thing that I found surprising is that Go ships with two `.gopclntab` implementations. In addition to the one I've just described, there is also the [debug/gosym](https://golang.org/pkg/debug/gosym/) package that implements largely the same code and seems to be used by the linker, `go tool addr2line` and more. If you want, you can use it in combination with [debug/elf](./examples/pclnttab/linux.go) or ([debug/macho](./examples/pclnttab/darwin.go)) to go on your [gopclntab adventures](./examples/pclnttab). ### DWARF