diff --git a/guide/README.md b/guide/README.md index c3cf0a6..acbebcc 100644 --- a/guide/README.md +++ b/guide/README.md @@ -3,14 +3,15 @@ - **[Introduction](#introduction):** [Read This](#read-this) · [Mental Model for Go](#mental-model-for-go) · Profiling vs Tracing - **Use Cases:** Reduce Costs · Reduce Latency · Memory Leaks · Program Hanging · Outages -- **Go Profilers**: CPU · Memory · Block · Mutex · Goroutine · ThreadCreate -- **Viewing Profiles**: Command Line · Flamegraph · Webgraph +- **Go Profilers**: [CPU](#cpu-profiler) · Memory · Block · Mutex · Goroutine · [ThreadCreate](#threadcreate) +- **Viewing Profiles**: Command Line · Flame Graph · Graph - **Go Execution Tracer:** Timeline View · Derive Profiles +- **Go Metrics:** MemStats - **Other Tools:** time · perf · bpftrace -- **Advanced Topics:** Assembly · Stack Traces -- **Datadog Products:** Continuous Profiler · APM (Distributed Tracing) +- **Advanced Topics:** Assembly · Stack Traces · Little's Law +- **Datadog Products:** Continuous Profiler · APM (Distributed Tracing) · Metrics -🚧 This document is a work in progress. All sections above will become clickable links over time. Follow me [on twitter](https://twitter.com/felixge) for updates. +🚧 This document is a work in progress. All sections above will become clickable links over time. The best way to find out about updates is to follow me and [my thread on twitter](https://twitter.com/felixge/status/1435537024388304900) where I'll announce new sections being added. # Introduction @@ -127,6 +128,88 @@ Generally speaking the costs of GC is proportional to the amount of heap allocat As with the previous mental model in this guide, everything above is an extremely simplified view of reality. But hopefully it will be good enough to make sense out of the remainder of this guide, and inspire you to read more articles on the subject. 
One article you should definitely read is [Getting to Go: The Journey of Go's Garbage Collector](https://go.dev/blog/ismmkeynote) which gives you a good idea of how Go's GC has advanced over the years, and the pace at which it is improving. +# Go Profilers +## CPU Profiler + +Go's CPU profiler can help you identify which parts of your code base consume a lot of CPU time. + +⚠️ Please note that CPU time is usually different from the real time experienced as latency by your users. For example, a typical http request might take `100ms` to complete, but only consume `5ms` of CPU time while waiting for `95ms` on a database. It's also possible for a request to take `100ms`, but spend `200ms` of CPU time if two goroutines are performing CPU intensive work in parallel. If this is confusing to you, please refer to the [Mental Model for Go section](#mental-model-for-go). + +You can enable the CPU profiler via various APIs: + +- `go test -cpuprofile cpu.pprof` will run your tests and write a CPU profile to a file named `cpu.pprof`. +- [`pprof.StartCPUProfile(w)`](https://pkg.go.dev/runtime/pprof#StartCPUProfile) will capture a CPU profile to `w` that covers the time span until [`pprof.StopCPUProfile()`](https://pkg.go.dev/runtime/pprof#StopCPUProfile) is called. +- [`import _ "net/http/pprof"`](https://pkg.go.dev/net/http/pprof) allows you to request a 30s CPU profile by hitting the `GET /debug/pprof/profile?seconds=30` endpoint of the default http server that you can start via `http.ListenAndServe("localhost:6060", nil)`. + +Regardless of how you activate the CPU profiler, the resulting profile will be a frequency table of stack traces formatted in the binary [pprof](../pprof.md) format.
A simplified version of such a table is shown below: + +|stack trace|samples/count|cpu/nanoseconds| +|-|-|-| +|main;foo|5|50000000| +|main;foo;bar|3|30000000| +|main;foobar|4|40000000| + +The CPU profiler captures this data by asking the operating system to monitor the CPU usage of the application and to send it a `SIGPROF` signal for every `10ms` of CPU time it consumes. The OS also includes time consumed by the kernel on behalf of the application in this monitoring. Since the signal delivery rate depends on CPU consumption, it's dynamic and can be up to `N * 100Hz` where `N` is the number of logical CPU cores on the system. When a `SIGPROF` signal arrives, Go's signal handler captures a stack trace of the currently active goroutine, and increments the corresponding values in the profile. The `cpu/nanoseconds` value is currently directly derived from the sample count, so it is redundant, but convenient. + +### Profiler Labels + +A cool feature of Go's CPU profiler is that you can attach arbitrary key-value pairs to a goroutine. These labels will be inherited by any goroutine spawned from that goroutine and show up in the resulting profile. + +Let's consider the [example](./cpu-profiler-labels.go) below that does some CPU `work()` on behalf of a `user`. By using the [`pprof.Labels()`](https://pkg.go.dev/runtime/pprof#Labels) and [`pprof.Do()`](https://pkg.go.dev/runtime/pprof#Do) API, we can associate the `user` with the goroutine that is executing the `work()` function. Additionally, the labels are automatically inherited by any goroutine spawned within the same code block, for example the `backgroundWork()` goroutine.
+ +```go +func work(ctx context.Context, user string) { + labels := pprof.Labels("user", user) + pprof.Do(ctx, labels, func(_ context.Context) { + go backgroundWork() + directWork() + }) +} +``` + +The resulting profile will include a new label column and might look something like this: + +|stack trace|label|samples/count|cpu/nanoseconds| +|-|-|-|-| +|main.backgroundWork|user:bob|5|50000000| +|main.backgroundWork|user:alice|2|20000000| +|main.work;main.directWork|user:bob|4|40000000| +|main.work;main.directWork|user:alice|3|30000000| + +Viewing the same profile with pprof's Graph view will also include the labels: + +![CPU profiler labels in pprof's Graph view](./cpu-profiler-labels.png) + +How you use these labels is up to you. You might include things such as `user ids`, `request ids`, `http endpoints`, `subscription plan`, or other data that can allow you to get a better understanding of what types of requests are causing high CPU utilization, even when they are being processed by the same code paths. That being said, using labels will increase the size of your pprof files. So you should probably start with low-cardinality labels such as endpoints before moving on to high-cardinality labels once you feel confident that they don't impact the performance of your application. + +### CPU Utilization + +Since the sample rate of the CPU profiler adapts to the amount of CPU your program is consuming, you can derive the CPU utilization from CPU profiles. In fact, pprof will do this automatically for you. For example, the profile below was taken from a program that had an average CPU utilization of `147.77%`: + +``` +$ go tool pprof guide/cpu-utilization.pprof +Type: cpu +Time: Sep 9, 2021 at 11:34pm (CEST) +Duration: 1.12s, Total samples = 1.65s (147.77%) +Entering interactive mode (type "help" for commands, "o" for options) +(pprof) +``` + +Another popular way to express CPU utilization is in CPU cores. In the example above the program was using an average of `1.47` CPU cores during the profiling period.
+ +⚠️ Due to one of the known issues below, you shouldn't put too much trust in this number if it's near or higher than `250%`. However, if you see a very low number such as `10%`, this usually indicates that CPU consumption is not an issue for your application. A common mistake is to ignore this number and start worrying about a particular function taking up a long time relative to the rest of the profile. This is usually a waste of time when overall CPU utilization is low, as not much can be gained from optimizing this function. +### Known CPU Profiler Issues + +There are a few known issues and limitations of the CPU profiler that you might want to be aware of: + +- 🐞 A known issue on Linux is that the CPU profiler struggles to achieve a sample rate beyond `250Hz`. This is usually not a problem, but can lead to bias if your CPU utilization is very spiky. For more information on this, check out this [GitHub issue](https://github.com/golang/go/issues/35057). +- ⚠️️ You can call [`runtime.SetCPUProfileRate()`](https://pkg.go.dev/runtime#SetCPUProfileRate) to adjust the CPU profiler rate before calling `pprof.StartCPUProfile()`. This will print a warning saying `runtime: cannot set cpu profile rate until previous profile has finished`. However, it still works within the limitations of the bug mentioned above. This issue was [initially raised here](https://github.com/golang/go/issues/40094), and there is an [accepted proposal for improving the API](https://github.com/golang/go/issues/42502). +- ⚠️ The maximum number of nested function calls that can be captured in stack traces by the CPU profiler is currently [hard coded to `64`](https://sourcegraph.com/search?q=context:global+repo:github.com/golang/go+file:src/*+maxCPUProfStack+%3D&patternType=literal). If your program is using a lot of recursion or other patterns that lead to deep stack depths, your CPU profile will include stack traces that are truncated.
This means you will miss parts of the call chain that led to the function that was active at the time the sample was taken. + +## ThreadCreate + +🐞 The threadcreate profile is intended to show stack traces that led to the creation of new OS threads. However, it's been [broken since 2013](https://github.com/golang/go/issues/6104), so you should stay away from it. + # Disclaimers I'm [felixge](https://github.com/felixge) and work at [Datadog](https://www.datadoghq.com/) on [Continuous Profiling](https://www.datadoghq.com/product/code-profiling/) for Go. You should check it out. We're also [hiring](https://www.datadoghq.com/jobs-engineering/#all&all_locations) : ). @@ -140,5 +223,7 @@ Notes: - GC Cost: O(N) with regards to live allocations on the heap containing pointers. - Each pointer slot in an allocation has a cost! Even nil pointers. - Reducing Costs: Talk about CPU, Memory and Networking. Is it possible to profile the latter? +- pprof: Maybe host a service to convert perf.data files into pprof files? +- Reuse cute gophers from conf talks. 
--> \ No newline at end of file diff --git a/guide/cpu-max-stack-depth.go b/guide/cpu-max-stack-depth.go new file mode 100644 index 0000000..721ea35 --- /dev/null +++ b/guide/cpu-max-stack-depth.go @@ -0,0 +1,37 @@ +// +build ignore + +package main + +import ( + "os" + "runtime/pprof" +) + +func main() { + pprof.StartCPUProfile(os.Stdout) + defer pprof.StopCPUProfile() + + belowLimit() + aboveLimit() +} + +func belowLimit() { + atDepth(32, cpuHog) +} + +func aboveLimit() { + atDepth(64, cpuHog) +} + +func cpuHog() { + for i := 0; i < 5000*1000*1000; i++ { + } +} + +func atDepth(n int, fn func()) { + if n > 0 { + atDepth(n-1, fn) + } else { + fn() + } +} diff --git a/guide/cpu-max-stack-depth.pprof b/guide/cpu-max-stack-depth.pprof new file mode 100644 index 0000000..8c7bd1a Binary files /dev/null and b/guide/cpu-max-stack-depth.pprof differ diff --git a/guide/cpu-profiler-labels.go b/guide/cpu-profiler-labels.go new file mode 100644 index 0000000..27f4d4d --- /dev/null +++ b/guide/cpu-profiler-labels.go @@ -0,0 +1,38 @@ +// +build ignore + +package main + +import ( + "context" + "os" + "runtime/pprof" + "time" +) + +func main() { + pprof.StartCPUProfile(os.Stdout) + defer pprof.StopCPUProfile() + + go work(context.Background(), "alice") + go work(context.Background(), "bob") + + time.Sleep(50 * time.Millisecond) +} + +func work(ctx context.Context, user string) { + labels := pprof.Labels("user", user) + pprof.Do(ctx, labels, func(_ context.Context) { + go backgroundWork() + directWork() + }) +} + +func directWork() { + for { + } +} + +func backgroundWork() { + for { + } +} diff --git a/guide/cpu-profiler-labels.png b/guide/cpu-profiler-labels.png new file mode 100644 index 0000000..afea348 Binary files /dev/null and b/guide/cpu-profiler-labels.png differ diff --git a/guide/cpu-profiler-labels.pprof b/guide/cpu-profiler-labels.pprof new file mode 100644 index 0000000..3e8c7f6 Binary files /dev/null and b/guide/cpu-profiler-labels.pprof differ diff --git 
a/guide/cpu-rate.go b/guide/cpu-rate.go new file mode 100644 index 0000000..4176430 --- /dev/null +++ b/guide/cpu-rate.go @@ -0,0 +1,25 @@ +// +build ignore + +package main + +import ( + "os" + "runtime" + "runtime/pprof" + "time" +) + +func main() { + runtime.SetCPUProfileRate(800) + + pprof.StartCPUProfile(os.Stdout) + defer pprof.StopCPUProfile() + + go cpuHog() + time.Sleep(time.Second) +} + +func cpuHog() { + for { + } +} diff --git a/guide/cpu-rate.pprof b/guide/cpu-rate.pprof new file mode 100644 index 0000000..1330ba7 Binary files /dev/null and b/guide/cpu-rate.pprof differ diff --git a/guide/cpu-utilization.go b/guide/cpu-utilization.go new file mode 100644 index 0000000..21b28bc --- /dev/null +++ b/guide/cpu-utilization.go @@ -0,0 +1,24 @@ +// +build ignore + +package main + +import ( + "os" + "runtime/pprof" + "time" +) + +func main() { + pprof.StartCPUProfile(os.Stdout) + defer pprof.StopCPUProfile() + + go cpuHog() + go cpuHog() + + time.Sleep(time.Second) +} + +func cpuHog() { + for { + } +} diff --git a/guide/cpu-utilization.pprof b/guide/cpu-utilization.pprof new file mode 100644 index 0000000..03bb5fb Binary files /dev/null and b/guide/cpu-utilization.pprof differ diff --git a/guide/heap.go b/guide/heap.go index 2744071..e4fbaf3 100644 --- a/guide/heap.go +++ b/guide/heap.go @@ -4,25 +4,13 @@ package main import ( "fmt" - "sync" - "sync/atomic" ) func main() { - var ( - sum int32 - wg = &sync.WaitGroup{} - ) - - wg.Add(2) - go add(&sum, 23, wg) - go add(&sum, 42, wg) - wg.Wait() - - fmt.Println(sum) + fmt.Println(*add(23, 42)) } -func add(dst *int32, delta int32, wg *sync.WaitGroup) { - atomic.AddInt32(dst, delta) - wg.Done() +func add(a, b int) *int { + sum := a + b + return &sum } diff --git a/guide/heap2.go b/guide/heap2.go deleted file mode 100644 index e4fbaf3..0000000 --- a/guide/heap2.go +++ /dev/null @@ -1,16 +0,0 @@ -// +build ignore - -package main - -import ( - "fmt" -) - -func main() { - fmt.Println(*add(23, 42)) -} - -func add(a, b 
int) *int { - sum := a + b - return &sum -}