mirror of
https://github.com/DataDog/go-profiler-notes.git
synced 2025-07-12 23:50:13 +02:00
hack
@ -140,7 +140,7 @@ You can control the CPU profiler via various APIs:

- `go test -cpuprofile cpu.pprof` will run your tests and write a CPU profile to a file named `cpu.pprof`.
- [`pprof.StartCPUProfile(w)`](https://pkg.go.dev/runtime/pprof#StartCPUProfile) captures a CPU profile to `w` that covers the time span until [`pprof.StopCPUProfile()`](https://pkg.go.dev/runtime/pprof#StopCPUProfile) is called.
- [`import _ "net/http/pprof"`](https://pkg.go.dev/net/http/pprof) allows you to request a 30s CPU profile by hitting the `GET /debug/pprof/profile?seconds=30` endpoint of the default HTTP server that you can start via `http.ListenAndServe("localhost:6060", nil)`.
- [`runtime.SetCPUProfileRate()`](https://pkg.go.dev/runtime#SetCPUProfileRate) lets you control the sampling rate of the CPU profiler. See [CPU Profiler Limitations](#cpu-profiler-limitations) for current limitations.
- [`runtime.SetCgoTraceback()`](https://pkg.go.dev/runtime#SetCgoTraceback) can be used to get stack traces into cgo code. [benesch/cgosymbolizer](https://github.com/benesch/cgosymbolizer) has an implementation for Linux and macOS.
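
The program below is a minimal sketch of the `pprof.StartCPUProfile(w)` API. It profiles a CPU-bound function into an in-memory buffer instead of a file. `spin` and `captureCPUProfile` are hypothetical helpers made up for this example. Any `io.Writer`, such as an `*os.File`, can take the buffer's place.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// spin burns CPU so that the profiler has something to sample.
func spin(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// captureCPUProfile profiles fn and returns the raw profile bytes. The data
// written by pprof.StartCPUProfile is a gzip-compressed protobuf (pprof format).
func captureCPUProfile(fn func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err // e.g. another CPU profile is already running
	}
	fn()
	pprof.StopCPUProfile() // only returns after all profile writes have completed
	return buf.Bytes(), nil
}

func main() {
	data, err := captureCPUProfile(func() { spin(100_000_000) })
	if err != nil {
		panic(err)
	}
	fmt.Printf("captured %d bytes of CPU profile data\n", len(data))
}
```

If you write the returned bytes to a file, `go tool pprof` can open it.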

If you need a quick snippet to paste into your `main()` function, you can use the code below:
@ -207,18 +207,18 @@ Entering interactive mode (type "help" for commands, "o" for options)

Another popular way to express CPU utilization is CPU cores. In the example above the program was using an average of `1.47` CPU cores during the profiling period.

⚠️ Due to one of the known [CPU Profiler Limitations](#cpu-profiler-limitations) described below, you shouldn't put too much trust in this number if it's near or higher than `250%`. However, if you see a very low number such as `10%`, this usually indicates that CPU consumption is not an issue for your application. A common mistake is to ignore this number and start worrying about a particular function taking up a long time relative to the rest of the profile. This is usually a waste of time when overall CPU utilization is low, as not much can be gained from optimizing this function.
### System Calls in CPU Profiles

If you see system calls such as `syscall.Read()` or `syscall.Write()` using a lot of time in your CPU profiles, please note that this is only the CPU time spent inside these functions in the kernel. The I/O time itself is not tracked. Spending a lot of time on system calls is usually a sign of making too many of them, so increasing buffer sizes may help. For more complicated situations like this, consider using Linux perf, as it can also show you kernel stack traces that might provide additional clues.
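
The sketch below illustrates the buffering advice. It counts how many `Write` calls reach an underlying writer with and without a `bufio.Writer` in between. It assumes that each `Write` on an `*os.File` roughly maps to one `write(2)` system call. `countingWriter` and `writeLines` are hypothetical helpers, and the 64 KiB buffer size is an arbitrary choice.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter counts Write calls. For an *os.File, each Write roughly
// corresponds to one write(2) system call, so this stands in for a syscall count.
type countingWriter struct {
	calls int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return len(p), nil
}

// writeLines writes n short lines and returns how many Write calls reached the
// underlying writer. A bufSize > 0 routes the output through a bufio.Writer.
func writeLines(n, bufSize int) int {
	cw := &countingWriter{}
	var w io.Writer = cw
	var bw *bufio.Writer
	if bufSize > 0 {
		bw = bufio.NewWriterSize(cw, bufSize)
		w = bw
	}
	for i := 0; i < n; i++ {
		fmt.Fprintf(w, "line %d\n", i)
	}
	if bw != nil {
		bw.Flush() // push out whatever is still sitting in the buffer
	}
	return cw.calls
}

func main() {
	fmt.Println("unbuffered writes:", writeLines(1000, 0))
	fmt.Println("buffered writes:  ", writeLines(1000, 64*1024))
}
```

With 1000 short lines, the unbuffered version makes 1000 calls. The buffered version makes a single call when `Flush` runs, because the whole output (under 9 KB) fits in the buffer.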
### CPU Profiler Limitations

There are a few known issues and limitations of the CPU profiler that you might want to be aware of:

- 🐞 A known issue on Linux is that the CPU profiler struggles to achieve a sample rate beyond `250Hz`. This is usually not a problem, but can lead to bias if your CPU utilization is very spiky. For more information on this, check out this [GitHub issue](https://github.com/golang/go/issues/35057).
- ⚠️ You can call [`runtime.SetCPUProfileRate()`](https://pkg.go.dev/runtime#SetCPUProfileRate) to adjust the CPU profiler rate before calling `pprof.StartCPUProfile()`. This will print a warning saying `runtime: cannot set cpu profile rate until previous profile has finished`. However, it still works within the limitation of the bug mentioned above. This issue was [initially raised here](https://github.com/golang/go/issues/40094), and there is an [accepted proposal for improving the API](https://github.com/golang/go/issues/42502).
- ⚠️ The maximum number of nested function calls that can be captured in stack traces by the CPU profiler is currently [`64`](https://sourcegraph.com/search?q=context:global+repo:github.com/golang/go+file:src/*+maxCPUProfStack+%3D&patternType=literal). If your program uses a lot of recursion or other patterns that lead to deep stacks, your CPU profile will include truncated stack traces. This means you will miss parts of the call chain that led to the function that was active at the time the sample was taken.
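
The sketch below shows the `SetCPUProfileRate` workaround from the second bullet: set the rate first, then call `pprof.StartCPUProfile`. It relies on the runtime behavior described above, where the warning goes to stderr but the custom rate stays in effect. That behavior may change if the API proposal lands. `captureCPUProfileAt` and `spin` are hypothetical helpers. The `200` Hz rate is an arbitrary value chosen to stay below the `250Hz` limit on Linux.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// spin burns CPU so that the profiler has something to sample.
func spin(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// captureCPUProfileAt profiles fn at a custom sampling rate (in Hz). Because the
// rate is set before pprof.StartCPUProfile, StartCPUProfile cannot apply its own
// default rate and the runtime prints a "cannot set cpu profile rate" warning to
// stderr; sampling continues at hz.
func captureCPUProfileAt(hz int, fn func()) ([]byte, error) {
	var buf bytes.Buffer
	runtime.SetCPUProfileRate(hz)
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	fn()
	pprof.StopCPUProfile() // also resets the runtime rate to 0
	return buf.Bytes(), nil
}

func main() {
	data, err := captureCPUProfileAt(200, func() { spin(100_000_000) })
	if err != nil {
		panic(err)
	}
	fmt.Printf("captured %d bytes of CPU profile data at 200 Hz\n", len(data))
}
```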

## Memory Profiler
@ -228,18 +228,18 @@ Heap memory management related activities are often responsible for up to 20-30%

⚠️ The memory profiler does not show stack allocations as these are generally much cheaper than heap allocations. Please refer to the [Garbage Collector](#garbage-collector) section for more details.
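
`testing.AllocsPerRun` offers one way to see which allocations can reach the memory profiler at all. It counts heap allocations via `runtime.MemStats.Mallocs` and, like the profiler, ignores stack allocations. `stackOnly` and `heapEscape` in the sketch below are hypothetical functions. The exact escape decisions are up to the compiler, and you can check them with `go build -gcflags=-m`.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable; storing a pointer in it forces a heap allocation.
var sink *[64]byte

// stackOnly uses a local array that does not escape, so the compiler can keep it
// on the stack. Such allocations never show up in the memory profiler.
func stackOnly() int {
	var a [64]byte
	a[3] = 1
	return int(a[3])
}

// heapEscape stores a pointer in a package-level variable, so the array escapes
// and must be heap-allocated, which makes it visible to the memory profiler.
func heapEscape() {
	sink = new([64]byte)
}

func main() {
	fmt.Println("stackOnly allocs/op: ", testing.AllocsPerRun(100, func() { stackOnly() }))
	fmt.Println("heapEscape allocs/op:", testing.AllocsPerRun(100, func() { heapEscape() }))
}
```

With the standard gc compiler, `stackOnly` reports `0` allocations per run and `heapEscape` reports `1`.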
You can control the memory profiler via various APIs:

- `go test -memprofile mem.pprof` will run your tests and write a memory profile to a file named `mem.pprof`.
- [`pprof.Lookup("allocs").WriteTo(w, 0)`](https://pkg.go.dev/runtime/pprof#Lookup) writes a memory profile to `w` that covers the time since the start of the process.
- [`import _ "net/http/pprof"`](https://pkg.go.dev/net/http/pprof) allows you to request a 30s memory profile by hitting the `GET /debug/pprof/allocs?seconds=30` endpoint of the default HTTP server that you can start via `http.ListenAndServe("localhost:6060", nil)`. This is also called a delta profile internally. 🚧 Test
- [`runtime.MemProfileRate`](https://pkg.go.dev/runtime#MemProfileRate) lets you control the sampling rate of the memory profiler. See [Memory Profiler Limitations](#memory-profiler-limitations) for current limitations.

If you need a quick snippet to paste into your `main()` function, you can use the code below:

```go
file, err := os.Create("./mem.pprof")
if err != nil {
	panic(err)
}
defer file.Close()
defer pprof.Lookup("allocs").WriteTo(file, 0) // deferred calls run LIFO: written before Close
```

Regardless of how you activate the memory profiler, the resulting profile will essentially be a table of stack traces formatted in the binary [pprof](../pprof.md) format. A simplified version of such a table is shown below:
@ -250,12 +250,19 @@ Regardless of how you activate the Memory profiler, the resulting profile will e
|main;foo;bar|3|768|0|0|
|main;foobar|4|512|1|128|

A memory profile contains two major pieces of information:

- `alloc_*`: The amount of allocations that your program has made since the start of the process (or profiling period for delta profiles).
- `inuse_*`: The amount of allocations that your program has made that were still reachable during the last GC.

### Memory Profiler Sampling Rate

### Allocs vs Heap Profile

The [`pprof.Lookup()`](https://pkg.go.dev/runtime/pprof#Lookup) function as well as the [net/http/pprof](https://pkg.go.dev/net/http/pprof) package expose the memory profile under two names: `allocs` and `heap`. Both profiles contain the same data; the only difference is that the `allocs` profile has `alloc_space/bytes` set as the default sample type, whereas the `heap` profile defaults to `inuse_space/bytes`. The pprof tool uses this to decide which sample type to show by default.

### Memory Inuse vs RSS

### Memory Profiler Implementation

```
func malloc(size):
@ -280,13 +287,15 @@ func sweep(object):
return object
```

### Memory Profiler Limitations

There are a few known issues and limitations of the memory profiler that you might want to be aware of:

- ⚠️ [`runtime.MemProfileRate`](https://pkg.go.dev/runtime#MemProfileRate) should only be modified once, as early as possible in the startup of the program, for example at the beginning of `main()`. Writing this value is inherently a small data race, and changing it multiple times during program execution will produce incorrect profiles.
- ⚠️ When debugging potential memory leaks, the memory profiler can show you where those allocations were created, but it can't show you which references are keeping them alive. A few attempts to solve this problem were made over the years, but none of them work with recent versions of Go. If you know about a working tool, please [let me know](https://github.com/DataDog/go-profiler-notes/issues).
- ⚠️ [CPU Profiler Labels](#cpu-profiler-labels) or similar are not supported by the memory profiler. It's difficult to add this feature to the current implementation as it could create a memory leak in the internal hash map that holds the memory profiling data.
- ⚠️ Allocations made by cgo C code don't show up in the memory profile.
- ⚠️ The maximum number of nested function calls that can be captured in stack traces by the memory profiler is currently [`32`](https://sourcegraph.com/search?q=context:global+repo:github.com/golang/go+file:src/*+maxStack+%3D&patternType=literal); see [CPU Profiler Limitations](#cpu-profiler-limitations) for more information on what happens when you exceed this limit.

## ThreadCreate Profiler