This tool works like magic. It was designed by top engineers at Sun Microsystems (since acquired by Oracle), namely Bryan Cantrill, Mike Shapiro and Adam Leventhal, and popularized by Brendan Gregg, with several design goals clearly stated. It offers an unprecedented view of both user and kernel space.
Solaris Dynamic Tracing Guide - http://docs.oracle.com/cd/E19253-01/817-6223/index.html
Hidden in Plain Sight - http://queue.acm.org/detail.cfm?id=1117401
This important post and the posts it links to clearly show the mistakes IBM, Red Hat and Intel made in how they approached the problem: https://blogs.oracle.com/ahl/entry/dtrace_knockoffs The Linux community has seen a number of similar tools, such as perf, LTTng, ktap, ktrace, ftrace, strace and SystemTap. These are handy, but ideally one tool should do it all and be thought through from start to finish.
DTrace Design Goals
- Safety first
Flame Graphs
For example, you can use the profile provider to sample the on-CPU stack traces of an application, collapse each stack into a single line with a count of how often it was seen, and turn the result into a flame graph. A flame graph is an .svg file you can browse in most web browsers to quickly find which parts of the application, down to specific code paths, are on CPU the most. Here’s how:
- git clone https://github.com/brendangregg/FlameGraph (to get the Perl scripts)
- cd FlameGraph (or wherever you cloned it to)
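Then follow this three-step process: sample, fold, render.
The first command raises the user stack depth to 100 frames and samples 99 times per second for processes named mysqld. The arg1 predicate keeps only samples taken in user mode. It counts each unique user-space stack for 60 seconds and writes the result to out.stacks. The second command collapses the stacks, and the third renders the SVG: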
dtrace -x ustackframes=100 -n 'profile-99 /execname == "mysqld" && arg1/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.stacks
./stackcollapse.pl out.stacks > out.folded
./flamegraph.pl out.folded > out.svg
Then just open out.svg in your browser and Bob’s your uncle!
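To make the folding step a little less magical, here is a rough Python sketch of what a stack collapser does. It is not the real stackcollapse.pl. It assumes the DTrace aggregation layout of one frame per line (leaf first) followed by a line holding only the count. It strips +0x offsets, reverses each stack so the root comes first, and joins the frames with semicolons. The function name and sample stacks are hypothetical.

```python
import re
from collections import Counter

# Matches a trailing instruction offset such as "+0x1c" on a frame.
OFFSET = re.compile(r"\+0x[0-9a-fA-F]+$")


def collapse_stacks(text: str) -> list[str]:
    """Fold DTrace aggregated stacks into 'root;...;leaf count' lines.

    Assumes each record is one frame per line (leaf first) followed by a
    line that holds only the sample count.
    """
    counts: Counter = Counter()
    frames: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines separate records
        if line.isdigit():
            # The count line closes the current stack record.
            if frames:
                counts[";".join(reversed(frames))] += int(line)
            frames = []
        else:
            frames.append(OFFSET.sub("", line))
    # Stacks that only differed by offsets were merged by the Counter.
    return [f"{stack} {n}" for stack, n in counts.items()]


if __name__ == "__main__":
    sample = """
              libc.so.1`__write+0xa
              mysqld`my_write+0x3c
              mysqld`main+0x10
               12

              mysqld`do_command+0x20
              mysqld`main+0x10
               30

              libc.so.1`__write+0xe
              mysqld`my_write+0x3c
              mysqld`main+0x10
               5
"""
    for folded in collapse_stacks(sample):
        print(folded)
```

Running it prints two folded lines. The first and third stacks merge into a single line with a count of 17 because they only differed by instruction offsets. That merging is why folded output is much smaller than the raw stacks.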
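For kernel stack traces, sample at 199 Hertz and keep only samples taken in kernel mode (the arg0 predicate). You may wish to pipe the folded stacks through grep to filter out idle threads, as below: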
dtrace -x stackframes=100 -n 'profile-199 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' -o out.stacks
./stackcollapse.pl out.stacks | grep -v cpu_idle > out.folded
./flamegraph.pl out.folded > out.svg
Off CPU
Note that this prints a lot of output: kernel and user-space stack traces, each followed by a distribution graph of the time spent off CPU. See off-CPU analysis - http://www.brendangregg.com/offcpuanalysis.html
dtrace -x ustackframes=100 -n 'sched:::off-cpu /execname == "mysqld"/ { self->ts = timestamp; } sched:::on-cpu /self->ts/ { @[stack(), ustack(), "ns"] = quantize(timestamp - self->ts); self->ts = 0; }'
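The off-CPU one-liner stamps a per-thread timestamp when a thread leaves the CPU and subtracts it when the same thread comes back on. Here is a minimal Python sketch of that pairing. It assumes a hypothetical list of (thread_id, kind, timestamp) events already filtered to mysqld, and it leaves out the stack keys. The durations it returns are what quantize() then buckets in the real script.

```python
def off_cpu_durations(events):
    """Pair each thread's off-CPU event with its next on-CPU event.

    events: iterable of (thread_id, kind, timestamp_ns) in time order,
    where kind is "off" or "on". Returns (thread_id, ns_off_cpu) tuples.
    """
    went_off = {}  # thread_id -> timestamp, the role self->ts plays in D
    durations = []
    for tid, kind, ts in events:
        if kind == "off":
            went_off[tid] = ts
        elif kind == "on" and tid in went_off:
            # pop() forgets the timestamp, like "self->ts = 0"
            durations.append((tid, ts - went_off.pop(tid)))
    return durations


if __name__ == "__main__":
    events = [
        (1, "off", 100),
        (2, "off", 150),
        (1, "on", 400),
        (2, "on", 1150),
        (1, "on", 2000),  # no pending "off": ignored, like /self->ts/
    ]
    print(off_cpu_durations(events))  # [(1, 300), (2, 1000)]
```

Clearing the slot after each match matters. Without it, a thread that comes back on CPU twice would report a stale, inflated duration the second time.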
Stall Cycles
If you are on FreeBSD, you can profile resource stall cycles with pmcstat and render them as a flame graph. Load the hwpmc kernel module first, in case it isn't already loaded:
kldload hwpmc
pmcstat -S RESOURCE_STALLS.ANY -O out.pmcstat sleep 10
pmcstat -R out.pmcstat -z100 -G out.stacks
./stackcollapse-pmc.pl out.stacks | ./flamegraph.pl > out.svg
Memory Leak Detection
This is possible too. Since we can trace any application call, why not trace each call to malloc, calloc, realloc and so on, and match the allocations against free? See - http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html
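The idea behind leak detection is to remember each allocation until a matching free arrives. Whatever is still outstanding when tracing stops is a leak candidate, grouped by the stack that allocated it. Here is a minimal Python sketch that assumes a hypothetical event format of (op, address, size, stack) tuples. In practice you might gather such events with pid provider probes on the allocator functions and free. realloc, which behaves like a free plus an alloc, is not modeled here.

```python
def leak_candidates(events):
    """Sum bytes still allocated at the end of tracing, per allocating stack.

    events: iterable of (op, address, size, stack) where op is "alloc" or
    "free"; size and stack are only meaningful for "alloc".
    """
    live = {}  # address -> (size, stack) for allocations not yet freed
    for op, addr, size, stack in events:
        if op == "alloc":
            live[addr] = (size, stack)
        elif op == "free":
            live.pop(addr, None)  # freeing an unseen address is ignored
    leaked = {}
    for size, stack in live.values():
        leaked[stack] = leaked.get(stack, 0) + size
    return leaked


if __name__ == "__main__":
    events = [
        ("alloc", 0x10, 64, "main;load_config"),
        ("alloc", 0x20, 32, "main;parse_query"),
        ("free", 0x10, 0, ""),
        ("alloc", 0x30, 16, "main;parse_query"),
    ]
    print(leak_candidates(events))  # {'main;parse_query': 48}
```

Keying the result by stack is what lets you feed it into flamegraph.pl. The folded "stack bytes" lines show which code paths hold on to memory.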
Measuring latency of IOs
$ dtrace -qn 'syscall::write:entry /execname == "mysqld"/ { self->stime = timestamp; } syscall::write:return /self->stime != 0/ { @LWrite = quantize(timestamp - self->stime); self->stime = 0; } tick-10s { printa(@LWrite); }'
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |@                                        21
            2048 |@@@@@@@@@@@                              295
            4096 |@@@@@@@                                  174
            8192 |@@@@@@@@@@@@@@@@@@@@@                    544
           16384 |@                                        19
           32768 |                                         0
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |@                                        43
            2048 |@@@@@@@@@@@@@@@                          1080
            4096 |@@@@@@@                                  531
            8192 |@@@@@@@@@@@@@@@@@                        1269
           16384 |@                                        38
           32768 |                                         0
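The value column shows latency in nanoseconds in power-of-two buckets; each row covers latencies from its value up to, but not including, the next row's value.
Distribution is an ASCII-art bar graph showing each bucket's share of all the write() calls seen.
Count is the number of write() calls whose latency fell into that bucket.
The histogram is printed every 10 seconds; because the aggregation is never cleared, the counts are cumulative since tracing started.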
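To see how quantize() assigns a latency to a row, here is a rough Python sketch. Each value lands in the bucket named by the largest power of two not greater than it, so a 3,000 ns write is counted in the 2048 row. The function names are hypothetical. The bar rendering only approximates DTrace's layout, and negative buckets are ignored.

```python
from collections import Counter


def quantize_bucket(value: int) -> int:
    """Largest power of two <= value: the row a value is counted in."""
    if value <= 0:
        return 0  # simplification: DTrace also keeps negative buckets
    return 1 << (value.bit_length() - 1)


def quantize(values) -> Counter:
    """Count values per power-of-two bucket."""
    return Counter(quantize_bucket(v) for v in values)


def render(hist: Counter, width: int = 40) -> list[str]:
    """Approximate DTrace's text layout: value, @ bar, count."""
    total = sum(hist.values())
    lines = [f"{'value':>16}  ------------- Distribution ------------- count"]
    for bucket in sorted(hist):
        # Bar length is the bucket's share of all samples, rounded.
        bar = "@" * int(width * hist[bucket] / total + 0.5)
        lines.append(f"{bucket:>16} |{bar:<{width}} {hist[bucket]}")
    return lines


if __name__ == "__main__":
    latencies_ns = [1500, 3000, 2100, 5000, 9000, 9500, 12000]
    for line in render(quantize(latencies_ns)):
        print(line)
```

Power-of-two buckets keep the output to a handful of rows while still separating a 2 µs write from a 16 µs one. The trade-off is that within a row you only know the latency to within a factor of two.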