
DTrace

This tool works like magic. It was designed at Sun Microsystems (now part of Oracle) by Bryan Cantrill, Adam Leventhal and Mike Shapiro, with several design goals clearly stated. Brendan Gregg later wrote the DTraceToolkit and co-authored the DTrace book. DTrace offers an unprecedented view of both user and kernel space.

Solaris Dynamic Tracing Guide - http://docs.oracle.com/cd/E19253-01/817-6223/index.html

Hidden in Plain Sight - http://queue.acm.org/detail.cfm?id=1117401

This important link and its sub-links clearly show the mistakes IBM, Red Hat and Intel made in how they approached building a comparable tool: https://blogs.oracle.com/ahl/entry/dtrace_knockoffs

The Linux community has seen a number of tools in this space, such as perf, LTTng, ktap, ktrace, ftrace, strace and SystemTap. These are handy, but one tool should do it all and be thought through from start to finish.

DTrace Design Goals

  • Safety first


Flame Graphs

For example, you can sample the on-CPU stack traces of an application with the profile provider, collapse each stack into a single line with a count of how often it was seen, and then turn the result into a flame graph. The flame graph is an .svg file that you can open in most web browsers to quickly find which parts of the application, down to specific code paths, are on CPU the most. Here's how:


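The process has three steps: collect the stacks, fold them, and render the graph. The commands assume stackcollapse.pl and flamegraph.pl from the FlameGraph toolkit are in the current directory.
The first command captures up to 100 user stack frames per trace, samples 99 times per second, keeps only samples where the process name is mysqld and the thread is running in user space (arg1 is non-zero), counts each distinct user stack, and exits after 60 seconds, writing the result to a file: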

dtrace -x ustackframes=100 -n 'profile-99 /execname == "mysqld" && arg1/ {@[ustack()] = count(); } tick-60s { exit(0); }' -o out.stacks
./stackcollapse.pl out.stacks > out.folded
./flamegraph.pl out.folded > out.svg

Then just open the out.svg and Bob’s your Uncle!
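
To make the folding step less of a black box, here is a minimal Python sketch of roughly what a stack collapser does. It is not the real stackcollapse.pl: the function name fold_stacks and the sample input are made up for illustration, and it assumes the usual DTrace aggregation layout of frames listed leaf first, followed by a line holding only the count.

```python
import re
from collections import Counter


def fold_stacks(text):
    """Fold DTrace-style stack records into 'root;...;leaf count' lines."""
    folded = Counter()
    frames = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            frames = []  # a blank line separates records
            continue
        if re.fullmatch(r"\d+", line):
            # A line holding only digits is the count that ends a record.
            if frames:
                # DTrace prints the leaf first; folded output wants the root first.
                folded[";".join(reversed(frames))] += int(line)
            frames = []
            continue
        # Drop the trailing +0x... offset so identical functions merge.
        frames.append(re.sub(r"\+0x[0-9a-fA-F]+$", "", line))
    return [f"{stack} {count}" for stack, count in sorted(folded.items())]


# Hypothetical sample in the shape of 'dtrace ... -o out.stacks' output.
sample = """
              libc.so.1`memcpy+0x1b
              mysqld`copy_row+0x20
              mysqld`main+0x10
                3

              mysqld`parse+0x8
              mysqld`main+0x10
                5
"""

for folded_line in fold_stacks(sample):
    print(folded_line)
```

Each printed line is one stack, root to leaf, separated by semicolons, with the sample count at the end. That is the shape flamegraph.pl expects as input.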

For kernel stack traces, use the following (you may wish to filter out idle threads, as the grep does here):

dtrace -x stackframes=100 -n 'profile-199 /arg0/ {@[stack()] = count(); } tick-60s { exit(0); }' -o out.stacks
./stackcollapse.pl out.stacks | grep -v cpu_idle > out.folded
./flamegraph.pl out.folded > out.svg

Off CPU

Note that this will print a lot of information: user and kernel stack traces, each with a distribution graph of the time spent off CPU in nanoseconds. See off-CPU analysis - http://www.brendangregg.com/offcpuanalysis.html

dtrace -x ustackframes=100 -n 'sched:::off-cpu /execname == "mysqld"/ {self->ts = timestamp; } sched:::on-cpu /self->ts/ { @[stack(), ustack(), "ns"] =quantize(timestamp - self->ts); self->ts = 0; }'
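
The one-liner above works by remembering, per thread, when the thread left the CPU and subtracting that from the time it came back. The same pairing logic can be sketched in Python. The event tuples below are a hypothetical format invented for this illustration, not something DTrace emits.

```python
def off_cpu_durations(events):
    """Pair off/on events per thread and return (thread_id, ns_off_cpu) tuples."""
    went_off = {}  # thread id -> timestamp when it left the CPU (like self->ts)
    durations = []
    for ts, tid, kind in events:
        if kind == "off":
            went_off[tid] = ts
        elif kind == "on" and tid in went_off:
            # Only threads with a recorded off-cpu time are measured,
            # mirroring the /self->ts/ predicate.
            durations.append((tid, ts - went_off.pop(tid)))
    return durations


# Hypothetical events: (timestamp in ns, thread id, "off" or "on").
events = [
    (100, 1, "off"),
    (150, 2, "off"),
    (400, 1, "on"),
    (450, 3, "on"),  # never seen going off CPU, so it is ignored
    (900, 2, "on"),
]

print(off_cpu_durations(events))
```

Removing the entry once the thread is back on CPU plays the same role as self->ts = 0 in the D script: a stale timestamp is never reused.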


Stall Cycles

On FreeBSD you can do the following.
Load the PMC kernel module first, in case it is not already loaded:

kldload pmc
pmcstat -S RESOURCE_STALLS.ANY -O out.pmcstat sleep 10
pmcstat -R out.pmcstat -z100 -G out.stacks
./stackcollapse-pmc.pl out.stacks | ./flamegraph.pl > out.svg

Memory leak Detection

This is possible because DTrace can trace any application function call, so why not the allocation functions themselves (alloc, calloc, malloc and so on)? See - http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html
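
As a rough illustration of the idea, once allocation and free calls have been traced, leak candidates are the allocations that were never freed, grouped by the stack that made them. The Python sketch below assumes a hypothetical event format (operation, address, size, folded stack) invented for this example. It is not the method from the linked article.

```python
from collections import defaultdict


def outstanding_bytes(events):
    """Return bytes still allocated per stack after replaying malloc/free events."""
    live = {}  # address -> (size, stack) for allocations not yet freed
    for op, addr, size, stack in events:
        if op == "malloc":
            live[addr] = (size, stack)
        elif op == "free":
            live.pop(addr, None)  # ignore frees of addresses we never saw
    by_stack = defaultdict(int)
    for size, stack in live.values():
        by_stack[stack] += size
    return dict(by_stack)


# Hypothetical traced events: (operation, address, size in bytes, folded stack).
events = [
    ("malloc", 0x1000, 64, "main;load_config"),
    ("malloc", 0x2000, 128, "main;handle_query"),
    ("free", 0x1000, 0, "main;load_config"),
    ("malloc", 0x3000, 256, "main;handle_query"),
]

print(outstanding_bytes(events))
```

Stacks with steadily growing outstanding bytes over time are the ones worth inspecting. A single snapshot also includes memory that is simply still in use.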

Measuring latency of IOs

$ dtrace -qn 'syscall::write:entry /execname == "mysqld"/ {self->stime = timestamp;} syscall::write:return /self->stime != 0/ {@LWrite = quantize(timestamp - self->stime);} tick-10s {printa(@LWrite);}'


           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |@                                        21
            2048 |@@@@@@@@@@@                              295
            4096 |@@@@@@@                                  174
            8192 |@@@@@@@@@@@@@@@@@@@@@                    544
           16384 |@                                        19
           32768 |                                         0



           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |@                                        43
            2048 |@@@@@@@@@@@@@@@                          1080
            4096 |@@@@@@@                                  531
            8192 |@@@@@@@@@@@@@@@@@                        1269
           16384 |@                                        38
           32768 |                                         0

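The value column is the lower bound of each latency bucket in nanoseconds; bucket boundaries increase in powers of 2
Distribution is an ASCII-art graph of the number of IOs that fell into that latency bucket
Count is the number of IOs whose latency is at least "value" and below the next value
The aggregation is printed every 10s; because it is never cleared, the counts are cumulative, which is why the second table shows larger numbers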