Practical Linux tracing ( Part 1/5) : symbols, debug symbols and stack unwinding

Tungdam
12 min readJun 24, 2020

Disclaimer

I’m not a professional tracer. What you are going to see here are the results of my adventure so far with Linux tracing tools. I’m still learning them everyday.

Most of the background concepts you’ll see are simplified, may be over simplified to let us understand them more easily so we can focus on the main purpose: to know what our system is doing. The complicated concepts will be explained by references from other professional guys who knows it much better than i do.

I’ve been strongly inspired by Brendan D. Gregg since years ago so don’t be surprised that you will see many references from his work.

Prerequisite

You should have be familiar with tracing tools. In general, they are tools for troubleshooting or performance tuning, beside old-but-gold *top and *stat tools. To name it specifically, we’re doing to play with Perf, bcc-tools and bpftrace in this, hopefully, series of blog posts about Linux tracing.

If you have no idea about the tools that i am talking about. You should read this first.

This is the part I of my practical tracing series.

See more:

Okay, let’s go to the main part of our blog today.

Symbols, Debug symbols and stack unwinding

Why would we care about all these things ?

So, you read all the links above, you’re feeling highly motivated. Let’s jump to profile something and make a beautiful FrameGraph to see how our application is doing.

Let’s try to pick a sandbox , install nginx server there, start a stress test, start profiling from 1 nginx worker first

[root@tungdam10-virt.dev:~/FlameGraph]# perf record -a -g -p 13000 -- sleep 5
Warning:
PID/TID switch overriding SYSTEM
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.146 MB perf.data (9148 samples) ]

Make a FlameGraph for further performance analysis like a pro:

[root@tungdam10-virt.dev:~/FlameGraph]# perf script | ./stackcollapse-perf.pl  | ./flamegraph.pl > nginx.svg
[root@tungdam10-virt.dev:~/FlameGraph]#

But the result looks not really promising like this

Super useless nginx FlameGraph

Well... Instead of having something to tell you: your nginx is using function cool to call function fantastic , your debug process tell you something like unknown unknown unknown . That’s sad.

TL,DR:

  • What happened ? You have some missing symbols
  • How to fix it: install some package that contains debug symbols, both for your nginx ( nginx-dbg ) and its shared libraries ( for example zlib1g-dbg or even libc6-dbg )
  • Profile your nginx again, your FlameGraph will look something like this:
Much better nginx stack traces
  • It has better info about function, but looks like most of the are kernel related. Ok, let’s profile your nginx again, this time with --call-graph dwarf , it should give you even cooler FlameGraph
Nginx flamegraph with dwarf

Missing symbols is the root cause. It makes your profling result useless. It can’t tell you anything or very few things about your application / kernel. And it’s the very first problem that i met and have to solve while playing with perf or bpftrace, especially while tracing user-space program.

If you just want to have solution, you can stop here. Otherwise, let’s move on.

Let’s talk about symbols

A symbol in an ELF is a name of a variable or function that associated with its memory address.

A symbol of a function is the event that we can trace / sample to get more info about what is it doing.

For example, here’s a snippet of a list of events that we can trace in nginx ( using readelf which belongs to Debian binutils package

readelf -s `which nginx`
....
1051: 0000000000063080 113 FUNC GLOBAL DEFAULT 14 ngx_http_set_exten
1052: 0000000000045a00 379 FUNC GLOBAL DEFAULT 14 ngx_process_events_and_ti
1053: 0000000000039bf0 113 FUNC GLOBAL DEFAULT 14 ngx_conf_set_msec_slot
1054: 00000000000825d0 837 FUNC GLOBAL DEFAULT 14 ngx_http_upstream_create_
1055: 000000000004c480 106 FUNC GLOBAL DEFAULT 14 ngx_shm_alloc
1056: 0000000000052420 323 FUNC GLOBAL DEFAULT 14 ngx_thread_task_post
1057: 00000000000278a0 133 FUNC GLOBAL DEFAULT 14 ngx_chain_coalesce_file
1058: 0000000000052750 178 FUNC GLOBAL DEFAULT 14 ngx_thread_pool_add
...

or a snippet of a list of event that you can trace for linux kernel

cat /proc/kallsyms | egrep ' (t|T) '
.....
ffffffffb16893f0 T vfs_removexattr
ffffffffb168a130 T __vfs_setxattr_noperm
ffffffffb168a2d0 T vfs_setxattr
ffffffffb1696f00 T vfs_fsync_range
ffffffffb1696f90 T vfs_fsync
ffffffffb16999b0 T vfs_statfs
ffffffffb16a8410 T __fsnotify_vfsmount_delete
ffffffffb16bdf80 T vfs_cancel_lock
ffffffffb16bf9d0 T vfs_test_lock
ffffffffb16c0410 T vfs_setlease
ffffffffb16c1540 T vfs_lock_file
ffffffffb170fde0 T get_vfs_caps_from_disk
ffffffffb1832710 T pci_sriov_set_totalvfs
ffffffffb1832750 T pci_sriov_get_totalvfs
....

So, symbols are critical for event tracing which used kprobe/uprobe. Without knowing the name of the function we want to probe, we can’t probe, obviously.

For quite a long time, i thought that the symbols that we use to trace are debug symbols, which exists in your binary when you compile it with -g but that’s not true. ( Feel free to correct me if I’m wrong here , I would love to know more about this )

Let’s examine a simple C code:

#include <stdio.h>
#include <unistd.h>
void hello() {
printf("hello world \n");
sleep(1);
}
int main() {
hello();
}

Compile it in my Debian 10 VM, without -g

gcc symbol.c -o hello

There’s no debug symbols

[root@tungdam10-virt.dev:~]# objdump -h hello  | grep debug
[root@tungdam10-virt.dev:~]#

We still have a symbol for our hello function name like this

readelf -s hello | fgrep hello
52: 0000000000001145 29 FUNC GLOBAL DEFAULT 14 hello

Let’s try tracing that hello event by

bpftrace -e 'uprobe:./hello:hello {printf("%s",ustack)}'  -c ./hello
Attaching 1 probe...
hello world
hello+0
__libc_start_main+235
0x41fd89415541f689

( here i use bpftrace because it’s easier to write an one-liner for event tracing. Don’t worry about the command or syntax much )

Yeah, we still can catch it.

So, for function tracing, we don’t need debug symbols of , we just need symbols, which are entries of type FUNClisted in 2 tables of readelf -s command output:

.dynsym ( dynamic symbols, which can be used by another program

.symtab ( “local” symbols used by the binary itself only )

Usually “local” symbols are stripped from a binary ( for reducing binary size only, AFAIK ) and those symbols are provided in a -dbg or -dbgsym packages.

For example, in case of nginx, binary was stripped

[root@tungdam10-virt.dev:~]# file `which nginx`
/usr/sbin/nginx: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=9506d41e4ab0f7a635ce85f2b5ce0ec2446f01ed, stripped

And we can see .dynsym only, no .symtab

[root@tungdam10-virt.dev:~]# readelf -s `which nginx` | fgrep 'Symbol table'
Symbol table '.dynsym' contains 1077 entries:

It’s not a very big deal as we still have quite many events to trace ( dynamic FUNC symbols that shouldn’t be stripped ) but we may miss something which is interesting. Fortunately, in most distro, nginx provide nginx-dbg that contains more symbols for us to play with. It’s just not the case with all other software.

Let’s try stripping our hello binary to see what happens ?

strip -s ./hello

Our symbol is gone

[root@tungdam10-virt.dev:~]# readelf -s hello | fgrep hello
[root@tungdam10-virt.dev:~]#

And we can’t trace it anymore:

[root@tungdam10-virt.dev:~]# bpftrace -e 'uprobe:./hello:hello {printf("%s",ustack)}'  -c ./hello
No probes to attach

Then, why do we need DEBUG SYMBOLS?

Debug symbols are information , which obviously, for debugging purpose, that tell us more info about symbols:

  • Interpret a memory address to a line of original source code
[root@tungdam10-virt.dev:~]# objdump --dwarf=decodedline hellohello:     file format elf64-x86-64Contents of the .debug_line section:CU: ./symbol.c:
File name Line number Starting address View
symbol.c 3 0x1145
symbol.c 4 0x1149
symbol.c 5 0x1155
symbol.c 6 0x115f
symbol.c 7 0x1162
symbol.c 8 0x1166
symbol.c 9 0x1175
symbol.c 9 0x1177

Here the address 0x1145 which is our hello function as we can see here:

readelf -s hello | fgrep hello
57: 0000000000001145 29 FUNC GLOBAL DEFAULT 14 hello

is defined at line number 3 in our source code.

I personally find this not very helpful, as we can browser through the source code and grep the function name to read more about it. But we still need it for a much more important purpose

  • Stack unwinding

This is the killer feature of trace tools, show us the stack traces or activation records of a function, which answers the question: which function triggers which functions that lead to a function that i care about ( maybe, it’s slowing my server down )

Let’s take a look at another simple code like this, let’s call it funk.c :

#include <stdio.h>
#include <unistd.h>
void func_d() {
int msec=1;
printf("%s","Hello world from D\n");
usleep(10000*msec);
}
void func_c() {
printf("%s","Hello from C\n");
func_d();
}
void func_b() {
printf("%s","Hello from B\n");
func_c();
}
void func_a() {
printf("%s","Hello from A\n");
func_b();
}
int main() {
func_a();
}

Compile it with -g which enable debug symbols in DWARF format. ( We won’t talk much about it here but i found this article is very helpful if you want to read more about it )

gcc -g funk.c -o stack

Add an uprobe by perf ( because bpftrace doesn’t support DWARF stack unwinding ) so we can trace it

[root@tungdam10-virt.dev:~]# perf probe -x ./stack 'func_d'
Added new event:
probe_stack:func_d (on func_d in /root/stack)
[root@tungdam10-virt.dev:~]# perf record -e probe_stack:func_d -aR -g --call-graph dwarf ./stack
Hello from A
Hello from B
Hello from C
Hello world from D
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.779 MB perf.data (1 samples) ]

A stack trace for func_d looks like this:

[root@tungdam10-virt.dev:~]# perf script
stack 30593 [002] 66862.244979: probe_stack:func_d: (55c783f82145)
55c783f82145 func_d+0x0 (/root/stack)
55c783f8218c func_c+0x19 (/root/stack)
55c783f821a9 func_b+0x19 (/root/stack)
55c783f821c6 func_a+0x19 (/root/stack)
55c783f821d7 main+0xd (/root/stack)
7fbcd2a8809a __libc_start_main+0xea (/usr/lib/x86_64-linux-gnu/libc-2.28.so)
55c783f82089 _start+0x29 (/root/stack)

As you see, from here we know that the activation records are like this: main() -> func_a() -> func_b() -> func_c() -> func_d()

This basically is the idea we rely on to build a cool FlameGraph like this:

Okay, looks nice. So debug symbols are cool, DWARF are cool. And it’s quite a popular practice that software compiled with -g so we can use this stack unwinding technique.

But there are some limitations with DWARF:

  • Its size is usually too big
  • BPF-based tools can’t use it as stack unwinding technique ( so maybe it’s limitation of BPF-based tools :D )

We should finish here. But I’m not tired yet. So stay with me for a bit more.

There’s another stack unwinding technique, using frame pointer. This is definitely not my area but let me try to explain it in the very simplified way:

  • A stack trace ( or activation records or call stacks ) contains many frames , which are saved in Last In First Out ( LIFO ) structure. That’s why we called it stack frames.
  • Each frame , in turn, contains info about a state of a function when it’s executed ( address of a memory area that stores its parameter, local variables, return address )
  • Frame pointer is…. Ok, seriously, i don’t know. Let’s see how a professional guy explain it here. I will quote the key part:
           High
| ... |
+---------+
+24| Arg 1 |
+---------+
+16| Arg 2 |
+---------+
+ 8| Return |
+---------+
EBP+--> |Saved EBP|
+---------+
- 8| Var 1 |
+---------+
ESP+--> | Var 2 |
+---------+
| ... |
Low

As you can see, the frame pointer for the last stack frame is stored at the start of current stack frame, creating a linked list of frame pointers. The stack is unwound by following this linked list.

So, basically a frame pointer can get us to memory address ( thus by some dark magic with assembly, offset, CPU register , we’ll have all needed info about different frames in a stack) , which can give us a full stack strace.

This is the default stack walking for call-graph method by perf and so far the only stack walking technique support by bcc-tools or bpftrace ( they depends on frame pointer but has its own BPF helper and map storage to store info about stacks )

Let’s see how it looks like with our stack program ( funk.c source code above )

[root@tungdam10-virt.dev:~]# perf record -e probe_stack:func_d -aR -g --call-graph fp
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.171 MB perf.data (1 samples) ]
[root@tungdam10-virt.dev:~]# perf script
stack 1887 [001] 74560.238011: probe_stack:func_d: (564a38f23145)
564a38f23145 func_d+0x0 (/root/stack)
564a38f231aa func_b+0x1a (/root/stack)
564a38f231c7 func_a+0x1a (/root/stack)
564a38f231d8 main+0xe (/root/stack)
7f02c679109b __libc_start_main+0xeb (/usr/lib/x86_64-linux-gnu/libc-2.28.so)
41fd89415541f689 [unknown] ([unknown])

Note that now we are using fp frame-pointer stack walking method, not dwarf anymore.

Same stack trace but with bpftrace and its ustack function ( to get user’s stack trace):

bpftrace -e 'uprobe:./stack:func_d {printf("%s\n", ustack); }'
Attaching 1 probe...
func_d+0
func_b+26
func_a+26
main+14
__libc_start_main+235
0x41fd89415541f689

Let’s compile our funk.c without -g to see what will happen:

[root@tungdam10-virt.dev:~]# gcc funk.c -o stack
[root@tungdam10-virt.dev:~]# bpftrace -e 'uprobe:./stack:func_d {printf("%s\n", ustack); }' -c ./stack
Attaching 1 probe...
Hello from A
Hello from B
Hello from C
Hello world from D
func_d+0
func_b+26
func_a+26
main+14
__libc_start_main+235
0x41fd89415541f689

still cool. Well, who’ll need debug symbols now ?

There are several points:

  • For optimization purpose, software usually compiled with -fomit-frame-pointer flag, which makes it impossible to use frame-pointer as stack walking method anymore. For example, in our case:
[root@tungdam10-virt.dev:~]# gcc funk.c -o stack -fomit-frame-pointer
[root@tungdam10-virt.dev:~]# bpftrace -e 'uprobe:./stack:func_d {printf("%s\n", ustack); }' -c ./stack
Attaching 1 probe...
Hello from A
Hello from B
Hello from C
Hello world from D
func_d+0
0x41fd89415541f689

Only 1 function symbol, no call stacks anymore.

Unfortunately, in many system, this is the default option, especially with a big software that is very performance sensitive, like C standard library ( libc6 in debian ) for example, or Java JVM. That’s why we still have an unknown symbol like 0x41fd89415541f689in many of our examples using frame pointer stack unwinding. ( Because libc doesn’t have frame pointer to unwind to resolve symbol of __libc_start_main function, which is _start function that you can read more here )

( But somehow it’s very tricky for my case, gcc (Debian 8.3.0–6) 8.3.0 perserve frame pointer by default . To disable it, we have to set it explicitly using -fomit-frame-pointer)

I’m still waiting for debian maintainer to change it, by using -fno-omit-frame-pointer instead, but it looks like it’s not going to happen very soon. See this.

  • Somewhat i currently find unwinding by DWARF is more stable. I lied to you in examples above while using bpftrace. We have 4 functions, but bpftrace only shows 3 of them in the stack call, due to this bug. Or I have to let my programs to sleep a bit, otherwise bpftrace trace won’t be able to resolve the symbol correctly, due to this and this.
  • In the world of C, compile with -g is much more popular than fno-omit-frame-pointer , thus DWARF can be used for most of the cases.
  • Whatever the stack unwinding method, eventually it’s a software, it must have bugs some day, having more 2 options for comparing is always good.

Conclusion

We have come along way, so far, hopefully you can understand:

  • Symbols, debug symbols, stack strace and stack unwinding concepts
  • Which symbol ( = event ) we can trace by examing its binary
  • How should we compile our binary to have symbols for tracing
  • Which package we should look for to overcome the missing symbols situation
  • Some basic bpftrace command that we can use to trace a specific function

All of these can help us to develop a very helpful strategy in debugging software in production later. I will write about it very soon, if this article makes you happy enough.

Follow me on twitter to get more articles like this, or just say hi ;)

Thanks for reading.

Small note: As of today ( 9th November 2023 ), I’m looking for a new job, ideally Senior+ SRE / DevOps Engineer role but can contribute to anything related to infrastructure engineering. If you’re hiring or know somebody is, please kindly ping me via my twitter or linkedin for further info. Of course, feel free to just connect and say hi. I would love to know more about you. Thank you a ton !

--

--

Tungdam

Sysadmin. Amateur Linux tracer. Performance enthusiast.