Small things that helped

Tungdam
5 min read · Dec 1, 2022

Handy Linux network-related things that helped me. Hopefully they’re helpful to you as well. I’m sick at home, can’t sleep, and it’s boring af. Let’s write something to cheer things up.

Resolving file descriptor in strace output with -yy

Sometimes we want to know which file / socket / device our process is interacting with. By default, strace only shows a number representing the file descriptor, like this:

strace -e connect,socket,write curl -I https://google.com --silent >/dev/null

socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 5
connect(5, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("142.250.66.142")}, 16) = -1 EINPROGRESS (Operation now in progress)
write(5, "\26\3\1\2\0\1\0\1\374\3\3\22\26\354t#\322\232\35R\226\377\26L\2=>D\332v/\302"..., 517) = 517
write(5, "\24\3\3\0\1\1\27\3\3\0E1\353\243\204\370<S(L\210\246\0011|J\250\1\321\325\271\237"..., 80) = 80
write(5, "\27\3\3\0)z\22dy\257F\222\325C0\324e\333\262|X\3006v%O*AP\5IB"..., 46) = 46
write(5, "\27\3\3\0,\31<O\375{\202\3264V\215\300\rV\36\350Z\265\"\235\6\313+\274\226\201Q\251"..., 49) = 49
write(5, "\27\3\3\0\36\257`fy\245\375\37\32\"2\266\311O\315\2202\10.\300\230\246\222\26\255\221\37\272"..., 35) = 35
write(5, "\27\3\3\0:\256\370\330`\212\352Z\326\325O;\376\302\230\314\327j'\2102\351\211\265\351~7%"..., 63) = 63
write(5, "\27\3\3\0\32 2\t\n\342V\375bp\7\260#_\330+\304\263\246\2536\360kn\250qH", 31) = 31
write(5, "\27\3\3\0\"\375k\23y\353\344\264f\351^\264\r\3079\366\214\31a\36[i\2456:#\361L"..., 39) = 39
write(1, "HTTP/2 301 \r\nlocation: https://w"..., 668) = 668
write(5, "\27\3\3\0\23j\232\"N\23\276\252:\231S\312\327S\355^\17 \ng", 24) = 24
+++ exited with 0 +++

With -yy, we can clearly see what we’re writing to without having to look in /proc/our_pid/fd (by which time it’s usually too late, because sockets tend to be short-lived).

strace -yy -e connect,socket,write curl -I https://google.com --silent >/dev/null
socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3<UDPv6:[19224551]>
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 5<TCP:[19224556]>
connect(5<TCP:[19224556]>, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("142.250.66.142")}, 16) = -1 EINPROGRESS (Operation now in progress)
write(5<TCP:[10.3.4.213:53490->142.250.66.142:443]>, "\26\3\1\2\0\1\0\1\374\3\3\266\220\36\210bA'<\320R\341A\303R7B\4G]\374\253"..., 517) = 517
write(5<TCP:[10.3.4.213:53490->142.250.66.142:443]>, "\24\3\3\0\1\1\27\3\3\0E\311\31\20=*\325\313\327\232a\224_\226\356@?@\207V;\10"..., 80) = 80
write(5<TCP:[10.3.4.213:53490->142.250.66.142:443]>, "\27\3\3\0)\336h\\j\27\334\274@K\274\231\366\217\331\354\3\350\0\324\351\263Z\227P\262z?"..., 46) = 46
write(5<TCP:[10.3.4.213:53490->142.250.66.142:443]>, "\27\3\3\0,\3651\311\273\217!S\206\376\341+wn\336!\367CHZ\213\201&0^\250Kj"..., 49) = 49
write(5<TCP:[10.3.4.213:53490->142.250.66.142:443]>, "\27\3\3\0\36\244y\3073|r\214V\361\257n\235\210oN\346\255z\324\247\32\0230\330b\231\350"..., 35) = 35
write(5<TCP:[10.3.4.213:53490->142.250.66.142:443]>, "\27\3\3\0:\374\317\236\312\317\322\201G\373\271\\\315\364 ~\245\303\30u\303#S-\332k\21\t"..., 63) = 63
write(5<TCP:[10.3.4.213:53490->142.250.66.142:443]>, "\27\3\3\0\32\364'\302\204\251\"\351\353G\211\314|,/\5\226t\330\320O\30T \363\212\353", 31) = 31
write(5<TCP:[10.3.4.213:53490->142.250.66.142:443]>, "\27\3\3\0\"\205\r\315\v\7\340(\213V\371\232\23Bs\344E]<\317`Z\307,\222C\205-"..., 39) = 39
write(1</dev/null<char 1:3>>, "HTTP/2 301 \r\nlocation: https://w"..., 668) = 668
write(5<TCP:[10.3.4.213:53490->142.250.66.142:443]>, "\27\3\3\0\23\344^\362\n\361\273\355*\252\211\202\266\2579\337EE#\200", 24) = 24
+++ exited with 0 +++

Note that file descriptor 5 is now resolved to

<TCP:[10.3.4.213:53490->142.250.66.142:443]>

I find this very handy.
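Because the annotation is plain text, it is also easy to post-process. Here is a minimal Python sketch that sums the bytes written per resolved target from saved strace -yy output. The regular expression is only a guess based on the sample lines above, not a full grammar of strace's output, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches lines like:
#   write(5<TCP:[10.3.4.213:53490->142.250.66.142:443]>, "..."..., 517) = 517
# Assumption: the annotation ends at the first ">, " after the fd number.
WRITE_RE = re.compile(r'^write\((\d+)<(.+?)>, .*\) = (\d+)$')


def bytes_written_per_target(lines):
    """Sum the return values of successful write() calls, keyed by fd annotation."""
    totals = Counter()
    for line in lines:
        m = WRITE_RE.match(line.strip())
        if m:
            _fd, target, written = m.groups()
            totals[target] += int(written)
    return totals
```

To try it, save the trace with strace's -o option and pass the opened file to bytes_written_per_target().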

Just a side note: perf trace resolves an fd very well if it’s a regular file, but it doesn’t provide this descriptive info for sockets. perf trace also doesn’t decode what’s behind a memory address (in this example, the write buffer) into a human-readable format for you, which is a big plus of strace IMHO.

Traceroute using fixed port and protocol

Recently I stumbled upon a case where traffic from other countries couldn’t reach the DNS servers in our data center. Our ISP denied the possibility that they had any kind of filter blocking our UDP port 53 traffic (of course :) ).

Until then, I didn’t know that we can use traceroute with a fixed UDP port to probe the route. It’s as simple as this:

traceroute -U -p 53 your_ip_to_trace

Then you’ll know at which hop your traffic is filtered out. I gave this info to our ISP and they fixed it.
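If you need to run this from several vantage points, a small parser can pull out the last hop that still answered. This is a rough Python sketch, assuming the usual traceroute layout where each hop line starts with the hop number and unanswered probes are printed as "*". The function name and the sample output in the usage note are made up. Keep in mind that some routers simply never answer probes, so a run of stars is a hint, not proof of a filter.

```python
import re

# A hop line starts with the hop number; the header line ("traceroute to ...") does not.
HOP_RE = re.compile(r'^\s*(\d+)\s+(.*)$')


def last_responding_hop(output):
    """Return (hop_number, hop_text) for the last hop that answered at least
    one probe, or None if no hop answered at all."""
    last = None
    for line in output.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue  # header or blank line
        hop, rest = int(m.group(1)), m.group(2).strip()
        # A fully silent hop looks like "* * *"; anything else means a reply.
        if any(token != '*' for token in rest.split()):
            last = (hop, rest)
    return last
```

For example, if hops 1 and 2 show addresses and timings and every later hop shows "* * *", the function returns hop 2, which is where I would start asking questions.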

TLS key negotiation failed OpenVPN issue

I got this while switching our VPN setup from TCP to UDP. I checked the FAQ entries related to this issue, but none of them matched my case. After analyzing tcpdump captures on both sides, I realized that the TLS handshake packet arrived at our server but the reply was routed out a different way (our OpenVPN server is multi-homed, with multiple interfaces, which is a common story for a VPN server), which ended up causing this error.

We did have custom policy routing rules to cover this, and they worked well with the TCP server. The problem appeared only when we switched our OpenVPN server from TCP to UDP.

The root cause was noted long ago; let me quote it:

Yes, this is a one of the common caveats with UDP — with TCP, you get a file descriptor from the kernel, and the kernel handles all the “where did the original packet go to? what do I need to use as a source address for replies?” (because TCP).

For UDP, the application needs to do this, and normal socket API does not even tell the application the destination IP address used in the client connection — so when OpenVPN answers, the kernel will just use one of the addresses in the system (typically the address on the outgoing interface).

Oh!!! This is the best practical example of how connection-oriented TCP differs from connectionless UDP. With UDP, there’s no listen() and accept() after bind(), so there’s no per-connection state (which, for TCP, holds the precise source address for the reply even when the server is bound to 0.0.0.0).

There’s actually still a file descriptor in the case of a UDP server, but it just doesn’t carry info about the two ends of the socket. For example, here’s the difference in the recv and send syscalls of two simple TCP / UDP Python servers:

TCP

recvfrom(4<TCP:[10.3.4.213:12022->10.101.0.7:31578]>, "asdf\r\n", 1024, 0, NULL, NULL) = 6
sendto(4<TCP:[10.3.4.213:12022->10.101.0.7:31578]>, "asdf\r\n", 6, 0, {sa_family=AF_INET, sin_port=htons(31578), sin_addr=inet_addr("10.101.0.7")}, 16) = 6

UDP

recvfrom(3<UDP:[0.0.0.0:12023]>, "test", 1024, 0, {sa_family=AF_INET, sin_port=htons(36351), sin_addr=inet_addr("10.101.0.7")}, [16]) = 4
sendto(3<UDP:[0.0.0.0:12023]>, "test", 4, 0, {sa_family=AF_INET, sin_port=htons(36351), sin_addr=inet_addr("10.101.0.7")}, 16) = 4

Note how strace resolves the file descriptor in each case.
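The same difference can be seen from inside the process. Below is a small Python sketch (not the servers I actually traced, just an illustration over loopback; the function names are made up) that binds both kinds of socket to 0.0.0.0 and then asks the kernel which local address it knows about.

```python
import socket


def tcp_local_address():
    """Accept one TCP connection on a wildcard-bound listener and return the
    local address the kernel recorded for that connection."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", 0))            # wildcard address, kernel-chosen port
    srv.listen(1)
    port = srv.getsockname()[1]
    cli = socket.create_connection(("127.0.0.1", port))
    conn, _peer = srv.accept()
    local = conn.getsockname()          # the accepted socket knows both ends
    for s in (conn, cli, srv):
        s.close()
    return local


def udp_local_address():
    """Receive one datagram on a wildcard-bound UDP socket and return the
    local address of that socket afterwards."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(("0.0.0.0", 0))
    srv.settimeout(5)
    port = srv.getsockname()[1]
    cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    cli.sendto(b"test", ("127.0.0.1", port))
    _data, _peer = srv.recvfrom(1024)   # tells us the sender, not our own address
    local = srv.getsockname()           # still the wildcard address
    for s in (cli, srv):
        s.close()
    return local
```

On Linux, the TCP function returns 127.0.0.1 as the local IP, while the UDP one still returns 0.0.0.0. That is the missing piece of information the quote talks about.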

Solution: just bind your OpenVPN server to a specific IP by using the local option, like:

local 1.2.3.4

Quick and simple debugging with tcpdump

I once needed to solve a case where our developers reported that one of our Redis instances was returning a strange error, but we didn’t know which one exactly.

The developers logged the error message only. And we have dozens of instances (slave instances of several Redis clusters, deployed on Kubernetes).

Solution?

On the server that hosts the application reporting the Redis error:

tcpdump -i any port 6379 -nvv | grep -i "that client side error message" 

With that, I could easily identify the pod’s IP and drill deeper into the bug.
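If many instances are involved, counting the matches per source IP saves some squinting. Here is a rough Python sketch, assuming tcpdump's numeric "src_ip.src_port > dst_ip.dst_port:" notation appears on the same line as the error text. The function name and the sample lines in the usage note are made up for illustration.

```python
import re
from collections import Counter

# IPv4 endpoints as printed by tcpdump -n: "10.42.1.17.6379 > 10.3.4.213.40112:"
ENDPOINTS_RE = re.compile(
    r'(\d{1,3}(?:\.\d{1,3}){3})\.(\d+) > (\d{1,3}(?:\.\d{1,3}){3})\.(\d+):'
)


def count_error_sources(lines, needle, redis_port=6379):
    """Count, per source IP, the lines that contain `needle` (case-insensitive)
    and were sent from `redis_port`, i.e. replies coming from a Redis instance."""
    hits = Counter()
    for line in lines:
        if needle.lower() not in line.lower():
            continue
        m = ENDPOINTS_RE.search(line)
        if m and int(m.group(2)) == redis_port:
            hits[m.group(1)] += 1
    return hits
```

Feeding it saved tcpdump output and the error message gives something like Counter({'10.42.1.17': 2}), which points straight at the pod to look into.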

How to set ip_local_port_range properly?

[   16.599434] ip_local_port_range: prefer different parity for start/end values.

I used to set ip_local_port_range to 15000 65000 just to make it easier to remember, and got the warning above. At first (and for quite a long time), I asked myself “what kind of bad karma are the kernel developers trying to avoid with this nonsense?” Because developers, in general, have a good sense of humor.

Turns out they did it for a reason. In short, it can “possibly” speed up port allocation for both connect() and bind() if the total number of ports in the range is even, i.e. (end - start + 1) % 2 == 0, which is the same as start and end having different parity.

That patch is from 2015 and I haven’t looked closely enough to verify it. But since we still get the warning, the impact is presumably still there.
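To check a candidate range before applying it, the parity rule is easy to write down. This is a tiny Python sketch of the arithmetic only, with made-up function names; it does not read or change any sysctl.

```python
def port_count(start, end):
    """Number of ports in the inclusive range [start, end]."""
    return end - start + 1


def has_preferred_parity(start, end):
    """True if start and end have different parity, which is equivalent to
    the range containing an even number of ports."""
    return ((start ^ end) & 1) == 1
```

With my old setting, has_preferred_parity(15000, 65000) is False (50001 ports), while 15000 65001 passes. The kernel default of 32768 60999 passes too.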

I ran into this while debugging a high %sys CPU issue on one of our ingress nginx instances when it was under a small DDoS attack (several million requests per minute). The patch didn’t apply to my case, though, because the destination addresses of our connect() calls don’t have the high cardinality that the patch describes as its condition.

Thanks for reading this far. Hope this helps. And please do tell me what you think so I can improve. (Right here, or I’m tungdam@ on Twitter; come and say hi.)

Good night. Hope that I can sleep after this :(
