What I’ve learned so far about tracing
This is a wrap-up of my practical Linux tracing series. You can read the earlier posts starting from here.
In this final write-up, I’ll change my style a bit and get straight to the point.
Here are my top 10 key takeaways after some time working as a Linux tracer.
1. Tracing tools, techniques and data (events, traces, spans…) are the means, not the end result. What matters is how you interpret them to understand your situation and find the root cause.
2. Tracing is basically about knowing what your service/application is doing. Ultimately, the truth lies in the source code, so always be prepared to read somebody else’s code.
3. Nowadays the amount of data we can collect is huge, so find a way to visualize and analyze it. Trace with a plan in mind: the raw data alone is far beyond your cognitive capacity.
4. Tracing and performance tuning are not overnight tasks. Just like hacking, the success you see is the result of weeks or months of research. There’s a quick win sometimes, but the big win usually requires understanding your system thoroughly. No pain, no gain.
5. On the other hand, there’s nothing wrong with being a “script kiddie” if it helps. Many top engineers in the field aim to make tracing and debugging easier every day (by sharing knowledge and building tools).
6. If you want to use tracing as a quick way to solve production issues, invest in tracing infrastructure (e.g. tools, libraries, storage, even third-party solutions…) and make your application tracing-friendly (e.g. integrate instrumentation libraries right from the start to expose useful data). Implementing distributed tracing across the whole organization does require a culture shift, but it’s worth doing.
7. Tracing is cool, but solving production problems on time is much cooler. Root cause investigation can wait; sometimes a workaround to mitigate the damage should take priority. This is where simple things like change management or a quick conversation with your fellow developers really help.
8. Always think about overhead while tracing. If you can’t measure it precisely, start small (e.g. profile at a low frequency, record only a few events, for a short time), observe the impact, then scale up gradually.
9. Tracing is mostly about finding errors, so be humble and professional when reporting them to the author of “not-very-optimal code”. The tracer should have clear evidence or a proof of concept when presenting the error. If you’re not sure (which is quite often), sit down with your colleagues and ask them, nicely, to help verify your assumptions. Sometimes a tracer needs very good persuasion skills.
10. Tracing is fun, and addictive.
Happy New Year, everybody!