Process Tracing
Installing Breakpoints
Imagine now that you want to discover what calls a buggy function. Normally, you'd set a breakpoint on it and print the backtrace when it fires. gdb.py can do this too, but because it knows nothing about your sources, you supply the breakpoint as a raw address in hex and receive the backtrace in much the same form (Figure 2).
Breakpoints come in two flavors, hardware and software, although gdb.py supports only software breakpoints. To install such a breakpoint, you replace the instruction at the location of interest with a "breakpoint trigger" instruction. On x86, this is the one-byte INT 3 (opcode 0xCC), so only the first byte of the original instruction changes. It raises a trap, which the kernel turns into a SIGTRAP signal for the tracee; as with any signal a tracee receives, the tracer is notified first. The tracer then puts the original instruction back, rewinds the instruction pointer by one byte (after INT 3 executes, it points just past the breakpoint), and resumes the tracee.
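To make these mechanics concrete, here is a toy simulation in Python. It does not touch a real process: ToyTracee, run_until_trap(), and SoftwareBreakpoint are hypothetical names, and the "CPU" treats every byte as a one-byte instruction, which only holds for the opcodes used here (0x90 is NOP, 0x55 is push rbp).

```python
INT3 = 0xCC  # x86 breakpoint instruction (one byte)


class ToyTracee:
    """Hypothetical stand-in for a traced process: code bytes plus an IP."""

    def __init__(self, code):
        self.mem = bytearray(code)
        self.ip = 0


def run_until_trap(tracee):
    """Toy CPU: treat each byte as an instruction and stop after INT 3."""
    while tracee.ip < len(tracee.mem):
        opcode = tracee.mem[tracee.ip]
        tracee.ip += 1  # like the real CPU, advance past the instruction
        if opcode == INT3:
            return True  # here the kernel would raise SIGTRAP
    return False


class SoftwareBreakpoint:
    def __init__(self, tracee, address):
        self.tracee = tracee
        self.address = address
        self.saved = tracee.mem[address]  # remember the original byte
        tracee.mem[address] = INT3        # patch in the trigger

    def on_hit(self):
        """Undo the patch so the original instruction runs on resume."""
        # INT 3 has executed, so the IP is one byte past the breakpoint
        assert self.tracee.ip == self.address + 1
        self.tracee.mem[self.address] = self.saved
        self.tracee.ip = self.address


tracee = ToyTracee(b"\x90\x90\x55\x90")  # nop; nop; push rbp; nop
bp = SoftwareBreakpoint(tracee, 2)
run_until_trap(tracee)  # stops with ip == 3
bp.on_hit()             # ip back to 2, 0x55 restored
```

A debugger that wants the breakpoint to stay armed would additionally single-step over the restored instruction and then write 0xCC back.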
In the python-ptrace implementation, a breakpoint is encapsulated in a class of its own, ptrace.debugger.Breakpoint. To create a breakpoint, you pass its constructor a tracee process and the address at which to place the breakpoint. As you can see from Listing 3, the constructor first calls readBytes() to save the original instructions from the tracee's memory. Then, it calls writeBytes() to put INT 3 at the address. PtraceProcess.writeBytes() boils down to PTRACE_POKETEXT requests, each of which copies the machine word in data to the address given in addr.
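PTRACE_PEEKTEXT and PTRACE_POKETEXT move whole machine words, so writing a single 0xCC byte means reading the surrounding word, patching one byte, and writing the word back. The sketch below shows that read-modify-write pattern against a simulated address space. ToyAddressSpace, write_bytes(), and read_bytes() are hypothetical stand-ins, python-ptrace's own readBytes() and writeBytes() may be organized differently, and the code assumes 8-byte little-endian words as on x86-64.

```python
import struct

WORD = 8  # bytes per machine word on x86-64


class ToyAddressSpace:
    """Hypothetical tracee memory that, like PTRACE_PEEKTEXT and
    PTRACE_POKETEXT, is only accessible one aligned word at a time."""

    def __init__(self, size):
        self._words = {addr: 0 for addr in range(0, size, WORD)}

    def peek_text(self, addr):
        return self._words[addr]

    def poke_text(self, addr, word):
        self._words[addr] = word


def write_bytes(space, addr, data):
    """Write arbitrary bytes using only word-sized peeks and pokes."""
    start = addr - addr % WORD  # align down to a word boundary
    end = addr + len(data)
    for word_addr in range(start, end, WORD):
        # Read the current word so bytes outside `data` are preserved
        buf = bytearray(struct.pack("<Q", space.peek_text(word_addr)))
        for i in range(WORD):
            pos = word_addr + i
            if addr <= pos < end:
                buf[i] = data[pos - addr]
        space.poke_text(word_addr, struct.unpack("<Q", bytes(buf))[0])


def read_bytes(space, addr, size):
    """Read arbitrary bytes using only word-sized peeks."""
    start = addr - addr % WORD
    raw = b"".join(struct.pack("<Q", space.peek_text(a))
                   for a in range(start, addr + size, WORD))
    return raw[addr - start:addr - start + size]


mem = ToyAddressSpace(32)
write_bytes(mem, 0, b"ABCDEFGHIJKLMNOP")
write_bytes(mem, 6, b"\xcc" * 4)  # straddles the word boundary at 8
print(read_bytes(mem, 0, 16))     # b'ABCDEF\xcc\xcc\xcc\xccKLMNOP'
```

The second write touches two words, yet every byte outside the four patched ones survives, which is exactly what a breakpoint installer needs.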
Listing 3
Installing a breakpoint (simplified)
Tracing System Calls
For dessert, I'll briefly touch on system call tracing. The de facto standard tool for this is strace (see the "Command of the Month" section), but python-ptrace bundles its own pure Python version, strace.py (Figure 3).
When you execute strace.py /usr/bin/<something>, strace.py forks a child process that issues a PTRACE_TRACEME request, so that its parent (i.e., strace.py itself) traces it automatically, and then executes the program you specify. Then, a slightly modified version of the above "event loop" begins. It starts with PtraceProcess.syscall(), which translates to a PTRACE_SYSCALL request: The tracee resumes, and the kernel stops it again at the next syscall entry or exit with a SIGTRAP signal. This signal is somewhat oversubscribed (breakpoints raise it, too), so to distinguish syscall traps from everything else, Linux (and some other Unices) provides the PTRACE_O_TRACESYSGOOD option. When it's enabled with a PTRACE_SETOPTIONS request (python-ptrace does this automatically if supported), the kernel reports system call traps as SIGTRAP | 0x80, that is, with bit 7 of the signal number set (| denotes bitwise OR).
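Here is a minimal sketch of how a tracer might classify the status word returned by waitpid(), using only Python's os and signal modules. Since no process is actually traced, the example statuses are built by hand with the usual Linux encoding (a stopped process reports 0x7f in the low byte and the stop signal in the next byte); classify_stop() is a hypothetical helper.

```python
import os
import signal

# With PTRACE_O_TRACESYSGOOD, syscall stops report SIGTRAP | 0x80
SYSCALL_TRAP = signal.SIGTRAP | 0x80


def classify_stop(status):
    """Classify a waitpid() status for a ptrace tracee."""
    if os.WIFEXITED(status):
        return "exited with %d" % os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        return "killed by " + signal.Signals(os.WTERMSIG(status)).name
    if os.WIFSTOPPED(status):
        sig = os.WSTOPSIG(status)
        if sig == SYSCALL_TRAP:
            return "syscall stop"  # bit 7 set: not a genuine SIGTRAP
        return "stopped by " + signal.Signals(sig).name
    return "unknown"


# Hand-built statuses: (signal << 8) | 0x7f means "stopped by signal"
print(classify_stop((SYSCALL_TRAP << 8) | 0x7F))     # syscall stop
print(classify_stop((signal.SIGTRAP << 8) | 0x7F))  # stopped by SIGTRAP
```

Without the option, both statuses would carry plain SIGTRAP, and the tracer could not tell a syscall stop from a breakpoint hit.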
You might wonder why it is important to notify the tracer on both syscall entry and exit. The assumption is that you use the first trap to decode the syscall arguments and the second to obtain the return value. Although it sounds simple, it is in fact rather hairy. python-ptrace devotes a whole package, ptrace.syscall, to these purposes; consult it if you are interested. In short, system calls are distinguished by their numbers, which depend on the architecture and arrive in architecture-dependent registers. Where to find the arguments and return value also depends on the application binary interface (ABI). Not to mention that you'd expect to see flag names such as O_RDONLY instead of raw numerical values.
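As an illustration of the ABI-dependent part, the following sketch decodes syscall-entry and syscall-exit register snapshots for x86-64 Linux, where the syscall number is found in orig_rax, the first six arguments in rdi, rsi, rdx, r10, r8, and r9, and the return value in rax (values from -4095 to -1 are negated errno codes). The register dictionaries, the pointer value, and the helper names are hypothetical; the syscall table is a four-entry excerpt, and the flag list covers only a handful of open() flags. This is not how ptrace.syscall is structured.

```python
import errno
import os

# x86-64 Linux syscall ABI: argument registers in order
ARG_REGS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")

# A tiny excerpt of the x86-64 syscall table; the real one is far longer
SYSCALL_NAMES = {0: "read", 1: "write", 2: "open", 3: "close"}

ACCMODE = os.O_RDONLY | os.O_WRONLY | os.O_RDWR  # access-mode bits
ACCESS_NAMES = {os.O_RDONLY: "O_RDONLY", os.O_WRONLY: "O_WRONLY",
                os.O_RDWR: "O_RDWR"}
FLAG_NAMES = [("O_CREAT", os.O_CREAT), ("O_EXCL", os.O_EXCL),
              ("O_TRUNC", os.O_TRUNC), ("O_APPEND", os.O_APPEND),
              ("O_NONBLOCK", os.O_NONBLOCK), ("O_CLOEXEC", os.O_CLOEXEC)]


def decode_open_flags(flags):
    """Turn open() flags into names like O_WRONLY|O_CREAT."""
    # O_RDONLY is 0, so the access mode is masked out, not bit-tested
    names = [ACCESS_NAMES.get(flags & ACCMODE, hex(flags & ACCMODE))]
    rest = flags & ~ACCMODE
    for name, bit in FLAG_NAMES:
        if rest & bit:
            names.append(name)
            rest &= ~bit
    if rest:
        names.append(hex(rest))  # bits this excerpt has no name for
    return "|".join(names)


def decode_entry(regs):
    """Format a syscall-entry stop from a dict of register values."""
    nr = regs["orig_rax"]  # at entry, rax itself holds -ENOSYS on x86-64
    name = SYSCALL_NAMES.get(nr, "syscall_%d" % nr)
    args = [regs[r] for r in ARG_REGS]
    if name == "open":  # open(pathname, flags, mode)
        return "open(%#x, %s, 0%o)" % (args[0], decode_open_flags(args[1]), args[2])
    return "%s(%s)" % (name, ", ".join(hex(a) for a in args[:3]))


def decode_exit(regs):
    """Format a syscall-exit stop: a result, or -1 plus an errno name."""
    ret = regs["rax"]
    if ret >= 1 << 63:  # registers are unsigned; reinterpret as signed
        ret -= 1 << 64
    if -4095 <= ret < 0:  # the kernel returns -errno on failure
        err = -ret
        return "-1 %s (%s)" % (errno.errorcode.get(err, str(err)), os.strerror(err))
    return str(ret)
```

A real tracer would also read the pathname string out of tracee memory with PTRACE_PEEKDATA instead of printing the raw pointer.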
When all this grunt work is finished, strace.py issues PTRACE_SYSCALL once again to run the tracee until the next syscall entry or exit, and the loop starts over. For this request, the addr argument is unused, and data holds a signal to inject into the tracee when it's resumed (or 0 for none).
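The bookkeeping in that loop can be sketched without a live process. The hypothetical replay_trace_loop() below takes a recorded list of stop signals and decides, for each stop, what it was and what data value the tracer would pass to the next PTRACE_SYSCALL. It relies on the fact that a syscall-entry stop is followed by the matching exit stop with no other signal stop in between, and it ignores PTRACE_EVENT stops and group stops for brevity.

```python
import signal

# With PTRACE_O_TRACESYSGOOD, syscall stops report SIGTRAP | 0x80
SYSCALL_TRAP = signal.SIGTRAP | 0x80


def replay_trace_loop(stop_signals):
    """Replay recorded stop signals (hypothetical input) the way a
    PTRACE_SYSCALL loop would see them. Returns (event, data) pairs,
    where data is what the tracer passes to the next PTRACE_SYSCALL."""
    in_syscall = False
    decisions = []
    for sig in stop_signals:
        if sig == SYSCALL_TRAP:
            # Entry and exit stops look identical, so the tracer keeps
            # track of which one it is expecting next
            event = "syscall-exit" if in_syscall else "syscall-entry"
            in_syscall = not in_syscall
            data = 0  # nothing to inject
        else:
            # A genuine signal: forward it, or the tracee never receives it
            event = signal.Signals(sig).name
            data = sig
        decisions.append((event, data))
    return decisions


for event, data in replay_trace_loop([SYSCALL_TRAP, SYSCALL_TRAP, signal.SIGINT]):
    print(event, int(data))
```

Passing 0 after a genuine signal stop would silently swallow that signal, which is why the data argument matters.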
Command of the Month: strace
Strace (Figure 4) is a venerable tool with noble SunOS origins that dates back to the early days of Linux. It appears you can teach an old dog new tricks, though. Version 4.15, released around last Christmas, brings some "cool stuff" created during last year's Google Summer of Code program.
I'm speaking about fault injection. Handling error conditions in system programming can be tricky, so how do you check if you have accounted for all of them? It's relatively easy to test if an application misbehaves when it can't open a file, but how do you test for more convoluted things such as interrupted system calls or per-process limits that have been reached?
As of strace 4.15, you can instruct the tool to forge the return value of a selected system call. Consider the example
strace -e fault=open:error=ENOSPC:when=5+ /some/program
which makes strace return an ENOSPC error for the fifth and all subsequent open() system calls. According to the open(2) man page [6], this error occurs when the file was to be created but the device containing it has no room for the new file. This state isn't trivial to reproduce in the real world, but strace makes testing for such tricky conditions a breeze.
Internally, strace sets the syscall number to -1 at the syscall-entry stop before resuming the tracee. Because -1 is an invalid syscall number, the kernel skips the call and returns ENOSYS, but at the exit stop strace overwrites this return value with the error you specified. The when parameter tells strace when to inject the fault: You can do it for every matching system call or only for the first invocation, for instance. A newer strace (unreleased at the time of writing) allows you to inject a signal alongside the error code. You already know how that works: The signal goes into the data argument of the corresponding PTRACE_SYSCALL request.
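The fault-injection trick can be simulated without a real tracee. In the sketch below, toy_kernel() is a hypothetical stand-in that returns -ENOSYS for syscall number -1 and pretends every other call succeeds with file descriptor 3, and the register dictionaries are made up. parse_when() approximates strace's when= expression, covering only the N, N+, and N+S forms; it is not strace's own parser.

```python
import errno


def parse_when(spec):
    """Parse "N", "N+", or "N+S" into (first, step); step 0 means once."""
    first, plus, step = spec.partition("+")
    if not plus:
        return int(first), 0
    return int(first), (int(step) if step else 1)


def should_inject(call_no, first, step):
    """Is the call_no-th matching syscall (1-based) selected by when=?"""
    if call_no < first:
        return False
    if step == 0:
        return call_no == first
    return (call_no - first) % step == 0


def toy_kernel(regs):
    """Hypothetical kernel: -1 is invalid, anything else 'opens' fd 3."""
    regs["rax"] = -errno.ENOSYS if regs["orig_rax"] == -1 else 3


def run_with_fault(n_calls, nr, err, when):
    """Simulate n_calls invocations of syscall nr under fault injection."""
    first, step = parse_when(when)
    results = []
    for call_no in range(1, n_calls + 1):
        regs = {"orig_rax": nr, "rax": 0}
        inject = should_inject(call_no, first, step)
        if inject:
            regs["orig_rax"] = -1  # entry stop: make the syscall invalid
        toy_kernel(regs)           # kernel returns -ENOSYS for -1
        if inject:
            regs["rax"] = -err     # exit stop: replace -ENOSYS with -err
        results.append(regs["rax"])
    return results


# open() is syscall 2 on x86-64; fail the 5th and later calls with ENOSPC
print(run_with_fault(7, 2, errno.ENOSPC, "5+"))
# [3, 3, 3, 3, -28, -28, -28] on Linux, where ENOSPC is 28
```

With when=5+, the fifth and later calls fail, mirroring the example command above, while the program under test never notices that the kernel was bypassed.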
The only catch is that your distribution (unless it's Arch, you know) probably still ships the old strace. Packages for selected distributions are available through the openSUSE Build Service [7].
Infos
[1] TUI key bindings: https://sourceware.org/gdb/current/onlinedocs/gdb/TUI-Keys.html#TUI-Keys
[2] ptrace(2) man page: http://man7.org/linux/man-pages/man2/ptrace.2.html
[3] prctl(2) man page: http://man7.org/linux/man-pages/man2/prctl.2.html
[4] python-ptrace home: http://python-ptrace.readthedocs.io/en/latest/
[5] diStorm disassembler home: https://github.com/gdabah/distorm
[6] open(2) man page: http://man7.org/linux/man-pages/man2/open.2.html
[7] openSUSE Build Service page for strace: https://build.opensuse.org/package/show/home:ldv_alt/strace/