Anatomy of a simple Linux utility

How ls Works

© racorn, 123RF

Article from Issue 174/2015
Author(s):

A Linux utility such as ls might look simple, but many steps happen behind the scenes between the time you type "ls" and the time you see the directory listing. In this article, we look at these behind-the-scenes details.

What really happens when you enter a program's name in a terminal window? This article is a journey into the workings of a commonly used program – the ubiquitous ls file listing command. This journey starts with the Bash [1] shell finding the ls program in response to the letters ls typed at the terminal, and it leads to a list of files and directories retrieved from the underlying filesystem [2].

To recreate these results, you'll need some basic understanding of standard debugging techniques using the GNU debugger (gdb), some familiarity with the SystemTap system information utility [3] [4], and an intermediate-level understanding of C programming. SystemTap is a scripting language and an instrumentation framework that allows you to examine a running Linux kernel dynamically. If you don't have all these skills, following along will still give you some insight into the inner workings of a program on Linux.

This article assumes you are running Linux kernel 3.18 [5] with the debug symbols for Bash installed, that a local copy of the 3.18 kernel source is available, and that SystemTap is set up properly. In the next section, I will describe how to configure your system to follow this article.

Setting Up Your System

To install the Bash debug symbols on Fedora 21, you can use the command:

# debuginfo-install bash

If you do not have the GNU debugger (gdb) installed, you can install it with yum install gdb.

The kernel 3.18 source can be downloaded from The Linux Kernel Archives [6], or, if you prefer to clone the kernel source, check out the v3.18 tag. SystemTap can be installed on Fedora 21 with:

# yum install systemtap-devel systemtap-client
# stap-prep

The stap-prep command installs the kernel development and debuginfo packages that match your running kernel.

Methodology

Before getting started, it is worthwhile discussing the methodology I adopted for this investigation. The first step is to understand how Bash finds the program (an executable script or a binary) that corresponds to a command entered on the command line. By placing breakpoints at key locations in Bash, you can halt its execution and examine key variables to get an idea of what Bash is processing at that point. The next section makes this step clearer with an example that uses the ls program.

Once you know how the program to be executed is found, you want to know how the program itself works. System calls are a program's entry points into kernel space. The program invokes a system call either directly or through a library function.
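
The two routes into the kernel can be sketched in C. This is a minimal illustration, not code from ls or Bash: it uses getpid because that call takes no arguments and exists on every Linux architecture, and the function names pid_via_library() and pid_via_raw_syscall() are hypothetical. Note that syscall() is itself a thin glibc helper; it is simply the closest a C program gets to a raw system call without inline assembly.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>
#include <unistd.h>        /* getpid(), syscall() */

/* Library route: the C library's getpid() wrapper makes the call for us. */
pid_t pid_via_library(void)
{
    return getpid();
}

/* Direct route: syscall() enters the kernel with the raw system call number. */
pid_t pid_via_raw_syscall(void)
{
    long ret = syscall(SYS_getpid);

    assert(ret > 0);   /* getpid always succeeds and returns a positive PID */
    return (pid_t)ret;
}
```

Both functions should return the same process ID when called from the same process; the only difference is how control reaches the kernel.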

After determining the key system call or calls, you then look into the kernel source code to find the function implementing that system call. SystemTap scripts can then trace the entry to and exit from these functions, showing how control passes into and out of kernel space.

I adopt this methodology to understand how the ls program works, but the same techniques should be relevant for any program.

First Steps: Typing ls

When I type ls, Bash first searches for the binary corresponding to the command in the directories listed in the PATH environment variable. You can chart this action using the GNU debugger (gdb); you'll need either the debug symbols for Bash installed or a locally built copy of Bash with debugging enabled.

To begin, start a gdb session and pass in the bash binary:

> gdb bash

Place a breakpoint in the search_for_command() function and start bash, passing in -c ls as the arguments (Listing 1).

Listing 1

Placing Breakpoints in Bash Source

(gdb) b search_for_command
Breakpoint 1 at 0x46ce80: file findcmd.c, line 307.
(gdb) run -c ls

Breakpoint 1, search_for_command (pathname=0x707140 "ls", flags=1) at findcmd.c:307
307     {
(gdb) bt
#0  search_for_command (pathname=0x707140 "ls", flags=1) at findcmd.c:307
#1  0x000000000041f69a in execute_disk_command (cmdflags=64,
       fds_to_close=0x705c10, async=0, pipe_out=-1, pipe_in=-1,
       command_line=0x7071e0 "ls", redirects=0x0, words=0x707200) at
       execute_cmd.c:4918
#2  execute_simple_command (simple_command=<optimized out>,
       pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1,
       async=async@entry=0, fds_to_close=fds_to_close@entry=0x705c10) at
       execute_cmd.c:4240
#3  0x00000000004362cc in execute_command_internal (command=0x705bc0,
       asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1,
       pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x705c10)
       at execute_cmd.c:799
#4  0x00000000004771ab in parse_and_execute (string=<optimized out>,
       from_file=from_file@entry=0x4b3050 "-c", flags=flags@entry=4) at
       evalstring.c:387
#5  0x000000000042238e in run_one_command (command=<optimized out>)
       at shell.c:1358
#6  0x00000000004212af in main (argc=3, argv=0x7fffffffdc18,
       env=0x7fffffffdc38) at shell.c:705

As you can see from frame #0 of the backtrace in Listing 1, the argument pathname refers to the string ls, which Bash now has to search for in the locations specified by the user's $PATH variable. My $PATH is as follows:

> echo $PATH
/usr/lib64/qt-3.3/bin:/usr/lib64/ccache:/bin:/usr/bin:/usr/local/bin:/usr/local/sbin:/usr/sbin:/home/asaha/.local/bin:/home/asaha/bin

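I now place a breakpoint in the find_user_command_in_path() function to see how Bash searches through all the locations present in $PATH (Listing 2). The hits shown in the listing are for a further breakpoint (number 3) on find_in_path_element(), which Bash calls once for each $PATH entry; the command that sets this breakpoint does not appear in the listing.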

Listing 2

Searching for the Program in $PATH

(gdb) b find_user_command_in_path
Breakpoint 2 at 0x46c850: file findcmd.c, line 557.

(gdb) cont
Continuing.
Breakpoint 3, find_in_path_element
   (name=name@entry=0x707140 "ls", path=path@entry=0x707280
   "/usr/lib64/qt-3.3/bin", flags=flags@entry=36,
   dotinfop=dotinfop@entry=0x7fffffffd650, name_len=<optimized out>)
   at findcmd.c:472
472     find_in_path_element (name, path, flags, name_len, dotinfop)

(gdb) cont
Continuing.
Breakpoint 3, find_in_path_element (name=name@entry=0x707140 "ls",
   path=path@entry=0x707280 "/usr/lib64/ccache", flags=flags@entry=36,
   dotinfop=dotinfop@entry=0x7fffffffd650, name_len=<optimized out>)
   at findcmd.c:472
472     find_in_path_element (name, path, flags, name_len, dotinfop)

(gdb) cont
Continuing.
Breakpoint 3, find_in_path_element
   (name=name@entry=0x707140 "ls", path=path@entry=0x707280 "/bin",
   flags=flags@entry=36, dotinfop=dotinfop@entry=0x7fffffffd650,
   name_len=<optimized out>) at findcmd.c:472
472     find_in_path_element (name, path, flags, name_len, dotinfop)
(gdb) cont
Continuing.
process 11762 is executing new program: /usr/bin/ls

At the end of Listing 2, /usr/bin/ls has been found (/bin is a symlink to /usr/bin on Fedora 21), and the function shell_execve() invokes the execve() system call to execute the command.
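
The hand-off to execve() can be sketched in C as follows. This is a simplified illustration under stated assumptions, not Bash's shell_execve(): the helper name run_program() is hypothetical, and the sketch forks a child first, as an interactive shell typically does, whereas the bash -c ls session in Listing 2 replaces the Bash process itself.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run the program at `path` with argument vector `argv` in a child process.
 * Returns the child's exit status, or -1 if the child could not be created
 * or did not exit normally. */
int run_program(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: on success, execve() never returns. */
        execve(path, argv, environ);
        _exit(127);   /* reached only if execve() failed */
    }

    /* Parent: wait for the child and report how it ended. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_program("/usr/bin/ls", argv) with argv set to { "ls", NULL } would list the current directory and return the exit status of ls. The exit code 127 for a failed execve() follows the common shell convention for "command not found".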

Bash invokes the stat() system call to check whether an executable corresponding to ls exists in each path location. Listing 3 shows the calls to stat() for the first three path locations.

Listing 3

stat() Calls to Path Locations

stat("/usr/lib64/qt-3.3/bin/ls", 0x7fff8c535c40) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/ccache/ls", 0x7fff8c535c40) = -1 ENOENT (No such file or directory)
stat("/bin/ls", {st_mode=S_IFREG|0755, st_size=123088, ...}) = 0
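
The search pattern visible in Listing 3 can be approximated in a few lines of C. The sketch below is a simplified stand-in, not Bash's find_in_path_element(): the function name find_in_path() and its signature are hypothetical, and the checks (regular file plus execute permission) are an assumption about what a reasonable search needs rather than a copy of Bash's logic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Look for `name` in each directory of the colon-separated list `path`.
 * On success, store the full pathname in `out` and return 0. Return -1 if
 * no directory holds a regular file with that name that we may execute. */
int find_in_path(const char *name, const char *path, char *out, size_t outlen)
{
    assert(name != NULL && path != NULL && out != NULL);

    const char *p = path;
    for (;;) {
        size_t len = strcspn(p, ":");   /* length of this path element */
        /* An empty element stands for the current directory. */
        int n = (len == 0)
            ? snprintf(out, outlen, "./%s", name)
            : snprintf(out, outlen, "%.*s/%s", (int)len, p, name);

        struct stat st;
        if (n > 0 && (size_t)n < outlen &&
            stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
            access(out, X_OK) == 0)
            return 0;                   /* first match wins */

        if (p[len] == '\0')
            break;                      /* no more elements */
        p += len + 1;                   /* skip past the ':' */
    }

    if (outlen > 0)
        out[0] = '\0';
    return -1;
}
```

As in Listing 3, each failed candidate costs one stat() call that returns ENOENT, and the search stops at the first directory where the call succeeds.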

A closer look at the kernel reveals how the stat() system call works. From here on out, all source references are relative to the top-level kernel source directory.

The stat() system call is defined in fs/stat.c (Listing 4). The vfs_stat() function that it calls is in turn defined as shown in Listing 5. The function vfs_fstatat() makes use of the inode data structures to check for the file's existence, and, if the file exists, it retrieves the file's attributes. To see what is happening in kernel space when the stat() function call is invoked, I use the SystemTap script in Listing 6 to trace the calls to and returns from the vfs_fstatat() function.

Listing 4

Definition of stat() System Call

SYSCALL_DEFINE2(stat, const char __user *, filename,
                struct __old_kernel_stat __user *, statbuf)
{
    struct kstat stat;
    int error;

    error = vfs_stat(filename, &stat);
    if (error)
        return error;

    return cp_old_stat(&stat, statbuf);
}

Listing 5

Definition of vfs_stat()

int vfs_stat(const char __user *name, struct kstat *stat)
{
    return vfs_fstatat(AT_FDCWD, name, stat, 0);
}

Listing 6

Tracing Call To and From vfs_fstatat()

probe kernel.function("vfs_fstatat@fs/stat.c").call
{
    # we are only interested in calls to vfs_fstatat() from "bash"
    if(execname() == "bash")
        printf("%s -> %s %s\n", thread_indent(-1), probefunc(),
               kernel_string($filename));
}

probe kernel.function("vfs_fstatat@fs/stat.c").return
{
    if(execname() == "bash")
        printf("%s <- %s\n", thread_indent(-1), probefunc());
}

# stop tracing after 30 seconds
probe timer.ms(30000)
{
    exit()
}

The vfs_fstatat() function has the prototype:

int vfs_fstatat(int dfd, const char __user *filename,
                struct kstat *stat, int flag)

The filename parameter of vfs_fstatat() is what I am interested in here. When you run the SystemTap script, you will see the lines shown in Listing 7.

Listing 7

Running the Script in Listing 6

# stap -v find_ls.stp
Pass 1: parsed user script and 174 library script(s) using 448852virt/271248res/6248shr/267632data kb, in 1600usr/120sys/1721real ms.
Pass 2: analyzed script: 3 probe(s), 17 function(s), 5 embed(s), 2 global(s) using 519976virt/341516res/7620shr/338756data kb, in 700usr/100sys/801real ms.
Pass 3: using cached /root/.systemtap/cache/40/stap_40cbb339787d6b1aad27f7870ca767f0_6441.c
Pass 4: using cached /root/.systemtap/cache/40/stap_40cbb339787d6b1aad27f7870ca767f0_6441.ko
Pass 5: starting run.

Now, execute the ls command in another terminal window. You should see the lines shown in Listing 8 in the SystemTap window.

Listing 8

Output of SystemTap Script in Listing 6

0 bash(28736): -> vfs_fstatat .
106 bash(28736): -> vfs_fstatat /usr/lib64/qt-3.3/bin/ls
118 bash(28736): <- SYSC_newstat
125 bash(28736): -> vfs_fstatat /usr/local/bin/ls
134 bash(28736): <- SYSC_newstat
141 bash(28736): -> vfs_fstatat /bin/ls
155 bash(28736): <- SYSC_newstat
162 bash(28736): -> vfs_fstatat /bin/ls
170 bash(28736): <- SYSC_newstat
201 bash(28736): -> vfs_fstatat /bin/ls
213 bash(28736): <- SYSC_newstat
245 bash(28736): -> vfs_fstatat /bin/ls
253 bash(28736): <- SYSC_newstat
259 bash(28736): -> vfs_fstatat /bin/ls
267 bash(28736): <- SYSC_newstat
283 bash(28736): -> vfs_fstatat /bin/ls
290 bash(28736): <- SYSC_newstat

The return lines name SYSC_newstat rather than vfs_fstatat, most likely because probefunc() in a return probe reports the function that control is returning to. On x86-64, stat() is served by the newstat variant of the system call in fs/stat.c, which also calls vfs_stat().

At this stage, I have a reasonable idea of what happens in userspace and kernel space to find the program that ls refers to. Now, I am ready to see how the binary is executed.
