Anatomy of a simple Linux utility

How ls Works

© racorn, 123RF

Article from Issue 174/2015
Author(s):

A Linux utility such as ls might look simple, but many steps happen behind the scenes from the time you type "ls" to the time you see the directory listing. In this article, we look at these behind-the-scenes details.

What really happens when you enter a program's name in a terminal window? This article is a journey into the workings of a commonly used program – the ubiquitous ls file listing command. This journey starts with the Bash [1] shell finding the ls program in response to the letters ls typed at the terminal, and it leads to a list of files and directories retrieved from the underlying filesystem [2].

To recreate these results, you'll need some basic understanding of standard debugging techniques using the GNU debugger (gdb), some familiarity with the SystemTap system information utility [3] [4], and an intermediate-level understanding of C programming code. SystemTap is a scripting language and an instrumentation framework that allows you to examine a Linux kernel dynamically. If you don't have all these skills, following along will still give you some insight into the inner workings of a program on Linux.

This article assumes you are running Linux kernel 3.18 [5] with the debug symbols for Bash installed, that a local copy of the 3.18 kernel source is available, and that SystemTap is set up properly. In the next section, I will describe how to configure your system to follow this article.

Setting Up Your System

To install the Bash debug symbols on Fedora 21, you can use the command:

# debuginfo-install bash

If you do not have the GNU debugger (gdb) installed, you can install it with yum install gdb.

The kernel 3.18 source can be downloaded from The Linux Kernel Archives [6], or, if you prefer to clone the kernel source, check out the v3.18 tag. SystemTap can be installed on Fedora 21 with:

# yum install systemtap-devel systemtap-client
# stap-prep

The last command, stap-prep, installs the kernel packages that SystemTap needs for your running kernel.

Methodology

Before getting started, it is worth discussing the methodology I adopted for this investigation. The first step is to understand how the shell finds the program (an executable script or a binary) that corresponds to a command entered on the command line. By placing breakpoints at key locations in Bash, you can halt its execution and examine key variables to see what Bash is processing at that point. The next section makes this step clearer with an example that uses the ls program.

Once you know how the program to be executed is found, you want to know how the program itself works. System calls are a program's entry points into kernel space. The program invokes a system call either directly or via a library function call.
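
The two routes into the kernel can be compared with a minimal C sketch. It is not taken from ls or Bash; the getpid() call and the helper names (pid_via_wrapper(), pid_via_raw_syscall(), check_pid_routes_agree()) are assumptions chosen only because they make a small, harmless illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>  /* SYS_getpid */
#include <sys/types.h>    /* pid_t */
#include <unistd.h>       /* getpid(), syscall() */

/* Route 1: go through the C library wrapper function. */
static pid_t pid_via_wrapper(void)
{
    return getpid();
}

/* Route 2: invoke the system call directly by its number. */
static pid_t pid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* Both routes ask the kernel the same question, so the answers must match. */
static void check_pid_routes_agree(void)
{
    assert(pid_via_wrapper() == pid_via_raw_syscall());
}
```

Calling check_pid_routes_agree() from a main() function should return silently. Most programs, ls included, take the first route and let the C library issue the system call on their behalf.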

After determining the key system call or calls, you then look into the kernel source code to find the function implementing that system call. SystemTap scripts can then trace the entry to and exit from these functions, showing how control flows into and out of kernel space.

I adopt this methodology to understand how the ls program works, but the same techniques should be relevant for any program.

First Steps: Typing ls

When I type ls, Bash first searches the locations listed in the PATH environment variable for the binary that corresponds to the command. You can chart this action using the GNU debugger (gdb); you'll need either the debug symbols for Bash installed or a locally built copy of Bash with debugging enabled.

To begin, start a gdb session and pass in the bash binary:

> gdb bash

Place a breakpoint in the search_for_command() function and start bash, passing in ls as the argument (Listing 1).

Listing 1

Placing Breakpoints in Bash Source

 

As you can see from line #0 in Listing 1, the argument pathname refers to the string ls, which Bash now has to search for in the locations specified by the user's $PATH variable. My $PATH is as follows:

> echo $PATH
  /usr/lib64/qt-3.3/bin:/usr/lib64/ccache:/bin:/usr/bin:\
  /usr/local/bin:/usr/local/sbin:/usr/sbin:/home/asaha/.local/bin:\
  /home/asaha/bin

I now place a breakpoint in the find_user_command_in_path() function to see how Bash searches through all the locations present in $PATH (Listing 2).

Listing 2

Searching for the Program in $PATH

 

At the end of Listing 2, /usr/bin/ls has been found (/bin is a symlink to /usr/bin on Fedora 21); the function shell_execve() invokes the execve() system call to execute the command.
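
How a found program ends up running can be sketched in a few lines of C. This is not Bash's shell_execve() code. It is a simplified illustration that assumes the common pattern of forking a child and calling execve() there, and run_program() is a hypothetical helper name.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), WIFEXITED(), WEXITSTATUS() */
#include <unistd.h>     /* fork(), execve(), _exit() */

extern char **environ;  /* the caller's environment, passed on unchanged */

/*
 * Run the program at `path` with the argument vector `argv`.
 * Returns the program's exit status, 127 if execve() failed in the
 * child, or -1 if the child could not be created or waited for.
 */
static int run_program(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* On success, execve() replaces this process and never returns. */
        execve(path, argv, environ);
        _exit(127);  /* reached only if execve() failed */
    }

    int status;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With an argument vector such as { "ls", NULL }, calling run_program("/usr/bin/ls", argv) would list the current directory and return the exit status of ls. The exit code 127 for a failed execve() follows the usual shell convention and is a choice made for this sketch.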

The stat() system call is invoked to check the existence of the executable corresponding to ls in the path locations. Listing 3 shows the snippet of the calls to stat() for the three path locations.
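
The search itself can be approximated in user space with the following C sketch. It mirrors the idea rather than Bash's actual code: find_in_path() is a hypothetical helper, and the extra access() check for execute permission is an assumption of this example.

```c
#include <assert.h>
#include <stdio.h>      /* snprintf() */
#include <string.h>     /* strchr(), strlen() */
#include <sys/stat.h>   /* stat(), S_ISREG() */
#include <unistd.h>     /* access(), X_OK */

/*
 * Look for an executable regular file called `name` in each directory
 * of `path_list` (colon separated, like $PATH). On success, the full
 * path is written to `out` and 0 is returned. Otherwise `out` is set
 * to the empty string and -1 is returned.
 */
static int find_in_path(const char *name, const char *path_list,
                        char *out, size_t out_size)
{
    assert(name != NULL && path_list != NULL && out != NULL);

    const char *dir = path_list;
    for (;;) {
        const char *end = strchr(dir, ':');
        size_t len = end ? (size_t)(end - dir) : strlen(dir);
        int n;

        /* An empty component stands for the current directory. */
        if (len == 0)
            n = snprintf(out, out_size, "./%s", name);
        else
            n = snprintf(out, out_size, "%.*s/%s", (int)len, dir, name);

        if (n > 0 && (size_t)n < out_size) {
            struct stat st;
            /* stat() fails if the candidate does not exist. */
            if (stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
                access(out, X_OK) == 0)
                return 0;
        }

        if (end == NULL)
            break;
        dir = end + 1;
    }

    if (out_size > 0)
        out[0] = '\0';
    return -1;
}
```

Calling find_in_path("ls", getenv("PATH"), buf, sizeof buf) with a sufficiently large buf should leave the first matching location in buf, one stat() call per directory tried.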

Listing 3

stat() Calls to Path Locations

 

A closer look at the kernel reveals how the stat() system call works. From here on out, all source references are relative to the top-level kernel source directory.

The stat() system call is defined in fs/stat.c (Listing 4). The vfs_stat() function, in turn, is defined as shown in Listing 5. The function vfs_fstatat() makes use of the inode data structures to check for the file's existence, and, if it exists, it retrieves the file's attributes. To see what happens in kernel space when the stat() function is invoked, I use the SystemTap script in Listing 6 to trace calls to and returns from the vfs_fstatat() function.

Listing 4

Definition of stat() System Call

 

Listing 5

Definition of vfs_stat()

 

Listing 6

Tracing Call To and From vfs_fstatat()

 

The vfs_fstatat() function has the prototype:

int vfs_fstatat\
    (int dfd, const char __user *filename, struct kstat *stat, int flag)

The filename parameter is what I am interested in here. When you run the SystemTap script, you will see the lines shown in Listing 7.

Listing 7

Output of Script in Listing 6

 

Now, execute the ls command in another terminal window. You should see the lines shown in Listing 8 in the SystemTap window.

Listing 8

Output of SystemTap Script in Listing 6

 

At this stage, I have a reasonable idea of what happens in userspace and kernel space when the shell locates the program that ls refers to. Now, I am ready to see how the binary is executed.
