Delve into ELF Binary Magic
Discover what goes on inside executable files, how to reverse-engineer them, and how to make them as small as possible.
Back in the good old days, you could leave your door unlocked at night, music made sense, and writing computer programs was simply a case of putting some CPU instructions in the right order. Today, we have a mammoth range of libraries, toolkits, abstraction layers, and other things that make writing large programs easier – but it's increasingly difficult to understand what the CPU is actually doing. Open up LibreOffice, for example, and type a dot (period) character. What exactly happens here? How many CPU instructions are being executed between your finger hitting the key and that dot appearing on the screen?
Now, we don't want to sound like old codgers who think that everything should be written in assembly language. There's a reason we have these layers of abstraction: They make software safer, easier to understand, and more portable. But sometimes it's good to go low-level and interact more closely with the CPU and operating system, to better understand what's going on. So, in this article, we'll get down and dirty with CPU instructions, the ELF executable format, and reverse-engineering binary files so you can see what they do.
I Can C Clearly Now
Let's start by writing a very simple C program. Put this into a file called test.c in your home directory:
#include <stdio.h>

int main() {
    puts("Ciao!");
}
Now compile it in a terminal and then run it, using the following commands:
gcc test.c -o test
./test
As you'd expect, our "test" program simply prints the word "Ciao!" on the screen, using the standard C library's puts (put string) routine – no surprises there. But enter ls -l test, and you'll notice something odd: The program is around 8KB in size! Sure, that may sound trivial in today's world of terabyte hard drives, but 8KB is actually huge for a program so simple. (Consider that space exploration classic Elite, back in 1984, was squeezed into 22KB of RAM [1]. That included a whole galaxy to explore, 3D spacecraft, missions, trading, and more. And yet our "Ciao" program is more than a third of that size.)
Part of that bulk is a symbol table: information generated by the compiler and linker (such as function names) that's handy for debugging but not needed to run the program. Let's remove it:
strip test
Now do ls -l test again, and you'll see that it's slightly smaller – down to 6KB. But that still feels overly large. The Commodore 64's operating system and support routines (aka "KERNAL") fit into 8KB, so we must be able to do better.
When we start poking around inside our "test" binary executable file, however, we see that most of the data inside it has nothing to do with the printing bit. Run this command to see what kind of ASCII (text) data is stored inside the file:
strings test
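The strings tool isn't doing anything magical: It just scans the file for runs of printable characters. Here's a minimal C sketch of the idea. The function name print_strings is our own, and we assume plain ASCII and a minimum run length of four characters (GNU strings' default); the real tool has many more options.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Print every run of at least min_len printable ASCII bytes in buf,
 * one per line, and return how many runs were found. */
size_t print_strings(const unsigned char *buf, size_t len, size_t min_len)
{
    size_t found = 0, start = 0, run = 0;

    /* Go one step past the end so that a run reaching the last byte is flushed too. */
    for (size_t i = 0; i <= len; i++) {
        int printable = i < len && (isprint(buf[i]) || buf[i] == '\t');
        if (printable) {
            if (run == 0)
                start = i;          /* a new run begins here */
            run++;
        } else {
            if (run >= min_len) {
                printf("%.*s\n", (int)run, (const char *)buf + start);
                found++;
            }
            run = 0;
        }
    }
    return found;
}
```

Feed it the bytes of test (read in with fread(), for example) and you'll get a rough approximation of what strings prints.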
Running strings test gives results like those in Figure 1. There are lots of text strings there generated by GCC that are of no importance to us, but if we look closely, we can see the "Ciao" string somewhere among all the gobbledygook. Let's see exactly what kind of file test is, using the command file test. Your results will look something like this:
test: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=83b8ef795a8d78af706be36db142d9d64aca2307, stripped
Wow, that's quite a bit of info. The most important part is "ELF": This means that the file is in "Executable and Linkable Format," which is the standard format on all GNU/Linux systems. But what exactly is so special about this format? What does it do?
What's in an ELF?
Well, back in the days of early 8-bit and 16-bit computers and .com files on MS-DOS, executable files were simply bundles of CPU instructions and data. The operating system loaded them into a position in RAM and handed over execution to that position. There was no checking of the executable code beforehand, nor was there a distinct separation of code and data. It was a free for all, so to speak.
In modern operating systems, the situation is different. Multiple programs run simultaneously, they can be loaded into many different places in RAM, and they are made up of multiple sections. ELF files like our test program begin with a header that doesn't include executable code but provides the operating system with information about the program (e.g., the data we saw from the file command above).
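That header is a fixed-size structure, and glibc ships its layout in <elf.h> as Elf64_Ehdr (with Elf32_Ehdr for 32-bit files). The following is a minimal sketch, assuming a 64-bit ELF file that's already been loaded into memory; read_elf64_header, elf_type_name, and print_elf64_header are names we made up. The e_type field also explains the "shared object" wording in the file output: Most distributions configure GCC to build position-independent executables (PIE) by default, and these use the same ET_DYN type as shared libraries.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the header at the start of buf into *out and check that it really is
 * a 64-bit ELF file. Returns 0 on success, -1 otherwise. */
int read_elf64_header(const unsigned char *buf, size_t len, Elf64_Ehdr *out)
{
    if (len < sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out);   /* memcpy sidesteps any alignment problems */

    /* Every ELF file starts with the four magic bytes 0x7f 'E' 'L' 'F'. */
    if (memcmp(out->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    /* The next byte says whether this is a 32-bit or a 64-bit file. */
    if (out->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    return 0;
}

/* e_type is where file's "shared object" wording comes from. */
const char *elf_type_name(uint16_t type)
{
    switch (type) {
    case ET_REL:  return "relocatable object (.o file)";
    case ET_EXEC: return "executable at a fixed address";
    case ET_DYN:  return "shared object (also used for PIE executables)";
    case ET_CORE: return "core dump";
    default:      return "unknown";
    }
}

/* Print a few of the fields the operating system cares about. */
void print_elf64_header(const Elf64_Ehdr *eh)
{
    printf("type:            %s\n", elf_type_name(eh->e_type));
    printf("entry point:     0x%llx\n", (unsigned long long)eh->e_entry);
    printf("program headers: %u\n", (unsigned)eh->e_phnum);
    printf("section headers: %u\n", (unsigned)eh->e_shnum);
}
```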
After the header, an ELF file contains other sections: one for executable code, one for read-only data (like the "Ciao" string), and one for data that can be changed. Keeping these separate is an important security measure, as the operating system then knows which things can be changed, and which cannot. You don't want a compromised program able to start modifying its own code on the fly, for instance. A simplified view of ELF file structure is shown in Figure 2.
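To see this separation for yourself, the following sketch walks an ELF file's section header table and classifies each section by its flags: SHF_EXECINSTR marks code, SHF_WRITE marks data that can change, and a loaded (SHF_ALLOC) section with neither flag is read-only. It assumes a 64-bit ELF file with the same byte order as the machine running the code, and section_kind and list_sections are hypothetical names. The command readelf -S test shows the same information in much more detail.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Map a section's flags onto the code / read-only data / writable data split. */
const char *section_kind(uint64_t flags)
{
    if (!(flags & SHF_ALLOC))
        return "not loaded at run time";
    if (flags & SHF_EXECINSTR)
        return "code";
    if (flags & SHF_WRITE)
        return "writable data";
    return "read-only data";
}

/* Walk the section header table of a 64-bit ELF image held in memory,
 * printing each section's name and kind. Returns the section count or -1. */
static int walk_sections(const unsigned char *buf, size_t size)
{
    Elf64_Ehdr eh;
    Elf64_Shdr names;

    if (size < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    /* Make sure the section headers and the section-name table lie inside the file. */
    if (eh.e_shoff == 0 || eh.e_shstrndx >= eh.e_shnum ||
        eh.e_shoff + (uint64_t)eh.e_shnum * sizeof(Elf64_Shdr) > size)
        return -1;
    memcpy(&names, buf + eh.e_shoff + (size_t)eh.e_shstrndx * sizeof names, sizeof names);
    if (names.sh_offset + names.sh_size > size)
        return -1;

    for (int i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        memcpy(&sh, buf + eh.e_shoff + (size_t)i * sizeof sh, sizeof sh);
        const char *name = sh.sh_name < names.sh_size
                               ? (const char *)buf + names.sh_offset + sh.sh_name
                               : "?";
        printf("%-20s %s\n", name, section_kind(sh.sh_flags));
    }
    return eh.e_shnum;
}

/* Load a whole file into memory and list its sections. Returns the count or -1. */
int list_sections(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    long size = -1;
    unsigned char *buf = NULL;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    if (size > 0 && fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)size);
        if (buf && fread(buf, 1, (size_t)size, f) != (size_t)size) {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    if (!buf)
        return -1;

    int n = walk_sections(buf, (size_t)size);
    free(buf);
    return n;
}
```

Pointed at our test program, you'd expect .text to come out as code, .rodata (where GCC puts the "Ciao!" string) as read-only data, and .data as writable data. Running it before and after strip also shows the .symtab section disappearing.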
So, that's one reason why our "test" file is larger than we'd expect – but there are others. We can "disassemble" the executable file to show the assembly language that corresponds to the CPU instructions like so:
objdump -d test
This produces a lot of output, so you may want to pipe it through less to view it: objdump -d test | less. As you scroll around, look for the .text section, which, despite its name, contains the main code of our program. If you've never seen assembly language before, you may be surprised at how many instructions are there, just to print a single word. Are they all necessary?
The answer is no. GCC and other parts of the compiler tool chain add a lot of boilerplate and setup code that's useful but not strictly needed. Most of this is added late in the process, when the linker combines our code with startup files from GCC and the C library. To see the raw assembly language that GCC initially generates from our C code, run this command:
gcc -S test.c
This generates a file called test.s – have a look inside it, and you'll see results like in Figure 3. The unindented names on the left that end in a colon are labels that point to specific parts of the code; the indented lines beginning with a dot are directives for the assembler rather than CPU instructions. You can see that the .LC0 label points to our "Ciao" string, while the main program code begins at .LFB0.
Various assembly language instructions set up the program and memory, but the two that do the work of putting "Ciao" on the screen are these:
leaq .LC0(%rip), %rdi
call puts@PLT
We won't go into the specifics of assembly language right now, but in a nutshell: This code puts the location of the text string into the rdi register (where the x86-64 calling convention expects a function's first argument) and then executes (calls) the standard C library's puts routine. Once the C library has done its work, it hands control back to our program, which does a bit of cleaning up before it ends with the ret (return) instruction. That hands control back to the C library's startup code, which then ends the program.
You Can Go Your Own Way
So, we've poked around inside an executable generated from a C file and done some reverse-engineering on it; now let's look at making the program as small as possible. One thing we want to do is remove our dependency on the GNU C library (glibc). Running this command
ldd test
shows the libraries on which test depends – and one of them is libc.so.6. The output of ldd shows where that library is on your system (it may be a symlink to another file), so with ls -lL followed by the full filename (the -L option follows symlinks) you can see how big it is. On our system, the C library weighs in at 1.8MB, but if you're running a super-sized Gentoo setup with optimizations galore, you may have shrunk it down a bit. In any case, it's a hefty dependency that we'd like to get rid of.
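ldd gets its list from the dynamic loader (ld-linux) itself. A running program can ask for something similar through glibc's dl_iterate_phdr() function, which calls back once for each object mapped into the process. Here's a sketch; list_loaded_objects and print_object are our own names.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Called once per loaded object: the program itself, the dynamic loader, libc, ... */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    int *count = data;
    /* The main program is reported with an empty name. */
    const char *name = info->dlpi_name[0] ? info->dlpi_name : "(main program)";
    printf("%s loaded at 0x%lx\n", name, (unsigned long)info->dlpi_addr);
    ++*count;
    return 0;   /* returning 0 keeps the iteration going */
}

/* Print every object mapped into this process and return how many there were. */
int list_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```

Call list_loaded_objects() from main() in a dynamically linked program, and libc.so.6 should show up in the list alongside the program itself.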
But how do we print a message on the screen without using puts, printf, or other common routines from the C library? Well, we can actually get the kernel to do the work for us. The Linux kernel includes a bunch of system calls for doing crucial tasks: opening and closing files, starting processes, and basic input and output. Many standard C library routines act as fancy wrappers around these system calls, adding extra features and checks to reduce bugs, which is why few programs interact directly with the kernel. But we're going to do it!
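Here's what those layers look like from C. The sketch writes "Ciao!" three ways: through puts, through glibc's write() wrapper, and through the generic syscall() function with SYS_write from <sys/syscall.h>, which expands to the right system call number for whatever architecture you compile on. The via_* function names are our own. Note that syscall() is itself a small glibc helper, so this still links against the C library; dropping the library completely is where assembly language comes in.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static const char msg[] = "Ciao!\n";

/* Layer 1: the C library's buffered output, as used by test.c. */
int via_puts(void)
{
    int r = puts("Ciao!");   /* puts appends the newline itself */
    fflush(stdout);          /* push stdio's buffer through to the kernel right away */
    return r;
}

/* Layer 2: glibc's thin wrapper around the write system call. */
ssize_t via_write(void)
{
    return write(STDOUT_FILENO, msg, sizeof msg - 1);   /* 6 bytes, no trailing NUL */
}

/* Layer 3: ask the kernel directly, by system call number. */
long via_syscall(void)
{
    return syscall(SYS_write, STDOUT_FILENO, msg, sizeof msg - 1);
}
```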
We'll write a short assembly language program that does exactly the same job as the test.c program we created earlier. (Note that we're using 32-bit x86 assembly language code here, so it'll work on 32-bit and 64-bit Intel/AMD PCs, but not on other architectures like the Raspberry Pi.) To convert the assembly code into an executable, we'll use the NASM assembler, so install it from your distro's package manager or – on Ubuntu-based distributions – enter the following:
sudo apt-get install nasm
Then, enter the following into a text editor and save it as test2.asm (Figure 4):
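section .data
msg     db "Ciao!", 10      ; the string, followed by a line feed (ASCII 10)

section .text
global _start

_start:
        mov ecx, msg        ; address of the string
        mov edx, 6          ; number of bytes to print
        mov ebx, 1          ; file descriptor 1 = stdout
        mov eax, 4          ; system call 4 = write (32-bit x86)
        int 0x80            ; hand control to the kernel

        mov ebx, 0          ; exit status 0 (ebx still held 1 from the write)
        mov eax, 1          ; system call 1 = exit
        int 0x80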
Assemble it into a binary executable file (test2) and run it using these commands:
nasm -f elf test2.asm
ld -m elf_i386 -s -o test2 test2.o
./test2
Et voila – "Ciao!" is printed on the screen, just like with the C program we created at the start of this tutorial. But this program is very different, in that it uses a kernel system call to display the text on the screen.
At the start, we set up a "data" section, which contains our "Ciao!" text string. This is put next to a label called msg, which identifies exactly where the string can be found. Note that we end our string with the number 10, which is the ASCII code for a line feed (like pressing Enter – see the online ASCII chart [2] for a reference).
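In C terms, the db line lays out the six bytes below. Unlike a C string literal, NASM doesn't add a terminating zero byte, which is why the length handed to the kernel is exactly six. This is a small sketch with made-up names; in NASM itself, a line such as len equ $ - msg placed directly after the string would let the assembler count the bytes for you instead of hard-coding 6.

```c
#include <string.h>

/* The bytes laid out by NASM's  msg db "Ciao!", 10  line.
 * Unlike a C string literal, there is no terminating zero byte. */
static const unsigned char msg[] = { 'C', 'i', 'a', 'o', '!', 10 };

/* With no terminator to subtract, sizeof gives exactly the count passed in edx. */
enum { MSG_LEN = sizeof msg };

/* Returns 1 if msg holds the same bytes as the C string "Ciao!\n",
 * i.e., if 10 really is the line feed character '\n'. */
int msg_is_ciao_newline(void)
{
    return MSG_LEN == 6 && memcmp(msg, "Ciao!\n", MSG_LEN) == 0;
}
```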
So with our string prepared, we can start writing CPU instructions and talk to the Linux kernel. You may recall that the "text" section is, rather confusingly, the one that contains code, so that's where our instructions go. We immediately create another label called _start, which points to the beginning of the code. Thanks to the global line, the linker can see this label; it uses _start as the program's entry point and records its address in the ELF header, which is how the operating system knows exactly where the program begins.
Next up, we need to populate some registers with important data. Registers are a bit like variables, in that they can store many different values, but they are actually storage spaces built into the CPU. Working with them is extremely fast compared to regular RAM, but there is only a very limited set of registers.
Anyway, before we tell the Linux kernel to display the string, we need to provide it with some information. First, we put the location of the string into the ecx register. (In NASM's Intel syntax, the destination comes first, so data flows from right to left: The first mov here means move – actually copy – the value of msg into the ecx register.)
Second, we need to tell the kernel how many characters in the string to print. With the exclamation point and trailing line feed (10) character, that's six characters in total, so we put 6 in the edx register. Then we put 1 into the ebx register, which tells the kernel where to print the text (file descriptor 1, i.e., stdout), and 4 into the eax register, which tells it which specific system call to use (in our case, write).
With all the registers set up, the int 0x80 instruction does the magic of "interrupting" our program and handing control over to the Linux kernel. The kernel looks at the eax register and thinks: "Aha, the calling program wants me to run the 'write' system call. Let's see what's in the other registers, to find out where the string is, how long it is, and where I should display it."
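To make that dispatch step concrete, here's a toy model of it in C. It's purely an illustration: struct i386_regs and toy_int80 are hypothetical names, and the real kernel does this in its entry code, not in a C switch statement. The numbers 4 (write) and 1 (exit) belong to the 32-bit x86 system call table; 64-bit x86 uses different numbers, so the sketch forwards the write to the host's SYS_write instead of reusing 4.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The registers our 32-bit program fills in before int 0x80. ecx is modelled
 * as a pointer for simplicity; on real 32-bit x86 it's just a 32-bit value. */
struct i386_regs {
    uint32_t eax;      /* system call number */
    uint32_t ebx;      /* 1st argument: file descriptor (write) or exit status (exit) */
    const void *ecx;   /* 2nd argument: address of the data */
    uint32_t edx;      /* 3rd argument: number of bytes */
};

/* Toy dispatcher: look at eax first, then pick up the arguments from the
 * other registers. Uses the 32-bit x86 numbers 4 = write and 1 = exit. */
long toy_int80(const struct i386_regs *r)
{
    switch (r->eax) {
    case 4:
        /* Forward to the host's real write call (on x86-64, SYS_write is 1, not 4). */
        return syscall(SYS_write, (long)r->ebx, (long)(uintptr_t)r->ecx, (long)r->edx);
    case 1:
        _exit((int)r->ebx);            /* never returns */
    default:
        return -ENOSYS;                /* the kernel's answer to an unknown number */
    }
}
```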
Once the kernel has done its work, it hands execution back to our program. Then we put 1 into the eax register and call the kernel again – this time the 1 value tells the kernel to safely terminate our program, using the value in ebx as the exit status. And that's it!
Now run ls -l test2, and you'll see that the executable is down to around 350 bytes (the exact figure depends on your NASM and linker versions)! That's way, way smaller than the C equivalent. We've still created a valid ELF executable file, but there's none of the extra startup and cleanup code added by the C compiler, nor are we using a C library.
And guess what? It's possible to make this executable even smaller! This involves some rather advanced tricks and hacks, but if this tutorial has whetted your appetite for minimalism, check out the fascinating "Creating Really Teensy ELF Executables for Linux" guide by Brian Raiter [3].
ELF and ARM
Although we focused on x86 assembly language in the latter part of this tutorial, ELF files are not constrained by any particular CPU architecture. If you have a Raspberry Pi, for instance, go into the /bin directory and run file on a few of the executables there; you'll notice that they're also ELF files – but for the ARM architecture.
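The file command works out the architecture from the e_machine field of the ELF header, which sits at the same offset in 32-bit and 64-bit files. Here's a minimal sketch, assuming the file's byte order matches the machine running the code; machine_name and elf_machine are our own names, and the real file command knows far more machine types. Depending on whether your Pi runs a 32-bit or a 64-bit OS, you'll see either 32-bit ARM or AArch64.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Translate the ELF header's e_machine field into a readable name. */
const char *machine_name(uint16_t machine)
{
    switch (machine) {
    case EM_386:     return "Intel 80386";
    case EM_X86_64:  return "x86-64";
    case EM_ARM:     return "ARM (32-bit)";
    case EM_AARCH64: return "ARM aarch64 (64-bit)";
    default:         return "other";
    }
}

/* Read e_machine from the start of an ELF file held in memory.
 * Returns 0 on success, -1 if buf is too short or lacks the ELF magic. */
int elf_machine(const unsigned char *buf, size_t len, uint16_t *machine)
{
    if (len < sizeof(Elf32_Ehdr) || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;
    /* e_machine follows e_ident and e_type in both header layouts. */
    memcpy(machine, buf + offsetof(Elf32_Ehdr, e_machine), sizeof *machine);
    return 0;
}
```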
ARM is arguably a much more elegant and better designed instruction set than x86. The latter has a more limited set of registers (in 32-bit mode), and some instructions can only be performed on certain registers. There's also baggage everywhere, due to backwards compatibility over many decades. So, if you really want to get into assembly language, we recommend going with ARM first.
Sure, it's not the architecture used by most desktop and laptop PCs, but it's absolutely everywhere – in smartphones, embedded devices, and of course the Raspberry Pi. We ran a tutorial on ARM assembly previously that you can find on the website [4]. One especially fun ARM device to play around with and write assembly code for is the Nintendo Game Boy Advance. It's a fairly simple machine compared to the Pi, but you can do a lot with it. Patater [5] has a pretty good outline of the essentials, covering common ARM CPU instructions and how to interface with the Game Boy Advance's hardware.
Infos
[1] Elite: http://news.bbc.co.uk/2/hi/technology/8261272.stm
[2] ASCII chart: http://asciichart.com
[3] ELF Executables for Linux: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html
[4] ARM assembly tutorial: https://www.linuxvoice.com/creative-commons-issues/
[5] Patater: https://patater.com/gbaguy/gbaasm.htm