A Deep Dive into the ELF File Format

The Section Headers

Each section header describes one section: its type, permissions, size, location (on disk and in memory), and other miscellaneous information. The raw data for the sections appears at the end of the ELF file. This raw data consists of executable code, program data such as global objects, and information used during linking. Some ELF images may also contain debugging data in DWARF format.

Table 1 shows some common section names and their descriptions. Most of these need little explanation. Note that the capitalized section names used in the source code don't make it into the final ELF file; they're just labels for the benefit of the assembler. The real section names are by convention lowercase and start with a dot. For more information on the .gnu.hash section, see "GNU Hash ELF Sections" online [4]; the accompanying Git repository contains a small C utility that will generate a GNU-style hash for any symbol.

Table 1: Common Section Names

.interp: A null-terminated string requesting a program interpreter (also known as the runtime linker/loader). On Linux, this would typically be /lib64/ld-linux-x86-64.so.2; on FreeBSD it's /libexec/ld-elf.so.1.

.gnu.hash: When the dynamic linker combines all the objects in a process, it needs a way to discover quickly whether a particular symbol is present in a given object file. The GNU hash section provides a precomputed hash table to facilitate this.

.dynsym: A list of symbols that the object provides or requires. The first entry (index 0) must be a null symbol. For symbols we're providing ourselves, we need to supply the index of the section where the symbol's storage is located and a virtual address for the symbol.

.dynstr: Null-terminated strings, usually the names of libraries and functions needed in the link. This section, like .strtab and .shstrtab, is defined to begin and end with a null character.

.rela.plt: Relocations. Each relocation consists of three quad-words (24 bytes): the address of a slot the dynamic linker needs to fill in; an info field whose upper 32 bits hold the index of the corresponding symbol in .dynsym and whose lower 32 bits hold the relocation type (we'll only be using R_X86_64_JMP_SLOT = 7); and a constant addend.

.text: The actual executable code of the program; the program's entry point usually lies within this section.

.plt: Contains code used as a springboard to functions in other ELF images loaded in the same address space.

.got.plt: Contains the absolute addresses of functions in other ELF images loaded in the same address space.

.bss, .data: These sections contain only program data; here, they hold variables expected by libc.

.dynamic: An array of pairs of quad-words providing extra information to help with dynamic linking. The first quad-word of each pair is a tag, which can be thought of as a configuration option, and the second is its value. For example, DT_NEEDED followed by the offset of the string libc.so.7 in .dynstr indicates that this library is needed, and DT_GNU_HASH followed by a virtual address tells the dynamic linker where to find the .gnu.hash section. The array ends with a DT_NULL entry.

.symtab: Non-dynamic symbols; not usually loaded into memory at runtime.

.strtab: Non-dynamic strings referenced by .symtab.

.shstrtab: Contains the section names used by the section headers.

As with the program headers, section headers have a common format, so I wrap the declarations in a macro (Listing 3).

Listing 3: The Section Header Macro

We can then declare a section header with just one macro invocation:

SECTION_HEADER TEXT,SHSTRTAB.S6,SHT_PROGBITS,SHF_ALLOC or SHF_EXECINSTR,
LOAD_BASE + PLANE1 + SECTION_TEXT,SECTION_TEXT,TEXT_SIZE,0,0,0x10,0x0

Section Contents

This simple example does not require all the sections described in Table 1, but a brief description of the .text, .plt, .got.plt, and .rela.plt sections will give you an indication of how the sections are structured.

The .text section contains the executable code for the program proper (Listing 4). In this case, the code calls puts to print a string to the terminal and exit to return control to the operating system.

Listing 4: The main Function

x86-64 instructions often use relative addressing. For example, the near CALL instruction does not encode the absolute address of its destination; it encodes a signed 32-bit displacement measured from the end of the CALL instruction itself, so the same call is encoded differently depending on its distance from the code it's calling. As a result, a direct CALL cannot target an absolute address or reach a function more than 2GB away in either direction. The solution is the Procedure Linkage Table (PLT) and the Global Offset Table (GOT), which live in the .plt and .got.plt sections. The PLT provides call destinations that are local to the ELF image, so all of its labels can be reached by a 32-bit relative call. Each PLT entry then uses an indirect JMP instruction to jump to the real function, whose address the dynamic linker has placed in the GOT. The PLT and GOT also allow the dynamic linker to resolve function addresses only when required (lazy binding), speeding up the loading process.

Listing 5 defines some convenience macros for the .plt and .got.plt sections and then lists the sections themselves.

Listing 5: The PLT and GOT

Next, for the .rela.plt section, I define a macro for a single relocation, using the 24-byte structure described earlier (Listing 6).

Listing 6: Relocations

Exported and Imported Symbols

The executable exports two symbols, environ and __progname, expected by libc, and imports puts and exit. In Listing 7, I wrap these declarations in some convenience macros.

Listing 7: Imported and Exported Symbols
