How compilers work
Symbol Table
To represent the source code compactly in memory, the scanner replaces each recognized keyword, each variable name, and every other element with a symbol. A rather long variable name, for example, can be replaced by two numbers: The first number identifies the element as a variable name, and the second specifies the row in a table, the symbol table, that holds the name the developer actually used. During syntax and semantic analysis, additional information, such as the type of the variable, is added to the symbol table.
To find an entry in the symbol table as quickly as possible, many compilers run the variable name through a hash function that can be calculated quickly. This step spits out a number, which the compiler then uses as an index in the symbol table. If the compiler encounters the variable name again later on, it simply calculates the hash value and immediately gets the location in the symbol table with all important information about the variable – such as its type. The compiler does not have to search through the entire table.
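The hashed lookup described above can be sketched in a few lines of Python. This is a minimal illustration and not how any particular compiler does it: the class name SymbolTable, the table size of 64, the multiply-by-31 hash, and the use of per-slot lists to absorb collisions are all assumptions made for the example.

```python
class SymbolTable:
    """Fixed-size hash table mapping variable names to their attributes."""

    def __init__(self, size=64):
        self.size = size
        # Each slot holds a list of (name, info) pairs so that two names
        # hashing to the same index can coexist (an assumed strategy).
        self.slots = [[] for _ in range(size)]

    def _index(self, name):
        # A deliberately cheap hash: fold the character codes into a
        # number in the range 0..size-1.
        h = 0
        for ch in name:
            h = (h * 31 + ord(ch)) % self.size
        return h

    def insert(self, name, **info):
        bucket = self.slots[self._index(name)]
        for entry_name, entry_info in bucket:
            if entry_name == name:
                # Name already known: add details such as the type.
                entry_info.update(info)
                return entry_info
        bucket.append((name, dict(info)))
        return bucket[-1][1]

    def lookup(self, name):
        # Only the one slot selected by the hash is inspected,
        # never the whole table.
        for entry_name, entry_info in self.slots[self._index(name)]:
            if entry_name == name:
                return entry_info
        return None


table = SymbolTable()
table.insert("counter")                # scanner meets the name for the first time
table.insert("counter", type="int")    # a later phase records the type
print(table.lookup("counter"))
```

When the name counter turns up again, lookup recomputes the hash and goes straight to the matching slot, where the type recorded earlier is waiting.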
Both the scanner and the semantic analysis routines generate a number of tables in addition to the symbol table. Among other things, these tables contain the nesting structure of the loops and the loop variables. The compiler accesses the information in these tables repeatedly, including in later phases.
Interpreter, Assembler, and Translator
Unlike the compiler, an interpreter reads the source code and executes it directly; thus, no object code is generated. The classic interpreters analyze each command in the source code one after another.
Modern interpreters first convert the complete source code into a special, optimized internal representation. The interpreter can then execute this intermediate code, or bytecode, much faster than the raw source text. Sometimes a just-in-time compiler translates the internal representation into machine language, which further increases the execution speed. Java uses this procedure.
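To make the two-step approach concrete, here is a toy sketch in Python. It is not how Java or any real interpreter works internally; the instruction names PUSH, ADD, and MUL, the postfix input format, and both function names are assumptions chosen to keep the example short.

```python
def compile_to_bytecode(tokens):
    """Translate a postfix token list into instructions for a stack machine."""
    code = []
    for tok in tokens:
        if tok in ("+", "*"):
            code.append(("ADD" if tok == "+" else "MUL", None))
        else:
            code.append(("PUSH", int(tok)))
    return code


def run(code):
    """Execute the bytecode directly; no machine code is generated."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
    return stack.pop()


bytecode = compile_to_bytecode("2 3 4 * +".split())
result = run(bytecode)   # evaluates 2 + 3 * 4
print(bytecode)
print(result)
```

The conversion happens once, after which run only has to dispatch on compact instructions instead of analyzing text again on every pass.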
An assembler is a special form of compiler that translates a program written in assembly language into machine language. Because assembly language is usually a symbolic representation of the machine instructions, the two languages are very similar. The generic term translator usually refers to all three (i.e., compilers, assemblers, and interpreters).
Internal Representation of the Source Code
The scanner passes the symbols it has identified to the parser. Syntax and semantic analysis end with an internal representation of the source code. What this representation looks like depends on the compiler and the language. The program could be stored as a syntax tree or in Polish notation. Many compilers also use quadruples; for example, the statement A = B + A would become:
+, B, A, T1
=, T1, A
T1 is a temporary variable created by the compiler.
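The following Python sketch shows one way such quadruples could be produced. It borrows Python's own ast module as a stand-in for the scanner and parser, which is purely an assumption for illustration; the function name to_quadruples and the T1, T2, ... naming scheme are likewise made up for the example.

```python
import ast
import itertools


def to_quadruples(source):
    """Turn simple assignments such as 'A = B + A' into a list of quadruples."""
    temps = (f"T{n}" for n in itertools.count(1))
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
    quads = []

    def emit(node):
        # Return the name that holds the node's value, emitting
        # quadruples for nested operations along the way.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp):
            left, right = emit(node.left), emit(node.right)
            temp = next(temps)
            quads.append((ops[type(node.op)], left, right, temp))
            return temp
        raise NotImplementedError(type(node).__name__)

    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.Assign):
            raise NotImplementedError("only assignments are supported")
        target = stmt.targets[0]
        if not isinstance(target, ast.Name):
            raise NotImplementedError("only plain variable targets are supported")
        quads.append(("=", emit(stmt.value), target.id))
    return quads


quads = to_quadruples("A = B + A")
for quad in quads:
    print(", ".join(quad))
```

For the statement from the text, the sketch emits the addition into T1 first and the assignment to A second, matching the two quadruples shown above.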
So far, the compiler has only analyzed the source program. For this reason, experts also refer to this first phase as the analysis phase.
Generating Code
In the next step, another component optimizes the internal representation. As a rule, the compiler optimizes for run time and assigns memory locations to the variables. In the preceding example, the compiler would try to eliminate the temporary variable T1.
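Eliminating the temporary variable can be pictured as a small rewriting pass over the quadruples. The Python sketch below is a simplified assumption of how such a pass might look, not a description of a real optimizer: it treats every name starting with T as a compiler temporary and assumes the temporary is not read again later.

```python
def eliminate_temporaries(quads):
    """Fold 'op, x, y, T' followed by '=, T, dest' into 'op, x, y, dest'."""
    result = []
    i = 0
    while i < len(quads):
        quad = quads[i]
        nxt = quads[i + 1] if i + 1 < len(quads) else None
        copies_temp = (
            len(quad) == 4
            and nxt is not None
            and nxt[0] == "="
            and nxt[1] == quad[3]
            and quad[3].startswith("T")   # assumed naming rule for temporaries
        )
        if copies_temp:
            # Write the result straight into the final destination.
            result.append((quad[0], quad[1], quad[2], nxt[2]))
            i += 2
        else:
            result.append(quad)
            i += 1
    return result


optimized = eliminate_temporaries([("+", "B", "A", "T1"), ("=", "T1", "A")])
print(optimized)
```

The two quadruples for A = B + A collapse into a single one that adds B and A and stores the result directly in A.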
In the last phase, the compiler finally generates the executable machine code. Generally, programmers call this object code or simply code. Under Linux, it is usually either a (dynamic) library or the executable program.
Some compilers also produce assembler code, which is then converted into machine language by a downstream assembler. The compiler could generate the following code from A = A + B:
lda a ; load a in the accumulator
add b ; add b to the accumulator
sto a ; store accumulator to a
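A code generator for such an accumulator machine can be sketched in Python as follows. The mnemonics lda, add, and sto come from the listing above; the sub mnemonic, the quadruple input format, and the function name generate are assumptions added for the example.

```python
# "add" appears in the listing above; "sub" is a hypothetical extra mnemonic.
MNEMONICS = {"+": "add", "-": "sub"}


def generate(quads):
    """Emit accumulator-style assembler lines for a list of quadruples."""
    lines = []
    for quad in quads:
        if quad[0] == "=":
            # Plain copy: load the source, store it in the destination.
            _, source, dest = quad
            lines += [f"lda {source.lower()}", f"sto {dest.lower()}"]
        else:
            op, left, right, dest = quad
            lines += [
                f"lda {left.lower()}",                 # load left operand
                f"{MNEMONICS[op]} {right.lower()}",    # combine with right operand
                f"sto {dest.lower()}",                 # store the accumulator
            ]
    return lines


assembly = generate([("+", "A", "B", "A")])
print("\n".join(assembly))
```

Fed the single quadruple for A = A + B, the sketch reproduces the three instructions from the listing.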
Control structures such as if, while, and for can usually be mapped to the processor's jump instructions. A more complex loop such as for can optionally be rewritten as an equivalent (longer) while loop.
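The mapping of loops to jumps can be illustrated with a short Python sketch that prints pseudo-assembler text. The instruction names jump and jump_if_false, the label scheme, and both function names are hypothetical; a real compiler would emit the jump instructions of its target processor.

```python
def lower_while(condition, body, label="loop"):
    """Map 'while condition: body' onto labels and jump instructions."""
    start, end = f"{label}_start", f"{label}_end"
    return (
        [f"{start}:", f"    jump_if_false {condition}, {end}"]
        + [f"    {line}" for line in body]
        + [f"    jump {start}", f"{end}:"]
    )


def lower_for(init, condition, step, body, label="loop"):
    """Rewrite a for loop as its initialization followed by a while loop."""
    return [f"    {init}"] + lower_while(condition, body + [step], label)


code = lower_for("i = 0", "i < 10", "i = i + 1", ["sum = sum + i"])
print("\n".join(code))
```

The for loop contributes nothing new at the jump level: Its initialization runs once before the loop, and its step is simply appended to the body of the while loop.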
Because it knows the processor's instruction set and has the information from the tables, the compiler can also make the code more compact. The stack is used for function calls: Before a function starts, the code generated by the compiler pushes the function's arguments and the return address onto the stack. The processor then executes the function. Finally, the generated code has to clean up the stack; current processors provide special instructions to support the compiler in this task.
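The call sequence can be simulated in Python with an ordinary list standing in for the processor stack. This is only a model under stated assumptions: the function name call, the numeric return address, and the decision to let the caller remove the arguments are illustrative choices, and real calling conventions differ between processors and compilers.

```python
def call(stack, function, args, return_address):
    """Simulate a call: push arguments and return address, run, clean up."""
    # Caller side: push the arguments, then the return address.
    for arg in args:
        stack.append(arg)
    stack.append(return_address)
    depth_during_call = len(stack)

    # Callee side: the arguments sit just below the return address.
    result = function(*stack[-1 - len(args):-1])

    # Cleanup: pop the return address, then discard the arguments.
    resume_at = stack.pop()
    del stack[len(stack) - len(args):]
    return result, resume_at, depth_during_call


stack = []
result, resume_at, depth = call(stack, lambda x, y: x + y, [2, 3], return_address=0x40)
print(result, hex(resume_at), depth, stack)
```

During the call the stack holds the two arguments plus the return address; afterward it is empty again, which is the cleanup step described above.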