How compilers work
Symbol Table
To map the source code compactly in memory, the scanner replaces each recognized keyword, each variable name, and all other elements with a symbol. You could replace the rather long variable names with two numbers: The first number is used as a substitute for the variable name. The second number specifies the row in a table, the symbol table, which contains the variable name used by the developer. During syntax and semantics analysis, additional information, such as the type of variable, appears in the symbol table.
To find an entry in the symbol table as quickly as possible, many compilers run the variable name through a hash function that can be calculated quickly. This step spits out a number, which the compiler then uses as an index in the symbol table. If the compiler encounters the variable name again later on, it simply calculates the hash value and immediately gets the location in the symbol table with all important information about the variable – such as its type. The compiler does not have to search through the entire table.
Both the scanner and the semantic routine generate a number of tables in addition to the symbol table. Among other things, these tables contain the nesting structure for the loops and the loop variables. The compiler repeatedly accesses the information in the tables, even later on.
Interpreter, Assembler, and Translator
Unlike the compiler, an interpreter reads the source code and executes it directly; thus, no object code is generated. The classic interpreters analyze each command in the source code one after another.
Modern interpreters convert the complete source code into a special optimized internal representation. The interpreter then executes this intermediate or byte code much faster. Sometimes a just-in-time compiler translates the internal representation into machine language, which further increases the execution speed. Java uses this procedure.
An assembler is a special form of the compiler that translates a program into assembler or machine language. Since assembler is usually a symbolic representation of the machine commands, both languages are similar. The generic term translator usually refers to all three (i.e., compilers, assemblers, and interpreters).
Internal Representation of the Source Code
The scanner passes the symbols that it has determined to the parser. The syntax and semantic analysis ends with an internal representation of the source code. The compiler and the languages are responsible for what this representation looks like. The program could be present as a syntax tree or in Polish notation. Many compilers also use quadruples, for example, from the A = B + A
statement it would be:
+, B, A, T1 =, T1, A
T1
is a temporary variable created by the compiler.
So far, the compiler has only analyzed the source program. For this reason, experts also refer to this first phase as the analysis phase.
Generating Code
In the next step, another component optimizes the internal presentation. As a rule, the compiler optimizes the run time and assigns memory locations to the variables. In the preceding example, the compiler would try to eliminate the temporary variable T1
.
In the last phase, the compiler finally generates the executable machine code. Generally, programmers call this object code or simply code. Under Linux, it is usually either a (dynamic) library or the executable program.
Some compilers also produce assembler code, which is then converted into machine language by a downstream assembler. The compiler could generate the following code from A = A + B
:
lda a ; load a in the accumulator add b ; add b to the accumulator sto a ; store accumulator to a
Control structures such as if
, while
, and for
can usually be mapped with the jump instructions of the processor. Complex loops, such as for
, may optionally replace a (longer) while
loop.
Because it knows the processor instruction set, and the information from the tables, the compiler also makes the code more compact. The stack is used for function calls: Before starting a function, the compiler dumps its arguments and the return address onto the stack. The processor then performs the function. Finally, the compiler has to clean up the stack; current processors use special commands to support the compiler in this task.
« Previous 1 2 3 Next »
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.
News
-
Gnome OS Adopting systemd-sysupdate
Gnome OS is about to undergo a major under-the-hood change that promises enhanced security.
-
Endless OS 6 has Arrived
After more than a year since the last update, the latest release of Endless OS is now available for general usage.
-
Fedora Asahi 40 Remix Available for Macs with Apple Silicon
If you've been anticipating KDE's Plasma 6 for your Apple Silicon-powered Mac, then you're in luck.
-
Red Hat Adds New Deployment Option for Enterprise Linux Platforms
Red Hat has re-imagined enterprise Linux for an AI future with Image Mode.
-
OSJH and LPI Release 2024 Open Source Pros Job Survey Results
See what open source professionals look for in a new role.
-
Proton 9.0-1 Released to Improve Gaming with Steam
The latest release of Proton 9 adds several improvements and fixes an issue that has been problematic for Linux users.
-
So Long Neofetch and Thanks for the Info
Today is a day that every Linux user who enjoys bragging about their system(s) will mourn, as Neofetch has come to an end.
-
Ubuntu 24.04 Comes with a “Flaw"
If you're thinking you might want to upgrade from your current Ubuntu release to the latest, there's something you might want to consider before doing so.
-
Canonical Releases Ubuntu 24.04
After a brief pause because of the XZ vulnerability, Ubuntu 24.04 is now available for install.
-
Linux Servers Targeted by Akira Ransomware
A group of bad actors who have already extorted $42 million have their sights set on the Linux platform.