How compilers work
Symbol Table
To store the source code compactly in memory, the scanner replaces each recognized keyword, each variable name, and every other element with a symbol. A rather long variable name, for example, could be replaced with two numbers: The first number stands in for the variable name, and the second specifies the row in a table, the symbol table, that contains the name the developer actually used. During syntax and semantic analysis, additional information, such as the variable's type, is added to the symbol table.
To find an entry in the symbol table as quickly as possible, many compilers run the variable name through a hash function that is cheap to compute. The result is a number, which the compiler then uses as an index into the symbol table. If the compiler encounters the variable name again later on, it simply recalculates the hash value and immediately has the location of the entry with all the important information about the variable, such as its type. The compiler does not have to search through the entire table.
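The hashed lookup described above can be sketched in Python. This is a minimal illustration, not how any particular compiler does it: the class name SymbolTable, the table size, the simple multiplicative hash, and the linear probing used when two names land on the same row are all assumptions made for the example.

```python
class SymbolTable:
    """Fixed-size hash table with linear probing (illustrative only)."""

    def __init__(self, size=64):
        # Each slot is either None or a dict describing one symbol.
        self.slots = [None] * size

    def _hash(self, name):
        # Deliberately simple and fast; real compilers use better functions.
        h = 0
        for ch in name:
            h = (h * 31 + ord(ch)) % len(self.slots)
        return h

    def _find_slot(self, name):
        # Start at the hashed row and step forward while other names occupy it.
        index = self._hash(name)
        for _ in range(len(self.slots)):
            entry = self.slots[index]
            if entry is None or entry["name"] == name:
                return index
            index = (index + 1) % len(self.slots)
        raise RuntimeError("symbol table is full")

    def insert(self, name, **info):
        # Create the entry on first sight, then merge in any extra information.
        index = self._find_slot(name)
        if self.slots[index] is None:
            self.slots[index] = {"name": name}
        self.slots[index].update(info)
        return index

    def lookup(self, name):
        # Returns the entry, or None if the name was never inserted.
        return self.slots[self._find_slot(name)]


table = SymbolTable()
row = table.insert("counter")          # the scanner records the name
table.insert("counter", type="int")    # semantic analysis adds the type later
print(row, table.lookup("counter"))
```

Both calls to insert reach the same row without scanning the table, which is the point of hashing the name.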
Both the scanner and the semantic routines generate a number of tables in addition to the symbol table. Among other things, these tables contain the nesting structure of the loops and the loop variables. The compiler accesses the information in these tables repeatedly, including in later phases.
Interpreter, Assembler, and Translator
Unlike the compiler, an interpreter reads the source code and executes it directly; thus, no object code is generated. The classic interpreters analyze each command in the source code one after another.
Modern interpreters convert the complete source code into a special optimized internal representation. The interpreter then executes this intermediate or byte code much faster. Sometimes a just-in-time compiler translates the internal representation into machine language, which further increases the execution speed. Java uses this procedure.
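The convert-then-execute approach might look roughly like the following Python sketch. It assumes a made-up four-instruction stack bytecode (PUSH, LOAD, ADD, SUB, MUL are hypothetical opcode names) and borrows Python's own ast module as the parser. It handles only simple arithmetic expressions and does no optimization or just-in-time compilation.

```python
import ast


def compile_expr(source):
    """Translate an arithmetic expression into a list of (opcode, operand) pairs."""
    ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
    code = []

    def emit(node):
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            emit(node.left)                       # operands first ...
            emit(node.right)
            code.append((ops[type(node.op)], None))  # ... then the operator
        elif isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.Name):
            code.append(("LOAD", node.id))
        else:
            raise SyntaxError("unsupported construct")

    emit(ast.parse(source, mode="eval").body)
    return code


def run(code, variables):
    """Execute the bytecode on a simple operand stack."""
    stack = []
    for opcode, operand in code:
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode == "LOAD":
            stack.append(variables[operand])
        else:
            right, left = stack.pop(), stack.pop()
            if opcode == "ADD":
                stack.append(left + right)
            elif opcode == "SUB":
                stack.append(left - right)
            else:  # MUL
                stack.append(left * right)
    return stack.pop()


bytecode = compile_expr("a + b * 2")
print(bytecode)
print(run(bytecode, {"a": 1, "b": 3}))  # prints 7
```

The source text is analyzed only once. After that, run can execute the same bytecode any number of times with different variable values.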
An assembler is a special form of compiler that translates a program written in assembly language into machine language. Because assembly language is usually a symbolic representation of the machine instructions, the two languages are similar. The generic term translator usually refers to all three (i.e., compilers, assemblers, and interpreters).
Internal Representation of the Source Code
The scanner passes the symbols that it has determined to the parser. The syntax and semantic analysis ends with an internal representation of the source code. What this representation looks like depends on the compiler and the language. The program could be present as a syntax tree or in Polish notation. Many compilers also use quadruples. For example, the statement A = B + A would become:
+, B, A, T1
=, T1, A
T1 is a temporary variable created by the compiler.
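As a rough illustration, the following Python sketch derives quadruples from a simple assignment. It leans on Python's ast module for parsing, which is an assumption of convenience. The function name to_quadruples and the tuple layout (operator, operand 1, operand 2, result) are likewise chosen for this example, and the unused operand of an assignment is stored as None.

```python
import ast
import itertools


def to_quadruples(statement):
    """Return a list of (operator, operand1, operand2, result) tuples."""
    symbols = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
    temps = (f"T{n}" for n in itertools.count(1))  # T1, T2, ...
    quads = []

    def walk(node):
        # Returns the name that holds the value of this subexpression.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in symbols:
            left, right = walk(node.left), walk(node.right)
            result = next(temps)  # fresh temporary for the intermediate value
            quads.append((symbols[type(node.op)], left, right, result))
            return result
        raise SyntaxError("unsupported construct")

    assign = ast.parse(statement).body[0]
    if not isinstance(assign, ast.Assign) or not isinstance(assign.targets[0], ast.Name):
        raise SyntaxError("expected 'name = expression'")
    value = walk(assign.value)
    quads.append(("=", value, None, assign.targets[0].id))
    return quads


for quad in to_quadruples("A = B + A"):
    print(quad)
# ('+', 'B', 'A', 'T1')
# ('=', 'T1', None, 'A')
```

Nested expressions such as A = B + C * D simply produce more temporaries, one per operator.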
So far, the compiler has only analyzed the source program. For this reason, experts also refer to this first phase as the analysis phase.
Generating Code
In the next step, another component optimizes the internal representation. As a rule, the compiler optimizes for run time and assigns memory locations to the variables. In the preceding example, the compiler would try to eliminate the temporary variable T1.
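One way such an elimination could work is sketched below in Python. It assumes quadruples are stored as (operator, operand 1, operand 2, result) tuples with None for an unused operand, that the caller passes in the set of compiler-generated temporaries, and that each temporary is read exactly once, directly after it is written. Real optimizers make none of these simplifications.

```python
def eliminate_temporaries(quads, temporaries):
    """Fold 'op a b -> T' followed by '= T -> X' into 'op a b -> X'."""
    optimized = []
    i = 0
    while i < len(quads):
        op, arg1, arg2, result = quads[i]
        following = quads[i + 1] if i + 1 < len(quads) else None
        if (
            following is not None
            and result in temporaries
            and following[0] == "="
            and following[1] == result
        ):
            # Write the result straight into the assignment target.
            optimized.append((op, arg1, arg2, following[3]))
            i += 2
        else:
            optimized.append(quads[i])
            i += 1
    return optimized


quads = [("+", "B", "A", "T1"), ("=", "T1", None, "A")]
print(eliminate_temporaries(quads, {"T1"}))
# [('+', 'B', 'A', 'A')]
```

The two quadruples collapse into one, and T1 no longer needs a memory location.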
In the last phase, the compiler finally generates the executable machine code. Generally, programmers call this object code or simply code. Under Linux, it is usually either a (dynamic) library or the executable program.
Some compilers also produce assembler code, which is then converted into machine language by a downstream assembler. From A = A + B, the compiler could generate the following code:
lda a ; load a into the accumulator
add b ; add b to the accumulator
sto a ; store the accumulator in a
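A code generator for this kind of accumulator machine can be sketched as a set of templates, one per operator. The Python example below assumes the input arrives as (operator, operand 1, operand 2, result) tuples and uses only the lda, add, and sto mnemonics from the listing above. The function name generate is made up for the illustration.

```python
def generate(quads):
    """Emit one short instruction sequence per quadruple."""
    lines = []
    for op, arg1, arg2, result in quads:
        if op == "+":
            lines += [f"lda {arg1}", f"add {arg2}", f"sto {result}"]
        elif op == "=":
            lines += [f"lda {arg1}", f"sto {result}"]
        else:
            raise ValueError(f"no code template for {op!r}")
    return lines


print("\n".join(generate([("+", "a", "b", "a")])))
# lda a
# add b
# sto a
```

Template-based generation like this is simple but naive. For example, it reloads the accumulator even when the value it needs is already there.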
Control structures such as if, while, and for can usually be mapped to the jump instructions of the processor. The compiler may optionally rewrite a more complex loop, such as for, as an equivalent (longer) while loop.
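The mapping to jumps might be sketched as follows in Python. The jump mnemonics jz (jump if the accumulator is zero) and jmp (unconditional jump), the sub instruction, the label names, and the helper functions lower_while and lower_for are all hypothetical. Only lda, add, and sto come from the article's own listing.

```python
def lower_while(condition, body, label_id=1):
    """Lower 'while condition: body' to labels and jumps.

    condition: instructions that leave zero in the accumulator when the loop should end.
    body: instructions of the loop body.
    """
    start, end = f"loop{label_id}", f"end{label_id}"
    return [
        f"{start}:",
        *condition,
        f"jz {end}",     # leave the loop when the test yields zero
        *body,
        f"jmp {start}",  # otherwise run the test again
        f"{end}:",
    ]


def lower_for(init, condition, step, body, label_id=1):
    """A for loop is its initialization followed by a while loop whose body ends with the step."""
    return [*init, *lower_while(condition, [*body, *step], label_id)]


program = lower_for(
    init=["lda zero", "sto i"],            # i = 0
    condition=["lda n", "sub i"],          # accumulator = n - i
    step=["lda i", "add one", "sto i"],    # i = i + 1
    body=["lda total", "add i", "sto total"],
)
print("\n".join(program))
```

The for loop needs no code template of its own here, because it is expressed entirely through the while lowering.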
Because it knows the processor's instruction set and has the information from the tables, the compiler can also make the code more compact. The stack is used for function calls: Before a function is called, the code generated by the compiler pushes the function's arguments and the return address onto the stack. The processor then executes the function. Finally, the generated code has to clean up the stack; current processors provide special instructions to support the compiler in this task.
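The call sequence can be simulated with a Python list standing in for the stack. This is only a sketch: the order in which arguments are pushed, the rule that the caller removes them afterward, and the names call and add_two are assumptions made for the illustration. Actual calling conventions differ between processors and platforms.

```python
def call(stack, function, args, return_address):
    """Simulate a function call on an explicit stack."""
    for arg in reversed(args):       # push the arguments (right to left, by assumption)
        stack.append(arg)
    stack.append(return_address)     # push the address to resume at
    result = function(stack)         # the processor executes the function
    resume_at = stack.pop()          # returning pops the return address
    del stack[len(stack) - len(args):]  # clean up: remove the arguments
    return result, resume_at


def add_two(stack):
    # The arguments sit directly below the return address.
    return stack[-2] + stack[-3]


stack = []
result, resume_at = call(stack, add_two, [2, 5], return_address=100)
print(result, resume_at, stack)  # prints: 7 100 []
```

After the call, the stack is back in the state it had before, which is what the cleanup step has to guarantee.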