An introduction to Parrot

Birdsong

© Dmitry Pichugin, Fotolia

© Dmitry Pichugin, Fotolia

Article from Issue 106/2009
Author(s):

Parrot is an all-in-one tool for developing and executing new programming languages. Perl 6 runs on Parrot; chances are your language can run on it, too.

Are you exhausted from working on the same old code day after day? Is your programming language a clunker? Do you need a little more oomph to merge onto the information superhighway? Suffer no more. Coder's Kingdom has something for everyone, so come on down! Here's PHP 5: practical and great for the web with spunky bytecode acceleration. Or Ruby 1.9: this looker, fresh from the factory, is a real cloud-pleaser. Do you want to save a little cash? Try certified, pre-owned programming languages. Perl 5.8 is priced to move. Coder's Kingdom. Where every coder is king.

Coder's Kingdom might not exist, but choosing a programming language is strikingly similar to buying a car. Like a car, each programming language is designed with an intent, be it utility, speed, or size. Like a car, a language can offer exotic, cutting-edge features. Choosing a language is a significant investment, too, with potentially harmful and even disastrous consequences if the selection is ill considered. Additionally, each language includes mandatory parts – literals, variables, subroutines, and flow control instead of four tires, a steering wheel, and a windshield.

Indeed, the bulk of the work developing a new programming language is spent on (pardon the allusion) reinventing the wheel. The source of language must be decomposed by a parser, assembled into tokens, collated into statements, abstracted into syntax trees, and finally, either interpreted or converted into something executable, such as bytecode or binary.

Moreover, even once constructed, a new programming language is limited unless it provides a raft of supplemental capabilities, such as regular expressions and interfaces to system calls. Some overhead can be avoided by constructing facades around C libraries, such as the Perl Compatible Regular Expression (PCRE) library, but that too requires at least some effort. And ironically, language after language pursues such work because there is little other choice.

However, there are exceptions: Microsoft's long-standing .NET initiative permits many languages – C#, Visual Basic, C++, and open source languages ported to .NET – to share code because each is ultimately compiled to a common low-level target, Common Intermediate Language (CIL) [1]. JRuby is also an exception: JRuby runs Ruby code on top the Java Virtual Machine (JVM).

Still, the vast majority of computer programming languages remain isolated, like trying to put a Hemi into a Honda.

A Unified Approach

To speed the development of new languages, facilitate code sharing among all languages, and hasten the execution of dynamic languages such as Perl, Ruby, and Python, the Parrot development team has constructed a virtual machine and a suite of tools to compile and execute (conceivably) any computer language. Parrot [2] was initially conceived as an engine for Perl 6, but its charter was expanded because it's suitable for a vast class of code. As the Parrot website states, "In theory, you will be able to write a class in Perl, subclass it in Python, and then instantiate and use that subclass in a Tcl program."

Because Parrot is a one-stop compiler shop, it provides all the tools required to convert source code into a running program.

  • The Parser Grammar Engine (PGE) defines the syntax of a programming language with rules. One rule might define the syntax of an assignment statement, whereas another might define the structure of a subroutine. PGE parses input, yields tokens, and attempts to match the stream of tokens against all the rules. A match indicates a valid clause in the language.
  • Each rule, in turn, may call one or more actions. If you are writing a new programming language, the semantics of the language are expressed in actions. Each action transforms a match into a language-independent data structure called a Parrot Abstract Syntax Tree (PAST). PAST represents a statement as a tree, typically with operands as the leaves and operators, including control structures, as branches. Cumulatively, an entire program is ultimately represented by a PAST tree with the special node TOP at the root.
  • Parrot Intermediate Representation (PIR) is easily written by hand and is the usual output of a compiler targeting Parrot. It abstracts away some low-level details to reduce the burden on programmers and compiler writers.
  • Parrot Assembly (PASM) is more mechanical than PIR. It can be coded manually and can be generated by a compiler, but every detail, such as register allocation, must be attended to.
  • The lowest level representation is Parrot Byte Code (PBC). Although you can think of this as machine code, it does not run on hardware. Instead, PBC is executed by Parrot.

Parrot can run PBC, PASM, PIR, and PAST files. PAST isn't suitable for manual coding, but it is ideal for integration between tools.

Rules

A rule is like a form you fill out in the doctor's office. Some of the fields are printed on the form and simply provide context; others fields are blank and must be filled out. In the rule if_statement, the subrules expression, block, and (the second block alias) else are the empty fields. As mentioned in the discussion about the Parrot object for an if-then-else, these subrules provide the values used to instantiate the object.

A traditional compiler like the ubiquitous GCC typically generates an abstract syntax tree as an intermediate form to deduce and apply optimizations. Adapting such a tree to PAST is straightforward, and because Parrot can read PAST, you can connect an existing compiler to Parrot with ease.

In addition to these components, Parrot also supplies a number of PAST nodes (think of a node as a template or object) to represent fundamental features found in every programming language.

For example, conceptually, Parrot includes a node for if-then-else, a construct found in all languages. To add such a node to your program's larger tree, you must provide a condition, a block of statements to execute if the condition is true, and, optionally, a block of statements to execute if the condition is false. No matter what the original syntax looked like, the if node captures the semantics of such a statement.

Other PAST nodes realize other features: a variable, including scope and type; a loop; and operators, including arithmetic and subroutine calls.

And that's the real power of Parrot. Most language features need not be reinvented because Parrot provides suitable abstractions for everyone to reuse, independent of syntax. However, if a language dictates something truly novel, you can focus on implementing a new PAST node to achieve its semantics.

Figures 1 through 5 picture the process of converting a C-like if statement to a PAST node. Figure 1 shows the source code; in this form, it is a sequence of characters that might or might not adhere to proper syntax.

Figure 1: The source code of a C-like if statement.

Figure 2 shows the source code decomposed into tokens according to the rules of the parser. At this point, individual characters are no longer important; instead, an aggregate is now the coin of the realm.

Figure 2: The source code aggregated into tokens.

Figure 3 pictures rules. Each rule dictates a piece of proper grammar. When a collection of tokens matches a rule, it is collected and formed into a rule, another kind of amalgam.

Figure 3: The tokens of

Each agglomeration step in the compilation process moves code from syntax closer to semantics. Figure 4 shows the action for the if_statement. Each amalgam in Figure 3 has already been reduced by its separate (sub)action and is now amassed for subsequent processing. At this point, each unit has purpose, but the logic of the conditional is not yet realized.

Figure 4: An action prepares the data associated with a matched rule to be transformed to a PAST. In this figure, the if_statement is ammassed.

Figure 5 shows the final assembly after the action for the if_statement. The PAST node captures the intent and operation of the statement in a data structure. The final step, not shown, traverses the PAST depth-first (top-to-bottom and left-to-right) and generates the proper code. The code would load the first variable, then the second, and perform a comparison. The comparison either continues execution in sequence (with the next code, which implements a successful comparison, printing "A less than B") or branches to the code for false (printing "B is greater than or equal to A").

Figure 5: Ultimately, the if statement yields a PAST, which is suitable for optimization and code generation.

Coding for Parrot

Listing 1 shows a concrete example of the data discussed above. This example demonstrates some grammar for Squaak, a sample programming language included with the Parrot source code. In particular, Listing 1 is a rule that defines the syntax of a Squaak if-then-else statement. A valid statement must have a leading literal, the if; an expression, which is defined in its own eponymous rule (not shown); the keyword then; a block of statements, also defined in its own rule; and the keyword end. Those are mandatory phrases.

Listing 1

A Rule Defining Syntax

01 rule if_statement {
02   'if' <expression> 'then' <block> ['else' <else=block>]? 'end'
03   {*}

The phrase ['else' <else=block>]? is optional, as denoted by the trailing question mark, which means "one or zero" occurrences. If the phrase does appear, it must have the keyword else followed by another block. The notation <else=block> simply assigns an alias called else to the second block to differentiate it from the first.

If this rule matches, an action is called at the end, which is the purpose of the quirky looking {*} marker at the end of the rule.

An action is simply a subroutine. By default, Parrot calls the action with the same name as the matched rule. Furthermore, the action is called with arguments, one argument per subrule.

Listing 2 shows the action associated with the if_statement rule. Even without an elaborate description, its operation should be fairly obvious. Given a condition and a block to execute if the condition is true, the code builds a PAST for the statement. If the additional block is provided, it's amended to the PAST. The final statement make does the heavy lifting for you.

Listing 2

An Action Associated with a Rule

01 method if_statement($/) {
02   my $cond := $<expression>.ast;
03   my $then := $<block>.ast;
04   my $past := PAST::Op.new    ( $cond, $then, :pasttype('if'), :node($/) );
05
06   ## if there's an else clause, add it to the PAST node.
07   if $<else> {
08     $past.push( $<else>[0].ast );
09   }
10   make $past;
11 }

With Listing 2 in mind, the following code is the action for an assignment statement, such as x = 10:

method assignment($/) {
 my $rhs := $<expression>.ast;
 my $lhs := $<primary>.ast;
 $lhs.lvalue(1);
 make PAST::Op.new( $lhs, $rhs, :pasttype('bind'), :node($/) );
}

Listing 3 shows an example PIR program taken from the Parrot website. The application sums the square of numbers 1 through 10. In the code, indentation is used for readability, but not semantics. (In other words, the white space is not active.)

Listing 3

A PIR Program

01  .sub main
02   # State the number of squares to sum.
03   .local int maxnum
04   maxnum = 10
05
06   # We'll use some named registers. Note that we can declare many
07   # registers of the same type on one line.
08   .local int i, total, temp
09   total = 0
10
11   # Loop to do the sum.
12   i = 1
13   loop:
14     temp = i * i
15     total += temp
16     inc i
17     if i <= maxnum goto loop
18
19   # Output result.
20   print "The sum of the first "
21   print maxnum
22   print " squares is "
23   print total
24   print ".\n"
25   .end

Listing 3 looks like assembly language, with some exceptions that are conveniences provided in PIR. Specifically, temp = i * i is shorthand for mul temp, i, i. The code .local int i, total, temp uses friendly names instead of hard-wired registers for the three variables and uses .local to specify a scope. The assignment operator (=) is an alias for set, and goto is also a convenience. Its equivalent is le i, maxnum, loop.

To run the code shown in Listing 3, you can either interpret the PIR code or compile the PIR into PBC and execute that code instead. (See the "Building Parrot" box for instructions on how to build Parrot.)

$ parrot sum.pir
The sum of the first 10 squares is 385.
$ parrot -o sum.pbc sum.pir
$ parrot sum.pbc
The sum of the first 10 squares is 385.

Parrot can emit PASM source, if you're curious how PIR translates. To generate PASM, substitute -o somefilename.pbc with -o somefilename.pasm instead:

$ parrot -o sum.pasm sum.pir

Listing 4 shows the PASM equivalent of Listing 3.

Listing 4

PASM Source

01 main:
02   set I0, 10
03   set I1, 0
04   set I3, 1
05 loop:
06   mul I2, I3, I3
07   add I1, I2
08   inc I3
09   le I3, I0, loop
10   print "The sum of the first "
11   print I0
12   print " squares is "
13   print I1
14   print ".\n"
15   returncc

Parrot can run PASM directly, too:

$ parrot sum.pasm
The sum of the first 10 squares is 385.

Building Parrot

Building Parrot is surprisingly easy, assuming you have common software development tools on your Linux system. If you have Perl 5, the GNU Compiler Collection (GCC), and Subversion (or if you have Xcode installed on Mac OS X), the entire process requires just a few minutes.

Your first step is to check out the latest code with Subversion:

$ svn co https://svn.parrot.org/parrot/trunk parrot
$ cd parrot

Next, configure the build with Parrot's special Configure.pl script. Typically, the defaults derived by the script work fine. However, if you want to customize your build, type perl Configure.pl --help to review the list of options. Most options are general, like the install option --prefix, but a handful affect Parrot features, such as which garbage collection strategy to apply.

$ perl Configure.pl
Parrot Version 1.2.0 Configure 2.0
Copyright (C) 2001-2009, Parrot Foundation.
Hello, I'm Configure. My job is to poke and prod your system
to figure out how to build Parrot. The process is completely
automated, unless you passed in the <C>--ask<C> flag on the
command line, in which case I'll prompt you for a few pieces
of info.
Since you're running this program, you obviously have Perl
5--I'll be pulling some defaults from its configuration.
...
Now you can use <C>make<C> to build your Parrot.
After that, you can use <C>make test<C> to run the test suite.

To continue, compile the software and run its verification suite to ensure the stability of your build:

$ make
$ make test

Assuming the tests pass, you can proceed and install the Parrot tools to your system:

$ sudo make install

By default, the Parrot software is installed in /usr/local. If you want to install it to your home directory, use perl Configure.pl --prefix=$HOME/parrot in the configuration step. The installation adds eight utilities to your system. The most important one is parrot, the Parrot virtual machine. To try it, use Listings 3 and 4 and the commands discussed previously.

Once you have the Parrot source code, you can keep up to date with svn update, with one caveat: You must do make realclean before any refresh.

$ make realclean
$ svn update

If you skip the cleanup, Parrot can break or exhibit strange and unreproducible bugs.

One Talkative Bird

Apropos of its name, Parrot speaks lots of languages [3].

Rakudo Perl 6 [4] is an implementation of Perl 6 on Parrot. Cardinal [5] is a version of Ruby 1.0, and Pipp [6] is a version of PHP.

Other projects are moving Java, Lua, and Smalltalk to the Parrot engine, and 40 other projects are listed in the index.

As this issue of Linux Magazine/Linux Pro Magazine goes to press, the latest version of Parrot is 1.2.0. The software is stable and suitable for use as both a programming language and for virtual machine research and development. If you need a domain-specific language or want to extend an existing language, try Perl 6 now.

Take Parrot down from its perch.

Infos

  1. The Common Intermediate Language: http://en.wikipedia.org/wiki/Common_Intermediate_Language
  2. The Parrot homepage: http://www.parrot.org/
  3. Language implementations based on Parrot: http://www.parrot.org/languages/
  4. Rakudo Perl 6 on Parrot: http://www.rakudo.org/
  5. Cardinal, Ruby for Parrot: http://github.com/cardinal/cardinal/tree/master
  6. Pipp, PHP for Parrot: http://wiki.github.com/bschmalhofer/pipp/

The Author

Martin Streicher is a freelance developer and author. He holds an advanced degree in computer science from Purdue University and has worked on software from a Unix assembler to the award-winning "You Don't Know Jack" CD-ROM game. The author thanks Mark Glines and Christoph Otto of the Parrot development team for reviewing this article. You can reach Martin at mailto:martin.streicher@gmail.com.

Buy Linux Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Parrot 1.0 Said to Speak Your Virtual Language

    After years of development, version 1.0 of the Parrot virtual machine has arrived for Perl, Python and other dynamic languages. Whether the Parrot will take a firm foothold in software development is still up in the air.

  • OSCON 2006

    OSCON continues to grow in popularity and influence. Our Perlmeister stopped in for the talks, the tutorials, and the latest news from the battle lines of the digital millennium.

  • FOSDEM 2008

    Now into its eighth year, the FOSDEM developer conference in Brussels, Belgium, demonstrates that it is very much alive and kicking, and even celebrating records.

  • FOSDEM 2008: Jam-packed program, free video stream

    The annual developers’ conference FOSDEM will take place in Brussels on February 23 and 24, 2008. As a teaser, organizers of the open-source event have now placed interviews with some of the speakers online.

  • Google Code-in Contest Kicks Off

    Google Code-in (GCI) contest encourages pre-university students between the ages of 13 and 18 to begin participating in Open Source Development.

comments powered by Disqus

Direct Download

Read full article as PDF:

News

njobs Europe
What:
Where:
Country:
Njobs Netherlands Njobs Deutschland Njobs United Kingdom Njobs Italia Njobs France Njobs Espana Njobs Poland
Njobs Austria Njobs Denmark Njobs Belgium Njobs Czech Republic Njobs Mexico Njobs India Njobs Colombia