Application development for the Cell processor


The PPE program contains a loop (Listing 2), which distributes the workload over the SPEs involved and sets a seed for creating pseudo random numbers from the current system time. To launch a program on an SPE, three steps are required. First, the spe_context_create() function (line 7) needs to create an SPE context. Second, the spe_program_load() function (line 8) needs to specify the program to execute; the programmer needs to declare the spe_program_handle_t variable in the PPE program header for this. This variable is always declared externally, that is, outside of the function. The name is identical to the name that the SPE program will be given later when you compile it.

Listing 2

For Loop for Controlling the SPEs


The third step is for the spe_context_run() function to launch the program you want to execute. Normally, this function would block the PPE program while the SPE program is running, thus preventing any other SPE programs from launching parallel to it. A Posix thread helps to avoid this by executing the spu_pthread() function (line 10), which in turn launches an SPE program without interrupting the PPE program flow.

Now the SPE program needs to know where the parameters for the forthcoming calculations are located. Each SPE has a mailbox for incoming messages (four 32-bit words) and a mailbox for outgoing messages (one 32-bit word). Another mailbox triggers a software interrupt when data is available. In this case, the PPE program calls spe_in_mbox_write() (line 13) to pass in the start address of the array in which the parameters for the calculations are stored. The SPE context defines which SPE receives the message; its start address is the first function argument.

When all SPE programs have terminated, the PPE program releases the memory for the SPE context in question. Finally, the PPE program outputs the SPE's results on the console.

SPE Culture

The SPE's work starts with the compute_pi() function (Listing 3). compute_pi() expects a seed as an argument, which it will use to generate random numbers, and the number of pairs of numbers to calculate. The function returns an approximate value for PI as a function value. To allow this to happen, the main() function (Listing 4) reads the main memory address at which the structure with the parameters for the current SPE program is located. This address is also referred to as an effective address.

Listing 3

compute_pi Function


Listing 4

Main Function in the SPE Program


Because the spu_read_in_mbox() function can only read single 32-bit words, it must be called twice to retrieve the full 64-bit address (lines 7 and 8). The variables declared inside the SPE program all lie within the SPE's local memory space. Pointers also reference memory addresses in the local memory. Because the Cell processor uses Big Endian architecture, the first word contains the higher value, and the second word contains the lower value bits.

Next, the SPE program must reserve a tag ID to distinguish DMA data transfers between main and local memory (line 10). An SPE can manage up to 32 tag IDs. Following this, the spu_mfcdma64() function transfers the parameter block that points to the main memory address previously retrieved from the mailbox to the spe_par variable in local memory (line 12). This function can handle both read and write DMA transfer. The sixth argument defines the transfer direction, as a comparison with line 18 shows.

The spu_mfcdma64() function does not wait for the memory transfer to complete. To ensure data integrity, the SPE program must wait until the DMA controller (Memory Flow Controller, MFC) has finished; the mfc_read_tag_status_all() (line 14) makes sure of this. The mfc_write_tag_mask() function (lines 19 and 20) tells us which of the 32 possible parallel DMA transfers it is waiting for.

Now the calculations can start, and the results, which are again stored in the spe_par structure, make their way back into main memory. Finally, line 22 releases the tag ID.

Instilling Life

Creating the object files is the next step. Because the PPE SPE processor cores use different instruction sets, two different compilers must be used to build the source:

/opt/cell/toolchain/bin/spu-gcc -o pi_libspe_spe.spuo pi_libspe_spe.c
/opt/cell/toolchain/bin/ppu-gcc -c pi_libspe_ppe.c

The .spuo suffix indicates an object file based on the SPE instruction set. To create a single executable, the ppu-embedspu tool converts the SPE program's object code into a format that the PPE can read:

/opt/cell/toolchain/bin/ppu-embedspu pi_libspe_spe pi_libspe_spe.spuo pi_libspe_spe.o

The first parameter is the name used by the PPE to address the SPE program; it is identical to the name of the spe_program_handle_t type variable, which is declared in the pi_libspe_ppe.c source file.

The second parameter is the name of the file containing the SPE object code, and the third refers to the file where ppu-embedspu will write the PPE-readable object code. Finally, the developer must link the PPE and SPE programs with the libspe2 library to create an executable:

/opt/cell/toolchain/bin/ppu-gcc -o pi_libspe pi_libspe_ppe.o pi_libspe_spe.o -lspe2

If you have access to a computer with Cell hardware, you can simply copy the pi_libspe executable to it and execute the program. If you are using the simulator, you will need to take a small detour.

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

comments powered by Disqus

Direct Download

Read full article as PDF:

074-078_cell.pdf  (901.30 kB)