Intel's powerful new Xeon Phi co-processor

Build Problems

Xeon Phi developers are likely to have two questions in particular:

  1. Are special steps necessary for compiling source code on the card?
  2. How can I use the card's resources as efficiently as possible?

If you take Intel's marketing claims at face value, no special programming steps are needed because the Xeon Phi consists of x86 cores. However, the Xeon Phi cores differ significantly from those in conventional x86 processors. Both the vector units and the associated registers are different, and the cores lack every SIMD extension from MMX onward. In other words, they cannot handle MMX, SSE, or AVX instructions, nor do they have the registers introduced with these instruction sets.

These limitations are a problem because, ever since the introduction of SSE2, both Intel and AMD have recommended the SSE instruction sets for floating-point calculations and treat the x87 unit as a legacy feature. However, besides its own advanced vector instructions, the accelerator card only understands x87 instructions for floating-point work. This problem is one of the reasons you cannot easily use, say, a vanilla GNU toolchain. Although Intel has developed a patch for the GNU assembler and GCC to support compiling software for the Xeon Phi card, the patched GCC does not support the vector unit, because generating efficient vector code would require extensive optimization work in the compiler. To use the vector unit, developers need Intel's proprietary compiler [6].
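Code that has to build for both the host and the card can check at compile time which vector instruction family it is being built for. The following is only a sketch: it assumes that Intel's compiler defines the `__MIC__` macro when targeting the card, and that `__AVX__` and `__SSE2__` are defined on conventional x86 builds. The function name `simd_target()` is made up for illustration.

```c
/* Report the vector instruction family this translation unit targets.
 * Assumption: __MIC__ is defined by Intel's compiler for Xeon Phi builds;
 * __AVX__ and __SSE2__ are the usual macros on conventional x86 targets.
 * simd_target() is a hypothetical helper, not part of any toolchain. */
const char *simd_target(void)
{
#if defined(__MIC__)
    return "mic";     /* Xeon Phi: own 512-bit vector unit, x87 otherwise */
#elif defined(__AVX__)
    return "avx";     /* conventional x86 with AVX enabled */
#elif defined(__SSE2__)
    return "sse2";    /* conventional x86 with SSE2 (default on x86-64) */
#else
    return "scalar";  /* no known vector extension selected */
#endif
}
```

A program could use the returned string to choose between a hand-vectorized code path for the card and an SSE or AVX path for the host, rather than assuming that one binary fits both.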

Developers have several options for fully exploiting the computing power of the card. Because the card is an independent system that uses Linux as its operating system and only requires the resources of the host computer for input and output, you can run programs on it as you would on any other computer. A programmer can therefore use the usual methods, such as POSIX threads or OpenMP, to write and execute parallelized programs (Figure 3).

Figure 3: Xeon Phi appears to htop as a standalone Linux system, but one that has a few more cores on board.
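As a minimal sketch of the OpenMP approach mentioned above, the following function spreads a dot product across the available threads. The function name `dot()` is an assumption for illustration; the pragma is standard OpenMP, and a compiler without OpenMP enabled simply ignores it and runs the loop serially.

```c
#include <stddef.h>

/* Dot product of two vectors of length n.
 * With OpenMP enabled, the loop iterations are divided among the threads
 * and the partial sums are combined by the reduction clause.
 * Without OpenMP, the pragma is ignored and the loop runs on one thread. */
double dot(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    long i;  /* signed loop index, accepted by all OpenMP versions */

#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; i++)
        sum += x[i] * y[i];

    return sum;
}
```

The same source runs unchanged on the host and natively on the card; only the number of threads that share the loop differs.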

You should be aware that the Xeon Phi has relatively little memory considering the number of cores. On average, a card from the 5100 series only has 35MB of RAM for each thread, compared with several hundred megabytes for each thread on current server systems. Because of the limited amount of memory, it makes sense to operate Xeon Phi as an accelerator unit in interaction with the host machine or other machines on the network. Several options are available for implementing this interaction. SCIF, which we referred to earlier, provides a convenient approach to exchanging data between the card and the host computer. Thus, the host can outsource certain parts of the computation to the Xeon Phi. For even more convenience, developers can use the Message Passing Interface (MPI) to hand over computations. This approach is feasible because the Xeon Phi, to oversimplify things, looks just like another computer on the network with a large number of cores. Finally, an OpenCL compiler can outsource computations to the card.
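The per-thread figure quoted above can be checked with a few lines of arithmetic. This sketch assumes a 5100-series card with 8GB of memory, 60 cores, and four hardware threads per core; these numbers are not given in this article. The host configuration in the comment is purely hypothetical, and `mem_per_thread_mb()` is a name made up for this example.

```c
/* Memory available per hardware thread in MB, rounded down.
 * Returns 0 if the thread count would be zero. */
unsigned long mem_per_thread_mb(unsigned long total_mb,
                                unsigned cores,
                                unsigned threads_per_core)
{
    unsigned long threads = (unsigned long)cores * threads_per_core;

    if (threads == 0)
        return 0;
    return total_mb / threads;
}

/* Assumed card:       mem_per_thread_mb(8192, 60, 4)  -> 8192 / 240
 * Hypothetical host:  mem_per_thread_mb(16384, 16, 2) -> 16384 / 32 */
```

Under these assumptions the card ends up with roughly 34MB per thread, in line with the figure of about 35MB, whereas the hypothetical host has 512MB per thread.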

Because the Xeon Phi is partly a standalone system, the reverse path is also possible: Work can transfer from the Xeon Phi to the host computer or another computer on the network. Figure 4 shows the options for distributing the workload.

Figure 4: The Xeon Phi card is partly a standalone system, so it can distribute work to other machines on the network.

Xeon Phi versus GPU

What makes the Xeon Phi card worthwhile? The Kepler generation of NVIDIA Tesla cards offers about three times the raw performance in floating-point operations per second (FLOPS) for a slightly higher price, which makes the Xeon Phi vastly inferior in terms of value for money.

On closer inspection, however, the Xeon Phi has two things going for it: First, Xeon Phi supports the MPI programming model, which has been around for close to 20 years, whereas OpenCL is only five years old. One could argue that MPI programmers are more experienced and will therefore find it easier to write better programs. Second, thanks to its PC-related architecture, the Xeon Phi is capable of running existing software with significantly fewer modifications.

Outlook

Intel has already announced the successor to the Xeon Phi, which currently goes by the name Knights Landing. The platform will expand to include the AVX-512 instruction set, which conventional processors by Intel will probably also use in the future. AVX-512 is also intended to provide better compatibility with SSE and AVX. Whether this also applies to the Xeon Phi, only time will tell.

Far more interesting is that Intel's Knights Landing not only will be an expansion card but also will serve as a standalone platform or processor, thus taking a first step away from supporting a specific application profile toward a general solution for servers with highly parallel applications. Finally, Xeon Phi components might eventually make inroads into the desktop market, which would make the Xeon Phi a harbinger for a new generation of many-core systems.
