
The Cell Broadband Engine 

PCQ Bureau
Direct Hit!
Applies to: Mobile rich app developers
USP: Understand the architecture and capabilities of the Cell BE powered CPUs
Primary Link: http://www.alphaworks.ibm.com/topics/cell
Google keywords: cell broadband engine

The next generation of processors to power our rich-media devices and applications is already here. In this article, we demystify what lies inside it.

On November 29, 2005, three top corporate guns (Sony, Toshiba and IBM) got together to announce the specifications of the first implementation of the Cell BE (Broadband Engine) processor.


Codenamed Mambo, the Cell Broadband Engine is an architecture jointly developed by the three companies and aimed at cutting-edge, processor-intensive applications such as multimedia, video and image processing.

The name 'Cell Broadband Engine', or Cell BE as it is called, is a trademark of Sony Computer Entertainment. However, multimedia is not all the processor can do. To start off, let us see what is inside the processor that makes it so special.

What's inside it?



You're about to learn a few new acronyms, but let's take it one step at a time. First up, the Cell BE belongs to the Power processor family. The basic design (in layman's terms) of the Cell BE CPU is similar to that of modern multi-core processors. Only, in the Cell BE's case, you have two kinds of cores, the PPE and the SPE. Each Cell BE has eight SPEs but only one PPE. The table below summarizes how the two differ.

Feature            SPE                             PPE
Name               Synergistic Processing Element  Power Processing Element
Function           Computation                     Running the OS, arbitration between SPEs
Cache              256 KB local store              512 KB L2
Instruction issue  Dual                            Dual
Order              In-order                        In-order with limited out-of-order for LOAD

Elements within the processor, such as the various PEs and the memory controller, are interconnected by the EIB (Element Interconnect Bus). This bus gives a maximum bandwidth of 204.8 GB per second and runs at half the speed of the processor. For memory, the processor has a built-in XDR memory controller, which can exchange data at 25.6 GB per second with external XDR memory. For I/O, the processor includes an integrated I/O controller that provides peak bandwidths of 25 GB per second inbound and 35 GB per second outbound.
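The EIB peak figure can be cross-checked with some quick arithmetic. The sketch below assumes a 3.2 GHz core clock and at most one 128-byte transfer started per bus cycle. Neither number comes from this article, so treat both as illustrative assumptions rather than confirmed specifications.

```python
# Back-of-the-envelope check of the EIB peak bandwidth quoted in the text.
# ASSUMPTIONS (not stated in the article): a 3.2 GHz core clock and at most
# one 128-byte transfer started on the bus per bus cycle.
CORE_CLOCK_GHZ = 3.2        # assumed
BYTES_PER_BUS_CYCLE = 128   # assumed


def eib_peak_gb_per_s(core_clock_ghz=CORE_CLOCK_GHZ,
                      bytes_per_cycle=BYTES_PER_BUS_CYCLE):
    """Peak EIB bandwidth in GB/s; the bus runs at half the core clock."""
    bus_clock_ghz = core_clock_ghz / 2
    return bus_clock_ghz * bytes_per_cycle


if __name__ == "__main__":
    print(f"EIB peak: {eib_peak_gb_per_s():.1f} GB/s")
```

Under these assumptions the result matches the 204.8 GB per second quoted for the bus. A different clock speed would scale the figure proportionally.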

The PPE is similar to PowerPC processors such as the G5 (PowerPC 970) because of its AS architecture. It also uses AltiVec (VMX) instructions to parallelize arithmetic operations and supports simultaneous multithreading. Each SPU (the processing unit inside an SPE) can use the same instructions to perform both 32-bit scalar and 128-bit vector processing. There are no memory controllers or instruction/data caches in the SPU, but it can access any 128-bit location in its local memory at L1 speeds.
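To make the idea of one instruction serving both 32-bit scalar and 128-bit vector work more concrete, here is a minimal sketch in plain Python (not SPU code). It models a 128-bit register as four 32-bit lanes. The function names are hypothetical, and treating a scalar as "the first lane of a vector" is a simplifying assumption made for illustration.

```python
import struct

LANES = 4  # a 128-bit register holds four 32-bit values


def pack_vector(values):
    """Pack four unsigned 32-bit integers into 16 bytes (big-endian)."""
    if len(values) != LANES:
        raise ValueError("a 128-bit vector needs exactly four 32-bit lanes")
    return struct.pack(">4I", *values)


def vector_add(a, b):
    """Lane-wise 32-bit add of two 16-byte vectors, wrapping on overflow."""
    lanes_a = struct.unpack(">4I", a)
    lanes_b = struct.unpack(">4I", b)
    sums = ((x + y) & 0xFFFFFFFF for x, y in zip(lanes_a, lanes_b))
    return struct.pack(">4I", *sums)


def scalar_add(a, b):
    """A scalar add modelled as a vector add that uses only the first lane."""
    result = vector_add(pack_vector([a, 0, 0, 0]), pack_vector([b, 0, 0, 0]))
    return struct.unpack(">4I", result)[0]
```

The same `vector_add` routine serves both cases. The scalar version simply ignores three of the four lanes, which is the spirit of what the paragraph above describes.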


One piece of good news for developers is that this is the end of their pointer hell. Pointer values get aligned and truncated to fit the size of the local store within each SPU. This means that 'access violation' errors can no longer be generated. Many errors that are common on a conventional CPU are not even possible on the SPU because of its design.
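The alignment and truncation described above can be sketched as two bit masks. This is a minimal model in plain Python, assuming a 256 KB local store and 16-byte (128-bit) accesses. The helper names are hypothetical and the real hardware behaviour may differ in detail.

```python
LOCAL_STORE_SIZE = 256 * 1024   # 256 KB local store per SPE
QUADWORD = 16                   # 128 bits, the access size discussed earlier


def effective_address(pointer):
    """Wrap a pointer into the local store and align it down to 16 bytes."""
    wrapped = pointer & (LOCAL_STORE_SIZE - 1)   # truncate: always in range
    return wrapped & ~(QUADWORD - 1)             # align: drop the low 4 bits


def load_quadword(local_store, pointer):
    """Return the 16 bytes the pointer resolves to; never out of range."""
    start = effective_address(pointer)
    return bytes(local_store[start:start + QUADWORD])
```

Because every pointer resolves to some valid, aligned location, a stray pointer reads the wrong data instead of raising a fault. That removes access violations, but it also means such bugs can go unnoticed.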

Let's now examine some of its other elements in detail.

Data transfer



As we said earlier, the EIB is what connects everything in the processor. It is actually a combination of a bus and some data storage arranged in four rings. Each ring can hold 16 bytes of data at a time, all of which can be sent or received at once, and allows up to three simultaneous non-overlapping read/write operations. By non-overlapping, we mean that if the data for one request is spread across a stretch of a ring and another request is reading data that lies in between, the second request has to wait until the first one is done. Because of all this, in real-life conditions, sustained EIB bandwidth lies between 78 GB and 197 GB per second.
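The non-overlapping rule can be illustrated with a toy scheduler for a single ring. This is a rough sketch under stated assumptions: the ring has 12 stops (a number not given in this article), every transfer travels in one direction, and requests are granted greedily in arrival order. The real arbiter is certainly more sophisticated, and all names here are hypothetical.

```python
RING_STOPS = 12     # assumed number of stops on a ring (not from the article)
MAX_PER_RING = 3    # from the article: up to three transfers per ring


def segments(src, dst, stops=RING_STOPS):
    """Set of ring segments a transfer from src to dst passes over."""
    hops = (dst - src) % stops
    return {(src + i) % stops for i in range(hops)}


def can_join(ring, src, dst, stops=RING_STOPS):
    """True if a new transfer fits on the ring without overlapping others."""
    if len(ring) >= MAX_PER_RING:
        return False
    wanted = segments(src, dst, stops)
    return all(wanted.isdisjoint(segments(s, d, stops)) for s, d in ring)


def schedule(transfers, stops=RING_STOPS):
    """Greedily place transfers on one ring; return (granted, waiting)."""
    granted, waiting = [], []
    for src, dst in transfers:
        if can_join(granted, src, dst, stops):
            granted.append((src, dst))
        else:
            waiting.append((src, dst))
    return granted, waiting
```

For example, transfers from stop 0 to 3 and from stop 4 to 6 can share the ring, while a transfer from stop 2 to 5 overlaps both and has to wait. This is why sustained bandwidth depends so heavily on which elements are talking to each other.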


The I/O system is a set of RAMBUS RRAC FlexIO links with 12 byte-wide channels. Of these, seven are for outbound and five are for inbound data. Both data and commands are transmitted through the I/O system as packets.
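The channel counts tie in neatly with the peak I/O bandwidths quoted earlier in the article (35 GB per second outbound, 25 GB per second inbound). The short sketch below uses only those figures to work out the rate each channel must sustain. The function name is hypothetical.

```python
# Figures taken from the article itself.
OUTBOUND_CHANNELS = 7
INBOUND_CHANNELS = 5
OUTBOUND_PEAK_GB_S = 35
INBOUND_PEAK_GB_S = 25


def per_channel_rate(peak_gb_s, channels):
    """Peak bandwidth each channel must carry, in GB/s."""
    return peak_gb_s / channels


if __name__ == "__main__":
    out_rate = per_channel_rate(OUTBOUND_PEAK_GB_S, OUTBOUND_CHANNELS)
    in_rate = per_channel_rate(INBOUND_PEAK_GB_S, INBOUND_CHANNELS)
    print(f"outbound: {out_rate} GB/s per channel")
    print(f"inbound:  {in_rate} GB/s per channel")
```

Both directions work out to the same per-channel rate, which suggests the outbound and inbound figures differ only because of the number of channels assigned to each direction.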

Memory



There are two XIO channels available for use and together they can hold up to 512 MB of XDR DRAM. XDR DRAM is currently the fastest memory technology, faster than both DDR and DDR2. These interfaces are based on RAMBUS technology. This memory is external to the processor and is connected through a memory interface controller. The controller can handle read/write queues to the memory on either channel separately and prioritizes the requests too. There is also a capability to directly write data to the memory in chunks over 16 bytes but less than 128 bytes. The raw memory speed works out to 25.6 GB per second at peak, before factors like refresh cycles bring it down.

Resource allocation logic ensures that access to critical resources is

controlled and time-critical applications are not affected by race conditions.


Spufs



This is an IBM research project with some quite interesting applications. The IBM team has exposed the SPEs within the Cell BE processor as a virtual file system, similar to procfs. This lets applications access and use the resources within the CPU much as they would use files in a normal file system. This file system is called 'spufs'.
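Since spufs presents SPE resources as files and directories, ordinary file-system calls are enough to look around in it. The sketch below is a minimal illustration in Python. It assumes the file system is mounted at `/spu` and that each SPE context appears as a directory of files, both of which are assumptions to check against your own system. The function name is hypothetical.

```python
import os

SPUFS_MOUNT = "/spu"   # assumed mount point; adjust for your system


def list_spu_contexts(mount=SPUFS_MOUNT):
    """Return {context_name: sorted file names} for each directory under mount."""
    if not os.path.isdir(mount):
        return {}   # no spufs here (for example, not a Cell BE machine)
    contexts = {}
    for entry in sorted(os.listdir(mount)):
        path = os.path.join(mount, entry)
        if os.path.isdir(path):
            contexts[entry] = sorted(os.listdir(path))
    return contexts


if __name__ == "__main__":
    for name, files in list_spu_contexts().items():
        print(name, files)
```

On a machine without spufs the function simply returns an empty dictionary, so the script is safe to run anywhere.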

Applications of Cell BE



Some of the currently envisaged applications for Cell BE include powering the Sony PlayStation 3 console. But it can also be put to use in computational areas such as MPEG-2 video decoding, rendering advanced graphics and cryptography.

What's available?



The Barcelona Supercomputing Center (www.bsc.es) hosts Cell BE specific packages collected from a variety of sources, including patches for the Linux kernel (v2.6.14), the GCC toolchain, SDKs and more. The Cell BE platform has been included in the Linux kernel tree as a new platform under the ppc64 architecture. Hardware-wise, Cell BE based blade servers have been available since October 2005. There are also simulator and emulator kits available for this platform from IBM.

The processor can run both 32-bit and 64-bit software, can already run 64-bit PowerPC Linux kernels and supports ELF binaries. It is also expected to run all PowerPC applications out of the box.
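If you want to check whether a given binary is a 32-bit or 64-bit PowerPC ELF file, the answer sits in the first few bytes of the file. The sketch below reads those fields using the offsets and machine codes defined by the ELF format. The function name is hypothetical and only the two PowerPC machine codes are decoded.

```python
import struct

EI_CLASS = {1: "32-bit", 2: "64-bit"}
EI_DATA = {1: "<", 2: ">"}                    # 1 = little-endian, 2 = big-endian
MACHINES = {20: "PowerPC", 21: "PowerPC64"}   # e_machine codes from the ELF format


def describe_elf(header):
    """Return (word size, machine) from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    word_size = EI_CLASS.get(header[4], "unknown")
    byte_order = EI_DATA.get(header[5])
    if byte_order is None:
        raise ValueError("unknown ELF data encoding")
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return word_size, MACHINES.get(machine, f"other ({machine})")
```

To use it on a real file, open the file in binary mode and pass in the first 20 bytes, for example `describe_elf(open(path, "rb").read(20))`.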

Sujay V Sarma
