
How Multi-processing Works

PCQ Bureau

Using two or more processors on the same machine is called SMP (Symmetric Multiprocessing). Working with an SMP system requires a minor revamp of the hardware concepts used in traditional uniprocessor machines and a major revamp of how software is developed for it. Let's skim through the basics.


On a uniprocessor machine, the processing of a program (say P1) is handled solely by a single processor. In a multiprocessor system, however, the processing of P1 must be divided among the processors, such that each processor can handle its own chunk independently. For this, P1 must be broken into smaller parts or subtasks (say t1, t2 … tn, where n is the number of processors). Each subtask is called a thread. The programmer is responsible for coding the program in such a way that it can be broken down into independent chunks. The compiler may also do some fine-grained splitting of the program to achieve the same.

Multithreading



Consider a program that simply counts from 1 to 1 billion. If there are two processors in the machine, then the first processor should count from 1 to 500,000,000 and the second processor from 500,000,001 to 1 billion. This dividing of a program into multiple tasks or threads is called multithreading. Consider another program, which goes through 1 billion records of employees and picks out employees whose age is more than fifty. Again, if we divide the task between two processors, then each processor would process the thread assigned to it. However, in this case each processor also needs some storage space for the selected records. Where do the records get stored?
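The counting example above can be sketched with Python's threading API. This is a minimal illustration of the decomposition only; note that CPython's GIL prevents two CPU-bound threads from truly running on both processors at once, but the way the range is split between threads is the same idea. The function names are our own, not from any library:

```python
import threading

# Each thread sums its own half of the range independently.
# results[] gives each thread a private slot, so no data is shared.
def count_range(start, end, results, index):
    total = 0
    for n in range(start, end + 1):
        total += n
    results[index] = total

def parallel_sum(limit, num_threads=2):
    chunk = limit // num_threads
    results = [0] * num_threads
    threads = []
    for i in range(num_threads):
        start = i * chunk + 1
        end = limit if i == num_threads - 1 else (i + 1) * chunk
        t = threading.Thread(target=count_range, args=(start, end, results, i))
        threads.append(t)
        t.start()
    for t in threads:       # wait for all threads to finish
        t.join()
    return sum(results)
```

With two threads and a limit of 1 billion, the first thread would handle 1 to 500,000,000 and the second the rest, exactly as described above.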

Each processor has its own cache memory (L1 and L2). The cache is used for the storage. What if the processor runs short of cache? Then the system memory or RAM may be used. On a multiprocessor machine, the processors and RAM are connected through the system bus. Hence the system bus is the hardware path used by the processors to access the system memory as well as each other. But going through the slow system bus will slow down the processing. To overcome this, manufacturers may use a dedicated high-speed bus (often proprietary) in place of the system bus. This dedicated bus connects to a high-speed cache memory (static RAM) instead of system memory (dynamic RAM). However, it's desirable to have a large cache on the processor itself, to avoid communicating with the outside. For example, Intel's Xeon processors have about 2 MB of cache.


Inter-thread communication



Now take another kind of program, where thread 1 stores the details of all employees whose salary is less than Rs 10,000 and thread 2 stores all employees whose salary is above Rs 10,000. The first thread is run by processor 1 and the second by processor 2. What if processor 2, running thread 2, comes across an employee with salary less than Rs 10,000? The record of this employee must be stored by thread 1 in the memory of processor 1, because thread 2 cannot access processor 1's memory. For this, thread 2 communicates the data of this employee to thread 1. On receiving it, thread 1 stores the employee's record in its memory. This is called inter-thread communication. However, passing data between processors has associated overheads like construction of the data, calling for and sending data to the thread, receiving of data by the thread, etc. It must be noted that intercommunicated data travels across the slow system bus. Hence, while developing multithreaded applications one should try to minimize inter-thread communication by assigning independent data to the processors. This results in communication only between the cache and the processor, which is fast as the cache usually works at the same speed as (or half the speed of) the processor.
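One common way to express this hand-off between threads is a message queue. Below is a minimal sketch in Python of the salary example above; the record format, the Rs 10,000 cut-off, and the function names are illustrative, not from any particular library:

```python
import queue
import threading

SENTINEL = None  # marks end of input for the receiving thread

# Thread 2 scans its share of records; any record that belongs to
# thread 1 (salary below 10000) is sent over the queue instead of stored.
def scanner(records, own_store, channel):
    for rec in records:
        if rec["salary"] < 10000:
            channel.put(rec)        # hand the record to the other thread
        else:
            own_store.append(rec)
    channel.put(SENTINEL)

# Thread 1 receives misrouted records and stores them in its own memory.
def receiver(own_store, channel):
    while True:
        rec = channel.get()
        if rec is SENTINEL:
            break
        own_store.append(rec)

records = [{"name": "A", "salary": 8000},
           {"name": "B", "salary": 25000},
           {"name": "C", "salary": 9500}]
low, high = [], []
channel = queue.Queue()
t1 = threading.Thread(target=receiver, args=(low, channel))
t2 = threading.Thread(target=scanner, args=(records, high, channel))
t1.start(); t2.start()
t1.join(); t2.join()
```

Every `put`/`get` pair here is the construction, sending, and receiving overhead the paragraph above describes, which is why minimizing such hand-offs pays off.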

Shared memory



There can be another approach to the above problem. What if the employees' records are stored in a memory accessible by both the processors? This model is called the shared memory model. The earlier model, with each processor having its private memory, is called the distributed memory model. In the shared memory model there is no need for explicit communication between threads, as both threads can access the shared memory.

However, each thread must lock the shared data against the other thread while manipulating it. If it does not, the threads could work on the same data set simultaneously, resulting in corrupted or inconsistent data. This locking presents an overhead to the processing as well as to the developer, who needs to take care of acquiring locks (and releasing them after manipulation) in his code.
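A minimal sketch of the shared memory model with explicit locking, again in Python (the shared list and worker function are illustrative):

```python
import threading

shared_records = []          # memory visible to both threads
lock = threading.Lock()      # guards access to shared_records

def store(records):
    for rec in records:
        # Acquire the lock before touching shared data; "with" releases
        # it automatically after the manipulation, even on error.
        with lock:
            shared_records.append(rec)

t1 = threading.Thread(target=store, args=([1, 2, 3],))
t2 = threading.Thread(target=store, args=([4, 5, 6],))
t1.start(); t2.start()
t1.join(); t2.join()
```

The `with lock:` block is exactly the acquire-and-release discipline the paragraph above describes; forgetting either half is what leads to corrupted data or deadlocks.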


From the software point of view, not all applications can make use of multiple processors. Only multithreaded programs can do so efficiently. Most programming languages come with a Thread API (a collection of library files for multithreading). So, if we develop programs using the thread libraries, the program's threads can be scheduled and executed on separate processors.

An example algorithm of a multithreaded program for a dual processor SMP machine

Main Program
{
    ....
    /* Dividing the task and defining two subtasks for a dual processor system */
    define task1
    define task2

    /* Creating two threads named thread1 and thread2 and assigning them the tasks */
    create thread1(task1)
    create thread2(task2)

    /* Starting the threads, which in turn runs the task in each thread */
    start thread1(task1)
    start thread2(task2)
    ....
}

thread1(task)
{
    run task
}

thread2(task)
{
    run task
}
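The algorithm above maps almost directly onto Python's threading API. Here is a sketch where the two task bodies are placeholders of our own invention; `threading.Thread` plays the role of both "define" and "create", while `start()` and `join()` handle running and waiting:

```python
import threading

done = []

# define task1 / define task2: placeholder subtasks for illustration
def task1():
    done.append("task1")

def task2():
    done.append("task2")

# create thread1 and thread2, assigning them the tasks
thread1 = threading.Thread(target=task1)
thread2 = threading.Thread(target=task2)

# start the threads, which runs the task in each thread
thread1.start()
thread2.start()

# wait for both threads to finish before the main program continues
thread1.join()
thread2.join()
```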






















Shekhar Govindrajan
