Adapting code to memory architecture

=== ccNUMA ===

''Section contributed by NIIFI''

Non-Uniform Memory Access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to a processor. Under NUMA, a processor can access its own local memory faster than non-local memory, that is, memory local to another processor or memory shared between processors. Nearly all CPU architectures use a small amount of very fast non-shared memory, known as cache, to exploit locality of reference in memory accesses. With NUMA, maintaining cache coherence across shared memory has a significant overhead. NUMA computers use special-purpose hardware to maintain cache coherence and are therefore classed as "cache-coherent NUMA", or ccNUMA. Typically, this is achieved by inter-processor communication between cache controllers, which keeps a consistent memory image when more than one cache stores the same memory location. For this reason, ccNUMA may perform poorly when multiple processors attempt to access the same memory area in rapid succession. Because the Linux operating system has a tendency to migrate processes, using a placement tool becomes all the more important.
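
Before placing anything, it helps to know what the node layout of the machine looks like. The commands below are a general Linux sketch for inspecting the NUMA topology (they require the ''numactl'' package and a sysfs-based kernel) and are not specific to the SGI UV:

 # List the NUMA nodes, the CPUs they contain and the size of their local memory
 numactl --hardware
 # The same topology is exposed by the kernel under sysfs
 ls /sys/devices/system/node/
 cat /sys/devices/system/node/node0/cpulist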

Useful placement tools on the SGI UV machine:

=== dplace ===

You can use the ''dplace'' command to bind a related set of processes to specific CPUs or nodes to prevent process migration. By default, memory is allocated to a process on the node on which the process is executing. If a process moves from node to node while it is running, a higher percentage of memory references are made to remote nodes. Because remote accesses typically have higher access times, process performance can be diminished, and CPU instruction pipelines also have to be reloaded.

Using the ''dplace'' command with MPI programs:

 mpirun -np 12 /usr/bin/dplace -s1 -c 24-35 ./connectivity

Using the ''dplace'' command with OpenMP programs:

 export OMP_NUM_THREADS=6
 /usr/bin/dplace -x6 -c 24-29 ./program
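
To verify that the threads stay on the requested CPUs, the processor each thread last ran on can be listed with standard Linux tools; this is a generic check rather than a ''dplace'' feature, and assumes the program from the example above is still running:

 # PSR is the CPU a thread last executed on; with the placement above it should remain within 24-29
 ps -Lo pid,tid,psr,comm -p $(pgrep -f ./program)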

=== numactl ===

The ''numactl'' command runs processes with a specific NUMA scheduling or memory placement policy. The policy is set for the given command and inherited by all of its children. In this case the processes are bound to specific memory nodes.

Example for ''numactl'':

 /usr/bin/numactl -m 5 ./program

The program can allocate memory only on memory node 5.
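
A few other commonly used ''numactl'' invocations are shown below as a general sketch; these are standard options of the tool rather than a site-specific recipe:

 # Show the NUMA policy of the current shell
 numactl --show
 # Bind both the CPUs and the memory of the program to node 0
 numactl --cpunodebind=0 --membind=0 ./program
 # Per-node allocation statistics, useful for spotting remote-memory traffic
 numastat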

=== cpuset ===

The cpuset file system is a pseudo-file-system interface to the kernel cpuset mechanism, which is used to control the processor placement and memory placement of processes. It is commonly mounted at ''/dev/cpuset''.

Automatic cpuset allocation has been configured on the Pecs UV [pecs] machine, which is integrated with the Sun Grid Engine (SGE) scheduler. When a new job starts, SGE creates a new cpuset for the job that contains only the requested CPUs and their local memory nodes; other jobs cannot access these resources.
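
Outside the scheduler, a cpuset can also be created by hand through the mounted pseudo-file system. The sketch below follows the classic kernel cpuset interface (on newer, cgroup-based mounts the files may be prefixed with ''cpuset.''); the cpuset name ''myjob'' and the CPU and memory numbers are only illustrative:

 # Create a cpuset restricted to CPUs 24-29 and memory node 3
 mkdir /dev/cpuset/myjob
 echo 24-29 > /dev/cpuset/myjob/cpus
 echo 3 > /dev/cpuset/myjob/mems
 # Move the current shell into the cpuset, then start the program from it
 echo $$ > /dev/cpuset/myjob/tasks
 ./program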

=== RESOURCES ===

[pecs] [http://wiki.hp-see.eu/index.php/Resource_centre_Pecs_SC Pecs UV]

== Distributed memory ==
