MVAPICH


Section contributed by IPB

InfiniBand, 10GigE/iWARP and RDMA over Converged Ethernet (RoCE) are high-performance networking technologies that deliver low latency and high bandwidth to HPC users, and their open standards have helped them gain widespread acceptance. MVAPICH is an open-source MPI implementation developed by the Network-Based Computing Laboratory (NBCL) at the Ohio State University that exploits the novel features and mechanisms of these networking technologies. Currently, there are two versions of this MPI library: MVAPICH, with MPI-1 semantics, and MVAPICH2, with MPI-2 semantics.

These MPI implementations are used by many organizations worldwide (national laboratories, universities and industry), and several InfiniBand systems using MVAPICH/MVAPICH2 appear in the TOP500 ranking. Many InfiniBand, 10GigE/iWARP and RoCE vendors, server vendors, systems integrators and Linux distributors have incorporated MVAPICH/MVAPICH2 into their software stacks. MVAPICH and MVAPICH2 are also available with the OpenFabrics Enterprise Distribution (OFED) stack (www.openfabrics.org) and through the public anonymous MVAPICH SVN repository. Both distributions are available under BSD licensing.

At the Institute of Physics Belgrade, the MVAPICH MPI implementations are used within the PARADOX cluster, as it provides an InfiniBand interconnect between its servers.


MVAPICH features

MVAPICH is an implementation of the MPI-1 standard based on MPICH and MVICH (MPI for the Virtual Interface Architecture). The latest release is MVAPICH 1.2 (which includes MPICH 1.2.7). MVAPICH 1.2 supports the following underlying transport interfaces:

  • High-performance, scalable support for the OpenFabrics/Gen2 interface, to work with InfiniBand and other RDMA interconnects.
  • High-performance, scalable support for the OpenFabrics/Gen2-RDMAoE interface.
  • High-performance, scalable support (for clusters with multi-thousand cores) for the OpenFabrics/Gen2-Hybrid interface, to work with InfiniBand.
  • A shared-memory-only channel, useful for running MPI jobs on multi-processor systems without any high-performance network: for example, multi-core servers, desktops and laptops, and clusters with serial nodes.
  • The InfiniPath interface for InfiniPath adapters.
  • The standard TCP/IP interface (provided by MPICH), to work with a range of networks. This interface can also be used with the IPoIB support of InfiniBand.

In addition, MVAPICH 1.2 supports many features for high performance, scalability, portability and fault tolerance. It also supports a wide range of platforms (architectures, operating systems, compilers and InfiniBand adapters).

MVAPICH2 features

This is an MPI-2 implementation (conforming to the MPI 2.2 standard) which includes all MPI-1 features. It is based on MPICH2 and MVICH. The latest release is MVAPICH2 1.8 (which includes MPICH2 1.4.1p1). The current release supports ten underlying transport interfaces, some of which are:

  • OFA-IB-CH3: This interface supports all InfiniBand compliant devices based on the OpenFabrics Gen2 layer. This interface has the most features and is most widely used.
  • OFA-IB-Nemesis: This interface supports all InfiniBand compliant devices based on the OpenFabrics libibverbs layer with the emerging Nemesis channel of the MPICH2 stack.
  • OFA-RoCE-CH3: This interface supports the emerging RoCE (RDMA over Converged Ethernet) interface for Mellanox ConnectX-EN adapters with 10GigE switches.
  • Shared-Memory-CH3: This interface provides native shared-memory support on multi-core platforms where communication is required only within a node, such as SMP-only systems, laptops, etc.
  • TCP/IP-CH3: The standard TCP/IP interface (provided by MPICH2), to work with a range of network adapters supporting the TCP/IP interface. This interface can also be used with the IPoIB (TCP/IP over InfiniBand) support of InfiniBand.
  • Shared-Memory-Nemesis: This interface provides native shared-memory support on multi-core platforms where communication is required only within a node, such as SMP-only systems, laptops, etc.

MVAPICH2 supports a wide range of platforms (architectures, operating systems, compilers, InfiniBand adapters (Mellanox and QLogic), iWARP adapters, RoCE adapters and network adapters supporting the uDAPL interface). It also provides many features, including high-performance communication support for NVIDIA GPUs with IPC, collective and non-contiguous datatype support, a shared-memory interface, fast process-level fault tolerance with checkpoint-restart, etc.
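
The desired transport interface is normally chosen when MVAPICH2 is configured and built. The commands below are only a rough sketch, assuming the autoconf-based build system that MVAPICH2 inherits from MPICH2; the installation prefixes and the device name shown for TCP/IP-CH3 (ch3:sock) are illustrative and should be checked against the user guide of the installed MVAPICH2 release.

 # Illustrative only: default build (OFA-IB-CH3 interface)
 $ ./configure --prefix=/opt/mvapich2
 $ make && make install
 
 # Illustrative only: build with the TCP/IP-CH3 interface instead
 $ ./configure --prefix=/opt/mvapich2-tcp --with-device=ch3:sock
 $ make && make install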

MVAPICH and MVAPICH2 usage

Compiling MPI applications

MVAPICH and MVAPICH2 provide a set of MPI compiler wrappers to support applications written in different programming languages: mpicc, mpiCC, mpif77 and mpif90. The appropriate wrapper should be selected according to the programming language of the MPI application. These wrappers are available in the bin directory of the MVAPICH installation.

 # Compiling MPI application in C 
 $ mpicc -o mpi_app.x mpi_app.c 
 
 # Compiling MPI application in C++ 
 $ mpiCC -o mpi_app.x mpi_app.cc 
 
 # Compiling MPI application in Fortran 77 
 $ mpif77 -o mpi_app.x mpi_app.f 
 
 # Compiling MPI application in Fortran 90 
 $ mpif90 -o mpi_app.x mpi_app.f90
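
As an illustration of what these wrappers compile, below is a minimal MPI program in C (a hypothetical mpi_app.c, matching the first example above); it only reports the rank and size of MPI_COMM_WORLD.

 /* mpi_app.c - a minimal, illustrative MPI program */
 #include <mpi.h>
 #include <stdio.h>
 
 int main(int argc, char *argv[])
 {
     int rank, size;
 
     MPI_Init(&argc, &argv);                /* initialize the MPI environment */
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank of this process */
     MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
 
     printf("Hello from rank %d of %d\n", rank, size);
 
     MPI_Finalize();                        /* shut down the MPI environment */
     return 0;
 }

The wrappers also accept the -show option (inherited from MPICH/MPICH2), which prints the underlying compiler and linker invocation without executing it; this can be used to verify which compiler and libraries a given wrapper uses.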