Interconnect technologies

Mellanox InfiniBand

Mellanox has developed a new architecture, called Connect-IB, for high-performance InfiniBand adapters. The new adapter doubles the throughput of the company’s FDR InfiniBand gear, supporting speeds beyond 100 Gbps. With this adapter, Mellanox is attempting to re-sync the interconnect with the performance curve of large clusters, with the goal of providing a balanced ratio of computational power to network bandwidth. Connect-IB was designed as a foundational technology for future exascale systems and ultra-scale datacenters.

Connect-IB increases performance for both MPI- and PGAS-based applications. The architecture also features the latest GPUDirect RDMA technology, known as GPUDirect v3, which allows direct GPU-to-GPU communication, bypassing the OS and CPU. The new adapters can process up to 130 million messages per second, compared with 33 million messages per second for the current generation, and will have a latency of 0.7 microseconds, matching that of the latest ConnectX hardware for FDR InfiniBand.
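
As a rough illustration of what GPUDirect RDMA means on the application side, the sketch below assumes a CUDA-aware MPI build (for example MVAPICH2 or Open MPI compiled with CUDA support): device buffers are handed directly to MPI calls, and the adapter moves the data between GPUs without staging it through host memory. The buffer size and rank pairing are illustrative, not taken from any Mellanox documentation.

  /* Hedged sketch: requires a CUDA-aware MPI (e.g. MVAPICH2 or Open MPI built
     with CUDA support) and a driver stack with GPUDirect RDMA enabled.
     Build with something like: mpicc gpudirect_sketch.c -lcudart */
  #include <mpi.h>
  #include <cuda_runtime.h>
  #include <stdio.h>

  #define COUNT (1 << 20)                /* 1 Mi doubles, illustrative size */

  int main(int argc, char **argv)
  {
      int rank, peer;
      double *dbuf;                      /* buffer in GPU memory */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      peer = rank ^ 1;                   /* pair ranks 0<->1, 2<->3, ... (even rank count assumed) */

      cudaMalloc((void **)&dbuf, COUNT * sizeof(double));
      cudaMemset(dbuf, 0, COUNT * sizeof(double));

      /* With a CUDA-aware MPI the device pointer goes straight to the MPI call;
         GPUDirect RDMA lets the adapter read/write GPU memory directly,
         bypassing host staging buffers and the CPU. */
      MPI_Sendrecv_replace(dbuf, COUNT, MPI_DOUBLE, peer, 0, peer, 0,
                           MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      if (rank == 0)
          printf("exchanged %d doubles directly from GPU memory\n", COUNT);

      cudaFree(dbuf);
      MPI_Finalize();
      return 0;
  }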

Prototypes are currently running in Mellanox labs; samples will be sent to customers in Q3 2012, with general availability expected in early Q4 2012.


Connect-IB

Connect-IB adapter cards provide the highest-performing and most scalable interconnect solution for server and storage systems. Maximum bandwidth is delivered across PCI Express 3.0 x16 and two ports of FDR InfiniBand, supplying more than 100Gb/s of throughput together with consistently low latency across all CPU cores. Connect-IB also enables PCI Express 2.0 x16 systems to take full advantage of FDR, delivering at least twice the bandwidth of existing PCIe 2.0 solutions (a port-query sketch follows the feature list below).

  • 100Gb/s interconnect throughput
  • Unlimited scaling with new transport service
  • 4X higher message rate
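
As a generic (not Connect-IB-specific) illustration, the libibverbs sketch below queries every locally visible adapter port and prints its active link width and speed codes; on an FDR part one would expect the 4X width and 14 Gb/s-per-lane speed encodings. The decoding hints in the comment follow the standard ibv_port_attr bit encodings.

  /* Hedged sketch using the standard libibverbs API; link with -libverbs.
     It only reports what the local driver exposes. */
  #include <infiniband/verbs.h>
  #include <stdio.h>

  int main(void)
  {
      int num = 0, i;
      struct ibv_device **devs = ibv_get_device_list(&num);

      if (!devs)
          return 1;
      for (i = 0; i < num; i++) {
          struct ibv_context *ctx = ibv_open_device(devs[i]);
          struct ibv_device_attr dattr;
          uint8_t port;

          if (!ctx)
              continue;
          if (ibv_query_device(ctx, &dattr) == 0) {
              for (port = 1; port <= dattr.phys_port_cnt; port++) {
                  struct ibv_port_attr pattr;
                  if (ibv_query_port(ctx, port, &pattr))
                      continue;
                  /* width code: 1=1X, 2=4X, 4=8X, 8=12X; speed code: 1=SDR(2.5),
                     2=DDR(5), 4=QDR(10), 8=FDR10(10.3), 16=FDR(14.06) Gb/s per lane */
                  printf("%s port %d: width_code=%d speed_code=%d\n",
                         ibv_get_device_name(devs[i]), port,
                         pattr.active_width, pattr.active_speed);
              }
          }
          ibv_close_device(ctx);
      }
      ibv_free_device_list(devs);
      return 0;
  }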

TOP500 news

The InfiniBand Trade Association (IBTA), a global organization dedicated to maintaining and furthering the InfiniBand specification, has announced that, for the first time, InfiniBand has surpassed all other interconnect technologies on the TOP500 list of the world’s fastest supercomputers. The latest list, available at top500.org, was released on June 18, 2012 and shows that InfiniBand is now used by 210 of the 500 systems on the TOP500 list.

Intel interconnect solutions

Intel is planning to integrate fabric controllers with its server processors. The company intends to use the interconnect technologies recently acquired from Cray, QLogic and Fulcrum to deliver chips that put what is essentially a network interface card (NIC) onto the processor die. As with other types of processor integration, the idea is to deliver more capability: greater performance, scalability and energy efficiency.


True Scale InfiniBand

True Scale is a superset of IB that is focused on small packets rather than the larger ones that IB was originally intended to handle in the data center. To do this, it replaces the traditional Verbs-based MPI libraries with a layer that Intel calls Performance Scaled Messaging (PSM). Verbs and MPI libraries can still be used on a True Scale cluster, but the optimization effort went into PSM.
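
Because PSM sits underneath the MPI library, the application itself does not change: the minimal small-message ping-pong below (a sketch; message size and iteration count are arbitrary) runs over PSM on a True Scale cluster or over Verbs elsewhere, depending only on how the MPI library was built and configured.

  /* Minimal MPI ping-pong latency sketch; run with exactly two ranks.
     Nothing here is PSM- or Verbs-specific: the transport is selected by the
     MPI build/runtime, which is the point of the PSM layer described above. */
  #include <mpi.h>
  #include <stdio.h>

  #define ITERS 1000
  #define MSG   8                        /* small message, in bytes */

  int main(int argc, char **argv)
  {
      char buf[MSG] = {0};
      int rank, i;
      double t0, t1;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_Barrier(MPI_COMM_WORLD);
      t0 = MPI_Wtime();
      for (i = 0; i < ITERS; i++) {
          if (rank == 0) {
              MPI_Send(buf, MSG, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
              MPI_Recv(buf, MSG, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          } else if (rank == 1) {
              MPI_Recv(buf, MSG, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
              MPI_Send(buf, MSG, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
          }
      }
      t1 = MPI_Wtime();

      if (rank == 0)
          printf("half round-trip latency: %.2f us\n",
                 (t1 - t0) * 1e6 / (2.0 * ITERS));

      MPI_Finalize();
      return 0;
  }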


Ethernet

RDMA on Ethernet

RDMA over Converged Ethernet (RoCE) is a network protocol that allows remote direct memory access over an Ethernet network. RoCE is a link-layer protocol and hence allows communication between any two hosts in the same Ethernet broadcast domain. Although the RoCE protocol benefits from the characteristics of a converged Ethernet network, it can also be used on a traditional, non-converged Ethernet network. In general, RoCE is aimed at users of clustered computing setups who might otherwise have opted for InfiniBand because of its speed and agility, but who already have Ethernet -- either to maintain compatibility with existing storage networks and compute infrastructure, or because their local datacenter already has a large investment in Ethernet technology, expertise and management tools.
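
One practical consequence is that a RoCE NIC appears through the same verbs API as a native InfiniBand HCA. The hedged sketch below simply walks the locally visible devices and reports whether the first port of each uses an Ethernet or an InfiniBand link layer, using the standard IBV_LINK_LAYER_* constants from libibverbs.

  /* Hedged sketch: distinguish RoCE (Ethernet link layer) from native
     InfiniBand ports via libibverbs; link with -libverbs. */
  #include <infiniband/verbs.h>
  #include <stdio.h>

  int main(void)
  {
      int num = 0, i;
      struct ibv_device **devs = ibv_get_device_list(&num);

      if (!devs)
          return 1;
      for (i = 0; i < num; i++) {
          struct ibv_context *ctx = ibv_open_device(devs[i]);
          struct ibv_port_attr pattr;

          if (!ctx)
              continue;
          if (ibv_query_port(ctx, 1, &pattr) == 0)      /* first port only */
              printf("%s: link layer = %s\n",
                     ibv_get_device_name(devs[i]),
                     pattr.link_layer == IBV_LINK_LAYER_ETHERNET
                         ? "Ethernet (RoCE)" : "InfiniBand");
          ibv_close_device(ctx);
      }
      ibv_free_device_list(devs);
      return 0;
  }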


40GbE Server and Storage Clusters

The 40 Gigabit Ethernet interconnect solutions with RoCE (RDMA over Converged Ethernet) support have been optimized to deliver the highest performance for compute- and storage-intensive applications. Compared to 10GbE-based clusters, they deliver an application performance increase of more than 80 percent, demonstrated with CAE (Computer-Aided Engineering) applications, and 4X faster storage throughput, enabling high storage density and significant savings in CAPEX and OPEX. ConnectX-3 PCI Express 3.0 40GbE NICs and SwitchX 40GbE switch systems are available.

iWARP

InfiniBand has been preferred over Ethernet for HPC applications in the past few years, mostly because InfiniBand has native support for Remote Direct Memory Access (RDMA), which is important for MPI implementations. The situation has changed with the development of RDMA solutions over Ethernet. One of these solutions, iWARP (Internet Wide Area RDMA Protocol), enables MPI applications to run unmodified over the convenient Ethernet technology. Standardized by the Internet Engineering Task Force (IETF) and supported by the industry’s leading 10GbE Ethernet vendors, iWARP works with existing Ethernet switches and routers to deliver a low-latency fabric technology for high-performance data centers. Offering the same API to applications and shipped within the same middleware distributions, iWARP can be dropped in seamlessly in place of more esoteric fabrics. While current iWARP solutions are 10Gbps Ethernet-based, higher-speed 40Gbps and 100Gbps implementations will soon be available. iWARP offers application-level performance at 10Gbps that is competitive with the latest FDR IB speeds.

Main iWARP features include:

  • low latency for supporting high-performance computing over TCP/IP
  • multivendor solution that works with legacy switches
  • built on top of IP, making it routable and scalable from just a few nodes to thousands of collocated or geographically dispersed endpoints
  • built on top of TCP, making it highly reliable and resilient to adverse network conditions
  • uses the familiar TCP/IP/Ethernet stack and therefore leverages all the existing traffic monitoring and debugging tools
  • allows RDMA and MPI applications to be ported from InfiniBand (IB) interconnect to IP/Ethernet interconnect in a seamless fashion (see the sketch after this list)
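
As a sketch of that last point, the client-side fragment below starts an rdma_cm connection using nothing but an IP address and port. The same calls work over iWARP, RoCE or native InfiniBand; SERVER_IP and PORT are placeholders, and the remaining steps (route resolution, queue-pair creation, rdma_connect) are only indicated in the comments.

  /* Hedged sketch of the client side of rdma_cm connection set-up; link with
     -lrdmacm. With iWARP the peer is identified purely by IP address and port. */
  #include <rdma/rdma_cma.h>
  #include <arpa/inet.h>
  #include <stdio.h>
  #include <string.h>

  #define SERVER_IP "192.0.2.10"         /* placeholder address (TEST-NET-1) */
  #define PORT      7471                 /* placeholder port */

  int main(void)
  {
      struct rdma_event_channel *ec = rdma_create_event_channel();
      struct rdma_cm_id *id = NULL;
      struct sockaddr_in dst;

      memset(&dst, 0, sizeof(dst));
      dst.sin_family = AF_INET;
      dst.sin_port = htons(PORT);
      inet_pton(AF_INET, SERVER_IP, &dst.sin_addr);

      if (!ec || rdma_create_id(ec, &id, NULL, RDMA_PS_TCP)) {
          fprintf(stderr, "rdma_cm not available\n");
          return 1;
      }

      /* Resolve the IP address to an RDMA device and route; a real client would
         then wait for the ADDR_RESOLVED event, call rdma_resolve_route(), create
         a queue pair and call rdma_connect(), exactly as it would over IB. */
      if (rdma_resolve_addr(id, NULL, (struct sockaddr *)&dst, 2000) == 0)
          printf("address resolution started for %s:%d\n", SERVER_IP, PORT);

      rdma_destroy_id(id);
      rdma_destroy_event_channel(ec);
      return 0;
  }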


Open Ethernet initiative

Mellanox Technologies, a supplier of high-performance, end-to-end interconnect solutions for data center servers and storage systems, has launched the “Generation of Open Ethernet” initiative, an alternative approach to traditional closed-code Ethernet switches that gives customers full flexibility and freedom to custom-design their data centers in order to optimize utilization, efficiency and overall return on investment. For years, Ethernet switch vendors have locked down their solutions, providing no choice or flexibility for their users. In the era of cloud computing and Web 2.0 applications, IT managers must be able to control their data center networks in order to achieve higher levels of utilization and scalability. The open-source movement has traditionally focused on operating systems, standards and applications; with this initiative, Mellanox aims to lead the “Generation of Open Ethernet” and enable the next era of open data centers by extending open source into the data center infrastructure.

