InfiniBand® is a network communications and storage system that uses high-speed serial links and processors to address the performance and reliability concerns of high-performance computing. It can address 64,000 nodes and may provide speeds of up to 2.5 gigabits per second (Gbps). That is 2,500 megabits per second (Mbps), enough to transfer a 1-gigabyte file (8 gigabits) in less than 4 seconds under ideal conditions.
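The arithmetic behind a transfer-time figure like this can be checked with a short sketch. It assumes an ideal link with no protocol or encoding overhead, and the function and constant names are illustrative only. A 1-gigabyte file is 8 gigabits, so at 2.5 Gbps it takes about 3.2 seconds, whereas a 1-gigabit file would take only 0.4 seconds.

```python
def transfer_time_seconds(size_bits: float, link_rate_bps: float) -> float:
    """Ideal transfer time: payload size in bits divided by link rate in bits/s.

    Assumption: no protocol, encoding, or congestion overhead is modeled.
    """
    return size_bits / link_rate_bps


LINK_RATE_BPS = 2.5e9          # 2.5 Gbps, the figure quoted in the text
ONE_GIGABIT = 1e9              # bits
ONE_GIGABYTE_IN_BITS = 8e9     # 1 gigabyte = 8 gigabits (decimal units)

gigabit_time = transfer_time_seconds(ONE_GIGABIT, LINK_RATE_BPS)
gigabyte_time = transfer_time_seconds(ONE_GIGABYTE_IN_BITS, LINK_RATE_BPS)

print(f"1 gigabit:  {gigabit_time:.1f} s")
print(f"1 gigabyte: {gigabyte_time:.1f} s")
```

Real transfers would be somewhat slower, since part of the raw link rate is consumed by overhead.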
InfiniBand® is made possible in part by the high-speed serial links now available, such as fiber optic. In early computing, serial communication was considered too slow for fast data transfer, and computer buses, which were mostly parallel, were the only choice. Fast serial links and serial buses are now very practical, and they have fewer hardware points of failure. A 32-bit parallel bus, for instance, has at least 32 sets of bit drivers and, most often, pressure contacts between interface cards. In this respect, the probability of failure is higher in parallel devices.
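The reliability argument can be made concrete with a small calculation. The sketch below assumes that each driver or contact fails independently with the same probability, and the value 0.001 is a made-up figure for illustration, not a measured failure rate. Under those assumptions, the chance that at least one of n components fails is 1 - (1 - p)^n, which grows quickly with n.

```python
def any_failure_probability(per_component_p: float, components: int) -> float:
    """Probability that at least one of `components` independent parts fails.

    Assumption: failures are independent and equally likely for every part.
    """
    return 1.0 - (1.0 - per_component_p) ** components


P_FAIL = 0.001  # hypothetical per-component failure probability

serial_link = any_failure_probability(P_FAIL, 1)     # one driver/contact set
parallel_bus = any_failure_probability(P_FAIL, 32)   # 32 driver/contact sets

print(f"serial link:     {serial_link:.4f}")
print(f"32-bit parallel: {parallel_bus:.4f}")
```

With these illustrative numbers, the 32-line bus is roughly thirty times more likely to suffer a failure than the single serial path.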
InfiniBand® also offers redundancy for increased reliability. The storage devices and hosts in InfiniBand® may be geographically distributed for better protection. For instance, a massive database application that requires 24 x 7, or 100%, uptime may combine several techniques, including site redundancy, server redundancy, and storage redundancy.
Site redundancy replicates an entire application site and runs the replica at the same time as the main application site, which at first may seem to waste computing and network resources. The wisdom of the approach becomes clear when operation at the primary application site is partially or totally disrupted. The backup site, which has been working alongside the primary site all along, is then promoted to main site and keeps processing until the original primary site is active and online again.
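The promotion logic described above can be sketched in a few lines. This is a simplified illustration, not how any particular InfiniBand® deployment implements failover; the `Site` and `RedundantPair` names and the boolean health flag are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    healthy: bool = True  # hypothetical health flag set by some monitor


class RedundantPair:
    """Tracks which of two sites is currently serving requests."""

    def __init__(self, primary: Site, backup: Site) -> None:
        self.primary = primary
        self.backup = backup

    def active_site(self) -> Site:
        # Prefer the primary; promote the backup only while the primary is down.
        if self.primary.healthy:
            return self.primary
        if self.backup.healthy:
            return self.backup
        raise RuntimeError("no healthy site available")


pair = RedundantPair(Site("primary"), Site("backup"))
pair.primary.healthy = False       # primary site is disrupted
promoted = pair.active_site().name  # backup takes over
pair.primary.healthy = True        # primary site is back online
restored = pair.active_site().name  # primary resumes
```

A production system would also need to detect failures and resynchronize data before handing control back, which this sketch leaves out.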
Server redundancy replicates the primary server to ensure continued operation in the event of a failure. Whether collocated with the primary or remote from it, the secondary server may run in parallel so that a major failure in the primary server does not disrupt the service. In disk mirroring, the server's local storage is written to two storage systems, while in site mirroring, the backup site does the job of the primary server and synchronizes with it.
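Disk mirroring, in its simplest form, means every write goes to two storage systems so that either one can serve reads if the other is lost. The sketch below models the two systems as in-memory dictionaries; the `MirroredStore` class and its method names are hypothetical and stand in for real storage devices.

```python
class MirroredStore:
    """Writes every record to two stores; reads from whichever is available."""

    def __init__(self) -> None:
        self.store_a: dict[str, bytes] = {}
        self.store_b: dict[str, bytes] = {}
        self.a_available = True  # hypothetical flag for a failed first store

    def write(self, key: str, value: bytes) -> None:
        # Both copies are updated on every write, keeping the mirrors identical.
        self.store_a[key] = value
        self.store_b[key] = value

    def read(self, key: str) -> bytes:
        # Fall back to the second copy if the first store has failed.
        source = self.store_a if self.a_available else self.store_b
        return source[key]


mirror = MirroredStore()
mirror.write("record-1", b"payload")
mirror.a_available = False             # simulate loss of the first store
recovered = mirror.read("record-1")    # still readable from the mirror
```

The cost of this protection is that every write is performed twice, which is the trade-off mirroring accepts in exchange for availability.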
A switched fabric communications link can deliver the overall device bandwidth that supercomputers need. Fabric refers to a relatively large network of hosts and network devices that are interlinked with one another. In a switched fabric, remote high-speed storage and fast remote host services are enhanced by switching technologies that reduce computing overhead and collisions when packets of data are sent into the network.