

INFINIBAND:
From Wikipedia: InfiniBand (IB) is a computer networking communications standard used in high-performance computing that features very high throughput and very low latency [...]. InfiniBand provides remote direct memory access (RDMA) capabilities for low CPU overhead.
The key point is that InfiniBand is designed so that servers and storage talk directly, roughly speaking from memory to memory. They bypass the classic operating-system network stack, which gives much higher throughput and lower latency, approaching what an internal memory bus offers. This is done via remote direct memory access (RDMA).
Routing between subnets is possible over InfiniBand, but in our case the setup is very simple: Mellanox InfiniBand switches build a high-performance fabric between a cluster of servers and the DDN storage (the "controllers" in InfiniBand jargon).

Terms:

  • RDMA provides access to the memory from one computer to the memory of another computer without involving either computer’s operating system. This technology enables high-throughput and low-latency networking with low CPU utilization.
    • Mellanox provides RDMA via the OFED package
  • LID: Local Identifier. Every device in a subnet has a LID, assigned by the SM. Routing between different subnets is done on the basis of a Global Identifier (GID).
  • NSD (Network Shared Disk): in our context, an NSD is the server that connects to the storage via the Mellanox switch. The servers share the NSDs with the clients, creating a kind of distributed logical disk (a bit like Cisco HyperFlex).
  • SM (Subnet Manager): performs the tasks required by the InfiniBand specification to initialize the InfiniBand hardware. One SM must be running for each InfiniBand subnet. It is run by the OpenSM daemon, which can run both on the switches and on the servers.
    • The SM master is the node actually acting as SM. The node with the highest priority (0-15) wins.
    • In our setup, all servers have priority 14 while the switch has priority 15, so the switch is normally the SM master.
  • MAD (Management Datagram): InfiniBand management packets. Transfers that need more than one packet use RMPP (Reliable Multi-Packet Protocol).
  • SRP (SCSI RDMA Protocol): carries SCSI over RDMA. The srp_daemon tool discovers and connects to SRP targets in an IB fabric.
  • sysimgguid: system image GUID, identifies the whole system (switch or server).
  • caguid: channel adapter GUID, identifies the NIC (HCA).
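The SM master election described above can be sketched as follows. This is a minimal Python model, not OpenSM code. It assumes the usual InfiniBand arbitration rule that a tie on priority is broken in favour of the lower port GUID. The node names and GUIDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMCandidate:
    name: str       # hypothetical node label
    priority: int   # SM priority, 0 (lowest) to 15 (highest)
    port_guid: int  # tie-breaker on equal priority (assumption: lower GUID wins)


def elect_sm_master(candidates):
    """Return the candidate that should end up as SM master."""
    if not candidates:
        raise ValueError("no SM running: the subnet will not be initialized")
    for c in candidates:
        if not 0 <= c.priority <= 15:
            raise ValueError(f"{c.name}: priority {c.priority} is outside 0-15")
    # Highest priority first; on a tie, the smaller GUID gives the larger -guid.
    return max(candidates, key=lambda c: (c.priority, -c.port_guid))


if __name__ == "__main__":
    fabric = [
        SMCandidate("server-a", 14, 0x0002C903000A0001),
        SMCandidate("server-b", 14, 0x0002C903000A0002),
        SMCandidate("ib-switch", 15, 0x0002C903000B0001),
    ]
    print(elect_sm_master(fabric).name)  # ib-switch
    # If the switch's SM goes away, a priority-14 server takes over:
    print(elect_sm_master(fabric[:2]).name)  # server-a
```

With our priorities (switch 15, servers 14), the switch wins as long as its SM is up. The servers only matter as standbys.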

Useful MLNX-OS commands:

> sh interfaces | i state   ! user mode; short for 'show interfaces | include state'
! The commands below run in privileged EXEC mode
show run
show interfaces ib status  ! note: run this one under config mode
show guids  ! shows the switch GUID (globally unique identifier, the equivalent of the switch's main MAC address)
fae sminfo  ! shows which node is acting as SM master. This can be a server or the switch itself; if it is the switch, 'show guids' and 'fae sminfo' return the same GUID
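To check that rule from a script, a minimal sketch could parse the 'fae sminfo' output and compare its GUID with the one from 'show guids'. It assumes that 'fae sminfo' prints the standard infiniband-diags sminfo line ("sminfo: sm lid ... sm guid 0x..., activity count ... priority ... state ..."). It also assumes that the switch GUID is given as hex, with or without colons. Both formats are assumptions, and the sample values in the comments are made up.

```python
import re

# Assumed format: the standard infiniband-diags 'sminfo' line, e.g.
# "sminfo: sm lid 1 sm guid 0x..., activity count 1000 priority 15 state 3 SMINFO_MASTER"
SMINFO_RE = re.compile(
    r"sm lid (?P<lid>\d+) sm guid (?P<guid>0x[0-9a-fA-F]+), "
    r"activity count (?P<activity>\d+) priority (?P<priority>\d+) "
    r"state (?P<state>\d+) (?P<state_name>\S+)"
)


def parse_sminfo(line):
    """Extract LID, GUID, priority and state name from one sminfo output line."""
    m = SMINFO_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised sminfo output: {line!r}")
    return {
        "lid": int(m["lid"]),
        "guid": int(m["guid"], 16),
        "priority": int(m["priority"]),
        "state": m["state_name"],
    }


def normalize_guid(text):
    """Accept '0x248a0703...' or '24:8a:07:03:...' and return the GUID as an int."""
    return int(text.strip().lower().replace(":", ""), 16)


def switch_is_sm_master(sminfo_line, switch_guid):
    """True if the SM master GUID reported by sminfo is the switch's own GUID."""
    return parse_sminfo(sminfo_line)["guid"] == normalize_guid(switch_guid)
```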

Useful server-side IB commands:

ibstatus  # status of the server's InfiniBand (IB) ports
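When checking many servers, a minimal sketch can parse 'ibstatus' output into per-port fields and list the ports that are not ACTIVE. It assumes the usual ibstatus layout: an "Infiniband device '<dev>' port <n> status:" header followed by indented "key: value" lines. A port stuck in INIT usually means no SM has brought it up.

```python
import re

# Assumed ibstatus layout: a header per port, then indented "key: value" lines.
HEADER_RE = re.compile(r"Infiniband device '(?P<dev>[^']+)' port (?P<port>\d+) status:")


def parse_ibstatus(text):
    """Return {(device, port): {field: value}} from raw ibstatus output."""
    ports = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER_RE.match(line)
        if header:
            current = {}
            ports[(header["dev"], int(header["port"]))] = current
        elif current is not None and ":" in line:
            # Split on the first colon only: GIDs and "4: ACTIVE" contain colons too.
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return ports


def inactive_ports(ports):
    """Ports whose logical state is not ACTIVE (INIT usually means no SM brought them up)."""
    return sorted(k for k, fields in ports.items()
                  if not fields.get("state", "").endswith("ACTIVE"))
```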


SUBNET MANAGER (OpenSM):
The InfiniBand subnet manager works in two planes:

  • SM-config: configuration sync. It runs over the management network and covers configuration and user management.
  • smnode-OpenSM: the cluster master, i.e. the SM master. OpenSM is the software entity that must run in order to initialize the InfiniBand hardware (at least one per InfiniBand subnet).
    • The SM keeps the forwarding state, hands out Local Identifiers (LIDs, the layer-2 addresses) and calculates routes (not very relevant for our simple setup).
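The LIDs the SM hands out come from a 16-bit space. The sketch below classifies a LID and models a naive handout of one unicast LID per port GUID. The ranges are the InfiniBand ones: 0x0000 reserved, 0x0001-0xBFFF unicast, 0xC000-0xFFFE multicast, 0xFFFF permissive. The assign_lids function is a hypothetical toy. A real SM also handles LMC (several LIDs per port) and tries to keep assignments stable across restarts.

```python
def classify_lid(lid):
    """Classify a 16-bit InfiniBand LID by range."""
    if not 0 <= lid <= 0xFFFF:
        raise ValueError("LIDs are 16-bit values")
    if lid == 0x0000:
        return "reserved"
    if lid <= 0xBFFF:
        return "unicast"
    if lid <= 0xFFFE:
        return "multicast"
    return "permissive"


def assign_lids(port_guids, first_lid=1):
    """Hypothetical toy handout: give each new port GUID the next unicast LID."""
    lids = {}
    next_lid = first_lid
    for guid in port_guids:
        if guid in lids:  # a port keeps the LID it already has
            continue
        if classify_lid(next_lid) != "unicast":
            raise RuntimeError("unicast LID space exhausted (or invalid first_lid)")
        lids[guid] = next_lid
        next_lid += 1
    return lids
```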

Troubleshooting commands:

show ib smnodes
show ib smnode nyzsfsll51 sm-state
show guids   # to identify the GUIDs (the IB equivalent of MAC addresses)
fae sminfo
fae ibnetdiscover  # scans the whole fabric and prints the topology of all elements found, a bit like LLDP in Ethernet
show ib smnode NYZSFSLL02 ha-role    # shows the current SM HA role
The command prompts of the two switches show:
server1[serversmname: standby] #
server2 [serversmname: master] #
^^ This master/standby refers to the SM-config master node (the one that only coordinates the configuration sync). It does not refer to the smnode-OpenSM cluster master (the SM master); 'fae sminfo' shows that one.
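The distinction above is easy to get wrong in scripts, so here is a minimal sketch that reads only the SM-config role from the prompt. The prompt shape is assumed from the two examples above. The helper names are hypothetical.

```python
import re

# Assumed prompt shape, taken from the two examples above: "<host>[<name>: <role>] #"
PROMPT_RE = re.compile(r"^(?P<host>[^\s\[]+)\s*\[(?P<name>[^:\]]+):\s*(?P<role>\w+)\]\s*#")


def sm_config_role(prompt):
    """Return (host, role) for the SM-config plane shown in the prompt.

    This is only the configuration-sync role; it says nothing about which node
    is the OpenSM (SM) master. Use 'fae sminfo' for that.
    """
    m = PROMPT_RE.match(prompt.strip())
    if m is None:
        raise ValueError(f"no SM-config role found in prompt: {prompt!r}")
    return m["host"], m["role"]


def sm_config_master(prompts):
    """Hypothetical helper: the host whose prompt says 'master', or None."""
    for p in prompts:
        host, role = sm_config_role(p)
        if role == "master":
            return host
    return None
```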

Initial setup:
https://docs.mellanox.com/display/MLNXOSv381000/Getting+Started


UPGRADE PROCEDURE:
https://community.mellanox.com/s/article/howto-upgrade-switch-os-software-on-mellanox-switch-systems (The device has two partitions: we install the new OS in the 'backup' partition and instruct the device to boot from it.)
Strictly speaking it is not a backup partition: it is the 'partition of next boot' (as opposed to the active image), and we select it with 'image boot next'.

  • Configuration backup.
    • Best done via the web UI: Setup → Configuration → Configuration files (Active configuration file + Binary configuration file)
      • To restore this configuration:

configuration fetch <download url>                      # binary configuration: download the file...
configuration switch-to <filename>                      # ...then make it the active configuration

configuration text fetch <download url>                 # text configuration: download the file...
configuration text file ss apply verbose fail-continue  # ...then apply it, continuing past failing lines
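The choice between the two restore paths can be captured in a small helper that only emits the commands listed above. It is a sketch: the URL and file name are placeholders you supply. The function name is hypothetical, and it does not talk to the switch.

```python
def restore_commands(kind, url, filename):
    """Return the MLNX-OS restore command sequence for a saved configuration.

    kind: "binary" or "text"; url and filename are caller-supplied placeholders.
    """
    if kind == "binary":
        # Binary config: fetch the file, then switch to it.
        return [
            f"configuration fetch {url}",
            f"configuration switch-to {filename}",
        ]
    if kind == "text":
        # Text config: fetch the file, then apply it line by line.
        return [
            f"configuration text fetch {url}",
            f"configuration text file {filename} apply verbose fail-continue",
        ]
    raise ValueError("kind must be 'binary' or 'text'")
```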

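Upgrade via CLI:
enable
image fetch scp://[username:password@IP/image]   # download the new image to the switch
image install im...                              # install it in the partition of next boot
image boot next                                  # boot from that partition at the next reload
reload                                           # reboot into the new image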
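The two-partition scheme behind these steps can be modelled as a tiny state machine. This is a toy under stated assumptions: it captures only the behaviour described in these notes (install to the non-running partition, select it with 'image boot next', then reload) and not MLNX-OS internals. The image names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DualPartitionSwitch:
    """Toy model (assumption) of the two image partitions used during an upgrade."""
    images: list = field(default_factory=lambda: ["old-image", None])
    active: int = 0     # partition the switch is running from
    next_boot: int = 0  # partition it will boot from after 'reload'

    def install(self, image):
        # 'image install': write the image to the partition that is not running
        target = 1 - self.active
        self.images[target] = image
        return target

    def boot_next(self):
        # 'image boot next': boot from the other partition at the next reload
        self.next_boot = 1 - self.active

    def reload(self):
        # 'reload': reboot into the partition selected for next boot
        if self.images[self.next_boot] is None:
            raise RuntimeError("next-boot partition is empty")
        self.active = self.next_boot
        return self.images[self.active]


if __name__ == "__main__":
    sw = DualPartitionSwitch()
    sw.install("new-image")
    sw.boot_next()
    print(sw.reload())  # new-image
    # In this model the old image is still in the other partition:
    sw.boot_next()
    print(sw.reload())  # old-image
```

In this model, keeping the previous image in the other partition is what makes going back cheap. For an actual downgrade, follow the Mellanox downgrade procedure.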

Downgrade software: https://docs.mellanox.com/display/MLNXOSv391906/Downgrading+OS+Software
