network_stuff:machine_learning:networking:ai:networking

Differences

This shows you the differences between two versions of the page.

network_stuff:machine_learning:networking:ai:networking [2026/02/01 15:25] jotasandoku
network_stuff:machine_learning:networking:ai:networking [2026/02/01 17:37] (current) jotasandoku
Line 55: Line 55:
 [[https://camarreal.dedyn.io/doku.php?id=network_stuff:machine_learning:networking:ai:infiniband]]
  
-=== ROCEV2 ===
+==== RoCEv2 ====
 [[ https://netdevconf.info/0x19/docs/netdev-0x19-paper18-talk-slides/netdev-0x19-AI-networking-RoCE-and-netdev.pdf ]]
 \\
Line 67: Line 67:
   * **Compatibility with AI Workloads**: Like InfiniBand, **RoCE** supports high-speed, low-latency, and lossless communication, making it ideal for distributed AI workloads such as training deep learning models across multiple GPUs or nodes.
   * __QP (Queue Pair)__: is a fundamental concept representing an RDMA connection. It consists of a send queue and a receive queue.
-  * __BTH Base Transport Header__: is a key component within RoCEv2 packets, carrying essential information like:#
+  * __InfiniBand Base Transport Header (IB BTH)__: a key component of RoCEv2 packets. It is the same 12-byte header used in InfiniBand, but in RoCEv2 it is carried inside IP/UDP (UDP destination port 4791). It carries essential information such as:
     * Packet Sequence Number (PSN), QP Number, and acknowledgment request bits.
 +    * **Operation (OpCode):** Defines the packet type and the operation being performed (e.g., SEND, RDMA WRITE, RDMA READ request/response, ACKNOWLEDGE).
 +    * **Solicited Event (SE) and MigReq (M):** One-bit flags. SE asks the responder to raise a completion event; M carries path-migration state.
 +    * **Pad Count (PadCnt):** Number of pad bytes (0 to 3) added so the payload ends on a 4-byte boundary. The BTH has no packet-length field; the length comes from the enclosing UDP/IP headers.
 +    * **Transport Header Version (TVer):** Indicates the version of the IB transport header.
 +    * **Partition Key (P_Key):** Identifies the partition the packet belongs to.
 +    * **Reserved:** Bits reserved for future use or alignment purposes.
 +    * **Destination Queue Pair (DestQP):** 24-bit number identifying the destination QP that receives the packet.
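The BTH fields listed above can be decoded with a few bit masks. The following is a minimal sketch, assuming the standard 12-byte BTH layout (OpCode, SE/M/PadCnt/TVer, P_Key, reserved + DestQP, AckReq + reserved + PSN). The names ''BTH'', ''parse_bth'' and ''sample'' are hypothetical and chosen for illustration only; the sample bytes are hand-built, not captured traffic.

```python
import struct
from dataclasses import dataclass

ROCEV2_UDP_PORT = 4791  # UDP destination port assigned to RoCEv2


@dataclass
class BTH:
    """Decoded fields of a 12-byte InfiniBand Base Transport Header."""
    opcode: int
    solicited_event: bool
    mig_req: bool
    pad_count: int
    tver: int
    p_key: int
    dest_qp: int
    ack_req: bool
    psn: int


def parse_bth(data: bytes) -> BTH:
    """Parse the first 12 bytes of a RoCEv2 UDP payload as a BTH."""
    if len(data) < 12:
        raise ValueError("BTH requires at least 12 bytes")
    # Network byte order: 1-byte opcode, 1-byte flags, 2-byte P_Key,
    # then two 32-bit words holding DestQP and AckReq/PSN.
    opcode, flags, p_key, word1, word2 = struct.unpack("!BBHII", data[:12])
    return BTH(
        opcode=opcode,
        solicited_event=bool(flags & 0x80),
        mig_req=bool(flags & 0x40),
        pad_count=(flags >> 4) & 0x3,
        tver=flags & 0x0F,
        p_key=p_key,
        dest_qp=word1 & 0xFFFFFF,          # upper 8 bits are reserved
        ack_req=bool(word2 & 0x80000000),  # top bit of the last word
        psn=word2 & 0xFFFFFF,              # lower 24 bits
    )


# Hand-built example: opcode 0x0A (RC RDMA WRITE Only), MigReq set,
# default P_Key 0xFFFF, DestQP 0x12, AckReq set, PSN 42.
sample = bytes.fromhex("0a40ffff000000128000002a")
hdr = parse_bth(sample)
print(hdr)
```

In a real capture the BTH starts right after the UDP header of a packet whose destination port is 4791, so the Ethernet, IP, and UDP headers must be stripped first.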
 +
 \\
 Packet structure:
Line 99: Line 107:
  
  
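-=== Ultra Ethernet === is an evolving concept that builds on **RoCE** to create even more robust, low-latency, and lossless Ethernet environments. Companies like **Nvidia** and **Arista** are leading the charge with **Ultra Ethernet** to create an optimized Ethernet fabric for AI workloads, where predictable, lossless communication is key.
+=== Congestion control in RoCEv2 ===
 +  * DCQCN (Data Center Quantized Congestion Notification): end-to-end, rate-based congestion control that relies on ECN marking and runs on a PFC-enabled lossless fabric.
 +    * PFC (Priority Flow Control, IEEE 802.1Qbb): per-priority, hop-by-hop pause frames that keep the fabric lossless.
 +    * ECN (Explicit Congestion Notification): switches mark packets when queues build up; the receiver returns a Congestion Notification Packet (CNP) and the sender reduces its rate.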
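The sender-side reaction in DCQCN can be sketched as a small state machine. This is a simplified illustration based on the published DCQCN description, not a NIC implementation: it covers only the rate cut on a CNP, the decay of ''alpha'' when no CNP arrives, and one fast-recovery step, and it leaves out the timers, byte counters, and the additive and hyper increase stages. The class name ''DcqcnSender'', its method names, and the default ''g = 1/256'' are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DcqcnSender:
    """Simplified DCQCN reaction point (sender side)."""
    line_rate: float                 # e.g. link speed in Gbit/s
    g: float = 1 / 256               # gain used to update alpha (assumed default)
    current_rate: float = field(init=False)
    target_rate: float = field(init=False)
    alpha: float = field(init=False)

    def __post_init__(self) -> None:
        # A new flow starts at line rate with alpha = 1.
        self.current_rate = self.line_rate
        self.target_rate = self.line_rate
        self.alpha = 1.0

    def on_cnp(self) -> None:
        """A CNP arrived: remember the old rate, cut the rate, raise alpha."""
        self.target_rate = self.current_rate
        self.current_rate *= 1 - self.alpha / 2
        self.alpha = (1 - self.g) * self.alpha + self.g

    def on_alpha_timer(self) -> None:
        """No CNP during the alpha update period: decay alpha."""
        self.alpha *= 1 - self.g

    def on_fast_recovery(self) -> None:
        """One fast-recovery step: move halfway back to the target rate."""
        self.current_rate = (self.target_rate + self.current_rate) / 2


sender = DcqcnSender(line_rate=100.0)
sender.on_cnp()            # congestion signalled by the receiver
rate_after_cnp = sender.current_rate
sender.on_fast_recovery()  # no further CNPs, start recovering
rate_after_recovery = sender.current_rate
print(rate_after_cnp, rate_after_recovery)
```

PFC does not appear in the sketch because it acts hop by hop in the switches and NICs rather than in the sender's rate logic.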
 + 
 + 
 + 
 +==== Ultra Ethernet ====
 +Ultra Ethernet is an evolving concept that builds on **RoCE** to create even more robust, low-latency, and lossless Ethernet environments. Companies like **Nvidia** and **Arista** are leading the charge with **Ultra Ethernet** to create an optimized Ethernet fabric for AI workloads, where predictable, lossless communication is key.
  
 **Link:** [[https://ultraethernet.org/ultra-ethernet-specification-update/|Ultra Ethernet Specification]]
network_stuff/machine_learning/networking/ai/networking.1769959541.txt.gz · Last modified: by jotasandoku