This shows you the differences between two versions of the page.
| network_stuff:machine_learning:networking [2025/07/15 18:06] – jotasandoku | network_stuff:machine_learning:networking [2025/07/15 21:08] (current) – jotasandoku | ||
|---|---|---|---|
| Line 52: | Line 52: | ||
| ==== LAN PROTOCOLS IN AI NETWORKING ==== | ==== LAN PROTOCOLS IN AI NETWORKING ==== | ||
| - | === NVIDIA | + | === InfiniBand === |
| **InfiniBand** is a key technology for AI workloads, widely used in high-performance computing (HPC) and AI clusters for its **ultra-low latency**, **high throughput**, | **InfiniBand** is a key technology for AI workloads, widely used in high-performance computing (HPC) and AI clusters for its **ultra-low latency**, **high throughput**, | ||
| Line 89: | Line 89: | ||
| * **Interoperability**: | * **Interoperability**: | ||
| * **Compatibility with AI Workloads**: | * **Compatibility with AI Workloads**: | ||
| - | * QP (Queue Pair): a fundamental concept representing an RDMA connection. It consists of a send queue and a receive queue. | + | * __QP (Queue Pair)__: a fundamental concept representing an RDMA connection. It consists of a send queue and a receive queue. |
| - | * BTH Base Transport | + | * __BTH Base Transport |
| * Packet Sequence Number (PSN), QP Number, and acknowledgment request bits. | * Packet Sequence Number (PSN), QP Number, and acknowledgment request bits. | ||
| \\ | \\ | ||
| - | == ROCE VERBS == | + | Packet structure: |
| - | **TODO** | + | Ethernet Header → IP Header → UDP Header → RoCE Packet (BTH + Payload) |
| + | The Base Transport Header (BTH) is a key component of the InfiniBand transport layer. It contains essential information for delivering messages in InfiniBand or RDMA over Converged Ethernet (RoCE). | ||
| + | |||
| + | \\ | ||
| + | * Opcode: Specifies the operation type (e.g., RDMA read, write, send, atomic). | ||
| + | * Solicited Event Indicator (SE): Indicates if a completion event is required. | ||
| + | * Migration State (M): Carries the communication migration state used for automatic path migration of a Queue Pair (QP). | ||
| + | * P_Key: Identifies the partition the packet belongs to. | ||
| + | * Destination QP: Specifies the target Queue Pair for the message. | ||
| + | * Packet Sequence Number (PSN): Ensures ordered delivery and detects packet loss. | ||
| + | * Acknowledgment Request (A): Signals if an acknowledgment is needed for reliable transport. | ||
| + | * Resync Request (R): Handles retransmissions in reliable modes. | ||
| + | |||
| + | |||
| + | == RDMA VERBS == | ||
| + | The verbs are the **same** for both InfiniBand and RoCEv2. | ||
| + | * ibv_alloc_pd: | ||
| + | * ibv_reg_mr: Registers a memory region for RDMA operations. | ||
| + | * ibv_create_cq: | ||
| + | * ibv_create_qp: | ||
| + | * ibv_modify_qp: | ||
| + | * ibv_post_send: | ||
| + | * ibv_post_recv: | ||
| + | * ibv_poll_cq: | ||
| + | * ibv_query_device: | ||
| + | * ibv_get_device_list: | ||
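
The BTH fields listed earlier (opcode, SE, M, P_Key, destination QP, A, PSN) can be illustrated with a minimal Python sketch that packs and parses a 12-byte header. The bit layout below is an assumption based on the commonly documented BTH format (opcode, then SE/M/PadCnt/TVer, P_Key, a reserved byte, a 24-bit destination QP, the A bit with 7 reserved bits, and a 24-bit PSN). The names `BTH`, `pack_bth`, and `unpack_bth` are hypothetical helpers and are not part of any RDMA library. The Resync Request field from the list is not modeled.

```python
import struct
from dataclasses import dataclass

BTH_LEN = 12  # assumed BTH size in bytes


@dataclass
class BTH:
    opcode: int      # 8 bits: operation type (send, RDMA write, ...)
    solicited: bool  # SE bit
    migreq: bool     # M bit
    pad_count: int   # 2 bits: padding bytes added to the payload
    tver: int        # 4 bits: transport header version
    pkey: int        # 16 bits: partition key
    dest_qp: int     # 24 bits: destination Queue Pair number
    ack_req: bool    # A bit: acknowledgment requested
    psn: int         # 24 bits: Packet Sequence Number


def pack_bth(h: BTH) -> bytes:
    """Serialize the header in network byte order (assumed layout)."""
    flags = (
        (int(h.solicited) << 7)
        | (int(h.migreq) << 6)
        | ((h.pad_count & 0x3) << 4)
        | (h.tver & 0xF)
    )
    word1 = h.dest_qp & 0xFFFFFF                         # top byte reserved, left as 0
    word2 = (int(h.ack_req) << 31) | (h.psn & 0xFFFFFF)  # A bit, 7 reserved bits, PSN
    return struct.pack("!BBHII", h.opcode & 0xFF, flags, h.pkey & 0xFFFF, word1, word2)


def unpack_bth(data: bytes) -> BTH:
    """Parse the first 12 bytes of a RoCE/InfiniBand transport packet."""
    if len(data) < BTH_LEN:
        raise ValueError("need at least 12 bytes for a BTH")
    opcode, flags, pkey, word1, word2 = struct.unpack("!BBHII", data[:BTH_LEN])
    return BTH(
        opcode=opcode,
        solicited=bool(flags & 0x80),
        migreq=bool(flags & 0x40),
        pad_count=(flags >> 4) & 0x3,
        tver=flags & 0xF,
        pkey=pkey,
        dest_qp=word1 & 0xFFFFFF,
        ack_req=bool(word2 >> 31),
        psn=word2 & 0xFFFFFF,
    )


if __name__ == "__main__":
    # Illustrative values only: opcode 0x04 is assumed to mean "RC SEND Only".
    hdr = BTH(opcode=0x04, solicited=False, migreq=True, pad_count=0, tver=0,
              pkey=0xFFFF, dest_qp=0x12, ack_req=True, psn=0xABC)
    raw = pack_bth(hdr)
    print(raw.hex())
    print(unpack_bth(raw))
```

In a RoCEv2 frame these 12 bytes would sit directly after the UDP header, matching the packet structure shown above. A receiver would read the destination QP to choose the queue and compare the PSN against the expected value to detect loss or reordering.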