This shows you the differences between two versions of the page.
| network_stuff:machine_learning:networking [2025/07/15 18:07] – jotasandoku | network_stuff:machine_learning:networking [2025/07/15 21:08] (current) – jotasandoku | ||
|---|---|---|---|
| Line 52: | Line 52: | ||
| ==== LAN PROTOCOLS IN AI NETWORKING ==== | ==== LAN PROTOCOLS IN AI NETWORKING ==== | ||
| - | === NVIDIA | + | === InfiniBand === |
| **InfiniBand** is a key technology for AI workloads, widely used in high-performance computing (HPC) and AI clusters for its **ultra-low latency**, **high throughput**, | **InfiniBand** is a key technology for AI workloads, widely used in high-performance computing (HPC) and AI clusters for its **ultra-low latency**, **high throughput**, | ||
| Line 93: | Line 93: | ||
| * Packet Sequence Number (PSN), QP Number, and acknowledgment request bits. | * Packet Sequence Number (PSN), QP Number, and acknowledgment request bits. | ||
| \\ | \\ | ||
| - | == ROCE VERBS == | + | RoCEv2 packet structure: |
| - | **TODO** | + | Ethernet Header → IP Header → UDP Header → RoCE Packet (BTH + Payload) |
| + | The Base Transport Header (BTH) is a key component of the InfiniBand transport layer. It contains essential information for delivering messages in InfiniBand or RDMA over Converged Ethernet (RoCE). | ||
| + | |||
| + | \\ | ||
| + | * Opcode: Specifies the operation type (e.g., RDMA read, write, send, atomic). | ||
| + | * Solicited Event (SE): Asks the responder to raise a completion event when the message arrives. | ||
| + | * Migration Request (M): Carries the path migration state used by automatic path migration. | ||
| + | * P_Key: Identifies the partition the packet belongs to. | ||
| + | * Destination QP: Specifies the target Queue Pair for the message. | ||
| + | * Packet Sequence Number (PSN): Ensures ordered delivery and detects packet loss. | ||
| + | * Acknowledgment Request (A): Signals if an acknowledgment is needed for reliable transport. | ||
| + | * Resync: Not a separate BTH bit; in the reliable datagram service, resynchronization is requested with a dedicated RESYNC opcode. | ||
| + | |||
| + | |||
| + | == RDMA VERBS == | ||
| + | They are the **same** for both InfiniBand and RoCEv2. | ||
| + | * ibv_alloc_pd: | ||
| + | * ibv_reg_mr: Registers a memory region for RDMA operations. | ||
| + | * ibv_create_cq: | ||
| + | * ibv_create_qp: | ||
| + | * ibv_modify_qp: | ||
| + | * ibv_post_send: | ||
| + | * ibv_post_recv: | ||
| + | * ibv_poll_cq: | ||
| + | * ibv_query_device: | ||
| + | * ibv_get_device_list: | ||
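
The BTH fields listed in this revision can be made concrete with a small decoder. The sketch below is illustrative only: it assumes the 12-byte BTH layout as I read it in the InfiniBand Architecture Specification (opcode, then SE/M/pad count/header version, P_Key, a reserved byte, 24-bit destination QP, the acknowledgment request bit, and a 24-bit PSN). The names `BTH` and `parse_bth` are hypothetical and not part of any RDMA library, and the reserved bits (including the congestion bits RoCEv2 places in the reserved byte) are ignored.

```python
import struct
from dataclasses import dataclass


@dataclass
class BTH:
    """Decoded Base Transport Header fields (hypothetical helper type)."""
    opcode: int            # operation type (send, RDMA write, RDMA read, ...)
    solicited_event: bool  # SE bit
    mig_req: bool          # M bit
    pad_count: int         # number of pad bytes added to the payload (0-3)
    tver: int              # transport header version
    p_key: int             # partition key
    dest_qp: int           # 24-bit destination Queue Pair number
    ack_req: bool          # A bit
    psn: int               # 24-bit Packet Sequence Number


def parse_bth(data: bytes) -> BTH:
    """Parse the first 12 bytes of `data` as a BTH (network byte order)."""
    if len(data) < 12:
        raise ValueError("a BTH is 12 bytes long")
    # B: opcode, B: SE/M/PadCnt/TVer, H: P_Key,
    # I: reserved byte + DestQP, I: A bit + reserved + PSN
    opcode, flags, p_key, word1, word2 = struct.unpack("!BBHII", data[:12])
    return BTH(
        opcode=opcode,
        solicited_event=bool(flags & 0x80),
        mig_req=bool(flags & 0x40),
        pad_count=(flags >> 4) & 0x3,
        tver=flags & 0x0F,
        p_key=p_key,
        dest_qp=word1 & 0xFFFFFF,           # drop the reserved top byte
        ack_req=bool(word2 & 0x80000000),   # top bit of the last word
        psn=word2 & 0xFFFFFF,
    )


if __name__ == "__main__":
    # Hand-built sample header, not captured traffic.
    sample = bytes.fromhex("0a80ffff0000001280000064")
    print(parse_bth(sample))
```

In a RoCEv2 capture the same 12 bytes would sit right after the UDP header, so a caller would slice the UDP payload before passing it to `parse_bth`.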