__**TCP NOTES**__ \\
This is announced during the TCP handshake: \\
  * MSS is announced (not really negotiated, just announced by each side).
  * Window scaling is also announced. Without it the window field is only 16 bits, so the window tops out at 64 kB (65,535 bytes), which is far too small for fast or long-distance links. That's why window scaling is ON 99% of the time.
  * SACK is also announced in the three-way handshake and is also ON 99% of the time.
    * SACK is like a lookahead acknowledgement while we wait for slow bytes to arrive.
    * Example: "If I received bytes 1,2,3 and 5,6 but not 4, I acknowledge 3 and 'selectively acknowledge' 5 and 6."
    * How it works in practice: the receiver appends to a duplicate ACK a TCP option listing the ranges of non-contiguous data it has received.

An ACK is sent to indicate that the receiver has got all the data up to that point (it is cumulative) and is ready for the next segment. The ACK number for a segment is that segment's sequence number plus its data length, i.e. the next byte expected (SYN and FIN each count as one byte). See this for a **full explanation of ack and seq numbers**: [[https://packetlife.net/blog/2010/jun/7/understanding-tcp-sequence-acknowledgment-numbers/|External Link]] \\
  * **__Congestion window__** (CWND) is a **sender-imposed, internal variable**, introduced to avoid overrunning routers in the middle of the network path. The **sender increases the congestion window slightly with each ACK received**, i.e. it allows itself more outstanding (unacknowledged) data.
    * You can't 'get' that value directly from a capture file, as **it is NOT ADVERTISED, it lives in the sender**.
  * **__Receive window__** (RWND) is the receiver's buffer for incoming data that has not yet been processed by the application.
    * The receiver sends its window size to the sender in every segment (Wireshark "window size value"). It announces the number of bytes still free in the receiver buffer, i.e. the number of bytes the sender can still send without needing an acknowledgement from the receiver.
    * It is quite confusing because it is the one that TRAVELS IN THE TCP HEADER.
In books it appears as the advertised window, meaning that it is **advertised by the receiver**. \\
The three-way handshake implies that each host acts as both client and server: each side sends its own SYN and has it acknowledged by the other.

{{:network_stuff:3wayhandshakesimple.png?400|}}

----

TCP TIMERS: \\
  * Time Out Timer (retransmission timer): the sender waits for the ACK. If the ACK doesn't arrive in time, TCP retransmits the segment. The timeout value adapts to the round-trip time measured on the connection.
  * Time Wait Timer: used for the orderly close and release of ports at the end of a session. The side that closes first starts the time wait timer after sending the ACK for the second FIN segment.
  * Keep Alive Timer: if the server hears nothing from a client for 2 hours, it starts sending probes every 75 seconds (10 in the classic description, 9 by default on Linux) and drops the connection if none is answered.
  * Persist Timer: used to deal with a zero-window-size deadlock. While the receiver keeps the window closed, the sender periodically sends a small window probe, so that a lost window update cannot stall the connection forever.

----

__TCP CONGESTION CONTROL ALGORITHMS__

Slow start (congestion control vs congestion avoidance): [[https://blog.stackpath.com/glossary-cwnd-and-rwnd/|External Link]] \\
(Tahoe, Reno, CUBIC, Vegas, Westwood and, more recently, BBR) \\
  * BBR (Bottleneck Bandwidth and RTT). Also used by some QUIC / HTTP/3 stacks. More modern and **doesn't rely on packet loss**, aiming to maximize throughput by actively **probing network capacity**. It's better suited for modern, high-speed networks.
  * CUBIC (cubic function) is the default in Linux. It relies on packet loss as its congestion signal and grows the window along a cubic curve, making it less aggressive than BBR but well optimized for traditional networks.
  * BBR offers more efficient bandwidth usage and lower latency, especially in unpredictable networks, while CUBIC is robust in traditional high-speed environments.
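To make the "cubic window growth" of CUBIC concrete, here is a minimal sketch of the window curve as given in RFC 8312. The constants are the RFC defaults; the function name and the sample values are only for illustration, and the Linux kernel implementation differs in detail.

```python
# Sketch of the CUBIC window curve from RFC 8312 (illustrative, not kernel code).
C = 0.4      # scaling constant (RFC default)
BETA = 0.7   # fraction of the window kept after a loss (RFC default)


def cubic_window(t: float, w_max: float) -> float:
    """Congestion window, in segments, t seconds after the last loss event.

    w_max is the window size just before that loss.
    """
    # K is the time the curve needs to climb back up to w_max.
    k = (w_max * (1 - BETA) / C) ** (1 / 3)
    return C * (t - k) ** 3 + w_max


if __name__ == "__main__":
    # Assumed example: the loss happened with 100 segments in flight.
    for t in range(0, 9, 2):
        print(f"t={t}s  cwnd={cubic_window(t, 100.0):.1f} segments")
```

Right after the loss the window restarts at BETA * w_max, rises quickly, flattens out as it approaches the old maximum, and only then starts probing beyond it. The growth depends on the time since the last loss and not on the RTT.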
Linux commands to inspect and change the algorithm:
  * sysctl net.ipv4.tcp_congestion_control # show the current algorithm; the default is usually cubic or reno
  * sysctl net.ipv4.tcp_available_congestion_control # list the available algorithms
  * sysctl -w net.ipv4.tcp_congestion_control=bbr # switch to BBR (by Google), which builds its model from measured bandwidth and RTT rather than from packet loss

To test performance: \\
tc qdisc replace dev enp0s20f0 root netem loss 1.5% delay 70ms # introduces some latency and packet loss

WINDOWING: \\
  * MSS and window scaling are announced at the beginning. The window scale multiplier is normally about x128.

When a port is not available and the connection is rejected, the host normally answers the SYN with a TCP RST packet. An ICMP unreachable message is what a closed UDP port or a rejecting firewall sends instead.

----

__TCP OPTIMIZATION__ \\
[[https://www.extrahop.com/company/blog/2016/tcp-nodelay-nagle-quickack-best-practices/]] \\
  * **NAGLE**: the aim is to reduce the number of small packets sent over the network. You might want to fill up the truck instead of sending it with just one box, or not. **Nagle's algorithm interacts badly with delayed ACKs, hence it is undesirable in highly interactive environments.**
    * The TCP_NODELAY socket option disables Nagle's algorithm, so data is sent as soon as it is available.
  * **Delayed ACK**: basically a bet taken by the destination, worth 200 - 500 ms, that a new packet will arrive before the delayed ACK timer expires, so that one ACK can cover both. Nagle's algorithm effectively allows only one small packet to be in flight unacknowledged at any given time. This tends to hold back traffic, because the sender waits for an ACK that the receiver is deliberately delaying.
    * To disable delayed ACKs, use the TCP_QUICKACK socket option (Linux).

----

**FUN FACTS ABOUT TCP**: \\
  * The reset (RST) flag is a rude way of finishing a connection. Scanners can use it to tell closed ports from open ones. It can be sent by the source or destination host, or by a network device in transit such as a firewall. It avoids half-closed connections due to missed FIN/ACKs etc.
    * An attacker in the middle can use it to disrupt the communication between two peers.
  * How a TCP reset attack works: [[https://robertheaton.com/2020/04/27/how-does-a-tcp-reset-attack-work/|RST_attack]]
  * PUSH (PSH): a flag that tells the receiver to hand the data received so far to the application right away instead of buffering it while waiting for more. Senders typically set it on the last segment of a write. When we disable Nagle with TCP_NODELAY the segments usually carry the push flag too, but the two are not exactly the same thing. [[http://smallvoid.com/article/winnt-tcp-push-flag.html|External Link]]

----

QUIC NOTES
  * You need the decryption keys or you won't see much in a capture: [[https://youtu.be/fHBUOlvS3ts]]
  * 1-RTT or 0-RTT handshake modes.
  * Multiplexes hundreds of streams over a single connection.
  * It has a long connection ID (each side chooses the ID that its counterpart must use to address it).
    * This allows the connection to be reused independently of the underlying protocol (e.g. the IP address changes when moving from WiFi to 5G).
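
The connection ID point can be sketched in a few lines. This is a toy model, not a QUIC implementation: the class and field names are hypothetical, the addresses are documentation-range examples, and real QUIC validates a new path before trusting it. It only shows why looking a connection up by ID rather than by IP/port lets it survive an address change.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical per-connection state kept by a QUIC-like server."""
    conn_id: bytes
    peer_addr: tuple      # last (ip, port) the peer was seen at
    packets_seen: int = 0


class ConnIdDemux:
    """Finds the session by connection ID instead of by source address."""

    def __init__(self) -> None:
        self.sessions = {}    # conn_id -> Session

    def on_packet(self, conn_id: bytes, src_addr: tuple) -> Session:
        session = self.sessions.get(conn_id)
        if session is None:
            # Unknown ID: treat it as a new connection.
            session = Session(conn_id, src_addr)
            self.sessions[conn_id] = session
        elif session.peer_addr != src_addr:
            # Known ID from a new address: the peer has moved networks.
            # (Assumption: accepted blindly here, real QUIC validates the path.)
            session.peer_addr = src_addr
        session.packets_seen += 1
        return session


if __name__ == "__main__":
    demux = ConnIdDemux()
    on_wifi = demux.on_packet(b"\x1a\x2b", ("192.0.2.10", 50000))
    on_5g = demux.on_packet(b"\x1a\x2b", ("198.51.100.7", 41000))
    print(on_wifi is on_5g, on_5g.peer_addr, on_5g.packets_seen)
```

Both packets land in the same session object even though the source address changed. A TCP stack, which identifies a connection by its address and port 4-tuple, would see the second packet as belonging to a different (unknown) connection.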