TCP NOTES
The following is announced during the TCP handshake:
MSS is announced (not really negotiated but just announced).
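Window scaling is also announced. Without it, the 16-bit window field limits the window to 64 kB (65,535 bytes), which is far too small for fast or long-distance links. That's why window scaling is ON 99% of the time.
SACK support (the SACK-permitted option) is also announced in the 3-way handshake and is also ON 99% of the time.
SACK is like a lookahead acknowledgement: it reports data that has already arrived while we wait for slow bytes to arrive.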
Example: “If I received Bytes 1,2,3 5,6 but not 4, I acknowledge 3 and 'selectively acknowledge' 5 and 6.”
How it works in practice: the receiver appends to a duplicate acknowledgment packet a TCP option containing the ranges of noncontiguous data received.
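The SACK example above can be sketched as a toy model. This is only an illustration: the function name `ack_and_sack` is made up, bytes are numbered from 1 as in the example, and the cumulative ACK is reported as the last in-order byte (real TCP puts the next expected sequence number in the ACK field and encodes SACK blocks as left edge / right edge + 1).

```python
def ack_and_sack(received):
    """Return (cumulative_ack, sack_blocks) for the byte numbers received so far.

    Toy model: bytes are numbered from 1 and each SACK block is an
    inclusive (first, last) range of out-of-order data.
    """
    received = sorted(set(received))

    # Cumulative ACK: walk forward until the first gap.
    ack = 0
    for b in received:
        if b == ack + 1:
            ack = b
        else:
            break

    # SACK blocks: contiguous runs of bytes beyond the cumulative ACK.
    blocks = []
    for b in received:
        if b <= ack:
            continue
        if blocks and b == blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], b)  # extend the current run
        else:
            blocks.append((b, b))            # start a new run
    return ack, blocks


print(ack_and_sack([1, 2, 3, 5, 6]))  # (3, [(5, 6)])
```

Byte 4 is the only thing the sender has to retransmit; without SACK it could not know that 5 and 6 already arrived.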
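An ACK is sent to indicate that the receiver has received all data up to that point (cumulative) and is ready for the next segment.
The ACK number that acknowledges a packet is that packet's sequence number plus its data length, i.e. the next byte the receiver expects (SYN and FIN each count as one byte). See this for a full explanation of ack and seq numbers: External Link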
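A minimal sketch of that arithmetic, assuming a hypothetical helper named `expected_ack`. The only things added to the note are the 32-bit wraparound of sequence numbers and the rule that SYN and FIN each consume one sequence number.

```python
MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


def expected_ack(seq, payload_len, syn=False, fin=False):
    """ACK number that fully acknowledges one segment."""
    return (seq + payload_len + int(syn) + int(fin)) % MOD


print(expected_ack(1000, 500))         # 1500: data segment with 500 bytes
print(expected_ack(0, 0, syn=True))    # 1: a bare SYN still consumes one number
print(expected_ack(2 ** 32 - 10, 20))  # 10: wraps past the 32-bit limit
```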
congestion window (CWND) is a sender-imposed internal variable, introduced to avoid overrunning routers in the middle of the network path. It is never sent on the wire. With each ACK received, the sender increases the congestion window slightly, i.e. the sender allows itself more outstanding (sent but unacknowledged) data.
receive window (RWND) is the buffer for incoming data in the receiver that has not been processed yet by the application
The receiver sends its window size to the sender in every packet (Wireshark “window size value”). The window size announces the number of bytes still free in the receiver buffer, i.e. the number of bytes the sender can still send without needing an acknowledgement from the receiver.
This is quite confusing because RWND is the one that TRAVELS IN THE TCP HEADER. In books it appears as the advertised window, meaning that it is advertised by the receiver.
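How the two windows combine can be sketched with a few helper functions. The names are invented for illustration, and the throughput bound is the usual "one window per round trip" approximation, not a measurement.

```python
def advertised_window(window_field, shift):
    """Bytes advertised by the receiver: 16-bit header field scaled by the
    window scale shift count announced in the handshake (0 to 14)."""
    return window_field << shift


def usable_window(cwnd, rwnd, in_flight):
    """Bytes the sender may still put on the wire right now."""
    return max(0, min(cwnd, rwnd) - in_flight)


def max_throughput_bps(window_bytes, rtt_s):
    """Upper bound on throughput when one full window is sent per round trip."""
    return window_bytes * 8 / rtt_s


print(advertised_window(65535, 0))       # 65535 bytes: the unscaled maximum
print(advertised_window(65535, 7))       # 8388480 bytes with a shift of 7
print(usable_window(10000, 8000, 3000))  # 5000: limited by the receiver here
print(max_throughput_bps(65535, 0.1))    # roughly 5.2 Mbit/s at 100 ms RTT
```

The last line is the reason the unscaled 64 kB window is too small: at 100 ms of RTT it caps a connection at about 5 Mbit/s regardless of link speed.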
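The three-way handshake implies that each side acts as both “client” and “server”: each one sends its own SYN and gets it acknowledged, so the connection is really two one-way streams.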
TCP TIMERS:
Time Out Timer (retransmission timeout, RTO): the sender waits for the ACK. If the ACK doesn't arrive in time, TCP retransmits. The value of the Time Out Timer adapts to the round-trip times measured in the network.
Time Wait Timer: used for the orderly close and discard of ports at the end of a session. The sender starts the time wait timer after sending the ACK for the second FIN segment.
Keep Alive Timer: if the server hears nothing from a client for 2 hours, it starts sending probes every 75 seconds, up to 10 of them (Linux defaults to 9), and drops the connection if none is answered.
Persistent Timer: used to deal with a zero-window-size deadlock situation. Keeps sending small window probes at intervals even when the receiver has closed the window, so that a lost window update cannot stall the connection forever.
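How the Time Out Timer "adapts" can be sketched with the standard estimator from RFC 6298. The class name `RtoEstimator` and the 60 second cap are choices made for this sketch; real stacks use their own minimum and maximum values.

```python
class RtoEstimator:
    """Retransmission timeout estimator following the formulas in RFC 6298."""

    ALPHA = 1 / 8   # weight of a new sample in the smoothed RTT
    BETA = 1 / 4    # weight of a new sample in the RTT variance
    K = 4           # variance multiplier
    MIN_RTO = 1.0   # seconds, lower bound recommended by the RFC
    MAX_RTO = 60.0  # seconds, cap chosen for this sketch

    def __init__(self, clock_granularity=0.001):
        self.g = clock_granularity
        self.srtt = None    # smoothed round-trip time
        self.rttvar = None  # round-trip time variation
        self.rto = 1.0      # initial RTO before any measurement

    def sample(self, rtt):
        """Feed one RTT measurement (seconds) and return the new RTO."""
        if self.srtt is None:  # first measurement
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = max(self.MIN_RTO, self.srtt + max(self.g, self.K * self.rttvar))
        return self.rto

    def on_timeout(self):
        """Exponential backoff: double the RTO after each retransmission timeout."""
        self.rto = min(self.rto * 2, self.MAX_RTO)
        return self.rto


est = RtoEstimator()
print(est.sample(0.5))   # first sample: 0.5 + 4 * 0.25 = 1.5 s
print(est.on_timeout())  # backoff doubles it to 3.0 s
```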
TCP CONGESTION CONTROL ALGORITHMS
Slow start (congestion control vs congestion avoidance): External Link
(Reno, Cubic, Tahoe, Vegas, Westwood and, more recently, BBR)
BBR (Bottleneck Bandwidth and RTT). Developed by Google and also usable with QUIC and HTTP/3. More modern and doesn't rely on packet loss as its congestion signal: it builds a model of the path by actively probing for bottleneck bandwidth and minimum RTT, aiming to maximize throughput. It's better suited for modern, high-speed networks.
CUBIC (cubic function) is default in Linux. It relies on packet loss and a cubic window growth, making it less aggressive than BBR but well-optimized for traditional networks.
BBR offers more efficient bandwidth usage and lower latency, especially in unpredictable networks, while CUBIC is robust in traditional high-speed environments.
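The "cubic window growth" can be made concrete with the window function published in RFC 8312. This is only a sketch of that formula with the RFC's constants; it ignores the TCP-friendly region and fast convergence, and the Linux implementation differs in detail.

```python
C = 0.4     # scaling constant from RFC 8312
BETA = 0.7  # multiplicative decrease factor from RFC 8312


def cubic_window(t, w_max):
    """Congestion window (in segments) t seconds after a loss event.

    w_max is the window size just before the loss.
    """
    k = (w_max * (1 - BETA) / C) ** (1 / 3)  # time needed to climb back to w_max
    return C * (t - k) ** 3 + w_max


# After a loss at w_max = 100 the window restarts at 70 segments, grows fast,
# flattens out near 100, then probes beyond it.
for t in (0, 2, 4, 6, 8):
    print(t, round(cubic_window(t, 100), 1))
```

The flat region around the old maximum is the point of the cubic shape: the sender spends most of its time near the rate that last worked.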
sysctl net.ipv4.tcp_congestion_control # default is usually cubic or reno
sysctl net.ipv4.tcp_available_congestion_control # list available algorithms
sysctl -w net.ipv4.tcp_congestion_control=bbr # needs root, and possibly modprobe tcp_bbr first; BBR (by Google) infers congestion from measured bandwidth and RTT, not from packet loss
To test performance:
tc qdisc replace dev enp0s20f0 root netem loss 1.5% latency 70ms # introduces some latency and packet loss
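The sysctl above changes the system-wide default. On Linux the algorithm can also be chosen per socket with the TCP_CONGESTION socket option, which is handy for comparing two algorithms in the same test. A minimal sketch; the helper names are made up, and switching to "bbr" only works if that module is available on the machine.

```python
import socket


def get_congestion_control(sock):
    """Return the congestion control algorithm name used by this socket (Linux)."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    return raw.split(b"\x00", 1)[0].decode()


def set_congestion_control(sock, name):
    """Ask the kernel to use `name` (e.g. "bbr") for this socket only."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, name.encode())


if hasattr(socket, "TCP_CONGESTION"):  # the option is Linux-specific
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            print("default:", get_congestion_control(s))
            set_congestion_control(s, "bbr")
            print("now:", get_congestion_control(s))
        except OSError as exc:  # algorithm not loaded or not permitted
            print("could not query or switch algorithm:", exc)
```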
WINDOWING:
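When a TCP port is closed and the connection is rejected, the host answers the SYN with a RST TCP packet. An ICMP port unreachable message is what a closed UDP port returns (a firewall REJECT rule may also send one for TCP).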
TCP OPTIMIZATION
https://www.extrahop.com/company/blog/2016/tcp-nodelay-nagle-quickack-best-practices/
NAGLE: Aim is to reduce the number of small packets sent over the network. You might want to fill up the truck instead of sending it with just one box, or not. The catch is the interaction between Nagle's algorithm and delayed ACKs, which adds latency. Hence Nagle's algorithm is undesirable in highly interactive environments.
Delayed ACK: basically a bet taken by the destination, which waits up to 200 - 500 ms betting that a new packet (or outgoing data to piggyback the ACK on) will arrive before the delayed ACK timer expires. Nagle's algorithm effectively allows only one small unacknowledged packet in flight at any given time, so traffic is held back when the sender waits for an ACK that the receiver is deliberately delaying.
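For interactive traffic the usual fix is to switch Nagle off on the sending socket with TCP_NODELAY. A minimal sketch; the function name `make_low_latency_socket` is just for illustration.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1: send small writes immediately instead of waiting
    # for the previous small segment to be acknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_low_latency_socket()
print("Nagle disabled:", bool(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)))
sock.close()
```

The trade-off is the truck analogy in reverse: every small write becomes its own packet, so the application should batch its writes itself.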
FUN FACTS ABOUT TCP:
QUIC NOTES
1-RTT or 0-RTT handshake modes
Multiplexes hundreds of flows over single connection
It has a long connection ID (each side chooses the ID that its counterpart must use to address it).
^^ allows reusing the connection independently of the underlying protocol (e.g. the IP changes when moving from WiFi to 5G)
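Why a connection ID survives an IP change can be shown with a toy lookup table. This is not real QUIC: the class `ConnectionTable` and its fields are invented for the sketch, and a real implementation validates the new path before trusting the new address.

```python
import os


class ConnectionTable:
    """Toy demultiplexer keyed by connection ID instead of IP address and port."""

    def __init__(self):
        self._by_cid = {}

    def open(self, peer_addr):
        """Create a connection and return the ID the peer must put in its packets."""
        cid = os.urandom(8)  # this side picks the ID it wants to be addressed by
        self._by_cid[cid] = {"peer_addr": peer_addr, "packets": 0}
        return cid

    def on_packet(self, cid, src_addr):
        """Look the connection up by ID; the source address is free to change."""
        conn = self._by_cid[cid]
        conn["peer_addr"] = src_addr  # real QUIC validates the new path first
        conn["packets"] += 1
        return conn


table = ConnectionTable()
cid = table.open(("192.0.2.10", 50000))
table.on_packet(cid, ("192.0.2.10", 50000))           # packet sent over WiFi
conn = table.on_packet(cid, ("198.51.100.7", 41000))  # same connection over 5G
print(conn["packets"], conn["peer_addr"])
```

With TCP the lookup key is the address and port pair itself, so the second packet would belong to no known connection.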