Documentation - The Transport Layer
Introduction
The transport layer is a central part of the layered network architecture. It has the critical role of providing communication services directly to application processes (i.e. logical communication) running on different hosts.
Application processes use the logical communication provided by the transport layer to send messages to each other, without worrying about the details of the physical infrastructure used to carry these messages (for example, possible NATs or port forwarding).
Terminology
Segment
: name given to the (T)PDU (Transport Protocol Data Unit) of the transport layer

Datagram
: a term used to refer to UDP segments

Socket
: communication interface offered by the transport layer; the gateway through which data passes from the network to the process and vice versa.
Demultiplexing / Multiplexing
The most fundamental task of UDP and TCP is to extend the IP delivery service between two end systems to a delivery service between two processes running on the end systems. The extension of host-to-host transmission to process-to-process transmission is called transport layer multiplexing and demultiplexing.
The goal is to allow several processes (e.g. YouTube + Zoom + a direct download) to use the network layer's delivery service at the same time, each receiving its own data.
Okay, let’s start again:
- Demultiplexing
💡 The task of delivering the data in a transport-layer segment to the correct socket.
🏗️ Network layer to multiple applications
By analogy, receiving a letter at home and looking at who exactly the letter is addressed to (e.g. my brother) and passing it directly to him is demultiplexing.
⚠️ To deliver to the right application (demultiplexing), UDP uses the pair (IP_dest, port_dest) while TCP uses the 4-tuple (IP_src, port_src, IP_dest, port_dest).
- Multiplexing
💡 The task of gathering chunks of data on the source host from different sockets, encapsulating each chunk of data with header information (to be used later in demultiplexing) to create segments, and passing the segments to the network layer.
🏗️️ Multiple applications to one network layer
By analogy, collecting all the letters our family wants to send and giving them to the letter carrier is multiplexing.
💡 Sockets have unique identifiers, port numbers, that identify the source and destination process.
- Each port number is encoded on 16 bits, from 0 to 65535.
- 0-1023: reserved for standard protocols
  🤔 On Unix-like operating systems, a process must be running with superuser privileges in order to bind a network socket to an IP address using one of these ports. A more exhaustive list can be found here.
- 1024-49151: user or registered ports
- 49152-65535: dynamic or temporary ports
💭 Each socket on the host can be assigned a port number, and when a segment arrives at the host, the transport layer examines the destination port number in the segment and directs the segment to the corresponding socket. The data in the segment then passes through the socket into the attached process. This is essentially how UDP works. However, it is a bit more subtle for TCP.
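As a sketch of the idea (addresses and socket names below are made up for illustration), demultiplexing can be modeled as a lookup table: UDP keys sockets by the destination pair only, while TCP keys them by the full 4-tuple:

```python
# Toy model of transport-layer demultiplexing (illustrative addresses only).
# UDP: a socket is identified by the destination (IP, port) pair alone.
# TCP: a socket is identified by the full 4-tuple, so two connections from
# different clients to the same server port reach different sockets.

udp_sockets = {("10.0.0.1", 53): "dns_socket"}

tcp_sockets = {
    ("192.168.1.2", 40001, "10.0.0.1", 80): "connection_A",
    ("192.168.1.3", 40001, "10.0.0.1", 80): "connection_B",
}

def demux_udp(dst_ip, dst_port):
    """Direct an incoming UDP segment using only the destination pair."""
    return udp_sockets.get((dst_ip, dst_port))

def demux_tcp(src_ip, src_port, dst_ip, dst_port):
    """Direct an incoming TCP segment using the full 4-tuple."""
    return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))
```

Note how the two TCP clients use the same source port but different source IPs and still land on different sockets, which is exactly the quadruplet rule noted above.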
UDP: stateless, connectionless transport
👉 Best-effort delivery, stateless and connectionless. It does not establish a connection and does not correct errors (error detection via a checksum only).
In effect, when you choose to use UDP instead of TCP, the application talks almost directly with the IP protocol.
- And how does it work?
  UDP takes messages from the application process, attaches the source and destination port number fields for the multiplexing/demultiplexing service, adds two other small fields (length and checksum), and then passes the resulting datagram to the network layer. The network layer encapsulates the datagram in an IP packet and makes a best-effort attempt to deliver it to the destination host. If the datagram reaches the destination host, UDP uses the destination port number to deliver the segment's data to the appropriate application process.
- DNS is a good example of an application-layer (layer 7) protocol that uses UDP.
  When a host's DNS application wishes to perform a query, it constructs a DNS query message and passes the message to UDP. Without performing a handshake (https://en.wikipedia.org/wiki/Handshake_(computing)) with the UDP entity running on the destination end system, the host-side UDP adds header fields to the message and passes the resulting datagram to the network layer. The network layer encapsulates the UDP datagram into a packet and sends it to a name server. The client's DNS application then waits for a response to its query. If it does not receive one (perhaps because the underlying network lost the query or the response), it may resend the query, send it to another DNS server, or inform the invoking application that it cannot get a response.
- The UDP segment structure is defined in RFC 768.
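As a hedged sketch of the two points above (port numbers, query ID and hostname are illustrative values, not anything prescribed), the fixed 8-byte UDP header from RFC 768 and a minimal DNS "A" query can be built with Python's `struct` module:

```python
import struct

def build_udp_header(src_port: int, dst_port: int, payload_len: int,
                     checksum: int = 0) -> bytes:
    # RFC 768 fields: source port, destination port, length (header + data),
    # checksum. The header is fixed at 8 bytes; in IPv4 a checksum of 0
    # means "not computed".
    return struct.pack(">HHHH", src_port, dst_port, 8 + payload_len, checksum)

def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    # DNS header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question section: length-prefixed labels terminated by a zero byte,
    # then QTYPE=1 (A record) and QCLASS=1 (IN).
    qname = b"".join(bytes([len(label)]) + label.encode()
                     for label in hostname.split("."))
    return header + qname + b"\x00" + struct.pack(">HH", 1, 1)

query = build_dns_query("example.com")
datagram = build_udp_header(40000, 53, len(query)) + query
```

In practice the kernel builds the UDP header for you: `sock.sendto(query, (server_ip, 53))` on a `SOCK_DGRAM` socket is enough, and no handshake takes place first.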
🤔 One may wonder why UDP provides a checksum, since many link layer protocols (including the well-known Ethernet protocol) also provide error checking. The reason is that there is no guarantee that every link between the source and the destination provides error checking. And even if they all did, an error can still be introduced while a segment sits in a router's memory.
Since IP is supposed to run over just about any layer 2 protocol, it is useful for the transport layer to provide error checking as a safety measure. Although UDP provides error checking, it does nothing to recover from an error: some implementations simply discard the damaged datagram, others pass it to the application with a warning.
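The UDP checksum is the 16-bit one's complement of the one's complement sum of the segment's 16-bit words (RFC 768, with the algorithm detailed in RFC 1071). A minimal sketch:

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"            # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF
```

The receiver recomputes the sum over the data plus the transmitted checksum; a result of 0 after complementing means no detected error. Consistent with the note above, detection is all it does: nothing is corrected.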
TCP: connection-oriented, stateful transport
👉 Delivery with state (i.e. it keeps track of what has been acknowledged and waits for responses), after establishment of a connection.
💡 With TCP, unlike UDP, you can communicate with two different processes on the same port, as long as the source IP (or source port) differs!
TCP is said to be connection oriented because before one application process can start sending data to another, the two must first “shake hands”.
A TCP connection provides a full-duplex (i.e. data flows in both directions) and point-to-point (i.e. between a single source and a single destination - no multicast!) service.
- The TCP session is opened via a three-way handshake (three segments)
💡 The ISN, or Initial Sequence Number, was introduced to avoid mixing up overlapping connections: two connections can use the same port numbers AND have close sequence numbers. Today this number is chosen (almost) completely at random. It is encoded on 4 bytes (32 bits).
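A minimal sketch of the three segments (the dictionary layout is just for illustration), showing how each side's ISN is acknowledged with +1 because the SYN and SYN-ACK each consume one sequence number:

```python
def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three segments that open a TCP connection."""
    syn     = {"flags": "SYN",     "seq": client_isn}
    syn_ack = {"flags": "SYN-ACK", "seq": server_isn, "ack": client_isn + 1}
    ack     = {"flags": "ACK",     "seq": client_isn + 1, "ack": server_isn + 1}
    return [syn, syn_ack, ack]
```

After the final ACK, data bytes start at sequence number ISN + 1 on each side.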
TCP sees data as an unstructured but ordered stream of bytes. TCP’s use of sequence numbers reflects this view in that sequence numbers refer to the stream of bytes transmitted, not the series of segments transmitted. The sequence number of a segment is therefore the byte stream number of the first byte of the segment.
⚠️ The sequence number (seq) is the number of the first byte of the sent segment!
⚠️ The acknowledgement number (ack) is the number of the next expected byte!
The maximum size of the data field is set by the MSS (Maximum Segment Size).
💭 The MSS is like the MTU, but used with TCP at Layer 4. In other words, the MSS is the maximum size the payload can be, after subtracting space for IP, TCP, and other headers. So if the MTU is 1500 bytes and the IP and TCP headers are 20 bytes each, the MSS is 1460 bytes.
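Putting the two notes together — the MSS arithmetic and the seq/ack rules — a sketch (header sizes assume IPv4 and TCP without options; the function and its names are mine, not a standard API):

```python
MTU = 1500
IP_HEADER = 20    # IPv4 header without options
TCP_HEADER = 20   # TCP header without options
MSS = MTU - IP_HEADER - TCP_HEADER   # 1460 payload bytes per segment

def segment_stream(isn: int, data_len: int, mss: int = MSS):
    """Split a byte stream into (seq, length, expected_ack) triples.
    seq is the stream number of the segment's first byte (the SYN consumes
    one sequence number, so data starts at isn + 1); the expected ack is
    the number of the next byte the receiver waits for."""
    segments, offset = [], 0
    while offset < data_len:
        length = min(mss, data_len - offset)
        seq = isn + 1 + offset
        segments.append((seq, length, seq + length))
        offset += length
    return segments
```

For a 3000-byte stream with ISN 1000, this yields segments whose first bytes are 1001, 2461 and 3921, acknowledged by 2461, 3921 and 4001 respectively.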
⚠️ Flow control ≠ congestion control
💡 Flow control essentially means that TCP ensures that a sender does not overwhelm a receiver by sending packets faster than it can consume them. It concerns the end node (available space in the buffer).
💡 Congestion control aims to prevent a node from overwhelming the network (i.e. the links between two nodes).
Flow control
Flow control consists of making sure that no more packets are sent when the receive buffer is already full, because the receiver would not be able to handle them and would have to drop them.
To control the amount of data TCP can send, the receiver announces its Receive Window (aka rwnd), which is the amount of free space in the receive buffer.
Each time TCP receives a packet, it must send an ack message to the sender, acknowledging receipt of that packet correctly, and with that ack message, it sends the value of the current receive window, so that the sender knows if it can continue sending data.
💡 **TCP uses a sliding window protocol to control the number of outstanding bytes it can have.** In other words, the number of bytes that have been sent but not yet acknowledged.
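A sketch of the sender-side constraint (variable names are mine): the bytes in flight must never exceed the advertised window:

```python
def sendable_bytes(next_seq: int, last_ack: int, rwnd: int) -> int:
    """How much the sender may still transmit: data in flight (sent but
    not yet acknowledged) must stay within the receiver's advertised
    window rwnd."""
    in_flight = next_seq - last_ack
    return max(0, rwnd - in_flight)
```

With 2000 bytes in flight and rwnd = 5000 the sender may send 3000 more; if the window shrinks below the in-flight amount, it must pause until further acks arrive.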
Congestion control
Packet retransmission treats a symptom of network congestion (the loss of a specific segment of the transport layer) but does not treat the cause of network congestion, which is too many sources attempting to send data at too high a rate. To address the cause of network congestion, mechanisms are needed to limit senders in the face of network congestion.
Various congestion scenarios are possible:
- Two connections that share a single hop (router) with infinite buffers
  💡 One of the costs of a congested network: large queuing delays occur when the packet arrival rate approaches the link capacity.
- Two hosts (with retransmissions) and a router with finite buffers
  💡 Another cost of a congested network: the sender must perform retransmissions to make up for packets dropped (lost) due to buffer overflow.
- Four senders, routers with finite buffers, and multi-hop paths
  💡 Another cost associated with dropping a packet due to congestion: when a packet is dropped along a path, the transmission capacity that was used on each of the upstream links to get that packet to the point where it is dropped ends up being wasted.
💡 The standardized version of TCP (in RFC 5681) uses end-to-end congestion control rather than network-assisted congestion control, since the IP layer provides no explicit feedback to end systems about network congestion (buffer full = packet discarded).
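The congestion-avoidance behavior of RFC 5681 is often summarized as AIMD (additive increase, multiplicative decrease). A deliberately simplified per-RTT sketch, ignoring slow start and fast recovery (this is an illustration of the principle, not the full standardized algorithm):

```python
def aimd_update(cwnd: int, mss: int, loss_detected: bool) -> int:
    """One round-trip of congestion avoidance: grow the congestion window
    by about one MSS per RTT; on loss, halve it (never below one MSS)."""
    if loss_detected:
        return max(mss, cwnd // 2)
    return cwnd + mss
```

The halving on loss is what limits senders when the network signals congestion (implicitly, through drops), while the slow additive growth probes for spare capacity.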
🆕 In the case of ECN (Explicit Congestion Notification), congestion control is assisted by the IP layer in combination with active queue management (AQM): the forwarding element (e.g. a router) can signal the onset of congestion, but sending remains regulated by the transport layer.
More recently, extensions to IP and TCP (RFC 3168) have been proposed, implemented and deployed to allow the network to explicitly report congestion to a TCP sender and receiver. In addition, a number of TCP congestion control variants have been proposed that infer congestion from measured packet delay.
Explicit Congestion Notification (RFC 3168) is the form of network-assisted congestion control performed within the Internet. At the network layer, two bits (with four possible values, in total) in the Type of Service field of the IP datagram header are used for ECN.
One setting of the ECN bits (the CE codepoint) is used by a router to indicate that it (the router) is experiencing congestion.
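The four values of those two bits can be tabulated; a sketch of the RFC 3168 codepoints and of extracting them from the former Type of Service byte:

```python
# ECN codepoints defined in RFC 3168 (the two low-order bits of the
# former IPv4 Type of Service byte).
ECN_CODEPOINTS = {
    0b00: "Not-ECT: sender is not ECN-capable",
    0b01: "ECT(1): ECN-capable transport",
    0b10: "ECT(0): ECN-capable transport",
    0b11: "CE: congestion experienced (set by a router)",
}

def ecn_bits(tos_byte: int) -> int:
    """Extract the two ECN bits from the old Type of Service byte."""
    return tos_byte & 0b11
```

A sender marks its packets ECT; a congested router rewrites that to CE instead of dropping, and the receiver echoes the signal back so the sender can slow down.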
MPTCP
Extension of TCP that allows endpoints to establish a connection exploiting the resources of multiple paths through TCP subflows (sub-sessions).
👉 Since TCP can reorder segments on arrival anyway (using sequence numbers), the segments of one connection can be carried over several different paths!
💡 Other deployments use Multipath TCP to aggregate bandwidth from different networks. For example, several types of smartphones, particularly in Korea, use Multipath TCP to combine WiFi and 4G through SOCKS proxies.