|
The Transmission Control Protocol (TCP) is one of the core protocols
of the Internet protocol suite. Using TCP, applications
on networked hosts can create connections to one
another, over which they can exchange data. The protocol
guarantees reliable, in-order delivery of data from
sender to receiver. TCP also distinguishes data for
multiple concurrent applications (e.g. a Web server and
an email server) running on the same host.
TCP supports many of the Internet's most popular
application protocols and resulting applications,
including the World Wide Web, email and Secure Shell.
Technical overview
Transmission Control Protocol (TCP) is a
connection-oriented, reliable-delivery byte-stream
transport layer communication protocol, currently
documented in IETF RFC 793.
In the Internet protocol suite, TCP is the intermediate
layer between the Internet Protocol below it and an
application above it. Applications often need reliable
pipe-like connections to each other, whereas the
Internet Protocol provides no such streams, only
unreliable packets. TCP performs the task of the
transport layer in the simplified OSI model of computer
networks.
Applications send streams of 8-bit bytes to TCP for
delivery through the network, and TCP divides the byte
stream into appropriately sized segments (usually bounded
by the maximum transmission unit (MTU) size of the data
link layer of the network the computer is attached to).
TCP then passes the resulting segments to the Internet
Protocol for delivery through an internet to the TCP
module of the entity at the other end. TCP checks that no
segments are lost by giving each segment a sequence
number, which is also used to make sure that the data are
delivered to the entity at the other end in the correct
order. The TCP module at the far end sends back an
acknowledgement for segments that have been successfully
received; a timer at the sending TCP causes a timeout if
an acknowledgement is not received within a reasonable
round-trip time (RTT), and the (presumably lost) data are
then retransmitted. TCP checks that no bytes are damaged
by using a checksum, which is computed at the sender for
each block of data before it is sent and checked at the
receiver.
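The segmentation step described above can be sketched as follows. This is a minimal illustration, not how any real stack is written: the helper name `segment_stream`, the fixed `mss` parameter and the tuple representation are assumptions made for the example, and the sketch ignores the fact that the SYN itself consumes one sequence number.

```python
from typing import List, Tuple

def segment_stream(data: bytes, isn: int, mss: int) -> List[Tuple[int, bytes]]:
    """Split a byte stream into (sequence_number, payload) pairs.

    Each sequence number identifies the first byte of its payload,
    counted from the initial sequence number `isn`, modulo 2**32.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(data), mss):
        seq = (isn + offset) % 2**32
        segments.append((seq, data[offset:offset + mss]))
    return segments

# Hypothetical values: a 12-byte stream cut into segments of at most 5 bytes.
segments = segment_stream(b"hello, world", isn=1000, mss=5)
```

Because every payload carries the number of its first byte, a receiver can put the pieces back in order and notice a missing range.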
Protocol operation in detail
TCP connections have three phases: connection
establishment, data transfer and connection termination.
A 3-way handshake is used to establish a connection. A
four-way handshake is used to disconnect. During
connection establishment, parameters such as sequence
numbers are initialized to help ensure ordered delivery
and robustness.
Connection establishment (3-way handshake)
While it is possible for a pair of end hosts to initiate
a connection between themselves simultaneously, typically
one end opens a socket and listens passively for a
connection from the other. This is commonly referred to
as a passive open, and it designates the server-side of
a connection. The client-side of a connection initiates
an active open by sending an initial SYN segment to the
server as part of the 3-way handshake. The server-side
should respond to a valid SYN request with a SYN/ACK.
Finally, the client-side should respond to the server
with an ACK, completing the 3-way handshake and
connection establishment phase.
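From an application's point of view, the passive and active opens map onto ordinary socket calls, and the operating system performs the SYN, SYN/ACK and ACK exchange itself. The following sketch uses Python's standard `socket` module on the loopback interface; the address and the choice of an OS-assigned port are assumptions made so the example is self-contained.

```python
import socket

# Passive open: bind to an OS-chosen loopback port and listen.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

# Active open: connect() causes the SYN to be sent and returns
# once the handshake has completed.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))

# accept() hands back the established connection on the server side.
conn, peer = server.accept()
connected = (peer == client.getsockname())

client.close()
conn.close()
server.close()
```

The handshake segments never appear in application code; they can be observed only with a packet capture tool.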
Port states
A connection progresses through a series of states:
LISTEN, SYN-SENT, SYN-RECEIVED, ESTABLISHED, FIN-WAIT-1,
FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT,
and CLOSED.
LISTEN: represents waiting for a connection
request from any remote TCP and port.
TIME-WAIT: represents waiting for enough time to
pass to be sure the remote TCP received the
acknowledgment of its connection termination request.
According to RFC 793, a connection can stay in TIME-WAIT
for a maximum of four minutes.
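The progression through these states can be sketched as a lookup table. The state names are those listed above; the event labels (`active_open`, `recv_syn_ack` and so on) are hypothetical names chosen for this illustration, and the table covers only the common transitions rather than every edge in the RFC 793 state diagram.

```python
# (state, event) -> next state; event names are illustrative labels.
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "active_open"): "SYN-SENT",
    ("LISTEN", "recv_syn"): "SYN-RECEIVED",
    ("SYN-SENT", "recv_syn_ack"): "ESTABLISHED",
    ("SYN-RECEIVED", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN-WAIT-1",
    ("ESTABLISHED", "recv_fin"): "CLOSE-WAIT",
    ("FIN-WAIT-1", "recv_ack"): "FIN-WAIT-2",
    ("FIN-WAIT-1", "recv_fin"): "CLOSING",
    ("FIN-WAIT-2", "recv_fin"): "TIME-WAIT",
    ("CLOSING", "recv_ack"): "TIME-WAIT",
    ("CLOSE-WAIT", "close"): "LAST-ACK",
    ("LAST-ACK", "recv_ack"): "CLOSED",
    ("TIME-WAIT", "timeout"): "CLOSED",
}

def run(events, state="CLOSED"):
    """Apply a sequence of events and return the resulting state."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state

# A client that opens actively and closes first.
client_path = ["active_open", "recv_syn_ack", "close",
               "recv_ack", "recv_fin", "timeout"]
# A server that opens passively and closes second.
server_path = ["passive_open", "recv_syn", "recv_ack",
               "recv_fin", "close", "recv_ack"]
```

Note that only the side that closes first passes through TIME-WAIT in this sketch.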
Data transfer
During the data transfer phase, a number of key
mechanisms determine TCP's reliability and robustness.
These include using sequence numbers for ordering
received TCP segments and detecting duplicate data,
checksums for segment error detection, and
acknowledgements and timers for detecting and adjusting
to loss or delay.
During the TCP connection establishment phase, initial
sequence numbers (ISNs) are exchanged between the two
TCP speakers. These sequence numbers are used to
identify data in the byte stream, and are numbers that
identify (and count) application data bytes. Every TCP
segment carries a pair of such numbers, referred to as
the sequence number and the acknowledgement number. A TCP
sender refers to its own sequence number simply as the
sequence number, and to the receiver's sequence number as
the acknowledgement number. To maintain reliability, a
receiver acknowledges TCP segment data by indicating that
it has received all contiguous bytes up to some location
in the stream. An enhancement to TCP, called selective
acknowledgement (SACK), allows a TCP receiver to
acknowledge out-of-order blocks.
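The difference between a cumulative acknowledgement and selective acknowledgement blocks can be sketched as below. The function name `ack_state` and the representation of received data as half-open `(start, end)` byte ranges are assumptions made for the example; sequence number wraparound and the limit on how many SACK blocks fit in a segment are ignored.

```python
def ack_state(next_expected, received):
    """Return (cumulative_ack, sack_blocks) for the received byte ranges.

    `received` is a list of half-open (start, end) ranges. The cumulative
    acknowledgement is the first byte not yet received in order; the SACK
    blocks are the ranges held beyond the first gap.
    """
    blocks = []
    for start, end in sorted(received):
        if end <= next_expected:
            continue  # entirely duplicate data
        if start <= next_expected:
            next_expected = end  # extends the contiguous prefix
        elif blocks and start <= blocks[-1][1]:
            # touches or overlaps the previous out-of-order block
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            blocks.append((start, end))
    return next_expected, blocks

# Bytes 200-299 and 450-599 are missing in this hypothetical trace.
ack, sack = ack_state(100, [(100, 200), (300, 400), (400, 450), (600, 700)])
```

Without SACK the sender learns only the value of `ack`; with SACK it also learns which later ranges need not be retransmitted.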
Through the use of sequence and acknowledgement numbers,
TCP can properly deliver received segments in the
correct byte stream order to a receiving application.
Sequence numbers are 32-bit unsigned numbers, which wrap
to zero on the next byte in the stream after 2^32 - 1.
One key to maintaining robustness and security for TCP
connections is the selection of the ISN.
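Because sequence numbers wrap, they cannot be compared with an ordinary less-than test. A common approach, sketched here under the assumption that the two values being compared are less than 2^31 apart, is to compare them on a circle. The helper names `seq_add` and `seq_lt` are illustrative.

```python
MOD = 2**32  # size of the sequence number space

def seq_add(seq, n):
    """Advance a sequence number by n bytes, wrapping at 2**32."""
    return (seq + n) % MOD

def seq_lt(a, b):
    """True if a precedes b, treating the space as circular.

    Only meaningful when a and b are less than 2**31 apart.
    """
    return a != b and (b - a) % MOD < MOD // 2
```

For example, a sequence number ten bytes before the wrap point precedes sequence number 5, even though it is numerically much larger.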
A 16-bit checksum, consisting of the one's complement of
the one's complement sum of the contents of the TCP
segment header and data, is computed by a sender, and
included in a segment transmission. (The one's
complement sum is used because its end-around carry
means that it can be computed in any multiple of that
length, such as 16, 32 or 64 bits, and the result, once
folded, will be the same.) The TCP receiver recomputes
the checksum on the received TCP header and data. The
complement is used so that the receiver does not have to
zero the checksum field after saving the checksum value
elsewhere; instead, the receiver simply computes the
one's complement sum with the checksum in situ, and the
result should be -0 (all one bits). If so, the segment is
assumed to have arrived intact and without error.
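The computation and the receiver-side check can be sketched as follows. This is a simplified illustration: the function names are assumptions, the checksum is appended to the data rather than placed in its real header field, and the pseudo header is left out. The sample bytes are arbitrary.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total

def checksum(data: bytes) -> int:
    """One's complement of the one's complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF

payload = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
value = checksum(payload)

# Receiver side: sum everything, checksum included, and expect all ones.
intact = ones_complement_sum(payload + value.to_bytes(2, "big"))
corrupted = ones_complement_sum(b"\x01" + payload[1:] + value.to_bytes(2, "big"))
```

In 16 bits, the "-0" result is the value 0xFFFF, which is what `intact` holds when nothing was damaged.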
Note that the TCP checksum also covers a 96-bit pseudo
header containing the Source Address, the Destination
Address, the Protocol, and the TCP length. This provides
protection against misrouted segments.
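For IPv4, the pseudo header can be sketched with Python's `struct` module. The function name and the example addresses (taken from the documentation address ranges) are assumptions; the layout also includes a zero byte before the protocol field, which pads the four listed fields out to 96 bits.

```python
import socket
import struct

TCP_PROTOCOL = 6  # IANA-assigned protocol number for TCP

def pseudo_header(src_ip: str, dst_ip: str, tcp_length: int) -> bytes:
    """Build the 12-byte IPv4 pseudo header covered by the TCP checksum."""
    return struct.pack(
        "!4s4sBBH",                 # network byte order, no padding
        socket.inet_aton(src_ip),   # 32-bit source address
        socket.inet_aton(dst_ip),   # 32-bit destination address
        0,                          # zero byte
        TCP_PROTOCOL,               # 8-bit protocol
        tcp_length,                 # 16-bit length of TCP header plus data
    )

header = pseudo_header("192.0.2.1", "198.51.100.7", 20)
```

The pseudo header is never transmitted; sender and receiver each construct it from the IP layer's information and prepend it to the segment only while computing the checksum.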
The TCP checksum is quite a weak check by modern
standards. Data link layers with high bit error rates
may require additional link error correction/detection
capabilities. If TCP were to be
redesigned today, it would most probably have a 32-bit
cyclic redundancy check specified as an error check
instead of the current checksum. The weak checksum is
partially compensated for by the common use of a CRC or
better integrity check at layer 2, below both TCP and
IP, such as is used in PPP or the Ethernet frame.
However, this does not mean that the 16-bit TCP checksum
is redundant: remarkably, surveys of Internet traffic
have shown that software and hardware errors that
introduce errors in packets between CRC-protected hops
are common, and that the end-to-end 16-bit TCP checksum
catches most of these simple errors. This is the
end-to-end principle at work.
Acknowledgements for data sent, or lack of
acknowledgements, are used by senders to implicitly
interpret network conditions between the TCP sender and
receiver. Coupled with timers, TCP senders and receivers
can alter the behavior of the flow of data. This is more
generally referred to as flow control, congestion
control and/or network congestion avoidance. TCP uses a
number of mechanisms to achieve high performance and
avoid congesting the network (i.e. sending data faster
than either the network or the host on the other end can
handle it). These mechanisms include the use of a
sliding window, the slow-start algorithm, the congestion
avoidance algorithm, the fast retransmit and fast
recovery algorithms, and more. Enhancing TCP to reliably
handle loss, minimize errors, manage congestion and
perform well in very high-speed environments is an
ongoing area of research and standards development.
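The interplay of slow start and congestion avoidance can be illustrated with a toy simulation. This is a deliberately simplified model and not the algorithm of any particular implementation: the window is counted in whole segments per round trip, every loss is treated as a timeout that resets the window to one segment, and fast retransmit and fast recovery are omitted. The function and parameter names are assumptions.

```python
def simulate_cwnd(rounds, ssthresh, loss_rounds=()):
    """Return the congestion window (in segments) at each round trip."""
    cwnd = 1
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            # Loss: remember half the window as the new threshold, restart.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            # Slow start: double the window each round trip.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: grow by one segment per round trip.
            cwnd += 1
    return history

# Hypothetical run: threshold of 8 segments, one loss in round 6.
history = simulate_cwnd(rounds=10, ssthresh=8, loss_rounds={6})
```

The resulting sequence shows exponential growth up to the threshold, linear growth after it, and a restart from one segment after the loss.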
TCP window size
TCP sequence numbers and windows behave very much like a
clock. The window, whose width (in bytes) is defined by
the receiving host, shifts each time the host receives
and acknowledges a segment of data. Once it runs out of
sequence numbers, it loops back to 0.
The TCP receive window size is the amount of received
data (in bytes) that can be buffered during a connection.
The sending host can send only that amount of data before
it must wait for an acknowledgment and window update from
the receiving host. The Windows TCP/IP stack is designed
to tune itself in most environments, and uses larger
default window sizes than earlier versions.
Window scaling
For more efficient use of high bandwidth networks, a
larger TCP window size may be used. The TCP window size
field controls the flow of data and is limited to 2
bytes, or a window size of 65,535 bytes.
Since the size field cannot be expanded, a scaling
factor is used. TCP window scale, as defined in RFC
1323, is an option used to increase the maximum window
size from 65,535 bytes to 1 gigabyte. Scaling up to
larger window sizes is part of what is necessary for
TCP tuning.
The window scale option is used only during the TCP
3-way handshake. The window scale value represents the
number of bits to left-shift the 16-bit window size
field. The window scale value can be set from 0 (no
shift) to 14.
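The arithmetic of the scale factor is a plain left shift, as this short sketch shows. The function name `effective_window` is an assumption made for the example.

```python
MAX_SHIFT = 14  # largest window scale value

def effective_window(window_field: int, shift: int) -> int:
    """Window size in bytes for a 16-bit window field and a scale value."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("window scale must be between 0 and 14")
    return window_field << shift

unscaled = effective_window(0xFFFF, 0)
largest = effective_window(0xFFFF, MAX_SHIFT)
```

With the maximum shift of 14, the largest window is 65,535 x 2^14 = 1,073,725,440 bytes, just under 2^30.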
Connection termination
The connection termination phase uses a four-way
handshake, with each side of the connection terminating
independently. When an endpoint wishes to stop its half
of the connection, it transmits a FIN packet, which the
other end acknowledges with an ACK. Therefore, a typical
teardown requires a pair of FIN and ACK segments from
each TCP endpoint.
A connection can be "half-closed", in which case one
side has terminated its end, but the other has not. The
side which has terminated can no longer send any data
into the connection, but the other side can.
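The situation in which one side has finished sending while the other has not can be reproduced with the standard `socket` module, where `shutdown(SHUT_WR)` closes only the sending direction. The loopback address, the message contents and the single-threaded layout are assumptions made to keep the sketch short.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())
conn, _ = server.accept()

client.sendall(b"request")
client.shutdown(socket.SHUT_WR)  # sends FIN; client can still receive

# The server reads until it sees end-of-stream (the client's FIN).
received = b""
while True:
    chunk = conn.recv(1024)
    if not chunk:
        break
    received += chunk

# The server's half of the connection is still open for sending.
conn.sendall(b"reply")
conn.close()  # now the server sends its own FIN

reply = b""
while True:
    chunk = client.recv(1024)
    if not chunk:
        break
    reply += chunk

client.close()
server.close()
```

This pattern lets a client signal "no more input" while still waiting for the complete response.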
TCP ports
TCP uses the notion of port numbers to identify sending
and receiving applications.
Each side of a TCP connection has an associated 16-bit
unsigned port number assigned to the sending or
receiving application. Ports are categorized into three
basic categories: well known, registered and
dynamic/private. The well known ports are assigned by
the Internet Assigned Numbers Authority (IANA) and are
typically used by system-level or root processes. Well
known applications running as servers and passively
listening for connections typically use these ports.
Some examples include: FTP (21), TELNET (23), SMTP (25)
and HTTP (80). Registered ports are typically used by
end user applications as ephemeral source ports when
contacting servers, but they can also identify named
services that have been registered by a third party.
Dynamic/private ports can also be used by end user
applications, but this is less common. Dynamic/private
ports carry no meaning outside of any particular TCP
connection. The 16-bit port field allows 65,536 values
(0 to 65,535); port 0 is reserved, leaving 65,535 usable
ports.
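The three categories can be expressed as a simple classification. The numeric boundaries used here (0 to 1023, 1024 to 49151 and 49152 to 65535) are the IANA ranges and are not stated in the text above; the function name is an assumption.

```python
def port_category(port: int) -> str:
    """Classify a TCP port number by its IANA range."""
    if not 0 <= port <= 65535:
        raise ValueError("port must fit in 16 bits")
    if port <= 1023:
        return "well known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"

examples = {name: port_category(port)
            for name, port in [("FTP", 21), ("TELNET", 23),
                               ("SMTP", 25), ("HTTP", 80)]}
```

All four example services named above fall in the well known range.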
TCP development
TCP is both a complex and an evolving protocol. However,
while significant enhancements have been made and
proposed over the years, its most basic operation has
not changed significantly since RFC 793, published in
1981. RFC 1122, Requirements for Internet Hosts -
Communication Layers, clarified a number of TCP protocol
implementation requirements. RFC 2581, TCP Congestion
Control, one of
the most important TCP related RFCs in recent years,
describes updated algorithms to be used in order to
avoid undue congestion. In 2001, RFC 3168 was written to
describe explicit congestion notification (ECN), a
congestion avoidance signalling mechanism. In the early
21st century, TCP is typically used in approximately 95%
of all Internet packets. Common applications that use
TCP include HTTP/HTTPS (World Wide Web), SMTP/POP3/IMAP
(e-mail) and FTP (file transfer). Its widespread use is
a testament to how well the original designers did their
work.
An early and widely deployed TCP congestion control
algorithm is TCP Reno, but more recently several
alternative congestion control algorithms have been
proposed:
High Speed TCP proposed by Sally Floyd in RFC 3649
TCP Vegas by Brakmo and Peterson at University of
Arizona
TCP Westwood by UCLA
BIC TCP by Injong Rhee at North Carolina State
University
H-TCP by Hamilton Institute
Fast TCP (Fast Active queue management Scalable
Transmission Control Protocol) by Caltech.
There have been many studies comparing the fairness and
TCP performance of the different algorithms.
TCP Over Wireless
TCP was optimized for wired networks. Any packet loss is
treated as a sign of congestion, and the window size is
therefore reduced dramatically as a precaution. However,
wireless links are known to experience sporadic and
usually temporary losses due to fading, shadowing,
handoff, etc., which cannot be considered congestion.
After an erroneous back-off of the window size due to
wireless packet loss, a congestion avoidance phase with
a conservative increase follows, which causes the radio
link to be underutilized. Note that radio resources are
extremely valuable in wireless communications. Extensive
research has been done on this subject to combat these
harmful effects. Suggested solutions can be categorized as
end-to-end solutions (which require modifications at the
client and/or server), link layer solutions (such as RLP
in CDMA2000), or proxy based solutions (which require
some changes in the network without modifying end
nodes).
Hardware TCP Implementations
TCP Offload Engines
One way to overcome the processing power requirements of
TCP is to build hardware implementations of it, widely
known as TCP Offload Engines (TOEs). The main problem
with TOEs is that they are hard to integrate into
computing systems, requiring extensive changes in the
operating system of the computer or device. The first
company to develop such a device was Alacritech.
Alternatives to TCP
TCP is not appropriate for many applications. The main
problem (at least with normal implementations) is that
the application cannot get at the packets coming after a
lost packet until the retransmitted copy of the lost
packet is received. This causes problems for real-time
applications such as streaming multimedia (for example
Internet radio), real-time multiplayer games and voice
over IP (VoIP), where it is sometimes more useful to get
most of the data in a timely fashion than it is to get
all of the data in order.
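The blocking behaviour described above can be sketched with a small in-order delivery buffer. The class name `InOrderBuffer` is an assumption, and the model is simplified: segments are assumed not to overlap partially, and sequence number wraparound is ignored.

```python
class InOrderBuffer:
    """Hold out-of-order segments until the gap before them is filled."""

    def __init__(self, next_seq=0):
        self.next_seq = next_seq   # next byte the application is waiting for
        self.pending = {}          # sequence number -> payload

    def receive(self, seq, payload):
        """Accept a segment; return the bytes that can now be delivered."""
        if seq >= self.next_seq:
            self.pending[seq] = payload
        delivered = b""
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            delivered += chunk
            self.next_seq += len(chunk)
        return delivered

buf = InOrderBuffer()
first = buf.receive(0, b"AAAA")    # in order: delivered at once
blocked = buf.receive(8, b"CCCC")  # bytes 4-7 are missing: nothing delivered
released = buf.receive(4, b"BBBB") # retransmission fills the gap
```

The third segment's data has already arrived when `blocked` is computed, yet the application sees none of it until the missing segment is retransmitted.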
The complexity of TCP can also be a problem for embedded
systems. The best known example of this is netbooting,
which generally uses TFTP. Finally, some techniques,
such as transmitting data between two hosts that are
both behind NAT (using STUN or similar systems), are far
simpler without a relatively complex protocol like TCP
in the way.
Generally, where TCP is unsuitable the User Datagram
Protocol (UDP) is used. This provides the application
multiplexing and checksums that TCP does, but does not
handle building streams or retransmission, giving the
application developer the ability to code those in a way
suitable for the situation and/or to replace them with
other methods like forward error correction or
interpolation.
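For comparison, a UDP exchange needs no connection setup at all. The sketch below uses the standard `socket` module on the loopback interface; the addresses, payloads and the two-second timeout are assumptions. On a real network either datagram could be lost, duplicated or reordered, and handling that is left to the application.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)  # do not wait forever for a datagram that never comes
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# No handshake: each sendto() emits one self-contained datagram.
sender.sendto(b"frame 1", addr)
sender.sendto(b"frame 2", addr)

# Each recvfrom() returns exactly one datagram, boundaries preserved.
first, _ = receiver.recvfrom(2048)
second, _ = receiver.recvfrom(2048)

sender.close()
receiver.close()
```

Unlike a TCP byte stream, each message arrives as a separate unit, which suits applications that would rather drop a late frame than wait for it.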
SCTP is another protocol that runs over IP and provides
reliable, stream-oriented services similar to those of
TCP. It is newer and considerably more complex than TCP,
so it has not yet seen widespread deployment. However, it
is
especially designed to be used in situations where
reliability and near-realtime considerations are
important. |