Thursday, March 29, 2012

TCP Options - Section 6


The TCP Options (MSS, Window Scaling, Selective Acknowledgements, Timestamps, Nop) are located at the end of the TCP Header which is also why they are covered last. Thanks to the TCP Options field we have been able to enhance the TCP protocol by introducing new features or 'addons' as some people like to call them, defined by their respective RFC's.

As data communication continues to become more complex and less tolerable to errors and latency, it was clear that these new features had to be incorporated to the TCP transport to help overcome the problems created by the new links and speeds available.

To give you an example, Window Scaling, mentioned in the previous pages and elaborated here, is possible using the TCP Options field because the original Window field is only 16 bits long, allowing a maximum decimal number of 65,535. Clearly this is far too small when we want to express 'Window size' values using numbers in the range of thousands to a million e.g 400,000 or 950,000.

Before we delve into any details, let's take a look at the TCP Options field:


As you can see, the TCP Options field is the sixth section of the TCP Header analysis.

Located at the end of the header and right before the Data section, it allows us to make use of the new enhancements recommended by the engineers who help design the protocols we use in data communications today.

TCP Options

Most of the TCP Options we will be analysing are required to appear only during the initial SYN and SYN/ACK phase of the 3-way-handshake TCP performs to establish a virtual link before transferring any data. Other options, however, can be used at will, during the TCP session.

It is also important to note that the TCP Options may occupy space at the end of the TCP header and are a multiple of 8 bits in length. This means that if we use one TCP Option that is 4 bits in length, there must be another 4 bits of padding in order to comply with the TCP RFC. So the TCP Options length MUST be in multiples of 8 bits, that is 8, 16, 24, 32 e.t.c

Here's a brief view of the TCP Options we are going to analyse:

  • Maximum Segment Size (MSS)
  • Window Scaling
  • Selective Acknowledgements (SACK)
  • Timestamps
  • Nop

Let's now take a look at the exciting options available and explain the purpose of each one.

Maximum Segment Size (MSS)

The Maximum Segment Size is used to define the maximum segment that will be used during a connection between two hosts. As such, you should only see this option used during the SYN and SYN/ACK phase of the 3-way-handshake. The MSS TCP Option occupies 4 bytes (32 bits) of length.

If you have previously come across the term "MTU" which stands for Maximum Transfer Unit, you will be pleased to know that the MSS helps define the MTU used on the network.

If your scratching your head because the MSS and MTU field doesn't make any sense to you, or it is not quite clear, don't worry, the following diagram will help you get the big picture:


You can see the Maximum Segment Size consists of the Data segment, while the Maximum Transfer Unit includes the TCP Header, MSS and the IP Header.

It would also benefit us to recognise the correct terminology that corresponds to each level of the OSI Model: The TCP Header and Data is called a Segment (Layer 4), while the IP Header and the Segment is called an IP Datagram (Layer 3).

Furthermore, regardless of the size the MTU will have, there is an additional 18 bytes overhead placed by the Datalink layer. This overhead includes the Source and Destination MAC Address, the Protocol type, followed by the Frame Check Sequence placed at the end of the frame.

This is also the reason why we can only have a maximum MTU of 1500 bytes. Since the maximum size of an Ethernet II frame is 1518 bytes, subtracting 18 bytes (Datalink overhead) leaves us with 1500 bytes to play with.

TCP usually computes the Maximum Segment Size (MSS) that results in IP Datagrams that match the network MTU. In practice, this means the MSS will have such a value that if we add the IP Header as well, the IP Datagram (IP Header+TCP Header+DATA) would be equal to the network MTU.

If the MSS option is omitted by one or both ends of the connection, then the value of 536 bytes will be used. The MSS value of 536 bytes is defined by RFC 1122 and is calculated by taking the default value of an IP Datagram, 576 bytes, minus the standard length of the IP and TCP Header (40 bytes), which gives us 536 bytes.

In general, it is very important to use the best possible MSS value for your network because your network performance could be extremely poor if this value is too large or too small. To help you understand why, lets look at a simple example:

If you wanted to transfer 1 byte of data through the network, you would need to create a datagram with 40 bytes of overhead, 20 for the IP Header and 20 for the TCP Header. This means that your using 1/41 of your available network bandwidth for data. The rest is nothing but overhead!

On the other hand, if the MSS is very large, your IP Datagrams will also be very large, meaning that they will most probably fail to fit into one packet should the MTU be too small. Therefore they will require to be fragmented, increasing the overhead by a factor of 2.

Window Scaling

We briefly mentioned Window Scaling in the previous section of the TCP analysis, though you will soon discover that this topic is quite broad and requires a great deal of attention.

After gaining a sound understanding of what the Window size flag is used for, Window Scaling is, in essence, an extention to the Window size flag. Because the largest possible value in the Window size flag is only 65,535 bytes (64 kb), it was clear that a larger field was required in order to increase the value to a whopping 1 Gig! Thus, Window Scaling was born.

The Window Scaling option can be a maximum of 30 bits in size, which includes the original 16 bit Window size field covered in the previous section. So that's 16 (original window field) + 14 (TCP Options 'Window Scaling') = 30 bits in total.

If you're wondering where on earth would someone use such an extremely large Window size, think again. Window Scaling was created for high-latency, high-bandwidth WAN links where a limited Window size can cause severe performance problems.

To consolidate all these technological terms and numbers, an example would prove to be beneficial:


The above example assumes we are using the maximum Window size of 64 kbs and because the WAN link has very high latency, the packets take some time to arrive to their destination, that is, Host B. Due to the high latency, Host A has stopped transmitting data since there are 64 kbs of data sent and they have not yet been acknowledged.

When the Time = 4, Host B has received the data and sends the long awaited acknowledgement to Host A so it can continue to send data, but the acknowledgement will not arrive until somewhere around Time = 6.

So, from Time = 1 up until Time = 6, Host A is sitting and waiting. You can imagine how poor the performance of this transfer would be in this situation. If we were to transfer a 10 Mb file, it would take hours!

Let's now consider the same example, using Window Scaling:


As you can see, with the use of Window Scaling, the window size has increased to256 kb! Since the value is quite large, which translates to more data during transit, Host B has already received the first few packets, while Host A is still sending the first 256 kb window.

On Time = 2, Host B sends an Acknowledgement to Host A, which is still busy sending data. Host A will receive the Acknowledgement before it finishes the 256 kb window and will therefore continue sending data without pause since it will soon receive another Acknowledgement from Host B.

Clearly the difference that a large window size has made is evident, increasing the network performance and minimising the ideal time for the sending host.

The Window Scale option is defined in RFC 1072, which lets a system advertise 30-bit (16 from the original window + 14 from the TCP Options) Window size values, with a maximum buffer size of 1 GB. This option has been clarified and redefined in RFC 1323, which is the specification that all implementations employ today.

Lastly, for those who deal with Cisco routers, it may benefit you to know that you are also able to configure the Window size on Cisco routers running the Cisco IOS v9 and greater. Also, routers with versions 12.2(8)T and above support Window Scaling, which is automatically enabled for Window sizes above 65,535 bytes (64 kb), with a maximum value of 1,073,741,823 bytes (1 GByte)!

Selective Acknowledgments (SACK)

TCP has been designed to be a fairly robust protocol though, despite this, it still has several disadvantages, one of which concerns Acknowledgements, which also happens to be the reason Selective Acknowledgement were introduced with RFC 1072.

The problem with the good old plain Acknowledgements is that there are no mechanisms for a receiver to state "I'm still waiting for bytes 20 through 25, but have received bytes 30 through 35". And if your wondering whether this is possible, then the answer is 'yes' it is!

If segments arrive out of order and there is a hole in the receiver's queue, then using the 'classical' Acknowledgements supported by TCP, can only say "I've received everything up to byte 20". The sender then needs to recognise that something has gone wrong and continue sending from that point onwards (byte 20).

As you may have concluded, the above situation is totally unacceptable, so a more robust service had to be created, hence Selective Acknowledgments!

Firstly, when a virtual connection is established using the classic 3-way-handshake the hosts must send a "Selective Acknowledgments Permitted" in the TCP Options to indicate that they are able to use SACK's. From this point onwards, the SACK option is sent whenever a selective acknowledgment is required.

For example, if we have a Windows98 client that is waiting for byte 4,268, but the SACK option shows that the Windows98 client has also received bytes 7,080 through 8,486, it is obvious that it is missing bytes 4,268 through 7,079, so the server should only resend the missing 2,810 bytes, rather than restarting the entire transfer at byte number 4,268.

Lastly, we should note that the SACK field in the TCP Options uses two 16 bit fields, a total of 32 bits together. The reason there are two fields is because the receiver must be able to specify the range of bytes it has received, just like the example we used. In the case where Window Scaling is also used, these 2 x 16 bit fields can be expanded to two 24 or 32 bit fields.


Another aspect of TCP's flow-control and reliability services is the round-trip delivery times that a virtual circuit is experiencing. The round-trip delivery time will accurately determine how long TCP will wait before attempting to retransmit a segment that has not been acknowledged.

Because every network has unique latency characteristics, TCP has to understand these characteristics in order to set accurate acknowledgment timer threshold values. LANs typically have very low latency times, and as such TCP can use low values for the acknowledgment timers. If a segment is not acknowledged quickly, a sender can retransmit the questionable data quickly, thus minimizing any lost bandwidth and delay.

On the other hand, using a low threshold value on a WAN is sure to cause problems simply because the acknowledgment timers will expire before the data ever reaches the destination.

Therefore, in order for TCP to accurately set the timer threshold value for a virtual circuit, it has to measure the round-trip delivery times for various segments. Finally, it has to monitor additional segments throughout the connection's lifetime to keep up with the changes in the network. This is where the Timestamp option comes into the picture.

Similarly to the majority of the other TCP Options covered here, the Timestamp option must be sent during the 3-way-handshake in order to enable its use during any subsequent segments.

The Timestamp field consists of a Timestamp Echo and Timestamp Reply field, both of which the reply field is always set to zero by the sender and completed by the receiver after which it is sent back to the original sender. Both timestamp fields are 4 bytes long!


The nop TCP Option means "No Option" and is used to separate the different options used within the TCP Option field. The implementation of the nop field depends on the operating system used. For example, if options MSS and SACK are used, Windows XP will usually place two nop's between them, as was indicated in the first picture on this page.

Lastly, we should note that the nop option occupies 1 byte. In our example at the beggining of the page, it would occupy 2 bytes since it's used twice. You should also be aware that this field is usually checked by hackers when trying to determine the remote host's operating system.


This page provided all the available TCP Options that have been introduced to the TCP protocol in its efforts to extend its reliability and performance. While these options are critical in some cases, most users are totally unaware of their existence, especially network administrators. The information provided here is essential to help administrators deal with odd local and wan network problems that can't be solved by rebooting a server or router :)

The final page to this topic is a summary covering the previous six pages of TCP, as there is little to analyse in the data section of the TCP Segment. It is highly suggested you read it as a recap to help you remember the material covered.

No comments:

Post a Comment