Tuesday, August 17, 2010

Identifying Slow Server Response at Packet Level (by Chris Greer)

Why is tracking down a server performance problem so difficult? First, it can be hard to dig through thousands of packets to find a solid example of a slow response. Once a slow response is isolated, identifying the root cause can also present a challenge. In this tip, we will show how to isolate a slow response from a server, filter on it, and determine if the root cause is the network or the server itself.

• Start at the client end

Often, when first analyzing a slow application, it is easiest to start at the client end. Although the problem may not be fully understood until a capture is taken at the server end, the trace file will be much simpler and easier to read when only one client experience is captured. Make sure that while capturing, the user is able to reproduce the performance problem.

• Look for client connecting to server

Look through the trace file to find where the client initiates a DNS query for the slow application server. The client may already have this server in its DNS cache, in which case it may simply send a TCP SYN directly to the application server. If DNS is used, make sure that the DNS response time is low using the time column in your packet analyzer.

-Note: When measuring application response, be sure to use a delta timer that shows the amount of time between packets. This can be accessed in Wireshark from the View drop-down menu.
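To make the delta timer concrete, here is a minimal sketch of what that column computes, assuming you have exported packet timestamps (in seconds) into a plain Python list. The function name `delta_times` and the sample timestamps are illustrative, not part of any analyzer's API.

```python
def delta_times(timestamps):
    """Return, for each packet, the time elapsed since the previous packet.

    `timestamps` is a list of capture times in seconds, in capture order.
    The first packet has no predecessor, so its delta is 0.0.
    """
    if not timestamps:
        return []
    deltas = [0.0]
    for previous, current in zip(timestamps, timestamps[1:]):
        # Round to microseconds to hide floating-point noise.
        deltas.append(round(current - previous, 6))
    return deltas


if __name__ == "__main__":
    # Hypothetical timestamps: a gap between packets stands out immediately.
    for delta in delta_times([0.0, 0.134, 0.135, 0.136]):
        print(f"{delta * 1000:.1f} ms")
```

A large value in this list points at the packet that arrived after a long wait, which is the one worth inspecting.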


F1

If the DNS response time is quick (it should not be longer than 150ms or so), the client will next send a connection request to the application server. This will be a TCP SYN packet, the first in the TCP three-way handshake. Use a TCP Stream filter to isolate this connection (right click on any packet in the TCP connection, select TCP Stream Filter). The goal in isolating this connection is to compare the network roundtrip time to the server response time.

Once this connection is isolated, look at the delta time between the TCP SYN sent by the client and the TCP SYN-ACK sent back from the server. This can be used as a benchmark connection setup time. In the picture below, the response from the server is displayed in packet 7. It took 134msec to hear back from the server.


F2
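The connection setup measurement can be sketched in a few lines, assuming the packets of one TCP stream have already been reduced to simple records. The `Packet` type, its field names, and the "client"/"server" labels are assumptions made for this illustration; a real trace would need to be parsed into this shape first.

```python
from typing import List, NamedTuple, Optional


class Packet(NamedTuple):
    time: float   # seconds since the start of the capture
    src: str      # "client" or "server" (hypothetical labels)
    flags: str    # simplified TCP flags, e.g. "SYN", "SYN-ACK", "ACK"


def connection_setup_time(packets: List[Packet]) -> Optional[float]:
    """Return seconds between the client's first SYN and the server's SYN-ACK.

    Returns None if the handshake is not present in the packet list.
    """
    syn_time = None
    for packet in packets:
        if syn_time is None:
            if packet.src == "client" and packet.flags == "SYN":
                syn_time = packet.time
        elif packet.src == "server" and packet.flags == "SYN-ACK":
            return packet.time - syn_time
    return None


if __name__ == "__main__":
    # Times chosen to mirror the 134 msec setup time in the example trace.
    handshake = [
        Packet(0.0, "client", "SYN"),
        Packet(0.134, "server", "SYN-ACK"),
        Packet(0.135, "client", "ACK"),
    ]
    print(connection_setup_time(handshake))
```

Because the capture is taken at the client, this value approximates one network round trip and serves as the benchmark for the measurements that follow.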


• Measure application response time – compare to connection setup time

Next, after the TCP connection has been established, the client will request data from the server. In the web-based application above, the client performs an HTTP GET. Use the delta time column to see how long it takes the server to respond to this request. In our example above, the server responds after 125msec with a TCP ACK. This indicates that the server received the request, but has not yet responded with actual data. After a further 4.85 seconds, the server finally sends a packet with application data. After this, the remaining packets arrive at wire speed. Comparing 4.85 seconds to the connection setup time, 134msec, we see that the server response time is very slow.
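The distinction between the server's bare ACK and its first data packet can be sketched as follows. This is an illustration under stated assumptions: each packet is a hypothetical `(time, src, payload_len)` tuple, and the first client packet carrying a payload is treated as the request.

```python
def server_response_times(packets):
    """Measure how long the server takes to ACK and to answer a request.

    `packets` is a list of (time, src, payload_len) tuples in capture order,
    where src is "client" or "server" (hypothetical labels).
    Returns (ack_delay, data_delay) in seconds, both measured from the
    client's request, or None if no server data follows a request.
    ack_delay is None when the server sent no bare ACK before its data.
    """
    request_time = None
    ack_time = None
    for time, src, payload_len in packets:
        if request_time is None:
            # The first client packet with a payload is taken as the request.
            if src == "client" and payload_len > 0:
                request_time = time
        elif src == "server":
            if payload_len == 0 and ack_time is None:
                ack_time = time  # request received, no data yet
            elif payload_len > 0:
                ack_delay = None if ack_time is None else ack_time - request_time
                return ack_delay, time - request_time
    return None


if __name__ == "__main__":
    # Times chosen to mirror the example: ACK after 125 msec,
    # first data 4.85 seconds after that ACK.
    trace = [
        (0.000, "client", 420),   # HTTP GET
        (0.125, "server", 0),     # bare TCP ACK
        (4.975, "server", 1460),  # first packet with application data
    ]
    print(server_response_times(trace))
```

A short ACK delay combined with a long data delay is the pattern described above: the request arrived promptly, and the wait happened on the server.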


• Server, client or network delay?

From this information, it is simple to determine where to troubleshoot next. If the server response time is significantly higher than the connection setup time, and there are no TCP retransmissions, the problem is on the server end. In the case above, the server responded to the request with an ACK, showing it received the request and was busy processing it. The network is not to blame for this delay.

If any retransmissions are observed, packets are most likely being dropped somewhere in the network. The server may not be to blame for slow performance, especially if it isn’t getting requests in the first place.
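The decision rule above can be written down as a small helper. This is only a sketch: the function name and the `factor` threshold of 10 are assumptions chosen for illustration, since the tip says "significantly higher" without giving a number.

```python
def likely_cause(setup_time, response_time, retransmissions, factor=10):
    """Suggest where to troubleshoot next for one request.

    setup_time      -- SYN to SYN-ACK time in seconds (network benchmark)
    response_time   -- time the server took to return data, in seconds
    retransmissions -- number of TCP retransmissions seen in the stream
    factor          -- assumed multiple of setup_time that counts as
                       "significantly higher" (not from the original tip)
    """
    if retransmissions > 0:
        # Lost packets point at the network before the server.
        return "network"
    if response_time > factor * setup_time:
        # The request arrived cleanly, yet the answer took far longer than
        # a network round trip.
        return "server"
    return "no obvious delay"


if __name__ == "__main__":
    # Values from the example trace: 134 msec setup, 4.85 second response.
    print(likely_cause(0.134, 4.85, retransmissions=0))
```

With the example values, the response time is more than 30 times the setup time with no retransmissions, so the helper points at the server.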


• If no delay is observed in this transaction ...

Move to the next request, keeping an eye on the amount of time it is taking for the server to respond to requests. Always use the connection setup time as a benchmark network roundtrip timer. This may take some time to do packet by packet, but since the capture was taken client-end, this is an excellent way to get familiar with the application behavior and look for patterns in client requests.

Once you get a good feel for the requests involved in this application, the analyzer can be moved to the server end – this way you can look for packets that are being sent during the slow requests. In the example above, we would be interested in what the server is busy doing during the 4.85 seconds of delay. Is a downstream server being called? Is a DNS request timing out?

Getting started in analyzing a slow application is sometimes the hardest step. By starting at the client, reproducing the problem, carefully watching TCP connection setup time, and comparing this with server response time, you can narrow down which requests are slow and identify the root cause. Even if the problem cannot be determined at the client end, you will have an idea of what the next step in troubleshooting will be, and whether to focus on the network or the server.

