Wednesday, March 29, 2017

Is a device introducing significant jitter to transiting packets?

Business VoIP as well as video traffic is a very common traffic type in corporate networks. When a network infrastructure needs to support such delay-sensitive VoIP or video traffic, care must be taken to ensure any network devices sitting inline do not introduce large variable delays to the transiting packets. Any significant delay variation, more commonly known as jitter, may reduce the quality of the affected streams if the receiver jitter buffer is unable to sufficiently compensate. Other than the typical core networking devices, there are many different devices that may be inline to manipulate network traffic in real-time, such as WAN accelerators, intrusion prevention systems, next-gen firewalls etc. If there is a decrease in VoIP call quality as reported by your monitoring tool for example, do you have an option to rule out certain devices or network segments as a possible contributor?

One possible option is to perform a long ping test from certain endpoints to traverse the same path as the VoIP packets, while taking packet captures from before and after a network device of concern. Here is a simple illustration:

By comparing and computing the packet switching delay, one could work out the switching delay. However, this will quickly become very tedious and computing jitter will compound the challenge.

How could we do better?

Wireshark and its family of tools might come in handy to try to automate certain aspects. For example, to read a packet capture file and display only relevant fields of interest, this could be readily accomplished with tshark. To track an ICMP echo request/reply packet through a device, we need to identify what suitable protocol fields to use to uniquely identify them between packet captures. The followings show populated ICMP echo request and reply fields, respectively, for completeness:

Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0
    Checksum: 0xb51c
    Identifier (BE): 110 (0x006e)
    Sequence number (BE): 38865 (0x97d1)
    Data (32 bytes)

Internet Control Message Protocol
    Type: 0 (Echo (ping) reply)
    Code: 0
    Checksum: 0xbd1c
    Identifier (BE): 110 (0x006e)
    Sequence number (BE): 38865 (0x97d1)
    Data (32 bytes)


To uniquely identify an ICMP echo packet, we could use the highlighted ICMP sequence number along with its source and destination IP addresses. The following tshark (version 1.6.2) options will display only these fields along with the packet arrival time:

$ tshark -r ingress.cap -R icmp -T fields -e frame.time -e ip.src -e ip.dst -e icmp.seq
Mar  5, 2017 19:34:11.033332000    172.25.7.31    172.18.4.2    1066
Mar  5, 2017 19:34:11.033429000    172.18.4.2    172.25.7.31    1066
Mar  5, 2017 19:34:12.033172000    172.25.7.31    172.18.4.2    1067
Mar  5, 2017 19:34:12.033267000    172.18.4.2    172.25.7.31    1067
...

$ tshark -r egress.cap -R icmp -T fields -e frame.time -e ip.src -e ip.dst -e icmp.seq
Mar  5, 2017 19:34:11.033354000    172.25.7.31    172.18.4.2    1066
Mar  5, 2017 19:34:11.033425000    172.18.4.2    172.25.7.31    1066
Mar  5, 2017 19:34:12.033186000    172.25.7.31    172.18.4.2    1067
Mar  5, 2017 19:34:12.033261000    172.18.4.2    172.25.7.31    1067
...

The availability of such an output would make it easier to compute the switching delta, however this would still be manual. What if we could code this in a suitable programming language. For our illustration, we will use Python v2.6+.

From within Python, we could invoke tshark using the call function of the subprocess module. Here is a code extract:

import sys
from subprocess import call

file1, file2 = sys.argv[1:]
call("tshark -r "+ file1 + " -R icmp -T fields -e frame.time -e ip.src -e ip.dst -e icmp.seq | awk '{print $4,$5,$6,$7}' > tmp-1.txt", shell=True)
call("tshark -r "+ file2 + " -R icmp -T fields -e frame.time -e ip.src -e ip.dst -e icmp.seq | awk '{print $4,$5,$6,$7}' > tmp-2.txt", shell=True)

Variables file1 and file2 are packet capture files of before (on interface_1 or iface_1) and after (on iface_2) the device in question, respectively, and are populated with values from the command-line arguments. The output of tshark is filtered with awk to include only the time portion of the date/time display along with the remaining fields, and written to a separate tmp file for each capture file.

It is possible that the capture files may be started or stopped at a slightly different time introducing inconsistent number of packets. To cater for such inconsistency, we could compare the tmp files and include only matching lines, where the match will be based on IP addresses and the sequence number. The following code extract shows how awk is invoked using the call function again with an array indexed by a concatenation of IP addresses and the ICMP echo sequence number, and written to another set of tmp files when matching lines are found:

call("awk 'NR==FNR {packets[$2$3$4]++;next}; packets[$2$3$4] > 0' tmp-1.txt tmp-2.txt > tmp-2-m.txt", shell=True)
call("awk 'NR==FNR {packets[$2$3$4]++;next}; packets[$2$3$4] > 0' tmp-2.txt tmp-1.txt > tmp-1-m.txt", shell=True)

We should now have consistent packet details on the resulting files to perform computation. In order to compute per-packet delay, we need to maintain a number of variables. The followings show a number of variables for packets firstly seen on iface_1 and traverses in the iface_1 to iface_2 direction:

    lpkt = 0           # number of packets seen
    ltotaldelay = 0.0  # cumulative switching delay
    lpktarrival = 0    # previous packet switching delay
    ljittertotal = 0.0 # cumulative jitter

A similarly-named set of variables starting with 'w' will be used for packets seen on iface_2 first (not shown here). Let us open both files to read and iterate through. We start with the first packet line from tmp-1-m.txt and look for a match in the second tmp file, tmp-2-m.txt. Each line from either file is split and stored as a list. The fourth item on the list represents the sequence number. These sequence numbers are then compared. The following code extract illustrates this logic:

file-iface1 = open("tmp-1-m.txt","r")
file-iface2 = open("tmp-2-m.txt","r")

while file-iface1:
    line = file-iface1.readline().split()
    timestamp = datetime.strptime(line[0],'%H:%M:%S.%f000');
    seq = line[3]

    while file-iface2:
        line2 = file-iface2.readline().split()
        seq2 = line2[3]
        if seq2 == seq:

When a match is found, we check to see which packet has a smaller timestamp. If the packet in file-iface1 has a smaller timestamp, it would imply this packet traversed from iface_1 to iface_2 on this network device. Otherwise, the packet traversed in the reverse direction. In both cases, the switching delta could be readily computed and relevant packet counter and total delay is updated as shown below:

            timestamp2 = datetime.strptime(line2[0],'%H:%M:%S.%f000')
            if timestamp < timestamp2:
                delta = timestamp2 - timestamp
                lpkt += 1
                ltotaldelay += delta.microseconds
                if lpktarrival > 0:
                    ljittertotal += abs(delta.microseconds - lpktarrival)
                    lpktarrival = delta.microseconds
                else:
                    lpktarrival = delta.microseconds
            else:
                delta = timestamp - timestamp2
                wpkt += 1
                wtotaldelay += delta.microseconds
                if wpktarrival > 0:
                    wjittertotal += abs(delta.microseconds - wpktarrival)
                    wpktarrival = delta.microseconds
                else:
                    wpktarrival = delta.microseconds
            break

The nested if blocks above are used to compute jitter for the second packet and onwards for packets traversing iface_1 to iface_2, and vice-versa, respectively. We compute and increment the cumulative variable with the absolute difference of the previous switching delta with current as well as update the previous delta. Once a matching packet line is found on the second file and computation completed, the flow exits the inner while loop and proceeds with the next line on the first file until all packet lines are processed. As such, the inner while loop would have a O(1) worst-case complexity (in big-O notation), due to the pre-processing done during the packet consistency processing.

At the end of the loop, average switching delay as well as jitter for each direction could be readily computed and displayed as follows:

    print("Average Switching Delay")
    print("="*25)
    print("iface_1 -> iface_2 delay : %.2f microsec" % (ltotaldelay/lpkt))

    print("iface_1 -> iface_2 jitter: %.2f microsec" % (ljittertotal/(lpkt-1)))
    print("iface_2 -> iface_1 delay : %.2f microsec" % (wtotaldelay/wpkt))
    print("iface_2 -> iface_1 jitter: %.2f microsec" % (wjittertotal/(wpkt-1)))

Here is a sample output for a set of packet captures called ingress.cap and egress.cap:

$ time_delta.py ingress.cap egress.cap
Average Switching Delay
=========================
iface_1 -> iface_2 delay : 79.19 microsec
iface_1 -> iface_2 jitter: 137.90 microsec
iface_2 -> iface_1 delay : 4.24 microsec
iface_2 -> iface_1 jitter: 1.07 microsec


To estimate the real execution time of this code, I used two rather large capture files with up to 140,000 packets each. 99.8% of the execution time was spent during the tshark calls.

Now, what if you are looking for more than just the average values, for instance, you want to see switching delay behaviour over time. May be one would also want to see the packet loss performance of a network segment, not just of ICMP packets.

SteelCentral PacketAnalyzer Plus


If you haven't heard about this, I am pleased to introduce SteelCentral Packet Analyzer Plus, which runs on Windows machines. To generate a delay over time chart, you will need to create a multi-segment source using the two before/after capture files, and then apply the MSA-Segment-Delay view against this source using just a couple of mouse-clicks. This may sound a bit complicated, but it is far from it. The result will be presented as charts in a short order by this tool. The following shows a screenshot of delay over time along with the per-direction average chart and a pie-chart illustrating the relative direction delay:



From our Python program, here is the corresponding output for the same capture files:

Average Switching Delay
=========================
iface_1 -> iface_2 delay : 18.45 microsec
iface_1 -> iface_2 jitter: 24.59 microsec
iface_2 -> iface_1 delay : 10.48 microsec
iface_2 -> iface_1 jitter: 6.57 microsec


Using PacketAnalyzer Plus, one could readily perform an analysis to show frame size distribution for the same files as presented below:


Similarly, to determine the packet loss performance for a segment, you can apply a Segment-Dropped-Packets-Overview view to the source. The following is a sample output:



Such analyses can be done rather quickly with PacketAnalyzer Plus even with large packet capture files. The use cases described above for PacketAnalyzer Plus are only a very small fraction of what it is capable of. There are numerous other supported use cases, including VoIP related analysis, relevant to the theme of this blog. If this has sparked any interest, check it out for yourself with a trial download. Otherwise, you have the poor-man's version of delay and jitter analysis tool that could work on Linux or Mac machines with tshark v1.6.2+ and Python v2.6+. 😊

Saturday, July 5, 2014

tshark as a Command Line Troubleshooting tool

Wireshark is one of the must have tool in the arsenal of network professional. It allows one to take a trace on a client host to look deep inside frames to understand an operation or to look for clues of any misbehaving network applications or protocols. One typically uses other tools to take a packet trace on network devices and then analyse it in Wireshark. Another lessor known command line tool that comes with Wireshark is tshark. It has similar features as Wireshark.

For someone who is more comfortable on command line, using tshark along with various Linux tools allows for more advanced filtering. The simplest way to view a trace file with tshark is as below:

$ tshark -r capture.cap
  1 10:12:22.050975 0.000000 172.16.13.253 -> 172.16.1.137 CFLOW 1022 total: 15 (v9) records
  2 10:12:22.051314 0.000339 172.16.99.114 -> 172.16.98.156 TPKT 475 Continuation
  3 10:12:22.055856 0.004542 172.16.98.156 -> 172.16.1.14  TCP 66 55603 > 51513 [ACK] Seq=1827602552 Ack=1175114189 Win=1728 Len=0
  4 10:12:22.055963 0.000107 172.16.99.232 -> 172.16.80.7  SNMP 83 get-request SNMPv2-MIB::sysDescr.0

Each frame in the trace is shown as one line output with summary info of the highest level protocol that tshark is able to decode. The above sample trace appears to be an open capture with various exchanges between different hosts, possibly taken without any specific filter. If a trace file is large, console output will scroll very quickly. Like Wireshark, we can use filters to narrow our focus towards any traffic of interest. For instance, to see just NetFlow packets on port 2055, we can use filter udp.port == 2055 as shown:

$ tshark -r capture.cap -Y "(udp.port == 2055)"
  1 10:12:22.050975 0.000000 172.16.13.253 -> 172.16.1.137 CFLOW 1022 total: 15 (v9) records
 17 10:12:22.077250 0.026275 172.16.20.253 -> 172.16.1.137 CFLOW 1022 total: 15 (v9) records
 21 10:12:22.082645 0.005395 172.16.20.253 -> 172.16.1.137 CFLOW 1022 total: 15 (v9) records

The above output shows that frames 1, 17 and 21 are NetFlow frames. One each line, the third field represents the delta time, and 1022 is the frame size. The details after the frame size represent a summary of NetFlow specific information.

At times, we may be interested to get more than just a single line summary for each frame. To display full decode of each relevant protocol in a trace for a specific traffic of interest, we could use –V. However be warned, this outputs a lot of details. Here is an example against a single frame (removed lines shown with ellipses):

$ tshark -Vnr capture.cap -Y "(frame.number == 1)"
Frame 1: 1022 bytes on wire (8176 bits), 1022 bytes captured (8176 bits)
    Encapsulation type: Ethernet (1)

    Frame Length: 1022 bytes (8176 bits)

    [Protocols in frame: eth:ip:udp:cflow]
Ethernet II, Src: f8:c0:01:8b:7e:81 (f8:c0:01:8b:7e:81), Dst: 00:50:56:82:5c:bf (00:50:56:82:5c:bf)
    Destination: 00:50:56:82:5c:bf (00:50:56:82:5c:bf)

    Type: IP (0x0800)
Internet Protocol Version 4, Src: 172.16.13.253 (172.16.13.253), Dst: 172.16.1.137 (172.16.1.137)
    Version: 4

    Total Length: 1008
    Identification: 0x0000 (0)
    Flags: 0x02 (Don't Fragment)

User Datagram Protocol, Src Port: 12240 (12240), Dst Port: 2055 (2055)
    Source port: 12240 (12240)
    Destination port: 2055 (2055)

Cisco NetFlow/IPFIX
    Version: 9
    Count: 15
    SysUptime: 994443240


The details of all known protocols are decoded. This is usually used when a few relevant frames of interest were identified in the trace. Otherwise, there would be an overload of details.

If you are interested only on a specific protocol, we could display more details just for it using –O <proto>. Here is an example:

$ tshark -O eth -Vnr capture.cap -Y "(frame.number == 1)"
Frame 1: 1022 bytes on wire (8176 bits), 1022 bytes captured (8176 bits)
Ethernet II, Src: f8:c0:01:8b:7e:81 (f8:c0:01:8b:7e:81), Dst: 00:50:56:82:5c:bf (00:50:56:82:5c:bf)
    Destination: 00:50:56:82:5c:bf (00:50:56:82:5c:bf)
        Address: 00:50:56:82:5c:bf (00:50:56:82:5c:bf)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Source: f8:c0:01:8b:7e:81 (f8:c0:01:8b:7e:81)
        Address: f8:c0:01:8b:7e:81 (f8:c0:01:8b:7e:81)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: IP (0x0800)
Internet Protocol Version 4, Src: 172.16.13.253 (172.16.13.253), Dst: 172.16.1.137 (172.16.1.137)
User Datagram Protocol, Src Port: 12240 (12240), Dst Port: 2055 (2055)
Cisco NetFlow/IPFIX


Here, only Ethernet header details were expanded, and all other protocols displayed with a summary line. To display the list of supported protocols, use the following command:

$ tshark -G protocols
For the same frame 1 above, to decode Netflow instead, you could determine which keyword to use by performing a simple filter (i.e. the required keyword is cflow):

$ tshark -G protocols | grep NetFlow
Cisco NetFlow/IPFIX  CFLOW     cflow

There are times when a known protocol is using a non-standard port. The following example shows a trace where NetBIOS traffic is on port 7801, whose packet length is greater than 41 bytes:

$ tshark -nr capture.cap -Y "(tcp.port == 7801 && ip.len > 41)"
 41 21:36:26.608905 0.000000 10.114.22.254 -> 10.114.96.230 TCP 188 61901 > 7801 [PSH, ACK] Seq=3940639060 Ack=3275647333 Win=16425 Len=130
 43 21:36:26.613294 0.004389 10.114.22.254 -> 10.114.96.230 TCP 188 61902 > 7801 [PSH, ACK] Seq=2746223592 Ack=4123396674 Win=16425 Len=130

These are decoded as TCP segments. Since we know it is NetBIOS session traffic, we can explicitly request tshark to decode using -d <port,proto> option as shown here:

$ tshark -d tcp.port==7801,nbss -nr capture.cap -Y "(tcp.port == 7801 && ip.len > 41)"
 41 21:36:26.608905 0.000000 10.114.22.254 -> 10.114.96.230 NBSS 188 NBSS Continuation Message
 43 21:36:26.613294 0.004389 10.114.22.254 -> 10.114.96.230 NBSS 188 NBSS Continuation Message

If we compare the two outputs above, the summary lines are clearly different depending on how the frames are decoded.

Finally, there are some interesting overall statistics that we could glean using tshark, especially for large packet traces. If we have a large HTTP trace, run the following command to determine the distribution of HTTP methods used and HTTP responses from the server:

$ tshark -z http,tree -qr capture.cap
=================================================================
 HTTP/Packet Counter           value       rate           percent
-----------------------------------------------------------------
 Total HTTP Packets             208       0.004884              
  HTTP Request Packets           106       0.002489          50.96%
   GET                            104       0.002442          98.11%
   POST                             2       0.000047           1.89%
  HTTP Response Packets          102       0.002395          49.04%
   ???: broken                      0       0.000000           0.00%
   1xx: Informational               0       0.000000           0.00%
   2xx: Success                    98       0.002301          96.08%
    200 OK                          98       0.002301         100.00%
   3xx: Redirection                 4       0.000094           3.92%
    302 Found                        4       0.000094         100.00%
   4xx: Client Error                0       0.000000           0.00%
   5xx: Server Error                0       0.000000           0.00%
  Other HTTP Packets               0       0.000000           0.00%
=================================================================


The -q option ensures only the summary is shown and not every single frame in the trace. My favourite summary is the packet length distribution. This is one of the more useful facts of a network that could help in troubleshooting certain network behaviour:

$ tshark -z plen,tree -qr capture.cap
=================================================================
 Packet Lengths          value         rate           percent
-------------------------------------------------------------------
 Packet Lengths          5579       0.011575              
  0-19                       0       0.000000           0.00%
  20-39                      0       0.000000           0.00%
  40-79                   2837       0.005886          50.85%
  80-159                   500       0.001037           8.96%
  160-319                  243       0.000504           4.36%
  320-639                  111       0.000230           1.99%
  640-1279                 148       0.000307           2.65%
  1280-2559               1740       0.003610          31.19%
  2560-5119                  0       0.000000           0.00%
  5120-4294967295            0       0.000000           0.00%
=================================================================


I hope you find tshark a useful tool to add to your arsenal. Enjoy!