NIC interrupt coalesce impact on NTP

I noticed that my Odroid C2 had a much higher round-trip time than I expected. Its gigabit Ethernet is built into the CPU, so its latency should be decent.

rtt min/avg/max/mdev = 0.577/0.605/0.627/0.020 ms (10x ICMP pings)

For comparison, a Raspberry Pi 2 has 100M Ethernet attached via USB. Both the lower link speed and the trip through USB should give it a higher latency, yet it measures lower:

rtt min/avg/max/mdev = 0.323/0.409/0.445/0.039 ms (10x ICMP pings)

Examining the Odroid C2's ethtool settings brings up something to try:

$ ethtool -c eth0
Coalesce parameters for eth0:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 393 <--------------- 393us!
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 40000
tx-frames: 64
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
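
If you want to check this setting on several machines, the output above is easy to parse. The following is a minimal sketch, not part of the original measurement: the function name parse_coalesce and the shortened SAMPLE text are made up for illustration, and it assumes the `key: value` layout shown above.

```python
def parse_coalesce(text):
    """Parse the `key: value` lines of `ethtool -c` output into a dict of ints."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        fields = value.split()
        # Skip the header line and the "Adaptive RX: off  TX: off" line,
        # whose values are not integers.
        if fields and fields[0].isdigit():
            settings[key.strip()] = int(fields[0])
    return settings


# Shortened copy of the output above (hypothetical sample input).
SAMPLE = """\
Coalesce parameters for eth0:
Adaptive RX: off  TX: off
rx-usecs: 393
rx-frames: 0
tx-usecs: 40000
tx-frames: 64
"""

settings = parse_coalesce(SAMPLE)
print("rx-usecs:", settings["rx-usecs"])
```

On a live system the text could come from running ethtool -c eth0 through subprocess.run(..., capture_output=True, text=True) instead of the sample string.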

I changed this to its lowest setting of 50us:

$ sudo ethtool -C eth0 rx-usecs 50

By comparison, an Intel NIC has a default setting of rx-usecs: 3

After the change, the Odroid C2 measures:

rtt min/avg/max/mdev = 0.161/0.270/0.301/0.044 ms (average rtt dropped by 335us after reducing the rx-usecs coalesce setting by 343us)
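
The arithmetic behind those two numbers can be checked in a few lines. This is only a worked example using the averages and rx-usecs values quoted above; the variable names are mine.

```python
# Values copied from the ping and ethtool output in this post.
avg_rtt_before_ms = 0.605
avg_rtt_after_ms = 0.270
rx_usecs_before = 393
rx_usecs_after = 50

# Convert the RTT change to microseconds so both numbers share a unit.
rtt_drop_us = round((avg_rtt_before_ms - avg_rtt_after_ms) * 1000)
coalesce_drop_us = rx_usecs_before - rx_usecs_after

# Share of the coalesce reduction that showed up in the average RTT.
share_percent = round(100 * rtt_drop_us / coalesce_drop_us)

print(rtt_drop_us, coalesce_drop_us, share_percent)
```

Nearly all of the reduction in the coalesce delay shows up in the average round trip time, which suggests each ping was waiting close to the full rx-usecs period before the interrupt fired.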

I had two NTP clients poll the Odroid C2 once per second for 100 seconds and made the change at second 48. The "1G" system (in purple) is an Intel machine and the "100M" system (in cyan/green) is a Raspberry Pi 2. The y-axis shows round trip times in microseconds (one millionth of a second).

Round trip time

You can see the round trip time drop sharply as soon as the setting was changed. One outlier packet from the Raspberry Pi 2 took over 1ms.

The clients also recorded the request and response latency.

Request and response latency

You can see the configuration change only affected the two request latencies. This is because the coalesce setting that was changed only affects the Odroid C2's receive path. The outlier packet the Raspberry Pi 2 saw mainly affected its response latency, so the extra delay was either in the Odroid C2's transmit path or in the Pi's receive path. It is likely unrelated to the coalesce change.

The clocks themselves didn't change in offset relative to each other (the three systems are within 15us). But NTP assumes the request and response take equal time, so an asymmetric change in latency shows up as a change in the measured offset.
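
To see why, here is a small sketch of the standard NTP offset and delay calculation from the four packet timestamps. The latencies used (450us and 107us for the request, 100us for the response, 20us of server processing) are hypothetical numbers picked so that the request path changes by 343us; they are not taken from my measurements. Both clocks are assumed to be perfectly in sync.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation; all timestamps in microseconds.

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def exchange(request_us, response_us, true_offset_us=0, processing_us=20):
    """Simulate one request/response with the given one-way latencies."""
    t1 = 0                                   # client clock
    t2 = t1 + request_us + true_offset_us    # server clock
    t3 = t2 + processing_us                  # server clock
    t4 = t3 - true_offset_us + response_us   # client clock
    return ntp_offset_and_delay(t1, t2, t3, t4)


# Hypothetical latencies: only the request path changes, by 343us.
before = exchange(request_us=450, response_us=100)
after = exchange(request_us=107, response_us=100)

print("before: offset=%.1fus delay=%dus" % before)
print("after:  offset=%.1fus delay=%dus" % after)
print("offset shift: %.1fus" % (before[0] - after[0]))
```

The clocks never moved in this simulation, but the measured offset shifts by half of the one-way latency change, because the calculation splits the round trip evenly between the two directions.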

Offset

This coalesce change brings the measured offsets of the three clocks much closer together.

The Odroid C2 maxes out at receiving around 90k-100k packets per second. Changing the interrupt coalesce setting doesn't lower that rate. The maximum NTP response rate with this hardware is around 55k packets per second.

By comparison, the Raspberry Pi 2 can handle around 18k packets per second.
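
As a rough way to relate these packet rates to the coalesce setting, the sketch below computes how many packets arrive, on average, during one rx-usecs window. It is plain arithmetic on the numbers quoted in this post and assumes evenly spaced packets; it says nothing about how the driver actually batches them. The function name is made up for illustration.

```python
def packets_per_window(packets_per_second, window_us):
    """Average number of packets arriving during one coalesce window."""
    return packets_per_second * window_us / 1_000_000


# Rates and rx-usecs values quoted in this post.
for rate in (100_000, 55_000, 18_000):
    for window_us in (393, 50):
        count = packets_per_window(rate, window_us)
        print("%6d pps, rx-usecs %3d: %.1f packets per window" % (rate, window_us, count))
```

At one packet per second, as in the NTP test above, the window almost never holds more than one packet, so the coalesce delay only adds latency. At 100k packets per second, even the 50us setting still covers several packets per window.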