Embedded NTP client/NTP interleaved mode, part 2
Previous post: part 1
Next goal I focused on was TX timestamps on my embedded device. With them, I can measure how much time was spent in the NTP client's IP stack between submitting the packet for transmission and when it was actually sent. Below is a graph of the difference in time between when the software submits the packet and when the hardware actually transmits it on this platform.
You can see the 6.05ns step of this clock in the individual horizontal bands. There's 133ns between the max and min values. That's a pretty good result, not having to run other code or handle interrupts makes it a very predictable delay. However, I was hoping to spend far less than 10us in this part of the code, but the average is 14.1us. I'll have to add speeding this up as a future goal.
The round trip time spent on the network (single 100M/1G switch) was 99% between 14.845us and 15.481us.
To break down what this 15us entails, let's break the time down.
In this NTP request, the clock starts at the HW Timestamp. This NTP client transmits 14+20+8+48+4+12 bytes@100M (8.480us). The switch spends some amount of time, and then forwards it on to the NTP server. This NTP server receives 7+1 bytes@1G (0.064us) before it takes a HW timestamp. Because NTP traditionally used software timestamps, the RX timestamp is adjusted as if it was after the Ethernet CRC. This adds another 14+20+8+48+4 bytes@1G (0.752us).
The clock is paused while the packet is in the NTP server's NIC buffer, the OS, and the NTP software. The clock starts again when the NTP server starts transmitting at the HW timestamp point. The NTP server sends 14+20+8+48+4+12 bytes@1G (0.848us). The switch spends some more time, and then forwards it on to the NTP client. The NTP client receives 7+1 bytes@100M (0.640us), and then takes a timestamp.
All this adds up to 10.784us, which means this packet spends on average 2.2us in the switch each direction.
To compare, I plugged the NTP client directly into the NTP server, without at switch.
This lowered the average latency from 15.160us to 11.277us.