By Dan Drown in stm32 — May 28, 2020

Embedded NTP client/NTP interleaved mode, part 4

Part 3 ended with an unexplained 3us offset. This post tracks down most of it.

Verifying PTP clock sync

I wanted to measure the NTP client's PTP clock sync externally. The PTP peripheral has the option to output a PPS that I can compare to the GPS's PPS.

Starting with the software's internal estimate of the offset between the PTP clock and the GPS PPS:

I connected both PPSes to an oscilloscope.

The blue line is the GPS PPS on channel 2. I have it as the trigger, so it is in the middle of the screenshot. I've turned on infinite persistence, so the scope shows every position where the PTP clock's PPS (channel 1, in yellow) landed relative to the GPS's PPS.

Each grey dotted vertical line is 1us, and you can see the vast majority of PTP PPSes happened within +/-4us (the solid yellow area to the left and right of the blue vertical line). The earliest arrived more than 5us before the GPS PPS, and the latest fell off the right-hand edge of the screen, more than 9us after it. This lines up with what the software was reporting, so I believe the PTP clock is properly synchronized.

100M/1G asymmetry

Over the years, a few people have told me that having the RX timestamp at the end of the packet is important for symmetry. I didn't fully appreciate that. From the "Timestamp Capture Principles" section on Store and Forward Delay Errors:

...some consideration must also be given to the forwarding behavior of the switches connecting A and B when the link speeds differ. Using the preamble timestamp as the transmit timestamp and the trailer timestamp as the receive timestamp solves this problem.

To give a concrete example, let's take an NTP client connected at 100M and an NTP server connected at 1G. Each NTP request is 48 bytes of user data plus 46 bytes of headers and trailer. These 752 bits take 7520ns to transmit at 100M, the packet spends some amount of time in the switch, and the server takes the RX hardware timestamp right after it receives the ethernet preamble and SFD (64ns at 1G). That's a total of 7584ns in the NTP request direction. The NTP response takes 752ns to transmit at 1G, it spends some time in the switch, and then the client takes the RX hardware timestamp right after it receives the ethernet preamble and SFD (640ns at 100M). That's a total of 1392ns.

The NTP request and NTP response have different latencies, but NTP assumes the latencies are equal, so half of the difference between them shows up as an apparent clock offset. The fix for this problem is, as the quote says, to take the RX timestamp at the end of the packet (trailer timestamp).
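The arithmetic above can be checked with a short Python sketch. It uses only the numbers from the example and ignores the time spent inside the switch, which is assumed to be the same in both directions. The function and variable names are mine, not from the NTP client's code. The offset formula is the standard NTP one, applied to a made-up exchange where both clocks agree perfectly.

```python
# Sizes from the example above.
FRAME_BITS = (48 + 46) * 8      # NTP payload plus headers and trailer: 752 bits
PREAMBLE_SFD_BITS = 8 * 8       # ethernet preamble + SFD: 64 bits

LINK_100M_BPS = 100_000_000
LINK_1G_BPS = 1_000_000_000


def ns_on_wire(bits, link_bps):
    """Time in nanoseconds to clock `bits` onto a link running at `link_bps`."""
    return bits * 1_000_000_000 // link_bps


# Request: frame leaves the client at 100M, server stamps after the 1G preamble.
request_ns = ns_on_wire(FRAME_BITS, LINK_100M_BPS) + ns_on_wire(PREAMBLE_SFD_BITS, LINK_1G_BPS)
# Response: frame leaves the server at 1G, client stamps after the 100M preamble.
response_ns = ns_on_wire(FRAME_BITS, LINK_1G_BPS) + ns_on_wire(PREAMBLE_SFD_BITS, LINK_100M_BPS)

# Simulated exchange with clocks in perfect sync (server processing time is arbitrary).
t1 = 0                       # client TX
t2 = t1 + request_ns         # server RX
t3 = t2 + 50_000             # server TX
t4 = t3 + response_ns        # client RX

# Standard NTP offset estimate; with equal one-way delays it would be 0 here.
offset_error_ns = ((t2 - t1) + (t3 - t4)) // 2

print(request_ns, response_ns, offset_error_ns)
```

Under these assumptions the apparent offset comes out to half of the 6192ns difference between the two directions, which is the same size as the roughly 3us offset left over from part 3.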

Verifying asymmetry

To verify this was a problem, I switched one of my NTP servers from 1G to 100M. This would verify that having a symmetrical network speed would result in symmetrical latency. Both sides were still using RX preamble timestamps. This happened at 27-06:10 (middle of the graph).

As expected, the NTP client TX direction didn't change much, but the RX direction moved closer to a symmetrical latency (and closer to 0 offset). The stretches of low noise/jitter are an artifact of the local clock alternating between periods of stability and instability.

Moving from preamble to trailer timestamps

Next, I moved the RX timestamps from preamble back to trailer timestamps. This shouldn't do anything to the offset of the 100M NTP server, but it should lower the offset of the 1G server.
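Here is a rough model of why that should be the expected result. It is a sketch under stated assumptions, not a description of any real switch: the TX timestamp is taken at the start of the preamble (as in the quote above), the switch is pure store-and-forward with zero internal latency, and cable and PHY delays are ignored. Because of where the TX timestamp is placed, the absolute numbers differ slightly from the figures earlier in the post, but the symmetry argument is the same. All names are hypothetical.

```python
FRAME_BITS = 752          # 48 bytes of NTP data + 46 bytes of headers and trailer
PREAMBLE_SFD_BITS = 64    # ethernet preamble + SFD


def wire_ns(bits, mbps):
    """Nanoseconds needed to send `bits` at `mbps` megabits per second."""
    return bits * 1000 // mbps


def one_way_ns(tx_mbps, rx_mbps, rx_stamp):
    """Time between the TX timestamp and the RX timestamp for one direction.

    Assumes the sender stamps at the start of the preamble and the switch
    only starts forwarding once it has received the whole frame.
    """
    into_switch = wire_ns(PREAMBLE_SFD_BITS + FRAME_BITS, tx_mbps)
    if rx_stamp == "preamble":
        out_bits = PREAMBLE_SFD_BITS                 # stamp right after the SFD
    elif rx_stamp == "trailer":
        out_bits = PREAMBLE_SFD_BITS + FRAME_BITS    # stamp at the end of the frame
    else:
        raise ValueError("rx_stamp must be 'preamble' or 'trailer'")
    return into_switch + wire_ns(out_bits, rx_mbps)


def offset_error_ns(client_mbps, server_mbps, rx_stamp):
    """Apparent NTP offset caused only by the difference in one-way latency."""
    request = one_way_ns(client_mbps, server_mbps, rx_stamp)
    response = one_way_ns(server_mbps, client_mbps, rx_stamp)
    return (request - response) / 2


for server_mbps in (100, 1000):
    for rx_stamp in ("preamble", "trailer"):
        print(server_mbps, rx_stamp, offset_error_ns(100, server_mbps, rx_stamp))
```

In this model the 100M server is symmetric with either kind of RX timestamp, while the 1G server is only symmetric with trailer timestamps.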

Still symmetrical latency on the 100M server

This looks good as well.

Results

These are much closer to 0 offset. Now my 3 time sources are within 500ns of each other. I'm not sure where the rest of this offset is coming from. Looking at the PHYs' datasheets, I've accounted for their published latency. The 100M PHY has 170ns TX delay and 380ns RX delay. The 1G PHY's datasheet isn't as clear, but the only delay numbers it has are in the 2-5ns range.
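For reference, this is roughly what accounting for PHY latency looks like. It is a minimal sketch that assumes the hardware timestamps are taken on the MAC side of the PHY, so a TX timestamp is earlier than the moment the frame hits the wire and an RX timestamp is later. The function names and the sample timestamps are made up for illustration; only the 170ns and 380ns figures come from the datasheet numbers above.

```python
# PHY latencies quoted above for the 100M PHY.
PHY_100M_TX_DELAY_NS = 170
PHY_100M_RX_DELAY_NS = 380


def correct_tx_timestamp(mac_ts_ns, tx_delay_ns):
    """The frame reaches the wire tx_delay_ns after the MAC-side timestamp."""
    return mac_ts_ns + tx_delay_ns


def correct_rx_timestamp(mac_ts_ns, rx_delay_ns):
    """The frame was on the wire rx_delay_ns before the MAC-side timestamp."""
    return mac_ts_ns - rx_delay_ns


def ntp_offset_ns(t1, t2, t3, t4):
    """Standard NTP offset estimate from the four timestamps."""
    return ((t2 - t1) + (t3 - t4)) / 2


# Hypothetical exchange: clocks in perfect sync, symmetric 8us wire latency,
# and server timestamps that are already referenced to the wire.
wire_t1, t2, t3, wire_t4 = 1_000, 9_000, 59_000, 67_000

# What the client's MAC would record before any PHY correction.
mac_t1 = wire_t1 - PHY_100M_TX_DELAY_NS
mac_t4 = wire_t4 + PHY_100M_RX_DELAY_NS

raw_offset_ns = ntp_offset_ns(mac_t1, t2, t3, mac_t4)
corrected_offset_ns = ntp_offset_ns(
    correct_tx_timestamp(mac_t1, PHY_100M_TX_DELAY_NS),
    t2,
    t3,
    correct_rx_timestamp(mac_t4, PHY_100M_RX_DELAY_NS),
)

print(raw_offset_ns, corrected_offset_ns)
```

With these numbers, leaving the client's PHY delays uncorrected would shift the offset by half of the difference between the TX and RX delays, so errors in the published figures translate into offset errors of a similar size.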