Stratum 2 NTP over a Cable Modem

Goals

I have a Stratum 1 NTP server at home, and ideally I'd like to join the NTP pool as a server. The problem is, my home connection isn't up to this task. My IP is dynamic, the latency is asymmetric, and my upstream bandwidth is tiny. So it'd be great if the NTP server on my VM in the cloud could use my home NTP server as a clock source.

Plan

  1. Write a program to do the NTP measurements and submit the results to ntpd
  2. Use dynamic DNS, and have the program automatically switch to the new dynamic IP if the DNS record changes
  3. Measure the request latency to avoid the upstream cable modem latency noise
  4. Estimate the one way latency and subtract that (assumption: the one way latency doesn't change often)
  5. Throw away any measurement that has a round trip time over 2x the minimum (currently around 5% of the samples)
  6. Measure once per second
  7. Every 20 seconds, remove the extreme samples, and send the median to ntpd
  8. Configure ntpd to accept the best sample every 64 seconds
  9. For backup and comparison, configure the other two Stratum 1 NTP servers I run with a minpoll 6 (64 seconds), and add two more Stratum 1 NTP servers with a minpoll 10 (~17 minutes)
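
A minimal sketch of steps 3, 4, 5 and 7, to make the arithmetic concrete. This is not the actual program; the function names, the (rtt, offset) tuple layout, and the `lo`/`hi` bounds standing in for "extreme samples" are all assumptions made for illustration. It also takes the minimum round trip time from the batch itself, whereas a real client would more likely track the minimum over a longer period.

```python
from statistics import median

def request_offset(t1, t2, one_way_latency):
    """Clock offset (server minus client) from the request leg only.

    t1: request transmit time on the client clock (seconds)
    t2: request receive time on the server clock (seconds)
    one_way_latency: assumed-constant request latency estimate (seconds)
    """
    return (t2 - t1) - one_way_latency

def keep_low_rtt(samples, factor=2.0):
    """Drop (rtt, offset) samples whose RTT exceeds factor * the minimum RTT."""
    if not samples:
        return []
    min_rtt = min(rtt for rtt, _ in samples)
    return [(rtt, off) for rtt, off in samples if rtt <= factor * min_rtt]

def batch_median(samples, lo=float("-inf"), hi=float("inf")):
    """Median offset of one batch after the RTT and extreme-value filters."""
    offsets = [off for _, off in keep_low_rtt(samples) if lo <= off <= hi]
    return median(offsets) if offsets else None

# Example batch: the third sample has a 30ms round trip and is discarded.
batch = [(0.010, 0.001), (0.011, 0.002), (0.030, 0.050), (0.012, 0.003)]
print(batch_median(batch))  # 0.002
```

Using only the request leg means the noisy upstream path through the cable modem never enters the offset calculation; the price is that any change in the real one way latency shows up directly as an offset error.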

Experiment

A process on "vps" was set up to take 4 hours of NTP request latency samples, one per second. The clock on vps was sync'd to public timeservers with ~17 minute poll times. The clock on "sandfish" was sync'd to GPS. This graph is limited to a max of 20ms; samples over that amount are not shown.

The purple and cyan lines are the 90th and 3rd percentiles of the last 200 samples. The uneven percentiles are used based on the assumption that the error in the offset measurement will be positive more often than negative. They are recalculated every 20 samples, and are used to calculate the filtered mode (incorrectly labeled "filtered mean" in this graph) and filtered average. Both filtered statistics are only over the last 20 samples, while the last 200 samples are used to generate the filter. The filter moves at a slower pace to limit the effects of noise.
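
Here is a rough sketch of that windowed filter, assuming a nearest-rank percentile and a class name (`PercentileFilter`) that I made up for this example. It returns a filtered median and a filtered average for each batch of 20; the median is a stand-in chosen for this sketch, so treat it as an assumption rather than the exact statistic plotted in the graph.

```python
from collections import deque
from statistics import mean, median

def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list (pct is an int)."""
    rank = max(1, (pct * len(sorted_values) + 99) // 100)  # integer ceiling
    return sorted_values[rank - 1]

class PercentileFilter:
    """Bounds come from the last `history` samples, stats from the last `batch`."""

    def __init__(self, history=200, batch=20, low_pct=3, high_pct=90):
        self.history = deque(maxlen=history)  # slow-moving window for the bounds
        self.batch = []                       # samples since the last output
        self.batch_size = batch
        self.low_pct = low_pct
        self.high_pct = high_pct

    def add(self, offset):
        """Add one sample; return (median, average) once per batch, else None."""
        self.history.append(offset)
        self.batch.append(offset)
        if len(self.batch) < self.batch_size:
            return None
        ordered = sorted(self.history)
        lo = percentile(ordered, self.low_pct)
        hi = percentile(ordered, self.high_pct)
        kept = [x for x in self.batch if lo <= x <= hi]
        self.batch = []
        if not kept:
            return None
        return median(kept), mean(kept)

# Example: 19 quiet samples and one 50ms spike; the spike is filtered out.
f = PercentileFilter()
results = [f.add(x) for x in [1.0] * 19 + [50.0]]
print(results[-1])  # (1.0, 1.0)
```

Because the upper bound sits at the 90th percentile and the lower one at the 3rd, far more samples are trimmed from the high side than from the low side, which matches the assumption that the measurement error is mostly positive.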

Comparison

Putting this data into practice, I created an asymmetric NTP client to send the filtered samples to ntpd.

I configured ntpd to log both the filtered samples and the direct (unfiltered) samples. The dark blue line is the filtered offset, and the lighter blue line is the direct offset. The jitter of the filtered offset is an order of magnitude lower.

Next, a comparison of the filtered NTP samples ("sandfish") against other Stratum 1 NTP servers. I've adjusted the offsets of the other NTP servers by a static +/- amount so they are within 1ms.

The filtered NTP samples are still noisier than the other sources, but they all follow each other very closely. You can see the "lon" clock lost sync for roughly 6 hours, and then came back. This is because its antenna placement isn't great and it sometimes loses signal. 3ms in 6 hours is an error of about 139 parts per billion, which is reasonable holdover performance.
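
The holdover figure is just accumulated offset divided by elapsed time. A quick check, using a helper name (`drift_ppb`) invented for this example:

```python
def drift_ppb(offset_seconds, interval_seconds):
    """Average frequency error in parts per billion over the interval."""
    return offset_seconds / interval_seconds * 1e9

# 3ms of accumulated offset over 6 hours of holdover
print(round(drift_ppb(0.003, 6 * 3600), 1))  # 138.9
```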

Results

Lastly, a look at the local clock's performance.

The offset is within +/- 144us 90% of the time, and +/- 482us 99.98% of the time. On average, the clock wandered 86 parts per billion in 30 minutes, and wandered less than 185 parts per billion 90% of the time.