By Dan Drown — 18 May 2026

Retiring an NTP server

To run an NTP server in the NTP pool, I rented a physical server. Since the hardware is relatively old (Xeon E3-1240 v3, 2013), it was discounted (about $45/mo). I've had it since October 2024 and it's been reasonably stable. But now it's time to retire it.

This hardware was nice because the NIC supports hardware timestamps:

$ sudo ethtool -T eno1
Time stamping parameters for eno1:
Capabilities:
        hardware-transmit
        software-transmit
        hardware-receive
        software-receive
        software-system-clock
        hardware-raw-clock
Hardware timestamp provider index: 0
Hardware timestamp provider qualifier: Precise (IEEE 1588 quality)
Hardware Transmit Timestamp Modes:
        off
        on
Hardware Receive Filter Modes:
        none
        all

So I was able to turn on NTP hardware timestamps:

# fragment from /etc/chrony.conf
# Enable hardware timestamping on all interfaces that support it.
hwtimestamp *
# allocate 100MB to client information
clientloglimit 100000000
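
One way to check that hardware timestamps are actually being used for served packets is to look at the timestamp counters in "chronyc serverstats". This is a minimal sketch, assuming chrony 4.4 or later, where serverstats reports separate daemon, kernel and hardware RX timestamp counts; hw_rx_percent is a hypothetical helper name, not part of chrony.

```shell
# hw_rx_percent: reads `chronyc serverstats` output on stdin and prints the
# percentage of received NTP packets that got a hardware RX timestamp.
# Assumes the "NTP daemon/kernel/hardware RX timestamps" lines of chrony 4.4+.
hw_rx_percent() {
    awk -F: '
        /^NTP (daemon|kernel|hardware) RX timestamps/ { total += $2 }
        /^NTP hardware RX timestamps/                 { hw = $2 }
        END {
            if (total > 0) printf "%.1f\n", 100 * hw / total
            else print "n/a"
        }
    '
}

# Usage:
#   chronyc serverstats | hw_rx_percent
```

A value near 100 would suggest the NIC is timestamping almost every request; "n/a" means the counters were missing or all zero.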

This might not be worth much on the general internet, as other sources of noise will dominate, especially if the client is on a wireless or home internet connection. But it was the highest accuracy possible for a reasonable price point, and that made it worth it to me.

I started noticing problems on 2025-05-25, when it crashed and came back online by itself. It did it again 7 months later on 2025-12-15, and again 4 months later on 2026-04-28. The crashes themselves weren't much of a problem since it came back online each time, but they were a warning sign that the hardware was starting to fail.

I noticed the CMOS battery was dying, generating these events in the SEL (Voltage #0x33 is VBAT):

$ sudo ipmitool sel list
   1 | 04/30/2026 | 03:20:29 | Voltage #0x33 | Lower Critical going low  | Asserted
   2 | 04/30/2026 | 03:20:29 | Voltage #0x33 | Lower Non-recoverable going low  | Asserted

But this turned out to be unrelated: the provider replaced the battery and the server continued to crash. It's just another sign of the hardware's age.

Trying to narrow down what was failing, I looked at the SSD's SMART statistics and the LOM's state. The SSD's POR_Recovery_Count and Power_Cycle_Count counters would increment, suggesting a power issue. The provider says there were no power issues during these times. Sometimes the "ipmitool chassis poh" counter would reset and other times the LOM would stay running (POH = power on hours, SEL = system event log), suggesting that it's an issue with the server's own power supply:

$ sudo ipmitool chassis poh; uptime
POH Counter  : 8 days, 12 hours
 23:30:38 up  2:10,  1 user,  load average: 0.03, 0.13, 0.06
$ sudo ipmitool sel list
SEL has no entries
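
To watch the two SSD counters over time, the raw values can be pulled out of the SMART attribute table. This is a minimal sketch, assuming a SATA SSD whose "smartctl -A" output uses the usual ten-column attribute table; smart_raw is a hypothetical helper and /dev/sda is a placeholder device name.

```shell
# smart_raw NAME: reads `smartctl -A` output on stdin and prints the
# RAW_VALUE (10th column) of the attribute called NAME.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Usage (device path is an assumption):
#   sudo smartctl -A /dev/sda | smart_raw Power_Cycle_Count
#   sudo smartctl -A /dev/sda | smart_raw POR_Recovery_Count
```

Recording both values before and after a crash shows whether the drive saw a power loss even when the LOM kept running.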

The crashes started coming more frequently:

2026-04-28 - crash #3
2026-05-04 - switch port failed, not a server crash
2026-05-06 - crash #4
2026-05-09 - crash #5
2026-05-12 - crash #6
2026-05-15 - crash #7
2026-05-29 - crash #8
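
The shrinking gap between crashes is easier to see as a number of days. This is a minimal sketch, assuming GNU date (for the -d option); days_between and crash_intervals are hypothetical helper names.

```shell
# days_between START END: whole days between two YYYY-MM-DD dates.
# Relies on GNU date's -d option; -u avoids daylight saving offsets.
days_between() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $(( (end - start) / 86400 ))
}

# crash_intervals DATE...: prints the gap between each consecutive pair.
crash_intervals() {
    prev=""
    for d in "$@"; do
        if [ -n "$prev" ]; then
            echo "$prev -> $d: $(days_between "$prev" "$d") days"
        fi
        prev=$d
    done
}

# Usage:
#   crash_intervals 2025-05-25 2025-12-15 2026-04-28 2026-05-06 2026-05-09
```

The first two gaps come out as 204 and 134 days, while the gaps between crashes #3 through #7 are 8, 3, 3 and 3 days.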

So it's time to retire this server. I removed it from the NTP pool when the switch port failed on the 4th.

I left NTP running for a little over a week, and then started blocking it to encourage clients to choose a different server:

[Figure: bandwidth in to server - mostly NTP]

Telegraf often starts before the hostname is set on this server, so it sometimes shows up as "localhost.localdomain". The first blue line is when the switch port failed on 2026-05-04. You can see that the inbound traffic was significantly reduced after the server was brought back online because it was no longer in the NTP pool. I had the NTP pool bandwidth setting at 1 Gbit/s, which worked out to around 2 TB/month of NTP traffic:

$ vnstat -m

 eno1  /  monthly

        month        rx      |     tx      |    total    |   avg. rate
     ------------------------+-------------+-------------+---------------
       2025-06      1.11 TiB |    1.17 TiB |    2.28 TiB |    7.74 Mbit/s
       2025-07      1.11 TiB |    1.18 TiB |    2.29 TiB |    7.52 Mbit/s
       2025-08      1.12 TiB |    1.20 TiB |    2.33 TiB |    7.64 Mbit/s
       2025-09      1.03 TiB |    1.13 TiB |    2.16 TiB |    7.34 Mbit/s
       2025-10      1.01 TiB |    1.08 TiB |    2.09 TiB |    6.87 Mbit/s
       2025-11    972.14 GiB |    1.02 TiB |    1.97 TiB |    6.68 Mbit/s
       2025-12    967.21 GiB |    0.98 TiB |    1.92 TiB |    6.31 Mbit/s
       2026-01    932.63 GiB |  934.22 GiB |    1.82 TiB |    5.99 Mbit/s
       2026-02    846.76 GiB |  847.52 GiB |    1.65 TiB |    6.02 Mbit/s
       2026-03    925.32 GiB |  931.50 GiB |    1.81 TiB |    5.96 Mbit/s
       2026-04    865.38 GiB |  842.17 GiB |    1.67 TiB |    5.66 Mbit/s
       2026-05    154.19 GiB |  151.17 GiB |  305.36 GiB |    1.74 Mbit/s
     ------------------------+-------------+-------------+---------------
     estimated    273.36 GiB |  267.99 GiB |  541.35 GiB |
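
The "avg. rate" column follows directly from the monthly total. This is a minimal sketch of that arithmetic, assuming binary units (1 TiB = 2^40 bytes) and decimal Mbit/s; avg_rate_mbit is a hypothetical helper name.

```shell
# avg_rate_mbit TOTAL_TIB DAYS: average rate in Mbit/s for a monthly total.
avg_rate_mbit() {
    awk -v tib="$1" -v days="$2" 'BEGIN {
        bits = tib * 2^40 * 8      # TiB -> bytes -> bits
        secs = days * 86400        # days -> seconds
        printf "%.2f\n", bits / secs / 1e6
    }'
}

# Usage:
#   avg_rate_mbit 2.28 30    # 2025-06 total, 30-day month
#   avg_rate_mbit 2.29 31    # 2025-07 total, 31-day month
```

These two calls print 7.74 and 7.52, matching the table rows for 2025-06 and 2025-07.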

The second blue line is when I started dropping NTP requests. Some clients retry immediately when a request is lost, so the inbound bandwidth went up. I plan to leave the server running till 2026-06-04, to give it a month since removing it from the NTP pool.

Goodbye server, you provided many time packets to the world. 🪦⚰️
