Recently I witnessed an unexpected issue with redundant PTP devices: an IGMP snooping issue on the switch side made both time appliances unavailable.
For some time I had wanted to add NTP redundancy to my PTP setup, and this was a good excuse to spend some time on it. NTP is not equivalent to PTP, but when you are working with certain hardware it is a good failsafe mechanism.
From what I read in the official documentation, Chrony can be used with SFPTPD, but with a few caveats: SFPTPD needs to be able to control Chrony, either via an external helper (prior to 3.8) or via a socket on the newest releases.
Disclaimer: This configuration is still experimental and under testing.
This is my SFPTPD config file:
[general]
sync_module ptp ptp1
sync_module crny crny0
message_log syslog
stats_log syslog
clock_control no-step
selection_holdoff_interval 60
[ptp1]
ptp_mode slave
ptp_delay_mechanism end-to-end
ptp_network_mode hybrid
ptp_domain 28
priority 10
[ptp]
interface core
[crny0]
clock_control on
priority 20
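The basic additions are the Chrony sync module and its configuration, with a higher priority number than the PTP instance (in SFPTPD smaller values take precedence, so PTP remains preferred while it is healthy). There is also a hold-off interval between kicking Chrony in or out.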
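To make the interaction between instance state and priority more concrete, here is a toy model in Python. It is not SFPTPD's actual selection code: the `SyncInstance` class, the `in_sync` flag and the `rank` function are hypothetical, and it assumes that a healthy instance always beats an unhealthy one and that, among healthy instances, the smaller priority number wins.

```python
from dataclasses import dataclass


@dataclass
class SyncInstance:
    name: str
    in_sync: bool   # hypothetical flag: True when the instance is actively synchronised
    priority: int   # value from the config file; smaller is assumed to be preferred


def rank(instances):
    # Assumption: state is considered first, then the configured priority.
    return sorted(instances, key=lambda s: (not s.in_sync, s.priority))


# Both sources healthy: PTP (priority 10) should be preferred over Chrony (priority 20).
ptp_ok = [SyncInstance("ptp1", True, 10), SyncInstance("crny0", True, 20)]
# PTP lost (for example, no Announce messages): Chrony should take over.
ptp_lost = [SyncInstance("ptp1", False, 10), SyncInstance("crny0", True, 20)]

best_when_ptp_ok = rank(ptp_ok)[0].name
best_when_ptp_lost = rank(ptp_lost)[0].name
print(best_when_ptp_ok, best_when_ptp_lost)
```

In this sketch the priority only matters as a tie-breaker between instances that are in the same state, which is the behaviour I would expect from the configuration above.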
On the Chrony configuration side (without the comments):
allow 127.0.0.1 # only for local purposes
server 169.254.169.123 iburst prefer # AWS ntp or your preferred ntp source
rtcsync # keep the RTC (BIOS clock) updated, useful for older hardware
bindcmdaddress /var/run/chrony/chronyd.sock # local control via socket
makestep 0.0 0 # no stepping
maxslewrate 10000 # 10000 ppm, roughly 100 minutes to slew 1 minute
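As a quick sanity check of the slew arithmetic, the snippet below estimates how long it takes to correct a given offset when the clock can only be slewed at a fixed maximum rate. The function name `slew_duration_seconds` is mine, and the calculation assumes the daemon slews at the full `maxslewrate` for the whole correction, which is a best case.

```python
def slew_duration_seconds(offset_seconds, max_slew_ppm):
    # A rate of N ppm corrects N microseconds of offset per second of real time.
    return offset_seconds * 1_000_000 / max_slew_ppm


# 60 seconds of offset at maxslewrate 10000 (10000 ppm, i.e. 1%).
duration = slew_duration_seconds(60, 10000)
print(f"{duration:.0f} s, about {duration / 60:.0f} minutes")
```

With these numbers the correction takes 6000 seconds, so a one-minute offset needs about 100 minutes of slewing.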
After restarting both daemons, we can see SFPTPD recognising the additional clocksource and failing over to Chrony if PTP is not available:
systemd[1]: Started sfptpd.service - Solarflare Enhanced PTP Daemon.
sfptpd[3388701]: ntp: changed state from ntp-listening to ntp-disabled
sfptpd[3388701]: crny: unblocking system clock
sfptpd[3388701]: crny: changed state from ntp-listening to ntp-selection
sfptpd[3388701]: crny: changed state from ntp-selection to ntp-slave
sfptpd[3388701]: selection: rank 1: crny0 by rule state (2) <- BEST
sfptpd[3388701]: selection: rank 2: ptp1 <- WORST
sfptpd[3388701]: will switch to sync instance crny0 in 10 seconds if ptp1 does not recover
sfptpd[3388701]: ptp ptp1: failed to receive Announce within 12.000 seconds
sfptpd[3388701]: crny: enabled chronyd clock control
sfptpd[3388701]: selected sync instance crny0 (ptp1 was active for 16.355s)
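If you want to track these failovers automatically, a small parser over the journal output is enough. This is a minimal sketch that assumes the "selected sync instance" message keeps the exact wording shown in the log above; the regular expression and the `selection_changes` helper are my own, not part of SFPTPD.

```python
import re

# Matches lines such as:
#   selected sync instance crny0 (ptp1 was active for 16.355s)
SELECTED = re.compile(
    r"selected sync instance (\S+) \((\S+) was active for ([\d.]+)s\)"
)


def selection_changes(lines):
    """Return (previous, new, seconds_previous_was_active) for each switch."""
    changes = []
    for line in lines:
        match = SELECTED.search(line)
        if match:
            new, previous, seconds = match.groups()
            changes.append((previous, new, float(seconds)))
    return changes


sample = [
    "sfptpd[3388701]: will switch to sync instance crny0 in 10 seconds if ptp1 does not recover",
    "sfptpd[3388701]: crny: enabled chronyd clock control",
    "sfptpd[3388701]: selected sync instance crny0 (ptp1 was active for 16.355s)",
]
changes = selection_changes(sample)
print(changes)
```

Feeding it the output of journalctl -u sfptpd line by line would give a list of switches that can be pushed to monitoring.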
The SFPTPD logs show the Chrony block/unblock events, which confirm it is able to talk to Chrony via the socket (you can also replicate this with chronyc -h <socket path>):
$ sudo journalctl -u sfptpd -l | egrep -i block
sfptpd[2343906]: crny: blocking system clock
sfptpd[2343906]: crny: unblocking system clock
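Before digging into the logs, it can be worth checking that the control socket exists at all. The sketch below only tests that the path from the Chrony configuration is a Unix domain socket; it does not prove that SFPTPD has permission to use it. The helper name `is_unix_socket` is mine.

```python
import os
import stat

# Path taken from the bindcmdaddress line in the Chrony configuration.
CHRONYD_SOCKET = "/var/run/chrony/chronyd.sock"


def is_unix_socket(path):
    """Return True if path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing path, permission problem, and so on.
        return False


print(CHRONYD_SOCKET, "is a socket:", is_unix_socket(CHRONYD_SOCKET))
```

If this prints False, the daemon is either not running or bindcmdaddress points somewhere else.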
I need to run more tests on this configuration, but so far it looks promising.
Additional documentation sources:
- /usr/share/doc/sfptpd/config/ptp_slave_chrony_fallback.cfg (local file on your SFPTPD server)
- https://docs.amd.com/r/en-US/ug1602-ptp-user/ptp_slave_chrony_fallback.cfg
- https://docs.amd.com/r/en-US/ug1602-ptp-user/Chronyd