Thursday, April 2, 2026

PTP with NTP backup with Solarflare PTP daemon

Recently I witnessed an unexpected issue with redundant PTP devices: an IGMP snooping problem on the switch side made both time appliances unavailable.

For some time I had wanted to add NTP redundancy to my PTP setup, and this was a good excuse to put some time into it. It is not the same thing, but when you are working with certain hardware it is a good failsafe mechanism.

From what I read in the official documentation, Chrony can be used with SFPTPD, with a few caveats: SFPTPD needs to be able to control Chrony, either via an external helper (prior to 3.8) or via a socket on the newer releases.

Disclaimer: This configuration is still experimental and under testing.

This is my SFPTPD config file:

[general]
sync_module ptp ptp1
sync_module crny crny0
message_log syslog
stats_log syslog
clock_control no-step
selection_holdoff_interval 60
[ptp1]
ptp_mode slave
ptp_delay_mechanism end-to-end
ptp_network_mode hybrid
ptp_domain 28
priority 10
[ptp]
interface core
[crny0]
clock_control on
priority 20


The basic additions are the Chrony sync module and its configuration, with a lower selection preference (a higher priority number) than the PTP instance, plus a hold-off interval between selection changes so Chrony is not kicked in or out too eagerly.

On the Chrony configuration side (without the comments):

allow 127.0.0.1 # only for local purposes
server 169.254.169.123 iburst prefer # AWS NTP or your preferred NTP source
rtcsync # keep the RTC / BIOS clock updated, useful for older hardware
bindcmdaddress /var/run/chrony/chronyd.sock # local control via socket
makestep 0.0 0 # no stepping
maxslewrate 10000 # in ppm; roughly 100 minutes to slew 1 minute of offset
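Once chronyd is up with this configuration, the socket-based control path can be checked by hand; this is just a verification sketch using the socket path from the config above:

```shell
# Query chronyd over its Unix control socket (requires access to the
# socket, typically root) - the same socket sfptpd uses for clock control.
sudo chronyc -h /var/run/chrony/chronyd.sock tracking
sudo chronyc -h /var/run/chrony/chronyd.sock sources -v
```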

 

After restarting both daemons, we can see SFPTPD recognising the additional clock source and failing over to Chrony when PTP is not available:

systemd[1]: Started sfptpd.service - Solarflare Enhanced PTP Daemon.
sfptpd[3388701]: ntp: changed state from ntp-listening to ntp-disabled
sfptpd[3388701]: crny: unblocking system clock
sfptpd[3388701]: crny: changed state from ntp-listening to ntp-selection
sfptpd[3388701]: crny: changed state from ntp-selection to ntp-slave
sfptpd[3388701]: selection: rank 1: crny0 by rule state (2) <- BEST
sfptpd[3388701]: selection: rank 2: ptp1 <- WORST
sfptpd[3388701]: will switch to sync instance crny0 in 10 seconds if ptp1 does not recover
sfptpd[3388701]: ptp ptp1: failed to receive Announce within 12.000 seconds
sfptpd[3388701]: crny: enabled chronyd clock control
sfptpd[3388701]: selected sync instance crny0 (ptp1 was active for 16.355s)


The block / unblock Chrony events in the SFPTPD logs confirm that it is able to talk to Chrony via the socket (you can replicate this yourself with chronyc -h <socket path>):

$ sudo journalctl -u sfptpd -l | egrep -i block
sfptpd[2343906]: crny: blocking system clock
sfptpd[2343906]: crny: unblocking system clock

I need to do more testing on this configuration, but so far it looks promising.



Sunday, January 4, 2026

Istio Ingress Gateway for direct node ingress with Kubernetes applications

Recently I migrated a few market data applications from Nomad to EKS.

These services are optimised to run within a small, simple footprint where SSL is offloaded to other components. Before the migration, this was handled by an nginx companion container using a sidecar pattern. On the new setup, Istio is the ingress tool of choice, so there is no need to keep a sidecar.

With a traditional Istio setup, we soon came to the realisation that we were introducing additional hops in our network and increasing service latency:

The network circuit was:
  • Consumers connect to the NLB / Service definition in Istio
  • The NLB sends the requests to the worker nodes running Istio Ingress
  • Istio Ingress accesses the application service definition and routes the traffic to one of the instances
  • The MD instance receives the request, processes the information
  • The HTTP response is returned to the client

The new platform would be slower, since requests now have to traverse a newly introduced NLB and an intermediate Istio worker node.

Looking around at what was possible with Istio, there is a feature that allows you to configure edge ingress with the Istio Ingress Gateway. Summing up the additional configuration:
  • Make use of external-dns for direct service resolution
  • Make use of a NodePort service to proxy into this service type with automatic pod resolution
  • Make use of Ingress Gateway helm deployment to have Istio running on the application nodes

The resulting picture is:



The resulting network circuit is now:
  • Consumers resolve the endpoint via DNS, which external-dns populates
  • The Edge Istio container receives the request, routes it internally or sends it to the next MD APP node
  • The MD instance receives the request, processes the information
  • The HTTP response is returned to the client

The implementation is relatively simple. Since I already had an Istio deployment, I created a new helm chart deploying the ingressgateway component of Istio:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: istio-ingressgateway
  namespace: istio-system
  labels:
    app: istio-ingressgateway
    istio: ingressgateway
spec:
  selector:
    matchLabels:
      app: istio-ingressgateway
      istio: ingressgateway
  template:
    metadata:
      labels:
        app: istio-ingressgateway
        istio: ingressgateway
      annotations:
        inject.istio.io/templates: gateway
    spec:
      serviceAccountName: istio-ingress # shared account with Istio
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet # This will be needed for edge access

      nodeSelector:
        mylabels/ingressgateway: "true" # What labels will run Istio Ingress Gateway
      tolerations:
        # Any tolerations?

      containers:
  [... from here pretty much same as the original Istio ...]


After getting the Istio Ingress Gateway running on the desired labels / nodes, we need to add the necessary settings on the Market Data application deployment side. The application previously had these relevant components:

  • Service definition (ClusterIP)
  • Gateway
  • VirtualService

First, I modified the Service to add the external-dns annotation with the additional internal DNS name. For this setup the application will have two endpoints:

  • mymarketdata.localdomain.service.internal - the original dns endpoint
  • igw-mymarketdata.localdomain.service.internal - Istio Ingress Gateway dns endpoint
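Later, once external-dns has published the records, each endpoint can be sanity-checked with a plain DNS lookup (hostnames here are the examples above); with hostNetwork enabled, the pod IPs published for the gateway endpoint are the node IPs:

```shell
# Both names should resolve once external-dns has created the records.
dig +short mymarketdata.localdomain.service.internal
dig +short igw-mymarketdata.localdomain.service.internal
```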

Note: This is an addition to an existing service, so I will only highlight the modifications to a traditional service setup and not the entire setup.

On the existing Service definition, we need to add an annotation for external-dns:

apiVersion: v1
kind: Service
metadata:
  name: market-data-v1
  namespace: marketdata
  annotations:
    external-dns.alpha.kubernetes.io/hostname: igw-mymarketdata.localdomain.service.internal

 [...]

Since this is a ClusterIP service type, we will use a NodePort service to automatically map the pod IPs on the target DNS records. Sample NodePort configuration:

apiVersion: v1
kind: Service
metadata:
  name: nodeport-market-data-v1
  namespace: istio-system
  annotations:
    external-dns.alpha.kubernetes.io/hostname: igw-mymarketdata.localdomain.service.internal
spec:
  type: NodePort
  externalTrafficPolicy: Local
  selector:
    istio: ingressgateway # or the app labels, if that is more convenient or the ingress gateway runs in more places
  ports:
  - name: https
    port: 443 
    targetPort: 443 # Envoy listens on 8443 by default unless we bind it to 443
    nodePort: 31443   # must fall within the valid NodePort range

Note: NodePort operates on the range 30000-32767 by default. Also, Envoy will bind to port 8443 unless it runs as root or is allowed to bind to low ports such as 443.
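If you would rather have Envoy listen on 443 directly instead of remapping, one option (a sketch, not part of the original deployment; the exact container name depends on the gateway injection template) is to grant the capability to bind privileged ports in the DaemonSet container spec:

```yaml
# Hypothetical fragment for the istio-ingressgateway DaemonSet:
# NET_BIND_SERVICE lets Envoy bind ports below 1024 without running as root.
containers:
  - name: istio-proxy
    securityContext:
      capabilities:
        add:
          - NET_BIND_SERVICE
```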

Now that we have the service modification and nodeport, we need to create the new Gateway:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: gateway-market-data-v1
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      hosts:
        - igw-mymarketdata.localdomain.service.internal
      tls:
        mode: SIMPLE
        credentialName: [your cert config]

and the VirtualService:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: vs-market-data-v1
  namespace: marketdata
spec:
  hosts:
    - igw-mymarketdata.localdomain.service.internal
  gateways:
    - istio-system/gateway-market-data-v1
  http:
    - route:
        - destination:
            host: market-data-v1.marketdata.svc.cluster.local
            port:
              name: [your container port label or port number with "number:"]


Now the running pods' IPs should start being published on the new DNS name. By default the load balancing will be round robin, but we can change this to least connections, random, etc. with a DestinationRule.
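For example, a minimal DestinationRule (the resource name is illustrative) switching the balancing for this service to least-request:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: dr-market-data-v1
  namespace: marketdata
spec:
  host: market-data-v1.marketdata.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST   # or ROUND_ROBIN (default), RANDOM, PASSTHROUGH
```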