Recently I was having some issues at work with internal clocks.
For high-performance systems, you might prefer using TSC (Time Stamp Counter) instead of HPET (High Precision Event Timer) as the clock source. There's a good explanation of these two in this Red Hat documentation page.
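To see which clock source a machine is actually using, you can read the standard sysfs files under /sys/devices/system/clocksource/clocksource0. Below is a minimal sketch of that check; the function name `read_clocksources` and its `base` parameter are my own illustrative choices, not part of any library.

```python
from pathlib import Path

# Standard sysfs location for the system clocksource on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable.

    `base` is a hypothetical parameter so the function can be pointed at
    another directory, for example in tests or on non-Linux machines.
    """
    base = Path(base)
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # Files are missing (not Linux, or sysfs not mounted) or unreadable.
        return None, []
    return current, available


if __name__ == "__main__":
    current, available = read_clocksources()
    print(f"current:   {current}")
    print(f"available: {', '.join(available)}")
```

On a machine where the watchdog has marked TSC or HPET unstable, the current value will typically differ from what you expected to find there.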
In my case, running kernel 4.19.0-XX-amd64, I was occasionally seeing this error in the logs:
kernel: [517300.909751] clocksource: timekeeping watchdog on CPU15: hpet retried 2 times before success
/*
 * Interval: 0.5sec Threshold: 0.0625s
 */
#define WATCHDOG_INTERVAL (HZ >> 1)
#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)

/*
 * Maximum permissible delay between two readouts of the watchdog
 * clocksource surrounding a read of the clocksource being validated.
 * This delay could be due to SMIs, NMIs, or to VCPU preemptions.
 */
#define WATCHDOG_MAX_SKEW (100 * NSEC_PER_USEC)
kernel: [4027805.681972] clocksource: timekeeping watchdog on CPU8: hpet read-back delay of 113166ns, attempt 4, marking unstable
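To make the relation between that log line and WATCHDOG_MAX_SKEW concrete, here is a small sketch that parses the message and compares the reported delay with the 100 microsecond limit from the macro. The regular expression is written against the single log line shown above, so treat it as an assumption about the message format rather than a guaranteed match for every kernel version; `parse_read_back` is a hypothetical helper name.

```python
import re

NSEC_PER_USEC = 1_000
# Mirrors the kernel macro quoted above: 100 microseconds in nanoseconds.
WATCHDOG_MAX_SKEW_NS = 100 * NSEC_PER_USEC

# Pattern modelled on the "read-back delay" message from the logs above.
READ_BACK_RE = re.compile(
    r"clocksource: timekeeping watchdog on CPU(?P<cpu>\d+): "
    r"(?P<source>\S+) read-back delay of (?P<delay>\d+)ns, "
    r"attempt (?P<attempt>\d+)"
)


def parse_read_back(line):
    """Extract CPU, clocksource, delay and attempt from a watchdog log line."""
    m = READ_BACK_RE.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m["cpu"]),
        "source": m["source"],
        "delay_ns": int(m["delay"]),
        "attempt": int(m["attempt"]),
    }


line = (
    "kernel: [4027805.681972] clocksource: timekeeping watchdog on CPU8: "
    "hpet read-back delay of 113166ns, attempt 4, marking unstable"
)
event = parse_read_back(line)
excess = event["delay_ns"] - WATCHDOG_MAX_SKEW_NS
print(f"{event['source']} on CPU{event['cpu']}: "
      f"{event['delay_ns']} ns, {excess} ns over the limit")
```

The reported delay of 113166 ns is 13166 ns above the 100000 ns limit, which is consistent with the clock source being marked unstable.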
sudo taskset -c 0-15 stress-ng --timeout 180 --times --verify --metrics-brief --ioport 32 --sysinfo 32 --aggressive --schedpolicy 40 --cpu-load-slice 100
/*
 * Threshold: 0.0312s, when doubled: 0.0625s.
 * Also a default for cs->uncertainty_margin when registering clocks.
 */
#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 5)

/*
 * Maximum permissible delay between two readouts of the watchdog
 * clocksource surrounding a read of the clocksource being validated.
 * This delay could be due to SMIs, NMIs, or to VCPU preemptions. Used as
 * a lower bound for cs->uncertainty_margin values when registering clocks.
 */
#define WATCHDOG_MAX_SKEW (100 * NSEC_PER_USEC)
The threshold definition has changed (it is now NSEC_PER_SEC >> 5, about 0.031 seconds, which doubles to the old 0.0625 seconds), but the max skew is still 100 microseconds. In kernel 6 we can see this value raised to 125 microseconds. If this issue happens again, an alternative would be to build my own custom kernel with the max skew increased to something more accommodating of what my servers are reporting. That would make later kernel upgrades a bit more complicated, though, and could have an impact on overall performance.
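The numbers involved are easy to check by hand. The sketch below redoes the arithmetic for the two threshold definitions and compares the 113166 ns delay from my logs with both max skew values. The 125 microsecond figure is the kernel 6 value mentioned above; whether the rest of the watchdog logic is otherwise identical is an assumption, so this only compares the constants.

```python
NSEC_PER_SEC = 1_000_000_000
NSEC_PER_USEC = 1_000

# WATCHDOG_THRESHOLD before and after the change quoted above.
threshold_shift4_ns = NSEC_PER_SEC >> 4   # 62_500_000 ns = 0.0625 s
threshold_shift5_ns = NSEC_PER_SEC >> 5   # 31_250_000 ns = 0.03125 s

# WATCHDOG_MAX_SKEW: 100 us in the quoted source, 125 us in kernel 6.
max_skew_100_ns = 100 * NSEC_PER_USEC
max_skew_125_ns = 125 * NSEC_PER_USEC

# Read-back delay reported in the log line earlier in this post.
observed_delay_ns = 113_166

print(f"threshold >> 4: {threshold_shift4_ns / NSEC_PER_SEC} s")
print(f"threshold >> 5: {threshold_shift5_ns / NSEC_PER_SEC} s")
print(f"exceeds 100 us limit: {observed_delay_ns > max_skew_100_ns}")
print(f"exceeds 125 us limit: {observed_delay_ns > max_skew_125_ns}")
```

With these values, the delay I saw is over the 100 microsecond limit but under the 125 microsecond one, so a limit of 125 would have been enough for that particular event.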
In the end, besides making my clock source issue go away, I'm also observing more user time in the CPU usage breakdown, and less I/O wait. A good reminder of the benefits of trying new kernels.