summaryrefslogtreecommitdiff
path: root/kernel/time/timekeeping.c
AgeCommit message (Collapse)Author
3 daysMerge tag 'timers-ptp-2025-07-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping and VDSO updates from Thomas Gleixner: - Introduce support for auxiliary timekeepers PTP clocks can be disconnected from the universal CLOCK_TAI reality for various reasons including regularatory requirements for functional safety redundancy. The kernel so far only supports a single notion of time, which means that all clocks are correlated in frequency and only differ by offset to each other. Access to non-correlated PTP clocks has been available so far only through the file descriptor based "POSIX clock IDs", which are subject to locking and have to go all the way out to the hardware. The access is not only horribly slow, as it has to go all the way out to the NIC/PTP hardware, but that also prevents the kernel to read the time of such clocks e.g. from the network stack, where it is required for TSN networking both on the transmit and receive side unless the hardware provides offloading. The auxiliary clocks provide a mechanism to support arbitrary clocks which are not correlated to the system clock. This is not restricted to the PTP use case on purpose as there is no kernel side association of these clocks to a particular PTP device because that's a pure user space configuration decision. Having them independent allows to utilize them for other purposes and also enables them to be tested without hardware dependencies. To avoid pointless overhead these clocks have to be enabled individualy via a new sysfs interface to reduce the overhead to a single compare in the hotpath if they are enabled at the Kconfig level at all. These clocks utilize the existing timekeeping/NTP infrastructures, which has been made possible over the recent releases by incrementaly converting these infrastructures over from a single static instance to a multi-instance pointer based implementation without any performance regression reported. The auxiliary clocks provide the same "emulation" of a "correct" clock as the existing CLOCK_* variants do with an independent instance of data and provide the same steering mechanism through the existing sys_clock_adjtime() interface, which has been confirmed to work by the chronyd(8) maintainer. That allows to provide lockless kernel internal and VDSO support so that applications and kernel internal functionalities can access these clocks without restrictions and at the same performance as the existing system clocks. - Avoid double notifications in the adjtimex() syscall. Not a big issue, but a trivial to avoid latency source. * tag 'timers-ptp-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) vdso/gettimeofday: Add support for auxiliary clocks vdso/vsyscall: Update auxiliary clock data in the datapage vdso: Introduce aux_clock_resolution_ns() vdso/gettimeofday: Introduce vdso_get_timestamp() vdso/gettimeofday: Introduce vdso_set_timespec() vdso/gettimeofday: Introduce vdso_clockid_valid() vdso/gettimeofday: Return bool from clock_gettime() helpers vdso/gettimeofday: Return bool from clock_getres() helpers vdso/helpers: Add helpers for seqlocks of single vdso_clock vdso/vsyscall: Split up __arch_update_vsyscall() into __arch_update_vdso_clock() vdso/vsyscall: Introduce a helper to fill clock configurations timekeeping: Remove the temporary CLOCK_AUX workaround timekeeping: Provide ktime_get_clock_ts64() timekeeping: Provide interface to control auxiliary clocks timekeeping: Provide update for auxiliary timekeepers timekeeping: Provide adjtimex() for auxiliary clocks timekeeping: Prepare do_adtimex() for auxiliary clocks timekeeping: Make do_adjtimex() reusable timekeeping: Add auxiliary clock support to __timekeeping_inject_offset() timekeeping: Make timekeeping_inject_offset() reusable ...
11 daystimekeeping: Zero initialize system_counterval when querying time from phc ↵Markus Blöchl
drivers Most drivers only populate the fields cycles and cs_id of system_counterval in their get_time_fn() callback for get_device_system_crosststamp(), unless they explicitly provide nanosecond values. When the use_nsecs field was added to struct system_counterval, most drivers did not care. Clock sources other than CSID_GENERIC could then get converted in convert_base_to_cs() based on an uninitialized use_nsecs field, which usually results in -EINVAL during the following range check. Pass in a fully zero initialized system_counterval_t to cure that. Fixes: 6b2e29977518 ("timekeeping: Provide infrastructure for converting to/from a base clock") Signed-off-by: Markus Blöchl <markus@blochl.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20250720-timekeeping_uninit_crossts-v2-1-f513c885b7c2@blochl.de
2025-07-18vdso/vsyscall: Update auxiliary clock data in the datapageThomas Weißschuh
Expose the auxiliary clock data so it can be read from the vDSO. Architectures not using the generic vDSO time framework, namely SPARC64, are not supported. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250701-vdso-auxclock-v1-11-df7d9f87b9b8@linutronix.de
2025-07-18vdso: Introduce aux_clock_resolution_ns()Thomas Weißschuh
Move the constant resolution to a shared header, so the vDSO can use it and return it without going through a syscall. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250701-vdso-auxclock-v1-10-df7d9f87b9b8@linutronix.de
2025-07-03Merge tag 'ktime-get-clock-ts64-for-ptp' into timers/ptpThomas Gleixner
Pull the base implementation of ktime_get_clock_ts64() for PTP, which contains a temporary CLOCK_AUX* workaround. That was created to allow integration of depending changes into the networking tree. The workaround is going to be removed in a subsequent change in the timekeeping tree. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-07-03timekeeping: Provide ktime_get_clock_ts64()Thomas Gleixner
PTP implements an inline switch case for taking timestamps from various POSIX clock IDs, which already consumes quite some text space. Expanding it for auxiliary clocks really becomes too big for inlining. Provide a out of line version. The function invalidates the timestamp in case the clock is invalid. The invalidation allows to implement a validation check without the need to propagate a return value through deep existing call chains. Due to merge logistics this temporarily defines CLOCK_AUX[_LAST] if undefined, so that the plain branch, which does not contain any of the core timekeeper changes, can be pulled into the networking tree as prerequisite for the PTP side changes. These temporary defines are removed after that branch is merged into the tip::timers/ptp branch. That way the result in -next or upstream in the next merge window has zero dependencies. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250701132628.357686408@linutronix.de
2025-06-27timekeeping: Provide interface to control auxiliary clocksThomas Gleixner
Auxiliary clocks are disabled by default and attempts to access them fail. Provide an interface to enable/disable them at run-time. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.444626478@linutronix.de
2025-06-27timekeeping: Provide update for auxiliary timekeepersThomas Gleixner
Update the auxiliary timekeepers periodically. For now this is tied to the system timekeeper update from the tick. This might be revisited and moved out of the tick. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.382451331@linutronix.de
2025-06-27timekeeping: Provide adjtimex() for auxiliary clocksThomas Gleixner
The behaviour is close to clock_adtime(CLOCK_REALTIME) with the following differences: 1) ADJ_SETOFFSET adjusts the auxiliary clock offset 2) ADJ_TAI is not supported 3) Leap seconds are not supported 4) PPS is not supported Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.317946543@linutronix.de
2025-06-27timekeeping: Prepare do_adtimex() for auxiliary clocksThomas Gleixner
Exclude ADJ_TAI, leap seconds and PPS functionality as they make no sense in the context of auxiliary clocks and provide a time stamp based on the actual clock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.253203783@linutronix.de
2025-06-27timekeeping: Make do_adjtimex() reusableThomas Gleixner
Split out the actual functionality of adjtimex() and make do_adjtimex() a wrapper which feeds the core timekeeper into it and handles the result including audit at the call site. This allows to reuse the actual functionality for auxiliary clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.187322876@linutronix.de
2025-06-27timekeeping: Add auxiliary clock support to __timekeeping_inject_offset()Thomas Gleixner
Redirect the relative offset adjustment to the auxiliary clock offset instead of modifying CLOCK_REALTIME, which has no meaning in context of these clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.124057787@linutronix.de
2025-06-27timekeeping: Make timekeeping_inject_offset() reusableThomas Gleixner
Split out the inner workings for auxiliary clock support and feed the core time keeper into it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.059934561@linutronix.de
2025-06-27timekeeping: Provide time setter for auxiliary clocksThomas Gleixner
Add clock_settime(2) support for auxiliary clocks. The function affects the AUX offset which is added to the "monotonic" clock readout of these clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183757.995688714@linutronix.de
2025-06-27timekeeping: Add minimal posix-timers support for auxiliary clocksThomas Gleixner
Provide clock_getres(2) and clock_gettime(2) for auxiliary clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183757.932220594@linutronix.de
2025-06-27timekeeping: Provide time getters for auxiliary clocksThomas Gleixner
Provide interfaces similar to the ktime_get*() family which provide access to the auxiliary clocks. These interfaces have a boolean return value, which indicates whether the accessed clock is valid or not. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183757.868342628@linutronix.de
2025-06-27timekeeping: Update auxiliary timekeepers on clocksource changeThomas Gleixner
Propagate a system clocksource change to the auxiliary timekeepers so that they can pick up the new clocksource. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183757.803890875@linutronix.de
2025-06-19timekeeping: Provide ktime_get_ntp_seconds()Thomas Gleixner
ntp_adjtimex() requires access to the actual time keeper per timekeeper ID. Provide an interface. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250519083026.411809421@linutronix.de
2025-06-19timekeeping: Introduce auxiliary timekeepersAnna-Maria Behnsen
Provide timekeepers for auxiliary clocks and initialize them during boot. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250519083026.350061049@linutronix.de
2025-06-19timekeeping: Add clock_valid flag to timekeeperThomas Gleixner
In preparation for supporting independent auxiliary timekeepers, add a clock valid field and set it to true for the system timekeeper. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250519083026.287145536@linutronix.de
2025-06-19timekeeping: Prepare timekeeping_update_from_shadow()Thomas Gleixner
Don't invoke the VDSO and paravirt updates when utilized for auxiliary clocks. This is a temporary workaround until the VDSO and paravirt interfaces have been worked out. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250519083026.223876435@linutronix.de
2025-06-19timekeeping: Make __timekeeping_advance() reusableAnna-Maria Behnsen
In __timekeeping_advance() the pointer to struct tk_data is hardcoded by the use of &tk_core. As long as there is only a single timekeeper (tk_core), this is not a problem. But when __timekeeping_advance() will be reused for per auxiliary timekeepers, __timekeeping_advance() needs to be generalized. Add a pointer to struct tk_data as function argument of __timekeeping_advance() and adapt all call sites. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250519083026.160967312@linutronix.de
2025-06-19ntp: Rename __do_adjtimex() to ntp_adjtimex()Thomas Gleixner
Clean up the name space. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250519083026.095637820@linutronix.de
2025-06-19ntp: Add timekeeper ID arguments to public functionsThomas Gleixner
In preparation for supporting auxiliary POSIX clocks, add a timekeeper ID to the relevant functions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250519083026.032425931@linutronix.de
2025-06-19timekeeping: Introduce timekeeper IDAnna-Maria Behnsen
As long as there is only a single timekeeper, there is no need to clarify which timekeeper is used. But with the upcoming reusage of the timekeeper infrastructure for auxiliary clock timekeepers, an ID is required to differentiate. Introduce an enum for timekeeper IDs, introduce a field in struct tk_data to store this timekeeper id and add also initialization. The id struct field is added at the end of the second cachline, as there is a 4 byte hole anyway. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250519083025.842476378@linutronix.de
2025-06-19timekeeping: Avoid double notification in do_adjtimex()Thomas Gleixner
Consolidate do_adjtimex() so that it does not notify about clock changes twice. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250519083025.779267274@linutronix.de
2025-06-19timekeeping: Cleanup kernel doc of __ktime_get_real_seconds()Thomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250519083025.715836017@linutronix.de
2025-06-19timekeeping: Remove hardcoded access to tk_coreThomas Gleixner
This was overlooked in the initial conversion. Use the provided pointer to access the shadow timekeeper. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250519083025.652611452@linutronix.de
2025-04-28timekeeping: Prevent coarse clocks going backwardsThomas Gleixner
Lei Chen raised an issue with CLOCK_MONOTONIC_COARSE seeing time inconsistencies. Lei tracked down that this was being caused by the adjustment: tk->tkr_mono.xtime_nsec -= offset; which is made to compensate for the unaccumulated cycles in offset when the multiplicator is adjusted forward, so that the non-_COARSE clockids don't see inconsistencies. However, the _COARSE clockid getter functions use the adjusted xtime_nsec value directly and do not compensate the negative offset via the clocksource delta multiplied with the new multiplicator. In that case the caller can observe time going backwards in consecutive calls. By design, this negative adjustment should be fine, because the logic run from timekeeping_adjust() is done after it accumulated approximately multiplicator * interval_cycles into xtime_nsec. The accumulated value is always larger then the mult_adj * offset value, which is subtracted from xtime_nsec. Both operations are done together under the tk_core.lock, so the net change to xtime_nsec is always always be positive. However, do_adjtimex() calls into timekeeping_advance() as well, to apply the NTP frequency adjustment immediately. In this case, timekeeping_advance() does not return early when the offset is smaller then interval_cycles. In that case there is no time accumulated into xtime_nsec. But the subsequent call into timekeeping_adjust(), which modifies the multiplicator, subtracts from xtime_nsec to correct for the new multiplicator. Here because there was no accumulation, xtime_nsec becomes smaller than before, which opens a window up to the next accumulation, where the _COARSE clockid getters, which don't compensate for the offset, can observe the inconsistency. This has been tried to be fixed by forwarding the timekeeper in the case that adjtimex() adjusts the multiplier, which resets the offset to zero: 757b000f7b93 ("timekeeping: Fix possible inconsistencies in _COARSE clockids") That works correctly, but unfortunately causes a regression on the adjtimex() side. There are two issues: 1) The forwarding of the base time moves the update out of the original period and establishes a new one. 2) The clearing of the accumulated NTP error is changing the behaviour as well. User-space expects that multiplier/frequency updates are in effect, when the syscall returns, so delaying the update to the next tick is not solving the problem either. Commit 757b000f7b93 was reverted so that the established expectations of user space implementations (ntpd, chronyd) are restored, but that obviously brought the inconsistencies back. One of the initial approaches to fix this was to establish a separate storage for the coarse time getter nanoseconds part by calculating it from the offset. That was dropped on the floor because not having yet another state to maintain was simpler. But given the result of the above exercise, this solution turns out to be the right one. Bring it back in a slightly modified form. Thus introduce timekeeper::coarse_nsec and store that nanoseconds part in it, switch the time getter functions and the VDSO update to use that value. coarse_nsec is set on operations which forward or initialize the timekeeper and after time was accumulated during a tick. If there is no accumulation the timestamp is unchanged. This leaves the adjtimex() behaviour unmodified and prevents coarse time from going backwards. [ jstultz: Simplified the coarse_nsec calculation and kept behavior so coarse clockids aren't adjusted on each inter-tick adjtimex call, slightly reworked the comments and commit message ] Fixes: da15cfdae033 ("time: Introduce CLOCK_REALTIME_COARSE") Reported-by: Lei Chen <lei.chen@smartx.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/all/20250419054706.2319105-1-jstultz@google.com Closes: https://lore.kernel.org/lkml/20250310030004.3705801-1-lei.chen@smartx.com/
2025-04-04Revert "timekeeping: Fix possible inconsistencies in _COARSE clockids"Thomas Gleixner
This reverts commit 757b000f7b936edf79311ab0971fe465bbda75ea. Miroslav reported that the changes for handling the inconsistencies in the coarse time getters result in a regression on the adjtimex() side. There are two issues: 1) The forwarding of the base time moves the update out of the original period and establishes a new one. 2) The clearing of the accumulated NTP error is changing the behaviour as well. Userspace expects that multiplier/frequency updates are in effect, when the syscall returns, so delaying the update to the next tick is not solving the problem either. Revert the change, so that the established expectations of user space implementations (ntpd, chronyd) are restored. The re-introduced inconsistency of the coarse time getters will be addressed in a subsequent fix. Fixes: 757b000f7b93 ("timekeeping: Fix possible inconsistencies in _COARSE clockids") Reported-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/Z-qsg6iDGlcIJulJ@localhost
2025-03-21timekeeping: Fix possible inconsistencies in _COARSE clockidsJohn Stultz
Lei Chen raised an issue with CLOCK_MONOTONIC_COARSE seeing time inconsistencies. Lei tracked down that this was being caused by the adjustment tk->tkr_mono.xtime_nsec -= offset; which is made to compensate for the unaccumulated cycles in offset when the multiplicator is adjusted forward, so that the non-_COARSE clockids don't see inconsistencies. However, the _COARSE clockid getter functions use the adjusted xtime_nsec value directly and do not compensate the negative offset via the clocksource delta multiplied with the new multiplicator. In that case the caller can observe time going backwards in consecutive calls. By design, this negative adjustment should be fine, because the logic run from timekeeping_adjust() is done after it accumulated approximately multiplicator * interval_cycles into xtime_nsec. The accumulated value is always larger then the mult_adj * offset value, which is subtracted from xtime_nsec. Both operations are done together under the tk_core.lock, so the net change to xtime_nsec is always always be positive. However, do_adjtimex() calls into timekeeping_advance() as well, to to apply the NTP frequency adjustment immediately. In this case, timekeeping_advance() does not return early when the offset is smaller then interval_cycles. In that case there is no time accumulated into xtime_nsec. But the subsequent call into timekeeping_adjust(), which modifies the multiplicator, subtracts from xtime_nsec to correct for the new multiplicator. Here because there was no accumulation, xtime_nsec becomes smaller than before, which opens a window up to the next accumulation, where the _COARSE clockid getters, which don't compensate for the offset, can observe the inconsistency. To fix this, rework the timekeeping_advance() logic so that when invoked from do_adjtimex(), the time is immediately forwarded to accumulate also the sub-interval portion into xtime. That means the remaining offset becomes zero and the subsequent multiplier adjustment therefore does not modify xtime_nsec. There is another related inconsistency. If xtime is forwarded due to the instantaneous multiplier adjustment, the NTP error, which was accumulated with the previous setting, becomes meaningless. Therefore clear the NTP error as well, after forwarding the clock for the instantaneous multiplier update. Fixes: da15cfdae033 ("time: Introduce CLOCK_REALTIME_COARSE") Reported-by: Lei Chen <lei.chen@smartx.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250320200306.1712599-1-jstultz@google.com Closes: https://lore.kernel.org/lkml/20250310030004.3705801-1-lei.chen@smartx.com/
2025-01-15timekeeping: Remove unused ktime_get_fast_timestamps()Dr. David Alan Gilbert
ktime_get_fast_timestamps() was added in 2020 by commit e2d977c9f1ab ("timekeeping: Provide multi-timestamp accessor to NMI safe timekeeper") but has remained unused. Remove it. [ tglx: Fold the inline as David suggested in the submission ] Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250112160132.450209-1-linux@treblig.org
2024-12-05clocksource: Make negative motion detection more robustThomas Gleixner
Guenter reported boot stalls on a emulated ARM 32-bit platform, which has a 24-bit wide clocksource. It turns out that the calculated maximal idle time, which limits idle sleeps to prevent clocksource wrap arounds, is close to the point where the negative motion detection triggers. max_idle_ns: 597268854 ns negative motion tripping point: 671088640 ns If the idle wakeup is delayed beyond that point, the clocksource advances far enough to trigger the negative motion detection. This prevents the clock to advance and in the worst case the system stalls completely if the consecutive sleeps based on the stale clock are delayed as well. Cure this by calculating a more robust cut-off value for negative motion, which covers 87.5% of the actual clocksource counter width. Compare the delta against this value to catch negative motion. This is specifically for clock sources with a small counter width as their wrap around time is close to the half counter width. For clock sources with wide counters this is not a problem because the maximum idle time is far from the half counter width due to the math overflow protection constraints. For the case at hand this results in a tripping point of 1174405120ns. Note, that this cannot prevent issues when the delay exceeds the 87.5% margin, but that's not different from the previous unchecked version which allowed arbitrary time jumps. Systems with small counter width are prone to invalid results, but this problem is unlikely to be seen on real hardware. If such a system completely stalls for more than half a second, then there are other more urgent problems than the counter wrapping around. Fixes: c163e40af9b2 ("timekeeping: Always check for negative motion") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/all/8734j5ul4x.ffs@tglx Closes: https://lore.kernel.org/all/387b120b-d68a-45e8-b6ab-768cd95d11c2@roeck-us.net
2024-11-19Merge tag 'timers-core-2024-11-18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "A rather large update for timekeeping and timers: - The final step to get rid of auto-rearming posix-timers posix-timers are currently auto-rearmed by the kernel when the signal of the timer is ignored so that the timer signal can be delivered once the corresponding signal is unignored. This requires to throttle the timer to prevent a DoS by small intervals and keeps the system pointlessly out of low power states for no value. This is a long standing non-trivial problem due to the lock order of posix-timer lock and the sighand lock along with life time issues as the timer and the sigqueue have different life time rules. Cure this by: - Embedding the sigqueue into the timer struct to have the same life time rules. Aside of that this also avoids the lookup of the timer in the signal delivery and rearm path as it's just a always valid container_of() now. - Queuing ignored timer signals onto a seperate ignored list. - Moving queued timer signals onto the ignored list when the signal is switched to SIG_IGN before it could be delivered. - Walking the ignored list when SIG_IGN is lifted and requeue the signals to the actual signal lists. This allows the signal delivery code to rearm the timer. This also required to consolidate the signal delivery rules so they are consistent across all situations. With that all self test scenarios finally succeed. - Core infrastructure for VFS multigrain timestamping This is required to allow the kernel to use coarse grained time stamps by default and switch to fine grained time stamps when inode attributes are actively observed via getattr(). These changes have been provided to the VFS tree as well, so that the VFS specific infrastructure could be built on top. - Cleanup and consolidation of the sleep() infrastructure - Move all sleep and timeout functions into one file - Rework udelay() and ndelay() into proper documented inline functions and replace the hardcoded magic numbers by proper defines. - Rework the fsleep() implementation to take the reality of the timer wheel granularity on different HZ values into account. Right now the boundaries are hard coded time ranges which fail to provide the requested accuracy on different HZ settings. - Update documentation for all sleep/timeout related functions and fix up stale documentation links all over the place - Fixup a few usage sites - Rework of timekeeping and adjtimex(2) to prepare for multiple PTP clocks A system can have multiple PTP clocks which are participating in seperate and independent PTP clock domains. So far the kernel only considers the PTP clock which is based on CLOCK TAI relevant as that's the clock which drives the timekeeping adjustments via the various user space daemons through adjtimex(2). The non TAI based clock domains are accessible via the file descriptor based posix clocks, but their usability is very limited. They can't be accessed fast as they always go all the way out to the hardware and they cannot be utilized in the kernel itself. As Time Sensitive Networking (TSN) gains traction it is required to provide fast user and kernel space access to these clocks. The approach taken is to utilize the timekeeping and adjtimex(2) infrastructure to provide this access in a similar way how the kernel provides access to clock MONOTONIC, REALTIME etc. Instead of creating a duplicated infrastructure this rework converts timekeeping and adjtimex(2) into generic functionality which operates on pointers to data structures instead of using static variables. This allows to provide time accessors and adjtimex(2) functionality for the independent PTP clocks in a subsequent step. - Consolidate hrtimer initialization hrtimers are set up by initializing the data structure and then seperately setting the callback function for historical reasons. That's an extra unnecessary step and makes Rust support less straight forward than it should be. Provide a new set of hrtimer_setup*() functions and convert the core code and a few usage sites of the less frequently used interfaces over. The bulk of the htimer_init() to hrtimer_setup() conversion is already prepared and scheduled for the next merge window. - Drivers: - Ensure that the global timekeeping clocksource is utilizing the cluster 0 timer on MIPS multi-cluster systems. Otherwise CPUs on different clusters use their cluster specific clocksource which is not guaranteed to be synchronized with other clusters. - Mostly boring cleanups, fixes, improvements and code movement" * tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits) posix-timers: Fix spurious warning on double enqueue versus do_exit() clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties clocksource/drivers/gpx: Remove redundant casts clocksource/drivers/timer-ti-dm: Fix child node refcount handling dt-bindings: timer: actions,owl-timer: convert to YAML clocksource/drivers/ralink: Add Ralink System Tick Counter driver clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource clocksource/drivers/timer-ti-dm: Don't fail probe if int not found clocksource/drivers:sp804: Make user selectable clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions hrtimers: Delete hrtimer_init_on_stack() alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack() io_uring: Switch to use hrtimer_setup_on_stack() sched/idle: Switch to use hrtimer_setup_on_stack() hrtimers: Delete hrtimer_init_sleeper_on_stack() wait: Switch to use hrtimer_setup_sleeper_on_stack() timers: Switch to use hrtimer_setup_sleeper_on_stack() net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack() futex: Switch to use hrtimer_setup_sleeper_on_stack() fs/aio: Switch to use hrtimer_setup_sleeper_on_stack() ...
2024-11-19Merge tag 'locking-core-2024-11-18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "Lockdep: - Enable PROVE_RAW_LOCK_NESTING with PROVE_LOCKING (Sebastian Andrzej Siewior) - Add lockdep_cleanup_dead_cpu() (David Woodhouse) futexes: - Use atomic64_inc_return() in get_inode_sequence_number() (Uros Bizjak) - Use atomic64_try_cmpxchg_relaxed() in get_inode_sequence_number() (Uros Bizjak) RT locking: - Add sparse annotation PREEMPT_RT's locking (Sebastian Andrzej Siewior) spinlocks: - Use atomic_try_cmpxchg_release() in osq_unlock() (Uros Bizjak) atomics: - x86: Use ALT_OUTPUT_SP() for __alternative_atomic64() (Uros Bizjak) - x86: Use ALT_OUTPUT_SP() for __arch_{,try_}cmpxchg64_emu() (Uros Bizjak) KCSAN, seqlocks: - Support seqcount_latch_t (Marco Elver) <linux/cleanup.h>: - Add if_not_guard() conditional guard helper (David Lechner) - Adjust scoped_guard() macros to avoid potential warning (Przemek Kitszel) - Remove address space of returned pointer (Uros Bizjak) WW mutexes: - locking/ww_mutex: Adjust to lockdep nest_lock requirements (Thomas Hellström) Rust integration: - Fix raw_spin_lock initialization on PREEMPT_RT (Eder Zulian) Misc cleanups & fixes: - lockdep: Fix wait-type check related warnings (Ahmed Ehab) - lockdep: Use info level for initial info messages (Jiri Slaby) - spinlocks: Make __raw_* lock ops static (Geert Uytterhoeven) - pvqspinlock: Convert fields of 'enum vcpu_state' to uppercase (Qiuxu Zhuo) - iio: magnetometer: Fix if () scoped_guard() formatting (Stephen Rothwell) - rtmutex: Fix misleading comment (Peter Zijlstra) - percpu-rw-semaphores: Fix grammar in percpu-rw-semaphore.rst (Xiu Jianfeng)" * tag 'locking-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits) locking/Documentation: Fix grammar in percpu-rw-semaphore.rst iio: magnetometer: fix if () scoped_guard() formatting rust: helpers: Avoid raw_spin_lock initialization for PREEMPT_RT kcsan, seqlock: Fix incorrect assumption in read_seqbegin() seqlock, treewide: Switch to non-raw seqcount_latch interface kcsan, seqlock: Support seqcount_latch_t time/sched_clock: Broaden sched_clock()'s instrumentation coverage time/sched_clock: Swap update_clock_read_data() latch writes locking/atomic/x86: Use ALT_OUTPUT_SP() for __arch_{,try_}cmpxchg64_emu() locking/atomic/x86: Use ALT_OUTPUT_SP() for __alternative_atomic64() cleanup: Add conditional guard helper cleanup: Adjust scoped_guard() macros to avoid potential warning locking/osq_lock: Use atomic_try_cmpxchg_release() in osq_unlock() cleanup: Remove address space of returned pointer locking/rtmutex: Fix misleading comment locking/rt: Annotate unlock followed by lock for sparse. locking/rt: Add sparse annotation for RCU. locking/rt: Remove one __cond_lock() in RT's spin_trylock_irqsave() locking/rt: Add sparse annotation PREEMPT_RT's sleeping locks. locking/pvqspinlock: Convert fields of 'enum vcpu_state' to uppercase ...
2024-11-05seqlock, treewide: Switch to non-raw seqcount_latch interfaceMarco Elver
Switch all instrumentable users of the seqcount_latch interface over to the non-raw interface. Co-developed-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20241104161910.780003-5-elver@google.com
2024-11-02timekeeping: Remove CONFIG_DEBUG_TIMEKEEPINGThomas Gleixner
Since 135225a363ae timekeeping_cycles_to_ns() handles large offsets which would lead to 64bit multiplication overflows correctly. It's also protected against negative motion of the clocksource unconditionally, which was exclusive to x86 before. timekeeping_advance() handles large offsets already correctly. That means the value of CONFIG_DEBUG_TIMEKEEPING which analyzed these cases is very close to zero. Remove all of it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241031120328.536010148@linutronix.de
2024-10-25timekeeping: Merge timekeeping_update_staged() and timekeeping_update()Anna-Maria Behnsen
timekeeping_update_staged() is the only call site of timekeeping_update(). Merge those functions. No functional change. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-25-554456a44a15@linutronix.de
2024-10-25timekeeping: Remove TK_MIRROR timekeeping_update() actionAnna-Maria Behnsen
All call sites of using TK_MIRROR flag in timekeeping_update() are gone. The TK_MIRROR dependent code path is therefore dead code. Remove it along with the TK_MIRROR define. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-24-554456a44a15@linutronix.de
2024-10-25timekeeping: Rework do_adjtimex() to use shadow_timekeeperAnna-Maria Behnsen
Updates of the timekeeper can be done by operating on the shadow timekeeper and afterwards copying the result into the real timekeeper. This has the advantage, that the sequence count write protected region is kept as small as possible. Convert do_adjtimex() to use this scheme and take the opportunity to use a scoped_guard() for locking. That requires to have a separate function for updating the leap state so that the update is protected by the sequence count. This also brings the timekeeper and the shadow timekeeper in sync for this state, which was not the case so far. That's not a correctness problem as the state is only used at the read sides which use the real timekeeper, but it's inconsistent nevertheless. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-23-554456a44a15@linutronix.de
2024-10-25timekeeping: Rework timekeeping_suspend() to use shadow_timekeeperAnna-Maria Behnsen
Updates of the timekeeper can be done by operating on the shadow timekeeper and afterwards copying the result into the real timekeeper. This has the advantage, that the sequence count write protected region is kept as small as possible. While the sequence count held time is not relevant for the resume path as there is no concurrency, there is no reason to have this function different than all the other update sites. Convert timekeeping_inject_offset() to use this scheme and cleanup the variable declarations while at it. As halt_fast_timekeeper() does not need protection sequence counter, it is no problem to move it with this change outside of the sequence counter protected area. But it still needs to be executed while holding the lock. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-22-554456a44a15@linutronix.de
2024-10-25timekeeping: Rework timekeeping_resume() to use shadow_timekeeperAnna-Maria Behnsen
Updates of the timekeeper can be done by operating on the shadow timekeeper and afterwards copying the result into the real timekeeper. This has the advantage, that the sequence count write protected region is kept as small as possible. While the sequence count held time is not relevant for the resume path as there is no concurrency, there is no reason to have this function different than all the other update sites. Convert timekeeping_inject_offset() to use this scheme and cleanup the variable declaration while at it. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-21-554456a44a15@linutronix.de
2024-10-25timekeeping: Rework timekeeping_inject_sleeptime64() to use shadow_timekeeperAnna-Maria Behnsen
Updates of the timekeeper can be done by operating on the shadow timekeeper and afterwards copying the result into the real timekeeper. This has the advantage, that the sequence count write protected region is kept as small as possible. Convert timekeeping_inject_sleeptime64() to use this scheme. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-20-554456a44a15@linutronix.de
2024-10-25timekeeping: Rework timekeeping_init() to use shadow_timekeeperAnna-Maria Behnsen
For timekeeping_init() the sequence count write held time is not relevant and it could keep working on the real timekeeper, but there is no reason to make it different from other timekeeper updates. Convert it to operate on the shadow timekeeper. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-19-554456a44a15@linutronix.de
2024-10-25timekeeping: Rework change_clocksource() to use shadow_timekeeperAnna-Maria Behnsen
Updates of the timekeeper can be done by operating on the shadow timekeeper and afterwards copying the result into the real timekeeper. This has the advantage, that the sequence count write protected region is kept as small as possible. Convert change_clocksource() to use this scheme. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-18-554456a44a15@linutronix.de
2024-10-25timekeeping: Rework timekeeping_inject_offset() to use shadow_timekeeperAnna-Maria Behnsen
Updates of the timekeeper can be done by operating on the shadow timekeeper and afterwards copying the result into the real timekeeper. This has the advantage, that the sequence count write protected region is kept as small as possible. Convert timekeeping_inject_offset() to use this scheme. That allows to use a scoped_guard() for locking the timekeeper lock as the usage of the shadow timekeeper allows a rollback in the error case instead of the full timekeeper update of the original code. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-17-554456a44a15@linutronix.de
2024-10-25timekeeping: Rework do_settimeofday64() to use shadow_timekeeperAnna-Maria Behnsen
Updates of the timekeeper can be done by operating on the shadow timekeeper and afterwards copying the result into the real timekeeper. This has the advantage, that the sequence count write protected region is kept as small as possible. Convert do_settimeofday64() to use this scheme. That allows to use a scoped_guard() for locking the timekeeper lock as the usage of the shadow timekeeper allows a rollback in the error case instead of the full timekeeper update of the original code. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-16-554456a44a15@linutronix.de
2024-10-25timekeeping: Provide timekeeping_restore_shadow()Thomas Gleixner
Functions which operate on the real timekeeper, e.g. do_settimeofday(), have error conditions. If they are hit a full timekeeping update is still required because the already committed operations modified the timekeeper. When switching these functions to operate on the shadow timekeeper then the full update can be avoided in the error case, but the modified shadow timekeeper has to be restored. Provide a helper function for that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-15-554456a44a15@linutronix.de
2024-10-25timekeeping: Introduce combined timekeeping action flagAnna-Maria Behnsen
Instead of explicitly listing all the separate timekeeping actions flags, introduce a new one which covers all actions except TK_MIRROR action. No functional change. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-14-554456a44a15@linutronix.de
2024-10-25timekeeping: Split out timekeeper update of timekeeping_advanced()Anna-Maria Behnsen
timekeeping_advance() is the only optimized function which uses shadow_timekeeper for updating the real timekeeper to keep the sequence counter protected region as small as possible. To be able to transform timekeeper updates in other functions to use the same logic, split out functionality into a separate function timekeeper_update_staged(). While at it, document the reason why the sequence counter must be write held over the call to timekeeping_update() and the copying to the real timekeeper and why using a pointer based update is suboptimal. No functional change. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-13-554456a44a15@linutronix.de