path: root/kernel
Age  Commit message  Author
2025-03-04Merge branches 'docs.2025.02.04a', 'lazypreempt.2025.03.04a', 'misc.2025.03.04a', 'srcu.2025.02.05a' and 'torture.2025.02.05a'Boqun Feng
2025-03-04rcu: limit PREEMPT_RCU configurationsAnkur Arora
PREEMPT_LAZY can be enabled stand-alone or alongside PREEMPT_DYNAMIC which allows for dynamic switching of preemption models. The choice of PREEMPT_RCU or not, however, is fixed at compile time. Given that PREEMPT_RCU makes some trade-offs to optimize for latency as opposed to throughput, configurations with limited preemption might prefer the stronger forward-progress guarantees of PREEMPT_RCU=n. Accordingly, explicitly limit PREEMPT_RCU=y to the latency oriented preemption models: PREEMPT, PREEMPT_RT, and the runtime configurable model PREEMPT_DYNAMIC. This means the throughput oriented models, PREEMPT_NONE, PREEMPT_VOLUNTARY, and PREEMPT_LAZY will run with PREEMPT_RCU=n. Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04rcutorture: Update ->extendables check for lazy preemptionBoqun Feng
The rcutorture_one_extend_check() function's second-to-last check assumes that "preempt_count() & PREEMPT_MASK" is non-zero only if the RCUTORTURE_RDR_PREEMPT or RCUTORTURE_RDR_SCHED bit is set. This works for preemptible RCU and for non-preemptible RCU running in a non-preemptible kernel. But it fails for non-preemptible RCU running in a preemptible kernel because then rcu_read_lock() is just preempt_disable(), which increases the preempt count. This commit therefore adjusts this check to take into account the case of non-preemptible RCU running in a preemptible kernel. Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04rcutorture: Update rcutorture_one_extend_check() for lazy preemptionPaul E. McKenney
The rcutorture_one_extend_check() function's last check assumes that if cur_ops->readlock_nesting() returns greater than zero, either the RCUTORTURE_RDR_RCU_1 or the RCUTORTURE_RDR_RCU_2 bit must be set, that is, there must be at least one rcu_read_lock() in effect. This works for preemptible RCU and for non-preemptible RCU running in a non-preemptible kernel. But it fails for non-preemptible RCU running in a preemptible kernel because then RCU's cur_ops->readlock_nesting() function, which is rcu_torture_readlock_nesting(), will return the PREEMPT_MASK bits from preempt_count(). The result will be greater than zero if preemption is disabled, including by the RCUTORTURE_RDR_PREEMPT and RCUTORTURE_RDR_SCHED bits. This commit therefore adjusts this check to take into account the case of non-preemptible RCU running in a preemptible kernel. [boqun: Fix the if condition and add comment] Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202502171415.8ec87c87-lkp@intel.com Co-developed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Co-developed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04osnoise: provide quiescent statesAnkur Arora
To reduce RCU noise for nohz_full configurations, osnoise depends on cond_resched() providing quiescent states for PREEMPT_RCU=n configurations. For PREEMPT_RCU=y configurations -- where cond_resched() is a stub -- we do this by directly calling rcu_momentary_eqs(). With (PREEMPT_LAZY=y, PREEMPT_DYNAMIC=n), however, we have a configuration with (PREEMPTION=y, PREEMPT_RCU=n) where neither of the above can help. Handle that by providing an explicit quiescent state here for all configurations. As mentioned above this is not needed for non-stubbed cond_resched(), but, providing a quiescent state here just pulls in one that a future cond_resched() would provide, so doesn't cause any extra work for this configuration. Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Suggested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
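[ Editorial sketch, not the committed diff: one way the osnoise loop can report the explicit quiescent state described above. The helper name and the disable_irq flag are illustrative, and rcu_momentary_eqs() is assumed to require interrupts to be disabled around the call. ]

    #include <linux/irqflags.h>
    #include <linux/rcupdate.h>

    /* Hypothetical helper; "disable_irq" mirrors osnoise's IRQ-disable option. */
    static void osnoise_report_qs(bool disable_irq)
    {
            if (!disable_irq)
                    local_irq_disable();

            /* Report a momentary quiescent state, whatever the preemption model. */
            rcu_momentary_eqs();

            if (!disable_irq)
                    local_irq_enable();
    }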
2025-03-04rcu: Use _full() API to debug synchronize_rcu()Uladzislau Rezki (Sony)
Switch to using the get_state_synchronize_rcu_full() and poll_state_synchronize_rcu_full() pair to debug a normal synchronize_rcu() call. Using the non-full APIs to identify whether a grace period has passed might lead to a false-positive kernel splat. This can happen because get_state_synchronize_rcu() compresses both the normal and expedited states into a single unsigned long value, so poll_state_synchronize_rcu() can miss GP completion when synchronize_rcu() and synchronize_rcu_expedited() run concurrently. To address this, switch to the poll_state_synchronize_rcu_full() and get_state_synchronize_rcu_full() APIs, which use separate variables for the expedited and normal states. Reported-by: cheung wall <zzqq0103.hey@gmail.com> Closes: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/ Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency") Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20250227131613.52683-3-urezki@gmail.com Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
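[ Editorial sketch of the full-state polling pattern relied upon above; rgos and demo_full_poll() are illustrative names, not part of the patch. ]

    #include <linux/bug.h>
    #include <linux/rcupdate.h>

    static void demo_full_poll(void)
    {
            struct rcu_gp_oldstate rgos;

            /* Snapshot both the normal and the expedited grace-period state. */
            get_state_synchronize_rcu_full(&rgos);

            synchronize_rcu();

            /* With the _full() APIs this cannot fire, give or take counter wrap. */
            WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&rgos));
    }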
2025-03-04rcutorture: Allow a negative value for nfakewritersUladzislau Rezki (Sony)
Currently "nfakewriters" parameter can be set to any value but there is no possibility to adjust it automatically based on how many CPUs a system has where a test is run on. To address this, if the "nfakewriters" is set to negative it will be adjusted to num_online_cpus() during torture initialization. Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lore.kernel.org/r/20250227131613.52683-1-urezki@gmail.com Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04Flush console log from kernel_power_off()Paul E. McKenney
Kernels built with CONFIG_PREEMPT_RT=y can lose significant console output at shutdown time, which hides shutdown-time RCU issues from rcutorture. Therefore, make pr_flush() public and invoke it after the last print in kernel_power_off(). [ paulmck: Apply John Ogness feedback. ] [ paulmck: Apply Sebastian Andrzej Siewior feedback. ] [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Link: https://lore.kernel.org/r/5f743488-dc2a-4f19-bdda-cf50b9314832@paulmck-laptop Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04context_tracking: Make RCU watch ct_kernel_exit_state() warningPaul E. McKenney
The WARN_ON_ONCE() in ct_kernel_exit_state() follows the call to ct_state_inc(), which means that RCU is not watching this WARN_ON_ONCE(). This can (and does) result in extraneous lockdep warnings when this WARN_ON_ONCE() triggers. These extraneous warnings are the opposite of helpful. Therefore, invert the WARN_ON_ONCE() condition and move it before the call to ct_state_inc(). This does mean that the ct_state_inc() return value can no longer be used in the WARN_ON_ONCE() condition, so discard this return value and instead use a call to rcu_is_watching_curr_cpu(). This call is executed only in CONFIG_RCU_EQS_DEBUG=y kernels, so there is no added overhead in production use. [Boqun: Add the subsystem tag in the title] Reported-by: Breno Leitao <leitao@debian.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/bd911cd9-1fe9-447c-85e0-ea811a1dc896@paulmck-laptop Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
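[ Editorial sketch of the reordered check described above; the offset argument and the remainder of ct_kernel_exit_state() are assumed/omitted. ]

    static noinstr void ct_kernel_exit_state(int offset)
    {
            /*
             * Complain while RCU is still watching this CPU, i.e. before the
             * context-tracking counter is advanced.  Limited to
             * CONFIG_RCU_EQS_DEBUG=y kernels, so production builds pay nothing.
             */
            WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
                         !rcu_is_watching_curr_cpu());

            /* The return value is no longer needed, so discard it. */
            ct_state_inc(offset);
    }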
2025-03-04rcu/nocb: Print segment lengths in show_rcu_nocb_gp_state()Paul E. McKenney
Analysis of an rcutorture callback-based forward-progress test failure was hampered by the lack of ->cblist segment lengths. This commit therefore adds this information, so that what would have been ".W85620.N." (there are some callbacks waiting for grace period sequence number 85620 and some more that have not yet been assigned to a grace period) now prints as ".W2(85620).N6." (there are 2 callbacks waiting for grace period 85620 and 6 not yet assigned to a grace period). Note that "D" (done), "N" (next and not yet assigned to a grace period), and "B" (bypass, also not yet assigned to a grace period) have just the number of callbacks without the parenthesized grace-period sequence number. In contrast, "W" (waiting for the current grace period) and "R" (ready to wait for the next grace period to start) both have parenthesized grace-period sequence numbers. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04rcu-tasks: Move RCU Tasks self-tests to core_initcall()Paul E. McKenney
The timer and hrtimer softirq processing has moved to dedicated threads for kernels built with CONFIG_IRQ_FORCED_THREADING=y. This results in timers not expiring until later in early boot, which in turn causes the RCU Tasks self-tests to hang in kernels built with CONFIG_PROVE_RCU=y, which further causes the entire kernel to hang. One fix would be to make timers work during this time, but there are no known users of RCU Tasks grace periods during that time, so no justification for the added complexity. Not yet, anyway. This commit therefore moves the call to rcu_init_tasks_generic() from kernel_init_freeable() to a core_initcall(). This works because the timer and hrtimer kthreads are created at early_initcall() time. Fixes: 49a17639508c3 ("softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: <linux-trace-kernel@vger.kernel.org> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
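[ Editorial sketch of the initcall move described above; assumes rcu_init_tasks_generic() is adapted to the int-returning initcall signature. ]

    #include <linux/init.h>

    /* core_initcall() runs after early_initcall(), so the timer kthreads exist. */
    static int __init rcu_init_tasks_generic(void)
    {
            /* ... spawn the RCU Tasks kthreads and kick off the self-tests ... */
            return 0;
    }
    core_initcall(rcu_init_tasks_generic);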
2025-03-04rcu: Fix get_state_synchronize_rcu_full() GP-start detectionPaul E. McKenney
The get_state_synchronize_rcu_full() and poll_state_synchronize_rcu_full() functions use the root rcu_node structure's ->gp_seq field to detect the beginnings and ends of grace periods, respectively. This choice is necessary for the poll_state_synchronize_rcu_full() function because (give or take counter wrap), the following sequence is guaranteed not to trigger: get_state_synchronize_rcu_full(&rgos); synchronize_rcu(); WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&rgos)); The RCU callbacks that awaken synchronize_rcu() instances are guaranteed not to be invoked before the root rcu_node structure's ->gp_seq field is updated to indicate the end of the grace period. However, these callbacks might start being invoked immediately thereafter, in particular, before rcu_state.gp_seq has been updated. Therefore, poll_state_synchronize_rcu_full() must refer to the root rcu_node structure's ->gp_seq field. Because this field is updated under this structure's ->lock, any code following a call to poll_state_synchronize_rcu_full() will be fully ordered after the full grace-period computation, as is required by RCU's memory-ordering semantics. By symmetry, the get_state_synchronize_rcu_full() function should also use this same root rcu_node structure's ->gp_seq field. But it turns out that symmetry is profoundly (though extremely infrequently) destructive in this case. To see this, consider the following sequence of events: 1. CPU 0 starts a new grace period, and updates rcu_state.gp_seq accordingly. 2. As its first step of grace-period initialization, CPU 0 examines the current CPU hotplug state and decides that it need not wait for CPU 1, which is currently offline. 3. CPU 1 comes online, and updates its state. But this does not affect the current grace period, but rather the one after that. After all, CPU 1 was offline when the current grace period started, so all pre-existing RCU readers on CPU 1 must have completed or been preempted before it last went offline. The current grace period therefore has nothing it needs to wait for on CPU 1. 4. CPU 1 switches to an rcutorture kthread which is running rcutorture's rcu_torture_reader() function, which starts a new RCU reader. 5. CPU 2 is running rcutorture's rcu_torture_writer() function and collects a new polled grace-period "cookie" using get_state_synchronize_rcu_full(). Because the newly started grace period has not completed initialization, the root rcu_node structure's ->gp_seq field has not yet been updated to indicate that this new grace period has already started. This cookie is therefore set up for the end of the current grace period (rather than the end of the following grace period). 6. CPU 0 finishes grace-period initialization. 7. If CPU 1’s rcutorture reader is preempted, it will be added to the ->blkd_tasks list, but because CPU 1’s ->qsmask bit is not set in CPU 1's leaf rcu_node structure, the ->gp_tasks pointer will not be updated.  Thus, this grace period will not wait on it.  Which is only fair, given that the CPU did not come online until after the grace period officially started. 8. CPUs 0 and 2 then detect the new grace period and then report a quiescent state to the RCU core. 9. Because CPU 1 was offline at the start of the current grace period, CPUs 0 and 2 are the only CPUs that this grace period needs to wait on. So the grace period ends and post-grace-period cleanup starts. In particular, the root rcu_node structure's ->gp_seq field is updated to indicate that this grace period has now ended. 10. 
CPU 2 continues running rcu_torture_writer() and sees that, from the viewpoint of the root rcu_node structure consulted by the poll_state_synchronize_rcu_full() function, the grace period has ended.  It therefore updates state accordingly. 11. CPU 1 is still running the same RCU reader, which notices this update and thus complains about the too-short grace period. The fix is for the get_state_synchronize_rcu_full() function to use rcu_state.gp_seq instead of the root rcu_node structure's ->gp_seq field. With this change in place, if step 5's cookie indicates that the grace period has not yet started, then any prior code executed by CPU 2 must have happened before CPU 1 came online. This will in turn prevent CPU 1's code in steps 3 and 11 from spanning CPU 2's grace-period wait, thus preventing CPU 1 from being subjected to a too-short grace period. This commit therefore makes this change. Note that there is no change to the poll_state_synchronize_rcu_full() function, which as noted above, must continue to use the root rcu_node structure's ->gp_seq field. This is of course an asymmetry between these two functions, but is an asymmetry that is absolutely required for correct operation. It is a common human tendency to greatly value symmetry, and sometimes symmetry is a wonderful thing. Other times, symmetry results in poor performance. But in this case, symmetry is just plain wrong. Nevertheless, the asymmetry does require an additional adjustment. It is possible for get_state_synchronize_rcu_full() to see a given grace period as having started, but for an immediately following poll_state_synchronize_rcu_full() to see it as having not yet started. Given the current rcu_seq_done_exact() implementation, this will result in a false-positive indication that the grace period is done from poll_state_synchronize_rcu_full(). This is dealt with by making rcu_seq_done_exact() reach back three grace periods rather than just two of them. However, simply changing get_state_synchronize_rcu_full() function to use rcu_state.gp_seq instead of the root rcu_node structure's ->gp_seq field results in a theoretical bug in kernels booted with rcutree.rcu_normal_wake_from_gp=1 due to the following sequence of events: o The rcu_gp_init() function invokes rcu_seq_start() to officially start a new grace period. o A new RCU reader begins, referencing X from some RCU-protected list. The new grace period is not obligated to wait for this reader. o An updater removes X, then calls synchronize_rcu(), which queues a wait element. o The grace period ends, awakening the updater, which frees X while the reader is still referencing it. The reason that this is theoretical is that although the grace period has officially started, none of the CPUs are officially aware of this, and thus will have to assume that the RCU reader pre-dated the start of the grace period. Detailed explanation can be found at [2] and [3]. Except for kernels built with CONFIG_PROVE_RCU=y, which use the polled grace-period APIs, which can and do complain bitterly when this sequence of events occurs. Not only that, there might be some future RCU grace-period mechanism that pulls this sequence of events from theory into practice. This commit therefore also pulls the call to rcu_sr_normal_gp_init() to precede that to rcu_seq_start(). Although this fixes commit 91a967fd6934 ("rcu: Add full-sized polling for get_completed*() and poll_state*()"), it is not clear that it is worth backporting this commit. 
First, it took me many weeks to convince rcutorture to reproduce this more frequently than once per year. Second, this cannot be reproduced at all without frequent CPU-hotplug operations, as in waiting all of 50 milliseconds from the end of the previous operation until starting the next one. Third, the TREE03.boot settings cause multi-millisecond delays during RCU grace-period initialization, which greatly increase the probability of the above sequence of events. (Don't do this in production workloads!) Fourth, the TREE03 rcutorture scenario was modified to use four-CPU guest OSes, to have a single-rcu_node combining tree, no testing of RCU priority boosting, and no random preemption, and these modifications were necessary to reproduce this issue in a reasonable timeframe. Fifth, extremely heavy use of get_state_synchronize_rcu_full() and/or poll_state_synchronize_rcu_full() is required to reproduce this, and as of v6.12, only kfree_rcu() uses it, and even then not particularly heavily. [boqun: Apply the fix [1], and add the comment before the moved rcu_sr_normal_gp_init(). Additional links are added for explanation.] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lore.kernel.org/rcu/d90bd6d9-d15c-4b9b-8a69-95336e74e8f4@paulmck-laptop/ [1] Link: https://lore.kernel.org/rcu/20250303001507.GA3994772@joelnvbox/ [2] Link: https://lore.kernel.org/rcu/Z8bcUsZ9IpRi1QoP@pc636/ [3] Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04x86/smp: Move cpu number to percpu hot sectionBrian Gerst
No functional change. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Uros Bizjak <ubizjak@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250303165246.2175811-5-brgerst@gmail.com
2025-03-04Merge branch 'x86/asm' into x86/core, to pick up dependent commitsIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-04sched_ext: Add trace point to track sched_ext core eventsChangwoo Min
Add tracing support to track sched_ext core events (/sched_ext/sched_ext_event). This may be useful for debugging sched_ext schedulers that trigger a particular event. The trace point can be used like any other trace point, for example from `perf trace` or from BPF programs, as follows: ====== $> sudo perf trace -e sched_ext:sched_ext_event --filter 'name == "SCX_EV_ENQ_SLICE_DFL"' ====== ====== struct tp_sched_ext_event { struct trace_entry ent; u32 __data_loc_name; s64 delta; }; SEC("tracepoint/sched_ext/sched_ext_event") int rtp_add_event(struct tp_sched_ext_event *ctx) { char event_name[128]; unsigned short offset = ctx->__data_loc_name & 0xFFFF; bpf_probe_read_str((void *)event_name, 128, (char *)ctx + offset); bpf_printk("name %s delta %lld", event_name, ctx->delta); return 0; } ====== Signed-off-by: Changwoo Min <changwoo@igalia.com> Acked-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-04sched_ext: Change the event type from u64 to s64Changwoo Min
The event count could be negative in the future, so change the event type from u64 to s64. Signed-off-by: Changwoo Min <changwoo@igalia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-04rv: Add license identifiers to monitor filesGabriele Monaco
Some monitor files like the main header and the Kconfig are missing the license identifier. Add it to those and make sure the automatic generation script includes the line in newly created monitors. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/20250218123121.253551-3-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-04ftrace: Add arguments to function tracerSven Schnelle
Wire up the code to print function arguments in the function tracer. This functionality can be enabled/disabled during runtime with options/func-args. ping-689 [004] b.... 77.170220: dummy_xmit(skb = 0x82904800, dev = 0x882d0000) <-dev_hard_start_xmit Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Guo Ren <guoren@kernel.org> Cc: Donglin Peng <dolinux.peng@gmail.com> Cc: Zheng Yejian <zhengyejian@huaweicloud.com> Link: https://lore.kernel.org/20250227185823.154996172@goodmis.org Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Co-developed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-04ftrace: Have funcgraph-args take effect during tracingSteven Rostedt
Currently, when function_graph is started, it looks at the option funcgraph-args, and if it is set, it will enable tracing of the arguments. But if tracing is already running, and the user enables funcgraph-args, it will have no effect. Instead, enabling funcgraph-args should take effect immediately, even if that means disabling function graph tracing for a short time in order to do the transition. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Guo Ren <guoren@kernel.org> Cc: Donglin Peng <dolinux.peng@gmail.com> Cc: Zheng Yejian <zhengyejian@huaweicloud.com> Link: https://lore.kernel.org/20250227185822.978998710@goodmis.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-04ftrace: Add support for function argument to graph tracerSven Schnelle
Wire up the code to print function arguments in the function graph tracer. This functionality can be enabled/disabled during runtime with options/funcgraph-args. Example usage: 6) | dummy_xmit [dummy](skb = 0x8887c100, dev = 0x872ca000) { 6) | consume_skb(skb = 0x8887c100) { 6) | skb_release_head_state(skb = 0x8887c100) { 6) 0.178 us | sock_wfree(skb = 0x8887c100) 6) 0.627 us | } Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Guo Ren <guoren@kernel.org> Cc: Donglin Peng <dolinux.peng@gmail.com> Cc: Zheng Yejian <zhengyejian@huaweicloud.com> Link: https://lore.kernel.org/20250227185822.810321199@goodmis.org Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Co-developed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-04ftrace: Add print_function_args()Sven Schnelle
Add a function to decode argument types with the help of BTF. It will be used to display arguments in the function and function graph tracers. It can only handle simple arguments and up to FTRACE_REGS_MAX_ARGS arguments. When it hits that maximum, it will print ", ...": page_to_skb(vi=0xffff8d53842dc980, rq=0xffff8d53843a0800, page=0xfffffc2e04337c00, offset=6160, len=64, truesize=1536, ...) And if it hits an argument that is not recognized, it will print the raw value and the type of argument it is: make_vfsuid(idmap=0xffffffff87f99db8, fs_userns=0xffffffff87e543c0, kuid=0x0 (STRUCT)) __pti_set_user_pgtbl(pgdp=0xffff8d5384ab47f8, pgd=0x110e74067 (STRUCT)) Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Guo Ren <guoren@kernel.org> Cc: Donglin Peng <dolinux.peng@gmail.com> Cc: Zheng Yejian <zhengyejian@huaweicloud.com> Link: https://lore.kernel.org/20250227185822.639418500@goodmis.org Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Co-developed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-04ftrace: Have ftrace_free_filter() WARN and exit if ops is activeSteven Rostedt
ftrace_free_filter() is used to reset the ops filters, but this must only be done when the ops is not currently active (tracing). If it is active, freeing the filters will mess up the ftrace accounting of which functions are attached and which are not. WARN and exit ftrace_free_filter() if the ops is active when it is called. Currently, nothing appears to do this, but something may in the future. Link: https://lore.kernel.org/all/20250219095330.2e9f171c@gandalf.local.home/ Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250219135040.3a9fbe00@gandalf.local.home Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
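[ Editorial sketch of the added guard; FTRACE_OPS_FL_ENABLED is assumed to be the "currently tracing" indicator checked here, and the body of the function is abbreviated. ]

    #include <linux/ftrace.h>

    void ftrace_free_filter(struct ftrace_ops *ops)
    {
            /* Freeing the filters of an active ops corrupts ftrace accounting. */
            if (WARN_ON_ONCE(ops->flags & FTRACE_OPS_FL_ENABLED))
                    return;

            /* ... free the filter and notrace hashes as before ... */
    }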
2025-03-04fgraph: Correct typo in ftrace_return_to_handler commentHaiyue Wang
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250218122052.58348-1-haiyuewa@163.com Signed-off-by: Haiyue Wang <haiyuewa@163.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-04livepatch: Add comment to clarify klp_add_nops()Yafang Shao
Add detailed comments to clarify the purpose of klp_add_nops() function. These comments are based on Petr's explanation[0]. Link: https://lore.kernel.org/all/Z6XUA7D0eU_YDMVp@pathway.suse.cz/ [0] Suggested-by: Petr Mladek <pmladek@suse.com> Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20250227024733.16989-2-laoar.shao@gmail.com Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-03-04Merge branch 'x86/cpu' into x86/asm, to pick up dependent commitsIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-04Merge branch 'x86/urgent' into x86/cpu, to pick up dependent commitsIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-04perf/core: Fix perf_mmap() failure pathPeter Zijlstra
When f_ops->mmap() returns failure, m_ops->close() is *not* called. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20241104135519.248358497@infradead.org
2025-03-04perf/core: Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimesPeter Zijlstra
In preparation for being able to unregister a PMU with existing events, it becomes important to detach the lifetime of struct perf_cpu_pmu_context from that of struct pmu. Notably, struct perf_cpu_pmu_context embeds a struct perf_event_pmu_context that can stay referenced until the last event goes. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20241104135518.760214287@infradead.org
2025-03-04perf/core: Lift event->mmap_mutex in perf_mmap()Peter Zijlstra
This puts 'all' of perf_mmap() under a single event->mmap_mutex critical section. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20241104135519.582252957@infradead.org
2025-03-04perf/core: Remove retry loop from perf_mmap()Peter Zijlstra
AFAICT there is no actual benefit from the mutex drop on re-try. The 'worst' case scenario is that we instantly re-gain the mutex without perf_mmap_close() getting it. So might as well make that the normal case. Reflow the code to make the ring buffer detach case naturally flow into the no ring buffer case. [ mingo: Forward ported it ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20241104135519.463607258@infradead.org
2025-03-04perf/core: Further simplify perf_mmap()Peter Zijlstra
Perform CSE and such. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20241104135519.354909594@infradead.org
2025-03-04perf/core: Simplify the perf_mmap() control flowPeter Zijlstra
Identity-transform: if (c) { X1; } else { Y; goto l; } X2; l: into the simpler: if (c) { X1; X2; } else { Y; } [ mingo: Forward ported it ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20241104135519.095904637@infradead.org
2025-03-04perf/bpf: Robustify perf_event_free_bpf_prog()Peter Zijlstra
Ensure perf_event_free_bpf_prog() is safe to call a second time; notably without making any references to event->pmu when there is no prog left. Note: perf_event_detach_bpf_prog() might leave a stale event->prog Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20241104135518.978956692@infradead.org
2025-03-04perf/core: Introduce perf_free_addr_filters()Peter Zijlstra
Replace _free_event()'s use of perf_addr_filters_splice() with an explicit perf_free_addr_filters() that has the explicit property of being safe to call a second time without ill effect. Most notably, referencing event->pmu must be avoided when there are no filters left (e.g. from a previous call). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20241104135518.868460518@infradead.org
2025-03-04perf/core: Add this_cpc() helperPeter Zijlstra
As a preparation for adding yet another indirection. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20241104135518.650051565@infradead.org
2025-03-04perf/core: Merge struct pmu::pmu_disable_count into struct perf_cpu_pmu_context::pmu_disable_countPeter Zijlstra
Because it makes no sense to have two per-cpu allocations per pmu. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20241104135518.518730578@infradead.org
2025-03-04perf/core: Simplify perf_event_alloc()Peter Zijlstra
Using the previous simplifications, transition perf_event_alloc() to the cleanup way of things -- reducing error path magic. [ mingo: Ported it to recent kernels. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20241104135518.410755241@infradead.org
2025-03-04perf/core: Simplify perf_init_event()Peter Zijlstra
Use the <linux/cleanup.h> guard() and scoped_guard() infrastructure to simplify the control flow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241104135518.302444446@infradead.org
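[ Editorial illustration of the <linux/cleanup.h> helpers named above; demo_lock, demo_ready and demo_lookup() are made-up names, not part of the patch. ]

    #include <linux/cleanup.h>
    #include <linux/errno.h>
    #include <linux/mutex.h>

    static DEFINE_MUTEX(demo_lock);
    static bool demo_ready;

    static int demo_lookup(void)
    {
            /* The mutex is dropped automatically on every return path. */
            guard(mutex)(&demo_lock);

            if (!demo_ready)
                    return -EAGAIN;

            return 0;
    }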
2025-03-04perf/core: Simplify perf_pmu_register()Peter Zijlstra
Using the previously introduced perf_pmu_free() and a new IDR helper, simplify the perf_pmu_register error paths. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20241104135518.198937277@infradead.org
2025-03-04perf/core: Simplify the perf_pmu_register() error pathPeter Zijlstra
The error path of perf_pmu_register() is of course very similar to a subset of perf_pmu_unregister(). Extract this common part in perf_pmu_free() and simplify things. [ mingo: Forward ported it ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20241104135518.090915501@infradead.org
2025-03-04perf/core: Simplify the perf_event_alloc() error pathPeter Zijlstra
The error cleanup sequence in perf_event_alloc() is a subset of the existing _free_event() function (it must of course be). Split this out into __free_event() and simplify the error path. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20241104135517.967889521@infradead.org
2025-03-04perf/hw_breakpoint: Return EOPNOTSUPP for unsupported breakpoint typeSaket Kumar Bhaskar
Currently, __reserve_bp_slot() returns -ENOSPC for unsupported breakpoint types on the architecture. For example, powerpc does not support hardware instruction breakpoints. This causes the perf_skip BPF selftest to fail, as neither ENOENT nor EOPNOTSUPP is returned by perf_event_open for unsupported breakpoint types. As a result, the test that should be skipped for this arch is not correctly identified. To resolve this, hw_breakpoint_event_init() should exit early by checking for unsupported breakpoint types using hw_breakpoint_slots_cached() and return the appropriate error (-EOPNOTSUPP). Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: https://lore.kernel.org/r/20250303092451.1862862-1-skb99@linux.ibm.com
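[ Editorial sketch of the early check; hw_breakpoint_slots_cached() is the helper named above, while find_slot_idx() is an assumed internal helper and the rest of hw_breakpoint_event_init() is omitted. ]

    static int hw_breakpoint_event_init(struct perf_event *bp)
    {
            /* ... existing event-type checks ... */

            /*
             * No slots at all for this breakpoint type on this architecture:
             * report "not supported" rather than "no space".
             */
            if (!hw_breakpoint_slots_cached(find_slot_idx(bp->attr.bp_type)))
                    return -EOPNOTSUPP;

            /* ... continue with the normal initialization ... */
            return 0;
    }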
2025-03-03genirq/msi: Expose MSI message data in debugfsHans Zhang
When debugging MSI-related hardware issues (e.g. interrupt delivery failures), developers currently need to either: 1. Recompile the kernel with dynamic debug for tracing msi_desc. 2. Manually read device registers through low-level tools. Both approaches become challenging in production environments where dynamic debugging is often disabled. The interrupt core provides a debugfs interface for inspection of interrupt related data, which contains the per interrupt information in the view of the hierarchical interrupt domains. Though this interface does not expose the MSI address/data pair, which is important information to: - Verify whether the MSI configuration matches the hardware expectations - Diagnose interrupt routing errors (e.g., mismatched destination ID) - Validate remapping behavior in virtualized environments Implement the debug_show() callback for the generic MSI interrupt domains, and use it to expose the MSI address/data pair in the per interrupt diagnostics. Sample output: address_hi: 0x00000000 address_lo: 0xfe670040 msg_data: 0x00000001 [ tglx: Massaged change log. Use irq_data_get_msi_desc() to avoid pointless lookup. ] Signed-off-by: Hans Zhang <18255117159@163.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250303121008.309265-1-18255117159@163.com
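[ Editorial sketch of such a debug_show() callback; the signature follows struct irq_domain_ops and the fields follow struct msi_msg, but the exact in-tree implementation may differ. ]

    #include <linux/irqdomain.h>
    #include <linux/msi.h>
    #include <linux/seq_file.h>

    static void msi_domain_debug_show(struct seq_file *m, struct irq_domain *d,
                                      struct irq_data *irqd, int ind)
    {
            struct msi_desc *desc = irq_data_get_msi_desc(irqd);

            if (!desc)
                    return;

            seq_printf(m, "%*saddress_hi: 0x%08x\n", ind, "", desc->msg.address_hi);
            seq_printf(m, "%*saddress_lo: 0x%08x\n", ind, "", desc->msg.address_lo);
            seq_printf(m, "%*smsg_data:   0x%08x\n", ind, "", desc->msg.data);
    }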
2025-03-03Merge tag 'v6.14-rc5' into x86/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-03sched_ext: Merge branch 'for-6.14-fixes' into for-6.15Tejun Heo
Pull for-6.14-fixes to receive: 9360dfe4cbd6 ("sched_ext: Validate prev_cpu in scx_bpf_select_cpu_dfl()") which conflicts with: 337d1b354a29 ("sched_ext: Move built-in idle CPU selection policy to a separate file") Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-03sched_ext: Validate prev_cpu in scx_bpf_select_cpu_dfl()Andrea Righi
If a BPF scheduler provides an invalid CPU (outside the nr_cpu_ids range) as prev_cpu to scx_bpf_select_cpu_dfl() it can cause a kernel crash. To prevent this, validate prev_cpu in scx_bpf_select_cpu_dfl() and trigger an scx error if an invalid CPU is specified. Fixes: f0e1a0643a59b ("sched_ext: Implement BPF extensible scheduler class") Cc: stable@vger.kernel.org # v6.12+ Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
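[ Editorial sketch of the validation; check_prev_cpu() is a made-up helper and scx_ops_error() is assumed to be the in-tree error-reporting macro. ]

    /* Reject a prev_cpu that a buggy BPF scheduler passed out of range. */
    static bool check_prev_cpu(s32 prev_cpu)
    {
            if (unlikely((u32)prev_cpu >= nr_cpu_ids)) {
                    scx_ops_error("invalid prev_cpu %d", prev_cpu);
                    return false;
            }
            return true;
    }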
2025-03-03Merge tag 'probes-fixes-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-traceLinus Torvalds
Pull probe events fixes from Masami Hiramatsu: - probe-events: Remove unused MAX_ARG_BUF_LEN macro - it is not used - fprobe-events: Log error for exceeding the number of entry args. Since the max number of entry args is limited, it should be checked and rejected when the parser detects it. - tprobe-events: Reject invalid tracepoint name If a user specifies an invalid tracepoint name (e.g. including '/') then the new event is not defined correctly in the eventfs. - tprobe-events: Fix a memory leak when tprobe defined with $retval There is a memory leak if tprobe is defined with $retval. * tag 'probes-fixes-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: probe-events: Remove unused MAX_ARG_BUF_LEN macro tracing: fprobe-events: Log error for exceeding the number of entry args tracing: tprobe-events: Reject invalid tracepoint name tracing: tprobe-events: Fix a memory leak when tprobe with $retval
2025-03-03PM: hibernate: Avoid deadlock in hibernate_compressor_param_set()Lizhi Xu
syzbot reported a deadlock in lock_system_sleep() (see below). The write operation to "/sys/module/hibernate/parameters/compressor" conflicts with the registration of ieee80211 device, resulting in a deadlock when attempting to acquire system_transition_mutex under param_lock. To avoid this deadlock, change hibernate_compressor_param_set() to use mutex_trylock() for attempting to acquire system_transition_mutex and return -EBUSY when it fails. Task flags need not be saved or adjusted before calling mutex_trylock(&system_transition_mutex) because the caller is not going to end up waiting for this mutex and if it runs concurrently with system suspend in progress, it will be frozen properly when it returns to user space. syzbot report: syz-executor895/5833 is trying to acquire lock: ffffffff8e0828c8 (system_transition_mutex){+.+.}-{4:4}, at: lock_system_sleep+0x87/0xa0 kernel/power/main.c:56 but task is already holding lock: ffffffff8e07dc68 (param_lock){+.+.}-{4:4}, at: kernel_param_lock kernel/params.c:607 [inline] ffffffff8e07dc68 (param_lock){+.+.}-{4:4}, at: param_attr_store+0xe6/0x300 kernel/params.c:586 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (param_lock){+.+.}-{4:4}: __mutex_lock_common kernel/locking/mutex.c:585 [inline] __mutex_lock+0x19b/0xb10 kernel/locking/mutex.c:730 ieee80211_rate_control_ops_get net/mac80211/rate.c:220 [inline] rate_control_alloc net/mac80211/rate.c:266 [inline] ieee80211_init_rate_ctrl_alg+0x18d/0x6b0 net/mac80211/rate.c:1015 ieee80211_register_hw+0x20cd/0x4060 net/mac80211/main.c:1531 mac80211_hwsim_new_radio+0x304e/0x54e0 drivers/net/wireless/virtual/mac80211_hwsim.c:5558 init_mac80211_hwsim+0x432/0x8c0 drivers/net/wireless/virtual/mac80211_hwsim.c:6910 do_one_initcall+0x128/0x700 init/main.c:1257 do_initcall_level init/main.c:1319 [inline] do_initcalls init/main.c:1335 [inline] do_basic_setup init/main.c:1354 [inline] kernel_init_freeable+0x5c7/0x900 init/main.c:1568 kernel_init+0x1c/0x2b0 init/main.c:1457 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:148 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 -> #2 (rtnl_mutex){+.+.}-{4:4}: __mutex_lock_common kernel/locking/mutex.c:585 [inline] __mutex_lock+0x19b/0xb10 kernel/locking/mutex.c:730 wg_pm_notification drivers/net/wireguard/device.c:80 [inline] wg_pm_notification+0x49/0x180 drivers/net/wireguard/device.c:64 notifier_call_chain+0xb7/0x410 kernel/notifier.c:85 notifier_call_chain_robust kernel/notifier.c:120 [inline] blocking_notifier_call_chain_robust kernel/notifier.c:345 [inline] blocking_notifier_call_chain_robust+0xc9/0x170 kernel/notifier.c:333 pm_notifier_call_chain_robust+0x27/0x60 kernel/power/main.c:102 snapshot_open+0x189/0x2b0 kernel/power/user.c:77 misc_open+0x35a/0x420 drivers/char/misc.c:179 chrdev_open+0x237/0x6a0 fs/char_dev.c:414 do_dentry_open+0x735/0x1c40 fs/open.c:956 vfs_open+0x82/0x3f0 fs/open.c:1086 do_open fs/namei.c:3830 [inline] path_openat+0x1e88/0x2d80 fs/namei.c:3989 do_filp_open+0x20c/0x470 fs/namei.c:4016 do_sys_openat2+0x17a/0x1e0 fs/open.c:1428 do_sys_open fs/open.c:1443 [inline] __do_sys_openat fs/open.c:1459 [inline] __se_sys_openat fs/open.c:1454 [inline] __x64_sys_openat+0x175/0x210 fs/open.c:1454 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #1 ((pm_chain_head).rwsem){++++}-{4:4}: down_read+0x9a/0x330 kernel/locking/rwsem.c:1524 blocking_notifier_call_chain_robust kernel/notifier.c:344 
[inline] blocking_notifier_call_chain_robust+0xa9/0x170 kernel/notifier.c:333 pm_notifier_call_chain_robust+0x27/0x60 kernel/power/main.c:102 snapshot_open+0x189/0x2b0 kernel/power/user.c:77 misc_open+0x35a/0x420 drivers/char/misc.c:179 chrdev_open+0x237/0x6a0 fs/char_dev.c:414 do_dentry_open+0x735/0x1c40 fs/open.c:956 vfs_open+0x82/0x3f0 fs/open.c:1086 do_open fs/namei.c:3830 [inline] path_openat+0x1e88/0x2d80 fs/namei.c:3989 do_filp_open+0x20c/0x470 fs/namei.c:4016 do_sys_openat2+0x17a/0x1e0 fs/open.c:1428 do_sys_open fs/open.c:1443 [inline] __do_sys_openat fs/open.c:1459 [inline] __se_sys_openat fs/open.c:1454 [inline] __x64_sys_openat+0x175/0x210 fs/open.c:1454 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (system_transition_mutex){+.+.}-{4:4}: check_prev_add kernel/locking/lockdep.c:3163 [inline] check_prevs_add kernel/locking/lockdep.c:3282 [inline] validate_chain kernel/locking/lockdep.c:3906 [inline] __lock_acquire+0x249e/0x3c40 kernel/locking/lockdep.c:5228 lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5851 __mutex_lock_common kernel/locking/mutex.c:585 [inline] __mutex_lock+0x19b/0xb10 kernel/locking/mutex.c:730 lock_system_sleep+0x87/0xa0 kernel/power/main.c:56 hibernate_compressor_param_set+0x1c/0x210 kernel/power/hibernate.c:1452 param_attr_store+0x18f/0x300 kernel/params.c:588 module_attr_store+0x55/0x80 kernel/params.c:924 sysfs_kf_write+0x117/0x170 fs/sysfs/file.c:139 kernfs_fop_write_iter+0x33d/0x500 fs/kernfs/file.c:334 new_sync_write fs/read_write.c:586 [inline] vfs_write+0x5ae/0x1150 fs/read_write.c:679 ksys_write+0x12b/0x250 fs/read_write.c:731 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Chain exists of: system_transition_mutex --> rtnl_mutex --> param_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(param_lock); lock(rtnl_mutex); lock(param_lock); lock(system_transition_mutex); *** DEADLOCK *** Reported-by: syzbot+ace60642828c074eb913@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ace60642828c074eb913 Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Link: https://patch.msgid.link/20250224013139.3994500-1-lizhi.xu@windriver.com [ rjw: New subject matching the code changes, changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
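[ Editorial sketch of the trylock-based setter described above; everything other than the mutex_trylock()/-EBUSY handling is abbreviated or assumed. ]

    static int hibernate_compressor_param_set(const char *compressor,
                                              const struct kernel_param *kp)
    {
            int ret;

            /* Never block on system_transition_mutex while holding param_lock. */
            if (!mutex_trylock(&system_transition_mutex))
                    return -EBUSY;

            ret = param_set_copystring(compressor, kp);
            /* ... switch the hibernation compression algorithm ... */

            mutex_unlock(&system_transition_mutex);
            return ret;
    }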
2025-03-03tracing: probe-events: Remove unused MAX_ARG_BUF_LEN macroMasami Hiramatsu (Google)
Commit 18b1e870a496 ("tracing/probes: Add $arg* meta argument for all function args") introduced MAX_ARG_BUF_LEN but it is not used. Remove it. Link: https://lore.kernel.org/all/174055075876.4079315.8805416872155957588.stgit@mhiramat.tok.corp.google.com/ Fixes: 18b1e870a496 ("tracing/probes: Add $arg* meta argument for all function args") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-01Merge branch 'perf/urgent' into perf/core, to pick up dependent patches and fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>