diff options
author | Andrea Righi <arighi@nvidia.com> | 2025-08-05 10:59:11 +0200 |
---|---|---|
committer | Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 2025-08-28 16:34:31 +0200 |
commit | 6a32cbe95029ebe21cc08349fd7ef2a3d32d2043 (patch) | |
tree | ee7647872336de46f6e9bbd430e832d3ba0b60e4 | |
parent | 61009439e4bd8d74e705ee15940760321be91d8a (diff) |
sched/ext: Fix invalid task state transitions on class switch
commit ddf7233fcab6c247379d0928d46cc316ee122229 upstream.
When enabling a sched_ext scheduler, we may trigger invalid task state
transitions, resulting in warnings like the following (which can be
easily reproduced by running the hotplug selftest in a loop):
sched_ext: Invalid task state transition 0 -> 3 for fish[770]
WARNING: CPU: 18 PID: 787 at kernel/sched/ext.c:3862 scx_set_task_state+0x7c/0xc0
...
RIP: 0010:scx_set_task_state+0x7c/0xc0
...
Call Trace:
<TASK>
scx_enable_task+0x11f/0x2e0
switching_to_scx+0x24/0x110
scx_enable.isra.0+0xd14/0x13d0
bpf_struct_ops_link_create+0x136/0x1a0
__sys_bpf+0x1edd/0x2c30
__x64_sys_bpf+0x21/0x30
do_syscall_64+0xbb/0x370
entry_SYSCALL_64_after_hwframe+0x77/0x7f
This happens because we skip initialization for tasks that are already
dead (with their usage counter set to zero), but we don't exclude them
during the scheduling class transition phase.
Fix this by also skipping dead tasks during class swiching, preventing
invalid task state transitions.
Fixes: a8532fac7b5d2 ("sched_ext: TASK_DEAD tasks must be switched into SCX on ops_enable")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-rw-r--r-- | kernel/sched/ext.c | 4 |
1 files changed, 4 insertions, 0 deletions
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c index 7dd5cbcb7a06..717e3d1d6a2f 100644 --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -5694,6 +5694,9 @@ static int scx_enable(struct sched_ext_ops *ops, struct bpf_link *link) __setscheduler_class(p->policy, p->prio); struct sched_enq_and_set_ctx ctx; + if (!tryget_task_struct(p)) + continue; + if (old_class != new_class && p->se.sched_delayed) dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED); @@ -5706,6 +5709,7 @@ static int scx_enable(struct sched_ext_ops *ops, struct bpf_link *link) sched_enq_and_set_task(&ctx); check_class_changed(task_rq(p), p, old_class, p->prio); + put_task_struct(p); } scx_task_iter_stop(&sti); percpu_up_write(&scx_fork_rwsem); |