summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorYun Lu <luyun@kylinos.cn>2025-07-11 17:33:00 +0800
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>2025-07-24 08:56:25 +0200
commit1ddedbd8087d0c1a5477514400f08888f2ade980 (patch)
treeacf77bc67c95dae70c92302a393bca6a1e15f715
parentfa0796cd62c255b4af6dfebdd3572cd8de784967 (diff)
af_packet: fix soft lockup issue caused by tpacket_snd()
commit 55f0bfc0370539213202f4ce1a07615327ac4713 upstream. When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for pending_refcnt to decrement to zero before returning. The pending_refcnt is decremented by 1 when the skb->destructor function is called, indicating that the skb has been successfully sent and needs to be destroyed. If an error occurs during this process, the tpacket_snd() function will exit and return error, but pending_refcnt may not yet have decremented to zero. Assuming the next send operation is executed immediately, but there are no available frames to be sent in tx_ring (i.e., packet_current_frame returns NULL), and skb is also NULL, the function will not execute wait_for_completion_interruptible_timeout() to yield the CPU. Instead, it will enter a do-while loop, waiting for pending_refcnt to be zero. Even if the previous skb has completed transmission, the skb->destructor function can only be invoked in the ksoftirqd thread (assuming NAPI threading is enabled). When both the ksoftirqd thread and the tpacket_snd operation happen to run on the same CPU, and the CPU trapped in the do-while loop without yielding, the ksoftirqd thread will not get scheduled to run. As a result, pending_refcnt will never be reduced to zero, and the do-while loop cannot exit, eventually leading to a CPU soft lockup issue. In fact, skb is true for all but the first iterations of that loop, and as long as pending_refcnt is not zero, even if incremented by a previous call, wait_for_completion_interruptible_timeout() should be executed to yield the CPU, allowing the ksoftirqd thread to be scheduled. Therefore, the execution condition of this function should be modified to check if pending_refcnt is not zero, instead of check skb. - if (need_wait && skb) { + if (need_wait && packet_read_pending(&po->tx_ring)) { As a result, the judgment conditions are duplicated with the end code of the while loop, and packet_read_pending() is a very expensive function. Actually, this loop can only exit when ph is NULL, so the loop condition can be changed to while (1), and in the "ph = NULL" branch, if the subsequent condition of if is not met, the loop can break directly. Now, the loop logic remains the same as origin but is clearer and more obvious. Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET") Cc: stable@kernel.org Suggested-by: LongJun Tang <tanglongjun@kylinos.cn> Signed-off-by: Yun Lu <luyun@kylinos.cn> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-rw-r--r--net/packet/af_packet.c23
1 files changed, 11 insertions, 12 deletions
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 52a4ea3dc2ac..19c4c1f27e58 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2845,15 +2845,21 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
ph = packet_current_frame(po, &po->tx_ring,
TP_STATUS_SEND_REQUEST);
if (unlikely(ph == NULL)) {
- if (need_wait && skb) {
+ /* Note: packet_read_pending() might be slow if we
+ * have to call it as it's per_cpu variable, but in
+ * fast-path we don't have to call it, only when ph
+ * is NULL, we need to check the pending_refcnt.
+ */
+ if (need_wait && packet_read_pending(&po->tx_ring)) {
timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
if (timeo <= 0) {
err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
goto out_put;
}
- }
- /* check for additional frames */
- continue;
+ /* check for additional frames */
+ continue;
+ } else
+ break;
}
skb = NULL;
@@ -2942,14 +2948,7 @@ tpacket_error:
}
packet_increment_head(&po->tx_ring);
len_sum += tp_len;
- } while (likely((ph != NULL) ||
- /* Note: packet_read_pending() might be slow if we have
- * to call it as it's per_cpu variable, but in fast-path
- * we already short-circuit the loop with the first
- * condition, and luckily don't have to go that path
- * anyway.
- */
- (need_wait && packet_read_pending(&po->tx_ring))));
+ } while (1);
err = len_sum;
goto out_put;