summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-12-17af_unix: Clean up error paths in unix_stream_sendmsg().Kuniyuki Iwashima
If we move send_sig() to the SEND_SHUTDOWN check before the while loop, then we can reuse the same kfree_skb() after the pipe_err_free label. Let's gather the scattered kfree_skb()s in error paths. While at it, some style issues are fixed, and the pipe_err_free label is renamed to out_pipe to match other label names. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17af_unix: Set error only when needed in unix_stream_sendmsg().Kuniyuki Iwashima
We will introduce skb drop reason for AF_UNIX, then we need to set an errno and a drop reason for each path. Let's set an error only when it's needed in unix_stream_sendmsg(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17af_unix: Clean up error paths in unix_stream_connect().Kuniyuki Iwashima
The label order is weird in unix_stream_connect(), and all NULL checks are unnecessary if reordered. Let's clean up the error paths to make it easy to set a drop reason for each path. While at it, a comment with the old style is updated. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17af_unix: Set error only when needed in unix_stream_connect().Kuniyuki Iwashima
We will introduce skb drop reason for AF_UNIX, then we need to set an errno and a drop reason for each path. Let's set an error only when it's needed in unix_stream_connect(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17rust: net::phy fix module autoloadingFUJITA Tomonori
The alias symbol name was renamed. Adjust module_phy_driver macro to create the proper symbol name to fix module autoloading. Fixes: 054a9cd395a7 ("modpost: rename alias symbol for MODULE_DEVICE_TABLE()") Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Link: https://patch.msgid.link/20241212130015.238863-1-fujita.tomonori@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17x86/xen: remove hypercall pageJuergen Gross
The hypercall page is no longer needed. It can be removed, as from the Xen perspective it is optional. But, from Linux's perspective, it removes naked RET instructions that escape the speculative protections that Call Depth Tracking and/or Untrain Ret are trying to achieve. This is part of XSA-466 / CVE-2024-53241. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
2024-12-17x86/xen: use new hypercall functions instead of hypercall pageJuergen Gross
Call the Xen hypervisor via the new xen_hypercall_func static-call instead of the hypercall page. This is part of XSA-466 / CVE-2024-53241. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Co-developed-by: Peter Zijlstra <peterz@infradead.org> Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2024-12-17x86/xen: add central hypercall functionsJuergen Gross
Add generic hypercall functions usable for all normal (i.e. not iret) hypercalls. Depending on the guest type and the processor vendor different functions need to be used due to the to be used instruction for entering the hypervisor: - PV guests need to use syscall - HVM/PVH guests on Intel need to use vmcall - HVM/PVH guests on AMD and Hygon need to use vmmcall As PVH guests need to issue hypercalls very early during boot, there is a 4th hypercall function needed for HVM/PVH which can be used on Intel and AMD processors. It will check the vendor type and then set the Intel or AMD specific function to use via static_call(). This is part of XSA-466 / CVE-2024-53241. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Co-developed-by: Peter Zijlstra <peterz@infradead.org>
2024-12-16Merge branch 'r8169-add-support-for-rtl8125d-rev-b'Jakub Kicinski
Heiner Kallweit says: ==================== r8169: add support for RTL8125D rev.b Add support for RTL8125D rev.b. Its XID is 0x689. It is basically based on the one with XID 0x688, but with different firmware file. To avoid a mess with the version numbering, adjust it first. ==================== Link: https://patch.msgid.link/15c4a9fd-a653-4b09-825d-751964832a7a@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16r8169: add support for RTL8125D rev.bChunHao Lin
Add support for RTL8125D rev.b. Its XID is 0x689. It is basically based on the one with XID 0x688, but with different firmware file. Signed-off-by: ChunHao Lin <hau@realtek.com> [hkallweit1@gmail.com: rebased after adjusted version numbering] Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/75e5e9ec-d01f-43ac-b0f4-e7456baf18d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16r8169: adjust version numbering for RTL8126Heiner Kallweit
Adjust version numbering for RTL8126, so that it doesn't overlap with new RTL8125 versions. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/6a354364-20e9-48ad-a198-468264288757@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16net: hinic: Fix cleanup in create_rxqs/txqs()Dan Carpenter
There is a check for NULL at the start of create_txqs() and create_rxqs() which tess if "nic_dev->txqs" is non-NULL. The intention is that if the device is already open and the queues are already created then we don't create them a second time. However, the bug is that if we have an error in the create_txqs() then the pointer doesn't get set back to NULL. The NULL check at the start of the function will say that it's already open when it's not and the device can't be used. Set ->txqs back to NULL on cleanup on error. Fixes: c3e79baf1b03 ("net-next/hinic: Add logical Txq and Rxq") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/0cc98faf-a0ed-4565-a55b-0fa2734bc205@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16team: Fix feature exposure when no ports are presentDaniel Borkmann
Small follow-up to align this to an equivalent behavior as the bond driver. The change in 3625920b62c3 ("teaming: fix vlan_features computing") removed the netdevice vlan_features when there is no team port attached, yet it leaves the full set of enc_features intact. Instead, leave the default features as pre 3625920b62c3, and recompute once we do have ports attached. Also, similarly as in bonding case, call the netdev_base_features() helper on the enc_features. Fixes: 3625920b62c3 ("teaming: fix vlan_features computing") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20241213123657.401868-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16Merge branch 'add-support-for-so_priority-cmsg'Jakub Kicinski
Anna Emese Nyiri says: ==================== Add support for SO_PRIORITY cmsg Introduce a new helper function, `sk_set_prio_allowed`, to centralize the logic for validating priority settings. Add support for the `SO_PRIORITY` control message, enabling user-space applications to set socket priority via control messages (cmsg). ==================== Link: https://patch.msgid.link/20241213084457.45120-1-annaemesenyiri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16sock: Introduce SO_RCVPRIORITY socket optionAnna Emese Nyiri
Add new socket option, SO_RCVPRIORITY, to include SO_PRIORITY in the ancillary data returned by recvmsg(). This is analogous to the existing support for SO_RCVMARK, as implemented in commit 6fd1d51cfa253 ("net: SO_RCVMARK socket option for SO_MARK with recvmsg()"). Reviewed-by: Willem de Bruijn <willemb@google.com> Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> Link: https://patch.msgid.link/20241213084457.45120-5-annaemesenyiri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16selftests: net: test SO_PRIORITY ancillary data with cmsg_senderAnna Emese Nyiri
Extend cmsg_sender.c with a new option '-Q' to send SO_PRIORITY ancillary data. cmsg_so_priority.sh script added to validate SO_PRIORITY behavior by creating VLAN device with egress QoS mapping and testing packet priorities using flower filters. Verify that packets with different priorities are correctly matched and counted by filters for multiple protocols and IP versions. Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> Link: https://patch.msgid.link/20241213084457.45120-4-annaemesenyiri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16sock: support SO_PRIORITY cmsgAnna Emese Nyiri
The Linux socket API currently allows setting SO_PRIORITY at the socket level, applying a uniform priority to all packets sent through that socket. The exception to this is IP_TOS, when the priority value is calculated during the handling of ancillary data, as implemented in commit f02db315b8d8 ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data"). However, this is a computed value, and there is currently no mechanism to set a custom priority via control messages prior to this patch. According to this patch, if SO_PRIORITY is specified as ancillary data, the packet is sent with the priority value set through sockc->priority, overriding the socket-level values set via the traditional setsockopt() method. This is analogous to the existing support for SO_MARK, as implemented in commit c6af0c227a22 ("ip: support SO_MARK cmsg"). If both cmsg SO_PRIORITY and IP_TOS are passed, then the one that takes precedence is the last one in the cmsg list. This patch has the side effect that raw_send_hdrinc now interprets cmsg IP_TOS. Reviewed-by: Willem de Bruijn <willemb@google.com> Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> Link: https://patch.msgid.link/20241213084457.45120-3-annaemesenyiri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16sock: Introduce sk_set_prio_allowed helper functionAnna Emese Nyiri
Simplify priority setting permissions with the 'sk_set_prio_allowed' function, centralizing the validation logic. This change is made in anticipation of a second caller in a following patch. No functional changes. Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Suggested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> Link: https://patch.msgid.link/20241213084457.45120-2-annaemesenyiri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16netlink: specs: add phys-binding attr to rt_link specDonald Hunter
Add the missing phys-binding attr to the mctp-attrs in the rt_link spec. This fixes commit 580db513b4a9 ("net: mctp: Expose transport binding identifier via IFLA attribute"). Note that enum mctp_phys_binding is not currently uapi, but perhaps it should be? Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20241213112551.33557-1-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16chelsio/chtls: prevent potential integer overflow on 32bitDan Carpenter
The "gl->tot_len" variable is controlled by the user. It comes from process_responses(). On 32bit systems, the "gl->tot_len + sizeof(struct cpl_pass_accept_req) + sizeof(struct rss_header)" addition could have an integer wrapping bug. Use size_add() to prevent this. Fixes: a08943947873 ("crypto: chtls - Register chtls with net tls") Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/c6bfb23c-2db2-4e1b-b8ab-ba3925c82ef5@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16rxrpc: Fix ability to add more data to a call once MSG_MORE deassertedDavid Howells
When userspace is adding data to an RPC call for transmission, it must pass MSG_MORE to sendmsg() if it intends to add more data in future calls to sendmsg(). Calling sendmsg() without MSG_MORE being asserted closes the transmission phase of the call (assuming sendmsg() adds all the data presented) and further attempts to add more data should be rejected. However, this is no longer the case. The change of call state that was previously the guard got bumped over to the I/O thread, which leaves a window for a repeat sendmsg() to insert more data. This previously went unnoticed, but the more recent patch that changed the structures behind the Tx queue added a warning: WARNING: CPU: 3 PID: 6639 at net/rxrpc/sendmsg.c:296 rxrpc_send_data+0x3f2/0x860 and rejected the additional data, returning error EPROTO. Fix this by adding a guard flag to the call, setting the flag when we queue the final packet and then rejecting further attempts to add data with EPROTO. Fixes: 2d689424b618 ("rxrpc: Move call state changes from sendmsg to I/O thread") Reported-by: syzbot+ff11be94dfcd7a5af8da@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/6757fb68.050a0220.2477f.005f.GAE@google.com/ Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: syzbot+ff11be94dfcd7a5af8da@syzkaller.appspotmail.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/2870480.1734037462@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16rxrpc: Disable IRQ, not BH, to take the lock for ->attend_linkDavid Howells
Use spin_lock_irq(), not spin_lock_bh() to take the lock when accessing the ->attend_link() to stop a delay in the I/O thread due to an interrupt being taken in the app thread whilst that holds the lock and vice versa. Fixes: a2ea9a907260 ("rxrpc: Use irq-disabling spinlocks between app and I/O thread") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/2870146.1734037095@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16Merge branch 'netdev-fix-repeated-netlink-messages-in-queue-dumps'Jakub Kicinski
Jakub Kicinski says: ==================== netdev: fix repeated netlink messages in queue dumps Fix dump continuation for queues and queue stats in the netdev family. Because we used post-increment when saving id of dumped queue next skb would re-dump the already dumped queue. ==================== Link: https://patch.msgid.link/20241213152244.3080955-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16selftests: net-drv: stats: sanity check netlink dumpsJakub Kicinski
Sanity check netlink dumps, to make sure dumps don't have repeated entries or gaps in IDs. Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20241213152244.3080955-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16selftests: net-drv: queues: sanity check netlink dumpsJakub Kicinski
This test already catches a netlink bug fixed by this series, but only when running on HW with many queues. Make sure the netdevsim instance created has a lot of queues, and constrain the size of the recv_buffer used by netlink. While at it test both rx and tx queues. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20241213152244.3080955-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16selftests: net: support setting recv_size in YNLJakub Kicinski
recv_size parameter allows constraining the buffer size for dumps. It's useful in testing kernel handling of dump continuation, IOW testing dumps which span multiple skbs. Let the tests set this parameter when initializing the YNL family. Keep the normal default, we don't want tests to unintentionally behave very differently than normal code. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20241213152244.3080955-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16netdev: fix repeated netlink messages in queue statsJakub Kicinski
The context is supposed to record the next queue to dump, not last dumped. If the dump doesn't fit we will restart from the already-dumped queue, duplicating the message. Before this fix and with the selftest improvements later in this series we see: # ./run_kselftest.sh -t drivers/net:stats.py timeout set to 45 selftests: drivers/net: stats.py KTAP version 1 1..5 ok 1 stats.check_pause ok 2 stats.check_fec ok 3 stats.pkt_byte_sum # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 125, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), len(set(queues[qtype])), # Check failed 45 != 44 repeated queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 127, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), max(queues[qtype]) + 1, # Check failed 45 != 44 missing queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 125, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), len(set(queues[qtype])), # Check failed 45 != 44 repeated queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 127, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), max(queues[qtype]) + 1, # Check failed 45 != 44 missing queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 125, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), len(set(queues[qtype])), # Check failed 103 != 100 repeated queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 127, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), max(queues[qtype]) + 1, # Check failed 103 != 100 missing queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 125, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), len(set(queues[qtype])), # Check failed 102 != 100 repeated queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 127, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), max(queues[qtype]) + 1, # Check failed 102 != 100 missing queue keys not ok 4 stats.qstat_by_ifindex ok 5 stats.check_down # Totals: pass:4 fail:1 xfail:0 xpass:0 skip:0 error:0 With the fix: # ./ksft-net-drv/run_kselftest.sh -t drivers/net:stats.py timeout set to 45 selftests: drivers/net: stats.py KTAP version 1 1..5 ok 1 stats.check_pause ok 2 stats.check_fec ok 3 stats.pkt_byte_sum ok 4 stats.qstat_by_ifindex ok 5 stats.check_down # Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0 Fixes: ab63a2387cb9 ("netdev: add per-queue statistics") Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20241213152244.3080955-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16netdev: fix repeated netlink messages in queue dumpJakub Kicinski
The context is supposed to record the next queue to dump, not last dumped. If the dump doesn't fit we will restart from the already-dumped queue, duplicating the message. Before this fix and with the selftest improvements later in this series we see: # ./run_kselftest.sh -t drivers/net:queues.py timeout set to 45 selftests: drivers/net: queues.py KTAP version 1 1..2 # Check| At /root/ksft-net-drv/drivers/net/./queues.py, line 32, in get_queues: # Check| ksft_eq(queues, expected) # Check failed 102 != 100 # Check| At /root/ksft-net-drv/drivers/net/./queues.py, line 32, in get_queues: # Check| ksft_eq(queues, expected) # Check failed 101 != 100 not ok 1 queues.get_queues ok 2 queues.addremove_queues # Totals: pass:1 fail:1 xfail:0 xpass:0 skip:0 error:0 not ok 1 selftests: drivers/net: queues.py # exit=1 With the fix: # ./ksft-net-drv/run_kselftest.sh -t drivers/net:queues.py timeout set to 45 selftests: drivers/net: queues.py KTAP version 1 1..2 ok 1 queues.get_queues ok 2 queues.addremove_queues # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0 Fixes: 6b6171db7fc8 ("netdev-genl: Add netlink framework functions for queue") Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20241213152244.3080955-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16Merge branch 'mlx5-next' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Tariq Toukan says: ==================== mlx5-next 2024-12-16 The following pull-request contains mlx5 IFC updates. * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Add device cap abs_native_port_num net/mlx5: qos: Add ifc support for cross-esw scheduling net/mlx5: Add support for new scheduling elements net/mlx5: Add ConnectX-8 device to ifc net/mlx5: ifc: Reorganize mlx5_ifc_flow_table_context_bits ==================== Link: https://patch.msgid.link/20241216124028.973763-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16fortify: Hide run-time copy size from value range trackingKees Cook
GCC performs value range tracking for variables as a way to provide better diagnostics. One place this is regularly seen is with warnings associated with bounds-checking, e.g. -Wstringop-overflow, -Wstringop-overread, -Warray-bounds, etc. In order to keep the signal-to-noise ratio high, warnings aren't emitted when a value range spans the entire value range representable by a given variable. For example: unsigned int len; char dst[8]; ... memcpy(dst, src, len); If len's value is unknown, it has the full "unsigned int" range of [0, UINT_MAX], and GCC's compile-time bounds checks against memcpy() will be ignored. However, when a code path has been able to narrow the range: if (len > 16) return; memcpy(dst, src, len); Then the range will be updated for the execution path. Above, len is now [0, 16] when reading memcpy(), so depending on other optimizations, we might see a -Wstringop-overflow warning like: error: '__builtin_memcpy' writing between 9 and 16 bytes into region of size 8 [-Werror=stringop-overflow] When building with CONFIG_FORTIFY_SOURCE, the fortified run-time bounds checking can appear to narrow value ranges of lengths for memcpy(), depending on how the compiler constructs the execution paths during optimization passes, due to the checks against the field sizes. For example: if (p_size_field != SIZE_MAX && p_size != p_size_field && p_size_field < size) As intentionally designed, these checks only affect the kernel warnings emitted at run-time and do not block the potentially overflowing memcpy(), so GCC thinks it needs to produce a warning about the resulting value range that might be reaching the memcpy(). We have seen this manifest a few times now, with the most recent being with cpumasks: In function ‘bitmap_copy’, inlined from ‘cpumask_copy’ at ./include/linux/cpumask.h:839:2, inlined from ‘__padata_set_cpumasks’ at kernel/padata.c:730:2: ./include/linux/fortify-string.h:114:33: error: ‘__builtin_memcpy’ reading between 257 and 536870904 bytes from a region of size 256 [-Werror=stringop-overread] 114 | #define __underlying_memcpy __builtin_memcpy | ^ ./include/linux/fortify-string.h:633:9: note: in expansion of macro ‘__underlying_memcpy’ 633 | __underlying_##op(p, q, __fortify_size); \ | ^~~~~~~~~~~~~ ./include/linux/fortify-string.h:678:26: note: in expansion of macro ‘__fortify_memcpy_chk’ 678 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ | ^~~~~~~~~~~~~~~~~~~~ ./include/linux/bitmap.h:259:17: note: in expansion of macro ‘memcpy’ 259 | memcpy(dst, src, len); | ^~~~~~ kernel/padata.c: In function ‘__padata_set_cpumasks’: kernel/padata.c:713:48: note: source object ‘pcpumask’ of size [0, 256] 713 | cpumask_var_t pcpumask, | ~~~~~~~~~~~~~~^~~~~~~~ This warning is _not_ emitted when CONFIG_FORTIFY_SOURCE is disabled, and with the recent -fdiagnostics-details we can confirm the origin of the warning is due to FORTIFY's bounds checking: ../include/linux/bitmap.h:259:17: note: in expansion of macro 'memcpy' 259 | memcpy(dst, src, len); | ^~~~~~ '__padata_set_cpumasks': events 1-2 ../include/linux/fortify-string.h:613:36: 612 | if (p_size_field != SIZE_MAX && | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 613 | p_size != p_size_field && p_size_field < size) | ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~ | | | (1) when the condition is evaluated to false | (2) when the condition is evaluated to true '__padata_set_cpumasks': event 3 114 | #define __underlying_memcpy __builtin_memcpy | ^ | | | (3) out of array bounds here Note that the cpumask warning started appearing since bitmap functions were recently marked __always_inline in commit ed8cd2b3bd9f ("bitmap: Switch from inline to __always_inline"), which allowed GCC to gain visibility into the variables as they passed through the FORTIFY implementation. In order to silence these false positives but keep otherwise deterministic compile-time warnings intact, hide the length variable from GCC with OPTIMIZE_HIDE_VAR() before calling the builtin memcpy. Additionally add a comment about why all the macro args have copies with const storage. Reported-by: "Thomas Weißschuh" <linux@weissschuh.net> Closes: https://lore.kernel.org/all/db7190c8-d17f-4a0d-bc2f-5903c79f36c2@t-8ch.de/ Reported-by: Nilay Shroff <nilay@linux.ibm.com> Closes: https://lore.kernel.org/all/20241112124127.1666300-1-nilay@linux.ibm.com/ Tested-by: Nilay Shroff <nilay@linux.ibm.com> Acked-by: Yury Norov <yury.norov@gmail.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kees Cook <kees@kernel.org>
2024-12-16ftrace: Do not find "true_parent" if HAVE_DYNAMIC_FTRACE_WITH_ARGS is not setSteven Rostedt
When function tracing and function graph tracing are both enabled (in different instances) the "parent" of some of the function tracing events is "return_to_handler" which is the trampoline used by function graph tracing. To fix this, ftrace_get_true_parent_ip() was introduced that returns the "true" parent ip instead of the trampoline. To do this, the ftrace_regs_get_stack_pointer() is used, which uses kernel_stack_pointer(). The problem is that microblaze does not implement kerenl_stack_pointer() so when function graph tracing is enabled, the build fails. But microblaze also does not enabled HAVE_DYNAMIC_FTRACE_WITH_ARGS. That option has to be enabled by the architecture to reliably get the values from the fregs parameter passed in. When that config is not set, the architecture can also pass in NULL, which is not tested for in that function and could cause the kernel to crash. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Cc: Jeff Xie <jeff.xie@linux.dev> Link: https://lore.kernel.org/20241216164633.6df18e87@gandalf.local.home Fixes: 60b1f578b578 ("ftrace: Get the true parent ip for function tracer") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-16fgraph: Still initialize idle shadow stacks when startingSteven Rostedt
A bug was discovered where the idle shadow stacks were not initialized for offline CPUs when starting function graph tracer, and when they came online they were not traced due to the missing shadow stack. To fix this, the idle task shadow stack initialization was moved to using the CPU hotplug callbacks. But it removed the initialization when the function graph was enabled. The problem here is that the hotplug callbacks are called when the CPUs come online, but the idle shadow stack initialization only happens if function graph is currently active. This caused the online CPUs to not get their shadow stack initialized. The idle shadow stack initialization still needs to be done when the function graph is registered, as they will not be allocated if function graph is not registered. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20241211135335.094ba282@batman.local.home Fixes: 2c02f7375e65 ("fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks") Reported-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Linus Walleij <linus.walleij@linaro.org> Closes: https://lore.kernel.org/all/CACRpkdaTBrHwRbbrphVy-=SeDz6MSsXhTKypOtLrTQ+DgGAOcQ@mail.gmail.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-16Merge tag 'soc-fixes-6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "Three small fixes for the soc tree: - devicetee fix for the Arm Juno reference machine, to allow more interesting PCI configurations - build fix for SCMI firmware on the NXP i.MX platform - fix for a race condition in Arm FF-A firmware" * tag 'soc-fixes-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: arm64: dts: fvp: Update PCIe bus-range property firmware: arm_ffa: Fix the race around setting ffa_dev->properties firmware: arm_scmi: Fix i.MX build dependency
2024-12-16Merge tag 'platform-drivers-x86-v6.13-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: - alienware-wmi: - Add support for Alienware m16 R1 AMD - Do not setup legacy LED control with X and G Series - intel/ifs: Clearwater Forest support - intel/vsec: Panther Lake support - p2sb: Do not hide the device if BIOS left it unhidden - touchscreen_dmi: Add SARY Tab 3 tablet information * tag 'platform-drivers-x86-v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86/intel/vsec: Add support for Panther Lake platform/x86/intel/ifs: Add Clearwater Forest to CPU support list platform/x86: touchscreen_dmi: Add info for SARY Tab 3 tablet p2sb: Do not scan and remove the P2SB device when it is unhidden p2sb: Move P2SB hide and unhide code to p2sb_scan_and_cache() p2sb: Introduce the global flag p2sb_hidden_by_bios p2sb: Factor out p2sb_read_from_cache() alienware-wmi: Adds support to Alienware m16 R1 AMD alienware-wmi: Fix X Series and G Series quirks
2024-12-16erofs: use buffered I/O for file-backed mounts by defaultGao Xiang
For many use cases (e.g. container images are just fetched from remote), performance will be impacted if underlay page cache is up-to-date but direct i/o flushes dirty pages first. Instead, let's use buffered I/O by default to keep in sync with loop devices and add a (re)mount option to explicitly give a try to use direct I/O if supported by the underlying files. The container startup time is improved as below: [workload] docker.io/library/workpress:latest unpack 1st run non-1st runs EROFS snapshotter buffered I/O file 4.586404265s 0.308s 0.198s EROFS snapshotter direct I/O file 4.581742849s 2.238s 0.222s EROFS snapshotter loop 4.596023152s 0.346s 0.201s Overlayfs snapshotter 5.382851037s 0.206s 0.214s Fixes: fb176750266a ("erofs: add file-backed mount support") Cc: Derek McGowan <derek@mcg.dev> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241212134336.2059899-1-hsiangkao@linux.alibaba.com
2024-12-16erofs: reference `struct erofs_device_info` for erofs_map_devGao Xiang
Record `m_sb` and `m_dif` to replace `m_fscache`, `m_daxdev`, `m_fp` and `m_dax_part_off` in order to simplify the codebase. Note that `m_bdev` is still left since it can be assigned from `sb->s_bdev` directly. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241212235401.2857246-1-hsiangkao@linux.alibaba.com
2024-12-16erofs: use `struct erofs_device_info` for the primary deviceGao Xiang
Instead of just listing each one directly in `struct erofs_sb_info` except that we still use `sb->s_bdev` for the primary block device. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241216125310.930933-2-hsiangkao@linux.alibaba.com
2024-12-16Merge branch 'net-timestamp-selectable'David S. Miller
Kory Maincent says: ==================== net: Make timestamping selectable Up until now, there was no way to let the user select the hardware PTP provider at which time stamping occurs. The stack assumed that PHY time stamping is always preferred, but some MAC/PHY combinations were buggy. This series updates the default MAC/PHY default timestamping and aims to allow the user to select the desired hwtstamp provider administratively. Here is few netlink spec usage examples: ./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --dump tsinfo-get --json '{"header":{"dev-name":"eth0"}}' [{'header': {'dev-index': 3, 'dev-name': 'eth0'}, 'hwtst-provider': {'index': 0, 'qualifier': 0}, 'phc-index': 0, 'rx-filters': {'bits': {'bit': [{'index': 0, 'name': 'none'}, {'index': 2, 'name': 'some'}]}, 'nomask': True, 'size': 16}, 'timestamping': {'bits': {'bit': [{'index': 0, 'name': 'hardware-transmit'}, {'index': 2, 'name': 'hardware-receive'}, {'index': 6, 'name': 'hardware-raw-clock'}]}, 'nomask': True, 'size': 17}, 'tx-types': {'bits': {'bit': [{'index': 0, 'name': 'off'}, {'index': 1, 'name': 'on'}]}, 'nomask': True, 'size': 4}}, {'header': {'dev-index': 3, 'dev-name': 'eth0'}, 'hwtst-provider': {'index': 2, 'qualifier': 0}, 'phc-index': 2, 'rx-filters': {'bits': {'bit': [{'index': 0, 'name': 'none'}, {'index': 1, 'name': 'all'}]}, 'nomask': True, 'size': 16}, 'timestamping': {'bits': {'bit': [{'index': 0, 'name': 'hardware-transmit'}, {'index': 1, 'name': 'software-transmit'}, {'index': 2, 'name': 'hardware-receive'}, {'index': 3, 'name': 'software-receive'}, {'index': 4, 'name': 'software-system-clock'}, {'index': 6, 'name': 'hardware-raw-clock'}]}, 'nomask': True, 'size': 17}, 'tx-types': {'bits': {'bit': [{'index': 0, 'name': 'off'}, {'index': 1, 'name': 'on'}, {'index': 2, 'name': 'onestep-sync'}]}, 'nomask': True, 'size': 4}}] ./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do tsinfo-get --json '{"header":{"dev-name":"eth0"}, "hwtst-provider":{"index":0, "qualifier":0 } }' {'header': {'dev-index': 3, 'dev-name': 'eth0'}, 'hwtst-provider': {'index': 0, 'qualifier': 0}, 'phc-index': 0, 'rx-filters': {'bits': {'bit': [{'index': 0, 'name': 'none'}, {'index': 2, 'name': 'some'}]}, 'nomask': True, 'size': 16}, 'timestamping': {'bits': {'bit': [{'index': 0, 'name': 'hardware-transmit'}, {'index': 2, 'name': 'hardware-receive'}, {'index': 6, 'name': 'hardware-raw-clock'}]}, 'nomask': True, 'size': 17}, 'tx-types': {'bits': {'bit': [{'index': 0, 'name': 'off'}, {'index': 1, 'name': 'on'}]}, 'nomask': True, 'size': 4}} ./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do tsinfo-set --json '{"header":{"dev-name":"eth0"}, "hwtst-provider":{"index":2, "qualifier":0}}' None ./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do tsconfig-get --json '{"header":{"dev-name":"eth0"}}' {'header': {'dev-index': 3, 'dev-name': 'eth0'}, 'hwtstamp-flags': 1, 'hwtstamp-provider': {'index': 1, 'qualifier': 0}, 'rx-filters': {'bits': {'bit': [{'index': 12, 'name': 'ptpv2-event'}]}, 'nomask': True, 'size': 16}, 'tx-types': {'bits': {'bit': [{'index': 1, 'name': 'on'}]}, 'nomask': True, 'size': 4}} ./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do tsconfig-set --json '{"header":{"dev-name":"eth0"}, "hwtstamp-provider":{"index":1, "qualifier":0 }, "rx-filters":{"bits": {"bit": {"name":"ptpv2-l4-event"}}, "nomask": 1}, "tx-types":{"bits": {"bit": {"name":"on"}}, "nomask": 1}}' {'header': {'dev-index': 3, 'dev-name': 'eth0'}, 'hwtstamp-flags': 1, 'hwtstamp-provider': {'index': 1, 'qualifier': 0}, 'rx-filters': {'bits': {'bit': [{'index': 12, 'name': 'ptpv2-event'}]}, 'nomask': True, 'size': 16}, 'tx-types': {'bits': {'bit': [{'index': 1, 'name': 'on'}]}, 'nomask': True, 'size': 4}} Changes in v21: - NIT fixes. - Link to v20: https://lore.kernel.org/r/20241204-feature_ptp_netnext-v20-0-9bd99dc8a867@bootlin.com Changes in v20: - Change hwtstamp provider design to avoid saving "user" (phy or net) in the ptp clock structure. - Link to v19: https://lore.kernel.org/r/20241030-feature_ptp_netnext-v19-0-94f8aadc9d5c@bootlin.com Changes in v19: - Rebase on net-next - Link to v18: https://lore.kernel.org/r/20241023-feature_ptp_netnext-v18-0-ed948f3b6887@bootlin.com Changes in v18: - Few changes in the tsconfig-set ethtool command. - Add tsconfig-set-reply ethtool netlink socket. - Add missing netlink tsconfig documentation - Link to v17: https://lore.kernel.org/r/20240709-feature_ptp_netnext-v17-0-b5317f50df2a@bootlin.com Changes in v17: - Fix a documentation nit. - Add a missing kernel_ethtool_tsinfo update from a new MAC driver. - Link to v16: https://lore.kernel.org/r/20240705-feature_ptp_netnext-v16-0-5d7153914052@bootlin.com Changes in v16: - Add a new patch to separate tsinfo into a new tsconfig command to get and set the hwtstamp config. - Used call_rcu() instead of synchronize_rcu() to free the hwtstamp_provider - Moved net core changes of patch 12 directly to patch 8. - Link to v15: https://lore.kernel.org/r/20240612-feature_ptp_netnext-v15-0-b2a086257b63@bootlin.com Changes in v15: - Fix uninitialized ethtool_ts_info structure. - Link to v14: https://lore.kernel.org/r/20240604-feature_ptp_netnext-v14-0-77b6f6efea40@bootlin.com Changes in v14: - Add back an EXPORT_SYMBOL() missing. - Link to v13: https://lore.kernel.org/r/20240529-feature_ptp_netnext-v13-0-6eda4d40fa4f@bootlin.com Changes in v13: - Add PTP builtin code to fix build errors when building PTP as a module. - Fix error spotted by smatch and sparse. - Link to v12: https://lore.kernel.org/r/20240430-feature_ptp_netnext-v12-0-2c5f24b6a914@bootlin.com Changes in v12: - Add missing return description in the kdoc. - Fix few nit. - Link to v11: https://lore.kernel.org/r/20240422-feature_ptp_netnext-v11-0-f14441f2a1d8@bootlin.com Changes in v11: - Add netlink examples. - Remove a change of my out of tree marvell_ptp patch in the patch series. - Remove useless extern. - Link to v10: https://lore.kernel.org/r/20240409-feature_ptp_netnext-v10-0-0fa2ea5c89a9@bootlin.com Changes in v10: - Move declarations to net/core/dev.h instead of netdevice.h - Add netlink documentation. - Add ETHTOOL_A_TSINFO_GHWTSTAMP netlink attributes instead of a bit in ETHTOOL_A_TSINFO_TIMESTAMPING bitset. - Send "Move from simple ida to xarray" patch standalone. - Add tsinfo ntf command. - Add rcu_lock protection mechanism to avoid memory leak. - Fixed doc and kdoc issue. - Link to v9: https://lore.kernel.org/r/20240226-feature_ptp_netnext-v9-0-455611549f21@bootlin.com Changes in v9: - Remove the RFC prefix. - Correct few NIT fixes. - Link to v8: https://lore.kernel.org/r/20240216-feature_ptp_netnext-v8-0-510f42f444fb@bootlin.com Changes in v8: - Drop the 6 first patch as they are now merged. - Change the full implementation to not be based on the hwtstamp layer (MAC/PHY) but on the hwtstamp provider which mean a ptp clock and a phc qualifier. - Made some patch to prepare the new implementation. - Expand netlink tsinfo instead of a new ts command for new hwtstamp configuration uAPI and for dumping tsinfo of specific hwtstamp provider. - Link to v7: https://lore.kernel.org/r/20231114-feature_ptp_netnext-v7-0-472e77951e40@bootlin.com Changes in v7: - Fix a temporary build error. - Link to v6: https://lore.kernel.org/r/20231019-feature_ptp_netnext-v6-0-71affc27b0e5@bootlin.com Changes in v6: - Few fixes from the reviews. - Replace the allowlist to default_timestamp flag to know which phy is using old API behavior. - Rename the timestamping layer enum values. - Move to a simple enum instead of the mix between enum and bitfield. - Update ts_info and ts-set in software timestamping case. Changes in v5: - Update to ndo_hwstamp_get/set. This bring several new patches. - Add few patches to make the glue. - Convert macb to ndo_hwstamp_get/set. - Add netlink specs description of new ethtool commands. - Removed netdev notifier. - Split the patches that expose the timestamping to userspace to separate the core and ethtool development. - Add description of software timestamping. - Convert PHYs hwtstamp callback to use kernel_hwtstamp_config. Changes in v4: - Move on to ethtool netlink instead of ioctl. - Add a netdev notifier to allow packet trapping by the MAC in case of PHY time stamping. - Add a PHY whitelist to not break the old PHY default time-stamping preference API. Changes in v3: - Expose the PTP choice to ethtool instead of sysfs. You can test it with the ethtool source on branch feature_ptp of: https://github.com/kmaincent/ethtool - Added a devicetree binding to select the preferred timestamp. Changes in v2: - Move selected_timestamping_layer variable of the concerned patch. - Use sysfs_streq instead of strmcmp. - Use the PHY timestamp only if available. ==================== Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16net: ethtool: Add support for tsconfig command to get/set hwtstamp configKory Maincent
Introduce support for ETHTOOL_MSG_TSCONFIG_GET/SET ethtool netlink socket to read and configure hwtstamp configuration of a PHC provider. Note that simultaneous hwtstamp isn't supported; configuring a new one disables the previous setting. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16net: ethtool: tsinfo: Enhance tsinfo to support several hwtstamp by net topologyKory Maincent
Either the MAC or the PHY can provide hwtstamp, so we should be able to read the tsinfo for any hwtstamp provider. Enhance 'get' command to retrieve tsinfo of hwtstamp providers within a network topology. Add support for a specific dump command to retrieve all hwtstamp providers within the network topology, with added functionality for filtered dump to target a single interface. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16net: Add the possibility to support a selected hwtstamp in netdeviceKory Maincent
Introduce the description of a hwtstamp provider, mainly defined with a the hwtstamp source and the phydev pointer. Add a hwtstamp provider description within the netdev structure to allow saving the hwtstamp we want to use. This prepares for future support of an ethtool netlink command to select the desired hwtstamp provider. By default, the old API that does not support hwtstamp selectability is used, meaning the hwtstamp provider pointer is unset. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16net: Make net_hwtstamp_validate accessibleKory Maincent
Make the net_hwtstamp_validate function accessible in prevision to use it from ethtool to validate the hwtstamp configuration before setting it. Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16net: Make dev_get_hwtstamp_phylib accessibleKory Maincent
Make the dev_get_hwtstamp_phylib function accessible in prevision to use it from ethtool to read the hwtstamp current configuration. Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16Merge branch 'tls1.3-key-updates'David S. Miller
Sabrina Dubroca says: ==================== tls: implement key updates for TLS1.3 This adds support for receiving KeyUpdate messages (RFC 8446, 4.6.3 [1]). A sender transmits a KeyUpdate message and then changes its TX key. The receiver should react by updating its RX key before processing the next message. This patchset implements key updates by: 1. pausing decryption when a KeyUpdate message is received, to avoid attempting to use the old key to decrypt a record encrypted with the new key 2. returning -EKEYEXPIRED to syscalls that cannot receive the KeyUpdate message, until the rekey has been performed by userspace 3. passing the KeyUpdate message to userspace as a control message 4. allowing updates of the crypto_info via the TLS_TX/TLS_RX setsockopts This API has been tested with gnutls to make sure that it allows userspace libraries to implement key updates [2]. Thanks to Frantisek Krenzelok <fkrenzel@redhat.com> for providing the implementation in gnutls and testing the kernel patches. ======================================================================= Discussions around v2 of this patchset focused on how HW offload would interact with rekey. RX - The existing SW path will handle all records between the KeyUpdate message signaling the change of key and the new key becoming known to the kernel -- those will be queued encrypted, and decrypted in SW as they are read by userspace (once the key is provided, ie same as this patchset) - Call ->tls_dev_del + ->tls_dev_add immediately during setsockopt(TLS_RX) TX - After setsockopt(TLS_TX), switch to the existing SW path (not the current device_fallback) until we're able to re-enable HW offload - tls_device_sendmsg will call into tls_sw_sendmsg under lock_sock to avoid changing socket ops during the rekey while another thread might be waiting on the lock - We only re-enable HW offload (call ->tls_dev_add to install the new key in HW) once all records sent with the old key have been ACKed. At this point, all unacked records are SW-encrypted with the new key, and the old key is unused by both HW and retransmissions. - If there are no unacked records when userspace does setsockopt(TLS_TX), we can (try to) install the new key in HW immediately. - If yet another key has been provided via setsockopt(TLS_TX), we don't install intermediate keys, only the latest. - TCP notifies ktls of ACKs via the icsk_clean_acked callback. In case of a rekey, tls_icsk_clean_acked will record when all data sent with the most recent past key has been sent. The next call to sendmsg will install the new key in HW. - We close and push the current SW record before reenabling offload. If ->tls_dev_add fails to install the new key in HW, we stay in SW mode. We can add a counter to keep track of this. In addition: Because we can't change socket ops during a rekey, we'll also have to modify do_tls_setsockopt_conf to check ctx->tx_conf and only call either tls_set_device_offload or tls_set_sw_offload. RX already uses the same ops for both TLS_HW and TLS_SW, so we could switch between HW and SW mode on rekey. An alternative would be to have a common sendmsg which locks the socket and then calls the correct implementation. We'll need that anyway for the offload under rekey case, so that would only add a test to the SW path's ops (compared to the current code). That should allow us to simplify build_protos a bit, but might have a performance impact - we'll need to check it if we want to go that route. ======================================================================= Changes since v4: - add counter for received KeyUpdate messages - improve wording in the documentation - improve handling of bogus messages when looking for KeyUpdate's - some coding style clean ups Changes since v3: - rebase on top of net-next - rework tls_check_pending_rekey according to Jakub's feedback - add statistics for rekey: {RX,TX}REKEY{OK,ERROR} - some coding style clean ups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16selftests: tls: add rekey testsSabrina Dubroca
Test the kernel's ability to: - update the key (but not the version or cipher), only for TLS1.3 - pause decryption after receiving a KeyUpdate message, until a new RX key has been provided - reflect the pause/non-readable socket in poll() Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16selftests: tls: add key_generation argument to tls_crypto_info_initSabrina Dubroca
This allows us to generate different keys, so that we can test that rekey is using the correct one. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16docs: tls: document TLS1.3 key updatesSabrina Dubroca
Document the kernel's behavior and userspace expectations. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16tls: add counters for rekeySabrina Dubroca
This introduces 5 counters to keep track of key updates: Tls{Rx,Tx}Rekey{Ok,Error} and TlsRxRekeyReceived. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16tls: implement rekey for TLS1.3Sabrina Dubroca
This adds the possibility to change the key and IV when using TLS1.3. Changing the cipher or TLS version is not supported. Once we have updated the RX key, we can unblock the receive side. If the rekey fails, the context is unmodified and userspace is free to retry the update or close the socket. This change only affects tls_sw, since 1.3 offload isn't supported. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16tls: block decryption when a rekey is pendingSabrina Dubroca
When a TLS handshake record carrying a KeyUpdate message is received, all subsequent records will be encrypted with a new key. We need to stop decrypting incoming records with the old key, and wait until userspace provides a new key. Make a note of this in the RX context just after decrypting that record, and stop recvmsg/splice calls with EKEYEXPIRED until the new key is available. key_update_pending can't be combined with the existing bitfield, because we will read it locklessly in ->poll. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>