summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-10-29net: ibm: emac: use platform_get_irqRosen Penev
No need for irq_of_parse_and_map since we have platform_device. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29net: ibm: emac: use devm_platform_ioremap_resourceRosen Penev
No need to have a struct resource. Gets rid of the TODO. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29net: ibm: emac: use netif_receive_skb_listRosen Penev
Small rx improvement. Would use napi_gro_receive instead but that's a lot more involved than netif_receive_skb_list because of how the function is implemented. Before: > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 51556 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.04 sec 559 MBytes 467 Mbits/sec > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 48228 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.03 sec 558 MBytes 467 Mbits/sec > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 47600 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.04 sec 557 MBytes 466 Mbits/sec > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 37252 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.05 sec 559 MBytes 467 Mbits/sec After: > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 40786 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.05 sec 572 MBytes 478 Mbits/sec > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 52482 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.04 sec 571 MBytes 477 Mbits/sec > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 48370 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.04 sec 572 MBytes 478 Mbits/sec > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 46086 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.05 sec 571 MBytes 476 Mbits/sec > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 46062 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.04 sec 572 MBytes 478 Mbits/sec Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29Merge branch 'ipv4-convert-rtm_-new-del-addr-and-more-to-per-netns-rtnl'Paolo Abeni
Kuniyuki Iwashima says: ==================== ipv4: Convert RTM_{NEW,DEL}ADDR and more to per-netns RTNL. The IPv4 address hash table and GC are already namespacified. This series converts RTM_NEWADDR/RTM_DELADDR and some more RTNL users to per-netns RTNL. Changes: v2: * Add patch 1 to address sparse warning for CONFIG_DEBUG_NET_SMALL_RTNL=n * Add Eric's tags to patch 2-12 v1: https://lore.kernel.org/netdev/20241018012225.90409-1-kuniyu@amazon.com/ ==================== Link: https://patch.msgid.link/20241021183239.79741-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29ipv4: Convert devinet_ioctl to per-netns RTNL.Kuniyuki Iwashima
ioctl(SIOCGIFCONF) calls dev_ifconf() that operates on the current netns. Let's use per-netns RTNL helpers in dev_ifconf() and inet_gifconf(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29ipv4: Convert devinet_ioctl() to per-netns RTNL except for SIOCSIFFLAGS.Kuniyuki Iwashima
Basically, devinet_ioctl() operates on a single netns. However, ioctl(SIOCSIFFLAGS) will trigger the netdev notifier that could touch another netdev in different netns. Let's use per-netns RTNL helper in devinet_ioctl() and place ASSERT_RTNL() for SIOCSIFFLAGS. We will remove ASSERT_RTNL() once RTM_SETLINK and RTM_DELLINK are converted. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29ipv4: Convert devinet_sysctl_forward() to per-netns RTNL.Kuniyuki Iwashima
devinet_sysctl_forward() touches only a single netns. Let's use rtnl_trylock() and __in_dev_get_rtnl_net(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29rtnetlink: Define rtnl_net_trylock().Kuniyuki Iwashima
We will need the per-netns version of rtnl_trylock(). rtnl_net_trylock() calls __rtnl_net_lock() only when rtnl_trylock() successfully holds RTNL. When RTNL is removed, we will use mutex_trylock() for per-netns RTNL. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29ipv4: Convert check_lifetime() to per-netns RTNL.Kuniyuki Iwashima
Since commit 1675f385213e ("ipv4: Namespacify IPv4 address GC."), check_lifetime() works on a per-netns basis. Let's use rtnl_net_lock() and rtnl_net_dereference(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29ipv4: Convert RTM_DELADDR to per-netns RTNL.Kuniyuki Iwashima
Let's push down RTNL into inet_rtm_deladdr() as rtnl_net_lock(). Now, ip_mc_autojoin_config() is always called under per-netns RTNL, so ASSERT_RTNL() can be replaced with ASSERT_RTNL_NET(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29ipv4: Use per-netns RTNL helpers in inet_rtm_newaddr().Kuniyuki Iwashima
inet_rtm_to_ifa() and find_matching_ifa() are called under rtnl_net_lock(). __in_dev_get_rtnl() and in_dev_for_each_ifa_rtnl() there can use per-netns RTNL helpers. Let's define and use __in_dev_get_rtnl_net() and in_dev_for_each_ifa_rtnl_net(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29ipv4: Convert RTM_NEWADDR to per-netns RTNL.Kuniyuki Iwashima
The address hash table and GC are already namespacified. Let's push down RTNL into inet_rtm_newaddr() as rtnl_net_lock(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29ipv4: Don't allocate ifa for 0.0.0.0 in inet_rtm_newaddr().Kuniyuki Iwashima
When we pass 0.0.0.0 to __inet_insert_ifa(), it frees ifa and returns 0. We can do this check much earlier for RTM_NEWADDR even before allocating struct in_ifaddr. Let's move the validation to 1. inet_insert_ifa() for ioctl() 2. inet_rtm_newaddr() for RTM_NEWADDR Now, we can remove the same check in find_matching_ifa(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29ipv4: Factorise RTM_NEWADDR validation to inet_validate_rtm().Kuniyuki Iwashima
rtm_to_ifaddr() validates some attributes, looks up a netdev, allocates struct in_ifaddr, and validates IFA_CACHEINFO. There is no reason to delay IFA_CACHEINFO validation. We will push RTNL down to inet_rtm_newaddr(), and then we want to complete rtnetlink validation before rtnl_net_lock(). Let's factorise the validation parts. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29rtnetlink: Define RTNL_FLAG_DOIT_PERNET for per-netns RTNL doit().Kuniyuki Iwashima
We will push RTNL down to each doit() as rtnl_net_lock(). We can use RTNL_FLAG_DOIT_UNLOCKED to call doit() without RTNL, but doit() will still hold RTNL. Let's define RTNL_FLAG_DOIT_PERNET as an alias of RTNL_FLAG_DOIT_UNLOCKED. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29rtnetlink: Make per-netns RTNL dereference helpers to macro.Kuniyuki Iwashima
When CONFIG_DEBUG_NET_SMALL_RTNL is off, rtnl_net_dereference() is the static inline wrapper of rtnl_dereference() returning a plain (void *) pointer to make sure net is always evaluated as requested in [0]. But, it makes sparse complain [1] when the pointer has __rcu annotation: net/ipv4/devinet.c:674:47: sparse: warning: incorrect type in argument 2 (different address spaces) net/ipv4/devinet.c:674:47: sparse: expected void *p net/ipv4/devinet.c:674:47: sparse: got struct in_ifaddr [noderef] __rcu * Also, if we evaluate net as (void *) in a macro, then the compiler in turn fails to build due to -Werror=unused-value. #define rtnl_net_dereference(net, p) \ ({ \ (void *)net; \ rtnl_dereference(p); \ }) net/ipv4/devinet.c: In function ‘inet_rtm_deladdr’: ./include/linux/rtnetlink.h:154:17: error: statement with no effect [-Werror=unused-value] 154 | (void *)net; \ net/ipv4/devinet.c:674:21: note: in expansion of macro ‘rtnl_net_dereference’ 674 | (ifa = rtnl_net_dereference(net, *ifap)) != NULL; | ^~~~~~~~~~~~~~~~~~~~ Let's go back to the original simplest macro. Note that checkpatch complains about this approach, but it's one-shot and less noisy than the other two. WARNING: Argument 'net' is not used in function-like macro #76: FILE: include/linux/rtnetlink.h:142: +#define rtnl_net_dereference(net, p) \ + rtnl_dereference(p) Fixes: 844e5e7e656d ("rtnetlink: Add assertion helpers for per-netns RTNL.") Link: https://lore.kernel.org/netdev/20241004132145.7fd208e9@kernel.org/ [0] Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202410200325.SaEJmyZS-lkp@intel.com/ [1] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-28neighbour: use kvzalloc()/kvfree()Eric Dumazet
mm layer is providing convenient functions, we do not have to work around old limitations. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Gilad Naaman <gnaaman@drivenets.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241022150059.1345406-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28netlink: specs: Add missing phy-ntf command to ethtool specKory Maincent
ETHTOOL_MSG_PHY_NTF description is missing in the ethtool netlink spec. Add it to the spec. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20241022151418.875424-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28vsock: do not leave dangling sk pointer in vsock_create()Eric Dumazet
syzbot was able to trigger the following warning after recent core network cleanup. On error vsock_create() frees the allocated sk object, but sock_init_data() has already attached it to the provided sock object. We must clear sock->sk to avoid possible use-after-free later. WARNING: CPU: 0 PID: 5282 at net/socket.c:1581 __sock_create+0x897/0x950 net/socket.c:1581 Modules linked in: CPU: 0 UID: 0 PID: 5282 Comm: syz.2.43 Not tainted 6.12.0-rc2-syzkaller-00667-g53bac8330865 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 RIP: 0010:__sock_create+0x897/0x950 net/socket.c:1581 Code: 7f 06 01 65 48 8b 34 25 00 d8 03 00 48 81 c6 b0 08 00 00 48 c7 c7 60 0b 0d 8d e8 d4 9a 3c 02 e9 11 f8 ff ff e8 0a ab 0d f8 90 <0f> 0b 90 e9 82 fd ff ff 89 e9 80 e1 07 fe c1 38 c1 0f 8c c7 f8 ff RSP: 0018:ffffc9000394fda8 EFLAGS: 00010293 RAX: ffffffff89873c46 RBX: ffff888079f3c818 RCX: ffff8880314b9e00 RDX: 0000000000000000 RSI: 00000000ffffffed RDI: 0000000000000000 RBP: ffffffff8d3337f0 R08: ffffffff8987384e R09: ffffffff8989473a R10: dffffc0000000000 R11: fffffbfff203a276 R12: 00000000ffffffed R13: ffff888079f3c8c0 R14: ffffffff898736e7 R15: dffffc0000000000 FS: 00005555680ab500(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f22b11196d0 CR3: 00000000308c0000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> sock_create net/socket.c:1632 [inline] __sys_socket_create net/socket.c:1669 [inline] __sys_socket+0x150/0x3c0 net/socket.c:1716 __do_sys_socket net/socket.c:1730 [inline] __se_sys_socket net/socket.c:1728 [inline] __x64_sys_socket+0x7a/0x90 net/socket.c:1728 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f22b117dff9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fff56aec0e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000029 RAX: ffffffffffffffda RBX: 00007f22b1335f80 RCX: 00007f22b117dff9 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000028 RBP: 00007f22b11f0296 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f22b1335f80 R14: 00007f22b1335f80 R15: 00000000000012dd Fixes: 48156296a08c ("net: warn, if pf->create does not clear sock->sk on error") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20241022134819.1085254-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28net/mlx5: unique names for per device cachesSebastian Ott
Add the device name to the per device kmem_cache names to ensure their uniqueness. This fixes warnings like this: "kmem_cache of name 'mlx5_fs_fgs' already exists". Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Breno Leitao <leitao@debian.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241023134146.28448-1-sebott@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28Merge branch 'bonding-returns-detailed-error-about-xdp-failures'Jakub Kicinski
Hangbin Liu says: ==================== Bonding: returns detailed error about XDP failures Based on discussion[1], this patch set returns detailed error about XDP failures. And update bonding document about XDP supports. https://lore.kernel.org/8088f2a7-3ab1-4a1e-996d-c15703da13cc@blackwall.org ==================== Link: https://patch.msgid.link/20241021031211.814-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28Documentation: bonding: add XDP support explanationHangbin Liu
Add document about which modes have native XDP support. Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20241021031211.814-3-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28bonding: return detailed error when loading native XDP failsHangbin Liu
Bonding only supports native XDP for specific modes, which can lead to confusion for users regarding why XDP loads successfully at times and fails at others. This patch enhances error handling by returning detailed error messages, providing users with clearer insights into the specific reasons for the failure when loading native XDP. Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20241021031211.814-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28Merge branch 'mptcp-various-small-improvements'Jakub Kicinski
Matthieu Baerts says: ==================== mptcp: various small improvements The following patches are not related to each other. - Patch 1: Avoid sending advertisements on stale subflows, reducing risks on loosing them. - Patch 2: Annotate data-races around subflow->fully_established, using READ/WRITE_ONCE(). - Patch 3: A small clean-up on the PM side, avoiding a bit of duplicated code. - Patch 4: Use "Middlebox interference" MP_TCPRST code in reaction to a packet received without MPTCP options in the middle of a connection. ==================== Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-0-1ef02746504a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28mptcp: use "middlebox interference" RST when no DSSDavide Caratti
RFC8684 suggests use of "Middlebox interference (code 0x06)" in case of fully established subflow that carries data at TCP level with no DSS sub-option. This is generally the case when mpext is NULL or mpext->use_map is 0: use a dedicated value of 'mapping_status' and use it before closing the socket in subflow_check_data_avail(). Link: https://github.com/multipath-tcp/mptcp_net-next/issues/518 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-4-1ef02746504a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28mptcp: implement mptcp_pm_connection_closedGeliang Tang
The MPTCP path manager event handler mptcp_pm_connection_closed interface has been added in the commit 1b1c7a0ef7f3 ("mptcp: Add path manager interface") but it was an empty function from then on. With such name, it sounds good to invoke mptcp_event with the MPTCP_EVENT_CLOSED event type from it. It also removes a bit of duplicated code. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-3-1ef02746504a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28mptcp: annotate data-races around subflow->fully_establishedGang Yan
We introduce the same handling for potential data races with the 'fully_established' flag in subflow as previously done for msk->fully_established. Additionally, we make a crucial change: convert the subflow's 'fully_established' from 'bit_field' to 'bool' type. This is necessary because methods for avoiding data races don't work well with 'bit_field'. Specifically, the 'READ_ONCE' needs to know the size of the variable being accessed, which is not supported in 'bit_field'. Also, 'test_bit' expect the address of 'bit_field'. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/516 Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-2-1ef02746504a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28mptcp: pm: send ACK on non-stale subflowsMatthieu Baerts (NGI0)
If the subflow is considered as "staled", it is better to avoid it to send an ACK carrying an ADD_ADDR or RM_ADDR. Another subflow, if any, will then be selected. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-1-1ef02746504a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28Merge branch 'net-systemport-minor-io-macros-changes'Jakub Kicinski
Florian Fainelli says: ==================== net: systemport: Minor IO macros changes This patch series addresses the warning initially reported by Vladimir here: https://lore.kernel.org/all/20241014150139.927423-1-vladimir.oltean@nxp.com/ and follows on with proceeding with his suggestion the IO macros to the header file. ==================== Link: https://patch.msgid.link/20241021174935.57658-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28net: systemport: Move IO macros to header fileFlorian Fainelli
Move the BCM_SYSPORT_IO_MACRO() definition and its use to bcmsysport.h where it is more appropriate and where static inline helpers are acceptable. While at it, make sure that the macro 'offset' argument does not trigger a checkpatch warning due to possible argument re-use. Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241021174935.57658-3-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28net: systemport: Remove unused txchk accessorsFlorian Fainelli
Vladimir reported the following warning with clang-16 and W=1: warning: unused function 'txchk_readl' [-Wunused-function] BCM_SYSPORT_IO_MACRO(txchk, SYS_PORT_TXCHK_OFFSET); note: expanded from macro 'BCM_SYSPORT_IO_MACRO' warning: unused function 'txchk_writel' [-Wunused-function] note: expanded from macro 'BCM_SYSPORT_IO_MACRO' warning: unused function 'tbuf_readl' [-Wunused-function] BCM_SYSPORT_IO_MACRO(tbuf, SYS_PORT_TBUF_OFFSET); note: expanded from macro 'BCM_SYSPORT_IO_MACRO' warning: unused function 'tbuf_writel' [-Wunused-function] note: expanded from macro 'BCM_SYSPORT_IO_MACRO' The TXCHK and RBUF blocks are not being accessed, remove the IO macros used to access those blocks. No functional impact. Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241021174935.57658-2-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28selftest/tcp-ao: Add filter testsLeo Stone
Add tests that check if getsockopt(TCP_AO_GET_KEYS) returns the right keys when using different filters. Sample output: > # ok 114 filter keys: by sndid, rcvid, address > # ok 115 filter keys: by is_current > # ok 116 filter keys: by is_rnext > # ok 117 filter keys: by sndid, rcvid > # ok 118 filter keys: correct nkeys when in.nkeys < matches Acked-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Leo Stone <leocstone@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241021174652.6949-1-leocstone@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28net: amd8111e: Remove duplicate definition of PCI_VENDOR_ID_AMDYazen Ghannam
The AMD PCI vendor ID is already defined in <linux/pci_ids.h>. Remove this local definition as it is not needed. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20241021153825.2536819-1-yazen.ghannam@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28dt-bindings: nfc: nxp,nci: Document PN553 compatibleDanila Tikhonov
The PN553 is another NFC chip from NXP, document the compatible in the bindings. Signed-off-by: Danila Tikhonov <danila@jiaxyga.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20241020205615.211256-2-danila@jiaxyga.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28configs/debug: make sure PROVE_RCU_LIST=y takes effectJakub Kicinski
Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup") added CONFIG_PROVE_RCU_LIST=y to the common CI config, but RCU_EXPERT is not set, and it's a dependency for CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking indicate that it does catch bugs. Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241016011144.3058445-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28net: dsa: mv88e6xxx: fix unreleased fwnode_handle in setup_port()Javier Carrasco
'ports_fwnode' is initialized via device_get_named_child_node(), which requires a call to fwnode_handle_put() when the variable is no longer required to avoid leaking memory. Add the missing fwnode_handle_put() after 'ports_fwnode' has been used and is no longer required. Fixes: 94a2a84f5e9e ("net: dsa: mv88e6xxx: Support LED control") Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-28bareudp: Use pcpu stats to update rx_dropped counter.Guillaume Nault
Use the core_stats rx_dropped counter to avoid the cost of atomic increments. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-25wifi: ath12k: convert tasklet to BH workqueue for CE interruptsRaj Kumar Bhagat
Currently in Ath12k, tasklet is used to handle the BH context of CE interrupts. However the tasklet is marked deprecated and has some design flaws. To replace tasklets, BH workqueue support has been added. BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context. Hence, convert the tasklet to BH workqueue for handling CE interrupts in the BH context. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.1.1-00214-QCAHKSWPL_SILICONZ-1 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.3.1-00173-QCAHKSWPL_SILICONZ-1 Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Signed-off-by: Raj Kumar Bhagat <quic_rajkbhag@quicinc.com> Acked-by: Kalle Valo <kvalo@kernel.org> Link: https://patch.msgid.link/20241022072406.3231450-1-quic_rajkbhag@quicinc.com Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
2024-10-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Cross-merge networking fixes after downstream PR. No conflicts and no adjacent changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24Merge tag 'net-6.12-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfiler, xfrm and bluetooth. Oddly this includes a fix for a posix clock regression; in our previous PR we included a change there as a pre-requisite for networking one. That fix proved to be buggy and requires the follow-up included here. Thomas suggested we should send it, given we sent the buggy patch. Current release - regressions: - posix-clock: Fix unbalanced locking in pc_clock_settime() - netfilter: fix typo causing some targets not to load on IPv6 Current release - new code bugs: - xfrm: policy: remove last remnants of pernet inexact list Previous releases - regressions: - core: fix races in netdev_tx_sent_queue()/dev_watchdog() - bluetooth: fix UAF on sco_sock_timeout - eth: hv_netvsc: fix VF namespace also in synthetic NIC NETDEV_REGISTER event - eth: usbnet: fix name regression - eth: be2net: fix potential memory leak in be_xmit() - eth: plip: fix transmit path breakage Previous releases - always broken: - sched: deny mismatched skip_sw/skip_hw flags for actions created by classifiers - netfilter: bpf: must hold reference on net namespace - eth: virtio_net: fix integer overflow in stats - eth: bnxt_en: replace ptp_lock with irqsave variant - eth: octeon_ep: add SKB allocation failures handling in __octep_oq_process_rx() Misc: - MAINTAINERS: add Simon as an official reviewer" * tag 'net-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits) net: dsa: mv88e6xxx: support 4000ps cycle counter period net: dsa: mv88e6xxx: read cycle counter period from hardware net: dsa: mv88e6xxx: group cycle counter coefficients net: usb: qmi_wwan: add Fibocom FG132 0x0112 composition hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event net: dsa: microchip: disable EEE for KSZ879x/KSZ877x/KSZ876x Bluetooth: ISO: Fix UAF on iso_sock_timeout Bluetooth: SCO: Fix UAF on sco_sock_timeout Bluetooth: hci_core: Disable works on hci_unregister_dev posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime() r8169: avoid unsolicited interrupts net: sched: use RCU read-side critical section in taprio_dump() net: sched: fix use-after-free in taprio_change() net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions created by classifiers net: usb: usbnet: fix name regression mlxsw: spectrum_router: fix xa_store() error checking virtio_net: fix integer overflow in stats net: fix races in netdev_tx_sent_queue()/dev_watchdog() net: wwan: fix global oob in wwan_rtnl_policy netfilter: xtables: fix typo causing some targets not to load on IPv6 ...
2024-10-24Merge tag 'hid-for-linus-20241024' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: "Device-specific functionality quirks for Thinkpad X1 Gen3, Logitech Bolt and some Goodix touchpads (Bartłomiej Maryńczak, Hans de Goede and Kenneth Albanowski)" * tag 'hid-for-linus-20241024' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: lenovo: Add support for Thinkpad X1 Tablet Gen 3 keyboard HID: multitouch: Add quirk for Logitech Bolt receiver w/ Casa touchpad HID: i2c-hid: Delayed i2c resume wakeup for 0x0d42 Goodix touchpad
2024-10-24Merge tag 'loongarch-fixes-6.12-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Get correct cores_per_package for SMT systems, enable IRQ if do_ale() triggered in irq-enabled context, and fix some bugs about vDSO, memory managenent, hrtimer in KVM, etc" * tag 'loongarch-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: KVM: Mark hrtimer to expire in hard interrupt context LoongArch: Make KASAN usable for variable cpu_vabits LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space LoongArch: Don't crash in stack_top() for tasks without vDSO LoongArch: Set correct size for vDSO code mapping LoongArch: Enable IRQ if do_ale() triggered in irq-enabled context LoongArch: Get correct cores_per_package for SMT systems LoongArch: Use "Exception return address" to comment ERA
2024-10-24Merge tag 'probes-fixes-v6.12-rc4.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - objpool: Fix choosing allocation for percpu slots Fixes to allocate objpool's percpu slots correctly according to the GFP flag. It checks whether "any bit" in GFP_ATOMIC is set to choose the vmalloc source, but it should check "all bits" in GFP_ATOMIC flag is set, because GFP_ATOMIC is a combined flag. - tracing/probes: Fix MAX_TRACE_ARGS limit handling If more than MAX_TRACE_ARGS are passed for creating a probe event, the entries over MAX_TRACE_ARG in trace_arg array are not initialized. Thus if the kernel accesses those entries, it crashes. This rejects creating event if the number of arguments is over MAX_TRACE_ARGS. - tracing: Consider the NUL character when validating the event length A strlen() is used when parsing the event name, and the original code does not consider the terminal null byte. Thus it can pass the name one byte longer than the buffer. This fixes to check it correctly. * tag 'probes-fixes-v6.12-rc4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Consider the NULL character when validating the event length tracing/probes: Fix MAX_TRACE_ARGS limit handling objpool: fix choosing allocation for percpu slots
2024-10-24Merge tag 'for-6.12-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - mount option fixes: - fix handling of compression mount options on remount - reject rw remount in case there are options that don't work in read-write mode (like rescue options) - fix zone accounting of unusable space - fix in-memory corruption when merging extent maps - fix delalloc range locking for sector < page - use more convenient default value of drop subtree threshold, clean more subvolumes without the fallback to marking quotas inconsistent - fix smatch warning about incorrect value passed to ERR_PTR * tag 'for-6.12-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix passing 0 to ERR_PTR in btrfs_search_dir_index_item() btrfs: reject ro->rw reconfiguration if there are hard ro requirements btrfs: fix read corruption due to race with extent map merging btrfs: fix the delalloc range locking if sector size < page size btrfs: qgroup: set a more sane default value for subtree drop threshold btrfs: clear force-compress on remount when compress mount option is given btrfs: zoned: fix zone unusable accounting for freed reserved extent
2024-10-24Merge tag 'jfs-6.12-rc5' of github.com:kleikamp/linux-shaggyLinus Torvalds
Pull jfs fix from David Kleikamp: "Fix a regression introduced in 6.12-rc1" * tag 'jfs-6.12-rc5' of github.com:kleikamp/linux-shaggy: jfs: Fix sanity check in dbMount
2024-10-24Merge tag 'bcachefs-2024-10-22' of https://github.com/koverstreet/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Lots of hotfixes: - transaction restart injection has been shaking out a few things - fix a data corruption in the buffered write path on -ENOSPC, found by xfstests generic/299 - Some small show_options fixes - Repair mismatches in inode hash type, seed: different snapshot versions of an inode must have the same hash/type seed, used for directory entries and xattrs. We were checking the hash seed, but not the type, and a user contributed a filesystem where the hash type on one inode had somehow been flipped; these fixes allow his filesystem to repair. Additionally, the hash type flip made some directory entries invisible, which were then recreated by userspace; so the hash check code now checks for duplicate non dangling dirents, and renames one of them if necessary. - Don't use wait_event_interruptible() in recovery: this fixes some filesystems failing to mount with -ERESTARTSYS - Workaround for kvmalloc not supporting > INT_MAX allocations, causing an -ENOMEM when allocating the sorted array of journal keys: this allows a 75 TB filesystem to mount - Make sure bch_inode_unpacked.bi_snapshot is set in the old inode compat path: this alllows Marcin's filesystem (in use since before 6.7) to repair and mount" * tag 'bcachefs-2024-10-22' of https://github.com/koverstreet/bcachefs: (26 commits) bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path bcachefs: Mark more errors as AUTOFIX bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations bcachefs: Don't use wait_event_interruptible() in recovery bcachefs: Fix __bch2_fsck_err() warning bcachefs: fsck: Improve hash_check_key() bcachefs: bch2_hash_set_or_get_in_snapshot() bcachefs: Repair mismatches in inode hash seed, type bcachefs: Add hash seed, type to inode_to_text() bcachefs: INODE_STR_HASH() for bch_inode_unpacked bcachefs: Run in-kernel offline fsck without ratelimit errors bcachefs: skip mount option handle for empty string. bcachefs: fix incorrect show_options results bcachefs: Fix data corruption on -ENOSPC in buffered write path bcachefs: bch2_folio_reservation_get_partial() is now better behaved bcachefs: fix disk reservation accounting in bch2_folio_reservation_get() bcachefS: ec: fix data type on stripe deletion bcachefs: Don't use commit_do() unnecessarily bcachefs: handle restarts in bch2_bucket_io_time_reset() bcachefs: fix restart handling in __bch2_resume_logged_op_finsert() ...
2024-10-24Revert "9p: Enable multipage folios"Dominique Martinet
This reverts commit 1325e4a91a405f88f1b18626904d37860a4f9069. using multipage folios apparently break some madvise operations like MADV_PAGEOUT which do not reliably unload the specified page anymore, Revert the patch until that is figured out. Reported-by: Andrii Nakryiko <andrii@kernel.org> Fixes: 1325e4a91a40 ("9p: Enable multipage folios") Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-10-24selftests: tls: add a selftest for wrapping rec_seqSabrina Dubroca
Set the initial rec_seq to 0xffffffffffffffff so that it wraps immediately. The send() call should fail with EBADMSG. A bug in this code was fixed in commit cfaa80c91f6f ("net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict()"). Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20775fcfd0371422921ee60a42de170c0398ac10.1729244987.git.sd@queasysnail.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24Merge branch 'phonet-convert-all-doit-and-dumpit-to-rcu'Paolo Abeni
Kuniyuki Iwashima says: ==================== phonet: Convert all doit() and dumpit() to RCU. addr_doit() and route_doit() access only phonet_device_list(dev_net(dev)) and phonet_pernet(dev_net(dev))->routes, respectively. Each per-netns struct has its dedicated mutex, and RTNL also protects the structs. __dev_change_net_namespace() has synchronize_net(), so we have two options to convert addr_doit() and route_doit(). 1. Use per-netns RTNL 2. Use RCU and convert each struct mutex to spinlock_t As RCU is preferable, this series converts all PF_PHONET's doit() and dumpit() to RCU. 4 doit()s and 1 dumpit() are now converted to RCU, 70 doit()s and 28 dumpit()s are still under RTNL. ==================== Link: https://patch.msgid.link/20241017183140.43028-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24phonet: Don't hold RTNL for route_doit().Kuniyuki Iwashima
Now only __dev_get_by_index() depends on RTNL in route_doit(). Let's use dev_get_by_index_rcu() and register route_doit() with RTNL_FLAG_DOIT_UNLOCKED. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>