summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-01-31selftests: net: add missing config for pmtu.sh testsPaolo Abeni
The mentioned test uses a few Kconfig still missing the net config, add them. Before: # Error: Specified qdisc kind is unknown. # Error: Specified qdisc kind is unknown. # Error: Qdisc not classful. # We have an error talking to the kernel # Error: Qdisc not classful. # We have an error talking to the kernel # policy_routing not supported # TEST: ICMPv4 with DSCP and ECN: PMTU exceptions [SKIP] After: # TEST: ICMPv4 with DSCP and ECN: PMTU exceptions [ OK ] Fixes: ec730c3e1f0e ("selftest: net: Test IPv4 PMTU exceptions with DSCP and ECN") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/8d27bf6762a5c7b3acc457d6e6872c533040f9c1.1706635101.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31Merge branch 'pds_core-various-fixes'Jakub Kicinski
Brett Creeley says: ==================== pds_core: Various fixes This series includes the following changes: There can be many users of the pds_core's adminq. This includes pds_core's uses and any clients that depend on it. When the pds_core device goes through a reset for any reason the adminq is freed and reconfigured. There are some gaps in the current implementation that will cause crashes during reset if any of the previously mentioned users of the adminq attempt to use it after it's been freed. Issues around how resets are handled, specifically regarding the driver's error handlers. Originally these patches were aimed at net-next, but it was requested to push the fixes patches to net. The original patches can be found here: https://lore.kernel.org/netdev/20240126174255.17052-1-brett.creeley@amd.com/ Also, the Reviewed-by tags were left in place from net-next reviews as the patches didn't change. ==================== Link: https://lore.kernel.org/r/20240129234035.69802-1-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31pds_core: Rework teardown/setup flow to be more commonBrett Creeley
Currently the teardown/setup flow for driver probe/remove is quite a bit different from the reset flows in pdsc_fw_down()/pdsc_fw_up(). One key piece that's missing are the calls to pci_alloc_irq_vectors() and pci_free_irq_vectors(). The pcie reset case is calling pci_free_irq_vectors() on reset_prepare, but not calling the corresponding pci_alloc_irq_vectors() on reset_done. This is causing unexpected/unwanted interrupt behavior due to the adminq interrupt being accidentally put into legacy interrupt mode. Also, the pci_alloc_irq_vectors()/pci_free_irq_vectors() functions are being called directly in probe/remove respectively. Fix this inconsistency by making the following changes: 1. Always call pdsc_dev_init() in pdsc_setup(), which calls pci_alloc_irq_vectors() and get rid of the now unused pds_dev_reinit(). 2. Always free/clear the pdsc->intr_info in pdsc_teardown() since this structure will get re-alloced in pdsc_setup(). 3. Move the calls of pci_free_irq_vectors() to pdsc_teardown() since pci_alloc_irq_vectors() will always be called in pdsc_setup()->pdsc_dev_init() for both the probe/remove and reset flows. 4. Make sure to only create the debugfs "identity" entry when it doesn't already exist, which it will in the reset case because it's already been created in the initial call to pdsc_dev_init(). Fixes: ffa55858330f ("pds_core: implement pci reset handlers") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240129234035.69802-7-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31pds_core: Clear BARs on resetBrett Creeley
During reset the BARs might be accessed when they are unmapped. This can cause unexpected issues, so fix it by clearing the cached BAR values so they are not accessed until they are re-mapped. Also, make sure any places that can access the BARs when they are NULL are prevented. Fixes: 49ce92fbee0b ("pds_core: add FW update feature to devlink") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240129234035.69802-6-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31pds_core: Prevent race issues involving the adminqBrett Creeley
There are multiple paths that can result in using the pdsc's adminq. [1] pdsc_adminq_isr and the resulting work from queue_work(), i.e. pdsc_work_thread()->pdsc_process_adminq() [2] pdsc_adminq_post() When the device goes through reset via PCIe reset and/or a fw_down/fw_up cycle due to bad PCIe state or bad device state the adminq is destroyed and recreated. A NULL pointer dereference can happen if [1] or [2] happens after the adminq is already destroyed. In order to fix this, add some further state checks and implement reference counting for adminq uses. Reference counting was used because multiple threads can attempt to access the adminq at the same time via [1] or [2]. Additionally, multiple clients (i.e. pds-vfio-pci) can be using [2] at the same time. The adminq_refcnt is initialized to 1 when the adminq has been allocated and is ready to use. Users/clients of the adminq (i.e. [1] and [2]) will increment the refcnt when they are using the adminq. When the driver goes into a fw_down cycle it will set the PDSC_S_FW_DEAD bit and then wait for the adminq_refcnt to hit 1. Setting the PDSC_S_FW_DEAD before waiting will prevent any further adminq_refcnt increments. Waiting for the adminq_refcnt to hit 1 allows for any current users of the adminq to finish before the driver frees the adminq. Once the adminq_refcnt hits 1 the driver clears the refcnt to signify that the adminq is deleted and cannot be used. On the fw_up cycle the driver will once again initialize the adminq_refcnt to 1 allowing the adminq to be used again. Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240129234035.69802-5-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31pds_core: Use struct pdsc for the pdsc_adminq_isr private dataBrett Creeley
The initial design for the adminq interrupt was done based on client drivers having their own adminq and adminq interrupt. So, each client driver's adminq isr would use their specific adminqcq for the private data struct. For the time being the design has changed to only use a single adminq for all clients. So, instead use the struct pdsc for the private data to simplify things a bit. This also has the benefit of not dereferencing the adminqcq to access the pdsc struct when the PDSC_S_STOPPING_DRIVER bit is set and the adminqcq has actually been cleared/freed. Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240129234035.69802-4-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31pds_core: Cancel AQ work on teardownBrett Creeley
There is a small window where pdsc_work_thread() calls pdsc_process_adminq() and pdsc_process_adminq() passes the PDSC_S_STOPPING_DRIVER check and starts to process adminq/notifyq work and then the driver starts a fw_down cycle. This could cause some undefined behavior if the notifyqcq/adminqcq are free'd while pdsc_process_adminq() is running. Use cancel_work_sync() on the adminqcq's work struct to make sure any pending work items are cancelled and any in progress work items are completed. Also, make sure to not call cancel_work_sync() if the work item has not be initialized. Without this, traces will happen in cases where a reset fails and teardown is called again or if reset fails and the driver is removed. Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240129234035.69802-3-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31pds_core: Prevent health thread from running during reset/removeBrett Creeley
The PCIe reset handlers can run at the same time as the health thread. This can cause the health thread to stomp on the PCIe reset. Fix this by preventing the health thread from running while a PCIe reset is happening. As part of this use timer_shutdown_sync() during reset and remove to make sure the timer doesn't ever get rearmed. Fixes: ffa55858330f ("pds_core: implement pci reset handlers") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240129234035.69802-2-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31af_unix: fix lockdep positive in sk_diag_dump_icons()Eric Dumazet
syzbot reported a lockdep splat [1]. Blamed commit hinted about the possible lockdep violation, and code used unix_state_lock_nested() in an attempt to silence lockdep. It is not sufficient, because unix_state_lock_nested() is already used from unix_state_double_lock(). We need to use a separate subclass. This patch adds a distinct enumeration to make things more explicit. Also use swap() in unix_state_double_lock() as a clean up. v2: add a missing inline keyword to unix_state_lock_nested() [1] WARNING: possible circular locking dependency detected 6.8.0-rc1-syzkaller-00356-g8a696a29c690 #0 Not tainted syz-executor.1/2542 is trying to acquire lock: ffff88808b5df9e8 (rlock-AF_UNIX){+.+.}-{2:2}, at: skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863 but task is already holding lock: ffff88808b5dfe70 (&u->lock/1){+.+.}-{2:2}, at: unix_dgram_sendmsg+0xfc7/0x2200 net/unix/af_unix.c:2089 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&u->lock/1){+.+.}-{2:2}: lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 _raw_spin_lock_nested+0x31/0x40 kernel/locking/spinlock.c:378 sk_diag_dump_icons net/unix/diag.c:87 [inline] sk_diag_fill+0x6ea/0xfe0 net/unix/diag.c:157 sk_diag_dump net/unix/diag.c:196 [inline] unix_diag_dump+0x3e9/0x630 net/unix/diag.c:220 netlink_dump+0x5c1/0xcd0 net/netlink/af_netlink.c:2264 __netlink_dump_start+0x5d7/0x780 net/netlink/af_netlink.c:2370 netlink_dump_start include/linux/netlink.h:338 [inline] unix_diag_handler_dump+0x1c3/0x8f0 net/unix/diag.c:319 sock_diag_rcv_msg+0xe3/0x400 netlink_rcv_skb+0x1df/0x430 net/netlink/af_netlink.c:2543 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x7e6/0x980 net/netlink/af_netlink.c:1367 netlink_sendmsg+0xa37/0xd70 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] sock_write_iter+0x39a/0x520 net/socket.c:1160 call_write_iter include/linux/fs.h:2085 [inline] new_sync_write fs/read_write.c:497 [inline] vfs_write+0xa74/0xca0 fs/read_write.c:590 ksys_write+0x1a0/0x2c0 fs/read_write.c:643 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b -> #0 (rlock-AF_UNIX){+.+.}-{2:2}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x1909/0x5ab0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162 skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863 unix_dgram_sendmsg+0x15d9/0x2200 net/unix/af_unix.c:2112 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x592/0x890 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmmsg+0x3b2/0x730 net/socket.c:2724 __do_sys_sendmmsg net/socket.c:2753 [inline] __se_sys_sendmmsg net/socket.c:2750 [inline] __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2750 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&u->lock/1); lock(rlock-AF_UNIX); lock(&u->lock/1); lock(rlock-AF_UNIX); *** DEADLOCK *** 1 lock held by syz-executor.1/2542: #0: ffff88808b5dfe70 (&u->lock/1){+.+.}-{2:2}, at: unix_dgram_sendmsg+0xfc7/0x2200 net/unix/af_unix.c:2089 stack backtrace: CPU: 1 PID: 2542 Comm: syz-executor.1 Not tainted 6.8.0-rc1-syzkaller-00356-g8a696a29c690 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 check_noncircular+0x366/0x490 kernel/locking/lockdep.c:2187 check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x1909/0x5ab0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162 skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863 unix_dgram_sendmsg+0x15d9/0x2200 net/unix/af_unix.c:2112 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x592/0x890 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmmsg+0x3b2/0x730 net/socket.c:2724 __do_sys_sendmmsg net/socket.c:2753 [inline] __se_sys_sendmmsg net/socket.c:2750 [inline] __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2750 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b RIP: 0033:0x7f26d887cda9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f26d95a60c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00007f26d89abf80 RCX: 00007f26d887cda9 RDX: 000000000000003e RSI: 00000000200bd000 RDI: 0000000000000004 RBP: 00007f26d88c947a R08: 0000000000000000 R09: 0000000000000000 R10: 00000000000008c0 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007f26d89abf80 R15: 00007ffcfe081a68 Fixes: 2aac7a2cb0d9 ("unix_diag: Pending connections IDs NLA") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240130184235.1620738-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31Merge branch 'selftests-net-a-couple-of-typos-fixes-in-key-management-rst-tests'Jakub Kicinski
Dmitry Safonov says: ==================== selftests/net: A couple of typos fixes in key-management/rst tests Two typo fixes, noticed by Mohammad's review. And a fix for an issue that got uncovered. v1: https://lore.kernel.org/r/20240118-tcp-ao-test-key-mgmt-v1-0-3583ca147113@arista.com Signed-off-by: Dmitry Safonov <dima@arista.com> ==================== Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-0-d190430a6c60@arista.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31selftests/net: Repair RST passive reset selftestDmitry Safonov
Currently, the test is racy and seems to not pass anymore. In order to rectify it, aim on TCP_TW_RST. Doesn't seem way too good with this sleep() part, but it seems as a reasonable compromise for the test. There is a plan in-line comment on how-to improve it, going to do it on the top, at this moment I want it to run on netdev/patchwork selftests dashboard. It also slightly changes tcp_ao-lib in order to get SO_ERROR propagated to test_client_verify() return value. Fixes: c6df7b2361d7 ("selftests/net: Add TCP-AO RST test") Signed-off-by: Dmitry Safonov <dima@arista.com> Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-3-d190430a6c60@arista.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31selftests/net: Rectify key counters checksDmitry Safonov
As the names of (struct test_key) members didn't reflect whether the key was used for TX or RX, the verification for the counters was done incorrectly for asymmetrical selftests. Rename these with _tx appendix and fix checks in verify_counters(). While at it, as the checks are now correct, introduce skip_counters_checks, which is intended for tests where it's expected that a key that was set with setsockopt(sk, IPPROTO_TCP, TCP_AO_INFO, ...) might had no chance of getting used on the wire. Fixes the following failures, exposed by the previous commit: > not ok 51 server: Check current != rnext keys set before connect(): Counter pkt_good was expected to increase 0 => 0 for key 132:5 > not ok 52 server: Check current != rnext keys set before connect(): Counter pkt_good was not expected to increase 0 => 21 for key 137:10 > > not ok 63 server: Check current flapping back on peer's RnextKey request: Counter pkt_good was expected to increase 0 => 0 for key 132:5 > not ok 64 server: Check current flapping back on peer's RnextKey request: Counter pkt_good was not expected to increase 0 => 40 for key 137:10 Cc: Mohammad Nassiri <mnassiri@ciena.com> Fixes: 3c3ead555648 ("selftests/net: Add TCP-AO key-management test") Signed-off-by: Dmitry Safonov <dima@arista.com> Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-2-d190430a6c60@arista.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31selftests/net: Argument value mismatch when calling verify_counters()Mohammad Nassiri
The end_server() function only operates in the server thread and always takes an accept socket instead of a listen socket as its input argument. To align with this, invert the boolean values used when calling verify_counters() within the end_server() function. As a result of this typo, the test didn't correctly check for the non-symmetrical scenario, where i.e. peer-A uses a key <100:200> to send data, but peer-B uses another key <105:205> to send its data. So, in simple words, different keys for TX and RX. Fixes: 3c3ead555648 ("selftests/net: Add TCP-AO key-management test") Signed-off-by: Mohammad Nassiri <mnassiri@ciena.com> Link: https://lore.kernel.org/all/934627c5-eebb-4626-be23-cfb134c01d1a@arista.com/ [amended 'Fixes' tag, added the issue description and carried-over to lkml] Signed-off-by: Dmitry Safonov <dima@arista.com> Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-1-d190430a6c60@arista.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31net: dsa: mv88e6xxx: Fix failed probe due to unsupported C45 readsAndrew Lunn
Not all mv88e6xxx device support C45 read/write operations. Those which do not return -EOPNOTSUPP. However, when phylib scans the bus, it considers this fatal, and the probe of the MDIO bus fails, which in term causes the mv88e6xxx probe as a whole to fail. When there is no device on the bus for a given address, the pull up resistor on the data line results in the read returning 0xffff. The phylib core code understands this when scanning for devices on the bus. C45 allows multiple devices to be supported at one address, so phylib will perform a few reads at each address, so although thought not the most efficient solution, it is a way to avoid fatal errors. Make use of this as a minimal fix for stable to fix the probing problems. Follow up patches will rework how C45 operates to make it similar to C22 which considers -ENODEV as a none-fatal, and swap mv88e6xxx to using this. Cc: stable@vger.kernel.org Fixes: 743a19e38d02 ("net: dsa: mv88e6xxx: Separate C22 and C45 transactions") Reported-by: Tim Menninger <tmenninger@purestorage.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240129224948.1531452-1-andrew@lunn.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31net: ipv4: fix a memleak in ip_setup_corkZhipeng Lu
When inetdev_valid_mtu fails, cork->opt should be freed if it is allocated in ip_setup_cork. Otherwise there could be a memleak. Fixes: 501a90c94510 ("inet: protect against too small mtu values.") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240129091017.2938835-1-alexious@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31MAINTAINERS: Drop unreachable reviewer for Qualcomm ETHQOS ethernet driverAndrew Halaney
Bhupesh's email responds indicating they've changed employers and with no new contact information. Let's drop the line from MAINTAINERS to avoid getting the same response over and over. Signed-off-by: Andrew Halaney <ahalaney@redhat.com> Link: https://lore.kernel.org/r/20240129-remove-dwmac-qcom-ethqos-reviewer-v1-1-2645eab61451@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectationsPablo Neira Ayuso
- Disallow families other than NFPROTO_{IPV4,IPV6,INET}. - Disallow layer 4 protocol with no ports, since destination port is a mandatory attribute for this object. Fixes: 857b46027d6f ("netfilter: nft_ct: add ct expectations support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-31netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting loggerPablo Neira Ayuso
Module reference is bumped for each user, this should not ever happen. But BUG_ON check should use rcu_access_pointer() instead. If this ever happens, do WARN_ON_ONCE() instead of BUG_ON() and consolidate pointer check under the rcu read side lock section. Fixes: fab4085f4e24 ("netfilter: log: nf_log_packet() as real unified interface") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-31netfilter: ipset: fix performance regression in swap operationJozsef Kadlecsik
The patch "netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test", commit 28628fa9 fixes a race condition. But the synchronize_rcu() added to the swap function unnecessarily slows it down: it can safely be moved to destroy and use call_rcu() instead. Eric Dumazet pointed out that simply calling the destroy functions as rcu callback does not work: sets with timeout use garbage collectors which need cancelling at destroy which can wait. Therefore the destroy functions are split into two: cancelling garbage collectors safely at executing the command received by netlink and moving the remaining part only into the rcu callback. Link: https://lore.kernel.org/lkml/C0829B10-EAA6-4809-874E-E1E9C05A8D84@automattic.com/ Fixes: 28628fa952fe ("netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test") Reported-by: Ale Crismani <ale.crismani@automattic.com> Reported-by: David Wang <00107082@163.com> Tested-by: David Wang <00107082@163.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-31netfilter: conntrack: check SCTP_CID_SHUTDOWN_ACK for vtag setting in sctp_newXin Long
The annotation says in sctp_new(): "If it is a shutdown ack OOTB packet, we expect a return shutdown complete, otherwise an ABORT Sec 8.4 (5) and (8)". However, it does not check SCTP_CID_SHUTDOWN_ACK before setting vtag[REPLY] in the conntrack entry(ct). Because of that, if the ct in Router disappears for some reason in [1] with the packet sequence like below: Client > Server: sctp (1) [INIT] [init tag: 3201533963] Server > Client: sctp (1) [INIT ACK] [init tag: 972498433] Client > Server: sctp (1) [COOKIE ECHO] Server > Client: sctp (1) [COOKIE ACK] Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057809] Server > Client: sctp (1) [SACK] [cum ack 3075057809] Server > Client: sctp (1) [HB REQ] (the ct in Router disappears somehow) <-------- [1] Client > Server: sctp (1) [HB ACK] Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057810] Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057810] Client > Server: sctp (1) [HB REQ] Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057810] Client > Server: sctp (1) [HB REQ] Client > Server: sctp (1) [ABORT] when processing HB ACK packet in Router it calls sctp_new() to initialize the new ct with vtag[REPLY] set to HB_ACK packet's vtag. Later when sending DATA from Client, all the SACKs from Server will get dropped in Router, as the SACK packet's vtag does not match vtag[REPLY] in the ct. The worst thing is the vtag in this ct will never get fixed by the upcoming packets from Server. This patch fixes it by checking SCTP_CID_SHUTDOWN_ACK before setting vtag[REPLY] in the ct in sctp_new() as the annotation says. With this fix, it will leave vtag[REPLY] in ct to 0 in the case above, and the next HB REQ/ACK from Server is able to fix the vtag as its value is 0 in nf_conntrack_sctp_packet(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-31netfilter: nf_tables: restrict tunnel object to NFPROTO_NETDEVPablo Neira Ayuso
Bail out on using the tunnel dst template from other than netdev family. Add the infrastructure to check for the family in objects. Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-31netfilter: conntrack: correct window scaling with retransmitted SYNRyan Schaefer
commit c7aab4f17021 ("netfilter: nf_conntrack_tcp: re-init for syn packets only") introduces a bug where SYNs in ORIGINAL direction on reused 5-tuple result in incorrect window scale negotiation. This commit merged the SYN re-initialization and simultaneous open or SYN retransmits cases. Merging this block added the logic in tcp_init_sender() that performed window scale negotiation to the retransmitted syn case. Previously. this would only result in updating the sender's scale and flags. After the merge the additional logic results in improperly clearing the scale in ORIGINAL direction before any packets in the REPLY direction are received. This results in packets incorrectly being marked invalid for being out-of-window. This can be reproduced with the following trace: Packet Sequence: > Flags [S], seq 1687765604, win 62727, options [.. wscale 7], length 0 > Flags [S], seq 1944817196, win 62727, options [.. wscale 7], length 0 In order to fix the issue, only evaluate window negotiation for packets in the REPLY direction. This was tested with simultaneous open, fast open, and the above reproduction. Fixes: c7aab4f17021 ("netfilter: nf_conntrack_tcp: re-init for syn packets only") Signed-off-by: Ryan Schaefer <ryanschf@amazon.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-31Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Six small fixes. Five are obvious and in drivers. The last one is a core fix to remove the host lock acquisition and release, caused by a dynamic check of host_busy, in the error handling loop which has been reported to cause lockups" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: storvsc: Fix ring buffer size calculation scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler scsi: MAINTAINERS: Update ibmvscsi_tgt maintainer scsi: initio: Remove redundant variable 'rb' scsi: virtio_scsi: Remove duplicate check if queue is broken scsi: isci: Fix an error code problem in isci_io_request_build()
2024-01-31selftests: net: add missing config for GENEVEMatthias May
l2_tos_ttl_inherit.sh verifies the inheritance of tos and ttl for GRETAP, VXLAN and GENEVE. Before testing it checks if the required module is available and if not skips the tests accordingly. Currently only GRETAP and VXLAN are tested because the GENEVE module is missing. Fixes: b690842d12fd ("selftests/net: test l2 tunnel TOS/TTL inheriting") Signed-off-by: Matthias May <matthias.may@westermo.com> Link: https://lore.kernel.org/r/20240130101157.196006-1-matthias.may@westermo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31kconfig: initialize sym->curr.tri to 'no' for all symbol types againMasahiro Yamada
Geert Uytterhoeven reported that commit 4e244c10eab3 ("kconfig: remove unneeded symbol_empty variable") changed the default value of CONFIG_LOG_CPU_MAX_BUF_SHIFT from 12 to 0. As it turned out, this is an undefined behavior because sym_calc_value() stopped setting the sym->curr.tri field for 'int', 'hex', and 'string' symbols. This commit restores the original behavior, where 'int', 'hex', 'string' symbols are interpreted as false if used in boolean contexts. CONFIG_LOG_CPU_MAX_BUF_SHIFT will default to 12 again, irrespective of CONFIG_BASE_SMALL. Presumably, this is not the intended behavior, as already reported [1], but this is another issue that should be addressed by a separate patch. [1]: https://lore.kernel.org/all/f6856be8-54b7-0fa0-1d17-39632bf29ada@oracle.com/ Fixes: 4e244c10eab3 ("kconfig: remove unneeded symbol_empty variable") Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Closes: https://lore.kernel.org/all/CAMuHMdWm6u1wX7efZQf=2XUAHascps76YQac6rdnQGhc8nop_Q@mail.gmail.com/ Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-01-31kbuild: rpm-pkg: simplify installkernel %postJose Ignacio Tornos Martinez
The new installkernel application that is now included in systemd-udev package allows installation although destination files are already present in the boot directory of the kernel package, but is failing with the implemented workaround for the old installkernel application from grubby package. For the new installkernel application, as Davide says: <<The %post currently does a shuffling dance before calling installkernel. This isn't actually necessary afaict, and the current implementation ends up triggering downstream issues such as https://github.com/systemd/systemd/issues/29568 This commit simplifies the logic to remove the shuffling. For reference, the original logic was added in commit 3c9c7a14b627("rpm-pkg: add %post section to create initramfs and grub hooks").>> But we need to keep the old behavior as well, because the old installkernel application from grubby package, does not allow this simplification and we need to be backward compatible to avoid issues with the different packages. Mimic Fedora shipping process and store vmlinuz, config amd System.map in the module directory instead of the boot directory. In this way, we will avoid the commented problem for all the cases, because the new destination files are not going to exist in the boot directory of the kernel package. Replace installkernel tool with kernel-install tool, because the latter is more complete. Besides, after installkernel tool execution, check to complete if the correct package files vmlinuz, System.map and config files are present in /boot directory, and if necessary, copy manually for install operation. In this way, take into account if files were not previously copied from /usr/lib/kernel/install.d/* scripts and if the suitable files for the requested package are present (it could be others if the rpm files were replace with a new pacakge with the same release and a different build). Tested with Fedora 38, Fedora 39, RHEL 9, Oracle Linux 9.3, openSUSE Tumbleweed and openMandrive ROME, using dnf/zypper and rpm tools. cc: stable@vger.kernel.org Co-Developed-by: Davide Cavalca <dcavalca@meta.com> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-01-31kbuild: Replace tabs with spaces when followed by conditionalsDmitry Goncharov
This is needed for the future (post make-4.4.1) versions of gnu make. Starting from https://git.savannah.gnu.org/cgit/make.git/commit/?id=07fcee35f058a876447c8a021f9eb1943f902534 gnu make won't allow conditionals to follow recipe prefix. For example there is a tab followed by ifeq on line 324 in the root Makefile. With the new make this conditional causes the following $ make cpu.o /home/dgoncharov/src/linux-kbuild/Makefile:2063: *** missing 'endif'. Stop. make: *** [Makefile:240: __sub-make] Error 2 This patch replaces tabs followed by conditionals with 8 spaces. See https://savannah.gnu.org/bugs/?64185 and https://savannah.gnu.org/bugs/?64259 for details. Signed-off-by: Dmitry Goncharov <dgoncharov@users.sf.net> Reported-by: Martin Dorey <martin.dorey@hitachivantara.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-01-31modpost: avoid using the alias attributeMasahiro Yamada
Aiden Leong reported modpost fails to build on macOS since commit 16a473f60edc ("modpost: inform compilers that fatal() never returns"): scripts/mod/modpost.c:93:21: error: aliases are not supported on darwin Nathan's research indicates that Darwin seems to support weak aliases at least [1]. Although the situation might be improved in future Clang versions, we can achieve a similar outcome without relying on it. This commit makes fatal() a macro of error() + exit(1) in modpost.h, as compilers recognize that exit() never returns. [1]: https://github.com/llvm/llvm-project/issues/71001 Fixes: 16a473f60edc ("modpost: inform compilers that fatal() never returns") Reported-by: Aiden Leong <aiden.leong@aibsd.com> Closes: https://lore.kernel.org/all/d9ac2960-6644-4a87-b5e4-4bfb6e0364a8@aibsd.com/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-01-31kbuild: fix W= flags in the help messageMasahiro Yamada
W=c and W=e are supported. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2024-01-31parisc: BTLB: Fix crash when setting up BTLB at CPU bringupHelge Deller
When using hotplug and bringing up a 32-bit CPU, ask the firmware about the BTLB information to set up the static (block) TLB entries. For that write access to the static btlb_info struct is needed, but since it is marked __ro_after_init the kernel segfaults with missing write permissions. Fix the crash by dropping the __ro_after_init annotation. Fixes: e5ef93d02d6c ("parisc: BTLB: Initialize BTLB tables at CPU startup") Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v6.6+
2024-01-31HID: bpf: use __bpf_kfunc instead of noinlineBenjamin Tissoires
Follow the docs at Documentation/bpf/kfuncs.rst: - declare the function with `__bpf_kfunc` - disables missing prototype warnings, which allows to remove them from include/linux/hid-bpf.h Removing the prototypes is not an issue because we currently have to redeclare them when writing the BPF program. They will eventually be generated by bpftool directly AFAIU. Link: https://lore.kernel.org/r/20240124-b4-hid-bpf-fixes-v2-3-052520b1e5e6@kernel.org Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-01-31HID: bpf: actually free hdev memory after attaching a HID-BPF programBenjamin Tissoires
Turns out that I got my reference counts wrong and each successful bus_find_device() actually calls get_device(), and we need to manually call put_device(). Ensure each bus_find_device() gets a matching put_device() when releasing the bpf programs and fix all the error paths. Cc: <stable@vger.kernel.org> Fixes: f5c27da4e3c8 ("HID: initial BPF implementation") Link: https://lore.kernel.org/r/20240124-b4-hid-bpf-fixes-v2-2-052520b1e5e6@kernel.org Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-01-31HID: bpf: remove double fdget()Benjamin Tissoires
When the kfunc hid_bpf_attach_prog() is called, we called twice fdget(): one for fetching the type of the bpf program, and one for actually attaching the program to the device. The problem is that between those two calls, we have no guarantees that the prog_fd is still the same file descriptor for the given program. Solve this by calling bpf_prog_get() earlier, and use this to fetch the program type. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/bpf/CAO-hwJJ8vh8JD3-P43L-_CLNmPx0hWj44aom0O838vfP4=_1CA@mail.gmail.com/T/#t Cc: <stable@vger.kernel.org> Fixes: f5c27da4e3c8 ("HID: initial BPF implementation") Link: https://lore.kernel.org/r/20240124-b4-hid-bpf-fixes-v2-1-052520b1e5e6@kernel.org Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-01-30Merge tag 'erofs-for-6.8-rc3-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - fix an infinite loop issue of sub-page compressed data support found with lengthy stress tests on a 64k-page arm64 VM - optimize the temporary buffer allocation for low-memory scenarios, which can reduce 20.21% on average under a heavy multi-app launch benchmark workload - get rid of unnecessary GFP_NOFS * tag 'erofs-for-6.8-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: relaxed temporary buffers allocation on readahead erofs: fix infinite loop due to a race of filling compressed_bvecs erofs: get rid of unneeded GFP_NOFS
2024-01-30Merge branch '1GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-01-29 (e1000e, ixgbe) This series contains updates to e1000e and ixgbe drivers. Jake corrects values used for maximum frequency adjustment for e1000e. Christophe Jaillet adjusts error handling path so that semaphore is released on ixgbe. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ixgbe: Fix an error handling path in ixgbe_read_iosf_sb_reg_x550() e1000e: correct maximum frequency adjustment values ==================== Link: https://lore.kernel.org/r/20240129185240.787397-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-30devlink: Fix referring to hw_addr attribute during state validationParav Pandit
When port function state change is requested, and when the driver does not support it, it refers to the hw address attribute instead of state attribute. Seems like a copy paste error. Fix it by referring to the port function state attribute. Fixes: c0bea69d1ca7 ("devlink: Validate port function request") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240129191059.129030-1-parav@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-30bridge: mcast: fix disabled snooping after long uptimeLinus Lüssing
The original idea of the delay_time check was to not apply multicast snooping too early when an MLD querier appears. And to instead wait at least for MLD reports to arrive before switching from flooding to group based, MLD snooped forwarding, to avoid temporary packet loss. However in a batman-adv mesh network it was noticed that after 248 days of uptime 32bit MIPS based devices would start to signal that they had stopped applying multicast snooping due to missing queriers - even though they were the elected querier and still sending MLD queries themselves. While time_is_before_jiffies() generally is safe against jiffies wrap-arounds, like the code comments in jiffies.h explain, it won't be able to track a difference larger than ULONG_MAX/2. With a 32bit large jiffies and one jiffies tick every 10ms (CONFIG_HZ=100) on these MIPS devices running OpenWrt this would result in a difference larger than ULONG_MAX/2 after 248 (= 2^32/100/60/60/24/2) days and time_is_before_jiffies() would then start to return false instead of true. Leading to multicast snooping not being applied to multicast packets anymore. Fix this issue by using a proper timer_list object which won't have this ULONG_MAX/2 difference limitation. Fixes: b00589af3b04 ("bridge: disable snooping if there is no querier") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240127175033.9640-1-linus.luessing@c0d3.blue Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-30selftests: net: Add missing matchall classifierIdo Schimmel
One of the test cases in the test_bridge_backup_port.sh selftest relies on a matchall classifier to drop unrelated traffic so that the Tx drop counter on the VXLAN device will only be incremented as a result of traffic generated by the test. However, the configuration option for the matchall classifier is missing from the configuration file which might explain the failures we see in the netdev CI [1]. Fix by adding CONFIG_NET_CLS_MATCHALL to the configuration file. [1] # Backup nexthop ID - invalid IDs # ------------------------------- [...] # TEST: Forwarding out of vx0 [ OK ] # TEST: No forwarding using backup nexthop ID [ OK ] # TEST: Tx drop increased [FAIL] # TEST: IPv6 address family nexthop as backup nexthop [ OK ] # TEST: No forwarding out of swp1 [ OK ] # TEST: Forwarding out of vx0 [ OK ] # TEST: No forwarding using backup nexthop ID [ OK ] # TEST: Tx drop increased [FAIL] [...] Fixes: b408453053fb ("selftests: net: Add bridge backup port and backup nexthop ID test") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240129123703.1857843-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-30Merge tag 'linux_kselftest-kunit-fixes-6.8-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kunit fixes from Shuah Khan: "NULL vs IS_ERR() bug fixes, documentation update, MAINTAINERS file update to add Rae Moar as a reviewer, and a fix to run test suites only after module initialization completes" * tag 'linux_kselftest-kunit-fixes-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: Documentation: KUnit: Update the instructions on how to test static functions kunit: run test suites only after module initialization completes MAINTAINERS: kunit: Add Rae Moar as a reviewer kunit: device: Fix a NULL vs IS_ERR() check in init() kunit: Fix a NULL vs IS_ERR() bug
2024-01-30Merge tag 'linux_kselftest-fixes-6.8-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "Three fixes to livepatch, rseq, and seccomp tests" * tag 'linux_kselftest-fixes-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kselftest/seccomp: Report each expectation we assert as a KTAP test kselftest/seccomp: Use kselftest output functions for benchmark selftests/livepatch: fix and refactor new dmesg message code selftests/rseq: Do not skip !allowed_cpus for mm_cid
2024-01-30lsm: fix default return value of the socket_getpeersec_*() hooksOndrej Mosnacek
For these hooks the true "neutral" value is -EOPNOTSUPP, which is currently what is returned when no LSM provides this hook and what LSMs return when there is no security context set on the socket. Correct the value in <linux/lsm_hooks.h> and adjust the dispatch functions in security/security.c to avoid issues when the BPF LSM is enabled. Cc: stable@vger.kernel.org Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> [PM: subject line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-01-30soc: apple: mailbox: error pointers are negative integersLinus Torvalds
In an entirely unrelated discussion where I pointed out a stupid thinko of mine, Rasmus piped up and noted that that obvious mistake already existed elsewhere in the kernel tree. An "error pointer" is the negative error value encoded as a pointer, making the whole "return error or valid pointer" use-case simple and straightforward. We use it all over the kernel. But the key here is that errors are _negative_ error numbers, not the horrid UNIX user-level model of "-1 and the value of 'errno'". The Apple mailbox driver used the positive error values, and thus just returned invalid normal pointers instead of actual errors. Of course, the reason nobody ever noticed is that the errors presumably never actually happen, so this is fixing a conceptual bug rather than an actual one. Reported-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Link: https://lore.kernel.org/all/5c30afe0-f9fb-45d5-9333-dd914a1ea93a@prevas.dk/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-01-30parisc: Fix random data corruption from exception handlerHelge Deller
The current exception handler implementation, which assists when accessing user space memory, may exhibit random data corruption if the compiler decides to use a different register than the specified register %r29 (defined in ASM_EXCEPTIONTABLE_REG) for the error code. If the compiler choose another register, the fault handler will nevertheless store -EFAULT into %r29 and thus trash whatever this register is used for. Looking at the assembly I found that this happens sometimes in emulate_ldd(). To solve the issue, the easiest solution would be if it somehow is possible to tell the fault handler which register is used to hold the error code. Using %0 or %1 in the inline assembly is not posssible as it will show up as e.g. %r29 (with the "%r" prefix), which the GNU assembler can not convert to an integer. This patch takes another, better and more flexible approach: We extend the __ex_table (which is out of the execution path) by one 32-word. In this word we tell the compiler to insert the assembler instruction "or %r0,%r0,%reg", where %reg references the register which the compiler choosed for the error return code. In case of an access failure, the fault handler finds the __ex_table entry and can examine the opcode. The used register is encoded in the lowest 5 bits, and the fault handler can then store -EFAULT into this register. Since we extend the __ex_table to 3 words we can't use the BUILDTIME_TABLE_SORT config option any longer. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v6.0+
2024-01-30kselftest/seccomp: Report each expectation we assert as a KTAP testMark Brown
The seccomp benchmark test makes a number of checks on the performance it measures and logs them to the output but does so in a custom format which none of the automated test runners understand meaning that the chances that anyone is paying attention are slim. Let's additionally log each result in KTAP format so that automated systems parsing the test output will see each comparison as a test case. The original logs are left in place since they provide the actual numbers for analysis. As part of this rework the flow for the main program so that when we skip tests we still log all the tests we skip, this is because the standard KTAP headers and footers include counts of the number of expected and run tests. Tested-by: Anders Roxell <anders.roxell@linaro.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-01-30kselftest/seccomp: Use kselftest output functions for benchmarkMark Brown
In preparation for trying to output the test results themselves in TAP format rework all the prints in the benchmark to use the kselftest output functions. The uses of system() all produce single line output so we can avoid having to deal with fully managing the child process and continue to use system() by simply printing an empty message before we invoke system(). We also leave one printf() used to complete a line of output in place. Tested-by: Anders Roxell <anders.roxell@linaro.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-01-30selftests/livepatch: fix and refactor new dmesg message codeJoe Lawrence
The livepatching kselftests rely on comparing expected vs. observed dmesg output. After each test, new dmesg entries are determined by the 'comm' utility comparing a saved, pre-test copy of dmesg to post-test dmesg output. Alexander reports that the 'comm --nocheck-order -13' invocation used by the tests can be confused when dmesg entry timestamps vary in magnitude (ie, "[ 98.820331]" vs. "[ 100.031067]"), in which case, additional messages are reported as new. The unexpected entries then spoil the test results. Instead of relying on 'comm' or 'diff' to determine new testing dmesg entries, refactor the code: - pre-test : log a unique canary dmesg entry - test : run tests, log messages - post-test : filter dmesg starting from pre-test message Reported-by: Alexander Gordeev <agordeev@linux.ibm.com> Closes: https://lore.kernel.org/live-patching/ZYAimyPYhxVA9wKg@li-008a6a4c-3549-11b2-a85c-c5cc2836eea2.ibm.com/ Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Tested-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Marcos Paulo de Souza <mpdesouza@suse.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-01-30spi: sh-msiof: avoid integer overflow in constantsWolfram Sang
cppcheck rightfully warned: drivers/spi/spi-sh-msiof.c:792:28: warning: Signed integer overflow for expression '7<<29'. [integerOverflow] sh_msiof_write(p, SIFCTR, SIFCTR_TFWM_1 | SIFCTR_RFWM_1); Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://msgid.link/r/20240130094053.10672-1-wsa+renesas@sang-engineering.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-01-30regulator (max5970): Fix IRQ handlerPatrick Rudolph
The max5970 datasheet gives the impression that IRQ status bits must be cleared by writing a one to set bits, as those are marked with 'R/C', however tests showed that a zero must be written. Fixes an IRQ storm as the interrupt handler actually clears the IRQ status bits. Signed-off-by: Patrick Rudolph <patrick.rudolph@9elements.com> Signed-off-by: Naresh Solanki <naresh.solanki@9elements.com> Link: https://msgid.link/r/20240130150257.3643657-1-naresh.solanki@9elements.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-01-30llc: call sock_orphan() at release timeEric Dumazet
syzbot reported an interesting trace [1] caused by a stale sk->sk_wq pointer in a closed llc socket. In commit ff7b11aa481f ("net: socket: set sock->sk to NULL after calling proto_ops::release()") Eric Biggers hinted that some protocols are missing a sock_orphan(), we need to perform a full audit. In net-next, I plan to clear sock->sk from sock_orphan() and amend Eric patch to add a warning. [1] BUG: KASAN: slab-use-after-free in list_empty include/linux/list.h:373 [inline] BUG: KASAN: slab-use-after-free in waitqueue_active include/linux/wait.h:127 [inline] BUG: KASAN: slab-use-after-free in sock_def_write_space_wfree net/core/sock.c:3384 [inline] BUG: KASAN: slab-use-after-free in sock_wfree+0x9a8/0x9d0 net/core/sock.c:2468 Read of size 8 at addr ffff88802f4fc880 by task ksoftirqd/1/27 CPU: 1 PID: 27 Comm: ksoftirqd/1 Not tainted 6.8.0-rc1-syzkaller-00049-g6098d87eaf31 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:377 [inline] print_report+0xc4/0x620 mm/kasan/report.c:488 kasan_report+0xda/0x110 mm/kasan/report.c:601 list_empty include/linux/list.h:373 [inline] waitqueue_active include/linux/wait.h:127 [inline] sock_def_write_space_wfree net/core/sock.c:3384 [inline] sock_wfree+0x9a8/0x9d0 net/core/sock.c:2468 skb_release_head_state+0xa3/0x2b0 net/core/skbuff.c:1080 skb_release_all net/core/skbuff.c:1092 [inline] napi_consume_skb+0x119/0x2b0 net/core/skbuff.c:1404 e1000_unmap_and_free_tx_resource+0x144/0x200 drivers/net/ethernet/intel/e1000/e1000_main.c:1970 e1000_clean_tx_irq drivers/net/ethernet/intel/e1000/e1000_main.c:3860 [inline] e1000_clean+0x4a1/0x26e0 drivers/net/ethernet/intel/e1000/e1000_main.c:3801 __napi_poll.constprop.0+0xb4/0x540 net/core/dev.c:6576 napi_poll net/core/dev.c:6645 [inline] net_rx_action+0x956/0xe90 net/core/dev.c:6778 __do_softirq+0x21a/0x8de kernel/softirq.c:553 run_ksoftirqd kernel/softirq.c:921 [inline] run_ksoftirqd+0x31/0x60 kernel/softirq.c:913 smpboot_thread_fn+0x660/0xa10 kernel/smpboot.c:164 kthread+0x2c6/0x3a0 kernel/kthread.c:388 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242 </TASK> Allocated by task 5167: kasan_save_stack+0x33/0x50 mm/kasan/common.c:47 kasan_save_track+0x14/0x30 mm/kasan/common.c:68 unpoison_slab_object mm/kasan/common.c:314 [inline] __kasan_slab_alloc+0x81/0x90 mm/kasan/common.c:340 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slub.c:3813 [inline] slab_alloc_node mm/slub.c:3860 [inline] kmem_cache_alloc_lru+0x142/0x6f0 mm/slub.c:3879 alloc_inode_sb include/linux/fs.h:3019 [inline] sock_alloc_inode+0x25/0x1c0 net/socket.c:308 alloc_inode+0x5d/0x220 fs/inode.c:260 new_inode_pseudo+0x16/0x80 fs/inode.c:1005 sock_alloc+0x40/0x270 net/socket.c:634 __sock_create+0xbc/0x800 net/socket.c:1535 sock_create net/socket.c:1622 [inline] __sys_socket_create net/socket.c:1659 [inline] __sys_socket+0x14c/0x260 net/socket.c:1706 __do_sys_socket net/socket.c:1720 [inline] __se_sys_socket net/socket.c:1718 [inline] __x64_sys_socket+0x72/0xb0 net/socket.c:1718 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xd3/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b Freed by task 0: kasan_save_stack+0x33/0x50 mm/kasan/common.c:47 kasan_save_track+0x14/0x30 mm/kasan/common.c:68 kasan_save_free_info+0x3f/0x60 mm/kasan/generic.c:640 poison_slab_object mm/kasan/common.c:241 [inline] __kasan_slab_free+0x121/0x1b0 mm/kasan/common.c:257 kasan_slab_free include/linux/kasan.h:184 [inline] slab_free_hook mm/slub.c:2121 [inline] slab_free mm/slub.c:4299 [inline] kmem_cache_free+0x129/0x350 mm/slub.c:4363 i_callback+0x43/0x70 fs/inode.c:249 rcu_do_batch kernel/rcu/tree.c:2158 [inline] rcu_core+0x819/0x1680 kernel/rcu/tree.c:2433 __do_softirq+0x21a/0x8de kernel/softirq.c:553 Last potentially related work creation: kasan_save_stack+0x33/0x50 mm/kasan/common.c:47 __kasan_record_aux_stack+0xba/0x100 mm/kasan/generic.c:586 __call_rcu_common.constprop.0+0x9a/0x7b0 kernel/rcu/tree.c:2683 destroy_inode+0x129/0x1b0 fs/inode.c:315 iput_final fs/inode.c:1739 [inline] iput.part.0+0x560/0x7b0 fs/inode.c:1765 iput+0x5c/0x80 fs/inode.c:1755 dentry_unlink_inode+0x292/0x430 fs/dcache.c:400 __dentry_kill+0x1ca/0x5f0 fs/dcache.c:603 dput.part.0+0x4ac/0x9a0 fs/dcache.c:845 dput+0x1f/0x30 fs/dcache.c:835 __fput+0x3b9/0xb70 fs/file_table.c:384 task_work_run+0x14d/0x240 kernel/task_work.c:180 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xa8a/0x2ad0 kernel/exit.c:871 do_group_exit+0xd4/0x2a0 kernel/exit.c:1020 __do_sys_exit_group kernel/exit.c:1031 [inline] __se_sys_exit_group kernel/exit.c:1029 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1029 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xd3/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b The buggy address belongs to the object at ffff88802f4fc800 which belongs to the cache sock_inode_cache of size 1408 The buggy address is located 128 bytes inside of freed 1408-byte region [ffff88802f4fc800, ffff88802f4fcd80) The buggy address belongs to the physical page: page:ffffea0000bd3e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2f4f8 head:ffffea0000bd3e00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 anon flags: 0xfff00000000840(slab|head|node=0|zone=1|lastcpupid=0x7ff) page_type: 0xffffffff() raw: 00fff00000000840 ffff888013b06b40 0000000000000000 0000000000000001 raw: 0000000000000000 0000000080150015 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 3, migratetype Reclaimable, gfp_mask 0xd20d0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE), pid 4956, tgid 4956 (sshd), ts 31423924727, free_ts 0 set_page_owner include/linux/page_owner.h:31 [inline] post_alloc_hook+0x2d0/0x350 mm/page_alloc.c:1533 prep_new_page mm/page_alloc.c:1540 [inline] get_page_from_freelist+0xa28/0x3780 mm/page_alloc.c:3311 __alloc_pages+0x22f/0x2440 mm/page_alloc.c:4567 __alloc_pages_node include/linux/gfp.h:238 [inline] alloc_pages_node include/linux/gfp.h:261 [inline] alloc_slab_page mm/slub.c:2190 [inline] allocate_slab mm/slub.c:2354 [inline] new_slab+0xcc/0x3a0 mm/slub.c:2407 ___slab_alloc+0x4af/0x19a0 mm/slub.c:3540 __slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3625 __slab_alloc_node mm/slub.c:3678 [inline] slab_alloc_node mm/slub.c:3850 [inline] kmem_cache_alloc_lru+0x379/0x6f0 mm/slub.c:3879 alloc_inode_sb include/linux/fs.h:3019 [inline] sock_alloc_inode+0x25/0x1c0 net/socket.c:308 alloc_inode+0x5d/0x220 fs/inode.c:260 new_inode_pseudo+0x16/0x80 fs/inode.c:1005 sock_alloc+0x40/0x270 net/socket.c:634 __sock_create+0xbc/0x800 net/socket.c:1535 sock_create net/socket.c:1622 [inline] __sys_socket_create net/socket.c:1659 [inline] __sys_socket+0x14c/0x260 net/socket.c:1706 __do_sys_socket net/socket.c:1720 [inline] __se_sys_socket net/socket.c:1718 [inline] __x64_sys_socket+0x72/0xb0 net/socket.c:1718 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xd3/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b page_owner free stack trace missing Memory state around the buggy address: ffff88802f4fc780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88802f4fc800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff88802f4fc880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88802f4fc900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88802f4fc980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 43815482370c ("net: sock_def_readable() and friends RCU conversion") Reported-and-tested-by: syzbot+32b89eaa102b372ff76d@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240126165532.3396702-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30Merge branch 'net-stmmac-dwmac-imx-time-based-scheduling-support'Paolo Abeni
Esben Haabendal says: ==================== net: stmmac: dwmac-imx: Time Based Scheduling support This small patch series allows using TBS support of the i.MX Ethernet QOS controller for etf qdisc offload. It achieves this in a similar manner that it is done in dwmac-intel.c, dwmac-mediatek.c and stmmac_pci.c. Changes since v1: - Simplified for loop by starting at index 1. - Fixed problem with indentation. ==================== Link: https://lore.kernel.org/r/cover.1706256158.git.esben@geanix.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>