summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-04-08Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Four small fixes, all in drivers. They're all one or two lines except for the ufs one, but that's a simple revert of a previous feature" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: iscsi_tcp: Check that sock is valid before iscsi_set_param() scsi: qla2xxx: Fix memory leak in qla2x00_probe_one() scsi: mpi3mr: Handle soft reset in progress fault code (0xF002) scsi: Revert "scsi: ufs: core: Initialize devfreq synchronously"
2023-04-08Merge tag 'block-6.3-2023-04-06' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Ensure that ublk always reads the whole sqe upfront (me) - Fix for a block size probing issue with ublk (Ming) - Fix for the bio based polling (Keith) - NVMe pull request via Christoph: - fix discard support without oncs (Keith Busch) - Partition scan error handling regression fix (Yu) * tag 'block-6.3-2023-04-06' of git://git.kernel.dk/linux: block: don't set GD_NEED_PART_SCAN if scan partition failed block: ublk: make sure that block size is set correctly ublk: read any SQE values upfront nvme: fix discard support without oncs blk-mq: directly poll requests
2023-04-08Merge tag 'io_uring-6.3-2023-04-06' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fixes from Jens Axboe: "Just two minor fixes for provided buffers - one where we could potentially leak a buffer, and one where the returned values was off-by-one in some cases" * tag 'io_uring-6.3-2023-04-06' of git://git.kernel.dk/linux: io_uring: fix memory leak when removing provided buffers io_uring: fix return value when removing provided buffers
2023-04-08Merge tag 'dma-mapping-6.3-2023-04-08' of ↵Linus Torvalds
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fix from Christoph Hellwig: - fix a braino in the swiotlb alignment check fix (Petr Tesarik) * tag 'dma-mapping-6.3-2023-04-08' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: fix a braino in the alignment check fix
2023-04-08Merge tag 'trace-v6.3-rc5-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: "A couple more minor fixes: - Reset direct->addr back to its original value on error in updating the direct trampoline code - Make lastcmd_mutex static" * tag 'trace-v6.3-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/synthetic: Make lastcmd_mutex static ftrace: Fix issue that 'direct->addr' not restored in modify_ftrace_direct()
2023-04-08Merge tag 'mm-hotfixes-stable-2023-04-07-16-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM fixes from Andrew Morton: "28 hotfixes. 23 are cc:stable and the other five address issues which were introduced during this merge cycle. 20 are for MM and the remainder are for other subsystems" * tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits) maple_tree: fix a potential concurrency bug in RCU mode maple_tree: fix get wrong data_end in mtree_lookup_walk() mm/swap: fix swap_info_struct race between swapoff and get_swap_pages() nilfs2: fix sysfs interface lifetime mm: take a page reference when removing device exclusive entries mm: vmalloc: avoid warn_alloc noise caused by fatal signal nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread() zsmalloc: document freeable stats zsmalloc: document new fullness grouping fsdax: force clear dirty mark if CoW mm/hugetlb: fix uffd wr-protection for CoW optimization path mm: enable maple tree RCU mode by default maple_tree: add RCU lock checking to rcu callback functions maple_tree: add smp_rmb() to dead node detection maple_tree: fix write memory barrier of nodes once dead for RCU mode maple_tree: remove extra smp_wmb() from mas_dead_leaves() maple_tree: fix freeing of nodes in rcu mode maple_tree: detect dead nodes in mas_start() maple_tree: be more cautious about dead nodes ...
2023-04-07r8152: Add __GFP_NOWARN to big allocationsDouglas Anderson
When memory is a little tight on my system, it's pretty easy to see warnings that look like this. ksoftirqd/0: page allocation failure: order:3, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0 ... Call trace: dump_backtrace+0x0/0x1e8 show_stack+0x20/0x2c dump_stack_lvl+0x60/0x78 dump_stack+0x18/0x38 warn_alloc+0x104/0x174 __alloc_pages+0x588/0x67c alloc_rx_agg+0xa0/0x190 [r8152 ...] r8152_poll+0x270/0x760 [r8152 ...] __napi_poll+0x44/0x1ec net_rx_action+0x100/0x300 __do_softirq+0xec/0x38c run_ksoftirqd+0x38/0xec smpboot_thread_fn+0xb8/0x248 kthread+0x134/0x154 ret_from_fork+0x10/0x20 On a fragmented system it's normal that order 3 allocations will sometimes fail, especially atomic ones. The driver handles these failures fine and the WARN just creates spam in the logs for this case. The __GFP_NOWARN flag is exactly for this situation, so add it to the allocation. NOTE: my testing is on a 5.15 system, but there should be no reason that this would be fundamentally different on a mainline kernel. Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Hayes Wang <hayeswang@realtek.com> Link: https://lore.kernel.org/r/20230406171411.1.I84dbef45786af440fd269b71e9436a96a8e7a152@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07net: make SO_BUSY_POLL available to all usersEric Dumazet
After commit 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()"), a thread willing to use busy polling is not hurting other threads anymore in a non preempt kernel. I think it is safe to remove CAP_NET_ADMIN check. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230406194634.1804691-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07Merge branch 'net-stmmac-dwmac-anarion-address-issues-flagged-by-sparse'Jakub Kicinski
Simon Horman says: ==================== net: stmmac: dwmac-anarion: address issues flagged by sparse Two minor enhancements to dwmac-anarion to address issues flagged by sparse. 1. Always return struct anarion_gmac * from anarion_config_dt() 2. Add __iomem annotation to register base No functional change intended. Compile tested only. ==================== Link: https://lore.kernel.org/r/20230406-dwmac-anarion-sparse-v1-0-b0c866c8be9d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07net: stmmac: dwmac-anarion: Always return struct anarion_gmac * from ↵Simon Horman
anarion_config_dt() Always return struct anarion_gmac * from anarion_config_dt(). In the case where ctl_block was an error pointer it was being returned directly. Which sparse flags as follows: .../dwmac-anarion.c:73:24: warning: incorrect type in return expression (different address spaces) .../dwmac-anarion.c:73:24: expected struct anarion_gmac * .../dwmac-anarion.c:73:24: got void [noderef] __iomem *[assigned] ctl_block Avoid this by converting the error pointer to an error. And then reversing the conversion. As a side effect, the error can be used for logging purposes, subjectively, leading to a minor cleanup. No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07net: stmmac: dwmac-anarion: Use annotation __iomem for register baseSimon Horman
Use __iomem annotation the register base: the ctl_block field of struct anarion_gmac. I believe this is the normal practice for such variables. By doing so some casting is avoided. And sparse no longer reports: .../dwmac-anarion.c:29:23: warning: incorrect type in argument 1 (different address spaces) .../dwmac-anarion.c:29:23: expected void const volatile [noderef] __iomem *addr .../dwmac-anarion.c:29:23: got void * .../dwmac-anarion.c:34:22: warning: incorrect type in argument 2 (different address spaces) .../dwmac-anarion.c:34:22: expected void volatile [noderef] __iomem *addr .../dwmac-anarion.c:34:22: got void * No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07net: stmmac: remove set but unused mask in stmmac_ethtool_set_link_ksettings()Vladimir Oltean
This was never used since the commit that added it - e58bb43f5e43 ("stmmac: initial support to manage pcs modes"). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230406125412.48790-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07Merge tag 'ipsec-esn-replay' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Leon Romanovsky says: ==================== Improve IPsec limits, ESN and replay window This series overcomes existing hardware limitations in Mellanox ConnectX devices around handling IPsec soft and hard limits. In addition, the ESN logic is tied and added an interface to configure replay window sequence numbers through existing iproute2 interface. ip xfrm state ... [ replay-seq SEQ ] [ replay-oseq SEQ ] [ replay-seq-hi SEQ ] [ replay-oseq-hi SEQ ] Link: https://lore.kernel.org/all/cover.1680162300.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org> * tag 'ipsec-esn-replay' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5e: Simulate missing IPsec TX limits hardware functionality net/mlx5e: Generalize IPsec work structs net/mlx5e: Reduce contention in IPsec workqueue net/mlx5e: Set IPsec replay sequence numbers net/mlx5e: Remove ESN callbacks if it is not supported xfrm: don't require advance ESN callback for packet offload net/mlx5e: Overcome slow response for first IPsec ASO WQE net/mlx5e: Add SW implementation to support IPsec 64 bit soft and hard limits net/mlx5e: Prevent zero IPsec soft/hard limits net/mlx5e: Factor out IPsec ASO update function ==================== Link: https://lore.kernel.org/r/20230406071902.712388-1-leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07net: phy: nxp-c45-tja11xx: fix unsigned long multiplication overflowRadu Pirea (OSS)
Any multiplication between GENMASK(31, 0) and a number bigger than 1 will be truncated because of the overflow, if the size of unsigned long is 32 bits. Replaced GENMASK with GENMASK_ULL to make sure that multiplication will be between 64 bits values. Cc: <stable@vger.kernel.org> # 5.15+ Fixes: 514def5dd339 ("phy: nxp-c45-tja11xx: add timestamping support") Signed-off-by: Radu Pirea (OSS) <radu-nicolae.pirea@oss.nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230406095953.75622-1-radu-nicolae.pirea@oss.nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07Merge branch 'add-support-for-j784s4-cpsw9g'Jakub Kicinski
Siddharth Vadapalli says: ==================== Add support for J784S4 CPSW9G This series adds a new compatible to am65-cpsw driver for the CPSW9G instance of the CPSW Ethernet Switch on TI's J784S4 SoC which has 8 external ports and 1 internal host port. The CPSW9G instance supports QSGMII and USXGMII modes for which driver support is added. Additionally, the interface mode specific configurations are moved to the am65_cpsw_nuss_mac_config() callback. Also, a TODO comment is added for verifying whether in-band mode is necessary for 10 Mbps RGMII mode. NOTE: I have verified that the mac_config() operations are preserved across link up and link down events for SGMII and USXGMII mode with the new implementation in this series, as suggested by: Russell King <linux@armlinux.org.uk> For patches 1 and 3 of this series, I believe that the following tag: Suggested-by: Russell King <linux@armlinux.org.uk> should be added. However, I did not add it since I did not yet get the permission to do so. I will be happy if the tags are added, since the new implementation is almost entirely based on Russell's suggestion, with minor changes made by me. v2: https://lore.kernel.org/r/20230403110106.983994-1-s-vadapalli@ti.com/ v1: https://lore.kernel.org/r/20230331065110.604516-1-s-vadapalli@ti.com/ ==================== Link: https://lore.kernel.org/r/20230404061459.1100519-1-s-vadapalli@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07net: ethernet: ti: am65-cpsw: Enable USXGMII mode for J784S4 CPSW9GSiddharth Vadapalli
TI's J784S4 SoC supports USXGMII mode. Add USXGMII mode to the extra_modes member of the J784S4 SoC data. Configure MAC control register for supporting USXGMII mode and add MAC_5000FD in the "mac_capabilities" member of struct "phylink_config". Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07net: ethernet: ti: am65-cpsw: Enable QSGMII for J784S4 CPSW9GSiddharth Vadapalli
TI's J784S4 SoC supports QSGMII mode with the CPSW9G instance of the CPSW Ethernet Switch. Add a new compatible for J784S4 SoC and enable QSGMII support for it by adding QSGMII mode to the extra_modes member of the "j784s4_cpswxg_pdata" SoC data. Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07net: ethernet: ti: am65-cpsw: Move mode specific config to mac_config()Siddharth Vadapalli
Move the interface mode specific configuration to the mac_config() callback am65_cpsw_nuss_mac_config(). Also, do not reset the MAC Control register on mac_link_down(). Only clear those bits that can possibly be set in mac_link_up(). Let the MAC remain in IDLE state after mac_link_down(). Bring it out of the IDLE state on mac_link_up(). Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07net: openvswitch: fix race on port outputFelix Huettner
assume the following setup on a single machine: 1. An openvswitch instance with one bridge and default flows 2. two network namespaces "server" and "client" 3. two ovs interfaces "server" and "client" on the bridge 4. for each ovs interface a veth pair with a matching name and 32 rx and tx queues 5. move the ends of the veth pairs to the respective network namespaces 6. assign ip addresses to each of the veth ends in the namespaces (needs to be the same subnet) 7. start some http server on the server network namespace 8. test if a client in the client namespace can reach the http server when following the actions below the host has a chance of getting a cpu stuck in a infinite loop: 1. send a large amount of parallel requests to the http server (around 3000 curls should work) 2. in parallel delete the network namespace (do not delete interfaces or stop the server, just kill the namespace) there is a low chance that this will cause the below kernel cpu stuck message. If this does not happen just retry. Below there is also the output of bpftrace for the functions mentioned in the output. The series of events happening here is: 1. the network namespace is deleted calling `unregister_netdevice_many_notify` somewhere in the process 2. this sets first `NETREG_UNREGISTERING` on both ends of the veth and then runs `synchronize_net` 3. it then calls `call_netdevice_notifiers` with `NETDEV_UNREGISTER` 4. this is then handled by `dp_device_event` which calls `ovs_netdev_detach_dev` (if a vport is found, which is the case for the veth interface attached to ovs) 5. this removes the rx_handlers of the device but does not prevent packages to be sent to the device 6. `dp_device_event` then queues the vport deletion to work in background as a ovs_lock is needed that we do not hold in the unregistration path 7. `unregister_netdevice_many_notify` continues to call `netdev_unregister_kobject` which sets `real_num_tx_queues` to 0 8. port deletion continues (but details are not relevant for this issue) 9. at some future point the background task deletes the vport If after 7. but before 9. a packet is send to the ovs vport (which is not deleted at this point in time) which forwards it to the `dev_queue_xmit` flow even though the device is unregistering. In `skb_tx_hash` (which is called in the `dev_queue_xmit`) path there is a while loop (if the packet has a rx_queue recorded) that is infinite if `dev->real_num_tx_queues` is zero. To prevent this from happening we update `do_output` to handle devices without carrier the same as if the device is not found (which would be the code path after 9. is done). Additionally we now produce a warning in `skb_tx_hash` if we will hit the infinite loop. bpftrace (first word is function name): __dev_queue_xmit server: real_num_tx_queues: 1, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 1 netdev_core_pick_tx server: addr: 0xffff9f0a46d4a000 real_num_tx_queues: 1, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 1 dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 2, reg_state: 1 synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024 synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024 synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024 synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024 dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 6, reg_state: 2 ovs_netdev_detach_dev server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, reg_state: 2 netdev_rx_handler_unregister server: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024, reg_state: 2 synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024 netdev_rx_handler_unregister ret server: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024, reg_state: 2 dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 27, reg_state: 2 dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 22, reg_state: 2 dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 18, reg_state: 2 netdev_unregister_kobject: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024 synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024 ovs_vport_send server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2 __dev_queue_xmit server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2 netdev_core_pick_tx server: addr: 0xffff9f0a46d4a000 real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2 broken device server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024 ovs_dp_detach_port server: real_num_tx_queues: 0 cpu 9, pid: 9124, tid: 9124, reg_state: 2 synchronize_rcu_expedited: cpu 9, pid: 33604, tid: 33604 stuck message: watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [curl:1929279] Modules linked in: veth pktgen bridge stp llc ip_set_hash_net nft_counter xt_set nft_compat nf_tables ip_set_hash_ip ip_set nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tls binfmt_misc nls_iso8859_1 input_leds joydev serio_raw dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel drm efi_pstore virtio_rng ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel virtio_net ahci net_failover crypto_simd cryptd psmouse libahci virtio_blk failover CPU: 5 PID: 1929279 Comm: curl Not tainted 5.15.0-67-generic #74-Ubuntu Hardware name: OpenStack Foundation OpenStack Nova, BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:netdev_pick_tx+0xf1/0x320 Code: 00 00 8d 48 ff 0f b7 c1 66 39 ca 0f 86 e9 01 00 00 45 0f b7 ff 41 39 c7 0f 87 5b 01 00 00 44 29 f8 41 39 c7 0f 87 4f 01 00 00 <eb> f2 0f 1f 44 00 00 49 8b 94 24 28 04 00 00 48 85 d2 0f 84 53 01 RSP: 0018:ffffb78b40298820 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff9c8773adc2e0 RCX: 000000000000083f RDX: 0000000000000000 RSI: ffff9c8773adc2e0 RDI: ffff9c870a25e000 RBP: ffffb78b40298858 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9c870a25e000 R13: ffff9c870a25e000 R14: ffff9c87fe043480 R15: 0000000000000000 FS: 00007f7b80008f00(0000) GS:ffff9c8e5f740000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7b80f6a0b0 CR3: 0000000329d66000 CR4: 0000000000350ee0 Call Trace: <IRQ> netdev_core_pick_tx+0xa4/0xb0 __dev_queue_xmit+0xf8/0x510 ? __bpf_prog_exit+0x1e/0x30 dev_queue_xmit+0x10/0x20 ovs_vport_send+0xad/0x170 [openvswitch] do_output+0x59/0x180 [openvswitch] do_execute_actions+0xa80/0xaa0 [openvswitch] ? kfree+0x1/0x250 ? kfree+0x1/0x250 ? kprobe_perf_func+0x4f/0x2b0 ? flow_lookup.constprop.0+0x5c/0x110 [openvswitch] ovs_execute_actions+0x4c/0x120 [openvswitch] ovs_dp_process_packet+0xa1/0x200 [openvswitch] ? ovs_ct_update_key.isra.0+0xa8/0x120 [openvswitch] ? ovs_ct_fill_key+0x1d/0x30 [openvswitch] ? ovs_flow_key_extract+0x2db/0x350 [openvswitch] ovs_vport_receive+0x77/0xd0 [openvswitch] ? __htab_map_lookup_elem+0x4e/0x60 ? bpf_prog_680e8aff8547aec1_kfree+0x3b/0x714 ? trace_call_bpf+0xc8/0x150 ? kfree+0x1/0x250 ? kfree+0x1/0x250 ? kprobe_perf_func+0x4f/0x2b0 ? kprobe_perf_func+0x4f/0x2b0 ? __mod_memcg_lruvec_state+0x63/0xe0 netdev_port_receive+0xc4/0x180 [openvswitch] ? netdev_port_receive+0x180/0x180 [openvswitch] netdev_frame_hook+0x1f/0x40 [openvswitch] __netif_receive_skb_core.constprop.0+0x23d/0xf00 __netif_receive_skb_one_core+0x3f/0xa0 __netif_receive_skb+0x15/0x60 process_backlog+0x9e/0x170 __napi_poll+0x33/0x180 net_rx_action+0x126/0x280 ? ttwu_do_activate+0x72/0xf0 __do_softirq+0xd9/0x2e7 ? rcu_report_exp_cpu_mult+0x1b0/0x1b0 do_softirq+0x7d/0xb0 </IRQ> <TASK> __local_bh_enable_ip+0x54/0x60 ip_finish_output2+0x191/0x460 __ip_finish_output+0xb7/0x180 ip_finish_output+0x2e/0xc0 ip_output+0x78/0x100 ? __ip_finish_output+0x180/0x180 ip_local_out+0x5e/0x70 __ip_queue_xmit+0x184/0x440 ? tcp_syn_options+0x1f9/0x300 ip_queue_xmit+0x15/0x20 __tcp_transmit_skb+0x910/0x9c0 ? __mod_memcg_state+0x44/0xa0 tcp_connect+0x437/0x4e0 ? ktime_get_with_offset+0x60/0xf0 tcp_v4_connect+0x436/0x530 __inet_stream_connect+0xd4/0x3a0 ? kprobe_perf_func+0x4f/0x2b0 ? aa_sk_perm+0x43/0x1c0 inet_stream_connect+0x3b/0x60 __sys_connect_file+0x63/0x70 __sys_connect+0xa6/0xd0 ? setfl+0x108/0x170 ? do_fcntl+0xe8/0x5a0 __x64_sys_connect+0x18/0x20 do_syscall_64+0x5c/0xc0 ? __x64_sys_fcntl+0xa9/0xd0 ? exit_to_user_mode_prepare+0x37/0xb0 ? syscall_exit_to_user_mode+0x27/0x50 ? do_syscall_64+0x69/0xc0 ? __sys_setsockopt+0xea/0x1e0 ? exit_to_user_mode_prepare+0x37/0xb0 ? syscall_exit_to_user_mode+0x27/0x50 ? __x64_sys_setsockopt+0x1f/0x30 ? do_syscall_64+0x69/0xc0 ? irqentry_exit+0x1d/0x30 ? exc_page_fault+0x89/0x170 entry_SYSCALL_64_after_hwframe+0x61/0xcb RIP: 0033:0x7f7b8101c6a7 Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 18 89 54 24 0c 48 89 34 24 89 RSP: 002b:00007ffffd6b2198 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b8101c6a7 RDX: 0000000000000010 RSI: 00007ffffd6b2360 RDI: 0000000000000005 RBP: 0000561f1370d560 R08: 00002795ad21d1ac R09: 0030312e302e302e R10: 00007ffffd73f080 R11: 0000000000000246 R12: 0000561f1370c410 R13: 0000000000000000 R14: 0000000000000005 R15: 0000000000000000 </TASK> Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Co-developed-by: Luca Czesla <luca.czesla@mail.schwarz> Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz> Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/ZC0pBXBAgh7c76CA@kernel-bug-kernel-bug Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-04-08 We've added 4 non-merge commits during the last 11 day(s) which contain a total of 5 files changed, 39 insertions(+), 6 deletions(-). The main changes are: 1) Fix BPF TCP socket iterator to use correct helper for dropping socket's refcount, that is, sock_gen_put instead of sock_put, from Martin KaFai Lau. 2) Fix a BTI exception splat in BPF trampoline-generated code on arm64, from Xu Kuohai. 3) Fix a LongArch JIT error from missing BPF_NOSPEC no-op, from George Guo. 4) Fix dynamic XDP feature detection of veth in xdp_redirect selftest, from Lorenzo Bianconi. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: fix xdp_redirect xdp-features selftest for veth driver bpf, arm64: Fixed a BTI error on returning to patched function LoongArch, bpf: Fix jit to skip speculation barrier opcode bpf: tcp: Use sock_gen_put instead of sock_put in bpf_iter_tcp ==================== Link: https://lore.kernel.org/r/20230407224642.30906-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07selftests/bpf: Prevent infinite loop in veristat when base file is too shortEduard Zingerman
The following example forces veristat to loop indefinitely: $ cat two-ok file_name,prog_name,verdict,total_states file-a,a,success,12 file-b,b,success,67 $ cat add-failure file_name,prog_name,verdict,total_states file-a,a,success,12 file-b,b,success,67 file-b,c,failure,32 $ veristat -C two-ok add-failure <does not return> The loop is caused by handle_comparison_mode() not checking if `base` variable points to `fallback_stats` prior advancing joined results using `base`. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230407154125.896927-1-eddyz87@gmail.com
2023-04-07bpftool: Set program type only if it differs from the desired oneWei Yongjun
After commit d6e6286a12e7 ("libbpf: disassociate section handler on explicit bpf_program__set_type() call"), bpf_program__set_type() will force cleanup the program's SEC() definition, this commit fixed the test helper but missed the bpftool, which leads to bpftool prog autoattach broken as follows: $ bpftool prog load spi-xfer-r1v1.o /sys/fs/bpf/test autoattach Program spi_xfer_r1v1 does not support autoattach, falling back to pinning This patch fix bpftool to set program type only if it differs. Fixes: d6e6286a12e7 ("libbpf: disassociate section handler on explicit bpf_program__set_type() call") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230407081427.2621590-1-weiyongjun@huaweicloud.com
2023-04-07selftests/bpf: Use PERF_COUNT_HW_CPU_CYCLES event for get_branch_snapshotSong Liu
perf_event with type=PERF_TYPE_RAW and config=0x1b00 turned out to be not reliable in ensuring LBR is active. Thus, test_progs:get_branch_snapshot is not reliable in some systems. Replace it with PERF_COUNT_HW_CPU_CYCLES event, which gives more consistent results. Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20230407190130.2093736-1-song@kernel.org
2023-04-07Merge tag 'gpio-fixes-for-v6.3-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix irq handling in gpio-davinci - fix Kconfig dependencies for gpio-regmap * tag 'gpio-fixes-for-v6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: davinci: Add irq chip flag to skip set wake gpio: davinci: Do not clear the bank intr enable bit in save_context gpio: GPIO_REGMAP: select REGMAP instead of depending on it
2023-04-07Merge tag 'acpi-6.3-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "Fix the ACPI backlight override mechanism for the cases when acpi_backlight=video is set through the kernel command line or a DMI quirk and add backlight quirks for Apple iMac14,1 and iMac14,2 and Lenovo ThinkPad W530 (Hans de Goede)" * tag 'acpi-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: video: Add acpi_backlight=video quirk for Lenovo ThinkPad W530 ACPI: video: Add acpi_backlight=video quirk for Apple iMac14,1 and iMac14,2 ACPI: video: Make acpi_backlight=video work independent from GPU driver ACPI: video: Add auto_detect arg to __acpi_video_get_backlight_type()
2023-04-07Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Fix uninitialised variable warning (from smatch) in the arm64 compat alignment fixup code" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: compat: Work around uninitialized variable warning
2023-04-07Merge tag '6.3-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull ksmbd server fixes from Steve French: "Four fixes, three for stable: - slab out of bounds fix - lock cancellation fix - minor cleanup to address clang warning - fix for xfstest 551 (wrong parms passed to kvmalloc)" * tag '6.3-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix slab-out-of-bounds in init_smb2_rsp_hdr ksmbd: delete asynchronous work from list ksmbd: remove unused is_char_allowed function ksmbd: do not call kvmalloc() with __GFP_NORETRY | __GFP_NO_WARN
2023-04-07iavf: remove active_cvlans and active_svlans bitmapsAhmed Zaki
The VLAN filters info is currently being held in a list and 2 bitmaps (active_cvlans and active_svlans). We are experiencing some racing where data is not in sync in the list and bitmaps. For example, the VLAN is initially added to the list but only when the PF replies, it is added to the bitmap. If a user adds many V2 VLANS before the PF responds: while [ $((i++)) ] ip l add l eth0 name eth0.$i type vlan id $i we might end up with more VLAN list entries than the designated limit. Also, The "ip link show" will show more links added than the PF limit. On the other and, the bitmaps are only used to check the number of VLAN filters and to re-enable the filters when the interface goes from DOWN to UP. This patch gets rid of the bitmaps and uses the list only. To do that, the states of the VLAN filter are modified: 1 - IAVF_VLAN_REMOVE: the entry needs to be totally removed after informing the PF. This is the "ip link del eth0.$i" path. 2 - IAVF_VLAN_DISABLE: (new) the netdev went down. The filter needs to be removed from the PF and then marked INACTIVE. 3 - IAVF_VLAN_INACTIVE: (new) no PF filter exists, but the user did not delete the VLAN. Fixes: 48ccc43ecf10 ("iavf: Add support VIRTCHNL_VF_OFFLOAD_VLAN_V2 during netdev config") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-04-07iavf: refactor VLAN filter statesAhmed Zaki
The VLAN filter states are currently being saved as individual bits. This is error prone as multiple bits might be mistakenly set. Fix by replacing the bits with a single state enum. Also, add an "ACTIVE" state for filters that are accepted by the PF. Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-04-07hwmon: constify pointers to hwmon_channel_infoKrzysztof Kozlowski
HWmon core receives an array of pointers to hwmon_channel_info and it does not modify it, thus it can be array of const pointers for safety. This allows drivers to make them also const. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2023-04-07Merge branch 'bonding-ns-validation-fixes'David S. Miller
Hangbin Liu says: ==================== bonding: fix ns validation on backup slaves The first patch fixed a ns validation issue on backup slaves. The second patch re-format the bond option test and add a test lib file. The third patch add the arp validate regression test for the kernel patch. Here is the new bonding option test without the kernel fix: ]# ./bond_options.sh TEST: prio (active-backup miimon primary_reselect 0) [ OK ] TEST: prio (active-backup miimon primary_reselect 1) [ OK ] TEST: prio (active-backup miimon primary_reselect 2) [ OK ] TEST: prio (active-backup arp_ip_target primary_reselect 0) [ OK ] TEST: prio (active-backup arp_ip_target primary_reselect 1) [ OK ] TEST: prio (active-backup arp_ip_target primary_reselect 2) [ OK ] TEST: prio (active-backup ns_ip6_target primary_reselect 0) [ OK ] TEST: prio (active-backup ns_ip6_target primary_reselect 1) [ OK ] TEST: prio (active-backup ns_ip6_target primary_reselect 2) [ OK ] TEST: prio (balance-tlb miimon primary_reselect 0) [ OK ] TEST: prio (balance-tlb miimon primary_reselect 1) [ OK ] TEST: prio (balance-tlb miimon primary_reselect 2) [ OK ] TEST: prio (balance-tlb arp_ip_target primary_reselect 0) [ OK ] TEST: prio (balance-tlb arp_ip_target primary_reselect 1) [ OK ] TEST: prio (balance-tlb arp_ip_target primary_reselect 2) [ OK ] TEST: prio (balance-tlb ns_ip6_target primary_reselect 0) [ OK ] TEST: prio (balance-tlb ns_ip6_target primary_reselect 1) [ OK ] TEST: prio (balance-tlb ns_ip6_target primary_reselect 2) [ OK ] TEST: prio (balance-alb miimon primary_reselect 0) [ OK ] TEST: prio (balance-alb miimon primary_reselect 1) [ OK ] TEST: prio (balance-alb miimon primary_reselect 2) [ OK ] TEST: prio (balance-alb arp_ip_target primary_reselect 0) [ OK ] TEST: prio (balance-alb arp_ip_target primary_reselect 1) [ OK ] TEST: prio (balance-alb arp_ip_target primary_reselect 2) [ OK ] TEST: prio (balance-alb ns_ip6_target primary_reselect 0) [ OK ] TEST: prio (balance-alb ns_ip6_target primary_reselect 1) [ OK ] TEST: prio (balance-alb ns_ip6_target primary_reselect 2) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 0) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 1) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 2) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 3) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 4) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 5) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 6) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 0) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 1) [ OK ] TEST: arp_validate (interface eth1 mii_status DOWN) [FAIL] TEST: arp_validate (interface eth2 mii_status DOWN) [FAIL] TEST: arp_validate (active-backup ns_ip6_target arp_validate 2) [FAIL] TEST: arp_validate (interface eth1 mii_status DOWN) [FAIL] TEST: arp_validate (interface eth2 mii_status DOWN) [FAIL] TEST: arp_validate (active-backup ns_ip6_target arp_validate 3) [FAIL] TEST: arp_validate (active-backup ns_ip6_target arp_validate 4) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 5) [ OK ] TEST: arp_validate (interface eth1 mii_status DOWN) [FAIL] TEST: arp_validate (interface eth2 mii_status DOWN) [FAIL] TEST: arp_validate (active-backup ns_ip6_target arp_validate 6) [FAIL] Here is the test result after the kernel fix: TEST: arp_validate (active-backup arp_ip_target arp_validate 0) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 1) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 2) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 3) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 4) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 5) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 6) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 0) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 1) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 2) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 3) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 4) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 5) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 6) [ OK ] ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-07selftests: bonding: add arp validate testHangbin Liu
This patch add bonding arp validate tests with mode active backup, monitor arp_ip_target and ns_ip6_target. It also checks mii_status to make sure all slaves are UP. Acked-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-07selftests: bonding: re-format bond option testsHangbin Liu
To improve the testing process for bond options, A new bond topology lib is added to our testing setup. The current option_prio.sh file will be renamed to bond_options.sh so that all bonding options can be tested here. Specifically, for priority testing, we will run all tests using modes 1, 5, and 6. These changes will help us streamline the testing process and ensure that our bond options are rigorously evaluated. Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-07bonding: fix ns validation on backup slavesHangbin Liu
When arp_validate is set to 2, 3, or 6, validation is performed for backup slaves as well. As stated in the bond documentation, validation involves checking the broadcast ARP request sent out via the active slave. This helps determine which slaves are more likely to function in the event of an active slave failure. However, when the target is an IPv6 address, the NS message sent from the active interface is not checked on backup slaves. Additionally, based on the bond_arp_rcv() rule b, we must reverse the saddr and daddr when checking the NS message. Note that when checking the NS message, the destination address is a multicast address. Therefore, we must convert the target address to solicited multicast in the bond_get_targets_ip6() function. Prior to the fix, the backup slaves had a mii status of "down", but after the fix, all of the slaves' mii status was updated to "UP". Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets") Reviewed-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-07net: ethernet: mtk_eth_soc: mtk_ppe: prefer newly added l2 flowsFelix Fietkau
When a device is roaming between interfaces and a new flow entry is created, we should assume that its output device is more up to date than whatever entry existed already. Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-07net: ethernet: mtk_eth_soc: add code for offloading flows from wlan devicesFelix Fietkau
WED version 2 (on MT7986 and later) can offload flows originating from wireless devices. In order to make that work, ndo_setup_tc needs to be implemented on the netdevs. This adds the required code to offload flows coming in from WED, while keeping track of the incoming wed index used for selecting the correct PPE device. Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-07tcp: restrict net.ipv4.tcp_app_winYueHaibing
UBSAN: shift-out-of-bounds in net/ipv4/tcp_input.c:555:23 shift exponent 255 is too large for 32-bit type 'int' CPU: 1 PID: 7907 Comm: ssh Not tainted 6.3.0-rc4-00161-g62bad54b26db-dirty #206 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x136/0x150 __ubsan_handle_shift_out_of_bounds+0x21f/0x5a0 tcp_init_transfer.cold+0x3a/0xb9 tcp_finish_connect+0x1d0/0x620 tcp_rcv_state_process+0xd78/0x4d60 tcp_v4_do_rcv+0x33d/0x9d0 __release_sock+0x133/0x3b0 release_sock+0x58/0x1b0 'maxwin' is int, shifting int for 32 or more bits is undefined behaviour. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-07niu: Fix missing unwind goto in niu_alloc_channels()Harshit Mogalapalli
Smatch reports: drivers/net/ethernet/sun/niu.c:4525 niu_alloc_channels() warn: missing unwind goto? If niu_rbr_fill() fails, then we are directly returning 'err' without freeing the channels. Fix this by changing direct return to a goto 'out_err'. Fixes: a3138df9f20e ("[NIU]: Add Sun Neptune ethernet driver.") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-06cifs: double lock in cifs_reconnect_tcon()Dan Carpenter
This lock was supposed to be an unlock. Fixes: 6cc041e90c17 ("cifs: avoid races in parallel reconnects in smb1") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-04-06block: don't set GD_NEED_PART_SCAN if scan partition failedYu Kuai
Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still set, and partition scan will be proceed again when blkdev_get_by_dev() is called. However, this will cause a problem that re-assemble partitioned raid device will creat partition for underlying disk. Test procedure: mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 sgdisk -n 0:0:+100MiB /dev/md0 blockdev --rereadpt /dev/sda blockdev --rereadpt /dev/sdb mdadm -S /dev/md0 mdadm -A /dev/md0 /dev/sda /dev/sdb Test result: underlying disk partition and raid partition can be observed at the same time Note that this can still happen in come corner cases that GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid device. Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again") Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-06Merge tag 'mlx5-updates-2023-04-05' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-04-05 From Paul: - TC action parsing cleanups - Correctly report stats for missed packets - Many CT actions limitations removed due to hw misses will now continue from the relevant tc ct action in software. From Adham: - RQ/SQ devlink health diagnostics layout fixes From Gal And Rahul: - PTP code cleanup and cyclecounter shift value improvement * tag 'mlx5-updates-2023-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Fix SQ SW state layout in SQ devlink health diagnostics net/mlx5e: Fix RQ SW state layout in RQ devlink health diagnostics net/mlx5e: Rename misleading skb_pc/cc references in ptp code net/mlx5: Update cyclecounter shift value to improve ptp free running mode precision net/mlx5e: Remove redundant macsec code net/mlx5e: TC, Remove sample and ct limitation net/mlx5e: TC, Remove mirror and ct limitation net/mlx5e: TC, Remove tuple rewrite and ct limitation net/mlx5e: TC, Remove multiple ct actions limitation net/mlx5e: TC, Remove special handling of CT action net/mlx5e: TC, Remove CT action reordering net/mlx5e: CT: Use per action stats net/mlx5e: TC, Move main flow attribute cleanup to helper func net/mlx5e: TC, Remove unused vf_tun variable net/mlx5e: Set default can_offload action ==================== Link: https://lore.kernel.org/r/20230406020232.83844-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-06selftests: forwarding: hw_stats_l3: Detect failure to install countersPetr Machata
Running this test makes little sense if the enabled l3_stats are not actually reported as "used". This can signify a failure of a driver to install the necessary counters, or simply lack of support for enabling in-HW counters on a given netdevice. It is generally impossible to tell from the outside which it is. But more likely than not, if somebody is running this on veth pairs, they do not intend to actually test that a certain piece of HW can install in-HW counters for the veth. It is more likely they are e.g. running the test by mistake. Therefore detect that the counter has not been actually installed. In that case, if the netdevice is one end of a veth pair, SKIP. Otherwise FAIL. Suggested-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Tested-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/a86817961903cca5cb0aebf2b2a06294b8aa7dea.1680704172.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-06net: sunhme: move asm includes to below linux includesSimon Horman
A recent rearrangement of includes has lead to a problem on m68k as flagged by the kernel test robot. Resolve this by moving the block asm includes to below linux includes. A side effect i that non-Sparc asm includes are now immediately before Sparc asm includes, which seems nice. Using sparse v0.6.4 I was able to reproduce this problem as follows using the config provided by the kernel test robot: $ wget https://download.01.org/0day-ci/archive/20230404/202304041748.0sQc4K4l-lkp@intel.com/config $ cp config .config $ make ARCH=m68k oldconfig $ make ARCH=m68k C=2 M=drivers/net/ethernet/sun CC [M] drivers/net/ethernet/sun/sunhme.o In file included from drivers/net/ethernet/sun/sunhme.c:19: ./arch/m68k/include/asm/irq.h:78:11: error: expected ‘;’ before ‘void’ 78 | asmlinkage void do_IRQ(int irq, struct pt_regs *regs); | ^~~~~ | ; ./arch/m68k/include/asm/irq.h:78:40: warning: ‘struct pt_regs’ declared inside parameter list will not be visible outside of this definition or declaration 78 | asmlinkage void do_IRQ(int irq, struct pt_regs *regs); | ^~~~~~~ Compile tested only. Fixes: 1ff4f42aef60 ("net: sunhme: Alphabetize includes") Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202304041748.0sQc4K4l-lkp@intel.com/ Signed-off-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230405-sunhme-includes-fix-v1-1-bf17cc5de20d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-06bpf: ensure all memory is initialized in bpf_get_current_commBarret Rhoden
BPF helpers that take an ARG_PTR_TO_UNINIT_MEM must ensure that all of the memory is set, including beyond the end of the string. Signed-off-by: Barret Rhoden <brho@google.com> Link: https://lore.kernel.org/r/20230407001808.1622968-1-brho@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-06PCI: Fix use-after-free in pci_bus_release_domain_nr()Rob Herring
Commit c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()") introduced a use-after-free bug in the bus removal cleanup. The issue was found with kfence: [ 19.293351] BUG: KFENCE: use-after-free read in pci_bus_release_domain_nr+0x10/0x70 [ 19.302817] Use-after-free read at 0x000000007f3b80eb (in kfence-#115): [ 19.309677] pci_bus_release_domain_nr+0x10/0x70 [ 19.309691] dw_pcie_host_deinit+0x28/0x78 [ 19.309702] tegra_pcie_deinit_controller+0x1c/0x38 [pcie_tegra194] [ 19.309734] tegra_pcie_dw_probe+0x648/0xb28 [pcie_tegra194] [ 19.309752] platform_probe+0x90/0xd8 ... [ 19.311457] kfence-#115: 0x00000000063a155a-0x00000000ba698da8, size=1072, cache=kmalloc-2k [ 19.311469] allocated by task 96 on cpu 10 at 19.279323s: [ 19.311562] __kmem_cache_alloc_node+0x260/0x278 [ 19.311571] kmalloc_trace+0x24/0x30 [ 19.311580] pci_alloc_bus+0x24/0xa0 [ 19.311590] pci_register_host_bridge+0x48/0x4b8 [ 19.311601] pci_scan_root_bus_bridge+0xc0/0xe8 [ 19.311613] pci_host_probe+0x18/0xc0 [ 19.311623] dw_pcie_host_init+0x2c0/0x568 [ 19.311630] tegra_pcie_dw_probe+0x610/0xb28 [pcie_tegra194] [ 19.311647] platform_probe+0x90/0xd8 ... [ 19.311782] freed by task 96 on cpu 10 at 19.285833s: [ 19.311799] release_pcibus_dev+0x30/0x40 [ 19.311808] device_release+0x30/0x90 [ 19.311814] kobject_put+0xa8/0x120 [ 19.311832] device_unregister+0x20/0x30 [ 19.311839] pci_remove_bus+0x78/0x88 [ 19.311850] pci_remove_root_bus+0x5c/0x98 [ 19.311860] dw_pcie_host_deinit+0x28/0x78 [ 19.311866] tegra_pcie_deinit_controller+0x1c/0x38 [pcie_tegra194] [ 19.311883] tegra_pcie_dw_probe+0x648/0xb28 [pcie_tegra194] [ 19.311900] platform_probe+0x90/0xd8 ... [ 19.313579] CPU: 10 PID: 96 Comm: kworker/u24:2 Not tainted 6.2.0 #4 [ 19.320171] Hardware name: /, BIOS 1.0-d7fb19b 08/10/2022 [ 19.325852] Workqueue: events_unbound deferred_probe_work_func The stack trace is a bit misleading as dw_pcie_host_deinit() doesn't directly call pci_bus_release_domain_nr(). The issue turns out to be in pci_remove_root_bus() which first calls pci_remove_bus() which frees the struct pci_bus when its struct device is released. Then pci_bus_release_domain_nr() is called and accesses the freed struct pci_bus. Reordering these fixes the issue. Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()") Link: https://lore.kernel.org/r/20230329123835.2724518-1-robh@kernel.org Link: https://lore.kernel.org/r/b529cb69-0602-9eed-fc02-2f068707a006@nvidia.com Reported-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Cc: stable@vger.kernel.org # v6.2+ Cc: Pali Rohár <pali@kernel.org>
2023-04-06Merge branch 'bpf: Improve verifier for cond_op and spilled loop index ↵Alexei Starovoitov
variables' Yonghong Song says: ==================== LLVM commit [1] introduced hoistMinMax optimization like (i < VIRTIO_MAX_SGS) && (i < out_sgs) to upper = MIN(VIRTIO_MAX_SGS, out_sgs) ... i < upper ... and caused the verification failure. Commit [2] workarounded the issue by adding some bpf assembly code to prohibit the above optimization. This patch improved verifier such that verification can succeed without the above workaround. Without [2], the current verifier will hit the following failures: ... 119: (15) if r1 == 0x0 goto pc+1 The sequence of 8193 jumps is too complex. verification time 525829 usec stack depth 64 processed 156616 insns (limit 1000000) max_states_per_insn 8 total_states 1754 peak_states 1712 mark_read 12 -- END PROG LOAD LOG -- libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -14 libbpf: failed to load object 'loop6.bpf.o' ... The failure is due to verifier inadequately handling '<const> <cond_op> <non_const>' which will go through both pathes and generate the following verificaiton states: ... 89: (07) r2 += 1 ; R2_w=5 90: (79) r8 = *(u64 *)(r10 -48) ; R8_w=scalar() R10=fp0 91: (79) r1 = *(u64 *)(r10 -56) ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0 92: (ad) if r2 < r1 goto pc+41 ; R0_w=scalar() R1_w=scalar(umin=6,umax=5,var_off=(0x4; 0x3)) R2_w=5 R6_w=scalar(id=385) R7_w=0 R8_w=scalar() R9_w=scalar(umax=21474836475,var_off=(0x0; 0x7ffffffff)) R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40_w=4 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm ... 89: (07) r2 += 1 ; R2_w=6 90: (79) r8 = *(u64 *)(r10 -48) ; R8_w=scalar() R10=fp0 91: (79) r1 = *(u64 *)(r10 -56) ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0 92: (ad) if r2 < r1 goto pc+41 ; R0_w=scalar() R1_w=scalar(umin=7,umax=5,var_off=(0x4; 0x3)) R2_w=6 R6=scalar(id=388) R7=0 R8_w=scalar() R9_w=scalar(umax=25769803770,var_off=(0x0; 0x7ffffffff)) R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40=5 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm ... 89: (07) r2 += 1 ; R2_w=4088 90: (79) r8 = *(u64 *)(r10 -48) ; R8_w=scalar() R10=fp0 91: (79) r1 = *(u64 *)(r10 -56) ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0 92: (ad) if r2 < r1 goto pc+41 ; R0=scalar() R1=scalar(umin=4089,umax=5,var_off=(0x0; 0x7)) R2=4088 R6=scalar(id=12634) R7=0 R8=scalar() R9=scalar(umax=17557826301960,var_off=(0x0; 0xfffffffffff)) R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40=4087 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm Patch 3 fixed the above issue by handling '<const> <cond_op> <non_const>' properly. During developing selftests for Patch 3, I found some issues with bound deduction with BPF_EQ/BPF_NE and fixed the issue in Patch 1. After the above issue is fixed, the second issue shows up. ... 67: (07) r1 += -16 ; R1_w=fp-16 ; bpf_probe_read_kernel(&sgp, sizeof(sgp), sgs + i); 68: (b4) w2 = 8 ; R2_w=8 69: (85) call bpf_probe_read_kernel#113 ; R0_w=scalar() fp-16=mmmmmmmm ; return sgp; 70: (79) r6 = *(u64 *)(r10 -16) ; R6=scalar() R10=fp0 ; for (n = 0, sgp = get_sgp(sgs, i); sgp && (n < SG_MAX); 71: (15) if r6 == 0x0 goto pc-49 ; R6=scalar() 72: (b4) w1 = 0 ; R1_w=0 73: (05) goto pc-46 ; for (i = 0; (i < VIRTIO_MAX_SGS) && (i < out_sgs); i++) { 28: (bc) w7 = w1 ; R1_w=0 R7_w=0 ; bpf_probe_read_kernel(&len, sizeof(len), &sgp->length); ... 23: (79) r3 = *(u64 *)(r10 -40) ; R3_w=2 R10=fp0 ; for (i = 0; (i < VIRTIO_MAX_SGS) && (i < out_sgs); i++) { 24: (07) r3 += 1 ; R3_w=3 ; for (i = 0; (i < VIRTIO_MAX_SGS) && (i < out_sgs); i++) { 25: (79) r1 = *(u64 *)(r10 -56) ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0 26: (ad) if r3 < r1 goto pc+34 61: R0=scalar() R1_w=scalar(umin=4,umax=5,var_off=(0x4; 0x1)) R3_w=3 R6=scalar(id=1658) R7=0 R8=scalar(id=1653) R9=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40=2 fp-56= fp-64=mmmmmmmm ; if (sg_is_chain(&sg)) 61: (7b) *(u64 *)(r10 -40) = r3 ; R3_w=3 R10=fp0 fp-40_w=3 ... 67: (07) r1 += -16 ; R1_w=fp-16 ; bpf_probe_read_kernel(&sgp, sizeof(sgp), sgs + i); 68: (b4) w2 = 8 ; R2_w=8 69: (85) call bpf_probe_read_kernel#113 ; R0_w=scalar() fp-16=mmmmmmmm ; return sgp; 70: (79) r6 = *(u64 *)(r10 -16) ; for (n = 0, sgp = get_sgp(sgs, i); sgp && (n < SG_MAX); infinite loop detected at insn 71 verification time 90800 usec stack depth 64 processed 25017 insns (limit 1000000) max_states_per_insn 20 total_states 491 peak_states 169 mark_read 12 -- END PROG LOAD LOG -- libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -22 Further analysis found the index variable 'i' is spilled but since it is not marked as precise. This is more tricky as identifying induction variable is not easy in verifier. Although a heuristic is possible, let us leave it for now. [1] https://reviews.llvm.org/D143726 [2] Commit 3c2611bac08a ("selftests/bpf: Fix trace_virtqueue_add_sgs test issue with LLVM 17.") ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-06selftests/bpf: Add verifier tests for code pattern '<const> <cond_op> ↵Yonghong Song
<non_const>' Add various tests for code pattern '<const> <cond_op> <non_const>' to exercise the previous verifier patch. The following are veristat changed number of processed insns stat comparing the previous patch vs. this patch: File Program Insns (A) Insns (B) Insns (DIFF) ----------------------------------------------------- ---------------------------------------------------- --------- --------- ------------- test_seg6_loop.bpf.linked3.o __add_egr_x 12423 12314 -109 (-0.88%) Only one program is affected with minor change. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230406164510.1047757-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-06bpf: Improve handling of pattern '<const> <cond_op> <non_const>' in verifierYonghong Song
Currently, the verifier does not handle '<const> <cond_op> <non_const>' well. For example, ... 10: (79) r1 = *(u64 *)(r10 -16) ; R1_w=scalar() R10=fp0 11: (b7) r2 = 0 ; R2_w=0 12: (2d) if r2 > r1 goto pc+2 13: (b7) r0 = 0 14: (95) exit 15: (65) if r1 s> 0x1 goto pc+3 16: (0f) r0 += r1 ... At insn 12, verifier decides both true and false branch are possible, but actually only false branch is possible. Currently, the verifier already supports patterns '<non_const> <cond_op> <const>. Add support for patterns '<const> <cond_op> <non_const>' in a similar way. Also fix selftest 'verifier_bounds_mix_sign_unsign/bounds checks mixing signed and unsigned, variant 10' due to this change. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Dave Marchevsky <davemarchevsky@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230406164505.1046801-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-06selftests/bpf: Add tests for non-constant cond_op NE/EQ bound deductionYonghong Song
Add various tests for code pattern '<non-const> NE/EQ <const>' implemented in the previous verifier patch. Without the verifier patch, these new tests will fail. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230406164500.1045715-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-06bpf: Improve verifier JEQ/JNE insn branch taken checkingYonghong Song
Currently, for BPF_JEQ/BPF_JNE insn, verifier determines whether the branch is taken or not only if both operands are constants. Therefore, for the following code snippet, 0: (85) call bpf_ktime_get_ns#5 ; R0_w=scalar() 1: (a5) if r0 < 0x3 goto pc+2 ; R0_w=scalar(umin=3) 2: (b7) r2 = 2 ; R2_w=2 3: (1d) if r0 == r2 goto pc+2 6 At insn 3, since r0 is not a constant, verifier assumes both branch can be taken which may lead inproper verification failure. Add comparing umin/umax value and the constant. If the umin value is greater than the constant, or umax value is smaller than the constant, for JEQ the branch must be not-taken, and for JNE the branch must be taken. The jmp32 mode JEQ/JNE branch taken checking is also handled similarly. The following lists the veristat result w.r.t. changed number of processes insns during verification: File Program Insns (A) Insns (B) Insns (DIFF) ----------------------------------------------------- ---------------------------------------------------- --------- --------- --------------- test_cls_redirect.bpf.linked3.o cls_redirect 64980 73472 +8492 (+13.07%) test_seg6_loop.bpf.linked3.o __add_egr_x 12425 12423 -2 (-0.02%) test_tcp_hdr_options.bpf.linked3.o estab 2634 2558 -76 (-2.89%) test_parse_tcp_hdr_opt.bpf.linked3.o xdp_ingress_v6 1421 1420 -1 (-0.07%) test_parse_tcp_hdr_opt_dynptr.bpf.linked3.o xdp_ingress_v6 1238 1237 -1 (-0.08%) test_tc_dtime.bpf.linked3.o egress_fwdns_prio100 414 411 -3 (-0.72%) Mostly a small improvement but test_cls_redirect.bpf.linked3.o has a 13% regression. I checked with verifier log and found it this is due to pruning. For some JEQ/JNE branches impacted by this patch, one branch is explored and the other has state equivalence and pruned. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Dave Marchevsky <davemarchevsky@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230406164455.1045294-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>