linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2023-04-18	Merge branch 'bnxt_en-bug-fixes'	Paolo Abeni
	Michael Chan says: ==================== bnxt_en: Bug fixes This small series contains 2 fixes. The first one fixes the PTP initialization logic on older chips to avoid logging a warning. The second one fixes a potenial NULL pointer dereference in the driver's aux bus unload path. ==================== Link: https://lore.kernel.org/r/20230417065819.122055-1-michael.chan@broadcom.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-18	bnxt_en: Fix a possible NULL pointer dereference in unload path	Kalesh AP
	In the driver unload path, the driver currently checks the valid BNXT_FLAG_ROCE_CAP flag in bnxt_rdma_aux_device_uninit() before proceeding. This is flawed because the flag may not be set initially during driver load. It may be set later after the NVRAM setting is changed followed by a firmware reset. Relying on the BNXT_FLAG_ROCE_CAP flag may crash in bnxt_rdma_aux_device_uninit() if the aux device was never initialized: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 8ae6aa067 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 39 PID: 42558 Comm: rmmod Kdump: loaded Tainted: G OE --------- - - 4.18.0-348.el8.x86_64 #1 Hardware name: Dell Inc. PowerEdge R750/0WT8Y6, BIOS 1.5.4 12/17/2021 RIP: 0010:device_del+0x1b/0x410 Code: 89 a5 50 03 00 00 4c 89 a5 58 03 00 00 eb 89 0f 1f 44 00 00 41 56 41 55 41 54 4c 8d a7 80 00 00 00 55 53 48 89 fb 48 83 ec 18 <48> 8b 2f 4c 89 e7 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 RSP: 0018:ff7f82bf469a7dc8 EFLAGS: 00010292 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000206 RDI: 0000000000000000 RBP: ff31b7cd114b0ac0 R08: 0000000000000000 R09: ffffffff935c3400 R10: ff31b7cd45bc3440 R11: 0000000000000001 R12: 0000000000000080 R13: ffffffffc1069f40 R14: 0000000000000000 R15: 0000000000000000 FS: 00007fc9903ce740(0000) GS:ff31b7d4ffac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000992fee004 CR4: 0000000000773ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: bnxt_rdma_aux_device_uninit+0x1f/0x30 [bnxt_en] bnxt_remove_one+0x2f/0x1f0 [bnxt_en] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x103/0x1f0 driver_detach+0x54/0x88 bus_remove_driver+0x77/0xc9 pci_unregister_driver+0x2d/0xb0 bnxt_exit+0x16/0x2c [bnxt_en] __x64_sys_delete_module+0x139/0x280 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca RIP: 0033:0x7fc98f3af71b Fix this by modifying the check inside bnxt_rdma_aux_device_uninit() to check for bp->aux_priv instead. We also need to make some changes in bnxt_rdma_aux_device_init() to make sure that bp->aux_priv is set only when the aux device is fully initialized. Fixes: d80d88b0dfff ("bnxt_en: Add auxiliary driver support") Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-18	bnxt_en: Do not initialize PTP on older P3/P4 chips	Michael Chan
	The driver does not support PTP on these older chips and it is assuming that firmware on these older chips will not return the PORT_MAC_PTP_QCFG_RESP_FLAGS_HWRM_ACCESS flag in __bnxt_hwrm_ptp_qcfg(), causing the function to abort quietly. But newer firmware now sets this flag and so __bnxt_hwrm_ptp_qcfg() will proceed further. Eventually it will fail in bnxt_ptp_init() -> bnxt_map_ptp_regs() because there is no code to support the older chips. The driver will then complain: "PTP initialization failed.\n" Fix it so that we abort quietly earlier without going through the unnecessary steps and alarming the user with the warning log. Fixes: ae5c42f0b92c ("bnxt_en: Get PTP hardware capability from firmware") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-18	netfilter: nf_tables: tighten netlink attribute requirements for catch-all ↵	Pablo Neira Ayuso
	elements If NFT_SET_ELEM_CATCHALL is set on, then userspace provides no set element key. Otherwise, bail out with -EINVAL. Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-18	cxgb4: fix use after free bugs caused by circular dependency problem	Duoming Zhou
	The flower_stats_timer can schedule flower_stats_work and flower_stats_work can also arm the flower_stats_timer. The process is shown below: ----------- timer schedules work ------------ ch_flower_stats_cb() //timer handler schedule_work(&adap->flower_stats_work); ----------- work arms timer ------------ ch_flower_stats_handler() //workqueue callback function mod_timer(&adap->flower_stats_timer, ...); When the cxgb4 device is detaching, the timer and workqueue could still be rearmed. The process is shown below: (cleanup routine) \| (timer and workqueue routine) remove_one() \| free_some_resources() \| ch_flower_stats_cb() //timer cxgb4_cleanup_tc_flower() \| schedule_work() del_timer_sync() \| \| ch_flower_stats_handler() //workqueue \| mod_timer() cancel_work_sync() \| kfree(adapter) //FREE \| ch_flower_stats_cb() //timer \| adap->flower_stats_work //USE This patch changes del_timer_sync() to timer_shutdown_sync(), which could prevent rearming of the timer from the workqueue. Fixes: e0f911c81e93 ("cxgb4: fetch stats for offloaded tc flower flows") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Link: https://lore.kernel.org/r/20230415081227.7463-1-duoming@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-18	netfilter: nf_tables: validate catch-all set elements	Pablo Neira Ayuso
	catch-all set element might jump/goto to chain that uses expressions that require validation. Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-17	Merge branch 'ocelot-felix-driver-support-for-preemptible-traffic-classes'	Jakub Kicinski
	Vladimir Oltean says: ==================== Ocelot/Felix driver support for preemptible traffic classes The series "Add tc-mqprio and tc-taprio support for preemptible traffic classes" from: https://lore.kernel.org/netdev/20230220122343.1156614-1-vladimir.oltean@nxp.com/ was eventually submitted in a form without the support for the Ocelot/Felix switch driver. This patch set picks up that work again, and presents a fairly modified form compared to the original. ==================== Link: https://lore.kernel.org/r/20230415170551.3939607-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net: mscc: ocelot: add support for preemptible traffic classes	Vladimir Oltean
	In order to not transmit (preemptible) frames which will be received by the link partner as corrupted (because it doesn't support FP), the hardware requires the driver to program the QSYS_PREEMPTION_CFG_P_QUEUES register only after the MAC Merge layer becomes active (verification succeeds, or was disabled). There are some cases when FP is known (through experimentation) to be broken. Give priority to FP over cut-through switching, and disable FP for known broken link modes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net: dsa: felix: act upon the mqprio qopt in taprio offload	Vladimir Oltean
	The mqprio queue configuration can appear either through TC_SETUP_QDISC_MQPRIO or through TC_SETUP_QDISC_TAPRIO. Make sure both are treated in the same way. Code does nothing new for now (except for rejecting multiple TXQs per TC, which is a useless concept with DSA switches). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net: mscc: ocelot: add support for mqprio offload	Vladimir Oltean
	This doesn't apply anything to hardware and in general doesn't do anything that the software variant doesn't do, except for checking that there isn't more than 1 TXQ per TC (TXQs for a DSA switch are a dubious concept anyway). The reason we add this is to be able to parse one more field added to struct tc_mqprio_qopt_offload, namely preemptible_tcs. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net: mscc: ocelot: don't rely on cached verify_status in ocelot_port_get_mm()	Vladimir Oltean
	ocelot_mm_update_port_status() updates mm->verify_status, but when the verification state of a port changes, an IRQ isn't emitted, but rather, only when the verification state reaches one of the final states (like DISABLED, FAILED, SUCCEEDED) - things that would affect mm->tx_active, which is what the IRQ is actually emitted for. That is to say, user space may miss reports of an intermediary MAC Merge verification state (like from INITIAL to VERIFYING), unless there was an IRQ notifying the driver of the change in mm->tx_active as well. This is not a huge deal, but for reliable reporting to user space, let's call ocelot_mm_update_port_status() synchronously from ocelot_port_get_mm(), which makes user space see the current MM status. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net: mscc: ocelot: optimize ocelot_mm_irq()	Vladimir Oltean
	The MAC Merge IRQ of all ports is shared with the PTP TX timestamp IRQ of all ports, which means that currently, when a PTP TX timestamp is generated, felix_irq_handler() also polls for the MAC Merge layer status of all ports, looking for changes. This makes the kernel do more work, and under certain circumstances may make ptp4l require a tx_timestamp_timeout argument higher than before. Changes to the MAC Merge layer status are only to be expected under certain conditions - its TX direction needs to be enabled - so we can check early if that is the case, and omit register access otherwise. Make ocelot_mm_update_port_status() skip register access if mm->tx_enabled is unset, and also call it once more, outside IRQ context, from ocelot_port_set_mm(), when mm->tx_enabled transitions from true to false, because an IRQ is also expected in that case. Also, a port may have its MAC Merge layer enabled but it may not have generated the interrupt. In that case, there's no point in writing to DEV_MM_STATUS to acknowledge that IRQ. We can reduce the number of register writes per port with MM enabled by keeping an "ack" variable which writes the "write-one-to-clear" bits. Those are 3 in number: PRMPT_ACTIVE_STICKY, UNEXP_RX_PFRM_STICKY and UNEXP_TX_PFRM_STICKY. The other fields in DEV_MM_STATUS are read-only and it doesn't matter what is written to them, so writing zero is just fine. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net: mscc: ocelot: remove struct ocelot_mm_state :: lock	Vladimir Oltean
	Unfortunately, the workarounds for the hardware bugs make it pointless to keep fine-grained locking for the MAC Merge state of each port. Our vsc9959_cut_through_fwd() implementation requires ocelot->fwd_domain_lock to be held, in order to serialize with changes to the bridging domains and to port speed changes (which affect which ports can be cut-through). Simultaneously, the traffic classes which can be cut-through cannot be preemptible at the same time, and this will depend on the MAC Merge layer state (which changes from threaded interrupt context). Since vsc9959_cut_through_fwd() would have to hold the mm->lock of all ports for a correct and race-free implementation with respect to ocelot_mm_irq(), in practice it means that any time a port's mm->lock is held, it would potentially block holders of ocelot->fwd_domain_lock. In the interest of simple locking rules, make all MAC Merge layer state changes (and preemptible traffic class changes) be serialized by the ocelot->fwd_domain_lock. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net: mscc: ocelot: export a single ocelot_mm_irq()	Vladimir Oltean
	When the switch emits an IRQ, we don't know what caused it, and we iterate through all ports to check the MAC Merge status. Move that iteration inside the ocelot lib; we will change the locking in a future change and it would be good to encapsulate that lock completely within the ocelot lib. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	Merge branch 'xdp-rx-hwts-metadata-for-stmmac-driver'	Jakub Kicinski
	Song Yoong Siang says: ==================== XDP Rx HWTS metadata for stmmac driver Implemented XDP receive hardware timestamp metadata for stmmac driver. This patchset is tested with tools/testing/selftests/bpf/xdp_hw_metadata. Below are the test steps and results. Command on DUT: sudo ./xdp_hw_metadata <interface name> Command on Link Partner: echo -n xdp \| nc -u -q1 <destination IPv4 addr> 9091 echo -n skb \| nc -u -q1 <destination IPv4 addr> 9092 Result for port 9091: poll: 1 (0) skip=1 fail=0 redir=1 xsk_ring_cons__peek: 1 0x55f69f65f6d0: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 rx_timestamp: 1677762069053692631 No rx_hash err=-95 0x55f69f65f6d0: complete idx=8 addr=8000 Result for port 9092: poll: 1 (0) skip=2 fail=0 redir=1 found skb hwtstamp = 1677762071.937207680 ==================== Link: https://lore.kernel.org/r/20230415064503.3225835-1-yoong.siang.song@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net: stmmac: add Rx HWTS metadata to XDP ZC receive pkt	Song Yoong Siang
	Add receive hardware timestamp metadata support via kfunc to XDP Zero Copy receive packets. Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net: stmmac: add Rx HWTS metadata to XDP receive pkt	Song Yoong Siang
	Add receive hardware timestamp metadata support via kfunc to XDP receive packets. Suggested-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net: stmmac: introduce wrapper for struct xdp_buff	Song Yoong Siang
	Introduce struct stmmac_xdp_buff as a preparation to support XDP Rx metadata via kfuncs. Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	Merge branch 'support-tunnel-mode-in-mlx5-ipsec-packet-offload'	Jakub Kicinski
	Leon Romanovsky says: ==================== Support tunnel mode in mlx5 IPsec packet offload This series extends mlx5 to support tunnel mode in its IPsec packet offload implementation. v0: https://lore.kernel.org/all/cover.1681106636.git.leonro@nvidia.com ==================== Link: https://lore.kernel.org/r/cover.1681388425.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net/mlx5e: Accept tunnel mode for IPsec packet offload	Leon Romanovsky
	Open mlx5 driver to accept IPsec tunnel mode. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net/mlx5e: Create IPsec table with tunnel support only when encap is disabled	Leon Romanovsky
	Current hardware doesn't support double encapsulation which is happening when IPsec packet offload tunnel mode is configured together with eswitch encap option. Any user attempt to add new SA/policy after he/she sets encap mode, will generate the following FW syndrome: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 1904): CREATE_FLOW_TABLE(0x930) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xa43321), err(-22) Make sure that we block encap changes before creating flow steering tables. This is applicable only for packet offload in tunnel mode, while packet offload in transport mode and crypto offload, don't have such limitation as they don't perform encapsulation. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net/mlx5: Allow blocking encap changes in eswitch	Leon Romanovsky
	Existing eswitch encap option enables header encapsulation. Unfortunately currently available hardware isn't able to perform double encapsulation, which can happen once IPsec packet offload tunnel mode is used together with encap mode set to BASIC. So as a solution for misconfiguration, provide an option to block encap changes, which will be used for IPsec packet offload. Reviewed-by: Emeel Hakim <ehakim@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net/mlx5e: Listen to ARP events to update IPsec L2 headers in tunnel mode	Leon Romanovsky
	In IPsec packet offload mode all header manipulations are performed by hardware, which is responsible to add/remove L2 header with source and destinations MACs. CX-7 devices don't support offload of in-kernel routing functionality, as such HW needs external help to fill other side MAC as it isn't available for HW. As a solution, let's listen to neigh ARP updates and reconfigure IPsec rules on the fly once new MAC data information arrives. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net/mlx5e: Support IPsec TX packet offload in tunnel mode	Leon Romanovsky
	Extend mlx5 driver with logic to support IPsec TX packet offload in tunnel mode. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net/mlx5e: Support IPsec RX packet offload in tunnel mode	Leon Romanovsky
	Extend mlx5 driver with logic to support IPsec RX packet offload in tunnel mode. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net/mlx5e: Prepare IPsec packet reformat code for tunnel mode	Leon Romanovsky
	Refactor setup_pkt_reformat() function to accommodate future extension to support tunnel mode. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net/mlx5e: Configure IPsec SA tables to support tunnel mode	Leon Romanovsky
	Create SA flow steering tables both for RX and TX with tunnel reformat property. This allows to add and delete extra headers needed for tunnel mode. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net/mlx5e: Check IPsec packet offload tunnel capabilities	Leon Romanovsky
	Validate tunnel mode support for IPsec packet offload. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	net/mlx5e: Add IPsec packet offload tunnel bits	Leon Romanovsky
	Extend packet reformat types and flow table capabilities with IPsec packet offload tunnel bits. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	ice: document RDMA devlink parameters	Jacob Keller
	Commit e523af4ee560 ("net/ice: Add support for enable_iwarp and enable_roce devlink param") added support for the enable_roce and enable_iwarp parameters in the ice driver. It didn't document these parameters in the ice devlink documentation file. Add this documentation, including a note about the mutual exclusion between the two modes. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230414162614.571861-1-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17	i40e: fix i40e_setup_misc_vector() error handling	Aleksandr Loktionov
	Add error handling of i40e_setup_misc_vector() in i40e_rebuild(). In case interrupt vectors setup fails do not re-open vsi-s and do not bring up vf-s, we have no interrupts to serve a traffic anyway. Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-04-17	i40e: fix accessing vsi->active_filters without holding lock	Aleksandr Loktionov
	Fix accessing vsi->active_filters without holding the mac_filter_hash_lock. Move vsi->active_filters = 0 inside critical section and move clear_bit(__I40E_VSI_OVERFLOW_PROMISC, vsi->state) after the critical section to ensure the new filters from other threads can be added only after filters cleaning in the critical section is finished. Fixes: 278e7d0b9d68 ("i40e: store MAC/VLAN filters in a hash with the MAC Address as key") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-04-17	SUNRPC: Fix failures of checksum Kunit tests	Chuck Lever
	Scott reports that when the new GSS krb5 Kunit tests are built as a separate module and loaded, the RFC 6803 and RFC 8009 checksum tests all fail, even though they pass when run under kunit.py. It appears that passing a buffer backed by static const memory to gss_krb5_checksum() is a problem. A printk in checksum_case() shows the correct plaintext, but by the time the buffer has been converted to a scatterlist and arrives at checksummer(), it contains all zeroes. Replacing this buffer with one that is dynamically allocated fixes the issue. Reported-by: Scott Mayhew <smayhew@redhat.com> Fixes: 02142b2ca8fc ("SUNRPC: Add checksum KUnit tests for the RFC 6803 encryption types") Tested-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-04-17	netfilter: nf_tables: fix ifdef to also consider nf_tables=m	Florian Westphal
	nftables can be built as a module, so fix the preprocessor conditional accordingly. Fixes: 478b360a47b7 ("netfilter: nf_tables: fix nf_trace always-on with XT_TRACE=n") Reported-by: Florian Fainelli <f.fainelli@gmail.com> Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-17	net/sched: clear actions pointer in miss cookie init fail	Pedro Tammela
	Palash reports a UAF when using a modified version of syzkaller[1]. When 'tcf_exts_miss_cookie_base_alloc()' fails in 'tcf_exts_init_ex()' a call to 'tcf_exts_destroy()' is made to free up the tcf_exts resources. In flower, a call to '__fl_put()' when 'tcf_exts_init_ex()' fails is made; Then calling 'tcf_exts_destroy()', which triggers an UAF since the already freed tcf_exts action pointer is lingering in the struct. Before the offending patch, this was not an issue since there was no case where the tcf_exts action pointer could linger. Therefore, restore the old semantic by clearing the action pointer in case of a failure to initialize the miss_cookie. [1] https://github.com/cmu-pasta/linux-kernel-enriched-corpus v1->v2: Fix compilation on configs without tc actions (kernel test robot) Fixes: 80cd22c35c90 ("net/sched: cls_api: Support hardware miss to tc action") Reported-by: Palash Oswal <oswalpalash@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	net: lan966x: Fix lan966x_ifh_get	Horatiu Vultur
	From time to time, it was observed that the nanosecond part of the received timestamp, which is extracted from the IFH, it was actually bigger than 1 second. So then when actually calculating the full received timestamp, based on the nanosecond part from IFH and the second part which is read from HW, it was actually wrong. The issue seems to be inside the function lan966x_ifh_get, which extracts information from an IFH(which is an byte array) and returns the value in a u64. When extracting the timestamp value from the IFH, which starts at bit 192 and have the size of 32 bits, then if the most significant bit was set in the timestamp, then this bit was extended then the return value became 0xffffffff... . And the reason of this is because constants without any postfix are treated as signed longs and that is the reason why '1 << 31' becomes 0xffffffff80000000. This is fixed by adding the postfix 'ULL' to 1. Fixes: fd7627833ddf ("net: lan966x: Stop using packing library") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	Merge branch 'sctp-info-dump'	David S. Miller
	Xin Long says: ==================== sctp: add some missing peer_capables in sctp info dump The 1st patch removes the unused and obsolete hostname_address from sctp_association peer and also the bit from sctp_info peer_capables, and then reuses its bit for reconf_capable and use the higher available bit for intl_capable in the 2nd patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	sctp: add intl_capable and reconf_capable in ss peer_capable	Xin Long
	There are two new peer capables have been added since sctp_diag was introduced into SCTP. When dumping the peer capables, these two new peer capables should also be included. To not break the old capables, reconf_capable takes the old hostname_address bit, and intl_capable uses the higher available bit in sctpi_peer_capable. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	sctp: delete the obsolete code for the host name address param	Xin Long
	In the latest RFC9260, the Host Name Address param has been deprecated. For INIT chunk: Note 3: An INIT chunk MUST NOT contain the Host Name Address parameter. The receiver of an INIT chunk containing a Host Name Address parameter MUST send an ABORT chunk and MAY include an "Unresolvable Address" error cause. For Supported Address Types: The value indicating the Host Name Address parameter MUST NOT be used when sending this parameter and MUST be ignored when receiving this parameter. Currently Linux SCTP doesn't really support Host Name Address param, but only saves some flag and print debug info, which actually won't even be triggered due to the verification in sctp_verify_param(). This patch is to delete those dead code. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	Merge branch 'mptcp-cleanups'	David S. Miller
	Matthieu Baerts says: ==================== mptcp: various small cleanups Patch 1 makes a function static because it is only used in one file. Patch 2 adds info about the git trees we use to help occasional devs. Patch 3 removes an unused variable. Patch 4 removes duplicated entries from the help menu of a tool used in MPTCP selftests. Patch 5 removes some ShellCheck warnings in mptcp_join.sh selftest. Only very minor improvements then. ==================== Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	selftests: mptcp: join: fix ShellCheck warnings	Matthieu Baerts
	Most of the code had an issue according to ShellCheck. That's mainly due to the fact it incorrectly believes most of the code was unreachable because it's invoked by variable name, see how the "tests" array is used. Once SC2317 has been ignored, three small warnings were still visible: - SC2155: Declare and assign separately to avoid masking return values. - SC2046: Quote this to prevent word splitting: can be ignored because "ip netns pids" can display more than one pid. - SC2166: Prefer [ p ] \|\| [ q ] as [ p -o q ] is not well defined. This probably didn't fix any actual issues but it might help spotting new interesting warnings reported by ShellCheck as just before, ShellCheck was reporting issues for most lines making it a bit useless. Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	selftests: mptcp: remove duplicated entries in usage	Matthieu Baerts
	mptcp_connect tool was printing some duplicated entries when showing how to use it: -j -l -r While at it, I also: - moved the very few entries that were not sorted, - added -R that was missing since commit 8a4b910d005d ("mptcp: selftests: add rcvbuf set option"), - removed the -u parameter that has been removed in commit f730b65c9d85 ("selftests: mptcp: try to set mptcp ulp mode in different sk states"). No need to backport this, it is just an internal tool used by our selftests. The help menu is mainly useful for MPTCP kernel devs. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	mptcp: remove unused 'remaining' variable	Matthieu Baerts
	In some functions, 'remaining' variable was given in argument and/or set but never read. net/mptcp/options.c:779:3: warning: Value stored to 'remaining' is never read [clang-analyzer-deadcode.DeadStores]. net/mptcp/options.c:547:3: warning: Value stored to 'remaining' is never read [clang-analyzer-deadcode.DeadStores]. The issue has been reported internally by Alibaba CI. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Suggested-by: Mat Martineau <martineau@kernel.org> Co-developed-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	MAINTAINERS: add git trees for MPTCP	Matthieu Baerts
	This will help occasional developers to find our git repo without having to look at our wiki. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	mptcp: make userspace_pm_append_new_local_addr static	Geliang Tang
	mptcp_userspace_pm_append_new_local_addr() has always exclusively been used in pm_userspace.c since its introduction in commit 4638de5aefe5 ("mptcp: handle local addrs announced by userspace PMs"). So make it static. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	sfc: Fix use-after-free due to selftest_work	Ding Hui
	There is a use-after-free scenario that is: When the NIC is down, user set mac address or vlan tag to VF, the xxx_set_vf_mac() or xxx_set_vf_vlan() will invoke efx_net_stop() and efx_net_open(), since netif_running() is false, the port will not start and keep port_enabled false, but selftest_work is scheduled in efx_net_open(). If we remove the device before selftest_work run, the efx_stop_port() will not be called since the NIC is down, and then efx is freed, we will soon get a UAF in run_timer_softirq() like this: [ 1178.907941] ================================================================== [ 1178.907948] BUG: KASAN: use-after-free in run_timer_softirq+0xdea/0xe90 [ 1178.907950] Write of size 8 at addr ff11001f449cdc80 by task swapper/47/0 [ 1178.907950] [ 1178.907953] CPU: 47 PID: 0 Comm: swapper/47 Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1 [ 1178.907954] Hardware name: SANGFOR X620G40/WI2HG-208T1061A, BIOS SPYH051032-U01 04/01/2022 [ 1178.907955] Call Trace: [ 1178.907956] <IRQ> [ 1178.907960] dump_stack+0x71/0xab [ 1178.907963] print_address_description+0x6b/0x290 [ 1178.907965] ? run_timer_softirq+0xdea/0xe90 [ 1178.907967] kasan_report+0x14a/0x2b0 [ 1178.907968] run_timer_softirq+0xdea/0xe90 [ 1178.907971] ? init_timer_key+0x170/0x170 [ 1178.907973] ? hrtimer_cancel+0x20/0x20 [ 1178.907976] ? sched_clock+0x5/0x10 [ 1178.907978] ? sched_clock_cpu+0x18/0x170 [ 1178.907981] __do_softirq+0x1c8/0x5fa [ 1178.907985] irq_exit+0x213/0x240 [ 1178.907987] smp_apic_timer_interrupt+0xd0/0x330 [ 1178.907989] apic_timer_interrupt+0xf/0x20 [ 1178.907990] </IRQ> [ 1178.907991] RIP: 0010:mwait_idle+0xae/0x370 If the NIC is not actually brought up, there is no need to schedule selftest_work, so let's move invoking efx_selftest_async_start() into efx_start_all(), and it will be canceled by broughting down. Fixes: dd40781e3a4e ("sfc: Run event/IRQ self-test asynchronously when interface is brought up") Fixes: e340be923012 ("sfc: add ndo_set_vf_mac() function for EF10") Debugged-by: Huang Cun <huangcun@sangfor.com.cn> Cc: Donglin Peng <pengdonglin@sangfor.com.cn> Suggested-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	Merge branch 'mptcp-subflow-init'	David S. Miller
	Matthieu Baerts says: ==================== mptcp: refactor first subflow init This series refactors the initialisation of the first subflow of a listen socket. The first subflow allocation is no longer done at the initialisation of the socket but later, when the connection request is received or when requested by the userspace. This is needed not just because Paolo likes to refactor things but because this simplifies the code and makes the behaviour more consistent with the rest. Also, this is a prerequisite for future patches adding proper support of SELinux/LSM labels with MPTCP and accept(2). In [1], Ondrej Mosnacek explained they discovered the (userspace-facing) sockets returned by accept(2) when using MPTCP always end up with the label representing the kernel (typically system_u:system_r:kernel_t:s0), while it would make more sense to inherit the context from the parent socket (the one that is passed to accept(2)). Before being able to properly support that on SELinux/LSM side, patches 2-3/5 prepare the code to simplify the patch 4/5 moving the allocation. Patch 1/5 is a small clean-up seen while working on the series and patch 5/5 is a small improvement when closing unaccepted sockets. [1] https://lore.kernel.org/netdev/CAFqZXNs2LF-OoQBUiiSEyranJUXkPLcCfBkMkwFeM6qEwMKCTw@mail.gmail.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	mptcp: fastclose msk when cleaning unaccepted sockets	Paolo Abeni
	When cleaning up unaccepted mptcp socket still laying inside the listener queue at listener close time, such sockets will go through a regular close, waiting for a timeout before shutting down the subflows. There is no need to keep the kernel resources in use for such a possibly long time: short-circuit to fast-close. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	mptcp: move first subflow allocation at mpc access time	Paolo Abeni
	In the long run this will simplify the mptcp code and will allow for more consistent behavior. Move the first subflow allocation out of the sock->init ops into the __mptcp_nmpc_socket() helper. Since the first subflow creation can now happen after the first setsockopt() we additionally need to invoke mptcp_sockopt_sync() on it. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	mptcp: move fastopen subflow check inside mptcp_sendmsg_fastopen()	Paolo Abeni
	So that we can avoid a bunch of check in fastpath. Additionally we can specialize such check according to the specific fastopen method - defer_connect vs MSG_FASTOPEN. The latter bits will simplify the next patches. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>