summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-07net/mlx5: Return correct EC_VF function IDDaniel Jurgens
The ECVF function ID range is 1..max_ec_vfs. Currently mlx5_vport_to_func_id returns 0..max_ec_vfs - 1. Which results in a syndrome when querying the caps with more recent firmware, or reading incorrect caps with older firmware that supports EC VFs. Fixes: 9ac0b128248e ("net/mlx5: Update vport caps query/set for EC VFs") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: DR, Fix wrong allocation of modify hdr patternYevgeny Kliteynik
Fixing wrong calculation of the modify hdr pattern size, where the previously calculated number would not be enough to accommodate the required number of actions. Fixes: da5d0027d666 ("net/mlx5: DR, Add cache for modify header pattern") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5e: TC, Fix internal port memory leakJianbo Liu
The flow rule can be splited, and the extra post_act rules are added to post_act table. It's possible to trigger memleak when the rule forwards packets from internal port and over tunnel, in the case that, for example, CT 'new' state offload is allowed. As int_port object is assigned to the flow attribute of post_act rule, and its refcnt is incremented by mlx5e_tc_int_port_get(), but mlx5e_tc_int_port_put() is not called, the refcnt is never decremented, then int_port is never freed. The kmemleak reports the following error: unreferenced object 0xffff888128204b80 (size 64): comm "handler20", pid 50121, jiffies 4296973009 (age 642.932s) hex dump (first 32 bytes): 01 00 00 00 19 00 00 00 03 f0 00 00 04 00 00 00 ................ 98 77 67 41 81 88 ff ff 98 77 67 41 81 88 ff ff .wgA.....wgA.... backtrace: [<00000000e992680d>] kmalloc_trace+0x27/0x120 [<000000009e945a98>] mlx5e_tc_int_port_get+0x3f3/0xe20 [mlx5_core] [<0000000035a537f0>] mlx5e_tc_add_fdb_flow+0x473/0xcf0 [mlx5_core] [<0000000070c2cec6>] __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core] [<000000005cc84048>] mlx5e_configure_flower+0xd40/0x4c40 [mlx5_core] [<000000004f8a2031>] mlx5e_rep_indr_offload.isra.0+0x10e/0x1c0 [mlx5_core] [<000000007df797dc>] mlx5e_rep_indr_setup_tc_cb+0x90/0x130 [mlx5_core] [<0000000016c15cc3>] tc_setup_cb_add+0x1cf/0x410 [<00000000a63305b4>] fl_hw_replace_filter+0x38f/0x670 [cls_flower] [<000000008bc9e77c>] fl_change+0x1fd5/0x4430 [cls_flower] [<00000000e7f766e4>] tc_new_tfilter+0x867/0x2010 [<00000000e101c0ef>] rtnetlink_rcv_msg+0x6fc/0x9f0 [<00000000e1111d44>] netlink_rcv_skb+0x12c/0x360 [<0000000082dd6c8b>] netlink_unicast+0x438/0x710 [<00000000fc568f70>] netlink_sendmsg+0x794/0xc50 [<0000000016e92590>] sock_sendmsg+0xc5/0x190 So fix this by moving int_port cleanup code to the flow attribute free helper, which is used by all the attribute free cases. Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5e: Take RTNL lock when needed before calling xdp_set_features()Gal Pressman
Hold RTNL lock when calling xdp_set_features() with a registered netdev, as the call triggers the netdev notifiers. This could happen when switching from uplink rep to nic profile for example. This resolves the following call trace: RTNL: assertion failed at net/core/dev.c (1953) WARNING: CPU: 6 PID: 112670 at net/core/dev.c:1953 call_netdevice_notifiers_info+0x7c/0x80 Modules linked in: sch_mqprio sch_mqprio_lib act_tunnel_key act_mirred act_skbedit cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress bonding ib_umad ip_gre rdma_ucm mlx5_vfio_pci ipip tunnel4 ip6_gre gre mlx5_ib vfio_pci vfio_pci_core vfio_iommu_type1 ib_uverbs vfio mlx5_core ib_ipoib geneve nf_tables ip6_tunnel tunnel6 iptable_raw openvswitch nsh rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: ib_uverbs] CPU: 6 PID: 112670 Comm: devlink Not tainted 6.4.0-rc7_for_upstream_min_debug_2023_06_28_17_02 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:call_netdevice_notifiers_info+0x7c/0x80 Code: 90 ff 80 3d 2d 6b f7 00 00 75 c5 ba a1 07 00 00 48 c7 c6 e4 ce 0b 82 48 c7 c7 c8 f4 04 82 c6 05 11 6b f7 00 01 e8 a4 7c 8e ff <0f> 0b eb a2 0f 1f 44 00 00 55 48 89 e5 41 54 48 83 e4 f0 48 83 ec RSP: 0018:ffff8882a21c3948 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff82e6f880 RCX: 0000000000000027 RDX: ffff88885f99b5c8 RSI: 0000000000000001 RDI: ffff88885f99b5c0 RBP: 0000000000000028 R08: ffff88887ffabaa8 R09: 0000000000000003 R10: ffff88887fecbac0 R11: ffff88887ff7bac0 R12: ffff8882a21c3968 R13: ffff88811c018940 R14: 0000000000000000 R15: ffff8881274401a0 FS: 00007fe141c81800(0000) GS:ffff88885f980000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f787c28b948 CR3: 000000014bcf3005 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x79/0x120 ? call_netdevice_notifiers_info+0x7c/0x80 ? report_bug+0x17c/0x190 ? handle_bug+0x3c/0x60 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? call_netdevice_notifiers_info+0x7c/0x80 ? call_netdevice_notifiers_info+0x7c/0x80 call_netdevice_notifiers+0x2e/0x50 mlx5e_set_xdp_feature+0x21/0x50 [mlx5_core] mlx5e_nic_init+0xf1/0x1a0 [mlx5_core] mlx5e_netdev_init_profile+0x76/0x110 [mlx5_core] mlx5e_netdev_attach_profile+0x1f/0x90 [mlx5_core] mlx5e_netdev_change_profile+0x92/0x160 [mlx5_core] mlx5e_netdev_attach_nic_profile+0x1b/0x30 [mlx5_core] mlx5e_vport_rep_unload+0xaa/0xc0 [mlx5_core] __esw_offloads_unload_rep+0x52/0x60 [mlx5_core] mlx5_esw_offloads_rep_unload+0x52/0x70 [mlx5_core] esw_offloads_unload_rep+0x34/0x70 [mlx5_core] esw_offloads_disable+0x2b/0x90 [mlx5_core] mlx5_eswitch_disable_locked+0x1b9/0x210 [mlx5_core] mlx5_devlink_eswitch_mode_set+0xf5/0x630 [mlx5_core] ? devlink_get_from_attrs_lock+0x9e/0x110 devlink_nl_cmd_eswitch_set_doit+0x60/0xe0 genl_family_rcv_msg_doit.isra.0+0xc2/0x110 genl_rcv_msg+0x17d/0x2b0 ? devlink_get_from_attrs_lock+0x110/0x110 ? devlink_nl_cmd_eswitch_get_doit+0x290/0x290 ? devlink_pernet_pre_exit+0xf0/0xf0 ? genl_family_rcv_msg_doit.isra.0+0x110/0x110 netlink_rcv_skb+0x54/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x1f6/0x2c0 netlink_sendmsg+0x232/0x4a0 sock_sendmsg+0x38/0x60 ? _copy_from_user+0x2a/0x60 __sys_sendto+0x110/0x160 ? __count_memcg_events+0x48/0x90 ? handle_mm_fault+0x161/0x260 ? do_user_addr_fault+0x278/0x6e0 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7fe141b1340a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007fff61d03de8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000afab00 RCX: 00007fe141b1340a RDX: 0000000000000038 RSI: 0000000000afab00 RDI: 0000000000000003 RBP: 0000000000afa910 R08: 00007fe141d80200 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001 </TASK> Fixes: 4d5ab0ad964d ("net/mlx5e: take into account device reconfiguration for xdp_features flag") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07tpm/tpm_tis: Disable interrupts for Lenovo P620 devicesJonathan McDowell
The Lenovo ThinkStation P620 suffers from an irq storm issue like various other Lenovo machines, so add an entry for it to tpm_tis_dmi_table and force polling. It is worth noting that 481c2d14627d (tpm,tpm_tis: Disable interrupts after 1000 unhandled IRQs) does not seem to fix the problem on this machine, but setting 'tpm_tis.interrupts=0' on the kernel command line does. [jarkko@kernel.org: truncated the commit ID in the description to 12 characters] Cc: stable@vger.kernel.org # v6.4+ Fixes: e644b2f498d2 ("tpm, tpm_tis: Enable interrupt test") Signed-off-by: Jonathan McDowell <noodles@meta.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-08-07tpm: Disable RNG for all AMD fTPMsMario Limonciello
The TPM RNG functionality is not necessary for entropy when the CPU already supports the RDRAND instruction. The TPM RNG functionality was previously disabled on a subset of AMD fTPM series, but reports continue to show problems on some systems causing stutter root caused to TPM RNG functionality. Expand disabling TPM RNG use for all AMD fTPMs whether they have versions that claim to have fixed or not. To accomplish this, move the detection into part of the TPM CRB registration and add a flag indicating that the TPM should opt-out of registration to hwrng. Cc: stable@vger.kernel.org # 6.1.y+ Fixes: b006c439d58d ("hwrng: core - start hwrng kthread also for untrusted sources") Fixes: f1324bbc4011 ("tpm: disable hwrng for fTPM on some AMD designs") Reported-by: daniil.stas@posteo.net Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217719 Reported-by: bitlord0xff@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217212 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-08-07sysctl: set variable key_sysctls storage-class-specifier to staticTom Rix
smatch reports security/keys/sysctl.c:12:18: warning: symbol 'key_sysctls' was not declared. Should it be static? This variable is only used in its defining file, so it should be static. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-08-07tpm/tpm_tis: Disable interrupts for TUXEDO InfinityBook S 15/17 Gen7Takashi Iwai
TUXEDO InfinityBook S 15/17 Gen7 suffers from an IRQ problem on tpm_tis like a few other laptops. Add an entry for the workaround. Cc: stable@vger.kernel.org Fixes: e644b2f498d2 ("tpm, tpm_tis: Enable interrupt test") Link: https://bugzilla.suse.com/show_bug.cgi?id=1213645 Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-08-07net/mlx5: Bridge, Only handle registered netdev bridge eventsRoi Dayan
Don't handle bridge events for a netdev that doesn't belong to an eswitch that registered for bridge events. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: E-Switch, Remove redundant arg ignore_flow_lvlRoi Dayan
The arg is always passed as true and thus redundant. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Fix typo reminder -> remainderGal Pressman
Fix a typo in esw_qos_devlink_rate_to_mbps(): reminder -> remainder. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: remove many unnecessary NULL valuesRuan Jinjie
There are many pointers assigned first, which need not to be initialized, so remove the NULL assignments. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Allocate completion EQs dynamicallyMaher Sanalla
This commit enables the dynamic allocation of EQs at runtime, allowing for more flexibility in managing completion EQs and reducing the memory overhead of driver load. Whenever a CQ is created for a given vector index, the driver will lookup to see if there is an already mapped completion EQ for that vector, if so, utilize it. Otherwise, allocate a new EQ on demand and then utilize it for the CQ completion events. Add a protection lock to the EQ table to protect from concurrent EQ creation attempts. While at it, replace mlx5_vector2irqn()/mlx5_vector2eqn() with mlx5_comp_eqn_get() and mlx5_comp_irqn_get() which will allocate an EQ on demand if no EQ is found for the given vector. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Handle SF IRQ request in the absence of SF IRQ poolMaher Sanalla
In case the SF IRQ pool is not available due to setup limitations, SF currently relies on the already allocated PF IRQs to fulfill its IRQ vector requests. However, with the dynamic EQ allocation introduced in the next patch, it is possible that not all IRQs of PF will be allocated after the driver is loaded. In such case, if a SF requests a completion IRQ without having its own independent IRQ pool, SF will lack a PF IRQ to utilize. To address this scenario, allocate an IRQ for the SF from the PF's IRQ pool on demand. The new IRQ will be shared between the SF and it's PF. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Rename mlx5_comp_vectors_count() to mlx5_comp_vectors_max()Maher Sanalla
To accurately represent its purpose, rename the function that retrieves the value of maximum vectors from mlx5_comp_vectors_count() to mlx5_comp_vectors_max(). Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Add IRQ vector to CPU lookup functionMaher Sanalla
Currently, once driver load completes, IRQ requests were performed for all vectors. However, as we move to support dynamic creation of EQs, this will not be the case as some IRQs will not exist at this stage. Thus, in such case, use the default CPU to IRQ mapping which is the serial mapping based on IRQ vector index. Meaning, the n'th vector gets mapped to the n'th CPU. Introduce an API function mlx5_comp_vector_cpu() that takes an IRQ index and provides the corresponding CPU mapping. It utilizes the existing IRQ affinity if defined, or resorts to the default serialized CPU mapping otherwise. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Introduce mlx5_cpumask_default_spreadMaher Sanalla
For better code readability in the completion IRQ request code, define the cpu lookup per completion vector logic in a separate function. The new method mlx5_cpumask_default_spread() given a vector index 'n' will return the 'nth' cpu. This new method will be used also in the next patch. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Implement single completion EQ create/destroy methodsMaher Sanalla
Currently, create_comp_eqs() function handles the creation of all completion EQs for all the vectors on driver load. While on driver unload, destroy_comp_eqs() performs the equivalent job. In preparation for dynamic EQ creation, replace create_comp_eqs() / destroy_comp_eqs() with create_comp_eq() / destroy_comp_eq() functions which will receive a vector index and allocate/destroy an EQ for that specific vector. Thus, allowing more flexibility in the management of completion EQs. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Use xarray to store and manage completion EQsMaher Sanalla
Use xarray to store the completion EQs instead of a linked list. The xarray offers more scalability, reduced memory overhead, and facilitates the lookup of a certain EQ given a vector index. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Refactor completion IRQ request/release handlers in EQ layerMaher Sanalla
Break the completion IRQ request/release functions into per-vector handlers for both PCI devices and SFs in the EQ layer. On EQ table creation, loop over all vectors and request an IRQ for each one using the new per-vector functions. Perform the symmetrical change when releasing IRQs on EQ table cleanup. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Use xarray to store and manage completion IRQsMaher Sanalla
Use xarray to store the completion IRQs instead of a fixed-size allocated array as not all completion IRQs will be requested on driver load, but rather on demand when an EQ is created. The xarray offers more scalability, reduced memory overhead, and provides the ability to dynamically resize the array when needed. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Refactor completion IRQ request/release APIMaher Sanalla
Introduce a per-vector completion IRQ request API that requests a single IRQ for a given vector index instead of multiple IRQs request API. On driver load, loop over all completion vectors and request an IRQ for each one via the newly introduced API. Symmetrically, introduce an IRQ release API per vector. On driver unload, loop over all vectors and release each completion IRQ via the new per-vector API. As IRQ vectors will be requested dynamically later in the patchset, add a cpumask of the bounded CPUs to avoid the possible mapping of two IRQs of the same device to the same cpu. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Track the current number of completion EQsMaher Sanalla
In preparation to allocate completion EQs, add a counter to track the number of completion EQs currently allocated. Store the maximum number of EQs in max_comp_eqs variable. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "x86: - Fix SEV race condition ARM: - Fixes for the configuration of SVE/SME traps when hVHE mode is in use - Allow use of pKVM on systems with FF-A implementations that are v1.0 compatible - Request/release percpu IRQs (arch timer, vGIC maintenance) correctly when pKVM is in use - Fix function prototype after __kvm_host_psci_cpu_entry() rename - Skip to the next instruction when emulating writes to TCR_EL1 on AmpereOne systems Selftests: - Fix missing include" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: selftests/rseq: Fix build with undefined __weak KVM: SEV: remove ghcb variable declarations KVM: SEV: only access GHCB fields once KVM: SEV: snapshot the GHCB before accessing it KVM: arm64: Skip instruction after emulating write to TCR_EL1 KVM: arm64: fix __kvm_host_psci_cpu_entry() prototype KVM: arm64: Fix resetting SME trap values on reset for (h)VHE KVM: arm64: Fix resetting SVE trap values on reset for hVHE KVM: arm64: Use the appropriate feature trap register when activating traps KVM: arm64: Helper to write to appropriate feature trap register based on mode KVM: arm64: Disable SME traps for (h)VHE at setup KVM: arm64: Use the appropriate feature trap register for SVE at EL2 setup KVM: arm64: Factor out code for checking (h)VHE mode into a macro KVM: arm64: Rephrase percpu enable/disable tracking in terms of hyp KVM: arm64: Fix hardware enable/disable flows for pKVM KVM: arm64: Allow pKVM on v1.0 compatible FF-A implementations
2023-08-07Merge tag 'mmc-v6.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: - moxart: Fix big-endian conversion for SCR structure - sdhci-f-sdh30: Replace with sdhci_pltfm to fix PM support * tag 'mmc-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-f-sdh30: Replace with sdhci_pltfm mmc: moxart: read scr register without changing byte order
2023-08-07gfs2: Don't use filemap_splice_readBob Peterson
Starting with patch 2cb1e08985, gfs2 started using the new function filemap_splice_read rather than the old (and subsequently deleted) function generic_file_splice_read. filemap_splice_read works by taking references to a number of folios in the page cache and splicing those folios into a pipe. The folios are then read from the pipe and the folio references are dropped. This can take an arbitrary amount of time. We cannot allow that in gfs2 because those folio references will pin the inode glock to the node and prevent it from being demoted, which can lead to cluster-wide deadlocks. Instead, use copy_splice_read. (In addition, the old generic_file_splice_read called into ->read_iter, which called gfs2_file_read_iter, which took the inode glock during the operation. The new filemap_splice_read interface does not take the inode glock anymore. This is fixable, but it still wouldn't prevent cluster-wide deadlocks.) Fixes: 2cb1e08985e3 ("splice: Use filemap_splice_read() instead of generic_file_splice_read()") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-08-07gfs2: Fix freeze consistency check in gfs2_trans_add_metaAndreas Gruenbacher
Function gfs2_trans_add_meta() checks for the SDF_FROZEN flag to make sure that no buffers are added to a transaction while the filesystem is frozen. With the recent freeze/thaw rework, the SDF_FROZEN flag is cleared after thaw_super() is called, which is sufficient for serializing freeze/thaw. However, other filesystem operations started after thaw_super() may now be calling gfs2_trans_add_meta() before the SDF_FROZEN flag is cleared, which will trigger the SDF_FROZEN check in gfs2_trans_add_meta(). Fix that by checking the s_writers.frozen state instead. In addition, make sure not to call gfs2_assert_withdraw() with the sd_log_lock spin lock held. Check for a withdrawn filesystem before checking for a frozen filesystem, and don't pin/add buffers to the current transaction in case of a failure in either case. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2023-08-07x86/srso: Tie SBPB bit setting to microcode patch detectionBorislav Petkov (AMD)
The SBPB bit in MSR_IA32_PRED_CMD is supported only after a microcode patch has been applied so set X86_FEATURE_SBPB only then. Otherwise, guests would attempt to set that bit and #GP on the MSR write. While at it, make SMT detection more robust as some guests - depending on how and what CPUID leafs their report - lead to cpu_smt_control getting set to CPU_SMT_NOT_SUPPORTED but SRSO_NO should be set for any guest incarnation where one simply cannot do SMT, for whatever reason. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reported-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-08-07udp/udplite: Remove unused function declarations udp{,lite}_get_port()Yue Haibing
Commit 6ba5a3c52da0 ("[UDP]: Make full use of proto.h.udp_hash innovation.") removed these implementations but leave declarations. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-07net: sfp: Remove unused function declaration sfp_link_configure()Yue Haibing
Commit ce0aa27ff3f6 ("sfp: add sfp-bus to bridge between network devices and sfp cages") declared but never implemented it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-07ndisc: Remove unused ndisc_ifinfo_sysctl_strategy() declarationYue Haibing
Commit f8572d8f2a2b ("sysctl net: Remove unused binary sysctl code") left behind this declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-07net: pkt_cls: Remove unused inline helpersYue Haibing
Commit acb674428c3d ("net: sched: introduce per-block callbacks") implemented these but never used it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-07neighbour: Remove unused function declaration pneigh_for_each()Yue Haibing
pneigh_for_each() is never implemented since the beginning of git history. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-07net/tls: Remove unused function declarationsYue Haibing
Commit 3c4d7559159b ("tls: kernel TLS support") declared but never implemented these functions. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-07Revert "riscv: dts: allwinner: d1: Add CAN controller nodes"Marc Kleine-Budde
It turned out the dtsi changes were not quite ready, revert them for now. This reverts commit 6ea1ad888f5900953a21853e709fa499fdfcb317. Link: https://lore.kernel.org/all/2690764.mvXUDI8C0e@jernej-laptop Suggested-by: Jernej Škrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/all/20230807-riscv-allwinner-d1-revert-can-controller-nodes-v1-1-eb3f70b435d9@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-08-06Linux 6.5-rc5v6.5-rc5Linus Torvalds
2023-08-07dmaengine: xilinx: xdma: Fix typoMiquel Raynal
Probably a copy/paste error with the previous block, here we are actually managing C2H IRQs. Fixes: 17ce252266c7 ("dmaengine: xilinx: xdma: Add xilinx xdma driver") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20230731101442.792514-3-miquel.raynal@bootlin.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-08-07dmaengine: xilinx: xdma: Fix interrupt vector settingMiquel Raynal
A couple of hardware registers need to be set to reflect which interrupts have been allocated to the device. Each register is 32-bit wide and can receive four 8-bit values. If we provide any other interrupt number than four, the irq_num variable will never be 0 within the while check and the while block will loop forever. There is an easy way to prevent this: just break the for loop when we reach "irq_num == 0", which anyway means all interrupts have been processed. Cc: stable@vger.kernel.org Fixes: 17ce252266c7 ("dmaengine: xilinx: xdma: Add xilinx xdma driver") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://lore.kernel.org/r/20230731101442.792514-2-miquel.raynal@bootlin.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-08-07dmaengine: owl-dma: Modify mismatched function nameZhang Jianhua
No functional modification involved. drivers/dma/owl-dma.c:208: warning: expecting prototype for struct owl_dma_pchan. Prototype was for struct owl_dma_vchan instead HDRTEST usr/include/sound/asequencer.h Fixes: 47e20577c24d ("dmaengine: Add Actions Semi Owl family S900 DMA driver") Signed-off-by: Zhang Jianhua <chris.zjh@huawei.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20230722153244.2086949-1-chris.zjh@huawei.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-08-07dmaengine: idxd: Clear PRS disable flag when disabling IDXD deviceFenghua Yu
Disabling IDXD device doesn't reset Page Request Service (PRS) disable flag to its initial value 0. This may cause user confusion because once PRS is disabled user will see PRS still remains the previous setting (i.e. disabled) via sysfs interface even after the device is disabled. To eliminate user confusion, reset PRS disable flag to ensure that the PRS flag bit reflects correct state after the device is disabled. Additionally, simplify the code by setting wq->flags to 0, which clears all flag bits, including any future additions. Fixes: f2dc327131b5 ("dmaengine: idxd: add per wq PRS disable") Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230712193505.3440752-1-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-08-07dmaengine: pl330: Return DMA_PAUSED when transaction is pausedIlpo Järvinen
pl330_pause() does not set anything to indicate paused condition which causes pl330_tx_status() to return DMA_IN_PROGRESS. This breaks 8250 DMA flush after the fix in commit 57e9af7831dc ("serial: 8250_dma: Fix DMA Rx rearm race"). The function comment for pl330_pause() claims pause is supported but resume is not which is enough for 8250 DMA flush to work as long as DMA status reports DMA_PAUSED when appropriate. Add PAUSED state for descriptor and mark BUSY descriptors with PAUSED in pl330_pause(). Return DMA_PAUSED from pl330_tx_status() when the descriptor is PAUSED. Reported-by: Richard Tresidder <rtresidd@electromag.com.au> Tested-by: Richard Tresidder <rtresidd@electromag.com.au> Fixes: 88987d2c7534 ("dmaengine: pl330: add DMA_PAUSE feature") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-serial/f8a86ecd-64b1-573f-c2fa-59f541083f1a@electromag.com.au/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20230526105434.14959-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-08-07dmaengine: qcom_hidma: Update codeaurora email domainJeffrey Hugo
The codeaurora.org email domain is defunct and will bounce. Update entries to Sinan's kernel.org address which is the address in MAINTAINERS for this component. Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Acked-By: Sinan Kaya <okaya@kernel.org> Link: https://lore.kernel.org/r/20230707195003.6619-1-quic_jhugo@quicinc.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-08-07dmaengine: mcf-edma: Fix a potential un-allocated memory accessChristophe JAILLET
When 'mcf_edma' is allocated, some space is allocated for a flexible array at the end of the struct. 'chans' item are allocated, that is to say 'pdata->dma_channels'. Then, this number of item is stored in 'mcf_edma->n_chans'. A few lines later, if 'mcf_edma->n_chans' is 0, then a default value of 64 is set. This ends to no space allocated by devm_kzalloc() because chans was 0, but 64 items are read and/or written in some not allocated memory. Change the logic to define a default value before allocating the memory. Fixes: e7a3ff92eaf1 ("dmaengine: fsl-edma: add ColdFire mcf5441x edma support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/f55d914407c900828f6fad3ea5fa791a5f17b9a4.1685172449.git.christophe.jaillet@wanadoo.fr Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-08-06Merge tag 'v6.5-rc5.vfs.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix a wrong check for O_TMPFILE during RESOLVE_CACHED lookup - Clean up directory iterators and clarify file_needs_f_pos_lock() * tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: rely on ->iterate_shared to determine f_pos locking vfs: get rid of old '->iterate' directory operation proc: fix missing conversion to 'iterate_shared' open: make RESOLVE_CACHED correctly test for O_TMPFILE
2023-08-06ionic: Add missing err handling for queue reconfigNitya Sunkad
ionic_start_queues_reconfig returns an error code if txrx_init fails. Handle this error code in the relevant places. This fixes a corner case where the device could get left in a detached state if the CMB reconfig fails and the attempt to clean up the mess also fails. Note that calling netif_device_attach when the netdev is already attached does not lead to unexpected behavior. Change goto name "errout" to "err_out" to maintain consistency across goto statements. Fixes: 40bc471dc714 ("ionic: add tx/rx-push support with device Component Memory Buffers") Fixes: 6f7d6f0fd7a3 ("ionic: pull reset_queues into tx_timeout handler") Signed-off-by: Nitya Sunkad <nitya.sunkad@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06drivers: vxlan: vnifilter: free percpu vni stats on error pathFedor Pchelkin
In case rhashtable_lookup_insert_fast() fails inside vxlan_vni_add(), the allocated percpu vni stats are not freed on the error path. Introduce vxlan_vni_free() which would work as a nice wrapper to free vxlan_vni_node resources properly. Found by Linux Verification Center (linuxtesting.org). Fixes: 4095e0e1328a ("drivers: vxlan: vnifilter: per vni stats") Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06fs: rely on ->iterate_shared to determine f_pos lockingChristian Brauner
Now that we removed ->iterate we don't need to check for either ->iterate or ->iterate_shared in file_needs_f_pos_lock(). Simply check for ->iterate_shared instead. This will tell us whether we need to unconditionally take the lock. Not just does it allow us to avoid checking f_inode's mode it also actually clearly shows that we're locking because of readdir. Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-06vfs: get rid of old '->iterate' directory operationLinus Torvalds
All users now just use '->iterate_shared()', which only takes the directory inode lock for reading. Filesystems that never got convered to shared mode now instead use a wrapper that drops the lock, re-takes it in write mode, calls the old function, and then downgrades the lock back to read mode. This way the VFS layer and other callers no longer need to care about filesystems that never got converted to the modern era. The filesystems that use the new wrapper are ceph, coda, exfat, jfs, ntfs, ocfs2, overlayfs, and vboxsf. Honestly, several of them look like they really could just iterate their directories in shared mode and skip the wrapper entirely, but the point of this change is to not change semantics or fix filesystems that haven't been fixed in the last 7+ years, but to finally get rid of the dual iterators. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-06proc: fix missing conversion to 'iterate_shared'Linus Torvalds
I'm looking at the directory handling due to the discussion about f_pos locking (see commit 797964253d35: "file: reinstate f_pos locking optimization for regular files"), and wanting to clean that up. And one source of ugliness is how we were supposed to move filesystems over to the '->iterate_shared()' function that only takes the inode lock for reading many many years ago, but several filesystems still use the bad old '->iterate()' that takes the inode lock for exclusive access. See commit 6192269444eb ("introduce a parallel variant of ->iterate()") that also added some documentation stating Old method is only used if the new one is absent; eventually it will be removed. Switch while you still can; the old one won't stay. and that was back in April 2016. Here we are, many years later, and the old version is still clearly sadly alive and well. Now, some of those old style iterators are probably just because the filesystem may end up having per-inode mutable data that it uses for iterating a directory, but at least one case is just a mistake. Al switched over most filesystems to use '->iterate_shared()' back when it was introduced. In particular, the /proc filesystem was converted as one of the first ones in commit f50752eaa0b0 ("switch all procfs directories ->iterate_shared()"). But then later one new user of '->iterate()' was then re-introduced by commit 6d9c939dbe4d ("procfs: add smack subdir to attrs"). And that's clearly not what we wanted, since that new case just uses the same 'proc_pident_readdir()' and 'proc_pident_lookup()' helper functions that other /proc pident directories use, and they are most definitely safe to use with the inode lock held shared. So just fix it. This still leaves a fair number of oddball filesystems using the old-style directory iterator (ceph, coda, exfat, jfs, ntfs, ocfs2, overlayfs, and vboxsf), but at least we don't have any remaining in the core filesystems. I'm going to add a wrapper function that just drops the read-lock and takes it as a write lock, so that we can clean up the core vfs layer and make all the ugly 'this filesystem needs exclusive inode locking' be just filesystem-internal warts. I just didn't want to make that conversion when we still had a core user left. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-06open: make RESOLVE_CACHED correctly test for O_TMPFILEAleksa Sarai
O_TMPFILE is actually __O_TMPFILE|O_DIRECTORY. This means that the old fast-path check for RESOLVE_CACHED would reject all users passing O_DIRECTORY with -EAGAIN, when in fact the intended test was to check for __O_TMPFILE. Cc: stable@vger.kernel.org # v5.12+ Fixes: 99668f618062 ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Message-Id: <20230806-resolve_cached-o_tmpfile-v1-1-7ba16308465e@cyphar.com> Signed-off-by: Christian Brauner <brauner@kernel.org>