summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-04-29KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDRSean Christopherson
Disallow memslots and MMIO SPTEs whose gpa range would exceed the host's MAXPHYADDR, i.e. don't create SPTEs for gfns that exceed host.MAXPHYADDR. The TDP MMU bounds its zapping based on host.MAXPHYADDR, and so if the guest, possibly with help from userspace, manages to coerce KVM into creating a SPTE for an "impossible" gfn, KVM will leak the associated shadow pages (page tables): WARNING: CPU: 10 PID: 1122 at arch/x86/kvm/mmu/tdp_mmu.c:57 kvm_mmu_uninit_tdp_mmu+0x4b/0x60 [kvm] Modules linked in: kvm_intel kvm irqbypass CPU: 10 PID: 1122 Comm: set_memory_regi Tainted: G W 5.18.0-rc1+ #293 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x4b/0x60 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2d0 [kvm] kvm_vm_release+0x1d/0x30 [kvm] __fput+0x82/0x240 task_work_run+0x5b/0x90 exit_to_user_mode_prepare+0xd2/0xe0 syscall_exit_to_user_mode+0x1d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> On bare metal, encountering an impossible gpa in the page fault path is well and truly impossible, barring CPU bugs, as the CPU will signal #PF during the gva=>gpa translation (or a similar failure when stuffing a physical address into e.g. the VMCS/VMCB). But if KVM is running as a VM itself, the MAXPHYADDR enumerated to KVM may not be the actual MAXPHYADDR of the underlying hardware, in which case the hardware will not fault on the illegal-from-KVM's-perspective gpa. Alternatively, KVM could continue allowing the dodgy behavior and simply zap the max possible range. But, for hosts with MAXPHYADDR < 52, that's a (minor) waste of cycles, and more importantly, KVM can't reasonably support impossible memslots when running on bare metal (or with an accurate MAXPHYADDR as a VM). Note, limiting the overhead by checking if KVM is running as a guest is not a safe option as the host isn't required to announce itself to the guest in any way, e.g. doesn't need to set the HYPERVISOR CPUID bit. A second alternative to disallowing the memslot behavior would be to disallow creating a VM with guest.MAXPHYADDR > host.MAXPHYADDR. That restriction is undesirable as there are legitimate use cases for doing so, e.g. using the highest host.MAXPHYADDR out of a pool of heterogeneous systems so that VMs can be migrated between hosts with different MAXPHYADDRs without running afoul of the allow_smaller_maxphyaddr mess. Note that any guest.MAXPHYADDR is valid with shadow paging, and it is even useful in order to test KVM with MAXPHYADDR=52 (i.e. without any reserved physical address bits). The now common kvm_mmu_max_gfn() is inclusive instead of exclusive. The memslot and TDP MMU code want an exclusive value, but the name implies the returned value is inclusive, and the MMIO path needs an inclusive check. Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Fixes: 524a1e4e381f ("KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs") Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Cc: Ben Gardon <bgardon@google.com> Cc: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220428233416.2446833-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29Merge tag 'kvmarm-fixes-5.18-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.18, take #2 - Take care of faults occuring between the PARange and IPA range by injecting an exception - Fix S2 faults taken from a host EL0 in protected mode - Work around Oops caused by a PMU access from a 32bit guest when PMU has been created. This is a temporary bodge until we fix it for good.
2022-04-29Merge tag 'tegra-for-5.18-arm-defconfig-fixes' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/fixes ARM: tegra: Default configuration fixes for v5.18 This contains two updates to the default configuration needed because of a Kconfig symbol name change. This fixes a failure that was detected in the NVIDIA automated test farm. * tag 'tegra-for-5.18-arm-defconfig-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux: ARM: config: multi v7: Enable NVIDIA Tegra video decoder driver ARM: tegra_defconfig: Update CONFIG_TEGRA_VDE option Link: https://lore.kernel.org/r/20220429080626.494150-1-thierry.reding@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-29io_uring: check that data field is 0 in ringfd unregisterEugene Syromiatnikov
Only allow data field to be 0 in struct io_uring_rsrc_update user arguments to allow for future possible usage. Fixes: e7a6c00dc77a ("io_uring: add support for registering ring file descriptors") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Link: https://lore.kernel.org/r/20220429142218.GA28696@asgard.redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-29Merge tag 'imx-fixes-5.18-2' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes i.MX fixes for 5.18, 2nd round: - Fix one sparse warning on imx-weim driver. - Fix vqmmc regulator to get UHS-I mode work on imx6ull-colibri board. - Add missing 32.768 kHz PMIC clock for imx8mn-ddr4-evk board to fix bd718xx-clk probe error. * tag 'imx-fixes-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock ARM: dts: imx6ull-colibri: fix vqmmc regulator bus: imx-weim: make symbol 'weim_of_notifier' static Link: https://lore.kernel.org/r/20220426013427.GB14615@dragon Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-29Merge tag 'sunxi-fixes-for-5.18-1' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes Fix return value in RSB bus driver * tag 'sunxi-fixes-for-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: bus: sunxi-rsb: Fix the return value of sunxi_rsb_device_create() Link: https://lore.kernel.org/r/Ymbkd+/dDmRJz66w@kista.localdomain Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-29bfq: Fix warning in bfqq_request_over_limit()Jan Kara
People are occasionally reporting a warning bfqq_request_over_limit() triggering reporting that BFQ's idea of cgroup hierarchy (and its depth) does not match what generic blkcg code thinks. This can actually happen when bfqq gets moved between BFQ groups while bfqq_request_over_limit() is running. Make sure the code is safe against BFQ queue being moved to a different BFQ group. Fixes: 76f1df88bbc2 ("bfq: Limit number of requests consumed by each cgroup") CC: stable@vger.kernel.org Link: https://lore.kernel.org/all/CAJCQCtTw_2C7ZSz7as5Gvq=OmnDiio=HRkQekqWpKot84sQhFA@mail.gmail.com/ Reported-by: Chris Murphy <lists@colorremedies.com> Reported-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220407140738.9723-1-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-29x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guestsThomas Gleixner
When a XEN_HVM guest uses the XEN PIRQ/Eventchannel mechanism, then PCI/MSI[-X] masking is solely controlled by the hypervisor, but contrary to XEN_PV guests this does not disable PCI/MSI[-X] masking in the PCI/MSI layer. This can lead to a situation where the PCI/MSI layer masks an MSI[-X] interrupt and the hypervisor grants the write despite the fact that it already requested the interrupt. As a consequence interrupt delivery on the affected device is not happening ever. Set pci_msi_ignore_mask to prevent that like it's done for XEN_PV guests already. Fixes: 809f9267bbab ("xen: map MSIs into pirqs") Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Reported-by: Dusty Mabe <dustymabe@redhat.com> Reported-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Noah Meyerhans <noahm@debian.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87tuaduxj5.ffs@tglx
2022-04-28Merge tag 'drm-fixes-2022-04-29' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Another relatively quiet week, amdgpu leads the way, some i915 display fixes, and a single sunxi fix. amdgpu: - Runtime pm fix - DCN memory leak fix in error path - SI DPM deadlock fix - S0ix fix amdkfd: - GWS fix - GWS support for CRIU i915: - Fix #5284: Backlight control regression on XMG Core 15 e21 - Fix black display plane on Acer One AO532h - Two smaller display fixes sunxi: - Single fix removing applying PHYS_OFFSET twice" * tag 'drm-fixes-2022-04-29' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: keep mmhub clock gating being enabled during s2idle suspend drm/amd/pm: fix the deadlock issue observed on SI drm/amd/display: Fix memory leak in dcn21_clock_source_create drm/amdgpu: don't runtime suspend if there are displays attached (v3) drm/amdkfd: CRIU add support for GWS queues drm/amdkfd: Fix GWS queue count drm/sun4i: Remove obsolete references to PHYS_OFFSET drm/i915/fbc: Consult hw.crtc instead of uapi.crtc drm/i915: Fix SEL_FETCH_PLANE_*(PIPE_B+) register addresses drm/i915: Check EDID for HDR static metadata when choosing blc drm/i915: Fix DISP_POS_Y and DISP_HEIGHT defines
2022-04-29Merge tag 'amd-drm-fixes-5.18-2022-04-27' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.18-2022-04-27: amdgpu: - Runtime pm fix - DCN memory leak fix in error path - SI DPM deadlock fix - S0ix fix amdkfd: - GWS fix - GWS support for CRIU Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220428023232.5794-1-alexander.deucher@amd.com
2022-04-29Merge tag 'drm-intel-fixes-2022-04-28' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Fix #5284: Backlight control regression on XMG Core 15 e21 - Fix black display plane on Acer One AO532h - Two smaller display fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Ymotel5VfZUrJahf@jlahtine-mobl.ger.corp.intel.com
2022-04-29Merge tag 'drm-misc-fixes-2022-04-27' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v5.18-rc5: - Single fix removing applying PHYS_OFFSET twice in sunxi. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/f692bb62-5620-1868-91b7-dffb8d6f9175@linux.intel.com
2022-04-28Merge tag 'net-5.18-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, bpf and netfilter. Current release - new code bugs: - bridge: switchdev: check br_vlan_group() return value - use this_cpu_inc() to increment net->core_stats, fix preempt-rt Previous releases - regressions: - eth: stmmac: fix write to sgmii_adapter_base Previous releases - always broken: - netfilter: nf_conntrack_tcp: re-init for syn packets only, resolving issues with TCP fastopen - tcp: md5: fix incorrect tcp_header_len for incoming connections - tcp: fix F-RTO may not work correctly when receiving DSACK - tcp: ensure use of most recently sent skb when filling rate samples - tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT - virtio_net: fix wrong buf address calculation when using xdp - xsk: fix forwarding when combining copy mode with busy poll - xsk: fix possible crash when multiple sockets are created - bpf: lwt: fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook - sctp: null-check asoc strreset_chunk in sctp_generate_reconf_event - wireguard: device: check for metadata_dst with skb_valid_dst() - netfilter: update ip6_route_me_harder to consider L3 domain - gre: make o_seqno start from 0 in native mode - gre: switch o_seqno to atomic to prevent races in collect_md mode Misc: - add Eric Dumazet to networking maintainers - dt: dsa: realtek: remove realtek,rtl8367s string - netfilter: flowtable: Remove the empty file" * tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits) tcp: fix F-RTO may not work correctly when receiving DSACK Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits" net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK ixgbe: ensure IPsec VF<->PF compatibility MAINTAINERS: Update BNXT entry with firmware files netfilter: nft_socket: only do sk lookups when indev is available net: fec: add missing of_node_put() in fec_enet_init_stop_mode() bnx2x: fix napi API usage sequence tls: Skip tls_append_frag on zero copy size Add Eric Dumazet to networking maintainers netfilter: conntrack: fix udp offload timeout sysctl netfilter: nf_conntrack_tcp: re-init for syn packets only net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK net: Use this_cpu_inc() to increment net->core_stats Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted Bluetooth: hci_event: Fix creating hci_conn object on error status Bluetooth: hci_event: Fix checking for invalid handle on error status ice: fix use-after-free when deinitializing mailbox snapshot ice: wait 5 s for EMP reset after firmware flash ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg() ...
2022-04-28io_uring: fix uninitialized field in rw io_kiocbJoseph Ravichandran
io_rw_init_file does not initialize kiocb->private, so when iocb_bio_iopoll reads kiocb->private it can contain uninitialized data. Fixes: 3e08773c3841 ("block: switch polling to be bio based") Signed-off-by: Joseph Ravichandran <jravi@mit.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-28Merge tag 'thermal-5.18-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fixes from Rafael Wysocki: "These take back recent chages that started to confuse users and fix up an attr.show callback prototype in a driver. Specifics: - Stop warning about deprecation of the userspace thermal governor and cooling device status interface, because there are cases in which user space has to drive thermal management with the help of them (Daniel Lezcano) - Fix attr.show callback prototype in the int340x thermal driver (Kees Cook)" * tag 'thermal-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal/governor: Remove deprecated information Revert "thermal/core: Deprecate changing cooling device state from userspace" thermal: int340x: Fix attr.show callback prototype
2022-04-28Merge tag 'pm-5.18-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix up recent intel_idle driver changes and fix some ARM cpufreq driver issues. Specifics: - Fix issues with the Qualcomm's cpufreq driver (Dmitry Baryshkov, Vladimir Zapolskiy). - Fix memory leak with the Sun501 driver (Xiaobing Luo). - Make intel_idle enable C1E promotion on all CPUs when C1E is preferred to C1 (Artem Bityutskiy). - Make C6 optimization on Sapphire Rapids added recently work as expected if both C1E and C1 are "preferred" (Artem Bityutskiy)" * tag 'pm-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: intel_idle: Fix SPR C6 optimization intel_idle: Fix the 'preferred_cstates' module parameter cpufreq: qcom-cpufreq-hw: Clear dcvs interrupts cpufreq: fix memory leak in sun50i_cpufreq_nvmem_probe cpufreq: qcom-cpufreq-hw: Fix throttle frequency value on EPSS platforms cpufreq: qcom-hw: provide online/offline operations cpufreq: qcom-hw: fix the opp entries refcounting cpufreq: qcom-hw: fix the race between LMH worker and cpuhp cpufreq: qcom-hw: drop affinity hint before freeing the IRQ
2022-04-28Merge tag 'acpi-5.18-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael WysockiL "These fix up the ACPI processor driver after a change made during the 5.16 cycle that inadvertently broke falling back to shallower C-states when C3 cannot be used. Specifics: - Make the ACPI processor driver avoid falling back to C3 type of C-states when C3 cannot be requested (Ville Syrjälä) - Revert a quirk that is not necessary any more after fixing the underlying issue properly (Ville Syrjälä)" * tag 'acpi-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40" ACPI: processor: idle: Avoid falling back to C3 type C-states
2022-04-28Merge tag 'platform-drivers-x86-v5.18-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: "Highlights: - asus-wmi bug-fixes - intel-sdsu bug-fixes - build (warning) fixes - couple of hw-id additions" * tag 'platform-drivers-x86-v5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86/intel: pmc/core: change pmc_lpm_modes to static platform/x86/intel/sdsi: Fix bug in multi packet reads platform/x86/intel/sdsi: Poll on ready bit for writes platform/x86/intel/sdsi: Handle leaky bucket platform/x86: intel-uncore-freq: Prevent driver loading in guests platform/x86: gigabyte-wmi: added support for B660 GAMING X DDR4 motherboard platform/x86: dell-laptop: Add quirk entry for Latitude 7520 platform/x86: asus-wmi: Fix driver not binding when fan curve control probe fails platform/x86: asus-wmi: Potential buffer overflow in asus_wmi_evaluate_method_buf() tools/power/x86/intel-speed-select: fix build failure when using -Wl,--as-needed
2022-04-28Merge tag 'regulator-fix-v5.18-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "A minor fix for the DT binding documentation of the rt5190a driver" * tag 'regulator-fix-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: dt-bindings: Revise the rt5190a buck/ldo description
2022-04-28tcp: fix F-RTO may not work correctly when receiving DSACKPengcheng Yang
Currently DSACK is regarded as a dupack, which may cause F-RTO to incorrectly enter "loss was real" when receiving DSACK. Packetdrill to demonstrate: // Enable F-RTO and TLP 0 `sysctl -q net.ipv4.tcp_frto=2` 0 `sysctl -q net.ipv4.tcp_early_retrans=3` 0 `sysctl -q net.ipv4.tcp_congestion_control=cubic` // Establish a connection +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // RTT 10ms, RTO 210ms +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <...> +.01 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 // Send 2 data segments +0 write(4, ..., 2000) = 2000 +0 > P. 1:2001(2000) ack 1 // TLP +.022 > P. 1001:2001(1000) ack 1 // Continue to send 8 data segments +0 write(4, ..., 10000) = 10000 +0 > P. 2001:10001(8000) ack 1 // RTO +.188 > . 1:1001(1000) ack 1 // The original data is acked and new data is sent(F-RTO step 2.b) +0 < . 1:1(0) ack 2001 win 257 +0 > P. 10001:12001(2000) ack 1 // D-SACK caused by TLP is regarded as a dupack, this results in // the incorrect judgment of "loss was real"(F-RTO step 3.a) +.022 < . 1:1(0) ack 2001 win 257 <sack 1001:2001,nop,nop> // Never-retransmitted data(3001:4001) are acked and // expect to switch to open state(F-RTO step 3.b) +0 < . 1:1(0) ack 4001 win 257 +0 %{ assert tcpi_ca_state == 0, tcpi_ca_state }% Fixes: e33099f96d99 ("tcp: implement RFC5682 F-RTO") Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1650967419-2150-1-git-send-email-yangpc@wangsu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix incorrect TCP connection tracking window reset for non-syn packets, from Florian Westphal. 2) Incorrect dependency on CONFIG_NFT_FLOW_OFFLOAD, from Volodymyr Mytnyk. 3) Fix nft_socket from the output path, from Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_socket: only do sk lookups when indev is available netfilter: conntrack: fix udp offload timeout sysctl netfilter: nf_conntrack_tcp: re-init for syn packets only ==================== Link: https://lore.kernel.org/r/20220428142109.38726-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-28Merge tag 'gfs2-v5.18-rc4-fix2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Andreas Gruenbacher: - No short reads or writes upon glock contention * tag 'gfs2-v5.18-rc4-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: No short reads or writes upon glock contention
2022-04-28Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"Dany Madden
This reverts commit 723ad916134784b317b72f3f6cf0f7ba774e5dae When client requests channel or ring size larger than what the server can support the server will cap the request to the supported max. So, the client would not be able to successfully request resources that exceed the server limit. Fixes: 723ad9161347 ("ibmvnic: Add ethtool private flag for driver-defined queue limits") Signed-off-by: Dany Madden <drt@linux.ibm.com> Link: https://lore.kernel.org/r/20220427235146.23189-1-drt@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-28net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASKVladimir Oltean
The Time-Specified Departure feature is indeed mutually exclusive with TX IP checksumming in ENETC, but TX checksumming in itself is broken and was removed from this driver in commit 82728b91f124 ("enetc: Remove Tx checksumming offload code"). The blamed commit declared NETIF_F_HW_CSUM in dev->features to comply with software TSO's expectations, and still did the checksumming in software by calling skb_checksum_help(). So there isn't any restriction for the Time-Specified Departure feature. However, enetc_setup_tc_txtime() doesn't understand that, and blindly looks for NETIF_F_CSUM_MASK. Instead of checking for things which can literally never happen in the current code base, just remove the check and let the driver offload tc-etf qdiscs. Fixes: acede3c5dad5 ("net: enetc: declare NETIF_F_HW_CSUM and do it in software") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220427203017.1291634-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-28ixgbe: ensure IPsec VF<->PF compatibilityLeon Romanovsky
The VF driver can forward any IPsec flags and such makes the function is not extendable and prone to backward/forward incompatibility. If new software runs on VF, it won't know that PF configured something completely different as it "knows" only XFRM_OFFLOAD_INBOUND flag. Fixes: eda0333ac293 ("ixgbe: add VF IPsec management") Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shannon Nelson <snelson@pensando.io> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220427173152.443102-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-28Merge tag 'xfs-5.18-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Dave Chinner: - define buffer bit flags as unsigned to fix gcc-5 + c11 warnings - remove redundant XFS fields from MAINTAINERS - fix inode buffer locking order regression * tag 'xfs-5.18-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: reorder iunlink remove operation in xfs_ifree MAINTAINERS: update IOMAP FILESYSTEM LIBRARY and XFS FILESYSTEM xfs: convert buffer flags to unsigned.
2022-04-28MAINTAINERS: Update BNXT entry with firmware filesFlorian Fainelli
There appears to be a maintainer gap for BNXT TEE firmware files which causes some patches to be missed. Update the entry for the BNXT Ethernet controller with its companion firmware files. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20220427163606.126154-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-28Merge branch 'thermal-int340x'Rafael J. Wysocki
Merge a fix for the attr.show callback prototype in the int340x thermal driver (Kees Cook). * thermal-int340x: thermal: int340x: Fix attr.show callback prototype
2022-04-28Merge tag 'iio-fixes-for-5.18a' of ↵Greg Kroah-Hartman
https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio Pull set of IIO fixes for 5.18 from Jonathan Cameron: "1st set of IIO fixes for the 5.18 cycle ad3552r: - Fix a bug with error codes being stored in unsigned local variable. - Fix IS_ERR when value is either NULL or not rather than ERR_PTR ad5446 - Fix shifting of read_raw value. ad5592r - Fix missing return value being set for a fwnode property read. ad7280a - Wrong variable being used to set thresholds. admv8818 - Kconfig dependency fix. ak8975 - Missing regulator disable in error path. bmi160 - Disable regulators in an error path. dac5571 - Fix chip id detection for devices with OF bindings. inv_icm42600 - Handle a case of a missing I2C NACK during initially configuration. ltc2688 - Fix voltage scaling where integer part was written twice and decimal part not at all. scd4x - Handle error before using value. sx9310 - Device property parsing against indio_dev->dev.of_node which hasn't been set yet. sx9324 - Fix hardware gain related maths. - Wrong defaults for precharge internal resistance register." * tag 'iio-fixes-for-5.18a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: iio: imu: inv_icm42600: Fix I2C init possible nack iio: dac: ltc2688: fix voltage scale read iio:dac:ad3552r: Fix an IS_ERR() vs NULL check iio: sx9324: Fix default precharge internal resistance register iio: dac: ad5446: Fix read_raw not returning set value iio: magnetometer: ak8975: Fix the error handling in ak8975_power_on() iio:proximity:sx9324: Fix hardware gain read/write iio:proximity:sx_common: Fix device property parsing on DT systems iio: adc: ad7280a: Fix wrong variable used when setting thresholds. iio:filter:admv8818: select REGMAP_SPI for ADMV8818 iio: dac: ad5592r: Fix the missing return value. iio: dac: dac5571: Fix chip id detection for OF devices iio:imu:bmi160: disable regulator in error path iio: scd4x: check return of scd4x_write_and_fetch iio: dac: ad3552r: fix signedness bug in ad3552r_reset()
2022-04-28netfilter: nft_socket: only do sk lookups when indev is availableFlorian Westphal
Check if the incoming interface is available and NFT_BREAK in case neither skb->sk nor input device are set. Because nf_sk_lookup_slow*() assume packet headers are in the 'in' direction, use in postrouting is not going to yield a meaningful result. Same is true for the forward chain, so restrict the use to prerouting, input and output. Use in output work if a socket is already attached to the skb. Fixes: 554ced0a6e29 ("netfilter: nf_tables: add support for native socket matching") Reported-and-tested-by: Topi Miettinen <toiwoton@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-28Merge branch 'pm-cpuidle'Rafael J. Wysocki
Merge cpuidle fixes for 5.18-rc5: - Make intel_idle enable C1E promotion on all CPUs when C1E is preferred to C1 (Artem Bityutskiy). - Make C6 optimization on Sapphire Rapids added recently work as expected if both C1E and C1 are "preferred" (Artem Bityutskiy). * pm-cpuidle: intel_idle: Fix SPR C6 optimization intel_idle: Fix the 'preferred_cstates' module parameter
2022-04-28perf symbol: Remove arch__symbols__fixup_end()Namhyung Kim
Now the generic code can handle kallsyms fixup properly so no need to keep the arch-functions anymore. Fixes: 3cf6a32f3f2a4594 ("perf symbols: Fix symbol size calculation condition") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Will Deacon <will@kernel.org> Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20220416004048.1514900-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28perf symbol: Update symbols__fixup_end()Namhyung Kim
Now arch-specific functions all do the same thing. When it fixes the symbol address it needs to check the boundary between the kernel image and modules. For the last symbol in the previous region, it cannot know the exact size as it's discarded already. Thus it just uses a small page size (4096) and rounds it up like the last symbol. Fixes: 3cf6a32f3f2a4594 ("perf symbols: Fix symbol size calculation condition") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Will Deacon <will@kernel.org> Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20220416004048.1514900-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28perf symbol: Pass is_kallsyms to symbols__fixup_end()Namhyung Kim
The symbol fixup is necessary for symbols in kallsyms since they don't have size info. So we use the next symbol's address to calculate the size. Now it's also used for user binaries because sometimes they miss size for hand-written asm functions. There's a arch-specific function to handle kallsyms differently but currently it cannot distinguish kallsyms from others. Pass this information explicitly to handle it properly. Note that those arch functions will be moved to the generic function so I didn't added it to the arch-functions. Fixes: 3cf6a32f3f2a4594 ("perf symbols: Fix symbol size calculation condition") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Will Deacon <will@kernel.org> Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20220416004048.1514900-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28perf test: Add perf_event_attr test for Arm SPETimothy Hayes
Adds a perf_event_attr test for Arm SPE in which the presence of physical addresses are checked when SPE unit is run with pa_enable=1. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Will Deacon <will@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/20220421165205.117662-4-timothy.hayes@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28perf arm-spe: Fix SPE events with phys addressesTimothy Hayes
This patch corrects a bug whereby SPE collection is invoked with pa_enable=1 but synthesized events fail to show physical addresses. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Will Deacon <will@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/20220421165205.117662-3-timothy.hayes@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28perf arm-spe: Fix addresses of synthesized SPE eventsTimothy Hayes
This patch corrects a bug whereby synthesized events from SPE samples are missing virtual addresses. Fixes: 54f7815efef7fad9 ("perf arm-spe: Fill address info for samples") Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: bpf@vger.kernel.org Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: netdev@vger.kernel.org Cc: Song Liu <songliubraving@fb.com> Cc: Will Deacon <will@kernel.org> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220421165205.117662-2-timothy.hayes@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28perf intel-pt: Fix timeless decoding with perf.data directoryAdrian Hunter
Intel PT does not capture data in separate directories, so do not use separate directory processing because it doesn't work for timeless decoding. It also looks like it doesn't support one_mmap handling. Example: Before: # perf record --kcore -a -e intel_pt/tsc=0/k sleep 0.1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.799 MB perf.data ] # perf script --itrace=bep | head # After: # perf script --itrace=bep | head perf 21073 [000] psb: psb offs: 0 ffffffffaa68faf4 native_write_msr+0x4 ([kernel.kallsyms]) perf 21073 [000] cbr: cbr: 45 freq: 4505 MHz (161%) ffffffffaa68faf4 native_write_msr+0x4 ([kernel.kallsyms]) perf 21073 [000] 1 branches:k: 0 [unknown] ([unknown]) => ffffffffaa68faf6 native_write_msr+0x6 ([kernel.kallsyms]) perf 21073 [000] 1 branches:k: ffffffffaa68faf8 native_write_msr+0x8 ([kernel.kallsyms]) => ffffffffaa61aab0 pt_config_start+0x60 ([kernel.kallsyms]) perf 21073 [000] 1 branches:k: ffffffffaa61aabd pt_config_start+0x6d ([kernel.kallsyms]) => ffffffffaa61b8ad pt_event_start+0x27d ([kernel.kallsyms]) perf 21073 [000] 1 branches:k: ffffffffaa61b8bb pt_event_start+0x28b ([kernel.kallsyms]) => ffffffffaa61ba60 pt_event_add+0x40 ([kernel.kallsyms]) perf 21073 [000] 1 branches:k: ffffffffaa61ba76 pt_event_add+0x56 ([kernel.kallsyms]) => ffffffffaa880e86 event_sched_in+0xc6 ([kernel.kallsyms]) perf 21073 [000] 1 branches:k: ffffffffaa880e9b event_sched_in+0xdb ([kernel.kallsyms]) => ffffffffaa880ea5 event_sched_in+0xe5 ([kernel.kallsyms]) perf 21073 [000] 1 branches:k: ffffffffaa880eba event_sched_in+0xfa ([kernel.kallsyms]) => ffffffffaa880f96 event_sched_in+0x1d6 ([kernel.kallsyms]) perf 21073 [000] 1 branches:k: ffffffffaa880fc8 event_sched_in+0x208 ([kernel.kallsyms]) => ffffffffaa880ec0 event_sched_in+0x100 ([kernel.kallsyms]) Fixes: bb6be405c4a2a5 ("perf session: Load data directory files for analysis") Cc: stable@vger.kernel.org Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20220428093109.274641-1-adrian.hunter@intel.com Cc: Ian Rogers <irogers@google.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: linux-kernel@vger.kernel.org
2022-04-28gfs2: No short reads or writes upon glock contentionAndreas Gruenbacher
Commit 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered I/O") changed gfs2_file_read_iter() and gfs2_file_buffered_write() to allow dropping the inode glock while faulting in user buffers. When the lock was dropped, a short result was returned to indicate that the operation was interrupted. As pointed out by Linus (see the link below), this behavior is broken and the operations should always re-acquire the inode glock and resume the operation instead. Link: https://lore.kernel.org/lkml/CAHk-=whaz-g_nOOoo8RRiWNjnv2R+h6_xk2F1J4TuSRxk1MtLw@mail.gmail.com/ Fixes: 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered I/O") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-04-28elf: Fix the arm64 MTE ELF segment name and valueCatalin Marinas
Unfortunately, the name/value choice for the MTE ELF segment type (PT_ARM_MEMTAG_MTE) was pretty poor: LOPROC+1 is already in use by PT_AARCH64_UNWIND, as defined in the AArch64 ELF ABI (https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst). Update the ELF segment type value to LOPROC+2 and also change the define to PT_AARCH64_MEMTAG_MTE to match the AArch64 ELF ABI namespace. The AArch64 ELF ABI document is updating accordingly (segment type not previously mentioned in the document). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Fixes: 761b9b366cec ("elf: Introduce the ARM MTE ELF segment type") Cc: Will Deacon <will@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Machado <luis.machado@arm.com> Cc: Richard Earnshaw <Richard.Earnshaw@arm.com> Link: https://lore.kernel.org/r/20220425151833.2603830-1-catalin.marinas@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-04-28Merge tag 'for-net-2022-04-27' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix regression causing some HCI events to be discarded when they shouldn't. * tag 'for-net-2022-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted Bluetooth: hci_event: Fix creating hci_conn object on error status Bluetooth: hci_event: Fix checking for invalid handle on error status ==================== Link: https://lore.kernel.org/r/20220427234031.1257281-1-luiz.dentz@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-27scsi: target: pscsi: Set SCF_TREAT_READ_AS_NORMAL flag only if there is ↵David Jeffery
valid data With tape devices, the SCF_TREAT_READ_AS_NORMAL flag is used by the target subsystem to mark commands which have both data to return as well as sense data. But with pscsi, SCF_TREAT_READ_AS_NORMAL can be set even if there is no data to return. The SCF_TREAT_READ_AS_NORMAL flag causes the target core to call iscsit data-in callbacks even if there is no data, which iscsit does not support. This results in iscsit going into an error state requiring recovery and being unable to complete the command to the initiator. This issue can be resolved by fixing pscsi to only set SCF_TREAT_READ_AS_NORMAL if there is valid data to return alongside the sense data. Link: https://lore.kernel.org/r/20220427183250.291881-1-djeffery@redhat.com Fixes: bd81372065fa ("scsi: target: transport should handle st FM/EOM/ILI reads") Reported-by: Scott Hamilton <scott.hamilton@atos.net> Tested-by: Laurence Oberman <loberman@redhat.com> Reviewed-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-27net: fec: add missing of_node_put() in fec_enet_init_stop_mode()Yang Yingliang
Put device node in error path in fec_enet_init_stop_mode(). Fixes: 8a448bf832af ("net: ethernet: fec: move GPR register offset and bit into DT") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220426125231.375688-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-27bnx2x: fix napi API usage sequenceManish Chopra
While handling PCI errors (AER flow) driver tries to disable NAPI [napi_disable()] after NAPI is deleted [__netif_napi_del()] which causes unexpected system hang/crash. System message log shows the following: ======================================= [ 3222.537510] EEH: Detected PCI bus error on PHB#384-PE#800000 [ 3222.537511] EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures. [ 3222.537512] EEH: Notify device drivers to shutdown [ 3222.537513] EEH: Beginning: 'error_detected(IO frozen)' [ 3222.537514] EEH: PE#800000 (PCI 0384:80:00.0): Invoking bnx2x->error_detected(IO frozen) [ 3222.537516] bnx2x: [bnx2x_io_error_detected:14236(eth14)]IO error detected [ 3222.537650] EEH: PE#800000 (PCI 0384:80:00.0): bnx2x driver reports: 'need reset' [ 3222.537651] EEH: PE#800000 (PCI 0384:80:00.1): Invoking bnx2x->error_detected(IO frozen) [ 3222.537651] bnx2x: [bnx2x_io_error_detected:14236(eth13)]IO error detected [ 3222.537729] EEH: PE#800000 (PCI 0384:80:00.1): bnx2x driver reports: 'need reset' [ 3222.537729] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset' [ 3222.537890] EEH: Collect temporary log [ 3222.583481] EEH: of node=0384:80:00.0 [ 3222.583519] EEH: PCI device/vendor: 168e14e4 [ 3222.583557] EEH: PCI cmd/status register: 00100140 [ 3222.583557] EEH: PCI-E capabilities and status follow: [ 3222.583744] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82 [ 3222.583892] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.583893] EEH: PCI-E 20: 00000000 [ 3222.583893] EEH: PCI-E AER capability register set follows: [ 3222.584079] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.584230] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.584378] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.584416] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.584416] EEH: of node=0384:80:00.1 [ 3222.584454] EEH: PCI device/vendor: 168e14e4 [ 3222.584491] EEH: PCI cmd/status register: 00100140 [ 3222.584492] EEH: PCI-E capabilities and status follow: [ 3222.584677] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82 [ 3222.584825] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.584826] EEH: PCI-E 20: 00000000 [ 3222.584826] EEH: PCI-E AER capability register set follows: [ 3222.585011] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.585160] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.585309] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.585347] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.586872] RTAS: event: 5, Type: Platform Error (224), Severity: 2 [ 3222.586873] EEH: Reset without hotplug activity [ 3224.762767] EEH: Beginning: 'slot_reset' [ 3224.762770] EEH: PE#800000 (PCI 0384:80:00.0): Invoking bnx2x->slot_reset() [ 3224.762771] bnx2x: [bnx2x_io_slot_reset:14271(eth14)]IO slot reset initializing... [ 3224.762887] bnx2x 0384:80:00.0: enabling device (0140 -> 0142) [ 3224.768157] bnx2x: [bnx2x_io_slot_reset:14287(eth14)]IO slot reset --> driver unload Uninterruptible tasks ===================== crash> ps | grep UN 213 2 11 c000000004c89e00 UN 0.0 0 0 [eehd] 215 2 0 c000000004c80000 UN 0.0 0 0 [kworker/0:2] 2196 1 28 c000000004504f00 UN 0.1 15936 11136 wickedd 4287 1 9 c00000020d076800 UN 0.0 4032 3008 agetty 4289 1 20 c00000020d056680 UN 0.0 7232 3840 agetty 32423 2 26 c00000020038c580 UN 0.0 0 0 [kworker/26:3] 32871 4241 27 c0000002609ddd00 UN 0.1 18624 11648 sshd 32920 10130 16 c00000027284a100 UN 0.1 48512 12608 sendmail 33092 32987 0 c000000205218b00 UN 0.1 48512 12608 sendmail 33154 4567 16 c000000260e51780 UN 0.1 48832 12864 pickup 33209 4241 36 c000000270cb6500 UN 0.1 18624 11712 sshd 33473 33283 0 c000000205211480 UN 0.1 48512 12672 sendmail 33531 4241 37 c00000023c902780 UN 0.1 18624 11648 sshd EEH handler hung while bnx2x sleeping and holding RTNL lock =========================================================== crash> bt 213 PID: 213 TASK: c000000004c89e00 CPU: 11 COMMAND: "eehd" #0 [c000000004d477e0] __schedule at c000000000c70808 #1 [c000000004d478b0] schedule at c000000000c70ee0 #2 [c000000004d478e0] schedule_timeout at c000000000c76dec #3 [c000000004d479c0] msleep at c0000000002120cc #4 [c000000004d479f0] napi_disable at c000000000a06448 ^^^^^^^^^^^^^^^^ #5 [c000000004d47a30] bnx2x_netif_stop at c0080000018dba94 [bnx2x] #6 [c000000004d47a60] bnx2x_io_slot_reset at c0080000018a551c [bnx2x] #7 [c000000004d47b20] eeh_report_reset at c00000000004c9bc #8 [c000000004d47b90] eeh_pe_report at c00000000004d1a8 #9 [c000000004d47c40] eeh_handle_normal_event at c00000000004da64 And the sleeping source code ============================ crash> dis -ls c000000000a06448 FILE: ../net/core/dev.c LINE: 6702 6697 { 6698 might_sleep(); 6699 set_bit(NAPI_STATE_DISABLE, &n->state); 6700 6701 while (test_and_set_bit(NAPI_STATE_SCHED, &n->state)) * 6702 msleep(1); 6703 while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state)) 6704 msleep(1); 6705 6706 hrtimer_cancel(&n->timer); 6707 6708 clear_bit(NAPI_STATE_DISABLE, &n->state); 6709 } EEH calls into bnx2x twice based on the system log above, first through bnx2x_io_error_detected() and then bnx2x_io_slot_reset(), and executes the following call chains: bnx2x_io_error_detected() +-> bnx2x_eeh_nic_unload() +-> bnx2x_del_all_napi() +-> __netif_napi_del() bnx2x_io_slot_reset() +-> bnx2x_netif_stop() +-> bnx2x_napi_disable() +->napi_disable() Fix this by correcting the sequence of NAPI APIs usage, that is delete the NAPI after disabling it. Fixes: 7fa6f34081f1 ("bnx2x: AER revised") Reported-by: David Christensen <drc@linux.vnet.ibm.com> Tested-by: David Christensen <drc@linux.vnet.ibm.com> Signed-off-by: Manish Chopra <manishc@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Link: https://lore.kernel.org/r/20220426153913.6966-1-manishc@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-27tls: Skip tls_append_frag on zero copy sizeMaxim Mikityanskiy
Calling tls_append_frag when max_open_record_len == record->len might add an empty fragment to the TLS record if the call happens to be on the page boundary. Normally tls_append_frag coalesces the zero-sized fragment to the previous one, but not if it's on page boundary. If a resync happens then, the mlx5 driver posts dump WQEs in tx_post_resync_dump, and the empty fragment may become a data segment with byte_count == 0, which will confuse the NIC and lead to a CQE error. This commit fixes the described issue by skipping tls_append_frag on zero size to avoid adding empty fragments. The fix is not in the driver, because an empty fragment is hardly the desired behavior. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220426154949.159055-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-27Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf 2022-04-27 We've added 5 non-merge commits during the last 20 day(s) which contain a total of 6 files changed, 34 insertions(+), 12 deletions(-). The main changes are: 1) Fix xsk sockets when rx and tx are separately bound to the same umem, also fix xsk copy mode combined with busy poll, from Maciej Fijalkowski. 2) Fix BPF tunnel/collect_md helpers with bpf_xmit lwt hook usage which triggered a crash due to invalid metadata_dst access, from Eyal Birger. 3) Fix release of page pool in XDP live packet mode, from Toke Høiland-Jørgensen. 4) Fix potential NULL pointer dereference in kretprobes, from Adam Zabrocki. (Masami & Steven preferred this small fix to be routed via bpf tree given it's follow-up fix to Masami's rethook work that went via bpf earlier, too.) * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: xsk: Fix possible crash when multiple sockets are created kprobes: Fix KRETPROBES when CONFIG_KRETPROBE_ON_RETHOOK is set bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook bpf: Fix release of page_pool in BPF_PROG_RUN in test runner xsk: Fix l2fwd for copy mode + busy poll combo ==================== Link: https://lore.kernel.org/r/20220427212748.9576-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-27KVM: arm64: Inject exception on out-of-IPA-range translation faultMarc Zyngier
When taking a translation fault for an IPA that is outside of the range defined by the hypervisor (between the HW PARange and the IPA range), we stupidly treat it as an IO and forward the access to userspace. Of course, userspace can't do much with it, and things end badly. Arguably, the guest is braindead, but we should at least catch the case and inject an exception. Check the faulting IPA against: - the sanitised PARange: inject an address size fault - the IPA size: inject an abort Reported-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-04-27drm/amdgpu: keep mmhub clock gating being enabled during s2idle suspendPrike Liang
Without MMHUB clock gating being enabled then MMHUB will not disconnect from DF and will result in DF C-state entry can't be accessed during S2idle suspend, and eventually s0ix entry will be blocked. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-27drm/amd/pm: fix the deadlock issue observed on SIEvan Quan
The adev->pm.mutx is already held at the beginning of amdgpu_dpm_compute_clocks/amdgpu_dpm_enable_uvd/amdgpu_dpm_enable_vce. But on their calling path, amdgpu_display_bandwidth_update will be called and thus its sub functions amdgpu_dpm_get_sclk/mclk. They will then try to acquire the same adev->pm.mutex and deadlock will occur. By placing amdgpu_display_bandwidth_update outside of adev->pm.mutex protection(considering logically they do not need such protection) and restructuring the call flow accordingly, we can eliminate the deadlock issue. This comes with no real logics change. Fixes: 3712e7a49459 ("drm/amd/pm: unified lock protections in amdgpu_dpm.c") Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Reported-by: Arthur Marsh <arthur.marsh@internode.on.net> Link: https://lore.kernel.org/all/9e689fea-6c69-f4b0-8dee-32c4cf7d8f9c@molgen.mpg.de/ BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1957 Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-27drm/amd/display: Fix memory leak in dcn21_clock_source_createMiaoqian Lin
When dcn20_clk_src_construct() fails, we need to release clk_src. Fixes: 6f4e6361c3ff ("drm/amd/display: Add Renoir resource (v2)") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>