summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-09-08RDMA/bnxt_re: Prefer kcalloc over open coded arithmeticLen Baker
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. In this case this is not actually dynamic sizes: both sides of the multiplication are constant values. However it is best to refactor this anyway, just to keep the open-coded math idiom out of code. So, use the purpose specific kcalloc() function instead of the argument size * count in the kzalloc() function. Also, remove the unnecessary initialization of the sqp_tbl variable since it is set a few lines later. [1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Link: https://lore.kernel.org/r/20210905081812.17113-1-len.baker@gmx.com Signed-off-by: Len Baker <len.baker@gmx.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-08IB/qib: Fix null pointer subtraction compiler warningJason Gunthorpe
>> drivers/infiniband/hw/qib/qib_sysfs.c:411:1: warning: performing pointer subtraction with a null pointer has undefined behavior +[-Wnull-pointer-subtraction] QIB_DIAGC_ATTR(rc_resends); ^~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/qib/qib_sysfs.c:408:51: note: expanded from macro 'QIB_DIAGC_ATTR' .counter = &((struct qib_ibport *)0)->rvp.n_##N - (u64 *)0, \ Use offsetof and accomplish the type check using static_assert. Fixes: 4a7aaf88c89f ("RDMA/qib: Use attributes for the port sysfs") Link: https://lore.kernel.org/r/0-v1-43ae3c759177+65-qib_type_jgg@nvidia.com Reported-by: kernel test robot <lkp@intel.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-08RDMA/mlx5: Fix xlt_chunk_align calculationNiklas Schnelle
The XLT chunk alignment depends on ent_size not sizeof(ent_size) aka sizeof(size_t). The incoming ent_size is either 8 or 16, so the miscalculation when 16 is required is only an over-alignment and functional harmless. Fixes: 8010d74b9965 ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()") Link: https://lore.kernel.org/r/20210908081849.7948-2-schnelle@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-08RDMA/mlx5: Fix number of allocated XLT entriesNiklas Schnelle
In commit 8010d74b9965b ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()") the allocation logic was split out of mlx5_ib_update_xlt() and the logic was changed to enable better OOM handling. Sadly this change introduced a miscalculation of the number of entries that were actually allocated when under memory pressure where it can actually become 0 which on s390 lets dma_map_single() fail. It can also lead to corruption of the free pages list when the wrong number of entries is used in the calculation of sg->length which is used as argument for free_pages(). Fix this by using the allocation size instead of misusing get_order(size). Cc: stable@vger.kernel.org Fixes: 8010d74b9965 ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()") Link: https://lore.kernel.org/r/20210908081849.7948-1-schnelle@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-07Merge tag 'pci-v5.15-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Convert controller drivers to generic_handle_domain_irq() (Marc Zyngier) - Simplify VPD (Vital Product Data) access and search (Heiner Kallweit) - Update bnx2, bnx2x, bnxt, cxgb4, cxlflash, sfc, tg3 drivers to use simplified VPD interfaces (Heiner Kallweit) - Run Max Payload Size quirks before configuring MPS; work around ASMedia ASM1062 SATA MPS issue (Marek Behún) Resource management: - Refactor pci_ioremap_bar() and pci_ioremap_wc_bar() (Krzysztof Wilczyński) - Optimize pci_resource_len() to reduce kernel size (Zhen Lei) PCI device hotplug: - Fix a double unmap in ibmphp (Vishal Aslot) PCIe port driver: - Enable Bandwidth Notification only if port supports it (Stuart Hayes) Sysfs/proc/syscalls: - Add schedule point in proc_bus_pci_read() (Krzysztof Wilczyński) - Return ~0 data on pciconfig_read() CAP_SYS_ADMIN failure (Krzysztof Wilczyński) - Return "int" from pciconfig_read() syscall (Krzysztof Wilczyński) Virtualization: - Extend "pci=noats" to also turn on Translation Blocking to protect against some DMA attacks (Alex Williamson) - Add sysfs mechanism to control the type of reset used between device assignments to VMs (Amey Narkhede) - Add support for ACPI _RST reset method (Shanker Donthineni) - Add ACS quirks for Cavium multi-function devices (George Cherian) - Add ACS quirks for NXP LX2xx0 and LX2xx2 platforms (Wasim Khan) - Allow HiSilicon AMBA devices that appear as fake PCI devices to use PASID and SVA (Zhangfei Gao) Endpoint framework: - Add support for SR-IOV Endpoint devices (Kishon Vijay Abraham I) - Zero-initialize endpoint test tool parameters so we don't use random parameters (Shunyong Yang) APM X-Gene PCIe controller driver: - Remove redundant dev_err() call in xgene_msi_probe() (ErKun Yang) Broadcom iProc PCIe controller driver: - Don't fail devm_pci_alloc_host_bridge() on missing 'ranges' because it's optional on BCMA devices (Rob Herring) - Fix BCMA probe resource handling (Rob Herring) Cadence PCIe driver: - Work around J7200 Link training electrical issue by increasing delays in LTSSM (Nadeem Athani) Intel IXP4xx PCI controller driver: - Depend on ARCH_IXP4XX to avoid useless config questions (Geert Uytterhoeven) Intel Keembay PCIe controller driver: - Add Intel Keem Bay PCIe controller (Srikanth Thokala) Marvell Aardvark PCIe controller driver: - Work around config space completion handling issues (Evan Wang) - Increase timeout for config access completions (Pali Rohár) - Emulate CRS Software Visibility bit (Pali Rohár) - Configure resources from DT 'ranges' property to fix I/O space access (Pali Rohár) - Serialize INTx mask/unmask (Pali Rohár) MediaTek PCIe controller driver: - Add MT7629 support in DT (Chuanjia Liu) - Fix an MSI issue (Chuanjia Liu) - Get syscon regmap ("mediatek,generic-pciecfg"), IRQ number ("pci_irq"), PCI domain ("linux,pci-domain") from DT properties if present (Chuanjia Liu) Microsoft Hyper-V host bridge driver: - Add ARM64 support (Boqun Feng) - Support "Create Interrupt v3" message (Sunil Muthuswamy) NVIDIA Tegra PCIe controller driver: - Use seq_puts(), move err_msg from stack to static, fix OF node leak (Christophe JAILLET) NVIDIA Tegra194 PCIe driver: - Disable suspend when in Endpoint mode (Om Prakash Singh) - Fix MSI-X address programming error (Om Prakash Singh) - Disable interrupts during suspend to avoid spurious AER link down (Om Prakash Singh) Renesas R-Car PCIe controller driver: - Work around hardware issue that prevents Link L1->L0 transition (Marek Vasut) - Fix runtime PM refcount leak (Dinghao Liu) Rockchip DesignWare PCIe controller driver: - Add Rockchip RK356X host controller driver (Simon Xue) TI J721E PCIe driver: - Add support for J7200 and AM64 (Kishon Vijay Abraham I) Toshiba Visconti PCIe controller driver: - Add Toshiba Visconti PCIe host controller driver (Nobuhiro Iwamatsu) Xilinx NWL PCIe controller driver: - Enable PCIe reference clock via CCF (Hyun Kwon) Miscellaneous: - Convert sta2x11 from 'pci_' to 'dma_' API (Christophe JAILLET) - Fix pci_dev_str_match_path() alloc while atomic bug (used for kernel parameters that specify devices) (Dan Carpenter) - Remove pointless Precision Time Management warning when PTM is present but not enabled (Jakub Kicinski) - Remove surplus "break" statements (Krzysztof Wilczyński)" * tag 'pci-v5.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (132 commits) PCI: ibmphp: Fix double unmap of io_mem x86/PCI: sta2x11: switch from 'pci_' to 'dma_' API PCI/VPD: Use unaligned access helpers PCI/VPD: Clean up public VPD defines and inline functions cxgb4: Use pci_vpd_find_id_string() to find VPD ID string PCI/VPD: Add pci_vpd_find_id_string() PCI/VPD: Include post-processing in pci_vpd_find_tag() PCI/VPD: Stop exporting pci_vpd_find_info_keyword() PCI/VPD: Stop exporting pci_vpd_find_tag() PCI: Set dma-can-stall for HiSilicon chips PCI: rockchip-dwc: Add Rockchip RK356X host controller driver PCI: dwc: Remove surplus break statement after return PCI: artpec6: Remove local code block from switch statement PCI: artpec6: Remove surplus break statement after return MAINTAINERS: Add entries for Toshiba Visconti PCIe controller PCI: visconti: Add Toshiba Visconti PCIe host controller driver PCI/portdrv: Enable Bandwidth Notification only if port supports it PCI: Allow PASID on fake PCIe devices without TLP prefixes PCI: mediatek: Use PCI domain to handle ports detection PCI: mediatek: Add new method to get irq number ...
2021-09-07kbuild: Only default to -Werror if COMPILE_TESTMarco Elver
The cross-product of the kernel's supported toolchains, architectures, and configuration options is large. So large, that it's generally accepted to be infeasible to enumerate and build+test them all (many compile-testers rely on randomly generated configs). Without the possibility to enumerate all possible combinations of toolchains, architectures, and configuration options, it is inevitable that compiler warnings in this space exist. With -Werror, this means that an innumerable set of kernels are now broken, yet had been perfectly usable before (confused compilers, code with warnings unused, or luck). Distributors will necessarily pick a point in the toolchain X arch X config space, and if unlucky, will have a broken build. Granted, those will likely disable CONFIG_WERROR and move on. The kernel's default configuration is unlikely to be suitable for all users, but it's inappropriate to force many users to set CONFIG_WERROR=n. This also holds for CI systems which are focused on runtime testing, where the odd warning in some subsystem will disrupt testing of the rest of the kernel. Many of those runtime-focused CI systems run tests or fuzz the kernel using runtime debugging tools. Runtime testing of different subsystems can proceed in parallel, and potentially uncover serious bugs; halting runtime testing of the entire kernel because of the odd warning (now error) in a subsystem or driver is simply inappropriate. Therefore, runtime-focused CI systems will likely choose CONFIG_WERROR=n as well. The appropriate usecase for -Werror is therefore compile-test focused builds (often done by developers or CI systems). Reflect this in the Kconfig option by making the default value of WERROR match COMPILE_TEST. Signed-off-by: Marco Elver <elver@google.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviwed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-07tracing: Fix some alloc_event_probe() error handling bugsDan Carpenter
There are two bugs in this code. First, if the kzalloc() fails it leads to a NULL dereference of "ep" on the next line. Second, if the alloc_event_probe() function returns an error then it leads to an error pointer dereference in the caller. Link: https://lkml.kernel.org/r/20210824115150.GI31143@kili Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-07blk-mq: allow 4x BLK_MAX_REQUEST_COUNT at blk_plug for multiple_queuesSong Liu
Limiting number of request to BLK_MAX_REQUEST_COUNT at blk_plug hurts performance for large md arrays. [1] shows resync speed of md array drops for md array with more than 16 HDDs. Fix this by allowing more request at plug queue. The multiple_queue flag is used to only apply higher limit to multiple queue cases. [1] https://lore.kernel.org/linux-raid/CAFDAVznS71BXW8Jxv6k9dXc2iR3ysX3iZRBww_rzA8WifBFxGg@mail.gmail.com/ Tested-by: Marcin Wanat <marcin.wanat@gmail.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-07Merge tag 'net-5.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes and stragglers from Jakub Kicinski: "Networking stragglers and fixes, including changes from netfilter, wireless and can. Current release - regressions: - qrtr: revert check in qrtr_endpoint_post(), fixes audio and wifi - ip_gre: validate csum_start only on pull - bnxt_en: fix 64-bit doorbell operation on 32-bit kernels - ionic: fix double use of queue-lock, fix a sleeping in atomic - can: c_can: fix null-ptr-deref on ioctl() - cs89x0: disable compile testing on powerpc Current release - new code bugs: - bridge: mcast: fix vlan port router deadlock, consistently disable BH Previous releases - regressions: - dsa: tag_rtl4_a: fix egress tags, only port 0 was working - mptcp: fix possible divide by zero - netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex - netfilter: socket: icmp6: fix use-after-scope - stmmac: fix MAC not working when system resume back with WoL active Previous releases - always broken: - ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address - seg6: set fc_nlinfo in nh_create_ipv4, nh_create_ipv6 - mptcp: only send extra TCP acks in eligible socket states - dsa: lantiq_gswip: fix maximum frame length - stmmac: fix overall budget calculation for rxtx_napi - bnxt_en: fix firmware version reporting via devlink - renesas: sh_eth: add missing barrier to fix freeing wrong tx descriptor Stragglers: - netfilter: conntrack: switch to siphash - netfilter: refuse insertion if chain has grown too large - ncsi: add get MAC address command to get Intel i210 MAC address" * tag 'net-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits) ieee802154: Remove redundant initialization of variable ret net: stmmac: fix MAC not working when system resume back with WoL active net: phylink: add suspend/resume support net: renesas: sh_eth: Fix freeing wrong tx descriptor bonding: 3ad: pass parameter bond_params by reference cxgb3: fix oops on module removal can: c_can: fix null-ptr-deref on ioctl() can: rcar_canfd: add __maybe_unused annotation to silence warning net: wwan: iosm: Unify IO accessors used in the driver net: wwan: iosm: Replace io.*64_lo_hi() with regular accessors net: qcom/emac: Replace strlcpy with strscpy ip6_gre: Revert "ip6_gre: add validation for csum_start" net: hns3: make hclgevf_cmd_caps_bit_map0 and hclge_cmd_caps_bit_map0 static selftests/bpf: Test XDP bonding nest and unwind bonding: Fix negative jump label count on nested bonding MAINTAINERS: add VM SOCKETS (AF_VSOCK) entry stmmac: dwmac-loongson:Fix missing return value iwlwifi: fix printk format warnings in uefi.c net: create netdev->dev_addr assignment helpers bnxt_en: Fix possible unintended driver initiated error recovery ...
2021-09-07Merge tag 'linux-watchdog-5.15-rc1' of ↵Linus Torvalds
git://www.linux-watchdog.org/linux-watchdog Pull watchdog updates from Wim Van Sebroeck: - add Mediatek MT7986 & MT8195 wdt support - add Maxim MAX63xx - drop bd70528 support - rewrite ixp4xx to watchdog framework - constify static struct watchdog_ops for sl28cpld_wdt, mpc8xxx_wdt and tqmx86 - introduce watchdog_dev_suspend/resume - several fixes and improvements * tag 'linux-watchdog-5.15-rc1' of git://www.linux-watchdog.org/linux-watchdog: dt-bindings: watchdog: Add compatible for Mediatek MT7986 watchdog: ixp4xx: Rewrite driver to use core watchdog: Start watchdog in watchdog_set_last_hw_keepalive only if appropriate watchdog: max63xx_wdt: Add device tree probing dt-bindings: watchdog: Add Maxim MAX63xx bindings watchdog: mediatek: mt8195: add wdt support dt-bindings: reset: mt8195: add toprgu reset-controller header file watchdog: tqmx86: Constify static struct watchdog_ops watchdog: mpc8xxx_wdt: Constify static struct watchdog_ops watchdog: sl28cpld_wdt: Constify static struct watchdog_ops watchdog: iTCO_wdt: Fix detection of SMI-off case watchdog: bcm2835_wdt: consider system-power-controller property watchdog: imx2_wdg: notify wdog core to stop ping worker on suspend watchdog: introduce watchdog_dev_suspend/resume watchdog: Fix NULL pointer dereference when releasing cdev watchdog: only run driver set_pretimeout op if device supports it watchdog: bd70528 drop bd70528 support
2021-09-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - Page ownership tracking between host EL1 and EL2 - Rely on userspace page tables to create large stage-2 mappings - Fix incompatibility between pKVM and kmemleak - Fix the PMU reset state, and improve the performance of the virtual PMU - Move over to the generic KVM entry code - Address PSCI reset issues w.r.t. save/restore - Preliminary rework for the upcoming pKVM fixed feature - A bunch of MM cleanups - a vGIC fix for timer spurious interrupts - Various cleanups s390: - enable interpretation of specification exceptions - fix a vcpu_idx vs vcpu_id mixup x86: - fast (lockless) page fault support for the new MMU - new MMU now the default - increased maximum allowed VCPU count - allow inhibit IRQs on KVM_RUN while debugging guests - let Hyper-V-enabled guests run with virtualized LAPIC as long as they do not enable the Hyper-V "AutoEOI" feature - fixes and optimizations for the toggling of AMD AVIC (virtualized LAPIC) - tuning for the case when two-dimensional paging (EPT/NPT) is disabled - bugfixes and cleanups, especially with respect to vCPU reset and choosing a paging mode based on CR0/CR4/EFER - support for 5-level page table on AMD processors Generic: - MMU notifier invalidation callbacks do not take mmu_lock unless necessary - improved caching of LRU kvm_memory_slot - support for histogram statistics - add statistics for halt polling and remote TLB flush requests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits) KVM: Drop unused kvm_dirty_gfn_invalid() KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted KVM: MMU: mark role_regs and role accessors as maybe unused KVM: MIPS: Remove a "set but not used" variable x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait KVM: stats: Add VM stat for remote tlb flush requests KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count() KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()" KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710 kvm: x86: Increase MAX_VCPUS to 1024 kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host KVM: s390: index kvm->arch.idle_mask by vcpu_idx KVM: s390: Enable specification exception interpretation KVM: arm64: Trim guest debug exception handling KVM: SVM: Add 5-level page table support for SVM ...
2021-09-07putname(): IS_ERR_OR_NULL() is wrong hereAl Viro
Mixing NULL and ERR_PTR() just in case is a Bad Idea(tm). For struct filename the former is wrong - failures are reported as ERR_PTR(...), not as NULL. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-09-07namei: Standardize callers of filename_create()Stephen Brennan
filename_create() has two variants, one which drops the caller's reference to filename (filename_create) and one which does not (__filename_create). This can be confusing as it's unusual to drop a caller's reference. Remove filename_create, rename __filename_create to filename_create, and convert all callers. Link: https://lore.kernel.org/linux-fsdevel/f6238254-35bd-7e97-5b27-21050c745874@oracle.com/ Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-09-07Merge branch 'dmi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging Pull dmi fix from Jean Delvare. Unbreak some existing udev/hwdb modalias matches due to misplaced product_sku field. * 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: firmware: dmi: Move product_sku info to the end of the modalias
2021-09-07namei: Standardize callers of filename_lookup()Stephen Brennan
filename_lookup() has two variants, one which drops the caller's reference to filename (filename_lookup), and one which does not (__filename_lookup). This can be confusing as it's unusual to drop a caller's reference. Remove filename_lookup, rename __filename_lookup to filename_lookup, and convert all callers. The cost is a few slightly longer functions, but the clarity is greater. [AV: consuming a reference is not at all unusual, actually; look at e.g. do_mkdirat(), for example. It's more that we want non-consuming variant for close relative of that function...] Link: https://lore.kernel.org/linux-fsdevel/YS+dstZ3xfcLxhoB@zeniv-ca.linux.org.uk/ Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-09-07Merge tag 'ntb-5.15' of git://github.com/jonmason/ntbLinus Torvalds
Pull NTB updates from Jon Mason: "Bug fixes and clean-ups for Linux v5.15" * tag 'ntb-5.15' of git://github.com/jonmason/ntb: NTB: switch from 'pci_' to 'dma_' API ntb: ntb_pingpong: remove redundant initialization of variables msg_data and spad_data NTB: perf: Fix an error code in perf_setup_inbuf() NTB: Fix an error code in ntb_msit_probe() ntb: intel: remove invalid email address in header comment
2021-09-07rename __filename_parentat() to filename_parentat()Al Viro
... in separate commit, to avoid noise in previous one Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-09-07Merge tag 'rproc-v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc Pull remoteproc updates from Bjorn Andersson: - move the crash recovery worker to the freezable work queue to avoid interaction with other drivers during suspend & resume - fix a couple of typos in comments - add support for handling the audio DSP on SDM660 - fix a race between the Qualcomm wireless subsystem driver and the associated driver for the RF chip * tag 'rproc-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc: remoteproc: q6v5_pas: Add sdm660 ADSP PIL compatible dt-bindings: remoteproc: qcom: adsp: Add SDM660 ADSP remoteproc: use freezable workqueue for crash notifications remoteproc: fix kernel doc for struct rproc_ops remoteproc: fix an typo in fw_elf_get_class code comments remoteproc: qcom: wcnss: Fix race with iris probe
2021-09-07namei: Fix use after free in kern_path_lockedStephen Brennan
In 0ee50b47532a ("namei: change filename_parentat() calling conventions"), filename_parentat() was made to always call putname() on the filename before returning, and kern_path_locked() was migrated to this calling convention. However, kern_path_locked() uses the "last" parameter to lookup and potentially create a new dentry. The last parameter contains the last component of the path and points within the filename, which was recently freed at the end of filename_parentat(). Thus, when kern_path_locked() calls __lookup_hash(), it is using the filename after it has already been freed. In other words, these calling conventions had been wrong for the only remaining caller of filename_parentat(). Everything else is using __filename_parentat(), which does not drop the reference; so should kern_path_locked(). Switch kern_path_locked() to use of __filename_parentat() and move getting/dropping struct filename into wrapper. Remove filename_parentat(), now that we have no remaining callers. Fixes: 0ee50b47532a ("namei: change filename_parentat() calling conventions") Link: https://lore.kernel.org/linux-fsdevel/YS9D4AlEsaCxLFV0@infradead.org/ Link: https://lore.kernel.org/linux-fsdevel/YS+csMTV2tTXKg3s@zeniv-ca.linux.org.uk/ Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by: syzbot+fb0d60a179096e8c2731@syzkaller.appspotmail.com Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Co-authored-by: Dmitry Kadashev <dkadashev@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-09-07Merge tag 'backlight-next-5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight Pull backlight updates from Lee Jones: "Fix-ups: - Improve bootloader/kernel device handover Bug Fixes: - Stabilise backlight in ktd253 driver" * tag 'backlight-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight: backlight: pwm_bl: Improve bootloader/kernel device handover backlight: ktd253: Stabilize backlight
2021-09-07Merge tag 'mfd-next-5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD updates from Lee Jones: "Core Frameworks: - Add support for registering devices via MFD cells to Simple MFD (I2C) New Drivers: - Add support for Renesas Synchronization Management Unit (SMU) New Device Support: - Add support for N5010 to Intel M10 BMC - Add support for Cannon Lake to Intel LPSS ACPI - Add support for Samsung SSG{1,2} to ST-Ericsson's U8500 family - Add support for TQMx110EB and TQMxE40x to TQ-Systems PLD TQMx86 New Functionality: - Add support for GPIO to Intel LPC ICH - Add support for Reset to Texas Instruments TPS65086 Fix-ups: - Trivial, sorting, whitespace, renaming, etc; mt6360-core, db8500-prcmu-regs, tqmx86 - Device Tree fiddling; syscon, axp20x, qcom,pm8008, ti,tps65086, brcm,cru - Use proper APIs for IRQ map resolution; ab8500-core, stmpe, tc3589x, wm8994-irq - Pass 'supplied-from' property through axp288_fuel_gauge via swnode - Remove unused file entry; MAINTAINERS - Make interrupt line optional; tps65086 - Rename db8500-cpuidle driver symbol; db8500-prcmu - Remove support for unused hardware; tqmx86 - Provide a standard LPC clock frequency for unknown boards; tqmx86 - Remove unused code; ti_am335x_tscadc - Use of_iomap() instead of ioremap(); syscon Bug Fixes: - Clear GPIO IRQ resource flags when no IRQ is set; tqmx86 - Fix incorrect/misleading frequencies; db8500-prcmu - Mitigate namespace clash with other GPIOBASE users" * tag 'mfd-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (31 commits) mfd: lpc_sch: Rename GPIOBASE to prevent build error mfd: syscon: Use of_iomap() instead of ioremap() dt-bindings: mfd: Add Broadcom CRU mfd: ti_am335x_tscadc: Delete superfluous error message mfd: tqmx86: Assume 24MHz LPC clock for unknown boards mfd: tqmx86: Add support for TQ-Systems DMI IDs mfd: tqmx86: Add support for TQMx110EB and TQMxE40x mfd: tqmx86: Fix typo in "platform" mfd: tqmx86: Remove incorrect TQMx90UC board ID mfd: tqmx86: Clear GPIO IRQ resource when no IRQ is set mfd: simple-mfd-i2c: Add support for registering devices via MFD cells mfd/cpuidle: ux500: Rename driver symbol mfd: tps65086: Add cell entry for reset driver mfd: tps65086: Make interrupt line optional dt-bindings: mfd: Convert tps65086.txt to YAML MAINTAINERS: Adjust ARM/NOMADIK/Ux500 ARCHITECTURES to file renaming mfd: db8500-prcmu: Handle missing FW variant mfd: db8500-prcmu: Rename register header mfd: axp20x: Add supplied-from property to axp288_fuel_gauge cell mfd: Don't use irq_create_mapping() to resolve a mapping ...
2021-09-07Merge tag 'gpio-updates-for-v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio updates from Bartosz Golaszewski: "We mostly have various improvements and refactoring all over the place but also some interesting new features - like the virtio GPIO driver that allows guest VMs to use host's GPIOs. We also have a new/old GPIO driver for rockchip - this one has been split out of the pinctrl driver. Summary: - new driver: gpio-virtio allowing a guest VM running linux to access GPIO lines provided by the host - split the GPIO driver out of the rockchip pin control driver - add support for a new model to gpio-aspeed-sgpio, refactor the driver and use generic device property interfaces, improve property sanitization - add ACPI support to gpio-tegra186 - improve the code setting the line names to support multiple GPIO banks per device - constify a bunch of OF functions in the core GPIO code and make the declaration for one of the core OF functions we use consistent within its header - use software nodes in intel_quark_i2c_gpio - add support for the gpio-line-names property in gpio-mt7621 - use the standard GPIO function for setting the GPIO names in gpio-brcmstb - fix a bunch of leaks and other bugs in gpio-mpc8xxx - use generic pm callbacks in gpio-ml-ioh - improve resource management and PM handling in gpio-mlxbf2 - modernize and improve the gpio-dwapb driver - coding style improvements in gpio-rcar - documentation fixes and improvements - update the MAINTAINERS entry for gpio-zynq - minor tweaks in several drivers" * tag 'gpio-updates-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (35 commits) gpio: mpc8xxx: Use 'devm_gpiochip_add_data()' to simplify the code and avoid a leak gpio: mpc8xxx: Fix a potential double iounmap call in 'mpc8xxx_probe()' gpio: mpc8xxx: Fix a resources leak in the error handling path of 'mpc8xxx_probe()' gpio: viperboard: remove platform_set_drvdata() call in probe gpio: virtio: Add missing mailings lists in MAINTAINERS entry gpio: virtio: Fix sparse warnings gpio: remove the obsolete MX35 3DS BOARD MC9S08DZ60 GPIO functions gpio: max730x: Use the right include gpio: Add virtio-gpio driver gpio: mlxbf2: Use DEFINE_RES_MEM_NAMED() helper macro gpio: mlxbf2: Use devm_platform_ioremap_resource() gpio: mlxbf2: Drop wrong use of ACPI_PTR() gpio: mlxbf2: Convert to device PM ops gpio: dwapb: Get rid of legacy platform data mfd: intel_quark_i2c_gpio: Convert GPIO to use software nodes gpio: dwapb: Read GPIO base from gpio-base property gpio: dwapb: Unify ACPI enumeration checks in get_irq() and configure_irqs() gpiolib: Deduplicate forward declaration in the consumer.h header MAINTAINERS: update gpio-zynq.yaml reference gpio: tegra186: Add ACPI support ...
2021-09-07PM: sleep: core: Avoid setting power.must_resume to falsePrasad Sodagudi
There are variables(power.may_skip_resume and dev->power.must_resume) and DPM_FLAG_MAY_SKIP_RESUME flags to control the resume of devices after a system wide suspend transition. Setting the DPM_FLAG_MAY_SKIP_RESUME flag means that the driver allows its "noirq" and "early" resume callbacks to be skipped if the device can be left in suspend after a system-wide transition into the working state. PM core determines that the driver's "noirq" and "early" resume callbacks should be skipped or not with dev_pm_skip_resume() function by checking power.may_skip_resume variable. power.must_resume variable is getting set to false in __device_suspend() function without checking device's DPM_FLAG_MAY_SKIP_RESUME settings. In problematic scenario, where all the devices in the suspend_late stage are successful and some device can fail to suspend in suspend_noirq phase. So some devices successfully suspended in suspend_late stage are not getting chance to execute __device_suspend_noirq() to set dev->power.must_resume variable to true and not getting resumed in early_resume phase. Add a check for device's DPM_FLAG_MAY_SKIP_RESUME flag before setting power.must_resume variable in __device_suspend function. Fixes: 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the resume phase") Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-07Merge tag 'fuse-update-5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Allow mounting an active fuse device. Previously the fuse device would always be mounted during initialization, and sharing a fuse superblock was only possible through mount or namespace cloning - Fix data flushing in syncfs (virtiofs only) - Fix data flushing in copy_file_range() - Fix a possible deadlock in atomic O_TRUNC - Misc fixes and cleanups * tag 'fuse-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: remove unused arg in fuse_write_file_get() fuse: wait for writepages in syncfs fuse: flush extending writes fuse: truncate pagecache on atomic_o_trunc fuse: allow sharing existing sb fuse: move fget() to fuse_get_tree() fuse: move option checking into fuse_fill_super() fuse: name fs_context consistently fuse: fix use after free in fuse_read_interrupt()
2021-09-07Documentation: power: include kernel-doc in Energy Model docLukasz Luba
Improve the existing documentation of the Energy Model and add kernel-doc comments. This extends an API description and helps better understanding the usage. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-07PM: EM: fix kernel-doc commentsLukasz Luba
Fix the kernel-doc comments for the improved Energy Model documentation. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-07cpufreq: intel_pstate: hybrid: Rework HWP calibrationRafael J. Wysocki
The current HWP calibration for hybrid processors in intel_pstate is fragile, because it depends too much on the information provided by the platform firmware via CPPC which may not be reliable enough. It also need not be so complicated. In order to improve that mechanism and make it more resistant to platform firmware issues, make it only use the CPPC nominal_perf values to compute the HWP-to-frequency scaling factors for all CPUs and possibly use the HWP_CAP highest_perf values to recompute them if the ones derived from the CPPC nominal_perf values alone appear to be too high. Namely, fetch CPC.nominal_perf for all CPUs present in the system, find the minimum one and use it as a reference for computing all of the CPUs' scaling factors (using the observation that for the CPUs having the minimum CPC.nominal_perf the HWP range of available performance levels should be the same as the range of available "legacy" P-states and so the HWP-to-frequency scaling factor for them should be the same as the corresponding scaling factor used for representing the P-state values in kHz). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Zhang Rui <rui.zhang@intel.com>
2021-09-07ACPI: CPPC: Introduce cppc_get_nominal_perf()Rafael J. Wysocki
On some systems the nominal_perf value retrieved via CPPC is just a constant and fetching it doesn't require accessing any registers, so if it is the only CPPC capability that's needed, it is wasteful to run cppc_get_perf_caps() in order to get just that value alone, especially when this is done for CPUs other than the one running the code. For this reason, introduce cppc_get_nominal_perf() allowing nominal_perf to be obtained individually, by generalizing the existing cppc_get_desired_perf() (and renaming it) so it can be used to retrieve any specific CPPC capability value. While at it, clean up the cppc_get_desired_perf() kerneldoc comment a bit. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-07ACPI: scan: Remove unneeded header linux/nls.hKari Argillander
Code that use linux/nls.h was moved to device_sysfs.c by commit c2efefb33abf ("ACPI / scan: Move sysfs-related device code to a separate file") Remove this include so that complier has easier times and it would be easier to grep where nls code is used. Signed-off-by: Kari Argillander <kari.argillander@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-07PM: sleep: wakeirq: drop useless parameter from dev_pm_attach_wake_irq()Sergey Shtylyov
This function has the 'irq' parameter which isn't ever used, so drop it. Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-07Merge tag 'kgdb-5.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb updates from Daniel Thompson: "Changes for kgdb/kdb this cycle are dominated by a change from Sumit that removes as small (256K) private heap from kdb. This is change I've hoped for ever since I discovered how few users of this heap remained in the kernel, so many thanks to Sumit for hunting these down. The other change is an incremental step towards SPDX headers" * tag 'kgdb-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kernel: debug: Convert to SPDX identifier kdb: Rename members of struct kdbtab_t kdb: Simplify kdb_defcmd macro logic kdb: Get rid of redundant kdb_register_flags() kdb: Rename struct defcmd_set to struct kdb_macro kdb: Get rid of custom debug heap allocator
2021-09-07cxl/registers: Fix Documentation warningDan Williams
Commit 0f06157e0135 ("cxl/core: Move register mapping infrastructure") neglected to add a DOC header for the new drivers/core/regs.c file. Reported-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/163072206675.2250120.3527179192933919995.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-07cxl/pmem: Fix Documentation warningDan Williams
Commit 06737cd0d216 ("cxl/core: Move pmem functionality") neglected to add a DOC header for the new drivers/cxl/core/pmem.c file. Reported-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huwei.com> Link: https://lore.kernel.org/r/163072206163.2250120.11486436976516079516.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-07cxl/uapi: Fix defined but not used warningsBen Widawsky
Fix unused-const-variable warnings emitted by gcc when cxlmem.h is used by pretty much all files except pci.c Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/163072205652.2250120.16833548560832424468.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-07cxl/pci: Fix debug message in cxl_probe_regs()Li Qiang (Johnny Li)
Indicator string for mbox and memdev register set to status incorrectly in error message. Cc: <stable@vger.kernel.org> Fixes: 30af97296f48 ("cxl/pci: Map registers based on capabilities") Signed-off-by: Li Qiang (Johnny Li) <johnny.li@montage-tech.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/163072205089.2250120.8103605864156687395.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-07cxl/pci: Fix lockdown levelDan Williams
A proposed rework of security_locked_down() users identified that the cxl_pci driver was passing the wrong lockdown_reason. Update cxl_mem_raw_command_allowed() to fail raw command access when raw pci access is also disabled. Fixes: 13237183c735 ("cxl/mem: Add a "RAW" send command") Cc: Ben Widawsky <ben.widawsky@intel.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: <stable@vger.kernel.org> Cc: Ondrej Mosnacek <omosnace@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/r/163072204525.2250120.16615792476976546735.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-07cxl/acpi: Do not add DSDT disabled ACPI0016 host bridge portsAlison Schofield
During CXL ACPI probe, host bridge ports are discovered by scanning the ACPI0017 root port for ACPI0016 host bridge devices. The scan matches on the hardware id of "ACPI0016". An issue occurs when an ACPI0016 device is defined in the DSDT yet disabled on the platform. Attempts by the cxl_acpi driver to add host bridge ports using a disabled device fails, and the entire cxl_acpi probe fails. The DSDT table includes an _STA method that sets the status and the ACPI subsystem has checks available to examine it. One such check is in the acpi_pci_find_root() path. Move the call to acpi_pci_find_root() to the matching function to prevent this issue when adding either upstream or downstream ports. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Fixes: 7d4b5ca2e2cb ("cxl/acpi: Add downstream port data to cxl_port instances") Cc: <stable@vger.kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/163072203957.2250120.2178685721061002124.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-07Revert "memcg: enable accounting for pollfd and select bits arrays"Linus Torvalds
This reverts commit b655843444152c0a14b749308e4cb35d91cbcf0b. Just like with the memcg lock accounting, the kernel test robot reports a sizeable performance regression for this commit, and while it clearly does the rigth thing in theory, we'll need to look at just how to avoid or minimize the performance overhead of the memcg accounting. People already have suggestions on how to do that, but it's "future work". So revert it for now. [ Note: the first link below is for this same commit but a different commit ID, because it's the kernel test robot ended up noticing it in Andrew Morton's patch queue ] Link: https://lore.kernel.org/lkml/20210905132732.GC15026@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/ Acked-by: Jens Axboe <axboe@kernel.dk> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-07Revert "memcg: enable accounting for file lock caches"Linus Torvalds
This reverts commit 0f12156dff2862ac54235fc72703f18770769042. The kernel test robot reports a sizeable performance regression for this commit, and while it clearly does the rigth thing in theory, we'll need to look at just how to avoid or minimize the performance overhead of the memcg accounting. People already have suggestions on how to do that, but it's "future work". So revert it for now. Link: https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/ Acked-by: Jens Axboe <axboe@kernel.dk> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-07Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly"Linus Torvalds
This reverts commit 9857a17f206ff374aea78bccfb687f145368be2e. That commit was completely broken, and I should have caught on to it earlier. But happily, the kernel test robot noticed the breakage fairly quickly. The breakage is because "try_get_page()" is about avoiding the page reference count overflow case, but is otherwise the exact same as a plain "get_page()". In contrast, "try_get_compound_head()" is an entirely different beast, and uses __page_cache_add_speculative() because it's not just about the page reference count, but also about possibly racing with the underlying page going away. So all the commentary about how "try_get_page() has fallen a little behind in terms of maintenance, try_get_compound_head() handles speculative page references more thoroughly" was just completely wrong: yes, try_get_compound_head() handles speculative page references, but the point is that try_get_page() does not, and must not. So there's no lack of maintainance - there are fundamentally different semantics. A speculative page reference would be entirely wrong in "get_page()", and it's entirely wrong in "try_get_page()". It's not about speculation, it's purely about "uhhuh, you can't get this page because you've tried to increment the reference count too much already". The reason the kernel test robot noticed this bug was that it hit the VM_BUG_ON() in __page_cache_add_speculative(), which is all about verifying that the context of any speculative page access is correct. But since that isn't what try_get_page() is all about, the VM_BUG_ON() tests things that are not correct to test for try_get_page(). Reported-by: kernel test robot <oliver.sang@intel.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-07block: move fs/block_dev.c to block/bdev.cChristoph Hellwig
Move it together with the rest of the block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210907141303.1371844-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-07block: split out operations on block special filesChristoph Hellwig
Add a new block/fops.c for all the file and address_space operations that provide the block special file support. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210907141303.1371844-2-hch@lst.de [axboe: correct trailing whitespace while at it] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-07Merge tag 'nvme-5.15-2021-09-07' of git://git.infradead.org/nvme into block-5.15Jens Axboe
Pull NVMe fixes from Christoph: "nvme fixes for Linux 5.15 - fix nvmet command set reporting for passthrough controllers (Adam Manzanares) - updated - update a MAINTAINERS email address (Chaitanya Kulkarni) - set QUEUE_FLAG_NOWAIT for nvme-multipth (me) - handle errors from add_disk() (Luis Chamberlain) - update the keep alive interval when kato is modified (Tatsuya Sasaki) - fix a buffer overrun in nvmet_subsys_attr_serial (Hannes Reinecke) - do not reset transport on data digest errors in nvme-tcp (Daniel Wagner) - only call synchronize_srcu when clearing current path (Daniel Wagner) - revalidate paths during rescan (Hannes Reinecke)" * tag 'nvme-5.15-2021-09-07' of git://git.infradead.org/nvme: nvme: update MAINTAINERS email address nvme: add error handling support for add_disk() nvme: only call synchronize_srcu when clearing current path nvme: update keep alive interval when kato is modified nvme-tcp: Do not reset transport on data digest errors nvmet: fixup buffer overrun in nvmet_subsys_attr_serial() nvmet: return bool from nvmet_passthru_ctrl and nvmet_is_passthru_req nvmet: looks at the passthrough controller when initializing CAP nvme: move nvme_multi_css into nvme.h nvme-multipath: revalidate paths during rescan nvme-multipath: set QUEUE_FLAG_NOWAIT
2021-09-07blk-throttle: fix UAF by deleteing timer in blk_throtl_exit()Li Jinlin
The pending timer has been set up in blk_throtl_init(). However, the timer is not deleted in blk_throtl_exit(). This means that the timer handler may still be running after freeing the timer, which would result in a use-after-free. Fix by calling del_timer_sync() to delete the timer in blk_throtl_exit(). Signed-off-by: Li Jinlin <lijinlin3@huawei.com> Link: https://lore.kernel.org/r/20210907121242.2885564-1-lijinlin3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-07block: genhd: don't call blkdev_show() with major_names_lock heldTetsuo Handa
If CONFIG_BLK_DEV_LOOP && CONFIG_MTD (at least; there might be other combinations), lockdep complains circular locking dependency at __loop_clr_fd(), for major_names_lock serves as a locking dependency aggregating hub across multiple block modules. ====================================================== WARNING: possible circular locking dependency detected 5.14.0+ #757 Tainted: G E ------------------------------------------------------ systemd-udevd/7568 is trying to acquire lock: ffff88800f334d48 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x70/0x560 but task is already holding lock: ffff888014a7d4a0 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x4d/0x400 [loop] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #6 (&lo->lo_mutex){+.+.}-{3:3}: lock_acquire+0xbe/0x1f0 __mutex_lock_common+0xb6/0xe10 mutex_lock_killable_nested+0x17/0x20 lo_open+0x23/0x50 [loop] blkdev_get_by_dev+0x199/0x540 blkdev_open+0x58/0x90 do_dentry_open+0x144/0x3a0 path_openat+0xa57/0xda0 do_filp_open+0x9f/0x140 do_sys_openat2+0x71/0x150 __x64_sys_openat+0x78/0xa0 do_syscall_64+0x3d/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #5 (&disk->open_mutex){+.+.}-{3:3}: lock_acquire+0xbe/0x1f0 __mutex_lock_common+0xb6/0xe10 mutex_lock_nested+0x17/0x20 bd_register_pending_holders+0x20/0x100 device_add_disk+0x1ae/0x390 loop_add+0x29c/0x2d0 [loop] blk_request_module+0x5a/0xb0 blkdev_get_no_open+0x27/0xa0 blkdev_get_by_dev+0x5f/0x540 blkdev_open+0x58/0x90 do_dentry_open+0x144/0x3a0 path_openat+0xa57/0xda0 do_filp_open+0x9f/0x140 do_sys_openat2+0x71/0x150 __x64_sys_openat+0x78/0xa0 do_syscall_64+0x3d/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #4 (major_names_lock){+.+.}-{3:3}: lock_acquire+0xbe/0x1f0 __mutex_lock_common+0xb6/0xe10 mutex_lock_nested+0x17/0x20 blkdev_show+0x19/0x80 devinfo_show+0x52/0x60 seq_read_iter+0x2d5/0x3e0 proc_reg_read_iter+0x41/0x80 vfs_read+0x2ac/0x330 ksys_read+0x6b/0xd0 do_syscall_64+0x3d/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #3 (&p->lock){+.+.}-{3:3}: lock_acquire+0xbe/0x1f0 __mutex_lock_common+0xb6/0xe10 mutex_lock_nested+0x17/0x20 seq_read_iter+0x37/0x3e0 generic_file_splice_read+0xf3/0x170 splice_direct_to_actor+0x14e/0x350 do_splice_direct+0x84/0xd0 do_sendfile+0x263/0x430 __se_sys_sendfile64+0x96/0xc0 do_syscall_64+0x3d/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #2 (sb_writers#3){.+.+}-{0:0}: lock_acquire+0xbe/0x1f0 lo_write_bvec+0x96/0x280 [loop] loop_process_work+0xa68/0xc10 [loop] process_one_work+0x293/0x480 worker_thread+0x23d/0x4b0 kthread+0x163/0x180 ret_from_fork+0x1f/0x30 -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}: lock_acquire+0xbe/0x1f0 process_one_work+0x280/0x480 worker_thread+0x23d/0x4b0 kthread+0x163/0x180 ret_from_fork+0x1f/0x30 -> #0 ((wq_completion)loop0){+.+.}-{0:0}: validate_chain+0x1f0d/0x33e0 __lock_acquire+0x92d/0x1030 lock_acquire+0xbe/0x1f0 flush_workqueue+0x8c/0x560 drain_workqueue+0x80/0x140 destroy_workqueue+0x47/0x4f0 __loop_clr_fd+0xb4/0x400 [loop] blkdev_put+0x14a/0x1d0 blkdev_close+0x1c/0x20 __fput+0xfd/0x220 task_work_run+0x69/0xc0 exit_to_user_mode_prepare+0x1ce/0x1f0 syscall_exit_to_user_mode+0x26/0x60 do_syscall_64+0x4c/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Chain exists of: (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&lo->lo_mutex); lock(&disk->open_mutex); lock(&lo->lo_mutex); lock((wq_completion)loop0); *** DEADLOCK *** 2 locks held by systemd-udevd/7568: #0: ffff888012554128 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_put+0x4c/0x1d0 #1: ffff888014a7d4a0 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x4d/0x400 [loop] stack backtrace: CPU: 0 PID: 7568 Comm: systemd-udevd Tainted: G E 5.14.0+ #757 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020 Call Trace: dump_stack_lvl+0x79/0xbf print_circular_bug+0x5d6/0x5e0 ? stack_trace_save+0x42/0x60 ? save_trace+0x3d/0x2d0 check_noncircular+0x10b/0x120 validate_chain+0x1f0d/0x33e0 ? __lock_acquire+0x953/0x1030 ? __lock_acquire+0x953/0x1030 __lock_acquire+0x92d/0x1030 ? flush_workqueue+0x70/0x560 lock_acquire+0xbe/0x1f0 ? flush_workqueue+0x70/0x560 flush_workqueue+0x8c/0x560 ? flush_workqueue+0x70/0x560 ? sched_clock_cpu+0xe/0x1a0 ? drain_workqueue+0x41/0x140 drain_workqueue+0x80/0x140 destroy_workqueue+0x47/0x4f0 ? blk_mq_freeze_queue_wait+0xac/0xd0 __loop_clr_fd+0xb4/0x400 [loop] ? __mutex_unlock_slowpath+0x35/0x230 blkdev_put+0x14a/0x1d0 blkdev_close+0x1c/0x20 __fput+0xfd/0x220 task_work_run+0x69/0xc0 exit_to_user_mode_prepare+0x1ce/0x1f0 syscall_exit_to_user_mode+0x26/0x60 do_syscall_64+0x4c/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f0fd4c661f7 Code: 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 13 fc ff ff RSP: 002b:00007ffd1c9e9fd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 00007f0fd46be6c8 RCX: 00007f0fd4c661f7 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000006 RBP: 0000000000000006 R08: 000055fff1eaf400 R09: 0000000000000000 R10: 00007f0fd46be6c8 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000002f08 R15: 00007ffd1c9ea050 Commit 1c500ad706383f1a ("loop: reduce the loop_ctl_mutex scope") is for breaking "loop_ctl_mutex => &lo->lo_mutex" dependency chain. But enabling a different block module results in forming circular locking dependency due to shared major_names_lock mutex. The simplest fix is to call probe function without holding major_names_lock [1], but Christoph Hellwig does not like such idea. Therefore, instead of holding major_names_lock in blkdev_show(), introduce a different lock for blkdev_show() in order to break "sb_writers#$N => &p->lock => major_names_lock" dependency chain. Link: https://lkml.kernel.org/r/b2af8a5b-3c1b-204e-7f56-bea0b15848d6@i-love.sakura.ne.jp [1] Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/18a02da2-0bf3-550e-b071-2b4ab13c49f0@i-love.sakura.ne.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-07Merge branch 'cpufreq/arm/linux-next' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull more ARM cpufreq changes for v5.15-rc1 from Viresh Kumar: "This adds a new cpufreq driver for Mediatek, which had been going through reviews since last one year." * 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: mediatek-hw: Add support for CPUFREQ HW cpufreq: Add of_perf_domain_get_sharing_cpumask dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
2021-09-07Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification"Rafael J. Wysocki
Revert commit d0e936adbd22 ("cpufreq: intel_pstate: Process HWP Guaranteed change notification"), because it causes a NULL pointer dereference to occur on Lenovo X1 gen9 laptops due to an HWP guaranteed performance change interrupt arriving prematurely. This feature will be revisited in the next cycle. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-07ieee802154: Remove redundant initialization of variable retColin Ian King
The variable ret is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-07Merge branch 'stmmac-wol-fix'David S. Miller
Joakim Zhang says: ==================== net: stmmac: fix WoL issue This patch set fixes stmmac not working after system resume back with WoL active. Thanks a lot for Russell King keeps looking into this issue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-07net: stmmac: fix MAC not working when system resume back with WoL activeJoakim Zhang
We can reproduce this issue with below steps: 1) enable WoL on the host 2) host system suspended 3) remote client send out wakeup packets We can see that host system resume back, but can't work, such as ping failed. After a bit digging, this issue is introduced by the commit 46f69ded988d ("net: stmmac: Use resolved link config in mac_link_up()"), which use the finalised link parameters in mac_link_up() rather than the parameters in mac_config(). There are two scenarios for MAC suspend/resume in STMMAC driver: 1) MAC suspend with WoL inactive, stmmac_suspend() call phylink_mac_change() to notify phylink machine that a change in MAC state, then .mac_link_down callback would be invoked. Further, it will call phylink_stop() to stop the phylink instance. When MAC resume back, firstly phylink_start() is called to start the phylink instance, then call phylink_mac_change() which will finally trigger phylink machine to invoke .mac_config and .mac_link_up callback. All is fine since configuration in these two callbacks will be initialized, that means MAC can restore the state. 2) MAC suspend with WoL active, phylink_mac_change() will put link down, but there is no phylink_stop() to stop the phylink instance, so it will link up again, that means .mac_config and .mac_link_up would be invoked before system suspended. After system resume back, it will do DMA initialization and SW reset which let MAC lost the hardware setting (i.e MAC_Configuration register(offset 0x0) is reset). Since link is up before system suspended, so .mac_link_up would not be invoked after system resume back, lead to there is no chance to initialize the configuration in .mac_link_up callback, as a result, MAC can't work any longer. After discussed with Russell King [1], we confirm that phylink framework have not take WoL into consideration yet. This patch calls phylink_suspend()/phylink_resume() functions which is newly introduced by Russell King to fix this issue. [1] https://lore.kernel.org/netdev/20210901090228.11308-1-qiangqing.zhang@nxp.com/ Fixes: 46f69ded988d ("net: stmmac: Use resolved link config in mac_link_up()") Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>