summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-06-09filemap: Don't release a locked folioMatthew Wilcox (Oracle)
We must hold a reference over the call to filemap_release_folio(), otherwise the page cache will put the last reference to the folio before we unlock it, leading to splats like this: BUG: Bad page state in process u8:5 pfn:1ab1f4 page:ffffea0006ac7d00 refcount:0 mapcount:0 mapping:0000000000000000 index:0x28b1de pfn:0x1ab1f4 flags: 0x17ff80000040001(locked|reclaim|node=0|zone=2|lastcpupid=0xfff) raw: 017ff80000040001 dead000000000100 dead000000000122 0000000000000000 raw: 000000000028b1de 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set It's an error path, so it doesn't see much testing. Reported-by: Darrick J. Wong <djwong@kernel.org> Fixes: a42634a6c07d ("readahead: Use a folio in read_pages()") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-09MIPS: Loongson-3: fix compile mips cpu_hwmon as module build error.Yupeng Li
set cpu_hwmon as a module build with loongson_sysconf, loongson_chiptemp undefined error,fix cpu_hwmon compile options to be bool.Some kernel compilation error information is as follows: Checking missing-syscalls for N32 CALL scripts/checksyscalls.sh Checking missing-syscalls for O32 CALL scripts/checksyscalls.sh CALL scripts/checksyscalls.sh CHK include/generated/compile.h CC [M] drivers/platform/mips/cpu_hwmon.o Building modules, stage 2. MODPOST 200 modules ERROR: "loongson_sysconf" [drivers/platform/mips/cpu_hwmon.ko] undefined! ERROR: "loongson_chiptemp" [drivers/platform/mips/cpu_hwmon.ko] undefined! make[1]: *** [scripts/Makefile.modpost:92:__modpost] 错误 1 make: *** [Makefile:1261:modules] 错误 2 Signed-off-by: Yupeng Li <liyupeng@zbhlos.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Huacai Chen <chenhuacai@kernel.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2022-06-09Merge tag 'fs_for_v5.19-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, writeback, and quota fixes and cleanups from Jan Kara: "A fix for race in writeback code and two cleanups in quota and ext2" * tag 'fs_for_v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Prevent memory allocation recursion while holding dq_lock writeback: Fix inode->i_io_list not be protected by inode->i_lock error fs: Fix syntax errors in comments
2022-06-09Merge tag 'powerpc-5.19-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - On 32-bit fix overread/overwrite of thread_struct via ptrace PEEK/POKE. - Fix softirqs not switching to the softirq stack since we moved irq_exit(). - Force thread size increase when KASAN is enabled to avoid stack overflows. - On Book3s 64 mark more code as not to be instrumented by KASAN to avoid crashes. - Exempt __get_wchan() from KASAN checking, as it's inherently racy. - Fix a recently introduced crash in the papr_scm driver in some configurations. - Remove include of <generated/compile.h> which is forbidden. Thanks to Ariel Miculas, Chen Jingwen, Christophe Leroy, Erhard Furtner, He Ying, Kees Cook, Masahiro Yamada, Nageswara R Sastry, Paul Mackerras, Sachin Sant, Vaibhav Jain, and Wanming Hu. * tag 'powerpc-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/32: Fix overread/overwrite of thread_struct via ptrace powerpc/book3e: get rid of #include <generated/compile.h> powerpc/kasan: Force thread size increase with KASAN powerpc/papr_scm: don't requests stats with '0' sized stats buffer powerpc: Don't select HAVE_IRQ_EXIT_ON_IRQ_STACK powerpc/kasan: Silence KASAN warnings in __get_wchan() powerpc/kasan: Mark more real-mode code as not to be instrumented
2022-06-09Merge tag 'net-5.19-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - regressions: - eth: amt: fix possible null-ptr-deref in amt_rcv() Previous releases - regressions: - tcp: use alloc_large_system_hash() to allocate table_perturb - af_unix: fix a data-race in unix_dgram_peer_wake_me() - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF Previous releases - always broken: - ipv6: fix signed integer overflow in __ip6_append_data - netfilter: - nat: really support inet nat without l3 address - nf_tables: memleak flow rule from commit path - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs - openvswitch: fix misuse of the cached connection on tuple changes - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred - eth: altera: fix refcount leak in altera_tse_mdio_create Misc: - add Quentin Monnet to bpftool maintainers" * tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) net: amd-xgbe: fix clang -Wformat warning tcp: use alloc_large_system_hash() to allocate table_perturb net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHY net: dsa: mv88e6xxx: correctly report serdes link failure net: dsa: mv88e6xxx: fix BMSR error to be consistent with others net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete net: altera: Fix refcount leak in altera_tse_mdio_create net: openvswitch: fix misuse of the cached connection on tuple changes net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_frag ip_gre: test csum_start instead of transport header au1000_eth: stop using virt_to_bus() ipv6: Fix signed integer overflow in l2tp_ip6_sendmsg ipv6: Fix signed integer overflow in __ip6_append_data nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferred nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTION nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION net: ipv6: unexport __init-annotated seg6_hmac_init() net: xfrm: unexport __init-annotated xfrm4_protocol_init() net: mdio: unexport __init-annotated mdio_bus_init() ...
2022-06-09docs: arm: tcm: Fix typo in description of TCM and MMU usageSimon Horman
Correct a typo in the description of interaction between the TCM and MMU. Found by inspection. Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220609184230.627958-1-simon.horman@corigine.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-06-10scripts/check-local-export: avoid 'wait $!' for process substitutionMasahiro Yamada
Bash 4.4, released in 2016, supports 'wait $!' to check the exit status of a process substitution, but it seems too new. Some people using older bash versions (on CentOS 7, Ubuntu 16.04, etc.) reported an error like this: ./scripts/check-local-export: line 54: wait: pid 17328 is not a child of this shell I used the process substitution to avoid a pipeline, which executes each command in a subshell. If the while-loop is executed in the subshell context, variable changes within are lost after the subshell terminates. Fortunately, Bash 4.2, released in 2011, supports the 'lastpipe' option, which makes the last element of a pipeline run in the current shell process. Switch to the pipeline with 'lastpipe' solution, and also set 'pipefail' to catch errors from ${NM}. Add the bash requirement to Documentation/process/changes.rst. Fixes: 31cb50b5590f ("kbuild: check static EXPORT_SYMBOL* by script instead of modpost") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Wang Yugui <wangyugui@e16-tech.com> Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64) Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-06-09netfs: gcc-12: temporarily disable '-Wattribute-warning' for nowLinus Torvalds
This is a pure band-aid so that I can continue merging stuff from people while some of the gcc-12 fallout gets sorted out. In particular, gcc-12 is very unhappy about the kinds of pointer arithmetic tricks that netfs does, and that makes the fortify checks trigger in afs and ceph: In function ‘fortify_memset_chk’, inlined from ‘netfs_i_context_init’ at include/linux/netfs.h:327:2, inlined from ‘afs_set_netfs_context’ at fs/afs/inode.c:61:2, inlined from ‘afs_root_iget’ at fs/afs/inode.c:543:2: include/linux/fortify-string.h:258:25: warning: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 258 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ and the reason is that netfs_i_context_init() is passed a 'struct inode' pointer, and then it does struct netfs_i_context *ctx = netfs_i_context(inode); memset(ctx, 0, sizeof(*ctx)); where that netfs_i_context() function just does pointer arithmetic on the inode pointer, knowing that the netfs_i_context is laid out immediately after it in memory. This is all truly disgusting, since the whole "netfs_i_context is laid out immediately after it in memory" is not actually remotely true in general, but is just made to be that way for afs and ceph. See for example fs/cifs/cifsglob.h: struct cifsInodeInfo { struct { /* These must be contiguous */ struct inode vfs_inode; /* the VFS's inode record */ struct netfs_i_context netfs_ctx; /* Netfslib context */ }; [...] and realize that this is all entirely wrong, and the pointer arithmetic that netfs_i_context() is doing is also very very wrong and wouldn't give the right answer if netfs_ctx had different alignment rules from a 'struct inode', for example). Anyway, that's just a long-winded way to say "the gcc-12 warning is actually quite reasonable, and our code happens to work but is pretty disgusting". This is getting fixed properly, but for now I made the mistake of thinking "the week right after the merge window tends to be calm for me as people take a breather" and I did a sustem upgrade. And I got gcc-12 as a result, so to continue merging fixes from people and not have the end result drown in warnings, I am fixing all these gcc-12 issues I hit. Including with these kinds of temporary fixes. Cc: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/all/AEEBCF5D-8402-441D-940B-105AA718C71F@chromium.org/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-09gcc-12: disable '-Warray-bounds' universally for nowLinus Torvalds
In commit 8b202ee21839 ("s390: disable -Warray-bounds") the s390 people disabled the '-Warray-bounds' warning for gcc-12, because the new logic in gcc would cause warnings for their use of the S390_lowcore macro, which accesses absolute pointers. It turns out gcc-12 has many other issues in this area, so this takes that s390 warning disable logic, and turns it into a kernel build config entry instead. Part of the intent is that we can make this all much more targeted, and use this conflig flag to disable it in only particular configurations that cause problems, with the s390 case as an example: select GCC12_NO_ARRAY_BOUNDS and we could do that for other configuration cases that cause issues. Or we could possibly use the CONFIG_CC_NO_ARRAY_BOUNDS thing in a more targeted way, and disable the warning only for particular uses: again the s390 case as an example: KBUILD_CFLAGS_DECOMPRESSOR += $(if $(CONFIG_CC_NO_ARRAY_BOUNDS),-Wno-array-bounds) but this ends up just doing it globally in the top-level Makefile, since the current issues are spread fairly widely all over: KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds We'll try to limit this later, since the gcc-12 problems are rare enough that *much* of the kernel can be built with it without disabling this warning. Cc: Kees Cook <keescook@chromium.org> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-09mellanox: mlx5: avoid uninitialized variable warning with gcc-12Linus Torvalds
gcc-12 started warning about 'tracker' being used uninitialized: drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c: In function ‘mlx5_do_bond’: drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c:786:28: warning: ‘tracker’ is used uninitialized [-Wuninitialized] 786 | struct lag_tracker tracker; | ^~~~~~~ which seems to be because it doesn't track how the use (and initialization) is bound by the 'do_bond' flag. But admittedly that 'do_bond' usage is fairly complicated, and involves passing it around as an argument to helper functions, so it's somewhat understandable that gcc doesn't see how that all works. This function could be rewritten to make the use of that tracker variable more obviously safe, but for now I'm just adding the forced initialization of it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-09irqchip/uniphier-aidet: Add compatible string for NX1 SoCKunihiko Hayashi
Add the compatible string to support UniPhier NX1 SoC, which has the same kinds of controls as the other UniPhier SoCs. Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1653023822-19229-3-git-send-email-hayashi.kunihiko@socionext.com
2022-06-09dt-bindings: interrupt-controller/uniphier-aidet: Add bindings for NX1 SoCKunihiko Hayashi
Update uniphier-aidet binding document for UniPhier NX1 SoC. Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1653023822-19229-2-git-send-email-hayashi.kunihiko@socionext.com
2022-06-09gcc-12: disable '-Wdangling-pointer' warning for nowLinus Torvalds
While the concept of checking for dangling pointers to local variables at function exit is really interesting, the gcc-12 implementation is not compatible with reality, and results in false positives. For example, gcc sees us putting things on a local list head allocated on the stack, which involves exactly those kinds of pointers to the local stack entry: In function ‘__list_add’, inlined from ‘list_add_tail’ at include/linux/list.h:102:2, inlined from ‘rebuild_snap_realms’ at fs/ceph/snap.c:434:2: include/linux/list.h:74:19: warning: storing the address of local variable ‘realm_queue’ in ‘*&realm_27(D)->rebuild_item.prev’ [-Wdangling-pointer=] 74 | new->prev = prev; | ~~~~~~~~~~^~~~~~ But then gcc - understandably - doesn't really understand the big picture how the doubly linked list works, so doesn't see how we then end up emptying said list head in a loop and the pointer we added has been removed. Gcc also complains about us (intentionally) using this as a way to store a kind of fake stack trace, eg drivers/acpi/acpica/utdebug.c:40:38: warning: storing the address of local variable ‘current_sp’ in ‘acpi_gbl_entry_stack_pointer’ [-Wdangling-pointer=] 40 | acpi_gbl_entry_stack_pointer = &current_sp; | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~ which is entirely reasonable from a compiler standpoint, and we may want to change those kinds of patterns, but not not. So this is one of those "it would be lovely if the compiler were to complain about us leaving dangling pointers to the stack", but not this way. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-09drm: imx: fix compiler warning with gcc-12Linus Torvalds
Gcc-12 correctly warned about this code using a non-NULL pointer as a truth value: drivers/gpu/drm/imx/ipuv3-crtc.c: In function ‘ipu_crtc_disable_planes’: drivers/gpu/drm/imx/ipuv3-crtc.c:72:21: error: the comparison will always evaluate as ‘true’ for the address of ‘plane’ will never be NULL [-Werror=address] 72 | if (&ipu_crtc->plane[1] && plane == &ipu_crtc->plane[1]->base) | ^ due to the extraneous '&' address-of operator. Philipp Zabel points out that The mistake had no adverse effect since the following condition doesn't actually dereference the NULL pointer, but the intent of the code was obviously to check for it, not to take the address of the member. Fixes: eb8c88808c83 ("drm/imx: add deferred plane disabling") Acked-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-09irqchip/realtek-rtl: Fix refcount leak in map_interruptsMiaoqian Lin
of_find_node_by_phandle() returns a node pointer with refcount incremented, we should use of_node_put() on it when not need anymore. This function doesn't call of_node_put() in error path. Call of_node_put() directly after of_property_read_u32() to cover both normal path and error path. Fixes: 9f3a0f34b84a ("irqchip: Add support for Realtek RTL838x/RTL839x interrupt controller") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220601080930.31005-7-linmq006@gmail.com
2022-06-09irqchip/gic-v3: Fix refcount leak in gic_populate_ppi_partitionsMiaoqian Lin
of_find_node_by_phandle() returns a node pointer with refcount incremented, we should use of_node_put() on it when not need anymore. Add missing of_node_put() to avoid refcount leak. Fixes: e3825ba1af3a ("irqchip/gic-v3: Add support for partitioned PPIs") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220601080930.31005-6-linmq006@gmail.com
2022-06-09irqchip/gic-v3: Fix error handling in gic_populate_ppi_partitionsMiaoqian Lin
of_get_child_by_name() returns a node pointer with refcount incremented, we should use of_node_put() on it when not need anymore. When kcalloc fails, it missing of_node_put() and results in refcount leak. Fix this by goto out_put_node label. Fixes: 52085d3f2028 ("irqchip/gic-v3: Dynamically allocate PPI partition descriptors") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220601080930.31005-5-linmq006@gmail.com
2022-06-09irqchip/apple-aic: Fix refcount leak in aic_of_ic_initMiaoqian Lin
of_get_child_by_name() returns a node pointer with refcount incremented, we should use of_node_put() on it when not need anymore. Add missing of_node_put() to avoid refcount leak. Fixes: a5e8801202b3 ("irqchip/apple-aic: Parse FIQ affinities from device-tree") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220601080930.31005-4-linmq006@gmail.com
2022-06-09irqchip/apple-aic: Fix refcount leak in build_fiq_affinityMiaoqian Lin
of_find_node_by_phandle() returns a node pointer with refcount incremented, we should use of_node_put() on it when not need anymore. Add missing of_node_put() to avoid refcount leak. Fixes: a5e8801202b3 ("irqchip/apple-aic: Parse FIQ affinities from device-tree") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220601080930.31005-3-linmq006@gmail.com
2022-06-09irqchip/gic/realview: Fix refcount leak in realview_gic_of_initMiaoqian Lin
of_find_matching_node_and_match() returns a node pointer with refcount incremented, we should use of_node_put() on it when not need anymore. Add missing of_node_put() to avoid refcount leak. Fixes: 82b0a434b436 ("irqchip/gic/realview: Support more RealView DCC variants") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220601080930.31005-2-linmq006@gmail.com
2022-06-09irqchip/xilinx: Remove microblaze+zynq dependencyJamie Iles
The Xilinx IRQ controller doesn't really have any architecture dependencies - it's a generic AXI component that can be used for any FPGA core from Zynq hard processor systems to microblaze+riscv soft cores and more. Signed-off-by: Jamie Iles <jamie@jamieiles.com> Acked-by: Michal Simek <michal.simek@amd.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220606213952.298686-1-jamie@jamieiles.com
2022-06-09docs: Move the HTE documentation to driver-api/Jonathan Corbet
The hardware timestamp engine documentation is driver API material, and really belongs in the driver-API book; move it there. Cc: Thierry Reding <treding@nvidia.com> Acked-by: Dipen Patel <dipenp@nvidia.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-06-09iavf: Fix issue with MAC address of VF shown as zeroMichal Wilczynski
After reinitialization of iavf, ice driver gets VIRTCHNL_OP_ADD_ETH_ADDR message with incorrectly set type of MAC address. Hardware address should have is_primary flag set as true. This way ice driver knows what it has to set as a MAC address. Check if the address is primary in iavf_add_filter function and set flag accordingly. To test set all-zero MAC on a VF. This triggers iavf re-initialization and VIRTCHNL_OP_ADD_ETH_ADDR message gets sent to PF. For example: ip link set dev ens785 vf 0 mac 00:00:00:00:00:00 This triggers re-initialization of iavf. New MAC should be assigned. Now check if MAC is non-zero: ip link show dev ens785 Fixes: a3e839d539e0 ("iavf: Add usage of new virtchnl format to set default MAC") Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-09i40e: Fix call trace in setup_tx_descriptorsAleksandr Loktionov
After PF reset and ethtool -t there was call trace in dmesg sometimes leading to panic. When there was some time, around 5 seconds, between reset and test there were no errors. Problem was that pf reset calls i40e_vsi_close in prep_for_reset and ethtool -t calls i40e_vsi_close in diag_test. If there was not enough time between those commands the second i40e_vsi_close starts before previous i40e_vsi_close was done which leads to crash. Add check to diag_test if pf is in reset and don't start offline tests if it is true. Add netif_info("testing failed") into unhappy path of i40e_diag_test() Fixes: e17bc411aea8 ("i40e: Disable offline diagnostics if VFs are enabled") Fixes: 510efb2682b3 ("i40e: Fix ethtool offline diagnostic with netqueues") Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-09i40e: Fix calculating the number of queue pairsGrzegorz Szczurek
If ADQ is enabled for a VF, then actual number of queue pair is a number of currently available traffic classes for this VF. Without this change the configuration of the Rx/Tx queues fails with error. Fixes: d29e0d233e0d ("i40e: missing input validation on VF message handling by the PF") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-09i40e: Fix adding ADQ filter to TC0Grzegorz Szczurek
Procedure of configure tc flower filters erroneously allows to create filters on TC0 where unfiltered packets are also directed by default. Issue was caused by insufficient checks of hw_tc parameter specifying the hardware traffic class to pass matching packets to. Fix checking hw_tc parameter which blocks creation of filters on TC0. Fixes: 2f4b411a3d67 ("i40e: Enable cloud filters via tc-flower") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-09docs: usb: fix literal block marker in usbmon verification exampleJustin Swartz
The "Verify that bus sockets are present" example was not properly formatted due to a typo in the literal block marker. Signed-off-by: Justin Swartz <justin.swartz@risingedge.co.za> Link: https://lore.kernel.org/r/20220604155431.23246-1-justin.swartz@risingedge.co.za Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-06-09Documentation/features: Update the arch support status filesZheng Zengkai
The arch support status files don't match reality as of v5.19-rc1, use the features-refresh.sh to refresh all the arch-support.txt files in place. The main effect is to add entries for the new loong architecture. Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Link: https://lore.kernel.org/r/20220609025656.143460-1-zhengzengkai@huawei.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-06-09genirq: PM: Use runtime PM for chained interruptsMarc Zyngier
When requesting an interrupt, we correctly call into the runtime PM framework to guarantee that the underlying interrupt controller is up and running. However, we fail to do so for chained interrupt controllers, as the mux interrupt is not requested along the same path. Augment __irq_do_set_handler() to call into the runtime PM code in this case, making sure the PM flow is the same for all interrupts. Reported-by: Lucas Stach <l.stach@pengutronix.de> Tested-by: Liu Ying <victor.liu@nxp.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/26973cddee5f527ea17184c0f3fccb70bc8969a0.camel@pengutronix.de
2022-06-09KVM: selftests: Restrict test region to 48-bit physical addresses when using ↵David Matlack
nested The selftests nested code only supports 4-level paging at the moment. This means it cannot map nested guest physical addresses with more than 48 bits. Allow perf_test_util nested mode to work on hosts with more than 48 physical addresses by restricting the guest test region to 48-bits. While here, opportunistically fix an off-by-one error when dealing with vm_get_max_gfn(). perf_test_util.c was treating this as the maximum number of GFNs, rather than the maximum allowed GFN. This didn't result in any correctness issues, but it did end up shifting the test region down slightly when using huge pages. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-12-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2David Matlack
Add an option to dirty_log_perf_test that configures the vCPUs to run in L2 instead of L1. This makes it possible to benchmark the dirty logging performance of nested virtualization, which is particularly interesting because KVM must shadow L1's EPT/NPT tables. For now this support only works on x86_64 CPUs with VMX. Otherwise passing -n results in the test being skipped. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-11-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: selftests: Clean up LIBKVM files in MakefileDavid Matlack
Break up the long lines for LIBKVM and alphabetize each architecture. This makes reading the Makefile easier, and will make reading diffs to LIBKVM easier. No functional change intended. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-10-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: selftests: Link selftests directly with lib object filesDavid Matlack
The linker does obey strong/weak symbols when linking static libraries, it simply resolves an undefined symbol to the first-encountered symbol. This means that defining __weak arch-generic functions and then defining arch-specific strong functions to override them in libkvm will not always work. More specifically, if we have: lib/generic.c: void __weak foo(void) { pr_info("weak\n"); } void bar(void) { foo(); } lib/x86_64/arch.c: void foo(void) { pr_info("strong\n"); } And a selftest that calls bar(), it will print "weak". Now if you make generic.o explicitly depend on arch.o (e.g. add function to arch.c that is called directly from generic.c) it will print "strong". In other words, it seems that the linker is free to throw out arch.o when linking because generic.o does not explicitly depend on it, which causes the linker to lose the strong symbol. One solution is to link libkvm.a with --whole-archive so that the linker doesn't throw away object files it thinks are unnecessary. However that is a bit difficult to plumb since we are using the common selftests makefile rules. An easier solution is to drop libkvm.a just link selftests with all the .o files that were originally in libkvm.a. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-9-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: selftests: Drop unnecessary rule for STATIC_LIBSDavid Matlack
Drop the "all: $(STATIC_LIBS)" rule. The KVM selftests already depend on $(STATIC_LIBS), so there is no reason to have an extra "all" rule. Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-8-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: selftests: Add a helper to check EPT/VPID capabilitiesDavid Matlack
Create a small helper function to check if a given EPT/VPID capability is supported. This will be re-used in a follow-up commit to check for 1G page support. No functional change intended. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-7-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.hDavid Matlack
This is a VMX-related macro so move it to vmx.h. While here, open code the mask like the rest of the VMX bitmask macros. No functional change intended. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-6-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: selftests: Refactor nested_map() to specify target levelDavid Matlack
Refactor nested_map() to specify that it explicityl wants 4K mappings (the existing behavior) and push the implementation down into __nested_map(), which can be used in subsequent commits to create huge page mappings. No function change intended. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-5-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: selftests: Drop stale function parameter comment for nested_map()David Matlack
nested_map() does not take a parameter named eptp_memslot. Drop the comment referring to it. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-4-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: selftests: Add option to create 2M and 1G EPT mappingsDavid Matlack
The current EPT mapping code in the selftests only supports mapping 4K pages. This commit extends that support with an option to map at 2M or 1G. This will be used in a future commit to create large page mappings to test eager page splitting. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-3-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: selftests: Replace x86_page_size with PG_LEVEL_XXDavid Matlack
x86_page_size is an enum used to communicate the desired page size with which to map a range of memory. Under the hood they just encode the desired level at which to map the page. This ends up being clunky in a few ways: - The name suggests it encodes the size of the page rather than the level. - In other places in x86_64/processor.c we just use a raw int to encode the level. Simplify this by adopting the kernel style of PG_LEVEL_XX enums and pass around raw ints when referring to the level. This makes the code easier to understand since these macros are very common in KVM MMU code. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-2-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSEPaolo Bonzini
Commit 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") introduced passthrough support for nested pause filtering, (when the host doesn't intercept PAUSE) (either disabled with kvm module param, or disabled with '-overcommit cpu-pm=on') Before this commit, L1 KVM didn't intercept PAUSE at all; afterwards, the feature was exposed as supported by KVM cpuid unconditionally, thus if L1 could try to use it even when the L0 KVM can't really support it. In this case the fallback caused KVM to intercept each PAUSE instruction; in some cases, such intercept can slow down the nested guest so much that it can fail to boot. Instead, before the problematic commit KVM was already setting both thresholds to 0 in vmcb02, but after the first userspace VM exit shrink_ple_window was called and would reset the pause_filter_count to the default value. To fix this, change the fallback strategy - ignore the guest threshold values, but use/update the host threshold values unless the guest specifically requests disabling PAUSE filtering (either simple or advanced). Also fix a minor bug: on nested VM exit, when PAUSE filter counter were copied back to vmcb01, a dirty bit was not set. Thanks a lot to Suravee Suthikulpanit for debugging this! Fixes: 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220518072709.730031-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/putMaxim Levitsky
Now that these functions are always called with preemption disabled, remove the preempt_disable()/preempt_enable() pair inside them. No functional change intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blockingMaxim Levitsky
On SVM, if preemption happens right after the call to finish_rcuwait but before call to kvm_arch_vcpu_unblocking on SVM/AVIC, it itself will re-enable AVIC, and then we will try to re-enable it again in kvm_arch_vcpu_unblocking which will lead to a warning in __avic_vcpu_load. The same problem can happen if the vCPU is preempted right after the call to kvm_arch_vcpu_blocking but before the call to prepare_to_rcuwait and in this case, we will end up with AVIC enabled during sleep - Ooops. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-7-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: disable preemption while updating apicv inhibitionMaxim Levitsky
Currently nothing prevents preemption in kvm_vcpu_update_apicv. On SVM, If the preemption happens after we update the vcpu->arch.apicv_active, the preemption itself will 'update' the inhibition since the AVIC will be first disabled on vCPU unload and then enabled, when the current task is loaded again. Then we will try to update it again, which will lead to a warning in __avic_vcpu_load, that the AVIC is already enabled. Fix this by disabling preemption in this code. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: SVM: fix avic_kick_target_vcpus_fastMaxim Levitsky
There are two issues in avic_kick_target_vcpus_fast 1. It is legal to issue an IPI request with APIC_DEST_NOSHORT and a physical destination of 0xFF (or 0xFFFFFFFF in case of x2apic), which must be treated as a broadcast destination. Fix this by explicitly checking for it. Also don’t use ‘index’ in this case as it gives no new information. 2. It is legal to issue a logical IPI request to more than one target. Index field only provides index in physical id table of first such target and therefore can't be used before we are sure that only a single target was addressed. Instead, parse the ICRL/ICRH, double check that a unicast interrupt was requested, and use that info to figure out the physical id of the target vCPU. At that point there is no need to use the index field as well. In addition to fixing the above issues, also skip the call to kvm_apic_match_dest. It is possible to do this now, because now as long as AVIC is not inhibited, it is guaranteed that none of the vCPUs changed their apic id from its default value. This fixes boot of windows guest with AVIC enabled because it uses IPI with 0xFF destination and no destination shorthand. Fixes: 7223fd2d5338 ("KVM: SVM: Use target APIC ID to complete AVIC IRQs when possible") Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: SVM: remove avic's broken code that updated APIC IDMaxim Levitsky
AVIC is now inhibited if the guest changes the apic id, and therefore this code is no longer needed. There are several ways this code was broken, including: 1. a vCPU was only allowed to change its apic id to an apic id of an existing vCPU. 2. After such change, the vCPU whose apic id entry was overwritten, could not correctly change its own apic id, because its own entry is already overwritten. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC baseMaxim Levitsky
Neither of these settings should be changed by the guest and it is a burden to support it in the acceleration code, so just inhibit this code instead. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: document AVIC/APICv inhibit reasonsMaxim Levitsky
These days there are too many AVIC/APICv inhibit reasons, and it doesn't hurt to have some documentation for them. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRsYuan Yao
Assign shadow_me_value, not shadow_me_mask, to PAE root entries, a.k.a. shadow PDPTRs, when host memory encryption is supported. The "mask" is the set of all possible memory encryption bits, e.g. MKTME KeyIDs, whereas "value" holds the actual value that needs to be stuffed into host page tables. Using shadow_me_mask results in a failed VM-Entry due to setting reserved PA bits in the PDPTRs, and ultimately causes an OOPS due to physical addresses with non-zero MKTME bits sending to_shadow_page() into the weeds: set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state. BUG: unable to handle page fault for address: ffd43f00063049e8 PGD 86dfd8067 P4D 0 Oops: 0000 [#1] PREEMPT SMP RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm] kvm_mmu_free_roots+0xd1/0x200 [kvm] __kvm_mmu_unload+0x29/0x70 [kvm] kvm_mmu_unload+0x13/0x20 [kvm] kvm_arch_destroy_vm+0x8a/0x190 [kvm] kvm_put_kvm+0x197/0x2d0 [kvm] kvm_vm_release+0x21/0x30 [kvm] __fput+0x8e/0x260 ____fput+0xe/0x10 task_work_run+0x6f/0xb0 do_exit+0x327/0xa90 do_group_exit+0x35/0xa0 get_signal+0x911/0x930 arch_do_signal_or_restart+0x37/0x720 exit_to_user_mode_prepare+0xb2/0x140 syscall_exit_to_user_mode+0x16/0x30 do_syscall_64+0x4e/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: e54f1ff244ac ("KVM: x86/mmu: Add shadow_me_value and repurpose shadow_me_mask") Signed-off-by: Yuan Yao <yuan.yao@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-Id: <20220608012015.19566-1-yuan.yao@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09Merge tag 'kvmarm-fixes-5.19-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.19, take #1 - Properly reset the SVE/SME flags on vcpu load - Fix a vgic-v2 regression regarding accessing the pending state of a HW interrupt from userspace (and make the code common with vgic-v3) - Fix access to the idreg range for protected guests - Ignore 'kvm-arm.mode=protected' when using VHE - Return an error from kvm_arch_init_vm() on allocation failure - A bunch of small cleanups (comments, annotations, indentation)