summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-01-20proc/vmcore: don't fake reading zeroes on surprise vmcore_cb unregistrationDavid Hildenbrand
In commit cc5f2704c934 ("proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks"), we added detection of surprise vmcore_cb unregistration after the vmcore was already opened. Once detected, we warn the user and simulate reading zeroes from that point on when accessing the vmcore. The basic reason was that unexpected unregistration, for example, by manually unbinding a driver from a device after opening the vmcore, is not supported and could result in reading oldmem the vmcore_cb would have actually prohibited while registered. However, something like that can similarly be trigger by a user that's really looking for trouble simply by unbinding the relevant driver before opening the vmcore -- or by disallowing loading the driver in the first place. So it's actually of limited help. Currently, unregistration can only be triggered via virtio-mem when manually unbinding the driver from the device inside the VM; there is no way to trigger it from the hypervisor, as hypervisors don't allow for unplugging virtio-mem devices -- ripping out system RAM from a VM without coordination with the guest is usually not a good idea. The important part is that unbinding the driver and unregistering the vmcore_cb while concurrently reading the vmcore won't crash the system, and that is handled by the rwsem. To make the mechanism more future proof, let's remove the "read zero" part, but leave the warning in place. For example, we could have a future driver (like virtio-balloon) that will contact the hypervisor to figure out if we already populated a page for a given PFN. Hotunplugging such a device and consequently unregistering the vmcore_cb could be triggered from the hypervisor without harming the system even while kdump is running. In that case, we don't want to silently end up with a vmcore that contains wrong data, because the user inside the VM might be unaware of the hypervisor action and might easily miss the warning in the log. Link: https://lkml.kernel.org/r/20211111192243.22002-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Philipp Rudo <prudo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20mm: percpu: add generic pcpu_populate_pte() functionKefeng Wang
With NEED_PER_CPU_PAGE_FIRST_CHUNK enabled, we need a function to populate pte, this patch adds a generic pcpu populate pte function, pcpu_populate_pte(), which is marked __weak and used on most architectures, but it is overridden on x86, which has its own implementation. Link: https://lkml.kernel.org/r/20211216112359.103822-5-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20mm: percpu: add generic pcpu_fc_alloc/free funcitonKefeng Wang
With the previous patch, we could add a generic pcpu first chunk allocate and free function to cleanup the duplicated definations on each architecture. Link: https://lkml.kernel.org/r/20211216112359.103822-4-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20mm: percpu: add pcpu_fc_cpu_to_node_fn_t typedefKefeng Wang
Add pcpu_fc_cpu_to_node_fn_t and pass it into pcpu_fc_alloc_fn_t, pcpu first chunk allocation will call it to alloc memblock on the corresponding node by it, this is prepare for the next patch. Link: https://lkml.kernel.org/r/20211216112359.103822-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20mm: percpu: generalize percpu related configKefeng Wang
Patch series "mm: percpu: Cleanup percpu first chunk function". When supporting page mapping percpu first chunk allocator on arm64, we found there are lots of duplicated codes in percpu embed/page first chunk allocator. This patchset is aimed to cleanup them and should no function change. The currently supported status about 'embed' and 'page' in Archs shows below, embed: NEED_PER_CPU_PAGE_FIRST_CHUNK page: NEED_PER_CPU_EMBED_FIRST_CHUNK embed page ------------------------ arm64 Y Y mips Y N powerpc Y Y riscv Y N sparc Y Y x86 Y Y ------------------------ There are two interfaces about percpu first chunk allocator, extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size, size_t atom_size, pcpu_fc_cpu_distance_fn_t cpu_distance_fn, - pcpu_fc_alloc_fn_t alloc_fn, - pcpu_fc_free_fn_t free_fn); + pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn); extern int __init pcpu_page_first_chunk(size_t reserved_size, - pcpu_fc_alloc_fn_t alloc_fn, - pcpu_fc_free_fn_t free_fn, - pcpu_fc_populate_pte_fn_t populate_pte_fn); + pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn); The pcpu_fc_alloc_fn_t/pcpu_fc_free_fn_t is killed, we provide generic pcpu_fc_alloc() and pcpu_fc_free() function, which are called in the pcpu_embed/page_first_chunk(). 1) For pcpu_embed_first_chunk(), pcpu_fc_cpu_to_node_fn_t is needed to be provided when archs supported NUMA. 2) For pcpu_page_first_chunk(), the pcpu_fc_populate_pte_fn_t is killed too, a generic pcpu_populate_pte() which marked '__weak' is provided, if you need a different function to populate pte on the arch(like x86), please provide its own implementation. [1] https://github.com/kevin78/linux.git percpu-cleanup This patch (of 4): The HAVE_SETUP_PER_CPU_AREA/NEED_PER_CPU_EMBED_FIRST_CHUNK/ NEED_PER_CPU_PAGE_FIRST_CHUNK/USE_PERCPU_NUMA_NODE_ID configs, which have duplicate definitions on platforms that subscribe it. Move them into mm, drop these redundant definitions and instead just select it on applicable platforms. Link: https://lkml.kernel.org/r/20211216112359.103822-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20211216112359.103822-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Will Deacon <will@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-19io-wq: delete dead lock shuffling codeJens Axboe
We used to have more code around the work loop, but now the goto and lock juggling just makes it less readable than it should. Get rid of it. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-19clk: mediatek: relicense mt7986 clock driver to GPL-2.0Sam Shih
The previous mt7986 clock drivers were incorrectly marked as GPL-1.0. This patch changes the driver to the standard GPL-2.0 license. Signed-off-by: Sam Shih <sam.shih@mediatek.com> Link: https://lore.kernel.org/r/20220119123658.10095-2-sam.shih@mediatek.com Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-01-19ALSA: core: Simplify snd_power_ref_and_wait() with the standard macroTakashi Iwai
Use wait_event_cmd() macro and simplify snd_power_ref_wait() implementation. This may also cover possible races in the current open code, too. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20220119091050.30125-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-01-19Merge branch 'ipv4-avoid-pathological-hash-tables'Jakub Kicinski
Eric Dumazet says: ==================== ipv4: avoid pathological hash tables This series speeds up netns dismantles on hosts having many active netns, by making sure two hash tables used for IPV4 fib contains uniformly spread items. v2: changed second patch to add fib_info_laddrhash_bucket() for consistency (David Ahern suggestion). ==================== Link: https://lore.kernel.org/r/20220119100413.4077866-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keysEric Dumazet
net/ipv4/fib_semantics.c uses a hash table (fib_info_laddrhash) in which fib_sync_down_addr() can locate fib_info based on IPv4 local address. This hash table is resized based on total number of hashed fib_info, but the hash function is only using the local address. For hosts having many active network namespaces, all fib_info for loopback devices (IPv4 address 127.0.0.1) are hashed into a single bucket, making netns dismantles very slow. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19ipv4: avoid quadratic behavior in netns dismantleEric Dumazet
net/ipv4/fib_semantics.c uses an hash table of 256 slots, keyed by device ifindexes: fib_info_devhash[DEVINDEX_HASHSIZE] Problem is that with network namespaces, devices tend to use the same ifindex. lo device for instance has a fixed ifindex of one, for all network namespaces. This means that hosts with thousands of netns spend a lot of time looking at some hash buckets with thousands of elements, notably at netns dismantle. Simply add a per netns perturbation (net_hash_mix()) to spread elements more uniformely. Also change fib_devindex_hashfn() to use more entropy. Fixes: aa79e66eee5d ("net: Make ifindex generation per-net namespace") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19Merge branch 'net-fsl-xgmac_mdio-add-workaround-for-erratum-a-009885'Jakub Kicinski
Tobias Waldekranz says: ==================== net/fsl: xgmac_mdio: Add workaround for erratum A-009885 The individual messages mostly speak for themselves. It is very possible that there are more chips out there that are impacted by this, but I only have access to the errata document for the T1024 family, so I've limited the DT changes to the exact FMan version used in that device. Hopefully someone from NXP can supply a follow-up if need be. The final commit is an unrelated fix that was brought to my attention by sparse. ==================== Link: https://lore.kernel.org/r/20220118215054.2629314-1-tobias@waldekranz.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19net/fsl: xgmac_mdio: Fix incorrect iounmap when removing moduleTobias Waldekranz
As reported by sparse: In the remove path, the driver would attempt to unmap its own priv pointer - instead of the io memory that it mapped in probe. Fixes: 9f35a7342cff ("net/fsl: introduce Freescale 10G MDIO driver") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO busesTobias Waldekranz
This block is used in (at least) T1024 and T1040, including their variants like T1023 etc. Fixes: d55ad2967d89 ("powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19dt-bindings: net: Document fsl,erratum-a009885Tobias Waldekranz
Update FMan binding documentation with the newly added workaround for erratum A-009885. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19net/fsl: xgmac_mdio: Add workaround for erratum A-009885Tobias Waldekranz
Once an MDIO read transaction is initiated, we must read back the data register within 16 MDC cycles after the transaction completes. Outside of this window, reads may return corrupt data. Therefore, disable local interrupts in the critical section, to maximize the probability that we can satisfy this requirement. Fixes: d55ad2967d89 ("powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19HID: uhid: Use READ_ONCE()/WRITE_ONCE() for ->runningJann Horn
The flag uhid->running can be set to false by uhid_device_add_worker() without holding the uhid->devlock. Mark all reads/writes of the flag that might race with READ_ONCE()/WRITE_ONCE() for clarity and correctness. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-01-19HID: uhid: Fix worker destroying device without any protectionJann Horn
uhid has to run hid_add_device() from workqueue context while allowing parallel use of the userspace API (which is protected with ->devlock). But hid_add_device() can fail. Currently, that is handled by immediately destroying the associated HID device, without using ->devlock - but if there are concurrent requests from userspace, that's wrong and leads to NULL dereferences and/or memory corruption (via use-after-free). Fix it by leaving the HID device as-is in the worker. We can clean it up later, either in the UHID_DESTROY command handler or in the ->release() handler. Cc: stable@vger.kernel.org Fixes: 67f8ecc550b5 ("HID: uhid: fix timeout when probe races with IO") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-01-19net: mscc: ocelot: fix using match before it is setTom Rix
Clang static analysis reports this issue ocelot_flower.c:563:8: warning: 1st function call argument is an uninitialized value !is_zero_ether_addr(match.mask->dst)) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The variable match is used before it is set. So move the block. Fixes: 75944fda1dfe ("net: mscc: ocelot: offload ingress skbedit and vlan actions to VCAP IS1") Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-19net: phy: micrel: use kszphy_suspend()/kszphy_resume for irq aware devicesClaudiu Beznea
On a setup with KSZ9131 and MACB drivers it happens on suspend path, from time to time, that the PHY interrupt arrives after PHY and MACB were suspended (PHY via genphy_suspend(), MACB via macb_suspend()). In this case the phy_read() at the beginning of kszphy_handle_interrupt() will fail (as MACB driver is suspended at this time) leading to phy_error() being called and a stack trace being displayed on console. To solve this .suspend/.resume functions for all KSZ devices implementing .handle_interrupt were replaced with kszphy_suspend()/kszphy_resume() which disable/enable interrupt before/after calling genphy_suspend()/genphy_resume(). The fix has been adapted for all KSZ devices which implements .handle_interrupt but it has been tested only on KSZ9131. Fixes: 59ca4e58b917 ("net: phy: micrel: implement generic .handle_interrupt() callback") Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-19net: cpsw: avoid alignment faults by taking NET_IP_ALIGN into accountArd Biesheuvel
Both versions of the CPSW driver declare a CPSW_HEADROOM_NA macro that takes NET_IP_ALIGN into account, but fail to use it appropriately when storing incoming packets in memory. This results in the IPv4 source and destination addresses to appear misaligned in memory, which causes aligment faults that need to be fixed up in software. So let's switch from CPSW_HEADROOM to CPSW_HEADROOM_NA where needed. This gets rid of any alignment faults on the RX path on a Beaglebone White. Fixes: 9ed4050c0d75 ("net: ethernet: ti: cpsw: add XDP support") Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-19nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()Krzysztof Kozlowski
Syzbot detected a NULL pointer dereference of nfc_llcp_sock->dev pointer (which is a 'struct nfc_dev *') with calls to llcp_sock_sendmsg() after a failed llcp_sock_bind(). The message being sent is a SOCK_DGRAM. KASAN report: BUG: KASAN: null-ptr-deref in nfc_alloc_send_skb+0x2d/0xc0 Read of size 4 at addr 00000000000005c8 by task llcp_sock_nfc_a/899 CPU: 5 PID: 899 Comm: llcp_sock_nfc_a Not tainted 5.16.0-rc6-next-20211224-00001-gc6437fbf18b0 #125 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x45/0x59 ? nfc_alloc_send_skb+0x2d/0xc0 __kasan_report.cold+0x117/0x11c ? mark_lock+0x480/0x4f0 ? nfc_alloc_send_skb+0x2d/0xc0 kasan_report+0x38/0x50 nfc_alloc_send_skb+0x2d/0xc0 nfc_llcp_send_ui_frame+0x18c/0x2a0 ? nfc_llcp_send_i_frame+0x230/0x230 ? __local_bh_enable_ip+0x86/0xe0 ? llcp_sock_connect+0x470/0x470 ? llcp_sock_connect+0x470/0x470 sock_sendmsg+0x8e/0xa0 ____sys_sendmsg+0x253/0x3f0 ... The issue was visible only with multiple simultaneous calls to bind() and sendmsg(), which resulted in most of the bind() calls to fail. The bind() was failing on checking if there is available WKS/SDP/SAP (respective bit in 'struct nfc_llcp_local' fields). When there was no available WKS/SDP/SAP, the bind returned error but the sendmsg() to such socket was able to trigger mentioned NULL pointer dereference of nfc_llcp_sock->dev. The code looks simply racy and currently it protects several paths against race with checks for (!nfc_llcp_sock->local) which is NULL-ified in error paths of bind(). The llcp_sock_sendmsg() did not have such check but called function nfc_llcp_send_ui_frame() had, although not protected with lock_sock(). Therefore the race could look like (same socket is used all the time): CPU0 CPU1 ==== ==== llcp_sock_bind() - lock_sock() - success - release_sock() - return 0 llcp_sock_sendmsg() - lock_sock() - release_sock() llcp_sock_bind(), same socket - lock_sock() - error - nfc_llcp_send_ui_frame() - if (!llcp_sock->local) - llcp_sock->local = NULL - nfc_put_device(dev) - dereference llcp_sock->dev - release_sock() - return -ERRNO The nfc_llcp_send_ui_frame() checked llcp_sock->local outside of the lock, which is racy and ineffective check. Instead, its caller llcp_sock_sendmsg(), should perform the check inside lock_sock(). Reported-and-tested-by: syzbot+7f23bcddf626e0593a39@syzkaller.appspotmail.com Fixes: b874dec21d1c ("NFC: Implement LLCP connection less Tx path") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-19Merge branch 'axienet-fixes'David S. Miller
Robert Hancock says: ==================== Xilinx axienet fixes Various fixes for the Xilinx AXI Ethernet driver. Changed since v2: -added Reviewed-by tags, added some explanation to commit messages, no code changes Changed since v1: -corrected a Fixes tag to point to mainline commit -split up reset changes into 3 patches -added ratelimit on netdev_warn in TX busy case ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-19net: axienet: increase default TX ring size to 128Robert Hancock
With previous changes to make the driver handle the TX ring size more correctly, the default TX ring size of 64 appears to significantly bottleneck TX performance to around 600 Mbps on a 1 Gbps link on ZynqMP. Increasing this to 128 seems to bring performance up to near line rate and shouldn't cause excess bufferbloat (this driver doesn't yet support modern byte-based queue management). Fixes: 8a3b7a252dca9 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-19net: axienet: fix for TX busy handlingRobert Hancock
Network driver documentation indicates we should be avoiding returning NETDEV_TX_BUSY from ndo_start_xmit in normal cases, since it requires the packets to be requeued. Instead the queue should be stopped after a packet is added to the TX ring when there may not be enough room for an additional one. Also, when TX ring entries are completed, we should only wake the queue if we know there is room for another full maximally fragmented packet. Print a warning if there is insufficient space at the start of start_xmit, since this should no longer happen. Combined with increasing the default TX ring size (in a subsequent patch), this appears to recover the TX performance lost by previous changes to actually manage the TX ring state properly. Fixes: 8a3b7a252dca9 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-19net: axienet: fix number of TX ring slots for available checkRobert Hancock
The check for the number of available TX ring slots was off by 1 since a slot is required for the skb header as well as each fragment. This could result in overwriting a TX ring slot that was still in use. Fixes: 8a3b7a252dca9 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-19net: axienet: Fix TX ring slot available checkRobert Hancock
The check for whether a TX ring slot was available was incorrect, since a slot which had been loaded with transmit data but the device had not started transmitting would be treated as available, potentially causing non-transmitted slots to be overwritten. The control field in the descriptor should be checked, rather than the status field (which may only be updated when the device completes the entry). Fixes: 8a3b7a252dca9 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-19net: axienet: limit minimum TX ring sizeRobert Hancock
The driver will not work properly if the TX ring size is set to below MAX_SKB_FRAGS + 1 since it needs to hold at least one full maximally fragmented packet in the TX ring. Limit setting the ring size to below this value. Fixes: 8b09ca823ffb4 ("net: axienet: Make RX/TX ring sizes configurable") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-19net: axienet: add missing memory barriersRobert Hancock
This driver was missing some required memory barriers: Use dma_rmb to ensure we see all updates to the descriptor after we see that an entry has been completed. Use wmb and rmb to avoid stale descriptor status between the TX path and TX complete IRQ path. Fixes: 8a3b7a252dca9 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-19net: axienet: reset core on initialization prior to MDIO accessRobert Hancock
In some cases where the Xilinx Ethernet core was used in 1000Base-X or SGMII modes, which use the internal PCS/PMA PHY, and the MGT transceiver clock source for the PCS was not running at the time the FPGA logic was loaded, the core would come up in a state where the PCS could not be found on the MDIO bus. To fix this, the Ethernet core (including the PCS) should be reset after enabling the clocks, prior to attempting to access the PCS using of_mdio_find_device. Fixes: 1a02556086fc (net: axienet: Properly handle PCS/PMA PHY for 1000BaseX mode) Signed-off-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-19net: axienet: Wait for PhyRstCmplt after core resetRobert Hancock
When resetting the device, wait for the PhyRstCmplt bit to be set in the interrupt status register before continuing initialization, to ensure that the core is actually ready. When using an external PHY, this also ensures we do not start trying to access the PHY while it is still in reset. The PHY reset is initiated by the core reset which is triggered just above, but remains asserted for 5ms after the core is reset according to the documentation. The MgtRdy bit could also be waited for, but unfortunately when using 7-series devices, the bit does not appear to work as documented (it seems to behave as some sort of link state indication and not just an indication the transceiver is ready) so it can't really be relied on for this purpose. Fixes: 8a3b7a252dca9 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-19net: axienet: increase reset timeoutRobert Hancock
The previous timeout of 1ms was too short to handle some cases where the core is reset just after the input clocks were started, which will be introduced in an upcoming patch. Increase the timeout to 50ms. Also simplify the reset timeout checking to use read_poll_timeout. Fixes: 8a3b7a252dca9 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-19Merge tag 'f2fs-for-5.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've tried to address some performance issues in f2fs_checkpoint and direct IO flows. Also, there was a work to enhance the page cache management used for compression. Other than them, we've done typical work including sysfs, code clean-ups, tracepoint, sanity check, in addition to bug fixes on corner cases. Enhancements: - use iomap for direct IO - try to avoid lock contention to improve f2fs_ckpt speed - avoid unnecessary memory allocation in compression flow - POSIX_FADV_DONTNEED drops the page cache containing compression pages - add some sysfs entries (gc_urgent_high_remaining, pending_discard) Bug fixes: - try not to expose unwritten blocks to user by DIO (this was added to avoid merge conflict; another patch is coming to address other missing case) - relax minor error condition for file pinning feature used in Android OTA - fix potential deadlock case in compression flow - should not truncate any block on pinned file In addition, we've done some code clean-ups and tracepoint/sanity check improvement" * tag 'f2fs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (29 commits) f2fs: do not allow partial truncation on pinned file f2fs: remove redunant invalidate compress pages f2fs: Simplify bool conversion f2fs: don't drop compressed page cache in .{invalidate,release}page f2fs: fix to reserve space for IO align feature f2fs: fix to check available space of CP area correctly in update_ckpt_flags() f2fs: support fault injection to f2fs_trylock_op() f2fs: clean up __find_inline_xattr() with __find_xattr() f2fs: fix to do sanity check on last xattr entry in __f2fs_setxattr() f2fs: do not bother checkpoint by f2fs_get_node_info f2fs: avoid down_write on nat_tree_lock during checkpoint f2fs: compress: fix potential deadlock of compress file f2fs: avoid EINVAL by SBI_NEED_FSCK when pinning a file f2fs: add gc_urgent_high_remaining sysfs node f2fs: fix to do sanity check in is_alive() f2fs: fix to avoid panic in is_alive() if metadata is inconsistent f2fs: fix to do sanity check on inode type during garbage collection f2fs: avoid duplicate call of mark_inode_dirty f2fs: show number of pending discard commands f2fs: support POSIX_FADV_DONTNEED drop compressed page cache ...
2022-01-19Merge tag 'trace-v5.17-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "tracing/scripts: Possible uninitialized variable The 0day bot discovered a possible uninitialized path in the scripts that sort the mcount sections at build time. Just needed to initialize that variable" * tag 'trace-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: script/sorttable: Fix some initialization problems
2022-01-19Merge tag 'riscv-for-linus-5.17-mw0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for the DA9063 as used on the HiFive Unmatched. - Support for relative extables, which puts us in line with other architectures and save some space in vmlinux. - A handful of kexec fixes/improvements, including the ability to run crash kernels from PCI-addressable memory on the HiFive Unmatched. - Support for the SBI SRST extension, which allows systems that do not have an explicit driver in Linux to reboot. - A handful of fixes and cleanups, including to the defconfigs and device trees. * tag 'riscv-for-linus-5.17-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (52 commits) RISC-V: Use SBI SRST extension when available riscv: mm: fix wrong phys_ram_base value for RV64 RISC-V: Use common riscv_cpuid_to_hartid_mask() for both SMP=y and SMP=n riscv: head: remove useless __PAGE_ALIGNED_BSS and .balign riscv: errata: alternative: mark vendor_patch_func __initdata riscv: head: make secondary_start_common() static riscv: remove cpu_stop() riscv: try to allocate crashkern region from 32bit addressible memory riscv: use hart id instead of cpu id on machine_kexec riscv: Don't use va_pa_offset on kdump riscv: dts: sifive: fu540-c000: Fix PLIC node riscv: dts: sifive: fu540-c000: Drop bogus soc node compatible values riscv: dts: sifive: Group tuples in register properties riscv: dts: sifive: Group tuples in interrupt properties riscv: dts: microchip: mpfs: Group tuples in interrupt properties riscv: dts: microchip: mpfs: Fix clock controller node riscv: dts: microchip: mpfs: Fix reference clock node riscv: dts: microchip: mpfs: Fix PLIC node riscv: dts: microchip: mpfs: Drop empty chosen node riscv: dts: canaan: Group tuples in interrupt properties ...
2022-01-19Merge tag 'kbuild-v5.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Add new kconfig target 'make mod2noconfig', which will be useful to speed up the build and test iteration. - Raise the minimum supported version of LLVM to 11.0.0 - Refactor certs/Makefile - Change the format of include/config/auto.conf to stop double-quoting string type CONFIG options. - Fix ARCH=sh builds in dash - Separate compression macros for general purposes (cmd_bzip2 etc.) and the ones for decompressors (cmd_bzip2_with_size etc.) - Misc Makefile cleanups * tag 'kbuild-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits) kbuild: add cmd_file_size arch: decompressor: remove useless vmlinux.bin.all-y kbuild: rename cmd_{bzip2,lzma,lzo,lz4,xzkern,zstd22} kbuild: drop $(size_append) from cmd_zstd sh: rename suffix-y to suffix_y doc: kbuild: fix default in `imply` table microblaze: use built-in function to get CPU_{MAJOR,MINOR,REV} certs: move scripts/extract-cert to certs/ kbuild: do not quote string values in include/config/auto.conf kbuild: do not include include/config/auto.conf from shell scripts certs: simplify $(srctree)/ handling and remove config_filename macro kbuild: stop using config_filename in scripts/Makefile.modsign certs: remove misleading comments about GCC PR certs: refactor file cleaning certs: remove unneeded -I$(srctree) option for system_certificates.o certs: unify duplicated cmd_extract_certs and improve the log certs: use $< and $@ to simplify the key generation rule kbuild: remove headers_check stub kbuild: move headers_check.pl to usr/include/ certs: use if_changed to re-generate the key when the key type is changed ...
2022-01-19Merge branch 'random-5.17-rc1-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator fixes from Jason Donenfeld: - Some Kconfig changes resulted in BIG_KEYS being unselectable, which Justin sent a patch to fix. - Geert pointed out that moving to BLAKE2s bloated vmlinux on little machines, like m68k, so we now compensate for this. - Numerous style and house cleaning fixes, meant to have a cleaner base for future changes. * 'random-5.17-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: random: simplify arithmetic function flow in account() random: selectively clang-format where it makes sense random: access input_pool_data directly rather than through pointer random: cleanup fractional entropy shift constants random: prepend remaining pool constants with POOL_ random: de-duplicate INPUT_POOL constants random: remove unused OUTPUT_POOL constants random: rather than entropy_store abstraction, use global random: remove unused extract_entropy() reserved argument random: remove incomplete last_data logic random: cleanup integer types random: cleanup poolinfo abstraction random: fix typo in comments lib/crypto: sha1: re-roll loops to reduce code size lib/crypto: blake2s: move hmac construction into wireguard lib/crypto: add prompts back to crypto libraries
2022-01-19Merge tag 'hwlock-v5.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux Pull hwspinlock updates from Bjorn Andersson: "This contains a change to the stm32 hwspinlock driver to ensure that the hardware is operational even without CONFIG_PM" * tag 'hwlock-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: hwspinlock: stm32: enable clock at probe
2022-01-18Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf 2022-01-19 We've added 12 non-merge commits during the last 8 day(s) which contain a total of 12 files changed, 262 insertions(+), 64 deletions(-). The main changes are: 1) Various verifier fixes mainly around register offset handling when passed to helper functions, from Daniel Borkmann. 2) Fix XDP BPF link handling to assert program type, from Toke Høiland-Jørgensen. 3) Fix regression in mount parameter handling for BPF fs, from Yafang Shao. 4) Fix incorrect integer literal when marking scratched stack slots in verifier, from Christy Lee. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, selftests: Add ringbuf memory type confusion test bpf, selftests: Add various ringbuf tests with invalid offset bpf: Fix ringbuf memory type confusion when passing to helpers bpf: Fix out of bounds access for ringbuf helpers bpf: Generally fix helper register offset check bpf: Mark PTR_TO_FUNC register initially with zero offset bpf: Generalize check_ctx_reg for reuse with other types bpf: Fix incorrect integer literal used for marking scratched stack. bpf/selftests: Add check for updating XDP bpf_link with wrong program type bpf/selftests: convert xdp_link test to ASSERT_* macros xdp: check prog type before updating BPF link bpf: Fix mount source show for bpffs ==================== Link: https://lore.kernel.org/r/20220119011825.9082-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-18io_uring: perform poll removal even if async work removal is successfulJens Axboe
An active work can have poll armed, hence it's not enough to just do the async work removal and return the value if it's different from "not found". Rather than make poll removal special, just fall through to do the remaining type lookups and removals. Reported-by: Florian Fischer <florian.fl.fischer@fau.de> Link: https://lore.kernel.org/io-uring/20220118151337.fac6cthvbnu7icoc@pasture/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-18io-wq: add intermediate work step between pending list and active workJens Axboe
We have a gap where a worker removes an item from the work list and to when it gets added as the workers active work. In this state, the work item cannot be found by cancelations. This is a small window, but it does exist. Add a temporary pointer to a work item that isn't on the pending work list anymore, but also not the active work. This is needed as we need to drop the wqe lock in between grabbing the work item and marking it as active, to ensure that signal based cancelations are properly ordered. Reported-by: Florian Fischer <florian.fl.fischer@fau.de> Link: https://lore.kernel.org/io-uring/20220118151337.fac6cthvbnu7icoc@pasture/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-18io-wq: perform both unstarted and started work cancelations in one goJens Axboe
Rather than split these into two separate lookups and matches, combine them into one loop. This will become important when we can guarantee that we don't have a window where a pending work item isn't discoverable in either state. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-18io-wq: invoke work cancelation with wqe->lock heldJens Axboe
io_wqe_cancel_pending_work() grabs it internally, grab it upfront instead. For the running work cancelation, grab the lock around it as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-18io-wq: make io_worker lock a raw spinlockJens Axboe
In preparation to nesting it under the wqe lock (which is raw due to being acquired from the scheduler side), change the io_worker lock from a normal spinlock to a raw spinlock. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-18io-wq: remove useless 'work' argument to __io_worker_busy()Jens Axboe
We don't use 'work' anymore in the busy logic, remove the dead argument. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-19bpf, selftests: Add ringbuf memory type confusion testDaniel Borkmann
Add two tests, one which asserts that ring buffer memory can be passed to other helpers for populating its entry area, and another one where verifier rejects different type of memory passed to bpf_ringbuf_submit(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2022-01-19bpf, selftests: Add various ringbuf tests with invalid offsetDaniel Borkmann
Assert that the verifier is rejecting invalid offsets on the ringbuf entries: # ./test_verifier | grep ring #947/u ringbuf: invalid reservation offset 1 OK #947/p ringbuf: invalid reservation offset 1 OK #948/u ringbuf: invalid reservation offset 2 OK #948/p ringbuf: invalid reservation offset 2 OK Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2022-01-19bpf: Fix ringbuf memory type confusion when passing to helpersDaniel Borkmann
The bpf_ringbuf_submit() and bpf_ringbuf_discard() have ARG_PTR_TO_ALLOC_MEM in their bpf_func_proto definition as their first argument, and thus both expect the result from a prior bpf_ringbuf_reserve() call which has a return type of RET_PTR_TO_ALLOC_MEM_OR_NULL. While the non-NULL memory from bpf_ringbuf_reserve() can be passed to other helpers, the two sinks (bpf_ringbuf_submit(), bpf_ringbuf_discard()) right now only enforce a register type of PTR_TO_MEM. This can lead to potential type confusion since it would allow other PTR_TO_MEM memory to be passed into the two sinks which did not come from bpf_ringbuf_reserve(). Add a new MEM_ALLOC composable type attribute for PTR_TO_MEM, and enforce that: - bpf_ringbuf_reserve() returns NULL or PTR_TO_MEM | MEM_ALLOC - bpf_ringbuf_submit() and bpf_ringbuf_discard() only take PTR_TO_MEM | MEM_ALLOC but not plain PTR_TO_MEM arguments via ARG_PTR_TO_ALLOC_MEM - however, other helpers might treat PTR_TO_MEM | MEM_ALLOC as plain PTR_TO_MEM to populate the memory area when they use ARG_PTR_TO_{UNINIT_,}MEM in their func proto description Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2022-01-19bpf: Fix out of bounds access for ringbuf helpersDaniel Borkmann
Both bpf_ringbuf_submit() and bpf_ringbuf_discard() have ARG_PTR_TO_ALLOC_MEM in their bpf_func_proto definition as their first argument. They both expect the result from a prior bpf_ringbuf_reserve() call which has a return type of RET_PTR_TO_ALLOC_MEM_OR_NULL. Meaning, after a NULL check in the code, the verifier will promote the register type in the non-NULL branch to a PTR_TO_MEM and in the NULL branch to a known zero scalar. Generally, pointer arithmetic on PTR_TO_MEM is allowed, so the latter could have an offset. The ARG_PTR_TO_ALLOC_MEM expects a PTR_TO_MEM register type. However, the non- zero result from bpf_ringbuf_reserve() must be fed into either bpf_ringbuf_submit() or bpf_ringbuf_discard() but with the original offset given it will then read out the struct bpf_ringbuf_hdr mapping. The verifier missed to enforce a zero offset, so that out of bounds access can be triggered which could be used to escalate privileges if unprivileged BPF was enabled (disabled by default in kernel). Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it") Reported-by: <tr3e.wang@gmail.com> (SecCoder Security Lab) Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2022-01-19bpf: Generally fix helper register offset checkDaniel Borkmann
Right now the assertion on check_ptr_off_reg() is only enforced for register types PTR_TO_CTX (and open coded also for PTR_TO_BTF_ID), however, this is insufficient since many other PTR_TO_* register types such as PTR_TO_FUNC do not handle/expect register offsets when passed to helper functions. Given this can slip-through easily when adding new types, make this an explicit allow-list and reject all other current and future types by default if this is encountered. Also, extend check_ptr_off_reg() to handle PTR_TO_BTF_ID as well instead of duplicating it. For PTR_TO_BTF_ID, reg->off is used for BTF to match expected BTF ids if struct offset is used. This part still needs to be allowed, but the dynamic off from the tnum must be rejected. Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Fixes: eaa6bcb71ef6 ("bpf: Introduce bpf_per_cpu_ptr()") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>