summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-12-05drm/v3d: Enable Performance Counters before clearing themMaíra Canal
On the Raspberry Pi 5, performance counters are not being cleared when `v3d_perfmon_start()` is called, even though we write to the CLR register. As a result, their values accumulate until they overflow. The expected behavior is for performance counters to reset to zero at the start of a job. When the job finishes and the perfmon is stopped, the counters should accurately reflect the values for that specific job. To ensure this behavior, the performance counters are now enabled before being cleared. This allows the CLR register to function as intended, zeroing the counter values when the job begins. Fixes: 26a4dc29b74a ("drm/v3d: Expose performance counters to userspace") Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241204122831.17015-1-mcanal@igalia.com
2024-12-05arm64: cpufeature: Add GCS to cpucap_is_possible()Robin Murphy
Since system_supports_gcs() ends up referring to cpucap_is_possible(), teach the latter about GCS for consistency with similar features. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/416c7369fcdce4ebb2a8f12daae234507be27e38.1733406275.git.robin.murphy@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-12-05Merge tag 'nvme-6.13-2024-12-05' of git://git.infradead.org/nvme into block-6.13Jens Axboe
Pull NVMe fixess from Keith: "nvme fixes for Linux 6.13 - Target fix using incorrect zero buffer (Nilay) - Device specifc deallocate quirk fixes (Christoph, Keith) - Fabrics fix for handling max command target bugs (Maurizio) - Cocci fix usage for kzalloc (Yu-Chen) - DMA size fix for host memory buffer feature (Christoph) - Fabrics queue cleanup fixes (Chunguang)" * tag 'nvme-6.13-2024-12-05' of git://git.infradead.org/nvme: nvme-tcp: simplify nvme_tcp_teardown_io_queues() nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues() nvme-rdma: unquiesce admin_q before destroy it nvme-tcp: fix the memleak while create new ctrl failed nvme-pci: don't use dma_alloc_noncontiguous with 0 merge boundary nvmet: replace kmalloc + memset with kzalloc for data allocation nvme-fabrics: handle zero MAXCMD without closing the connection nvme-pci: remove two deallocate zeroes quirks nvme: don't apply NVME_QUIRK_DEALLOCATE_ZEROES when DSM is not supported nvmet: use kzalloc instead of ZERO_PAGE in nvme_execute_identify_ns_nvm()
2024-12-05Merge tag 'asoc-fix-v6.13-rc1' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.13 A few small fixes for v6.13, all system specific - the biggest thing is the fix for jack handling over suspend on some Intel laptops.
2024-12-05virtio-blk: don't keep queue frozen during system suspendMing Lei
Commit 4ce6e2db00de ("virtio-blk: Ensure no requests in virtqueues before deleting vqs.") replaces queue quiesce with queue freeze in virtio-blk's PM callbacks. And the motivation is to drain inflight IOs before suspending. block layer's queue freeze looks very handy, but it is also easy to cause deadlock, such as, any attempt to call into bio_queue_enter() may run into deadlock if the queue is frozen in current context. There are all kinds of ->suspend() called in suspend context, so keeping queue frozen in the whole suspend context isn't one good idea. And Marek reported lockdep warning[1] caused by virtio-blk's freeze queue in virtblk_freeze(). [1] https://lore.kernel.org/linux-block/ca16370e-d646-4eee-b9cc-87277c89c43c@samsung.com/ Given the motivation is to drain in-flight IOs, it can be done by calling freeze & unfreeze, meantime restore to previous behavior by keeping queue quiesced during suspend. Cc: Yi Sun <yi.sun@unisoc.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: virtualization@lists.linux.dev Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20241112125821.1475793-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-05clocksource: Make negative motion detection more robustThomas Gleixner
Guenter reported boot stalls on a emulated ARM 32-bit platform, which has a 24-bit wide clocksource. It turns out that the calculated maximal idle time, which limits idle sleeps to prevent clocksource wrap arounds, is close to the point where the negative motion detection triggers. max_idle_ns: 597268854 ns negative motion tripping point: 671088640 ns If the idle wakeup is delayed beyond that point, the clocksource advances far enough to trigger the negative motion detection. This prevents the clock to advance and in the worst case the system stalls completely if the consecutive sleeps based on the stale clock are delayed as well. Cure this by calculating a more robust cut-off value for negative motion, which covers 87.5% of the actual clocksource counter width. Compare the delta against this value to catch negative motion. This is specifically for clock sources with a small counter width as their wrap around time is close to the half counter width. For clock sources with wide counters this is not a problem because the maximum idle time is far from the half counter width due to the math overflow protection constraints. For the case at hand this results in a tripping point of 1174405120ns. Note, that this cannot prevent issues when the delay exceeds the 87.5% margin, but that's not different from the previous unchecked version which allowed arbitrary time jumps. Systems with small counter width are prone to invalid results, but this problem is unlikely to be seen on real hardware. If such a system completely stalls for more than half a second, then there are other more urgent problems than the counter wrapping around. Fixes: c163e40af9b2 ("timekeeping: Always check for negative motion") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/all/8734j5ul4x.ffs@tglx Closes: https://lore.kernel.org/all/387b120b-d68a-45e8-b6ab-768cd95d11c2@roeck-us.net
2024-12-05coco: virt: arm64: Do not enable cca guest driver by defaultSuzuki K Poulose
As per the guidelines, new drivers may not be set to default on. An expert user can always select it. Reported-by: Dan Williams <dan.j.williams@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Sami Mujawar <sami.mujawar@arm.com> Link: https://lore.kernel.org/r/6750c695194cd_2508129427@dwillia2-xfh.jf.intel.com.notmuch Link: https://lore.kernel.org/r/20241205143634.306114-1-suzuki.poulose@arm.com Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-12-05drm/dp_mst: Use reset_msg_rx_state() instead of open coding itImre Deak
Use reset_msg_rx_state() in drm_dp_mst_handle_up_req() instead of open-coding it. Cc: Lyude Paul <lyude@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241203160223.2926014-8-imre.deak@intel.com
2024-12-05tracing: Fix archs that still call tracepoints without RCU watchingSteven Rostedt
Tracepoints require having RCU "watching" as it uses RCU to do updates to the tracepoints. There are some cases that would call a tracepoint when RCU was not "watching". This was usually in the idle path where RCU has "shutdown". For the few locations that had tracepoints without RCU watching, there was an trace_*_rcuidle() variant that could be used. This used SRCU for protection. There are tracepoints that trace when interrupts and preemption are enabled and disabled. In some architectures, these tracepoints are called in a path where RCU is not watching. When x86 and arm64 removed these locations, it was incorrectly assumed that it would be safe to remove the trace_*_rcuidle() variant and also remove the SRCU logic, as it made the code more complex and harder to implement new tracepoint features (like faultable tracepoints and tracepoints in rust). Instead of bringing back the trace_*_rcuidle(), as it will not be trivial to do as new code has already been added depending on its removal, add a workaround to the one file that still requires it (trace_preemptirq.c). If the architecture does not define CONFIG_ARCH_WANTS_NO_INSTR, then check if the code is in the idle path, and if so, call ct_irq_enter/exit() which will enable RCU around the tracepoint. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/20241204100414.4d3e06d0@gandalf.local.home Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: 48bcda684823 ("tracing: Remove definition of trace_*_rcuidle()") Closes: https://lore.kernel.org/all/bddb02de-957a-4df5-8e77-829f55728ea2@roeck-us.net/ Acked-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-05drm/dp_mst: Reset message rx state after OOM in drm_dp_mst_handle_up_req()Imre Deak
After an out-of-memory error the reception state should be reset, so that the next attempt receiving a message doesn't fail (due to getting a start-of-message packet, while the reception state has already the start-of-message flag set). Cc: Lyude Paul <lyude@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241203160223.2926014-7-imre.deak@intel.com
2024-12-05drm/dp_mst: Ensure mst_primary pointer is valid in drm_dp_mst_handle_up_req()Imre Deak
While receiving an MST up request message from one thread in drm_dp_mst_handle_up_req(), the MST topology could be removed from another thread via drm_dp_mst_topology_mgr_set_mst(false), freeing mst_primary and setting drm_dp_mst_topology_mgr::mst_primary to NULL. This could lead to a NULL deref/use-after-free of mst_primary in drm_dp_mst_handle_up_req(). Avoid the above by holding a reference for mst_primary in drm_dp_mst_handle_up_req() while it's used. v2: Fix kfreeing the request if getting an mst_primary reference fails. Cc: Lyude Paul <lyude@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com> (v1) Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241204132007.3132494-1-imre.deak@intel.com
2024-12-05drm/dp_mst: Fix down request message timeout handlingImre Deak
If receiving a reply for an MST down request message times out, the thread receiving the reply in drm_dp_mst_handle_down_rep() could try to dereference the drm_dp_sideband_msg_tx txmsg request message after the thread waiting for the reply - calling drm_dp_mst_wait_tx_reply() - has timed out and freed txmsg, hence leading to a use-after-free in drm_dp_mst_handle_down_rep(). Prevent the above by holding the drm_dp_mst_topology_mgr::qlock in drm_dp_mst_handle_down_rep() for the whole duration txmsg is looked up from the request list and dereferenced. v2: Fix unlocking mgr->qlock after verify_rx_request_type() fails. Cc: Lyude Paul <lyude@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241203174632.2941402-1-imre.deak@intel.com
2024-12-05drm/dp_mst: Simplify error path in drm_dp_mst_handle_down_rep()Imre Deak
Simplify the error return path in drm_dp_mst_handle_down_rep(), preparing for the next patch. While at it use reset_msg_rx_state() instead of open-coding it. Cc: Lyude Paul <lyude@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241203160223.2926014-4-imre.deak@intel.com
2024-12-05drm/dp_mst: Verify request type in the corresponding down message replyImre Deak
After receiving the response for an MST down request message, the response should be accepted/parsed only if the response type matches that of the request. Ensure this by checking if the request type code stored both in the request and the reply match, dropping the reply in case of a mismatch. This fixes the topology detection for an MST hub, as described in the Closes link below, where the hub sends an incorrect reply message after a CLEAR_PAYLOAD_TABLE -> LINK_ADDRESS down request message sequence. Cc: Lyude Paul <lyude@redhat.com> Cc: <stable@vger.kernel.org> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12804 Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241203160223.2926014-3-imre.deak@intel.com
2024-12-05drm/dp_mst: Fix resetting msg rx state after topology removalImre Deak
If the MST topology is removed during the reception of an MST down reply or MST up request sideband message, the drm_dp_mst_topology_mgr::up_req_recv/down_rep_recv states could be reset from one thread via drm_dp_mst_topology_mgr_set_mst(false), racing with the reading/parsing of the message from another thread via drm_dp_mst_handle_down_rep() or drm_dp_mst_handle_up_req(). The race is possible since the reader/parser doesn't hold any lock while accessing the reception state. This in turn can lead to a memory corruption in the reader/parser as described by commit bd2fccac61b4 ("drm/dp_mst: Fix MST sideband message body length check"). Fix the above by resetting the message reception state if needed before reading/parsing a message. Another solution would be to hold the drm_dp_mst_topology_mgr::lock for the whole duration of the message reception/parsing in drm_dp_mst_handle_down_rep() and drm_dp_mst_handle_up_req(), however this would require a bigger change. Since the fix is also needed for stable, opting for the simpler solution in this patch. Cc: Lyude Paul <lyude@redhat.com> Cc: <stable@vger.kernel.org> Fixes: 1d082618bbf3 ("drm/display/dp_mst: Fix down/up message handling after sink disconnect") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13056 Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241203160223.2926014-2-imre.deak@intel.com
2024-12-05x86/cpu/topology: Remove limit of CPUs due to disabled IO/APICFernando Fernandez Mancera
The rework of possible CPUs management erroneously disabled SMP when the IO/APIC is disabled either by the 'noapic' command line parameter or during IO/APIC setup. SMP is possible without IO/APIC. Remove the ioapic_is_disabled conditions from the relevant possible CPU management code paths to restore the orgininal behaviour. Fixes: 7c0edad3643f ("x86/cpu/topology: Rework possible CPU management") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241202145905.1482-1-ffmancera@riseup.net
2024-12-05ALSA: hda/realtek: Fix spelling mistake "Firelfy" -> "Firefly"Colin Ian King
There is a spelling mistake in a literal string in the alc269_fixup_tbl quirk table. Fix it. Fixes: 0d08f0eec961 ("ALSA: hda/realtek: fix micmute LEDs don't work on HP Laptops") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://patch.msgid.link/20241205102833.476190-1-colin.i.king@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-12-05ASoC: mediatek: mt8188-mt6359: Remove hardcoded dmic codecNícolas F. R. A. Prado
Remove hardcoded dmic codec from the UL_SRC dai link to avoid requiring a dmic codec to be present for the driver to probe, as not every MT8188-based platform might need a dmic codec. The codec can be assigned to the dai link through the dai-link property in Devicetree on the platforms where it is needed. No Devicetree currently relies on it so it is safe to remove without worrying about backward compatibility. Fixes: 9f08dcbddeb3 ("ASoC: mediatek: mt8188-mt6359: support new board with nau88255") Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patch.msgid.link/20241203-mt8188-6359-unhardcode-dmic-v1-1-346e3e5cbe6d@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-12-05x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tablesDavid Woodhouse
The set_p4d() and set_pgd() functions (in 4-level or 5-level page table setups respectively) assume that the root page table is actually a 8KiB allocation, with the userspace root immediately after the kernel root page table (so that the former can enforce NX on on all the subordinate page tables, which are actually shared). However, users of the kernel_ident_mapping_init() code do not give it an 8KiB allocation for its PGD. Both swsusp_arch_resume() and acpi_mp_setup_reset() allocate only a single 4KiB page. The kexec code on x86_64 currently gets away with it purely by chance, because it allocates 8KiB for its "control code page" and then actually uses the first half for the PGD, then copies the actual trampoline code into the second half only after the identmap code has finished scribbling over it. Fix this by defining a _PAGE_NOPTISHADOW bit (which can use the same bit as _PAGE_SAVED_DIRTY since one is only for the PGD/P4D root and the other is exclusively for leaf PTEs.). This instructs __pti_set_user_pgtbl() not to write to the userspace 'shadow' PGD. Strictly, the _PAGE_NOPTISHADOW bit doesn't need to be written out to the actual page tables; since __pti_set_user_pgtbl() returns the value to be written to the kernel page table, it could be filtered out. But there seems to be no benefit to actually doing so. Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/412c90a4df7aef077141d9f68d19cbe5602d6c6d.camel@infradead.org Cc: stable@kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com>
2024-12-05spi: omap2-mcspi: Fix the IS_ERR() bug for devm_clk_get_optional_enabled()Purushothama Siddaiah
The devm_clk_get_optional_enabled() function returns error pointers(PTR_ERR()). So use IS_ERR() to check it. Verified on K3-J7200 EVM board, without clock node mentioned in the device tree. Signed-off-by: Purushothama Siddaiah <psiddaiah@mvista.com> Reviewed-by: Corey Minyard <cminyard@mvista.com> Link: https://patch.msgid.link/20241205070426.1861048-1-psiddaiah@mvista.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-12-05jffs2: Fix rtime decompressorRichard Weinberger
The fix for a memory corruption contained a off-by-one error and caused the compressor to fail in legit cases. Cc: Kinsey Moore <kinsey.moore@oarcorp.com> Cc: stable@vger.kernel.org Fixes: fe051552f5078 ("jffs2: Prevent rtime decompress memory corruption") Signed-off-by: Richard Weinberger <richard@nod.at>
2024-12-05arm64: mte: Fix copy_highpage() warning on hugetlb foliosCatalin Marinas
Commit 25c17c4b55de ("hugetlb: arm64: add mte support") improved the copy_highpage() function to update the tags in the destination hugetlb folio. However, when the source folio isn't tagged, the code takes the non-hugetlb path where try_page_mte_tagging() warns as the destination is a hugetlb folio: WARNING: CPU: 0 PID: 363 at arch/arm64/include/asm/mte.h:58 copy_highpage+0x1d4/0x2d8 [...] pc : copy_highpage+0x1d4/0x2d8 lr : copy_highpage+0x78/0x2d8 [...] Call trace: copy_highpage+0x1d4/0x2d8 (P) copy_highpage+0x78/0x2d8 (L) copy_user_highpage+0x20/0x48 copy_user_large_folio+0x1bc/0x268 hugetlb_wp+0x190/0x860 hugetlb_fault+0xa28/0xc10 handle_mm_fault+0x2a0/0x2c0 do_page_fault+0x12c/0x578 do_mem_abort+0x4c/0xa8 el0_da+0x44/0xb0 el0t_64_sync_handler+0xc4/0x138 el0t_64_sync+0x198/0x1a0 Change the check for the tagged status of the source folio so that it does not fall through the non-hugetlb case. In addition, only perform the copy (for the full folio) if the source page is the folio head and warn if the destination folio is already tagged, for symmetry with the non-hugetlb case. Fixes: 25c17c4b55de ("hugetlb: arm64: add mte support") Reported-by: Sasha Levin <sashal@kernel.org> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: David Hildenbrand <david@redhat.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/Z0STR6VLt2MCalnY@sashalap Link: https://lore.kernel.org/r/20241204175004.906754-1-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-12-05arm64: Ensure bits ASID[15:8] are masked out when the kernel uses 8-bit ASIDsCatalin Marinas
Linux currently sets the TCR_EL1.AS bit unconditionally during CPU bring-up. On an 8-bit ASID CPU, this is RES0 and ignored, otherwise 16-bit ASIDs are enabled. However, if running in a VM and the hypervisor reports 8-bit ASIDs (ID_AA64MMFR0_EL1.ASIDBits == 0) on a 16-bit ASIDs CPU, Linux uses bits 8 to 63 as a generation number for tracking old process ASIDs. The bottom 8 bits of this generation end up being written to TTBR1_EL1 and also used for the ASID-based TLBI operations as the upper 8 bits of the ASID. Following an ASID roll-over event we can have threads of the same application with the same 8-bit ASID but different generation numbers running on separate CPUs. Both TLB caching and the TLBI operations will end up using different actual 16-bit ASIDs for the same process. A similar scenario can happen in a big.LITTLE configuration if the boot CPU only uses 8-bit ASIDs while secondary CPUs have 16-bit ASIDs. Ensure that the ASID generation is only tracked by bits 16 and up, leaving bits 15:8 as 0 if the kernel uses 8-bit ASIDs. Note that clearing TCR_EL1.AS is not sufficient since the architecture requires that the top 8 bits of the ASID passed to TLBI instructions are 0 rather than ignored in such configuration. Cc: stable@vger.kernel.org Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241203151941.353796-1-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-12-05ACPI/IORT: Add PMCG platform information for HiSilicon HIP09AQinxin Xia
HiSilicon HIP09A platforms using the same SMMU PMCG with HIP09 and thus suffers the same erratum. List them in the PMCG platform information list without introducing a new SMMU PMCG Model. Update the silicon-errata.rst as well. Reviewed-by: Yicong Yang <yangyicong@hisilicon.com> Acked-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Qinxin Xia <xiaqinxin@huawei.com> Link: https://lore.kernel.org/r/20241205013331.1484017-1-xiaqinxin@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-12-05net :mana :Request a V2 response version for MANA_QUERY_GF_STATShradha Gupta
The current requested response version(V1) for MANA_QUERY_GF_STAT query results in STATISTICS_FLAGS_TX_ERRORS_GDMA_ERROR value being set to 0 always. In order to get the correct value for this counter we request the response version to be V2. Cc: stable@vger.kernel.org Fixes: e1df5202e879 ("net :mana :Add remaining GDMA stats for MANA to ethtool") Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1733291300-12593-1-git-send-email-shradhagupta@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-05net: avoid potential UAF in default_operstate()Eric Dumazet
syzbot reported an UAF in default_operstate() [1] Issue is a race between device and netns dismantles. After calling __rtnl_unlock() from netdev_run_todo(), we can not assume the netns of each device is still alive. Make sure the device is not in NETREG_UNREGISTERED state, and add an ASSERT_RTNL() before the call to __dev_get_by_index(). We might move this ASSERT_RTNL() in __dev_get_by_index() in the future. [1] BUG: KASAN: slab-use-after-free in __dev_get_by_index+0x5d/0x110 net/core/dev.c:852 Read of size 8 at addr ffff888043eba1b0 by task syz.0.0/5339 CPU: 0 UID: 0 PID: 5339 Comm: syz.0.0 Not tainted 6.12.0-syzkaller-10296-gaaf20f870da0 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0x169/0x550 mm/kasan/report.c:489 kasan_report+0x143/0x180 mm/kasan/report.c:602 __dev_get_by_index+0x5d/0x110 net/core/dev.c:852 default_operstate net/core/link_watch.c:51 [inline] rfc2863_policy+0x224/0x300 net/core/link_watch.c:67 linkwatch_do_dev+0x3e/0x170 net/core/link_watch.c:170 netdev_run_todo+0x461/0x1000 net/core/dev.c:10894 rtnl_unlock net/core/rtnetlink.c:152 [inline] rtnl_net_unlock include/linux/rtnetlink.h:133 [inline] rtnl_dellink+0x760/0x8d0 net/core/rtnetlink.c:3520 rtnetlink_rcv_msg+0x791/0xcf0 net/core/rtnetlink.c:6911 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2541 netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline] netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1347 netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1891 sock_sendmsg_nosec net/socket.c:711 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:726 ____sys_sendmsg+0x52a/0x7e0 net/socket.c:2583 ___sys_sendmsg net/socket.c:2637 [inline] __sys_sendmsg+0x269/0x350 net/socket.c:2669 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f2a3cb80809 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f2a3d9cd058 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f2a3cd45fa0 RCX: 00007f2a3cb80809 RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000008 RBP: 00007f2a3cbf393e R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007f2a3cd45fa0 R15: 00007ffd03bc65c8 </TASK> Allocated by task 5339: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:377 [inline] __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394 kasan_kmalloc include/linux/kasan.h:260 [inline] __kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4314 kmalloc_noprof include/linux/slab.h:901 [inline] kmalloc_array_noprof include/linux/slab.h:945 [inline] netdev_create_hash net/core/dev.c:11870 [inline] netdev_init+0x10c/0x250 net/core/dev.c:11890 ops_init+0x31e/0x590 net/core/net_namespace.c:138 setup_net+0x287/0x9e0 net/core/net_namespace.c:362 copy_net_ns+0x33f/0x570 net/core/net_namespace.c:500 create_new_namespaces+0x425/0x7b0 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0x124/0x180 kernel/nsproxy.c:228 ksys_unshare+0x57d/0xa70 kernel/fork.c:3314 __do_sys_unshare kernel/fork.c:3385 [inline] __se_sys_unshare kernel/fork.c:3383 [inline] __x64_sys_unshare+0x38/0x40 kernel/fork.c:3383 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 12: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:582 poison_slab_object mm/kasan/common.c:247 [inline] __kasan_slab_free+0x59/0x70 mm/kasan/common.c:264 kasan_slab_free include/linux/kasan.h:233 [inline] slab_free_hook mm/slub.c:2338 [inline] slab_free mm/slub.c:4598 [inline] kfree+0x196/0x420 mm/slub.c:4746 netdev_exit+0x65/0xd0 net/core/dev.c:11992 ops_exit_list net/core/net_namespace.c:172 [inline] cleanup_net+0x802/0xcc0 net/core/net_namespace.c:632 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310 worker_thread+0x870/0xd30 kernel/workqueue.c:3391 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 The buggy address belongs to the object at ffff888043eba000 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 432 bytes inside of freed 2048-byte region [ffff888043eba000, ffff888043eba800) The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x43eb8 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x4fff00000000040(head|node=1|zone=1|lastcpupid=0x7ff) page_type: f5(slab) raw: 04fff00000000040 ffff88801ac42000 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000080008 00000001f5000000 0000000000000000 head: 04fff00000000040 ffff88801ac42000 dead000000000122 0000000000000000 head: 0000000000000000 0000000000080008 00000001f5000000 0000000000000000 head: 04fff00000000003 ffffea00010fae01 ffffffffffffffff 0000000000000000 head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5339, tgid 5338 (syz.0.0), ts 69674195892, free_ts 69663220888 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1556 prep_new_page mm/page_alloc.c:1564 [inline] get_page_from_freelist+0x3649/0x3790 mm/page_alloc.c:3474 __alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4751 alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265 alloc_slab_page+0x6a/0x140 mm/slub.c:2408 allocate_slab+0x5a/0x2f0 mm/slub.c:2574 new_slab mm/slub.c:2627 [inline] ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3815 __slab_alloc+0x58/0xa0 mm/slub.c:3905 __slab_alloc_node mm/slub.c:3980 [inline] slab_alloc_node mm/slub.c:4141 [inline] __do_kmalloc_node mm/slub.c:4282 [inline] __kmalloc_noprof+0x2e6/0x4c0 mm/slub.c:4295 kmalloc_noprof include/linux/slab.h:905 [inline] sk_prot_alloc+0xe0/0x210 net/core/sock.c:2165 sk_alloc+0x38/0x370 net/core/sock.c:2218 __netlink_create+0x65/0x260 net/netlink/af_netlink.c:629 __netlink_kernel_create+0x174/0x6f0 net/netlink/af_netlink.c:2015 netlink_kernel_create include/linux/netlink.h:62 [inline] uevent_net_init+0xed/0x2d0 lib/kobject_uevent.c:783 ops_init+0x31e/0x590 net/core/net_namespace.c:138 setup_net+0x287/0x9e0 net/core/net_namespace.c:362 page last free pid 1032 tgid 1032 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1127 [inline] free_unref_page+0xdf9/0x1140 mm/page_alloc.c:2657 __slab_free+0x31b/0x3d0 mm/slub.c:4509 qlink_free mm/kasan/quarantine.c:163 [inline] qlist_free_all+0x9a/0x140 mm/kasan/quarantine.c:179 kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286 __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:329 kasan_slab_alloc include/linux/kasan.h:250 [inline] slab_post_alloc_hook mm/slub.c:4104 [inline] slab_alloc_node mm/slub.c:4153 [inline] kmem_cache_alloc_node_noprof+0x1d9/0x380 mm/slub.c:4205 __alloc_skb+0x1c3/0x440 net/core/skbuff.c:668 alloc_skb include/linux/skbuff.h:1323 [inline] alloc_skb_with_frags+0xc3/0x820 net/core/skbuff.c:6612 sock_alloc_send_pskb+0x91a/0xa60 net/core/sock.c:2881 sock_alloc_send_skb include/net/sock.h:1797 [inline] mld_newpack+0x1c3/0xaf0 net/ipv6/mcast.c:1747 add_grhead net/ipv6/mcast.c:1850 [inline] add_grec+0x1492/0x19a0 net/ipv6/mcast.c:1988 mld_send_initial_cr+0x228/0x4b0 net/ipv6/mcast.c:2234 ipv6_mc_dad_complete+0x88/0x490 net/ipv6/mcast.c:2245 addrconf_dad_completed+0x712/0xcd0 net/ipv6/addrconf.c:4342 addrconf_dad_work+0xdc2/0x16f0 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310 Memory state around the buggy address: ffff888043eba080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888043eba100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888043eba180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888043eba200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888043eba280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 8c55facecd7a ("net: linkwatch: only report IF_OPER_LOWERLAYERDOWN if iflink is actually down") Reported-by: syzbot+1939f24bdb783e9e43d9@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/674f3a18.050a0220.48a03.0041.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241203170933.2449307-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-05Merge tag 'nf-24-12-05' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix esoteric undefined behaviour due to uninitialized stack access in ip_vs_protocol_init(), from Jinghao Jia. 2) Fix iptables xt_LED slab-out-of-bounds due to incorrect sanitization of the led string identifier, reported by syzbot. Patch from Dmitry Antipov. 3) Remove WARN_ON_ONCE reachable from userspace to check for the maximum cgroup level, nft_socket cgroup matching is restricted to 255 levels, but cgroups allow for INT_MAX levels by default. Reported by syzbot. 4) Fix nft_inner incorrect use of percpu area to store tunnel parser context with softirqs, resulting in inconsistent inner header offsets that could lead to bogus rule mismatches, reported by syzbot. 5) Grab module reference on ipset core while requesting set type modules, otherwise kernel crash is possible by removing ipset core module, patch from Phil Sutter. 6) Fix possible double-free in nft_hash garbage collector due to unstable walk interator that can provide twice the same element. Use a sequence number to skip expired/dead elements that have been already scheduled for removal. Based on patch from Laurent Fasnach netfilter pull request 24-12-05 * tag 'nf-24-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_set_hash: skip duplicated elements pending gc run netfilter: ipset: Hold module reference while requesting a module netfilter: nft_inner: incorrect percpu area handling under softirq netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level netfilter: x_tables: fix LED ID check in led_tg_check() ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init() ==================== Link: https://patch.msgid.link/20241205002854.162490-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-05Merge branch 'vsock-test-fix-wrong-setsockopt-parameters'Paolo Abeni
Konstantin Shkolnyy says: ==================== vsock/test: fix wrong setsockopt() parameters Parameters were created using wrong C types, which caused them to be of wrong size on some architectures, causing problems. The problem with SO_RCVLOWAT was found on s390 (big endian), while x86-64 didn't show it. After the fix, all tests pass on s390. Then Stefano Garzarella pointed out that SO_VM_SOCKETS_* calls might have a similar problem, which turned out to be true, hence, the second patch. Changes for v8: - Fix whitespace warnings from "checkpatch.pl --strict" - Add maintainers to Cc: Changes for v7: - Rebase on top of https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git - Add the "net" tags to the subjects Changes for v6: - rework the patch #3 to avoid creating a new file for new functions, and exclude vsock_perf from calling the new functions. - add "Reviewed-by:" to the patch #2. Changes for v5: - in the patch #2 replace the introduced uint64_t with unsigned long long to match documentation - add a patch #3 that verifies every setsockopt() call. Changes for v4: - add "Reviewed-by:" to the first patch, and add a second patch fixing SO_VM_SOCKETS_* calls, which depends on the first one (hence, it's now a patch series.) Changes for v3: - fix the same problem in vsock_perf and update commit message Changes for v2: - add "Fixes:" lines to the commit message ==================== Link: https://patch.msgid.link/20241203150656.287028-1-kshk@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-05vsock/test: verify socket options after setting themKonstantin Shkolnyy
Replace setsockopt() calls with calls to functions that follow setsockopt() with getsockopt() and check that the returned value and its size are the same as have been set. (Except in vsock_perf.) Signed-off-by: Konstantin Shkolnyy <kshk@linux.ibm.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-05vsock/test: fix parameter types in SO_VM_SOCKETS_* callsKonstantin Shkolnyy
Change parameters of SO_VM_SOCKETS_* to unsigned long long as documented in the vm_sockets.h, because the corresponding kernel code requires them to be at least 64-bit, no matter what architecture. Otherwise they are too small on 32-bit machines. Fixes: 5c338112e48a ("test/vsock: rework message bounds test") Fixes: 685a21c314a8 ("test/vsock: add big message test") Fixes: 542e893fbadc ("vsock/test: two tests to check credit update logic") Fixes: 8abbffd27ced ("test/vsock: vsock_perf utility") Signed-off-by: Konstantin Shkolnyy <kshk@linux.ibm.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-05vsock/test: fix failures due to wrong SO_RCVLOWAT parameterKonstantin Shkolnyy
This happens on 64-bit big-endian machines. SO_RCVLOWAT requires an int parameter. However, instead of int, the test uses unsigned long in one place and size_t in another. Both are 8 bytes long on 64-bit machines. The kernel, having received the 8 bytes, doesn't test for the exact size of the parameter, it only cares that it's >= sizeof(int), and casts the 4 lower-addressed bytes to an int, which, on a big-endian machine, contains 0. 0 doesn't trigger an error, SO_RCVLOWAT returns with success and the socket stays with the default SO_RCVLOWAT = 1, which results in vsock_test failures, while vsock_perf doesn't even notice that it's failed to change it. Fixes: b1346338fbae ("vsock_test: POLLIN + SO_RCVLOWAT test") Fixes: 542e893fbadc ("vsock/test: two tests to check credit update logic") Fixes: 8abbffd27ced ("test/vsock: vsock_perf utility") Signed-off-by: Konstantin Shkolnyy <kshk@linux.ibm.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-05Merge branch 'mitigate-the-two-reallocations-issue-for-iptunnels'Paolo Abeni
Justin Iurman says: ==================== Mitigate the two-reallocations issue for iptunnels RESEND v5: - v5 was sent just when net-next closed v5: - address Paolo's comments - s/int dst_dev_overhead()/unsigned int dst_dev_overhead()/ v4: - move static inline function to include/net/dst.h v3: - fix compilation error in seg6_iptunnel v2: - add missing "static" keywords in seg6_iptunnel - use a static-inline function to return the dev overhead (as suggested by Olek, thanks) The same pattern is found in ioam6, rpl6, and seg6. Basically, it first makes sure there is enough room for inserting a new header: (1) err = skb_cow_head(skb, len + skb->mac_len); Then, when the insertion (encap or inline) is performed, the input and output handlers respectively make sure there is enough room for layer 2: (2) err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev)); skb_cow_head() does nothing when there is enough room. Otherwise, it reallocates more room, which depends on the architecture. Briefly, skb_cow_head() calls __skb_cow() which then calls pskb_expand_head() as follows: pskb_expand_head(skb, ALIGN(delta, NET_SKB_PAD), 0, GFP_ATOMIC); "delta" represents the number of bytes to be added. This value is aligned with NET_SKB_PAD, which is defined as follows: NET_SKB_PAD = max(32, L1_CACHE_BYTES) ... where L1_CACHE_BYTES also depends on the architecture. In our case (x86), it is defined as follows: L1_CACHE_BYTES = (1 << CONFIG_X86_L1_CACHE_SHIFT) ... where (again, in our case) CONFIG_X86_L1_CACHE_SHIFT equals 6 (=X86_GENERIC). All this to say, skb_cow_head() would reallocate to the next multiple of NET_SKB_PAD (in our case a 64-byte multiple) when there is not enough room. Back to the main issue with the pattern: in some cases, two reallocations are triggered, resulting in a performance drop (i.e., lines (1) and (2) would both trigger an implicit reallocation). How's that possible? Well, this is kind of bad luck as we hit an exact NET_SKB_PAD boundary and when skb->mac_len (=14) is smaller than LL_RESERVED_SPACE(dst->dev) (=16 in our case). For an x86 arch, it happens in the following cases (with the default needed_headroom): - ioam6: - (inline mode) pre-allocated data trace of 236 or 240 bytes - (encap mode) pre-allocated data trace of 196 or 200 bytes - seg6: - (encap mode) for 13, 17, 21, 25, 29, 33, ...(+4)... prefixes Let's illustrate the problem, i.e., when we fall on the exact NET_SKB_PAD boundary. In the case of ioam6, for the above problematic values, the total overhead is 256 bytes for both modes. Based on line (1), skb->mac_len (=14) is added, therefore passing 270 bytes to skb_cow_head(). At that moment, the headroom has 206 bytes available (in our case). Since 270 > 206, skb_cow_head() performs a reallocation and the new headroom is now 206 + 64 (NET_SKB_PAD) = 270. Which is exactly the room we needed. After the insertion, the headroom has 0 byte available. But, there's line (2) where 16 bytes are still needed. Which, again, triggers another reallocation. The same logic is applied to seg6 (although it does not happen with the inline mode, i.e., -40 bytes). It happens with other L1 cache shifts too (the larger the cache shift, the less often it happens). For example, with a +32 cache shift (instead of +64), the following number of segments would trigger two reallocations: 11, 15, 19, ... With a +128 cache shift, the following number of segments would trigger two reallocations: 17, 25, 33, ... And so on and so forth. Note that it is the same for both the "encap" and "l2encap" modes. For the "encap.red" and "l2encap.red" modes, it is the same logic but with "segs+1" (e.g., 14, 18, 22, 26, etc for a +64 cache shift). Note also that it may happen with rpl6 (based on some calculations), although it did not in our case. This series provides a solution to mitigate the aforementioned issue for ioam6, seg6, and rpl6. It provides the dst_entry (in the cache) to skb_cow_head() **before** the insertion (line (1)). As a result, the very first iteration would still trigger two reallocations (i.e., empty cache), while next iterations would only trigger a single reallocation. ==================== Link: https://patch.msgid.link/20241203124945.22508-1-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-05net: ipv6: rpl_iptunnel: mitigate 2-realloc issueJustin Iurman
This patch mitigates the two-reallocations issue with rpl_iptunnel by providing the dst_entry (in the cache) to the first call to skb_cow_head(). As a result, the very first iteration would still trigger two reallocations (i.e., empty cache), while next iterations would only trigger a single reallocation. Performance tests before/after applying this patch, which clearly shows there is no impact (it even shows improvement): - before: https://ibb.co/nQJhqwc - after: https://ibb.co/4ZvW6wV Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Cc: Alexander Aring <aahringo@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-05net: ipv6: seg6_iptunnel: mitigate 2-realloc issueJustin Iurman
This patch mitigates the two-reallocations issue with seg6_iptunnel by providing the dst_entry (in the cache) to the first call to skb_cow_head(). As a result, the very first iteration would still trigger two reallocations (i.e., empty cache), while next iterations would only trigger a single reallocation. Performance tests before/after applying this patch, which clearly shows the improvement: - before: https://ibb.co/3Cg4sNH - after: https://ibb.co/8rQ350r Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Cc: David Lebrun <dlebrun@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-05net: ipv6: ioam6_iptunnel: mitigate 2-realloc issueJustin Iurman
This patch mitigates the two-reallocations issue with ioam6_iptunnel by providing the dst_entry (in the cache) to the first call to skb_cow_head(). As a result, the very first iteration may still trigger two reallocations (i.e., empty cache), while next iterations would only trigger a single reallocation. Performance tests before/after applying this patch, which clearly shows the improvement: - inline mode: - before: https://ibb.co/LhQ8V63 - after: https://ibb.co/x5YT2bS - encap mode: - before: https://ibb.co/3Cjm5m0 - after: https://ibb.co/TwpsxTC - encap mode with tunsrc: - before: https://ibb.co/Gpy9QPg - after: https://ibb.co/PW1bZFT This patch also fixes an incorrect behavior: after the insertion, the second call to skb_cow_head() makes sure that the dev has enough headroom in the skb for layer 2 and stuff. In that case, the "old" dst_entry was used, which is now fixed. After discussing with Paolo, it appears that both patches can be merged into a single one -this one- (for the sake of readability) and target net-next. Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-05include: net: add static inline dst_dev_overhead() to dst.hJustin Iurman
Add static inline dst_dev_overhead() function to include/net/dst.h. This helper function is used by ioam6_iptunnel, rpl_iptunnel and seg6_iptunnel to get the dev's overhead based on a cache entry (dst_entry). If the cache is empty, the default and generic value skb->mac_len is returned. Otherwise, LL_RESERVED_SPACE() over dst's dev is returned. Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-05net/mlx5: qos: Add ifc support for cross-esw schedulingCosmin Ratiu
This adds the capability bit and the vport element fields related to cross-esw scheduling. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241204220931.254964-5-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-05net/mlx5: Add support for new scheduling elementsCarolina Jubran
Introduce new scheduling elements in the E-Switch QoS hierarchy to enhance traffic management capabilities. This patch adds support for: - Rate Limit scheduling elements: Enables bandwidth limitation across multiple nodes without a shared ancestor, providing a mechanism for more granular control of bandwidth allocation. - Traffic Class Transmit Scheduling Arbiter (TSAR): Introduces the infrastructure for creating Traffic Class TSARs, allowing hierarchical arbitration based on traffic classes. - Traffic Class Arbiter TSAR: Adds support for a TSAR capable of managing arbitration between multiple traffic classes, enabling improved bandwidth prioritization and traffic management. No functional changes are introduced in this patch. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241204220931.254964-4-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-05net/mlx5: Add ConnectX-8 device to ifcYevgeny Kliteynik
In preparation for ConnectX-8 SWS support, add enum for the new device type. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241204220931.254964-3-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-05net/mlx5: ifc: Reorganize mlx5_ifc_flow_table_context_bitsCosmin Ratiu
The nested union at the end is not in the same style as the rest of the code, so un-nest it to make the style uniformly applied again. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241204220931.254964-2-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-04audit: workaround a GCC bug triggered by task comm changesYafang shao
A build failure has been reported with the following details: In file included from include/linux/string.h:390, from include/linux/bitmap.h:13, from include/linux/cpumask.h:12, from include/linux/smp.h:13, from include/linux/lockdep.h:14, from include/linux/spinlock.h:63, from include/linux/wait.h:9, from include/linux/wait_bit.h:8, from include/linux/fs.h:6, from kernel/auditsc.c:37: In function 'sized_strscpy', inlined from '__audit_ptrace' at kernel/auditsc.c:2732:2: >> include/linux/fortify-string.h:293:17: error: call to '__write_overflow' declared with attribute error: detected write beyond size of object (1st parameter) 293 | __write_overflow(); | ^~~~~~~~~~~~~~~~~~ In function 'sized_strscpy', inlined from 'audit_signal_info_syscall' at kernel/auditsc.c:2759:3: >> include/linux/fortify-string.h:293:17: error: call to '__write_overflow' declared with attribute error: detected write beyond size of object (1st parameter) 293 | __write_overflow(); | ^~~~~~~~~~~~~~~~~~ The issue appears to be a GCC bug, though the root cause remains unclear at this time. For now, let's implement a workaround. A bug report has also been filed with GCC [0]. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117912 [0] Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202410171420.1V00ICVG-lkp@intel.com/ Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org> Closes: https://lore.kernel.org/all/20241128182435.57a1ea6f@gandalf.local.home/ Reported-by: Zhuo, Qiuxu <qiuxu.zhuo@intel.com> Closes: https://lore.kernel.org/all/CY8PR11MB71348E568DBDA576F17DAFF389362@CY8PR11MB7134.namprd11.prod.outlook.com/ Originally-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/linux-hardening/202410171059.C2C395030@keescook/ Signed-off-by: Yafang shao <laoar.shao@gmail.com> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Yafang Shao <laoar.shao@gmail.com> [PM: subject tweak, description line wrapping] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-12-04Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-12-03 (ice, idpf, ixgbe, ixgbevf, igb) This series contains updates to ice, idpf, ixgbe, ixgbevf, and igb drivers. For ice: Arkadiusz corrects search for determining whether PHY clock recovery is supported on the device. Przemyslaw corrects mask used for PHY timestamps on ETH56G devices. Wojciech adds missing virtchnl ops which caused NULL pointer dereference. Marcin fixes VLAN filter settings for uplink VSI in switchdev mode. For idpf: Josh restores setting of completion tag for empty buffers. For ixgbevf: Jake removes incorrect initialization/support of IPSEC for mailbox version 1.5. For ixgbe: Jake rewords and downgrades misleading message when negotiation of VF mailbox version is not supported. Tore Amundsen corrects value for BASE-BX10 capability. For igb: Yuan Can adds proper teardown on failed pci_register_driver() call. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igb: Fix potential invalid memory access in igb_init_module() ixgbe: Correct BASE-BX10 compliance code ixgbe: downgrade logging of unsupported VF API version to debug ixgbevf: stop attempting IPSEC offload on Mailbox API 1.5 idpf: set completion tag for "empty" bufs associated with a packet ice: Fix VLAN pruning in switchdev mode ice: Fix NULL pointer dereference in switchdev ice: fix PHY timestamp extraction for ETH56G ice: fix PHY Clock Recovery availability check ==================== Link: https://patch.msgid.link/20241203215521.1646668-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-04r8169: simplify setting hwmon attribute visibilityHeiner Kallweit
Use new member visible to simplify setting the static visibility. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/dba77e76-be45-4a30-96c7-45e284072ad2@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-04Merge branch 'mlx5-misc-fixes-2024-12-03'Jakub Kicinski
Tariq Toukan says: ==================== mlx5 misc fixes 2024-12-03 This patchset provides misc bug fixes from the team to the mlx5 core and Eth drivers. ==================== Link: https://patch.msgid.link/20241203204920.232744-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-04net/mlx5e: Remove workaround to avoid syndrome for internal portJianbo Liu
Previously a workaround was added to avoid syndrome 0xcdb051. It is triggered when offload a rule with tunnel encapsulation, and forwarding to another table, but not matching on the internal port in firmware steering mode. The original workaround skips internal tunnel port logic, which is not correct as not all cases are considered. As an example, if vlan is configured on the uplink port, traffic can't pass because vlan header is not added with this workaround. Besides, there is no such issue for software steering. So, this patch removes that, and returns error directly if trying to offload such rule for firmware steering. Fixes: 06b4eac9c4be ("net/mlx5e: Don't offload internal port if filter device is out device") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Tested-by: Frode Nordahl <frode.nordahl@canonical.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241203204920.232744-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-04net/mlx5e: SD, Use correct mdev to build channel paramTariq Toukan
In a multi-PF netdev, each traffic channel creates its own resources against a specific PF. In the cited commit, where this support was added, the channel_param logic was mistakenly kept unchanged, so it always used the primary PF which is found at priv->mdev. In this patch we fix this by moving the logic to be per-channel, and passing the correct mdev instance. This bug happened to be usually harmless, as the resulting cparam structures would be the same for all channels, due to identical FW logic and decisions. However, in some use cases, like fwreset, this gets broken. This could lead to different symptoms. Example: Error cqe on cqn 0x428, ci 0x0, qn 0x10a9, opcode 0xe, syndrome 0x4, vendor syndrome 0x32 Fixes: e4f9686bdee7 ("net/mlx5e: Let channels be SD-aware") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20241203204920.232744-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-04net/mlx5: E-Switch, Fix switching to switchdev mode in MPVPatrisious Haddad
Fix the mentioned commit change for MPV mode, since in MPV mode the IB device is shared between different core devices, so under this change when moving both devices simultaneously to switchdev mode the IB device removal and re-addition can race with itself causing unexpected behavior. In such case do rescan_drivers() only once in order to add the ethernet representor auxiliary device, and skip adding and removing IB devices. Fixes: ab85ebf43723 ("net/mlx5: E-switch, refactor eswitch mode change") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241203204920.232744-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-04net/mlx5: E-Switch, Fix switching to switchdev mode with IB device disabledPatrisious Haddad
In case that IB device is already disabled when moving to switchdev mode, which can happen when working with LAG, need to do rescan_drivers() before leaving in order to add ethernet representor auxiliary device. Fixes: ab85ebf43723 ("net/mlx5: E-switch, refactor eswitch mode change") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241203204920.232744-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-04net/mlx5: HWS: Properly set bwc queue locks lock classesCosmin Ratiu
The mentioned "Fixes" patch forgot to do that. Fixes: 9addffa34359 ("net/mlx5: HWS, use lock classes for bwc locks") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241203204920.232744-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-04net/mlx5: HWS: Fix memory leak in mlx5hws_definer_calc_layoutCosmin Ratiu
It allocates a match template, which creates a compressed definer fc struct, but that is not deallocated. This commit fixes that. Fixes: 74a778b4a63f ("net/mlx5: HWS, added definers handling") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241203204920.232744-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>