summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-10-31net: hns3: fix missing features due to dev->features configuration too earlyHao Lan
Currently, the netdev->features is configured in hns3_nic_set_features. As a result, __netdev_update_features considers that there is no feature difference, and the procedures of the real features are missing. Fixes: 2a7556bb2b73 ("net: hns3: implement ndo_features_check ops for hns3 driver") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-31net: hns3: fixed reset failure issues caused by the incorrect reset typeHao Lan
When a reset type that is not supported by the driver is input, a reset pending flag bit of the HNAE3_NONE_RESET type is generated in reset_pending. The driver does not have a mechanism to clear this type of error. As a result, the driver considers that the reset is not complete. This patch provides a mechanism to clear the HNAE3_NONE_RESET flag and the parameter of hnae3_ae_ops.set_default_reset_request is verified. The error message: hns3 0000:39:01.0: cmd failed -16 hns3 0000:39:01.0: hclge device re-init failed, VF is disabled! hns3 0000:39:01.0: failed to reset VF stack hns3 0000:39:01.0: failed to reset VF(4) hns3 0000:39:01.0: prepare reset(2) wait done hns3 0000:39:01.0 eth4: already uninitialized Use the crash tool to view struct hclgevf_dev: struct hclgevf_dev { ... default_reset_request = 0x20, reset_level = HNAE3_NONE_RESET, reset_pending = 0x100, reset_type = HNAE3_NONE_RESET, ... }; Fixes: 720bd5837e37 ("net: hns3: add set_default_reset_request in the hnae3_ae_ops") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-31net: hns3: add sync command to sync io-pgtableJian Shen
To avoid errors in pgtable prefectch, add a sync command to sync io-pagtable. This is a supplement for the previous patch. We want all the tx packet can be handled with tx bounce buffer path. But it depends on the remain space of the spare buffer, checked by the hns3_can_use_tx_bounce(). In most cases, maybe 99.99%, it returns true. But once it return false by no available space, the packet will be handled with the former path, which will map/unmap the skb buffer. Then the driver will face the smmu prefetch risk again. So add a sync command in this case to avoid smmu prefectch, just protects corner scenes. Fixes: 295ba232a8c3 ("net: hns3: add device version to replace pci revision") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-31net: hns3: default enable tx bounce buffer when smmu enabledPeiyang Wang
The SMMU engine on HIP09 chip has a hardware issue. SMMU pagetable prefetch features may prefetch and use a invalid PTE even the PTE is valid at that time. This will cause the device trigger fake pagefaults. The solution is to avoid prefetching by adding a SYNC command when smmu mapping a iova. But the performance of nic has a sharp drop. Then we do this workaround, always enable tx bounce buffer, avoid mapping/unmapping on TX path. This issue only affects HNS3, so we always enable tx bounce buffer when smmu enabled to improve performance. Fixes: 295ba232a8c3 ("net: hns3: add device version to replace pci revision") Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-31netfilter: nft_payload: sanitize offset and length before calling skb_checksum()Pablo Neira Ayuso
If access to offset + length is larger than the skbuff length, then skb_checksum() triggers BUG_ON(). skb_checksum() internally subtracts the length parameter while iterating over skbuff, BUG_ON(len) at the end of it checks that the expected length to be included in the checksum calculation is fully consumed. Fixes: 7ec3f7b47b8d ("netfilter: nft_payload: add packet mangling support") Reported-by: Slavin Liu <slavin-ayu@qq.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-31drm/panel: leadtek-ltk050h3146w: transition to mipi_dsi wrapped functionsTejas Vipin
Changes the leadtek-ltk050h3146w panel to use multi style functions for improved error handling. Signed-off-by: Tejas Vipin <tejasvipin76@gmail.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20241026035928.183454-1-tejasvipin76@gmail.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241026035928.183454-1-tejasvipin76@gmail.com
2024-10-31drm/panel: samsung-s6e88a0-ams427ap24: Add flip optionJakob Hauser
The way of implementing a flip option follows the existing panel-samsung-s6e8aa0.c [1][2][3]. The value to flip the screen is taken from a downstream kernel file of a similar but older panel [4]. The mipi clock [5] for the new panel samsung-s6e88a0-ams427ap24 matches 461 MHz and a hardware read-out of the 0xcb values corresponds to revision R01 of that older panel [6]. Although for samsung-s6e88a0-ams427ap24 that's in non-flipped state while in this older driver it seems to be the other way around. Further up there is a hint [7] basically saying for revision R01 to change the first word of the 0xcb command from 0x06 to 0x0e, which is actually setting BIT(3) of that word. This causes a horizontal flip. [1] https://github.com/torvalds/linux/blob/v6.11/drivers/gpu/drm/panel/panel-samsung-s6e8aa0.c#L103 [2] https://github.com/torvalds/linux/blob/v6.11/drivers/gpu/drm/panel/panel-samsung-s6e8aa0.c#L207-L211 [3] https://github.com/torvalds/linux/blob/v6.11/drivers/gpu/drm/panel/panel-samsung-s6e8aa0.c#L954-L974 [4] https://github.com/LineageOS/android_kernel_samsung_msm8930-common/blob/lineage-15.1/drivers/video/msm/mipi_samsung_oled_video_qhd_pt-8930.c [5] https://github.com/LineageOS/android_kernel_samsung_msm8930-common/blob/lineage-15.1/drivers/video/msm/mipi_samsung_oled_video_qhd_pt-8930.c#L2027-L2028 [6] https://github.com/LineageOS/android_kernel_samsung_msm8930-common/blob/lineage-15.1/drivers/video/msm/mipi_samsung_oled_video_qhd_pt-8930.c#L137-L151 [7] https://github.com/LineageOS/android_kernel_samsung_msm8930-common/blob/lineage-15.1/drivers/video/msm/mipi_samsung_oled_video_qhd_pt-8930.c#L66-L74 Signed-off-by: Jakob Hauser <jahau@rocketmail.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1491f275e9956b2da1f1e2580abd54f4e459c7d2.1730070570.git.jahau@rocketmail.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/1491f275e9956b2da1f1e2580abd54f4e459c7d2.1730070570.git.jahau@rocketmail.com
2024-10-31drm/panel: samsung-s6e88a0-ams427ap24: Add brightness controlJakob Hauser
The tables for brightness to candela, aid and elvss were taken from downstream kernel file "dsi_panel_S6E88A0_AMS427AP24_qhd_octa_video.dtsi" [1][2][3]. The gamma table gets generated in "ss_dsi_smart_dimming_S6E88A0_AMS427AP24.c" [4] with hard-coded starting values. The function smart_dimming_init() [5] goes through the v{*}_adjustments, generate_gray_scale and gamma_init procedure. Instead of calculating it manually, it's easier to compile a custom downstream kernel with SMART_DIMMING_DEBUG enabled and read out dmesg early at boot. Selection of the values for aid and elvss are again according to downstream file "dsi_panel_S6E88A0_AMS427AP24_qhd_octa_video.dtsi" [6][7]. The set of write commands is guided by downstream file "ss_dsi_panel_common.c" [8] followed by "ss_dsi_panel_S6E88A0_AMS427AP24.c" [9]. The dsi mode flag MIPI_DSI_MODE_VIDEO_NO_HFP prevents screen flickering while changing the brightness. [1] https://github.com/msm8916-mainline/linux-downstream/blob/GT-I9195I/drivers/video/msm/mdss/samsung/S6E88A0_AMS427AP24/dsi_panel_S6E88A0_AMS427AP24_qhd_octa_video.dtsi#L341-L397 [2] https://github.com/msm8916-mainline/linux-downstream/blob/GT-I9195I/drivers/video/msm/mdss/samsung/S6E88A0_AMS427AP24/dsi_panel_S6E88A0_AMS427AP24_qhd_octa_video.dtsi#L214-L254 [3] https://github.com/msm8916-mainline/linux-downstream/blob/GT-I9195I/drivers/video/msm/mdss/samsung/S6E88A0_AMS427AP24/dsi_panel_S6E88A0_AMS427AP24_qhd_octa_video.dtsi#L301-L320 [4] https://github.com/msm8916-mainline/linux-downstream/blob/GT-I9195I/drivers/video/msm/mdss/samsung/S6E88A0_AMS427AP24/ss_dsi_smart_dimming_S6E88A0_AMS427AP24.c [5] https://github.com/msm8916-mainline/linux-downstream/blob/GT-I9195I/drivers/video/msm/mdss/samsung/S6E88A0_AMS427AP24/ss_dsi_smart_dimming_S6E88A0_AMS427AP24.c#L1816-L1900 [6] https://github.com/msm8916-mainline/linux-downstream/blob/GT-I9195I/drivers/video/msm/mdss/samsung/S6E88A0_AMS427AP24/dsi_panel_S6E88A0_AMS427AP24_qhd_octa_video.dtsi#L256-L268 [7] https://github.com/msm8916-mainline/linux-downstream/blob/GT-I9195I/drivers/video/msm/mdss/samsung/S6E88A0_AMS427AP24/dsi_panel_S6E88A0_AMS427AP24_qhd_octa_video.dtsi#L322-L334 [8] https://github.com/msm8916-mainline/linux-downstream/blob/GT-I9195I/drivers/video/msm/mdss/samsung/ss_dsi_panel_common.c#L1389-L1517 [9] https://github.com/msm8916-mainline/linux-downstream/blob/GT-I9195I/drivers/video/msm/mdss/samsung/S6E88A0_AMS427AP24/ss_dsi_panel_S6E88A0_AMS427AP24.c#L666-L678 Signed-off-by: Jakob Hauser <jahau@rocketmail.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/30c154954a4e0f75d90f6ef6f393a5ba8b3ffea9.1730070570.git.jahau@rocketmail.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/30c154954a4e0f75d90f6ef6f393a5ba8b3ffea9.1730070570.git.jahau@rocketmail.com
2024-10-31drm/panel: samsung-s6e88a0-ams427ap24: Add initial driverJakob Hauser
This initial part of the panel driver was mostly generated by the "linux-mdss-dsi-panel-driver-generator" tool [1], reading downstream Android kernel file "dsi_panel_S6E88A0_AMS427AP24_qhd_octa_video.dtsi" [2]. On top of the generic output of the tool, there were a couple of changes applied: - Added mipi_dsi_dcs_set_display_on() to function s6e88a0_ams427ap24_on(), otherwise the display does not show up. - In functions s6e88a0_ams427ap24_on() and s6e88a0_ams427ap24_off() changed DSI commands to multi context and used "accum_err" returns. - In functions s6e88a0_ams427ap24_on() and s6e88a0_ams427ap24_off() replaced msleep() by mipi_dsi_msleep(). - The function s6e88a0_ams427ap24_get_modes() was changed to make use of drm_connector_helper_get_modes_fixed(). This also required to include drm/drm_probe_helper.h. - In function s6e88a0_ams427ap24_probe() registring the regulators was changed to devm_regulator_bulk_get_const(). This required to change supplies in struct s6e88a0_ams427ap24 to a pointer. - Removed bool "prepared" from struct s6e88a0_ams427ap24 and according parts in functions s6e88a0_ams427ap24_prepare() and s6e88a0_ams427ap24_unprepare(). - Removed include <linux/of.h>, it's not needed. - Added comments to the mipi_dsi_dcs_write_seq_multi() lines in function s6e88a0_ams427ap24_on(). [1] https://github.com/msm8916-mainline/linux-mdss-dsi-panel-driver-generator [2] https://github.com/msm8916-mainline/linux-downstream/blob/GT-I9195I/drivers/video/msm/mdss/samsung/S6E88A0_AMS427AP24/dsi_panel_S6E88A0_AMS427AP24_qhd_octa_video.dtsi Signed-off-by: Jakob Hauser <jahau@rocketmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/d92ef0036b66520bb6d1ec908165e776cf30c303.1730070570.git.jahau@rocketmail.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/d92ef0036b66520bb6d1ec908165e776cf30c303.1730070570.git.jahau@rocketmail.com
2024-10-31dt-bindings: display: panel: Add Samsung S6E88A0-AMS427AP24Jakob Hauser
Add bindings for Samsung AMS427AP24 panel with S6E88A0 controller. Signed-off-by: Jakob Hauser <jahau@rocketmail.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/d60428e7dad57efd42948adfd34b81335dabcd1f.1730070570.git.jahau@rocketmail.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/d60428e7dad57efd42948adfd34b81335dabcd1f.1730070570.git.jahau@rocketmail.com
2024-10-31dt-bindings: display: panel: Move flip properties to panel-commonJakob Hauser
The flip properties were used by "samsung,s6e8aa0.yaml" only so far. By introducing "samsung,s6e88a0-ams427ap24.yaml" they become more common. Signed-off-by: Jakob Hauser <jahau@rocketmail.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/e4edd8fc07b89006207f5f8962a816ff7f8f4601.1730070570.git.jahau@rocketmail.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/e4edd8fc07b89006207f5f8962a816ff7f8f4601.1730070570.git.jahau@rocketmail.com
2024-10-31drm/tests: hdmi: Fix memory leaks in drm_display_mode_from_cea_vic()Jinjie Ruan
modprobe drm_hdmi_state_helper_test and then rmmod it, the following memory leak occurs. The `mode` allocated in drm_mode_duplicate() called by drm_display_mode_from_cea_vic() is not freed, which cause the memory leak: unreferenced object 0xffffff80ccd18100 (size 128): comm "kunit_try_catch", pid 1851, jiffies 4295059695 hex dump (first 32 bytes): 57 62 00 00 80 02 90 02 f0 02 20 03 00 00 e0 01 Wb........ ..... ea 01 ec 01 0d 02 00 00 0a 00 00 00 00 00 00 00 ................ backtrace (crc c2f1aa95): [<000000000f10b11b>] kmemleak_alloc+0x34/0x40 [<000000001cd4cf73>] __kmalloc_cache_noprof+0x26c/0x2f4 [<00000000f1f3cffa>] drm_mode_duplicate+0x44/0x19c [<000000008cbeef13>] drm_display_mode_from_cea_vic+0x88/0x98 [<0000000019daaacf>] 0xffffffedc11ae69c [<000000000aad0f85>] kunit_try_run_case+0x13c/0x3ac [<00000000a9210bac>] kunit_generic_run_threadfn_adapter+0x80/0xec [<000000000a0b2e9e>] kthread+0x2e8/0x374 [<00000000bd668858>] ret_from_fork+0x10/0x20 ...... Free `mode` by using drm_kunit_display_mode_from_cea_vic() to fix it. Cc: stable@vger.kernel.org Fixes: 4af70f19e559 ("drm/tests: Add RGB Quantization tests") Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241030023504.530425-4-ruanjinjie@huawei.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-10-31drm/connector: hdmi: Fix memory leak in drm_display_mode_from_cea_vic()Jinjie Ruan
modprobe drm_connector_test and then rmmod drm_connector_test, the following memory leak occurs. The `mode` allocated in drm_mode_duplicate() called by drm_display_mode_from_cea_vic() is not freed, which cause the memory leak: unreferenced object 0xffffff80cb0ee400 (size 128): comm "kunit_try_catch", pid 1948, jiffies 4294950339 hex dump (first 32 bytes): 14 44 02 00 80 07 d8 07 04 08 98 08 00 00 38 04 .D............8. 3c 04 41 04 65 04 00 00 05 00 00 00 00 00 00 00 <.A.e........... backtrace (crc 90e9585c): [<00000000ec42e3d7>] kmemleak_alloc+0x34/0x40 [<00000000d0ef055a>] __kmalloc_cache_noprof+0x26c/0x2f4 [<00000000c2062161>] drm_mode_duplicate+0x44/0x19c [<00000000f96c74aa>] drm_display_mode_from_cea_vic+0x88/0x98 [<00000000d8f2c8b4>] 0xffffffdc982a4868 [<000000005d164dbc>] kunit_try_run_case+0x13c/0x3ac [<000000006fb23398>] kunit_generic_run_threadfn_adapter+0x80/0xec [<000000006ea56ca0>] kthread+0x2e8/0x374 [<000000000676063f>] ret_from_fork+0x10/0x20 ...... Free `mode` by using drm_kunit_display_mode_from_cea_vic() to fix it. Cc: stable@vger.kernel.org Fixes: abb6f74973e2 ("drm/tests: Add HDMI TDMS character rate tests") Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241030023504.530425-3-ruanjinjie@huawei.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-10-31drm/tests: helpers: Add helper for drm_display_mode_from_cea_vic()Jinjie Ruan
As Maxime suggested, add a new helper drm_kunit_display_mode_from_cea_vic(), it can replace the direct call of drm_display_mode_from_cea_vic(), and it will help solving the `mode` memory leaks. Acked-by: Maxime Ripard <mripard@kernel.org> Suggested-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241030023504.530425-2-ruanjinjie@huawei.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-10-30mm: allow set/clear page_type againYu Zhao
Some page flags (page->flags) were converted to page types (page->page_types). A recent example is PG_hugetlb. From the exclusive writer's perspective, e.g., a thread doing __folio_set_hugetlb(), there is a difference between the page flag and type APIs: the former allows the same non-atomic operation to be repeated whereas the latter does not. For example, calling __folio_set_hugetlb() twice triggers VM_BUG_ON_FOLIO(), since the second call expects the type (PG_hugetlb) not to be set previously. Using add_hugetlb_folio() as an example, it calls __folio_set_hugetlb() in the following error-handling path. And when that happens, it triggers the aforementioned VM_BUG_ON_FOLIO(). if (folio_test_hugetlb(folio)) { rc = hugetlb_vmemmap_restore_folio(h, folio); if (rc) { spin_lock_irq(&hugetlb_lock); add_hugetlb_folio(h, folio, false); ... It is possible to make hugeTLB comply with the new requirements from the page type API. However, a straightforward fix would be to just allow the same page type to be set or cleared again inside the API, to avoid any changes to its callers. Link: https://lkml.kernel.org/r/20241020042212.296781-1-yuzhao@google.com Fixes: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType") Signed-off-by: Yu Zhao <yuzhao@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-30nilfs2: fix potential deadlock with newly created symlinksRyusuke Konishi
Syzbot reported that page_symlink(), called by nilfs_symlink(), triggers memory reclamation involving the filesystem layer, which can result in circular lock dependencies among the reader/writer semaphore nilfs->ns_segctor_sem, s_writers percpu_rwsem (intwrite) and the fs_reclaim pseudo lock. This is because after commit 21fc61c73c39 ("don't put symlink bodies in pagecache into highmem"), the gfp flags of the page cache for symbolic links are overwritten to GFP_KERNEL via inode_nohighmem(). This is not a problem for symlinks read from the backing device, because the __GFP_FS flag is dropped after inode_nohighmem() is called. However, when a new symlink is created with nilfs_symlink(), the gfp flags remain overwritten to GFP_KERNEL. Then, memory allocation called from page_symlink() etc. triggers memory reclamation including the FS layer, which may call nilfs_evict_inode() or nilfs_dirty_inode(). And these can cause a deadlock if they are called while nilfs->ns_segctor_sem is held: Fix this issue by dropping the __GFP_FS flag from the page cache GFP flags of newly created symlinks in the same way that nilfs_new_inode() and __nilfs_read_inode() do, as a workaround until we adopt nofs allocation scope consistently or improve the locking constraints. Link: https://lkml.kernel.org/r/20241020050003.4308-1-konishi.ryusuke@gmail.com Fixes: 21fc61c73c39 ("don't put symlink bodies in pagecache into highmem") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+9ef37ac20608f4836256@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9ef37ac20608f4836256 Tested-by: syzbot+9ef37ac20608f4836256@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-30Squashfs: fix variable overflow in squashfs_readpage_blockPhillip Lougher
Syzbot reports a slab out of bounds access in squashfs_readpage_block(). This is caused by an attempt to read page index 0x2000000000. This value (start_index) is stored in an integer loop variable which overflows producing a value of 0. This causes a loop which iterates over pages start_index -> end_index to iterate over 0 -> end_index, which ultimately causes an out of bounds page array access. Fix by changing variable to a loff_t, and rename to index to make it clearer it is a page index, and not a loop count. Link: https://lkml.kernel.org/r/20241020232200.837231-1-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: "Lai, Yi" <yi1.lai@linux.intel.com> Closes: https://lore.kernel.org/all/ZwzcnCAosIPqQ9Ie@ly-workstation/ Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-30kasan: remove vmalloc_percpu testAndrey Konovalov
Commit 1a2473f0cbc0 ("kasan: improve vmalloc tests") added the vmalloc_percpu KASAN test with the assumption that __alloc_percpu always uses vmalloc internally, which is tagged by KASAN. However, __alloc_percpu might allocate memory from the first per-CPU chunk, which is not allocated via vmalloc(). As a result, the test might fail. Remove the test until proper KASAN annotation for the per-CPU allocated are added; tracked in https://bugzilla.kernel.org/show_bug.cgi?id=215019. Link: https://lkml.kernel.org/r/20241022160706.38943-1-andrey.konovalov@linux.dev Fixes: 1a2473f0cbc0 ("kasan: improve vmalloc tests") Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com> Reported-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/all/4a245fff-cc46-44d1-a5f9-fd2f1c3764ae@sifive.com/ Reported-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Link: https://lore.kernel.org/all/CACzwLxiWzNqPBp4C1VkaXZ2wDwvY3yZeetCi1TLGFipKW77drA@mail.gmail.com/ Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-30tools/mm: -Werror fixes in page-types/slabinfoWladislav Wiebe
Commit e6d2c436ff693 ("tools/mm: allow users to provide additional cflags/ldflags") passes now CFLAGS to Makefile. With this, build systems with default -Werror enabled found: slabinfo.c:1300:25: error: ignoring return value of 'chdir' declared with attribute 'warn_unused_result' [-Werror=unused-result]                          chdir("..");                          ^~~~~~~~~~~ page-types.c:397:35: error: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'uint64_t' {aka 'long long unsigned int'} [-Werror=format=]                          printf("%lu\t", mapcnt0);                                  ~~^     ~~~~~~~ .. Fix page-types by using PRIu64 for uint64_t prints and check in slabinfo for return code on chdir(".."). Link: https://lkml.kernel.org/r/c1ceb507-94bc-461c-934d-c19b77edd825@gmail.com Fixes: e6d2c436ff69 ("tools/mm: allow users to provide additional cflags/ldflags") Signed-off-by: Wladislav Wiebe <wladislav.kw@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Herton R. Krzesinski <herton@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-30mm, swap: avoid over reclaim of full clustersKairui Song
When running low on usable slots, cluster allocator will try to reclaim the full clusters aggressively to reclaim HAS_CACHE slots. This guarantees that as long as there are any usable slots, HAS_CACHE or not, the swap device will be usable and workload won't go OOM early. Before the cluster allocator, swap allocator fails easily if device is filled up with reclaimable HAS_CACHE slots. Which can be easily reproduced with following simple program: #include <stdio.h> #include <string.h> #include <linux/mman.h> #include <sys/mman.h> #define SIZE 8192UL * 1024UL * 1024UL int main(int argc, char **argv) { long tmp; char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); memset(p, 0, SIZE); madvise(p, SIZE, MADV_PAGEOUT); for (unsigned long i = 0; i < SIZE; ++i) tmp += p[i]; getchar(); /* Pause */ return 0; } Setup an 8G non ramdisk swap, the first run of the program will swapout 8G ram successfully. But run same program again after the first run paused, the second run can't swapout all 8G memory as now half of the swap device is pinned by HAS_CACHE. There was a random scan in the old allocator that may reclaim part of the HAS_CACHE by luck, but it's unreliable. The new allocator's added reclaim of full clusters when device is low on usable slots. But when multiple CPUs are seeing the device is low on usable slots at the same time, they ran into a thundering herd problem. This is an observable problem on large machine with mass parallel workload, as full cluster reclaim is slower on large swap device and higher number of CPUs will also make things worse. Testing using a 128G ZRAM on a 48c96t system. When the swap device is very close to full (eg. 124G / 128G), running build linux kernel with make -j96 in a 1G memory cgroup will hung (not a softlockup though) spinning in full cluster reclaim for about ~5min before go OOM. To solve this, split the full reclaim into two parts: - Instead of do a synchronous aggressively reclaim when device is low, do only one aggressively reclaim when device is strictly full with a kworker. This still ensures in worst case the device won't be unusable because of HAS_CACHE slots. - To avoid allocation (especially higher order) suffer from HAS_CACHE filling up clusters and kworker not responsive enough, do one synchronous scan every time the free list is drained, and only scan one cluster. This is kind of similar to the random reclaim before, keeps the full clusters rotated and has a minimal latency. This should provide a fair reclaim strategy suitable for most workloads. Link: https://lkml.kernel.org/r/20241022175512.10398-1-ryncsn@gmail.com Fixes: 2cacbdfdee65 ("mm: swap: add a adaptive full cluster cache reclaim") Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Chris Li <chrisl@kernel.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-30mm: fix PSWPIN counter for large folios swap-inBarry Song
Similar to PSWPOUT, we should count the number of base pages instead of large folios. Link: https://lkml.kernel.org/r/20241023210201.2798-1-21cnbao@gmail.com Fixes: 242d12c98174 ("mm: support large folios swap-in for sync io devices") Signed-off-by: Barry Song <v-songbaohua@oppo.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chris Li <chrisl@kernel.org> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Kairui Song <kasong@tencent.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Cc: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-30mm: avoid VM_BUG_ON when try to map an anon large folio to zero page.Zi Yan
An anonymous large folio can be split into non order-0 folios, try_to_map_unused_to_zeropage() should not VM_BUG_ON compound pages but just return false. This fixes the crash when splitting anonymous large folios to non order-0 folios. Link: https://lkml.kernel.org/r/20241023171236.1122535-1-ziy@nvidia.com Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp") Signed-off-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Usama Arif <usamaarif642@gmail.com> Cc: Barry Song <baohua@kernel.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-30mm/codetag: fix null pointer check logic for ref and tagHao Ge
When we compile and load lib/slub_kunit.c,it will cause a panic. The root cause is that __kmalloc_cache_noprof was directly called instead of kmem_cache_alloc,which resulted in no alloc_tag being allocated.This caused current->alloc_tag to be null,leading to a null pointer dereference in alloc_tag_ref_set. Despite the fact that my colleague Pei Xiao will later fix the code in slub_kunit.c,we still need fix null pointer check logic for ref and tag to avoid panic caused by a null pointer dereference. Here is the log for the panic: [ 74.779373][ T2158] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 [ 74.780130][ T2158] Mem abort info: [ 74.780406][ T2158] ESR = 0x0000000096000004 [ 74.780756][ T2158] EC = 0x25: DABT (current EL), IL = 32 bits [ 74.781225][ T2158] SET = 0, FnV = 0 [ 74.781529][ T2158] EA = 0, S1PTW = 0 [ 74.781836][ T2158] FSC = 0x04: level 0 translation fault [ 74.782288][ T2158] Data abort info: [ 74.782577][ T2158] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 74.783068][ T2158] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 74.783533][ T2158] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 74.784010][ T2158] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000105f34000 [ 74.784586][ T2158] [0000000000000020] pgd=0000000000000000, p4d=0000000000000000 [ 74.785293][ T2158] Internal error: Oops: 0000000096000004 [#1] SMP [ 74.785805][ T2158] Modules linked in: slub_kunit kunit ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle 4 [ 74.790661][ T2158] CPU: 0 UID: 0 PID: 2158 Comm: kunit_try_catch Kdump: loaded Tainted: G W N 6.12.0-rc3+ #2 [ 74.791535][ T2158] Tainted: [W]=WARN, [N]=TEST [ 74.791889][ T2158] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [ 74.792479][ T2158] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 74.793101][ T2158] pc : alloc_tagging_slab_alloc_hook+0x120/0x270 [ 74.793607][ T2158] lr : alloc_tagging_slab_alloc_hook+0x120/0x270 [ 74.794095][ T2158] sp : ffff800084d33cd0 [ 74.794418][ T2158] x29: ffff800084d33cd0 x28: 0000000000000000 x27: 0000000000000000 [ 74.795095][ T2158] x26: 0000000000000000 x25: 0000000000000012 x24: ffff80007b30e314 [ 74.795822][ T2158] x23: ffff000390ff6f10 x22: 0000000000000000 x21: 0000000000000088 [ 74.796555][ T2158] x20: ffff000390285840 x19: fffffd7fc3ef7830 x18: ffffffffffffffff [ 74.797283][ T2158] x17: ffff8000800e63b4 x16: ffff80007b33afc4 x15: ffff800081654c00 [ 74.798011][ T2158] x14: 0000000000000000 x13: 205d383531325420 x12: 5b5d383734363537 [ 74.798744][ T2158] x11: ffff800084d337e0 x10: 000000000000005d x9 : 00000000ffffffd0 [ 74.799476][ T2158] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008219d188 x6 : c0000000ffff7fff [ 74.800206][ T2158] x5 : ffff0003fdbc9208 x4 : ffff800081edd188 x3 : 0000000000000001 [ 74.800932][ T2158] x2 : 0beaa6dee1ac5a00 x1 : 0beaa6dee1ac5a00 x0 : ffff80037c2cb000 [ 74.801656][ T2158] Call trace: [ 74.801954][ T2158] alloc_tagging_slab_alloc_hook+0x120/0x270 [ 74.802494][ T2158] __kmalloc_cache_noprof+0x148/0x33c [ 74.802976][ T2158] test_kmalloc_redzone_access+0x4c/0x104 [slub_kunit] [ 74.803607][ T2158] kunit_try_run_case+0x70/0x17c [kunit] [ 74.804124][ T2158] kunit_generic_run_threadfn_adapter+0x2c/0x4c [kunit] [ 74.804768][ T2158] kthread+0x10c/0x118 [ 74.805141][ T2158] ret_from_fork+0x10/0x20 [ 74.805540][ T2158] Code: b9400a80 11000400 b9000a80 97ffd858 (f94012d3) [ 74.806176][ T2158] SMP: stopping secondary CPUs [ 74.808130][ T2158] Starting crashdump kernel... Link: https://lkml.kernel.org/r/20241020070819.307944-1-hao.ge@linux.dev Fixes: e0a955bf7f61 ("mm/codetag: add pgalloc_tag_copy()") Signed-off-by: Hao Ge <gehao@kylinos.cn> Acked-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Yu Zhao <yuzhao@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-30mm/gup: stop leaking pinned pages in low memory conditionsJohn Hubbard
If a driver tries to call any of the pin_user_pages*(FOLL_LONGTERM) family of functions, and requests "too many" pages, then the call will erroneously leave pages pinned. This is visible in user space as an actual memory leak. Repro is trivial: just make enough pin_user_pages(FOLL_LONGTERM) calls to exhaust memory. The root cause of the problem is this sequence, within __gup_longterm_locked(): __get_user_pages_locked() rc = check_and_migrate_movable_pages() ...which gets retried in a loop. The loop error handling is incomplete, clearly due to a somewhat unusual and complicated tri-state error API. But anyway, if -ENOMEM, or in fact, any unexpected error is returned from check_and_migrate_movable_pages(), then __gup_longterm_locked() happily returns the error, while leaving the pages pinned. In the failed case, which is an app that requests (via a device driver) 30720000000 bytes to be pinned, and then exits, I see this: $ grep foll /proc/vmstat nr_foll_pin_acquired 7502048 nr_foll_pin_released 2048 And after applying this patch, it returns to balanced pins: $ grep foll /proc/vmstat nr_foll_pin_acquired 7502048 nr_foll_pin_released 7502048 Note that the child routine, check_and_migrate_movable_folios(), avoids this problem, by unpinning any folios in the **folios argument, before returning an error. Fix this by making check_and_migrate_movable_pages() behave in exactly the same way as check_and_migrate_movable_folios(): unpin all pages in **pages, before returning an error. Also, documentation was an aggravating factor, so: 1) Consolidate the documentation for these two routines, now that they have identical external behavior. 2) Rewrite the consolidated documentation: a) Clearly list the three return code cases, and what happens in each case. b) Mention that one of the cases unpins the pages or folios, before returning an error code. Link: https://lkml.kernel.org/r/20241018223411.310331-1-jhubbard@nvidia.com Fixes: 24a95998e9ba ("mm/gup.c: simplify and fix check_and_migrate_movable_pages() return codes") Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Suggested-by: David Hildenbrand <david@redhat.com> Cc: Shigeru Yoshida <syoshida@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-30net: ethernet: mtk_wed: fix path of MT7988 WO firmwareDaniel Golle
linux-firmware commit 808cba84 ("mtk_wed: add firmware for mt7988 Wireless Ethernet Dispatcher") added mt7988_wo_{0,1}.bin in the 'mediatek/mt7988' directory while driver current expects the files in the 'mediatek' directory. Change path in the driver header now that the firmware has been added. Fixes: e2f64db13aa1 ("net: ethernet: mtk_wed: introduce WED support for MT7988") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patch.msgid.link/Zxz0GWTR5X5LdWPe@pidgin.makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30Merge branch 'mlxsw-fixes'Jakub Kicinski
Petr Machata says: ==================== mlxsw: Fixes In this patchset: - Tx header should be pushed for each packet which is transmitted via Spectrum ASICs. Patch #1 adds a missing call to skb_cow_head() to make sure that there is both enough room to push the Tx header and that the SKB header is not cloned and can be modified. - Commit b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers allocation") converted mlxsw to use page pool for Rx buffers allocation. Sync for CPU and for device should be done for Rx pages. In patches #2 and #3, add the missing calls to sync pages for, respectively, CPU and the device. - Patch #4 then fixes a bug to IPv6 GRE forwarding offload. Patch #5 adds a generic forwarding test that fails with mlxsw ports prior to the fix. ==================== Link: https://patch.msgid.link/cover.1729866134.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30selftests: forwarding: Add IPv6 GRE remote change testsIdo Schimmel
Test that after changing the remote address of an ip6gre net device traffic is forwarded as expected. Test with both flat and hierarchical topologies and with and without an input / output keys. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/02b05246d2cdada0cf2fccffc0faa8a424d0f51b.1729866134.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 addressIdo Schimmel
The device stores IPv6 addresses that are used for encapsulation in linear memory that is managed by the driver. Changing the remote address of an ip6gre net device never worked properly, but since cited commit the following reproducer [1] would result in a warning [2] and a memory leak [3]. The problem is that the new remote address is never added by the driver to its hash table (and therefore the device) and the old address is never removed from it. Fix by programming the new address when the configuration of the ip6gre net device changes and removing the old one. If the address did not change, then the above would result in increasing the reference count of the address and then decreasing it. [1] # ip link add name bla up type ip6gre local 2001:db8:1::1 remote 2001:db8:2::1 tos inherit ttl inherit # ip link set dev bla type ip6gre remote 2001:db8:3::1 # ip link del dev bla # devlink dev reload pci/0000:01:00.0 [2] WARNING: CPU: 0 PID: 1682 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3002 mlxsw_sp_ipv6_addr_put+0x140/0x1d0 Modules linked in: CPU: 0 UID: 0 PID: 1682 Comm: ip Not tainted 6.12.0-rc3-custom-g86b5b55bc835 #151 Hardware name: Nvidia SN5600/VMOD0013, BIOS 5.13 05/31/2023 RIP: 0010:mlxsw_sp_ipv6_addr_put+0x140/0x1d0 [...] Call Trace: <TASK> mlxsw_sp_router_netdevice_event+0x55f/0x1240 notifier_call_chain+0x5a/0xd0 call_netdevice_notifiers_info+0x39/0x90 unregister_netdevice_many_notify+0x63e/0x9d0 rtnl_dellink+0x16b/0x3a0 rtnetlink_rcv_msg+0x142/0x3f0 netlink_rcv_skb+0x50/0x100 netlink_unicast+0x242/0x390 netlink_sendmsg+0x1de/0x420 ____sys_sendmsg+0x2bd/0x320 ___sys_sendmsg+0x9a/0xe0 __sys_sendmsg+0x7a/0xd0 do_syscall_64+0x9e/0x1a0 entry_SYSCALL_64_after_hwframe+0x77/0x7f [3] unreferenced object 0xffff898081f597a0 (size 32): comm "ip", pid 1626, jiffies 4294719324 hex dump (first 32 bytes): 20 01 0d b8 00 02 00 00 00 00 00 00 00 00 00 01 ............... 21 49 61 83 80 89 ff ff 00 00 00 00 01 00 00 00 !Ia............. backtrace (crc fd9be911): [<00000000df89c55d>] __kmalloc_cache_noprof+0x1da/0x260 [<00000000ff2a1ddb>] mlxsw_sp_ipv6_addr_kvdl_index_get+0x281/0x340 [<000000009ddd445d>] mlxsw_sp_router_netdevice_event+0x47b/0x1240 [<00000000743e7757>] notifier_call_chain+0x5a/0xd0 [<000000007c7b9e13>] call_netdevice_notifiers_info+0x39/0x90 [<000000002509645d>] register_netdevice+0x5f7/0x7a0 [<00000000c2e7d2a9>] ip6gre_newlink_common.isra.0+0x65/0x130 [<0000000087cd6d8d>] ip6gre_newlink+0x72/0x120 [<000000004df7c7cc>] rtnl_newlink+0x471/0xa20 [<0000000057ed632a>] rtnetlink_rcv_msg+0x142/0x3f0 [<0000000032e0d5b5>] netlink_rcv_skb+0x50/0x100 [<00000000908bca63>] netlink_unicast+0x242/0x390 [<00000000cdbe1c87>] netlink_sendmsg+0x1de/0x420 [<0000000011db153e>] ____sys_sendmsg+0x2bd/0x320 [<000000003b6d53eb>] ___sys_sendmsg+0x9a/0xe0 [<00000000cae27c62>] __sys_sendmsg+0x7a/0xd0 Fixes: cf42911523e0 ("mlxsw: spectrum_ipip: Use common hash table for IPv6 address mapping") Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/e91012edc5a6cb9df37b78fd377f669381facfcb.1729866134.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30mlxsw: pci: Sync Rx buffers for deviceAmit Cohen
Non-coherent architectures, like ARM, may require invalidating caches before the device can use the DMA mapped memory, which means that before posting pages to device, drivers should sync the memory for device. Sync for device can be configured as page pool responsibility. Set the relevant flag and define max_len for sync. Cc: Jiri Pirko <jiri@resnulli.us> Fixes: b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers allocation") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/92e01f05c4f506a4f0a9b39c10175dcc01994910.1729866134.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30mlxsw: pci: Sync Rx buffers for CPUAmit Cohen
When Rx packet is received, drivers should sync the pages for CPU, to ensure the CPU reads the data written by the device and not stale data from its cache. Add the missing sync call in Rx path, sync the actual length of data for each fragment. Cc: Jiri Pirko <jiri@resnulli.us> Fixes: b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers allocation") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/461486fac91755ca4e04c2068c102250026dcd0b.1729866134.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30mlxsw: spectrum_ptp: Add missing verification before pushing Tx headerAmit Cohen
Tx header should be pushed for each packet which is transmitted via Spectrum ASICs. The cited commit moved the call to skb_cow_head() from mlxsw_sp_port_xmit() to functions which handle Tx header. In case that mlxsw_sp->ptp_ops->txhdr_construct() is used to handle Tx header, and txhdr_construct() is mlxsw_sp_ptp_txhdr_construct(), there is no call for skb_cow_head() before pushing Tx header size to SKB. This flow is relevant for Spectrum-1 and Spectrum-4, for PTP packets. Add the missing call to skb_cow_head() to make sure that there is both enough room to push the Tx header and that the SKB header is not cloned and can be modified. An additional set will be sent to net-next to centralize the handling of the Tx header by pushing it to every packet just before transmission. Cc: Richard Cochran <richardcochran@gmail.com> Fixes: 24157bc69f45 ("mlxsw: Send PTP packets as data packets to overcome a limitation") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/5145780b07ebbb5d3b3570f311254a3a2d554a44.1729866134.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extensionBenoît Monin
As documented in skbuff.h, devices with NETIF_F_IPV6_CSUM capability can only checksum TCP and UDP over IPv6 if the IP header does not contains extension. This is enforced for UDP packets emitted from user-space to an IPv6 address as they go through ip6_make_skb(), which calls __ip6_append_data() where a check is done on the header size before setting CHECKSUM_PARTIAL. But the introduction of UDP encapsulation with fou6 added a code-path where it is possible to get an skb with a partial UDP checksum and an IPv6 header with extension: * fou6 adds a UDP header with a partial checksum if the inner packet does not contains a valid checksum. * ip6_tunnel adds an IPv6 header with a destination option extension header if encap_limit is non-zero (the default value is 4). The thread linked below describes in more details how to reproduce the problem with GRE-in-UDP tunnel. Add a check on the network header size in skb_csum_hwoffload_help() to make sure no IPv6 packet with extension header is handed to a network device with NETIF_F_IPV6_CSUM capability. Link: https://lore.kernel.org/netdev/26548921.1r3eYUQgxm@benoit.monin/T/#u Fixes: aa3463d65e7b ("fou: Add encap ops for IPv6 tunnels") Signed-off-by: Benoît Monin <benoit.monin@gmx.fr> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/5fbeecfc311ea182aa1d1c771725ab8b4cac515e.1729778144.git.benoit.monin@gmx.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31cfi: tweak llvm version for HAVE_CFI_ICALL_NORMALIZE_INTEGERSAlice Ryhl
The llvm fix [1] did not make it for 19.0.0, but ended up getting backported to llvm 19.1.3 [2]. Thus, fix the version requirement to correctly specify which versions have the bug. Link: https://github.com/llvm/llvm-project/pull/104826 [1] Link: https://github.com/llvm/llvm-project/pull/113938 [2] Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202410281414.c351044e-oliver.sang@intel.com Fixes: 8b8ca9c25fe6 ("cfi: fix conditions for HAVE_CFI_ICALL_NORMALIZE_INTEGERS") Signed-off-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20241030-cfi-icall-1913-v1-1-ab8a26e13733@google.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-10-30sched/ext: Fix scx vs sched_delayedPeter Zijlstra
Commit 98442f0ccd82 ("sched: Fix delayed_dequeue vs switched_from_fair()") forgot about scx :/ Fixes: 98442f0ccd82 ("sched: Fix delayed_dequeue vs switched_from_fair()") Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lkml.kernel.org/r/20241030104934.GK14555@noisy.programming.kicks-ass.net
2024-10-30x86/uaccess: Avoid barrier_nospec() in 64-bit copy_from_user()Linus Torvalds
The barrier_nospec() in 64-bit copy_from_user() is slow. Instead use pointer masking to force the user pointer to all 1's for an invalid address. The kernel test robot reports a 2.6% improvement in the per_thread_ops benchmark [1]. This is a variation on a patch originally by Josh Poimboeuf [2]. Link: https://lore.kernel.org/202410281344.d02c72a2-oliver.sang@intel.com [1] Link: https://lore.kernel.org/5b887fe4c580214900e21f6c61095adf9a142735.1730166635.git.jpoimboe@kernel.org [2] Tested-and-reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-10-30Merge tag 'perf-tools-fixes-for-v6.12-2-2024-10-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Arnaldo Carvalho de Melo: - Update more header copies with the kernel sources, including const.h, msr-index.h, arm64's cputype.h, kvm's, bits.h and unaligned.h - The return from 'write' isn't a pid, fix cut'n'paste error in 'perf trace' - Fix up the python binding build on architectures without HAVE_KVM_STAT_SUPPORT - Add some more bounds checks to augmented_raw_syscalls.bpf.c (used to collect syscall pointer arguments in 'perf trace') to make the resulting bytecode to pass the kernel BPF verifier, allowing us to go back accepting clang 12.0.1 as the minimum version required for compiling BPF sources - Add __NR_capget for x86 to fix a regression on running perf + intel PT (hw tracing) as non-root setting up the capabilities as described in https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html - Fix missing syscalltbl in non-explicitly listed architectures, noticed on ARM 32-bit, that still needs a .tbl generator for the syscall id<->name tables, should be added for v6.13 - Handle 'perf test' failure when handling broken DWARF for ASM files * tag 'perf-tools-fixes-for-v6.12-2-2024-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf cap: Add __NR_capget to arch/x86 unistd tools headers: Update the linux/unaligned.h copy with the kernel sources tools headers arm64: Sync arm64's cputype.h with the kernel sources tools headers: Synchronize {uapi/}linux/bits.h with the kernel sources tools arch x86: Sync the msr-index.h copy with the kernel sources perf python: Fix up the build on architectures without HAVE_KVM_STAT_SUPPORT perf test: Handle perftool-testsuite_probe failure due to broken DWARF tools headers UAPI: Sync kvm headers with the kernel sources perf trace: Fix non-listed archs in the syscalltbl routines perf build: Change the clang check back to 12.0.1 perf trace augmented_raw_syscalls: Add more checks to pass the verifier perf trace augmented_raw_syscalls: Add extra array index bounds checking to satisfy some BPF verifiers perf trace: The return from 'write' isn't a pid tools headers UAPI: Sync linux/const.h with the kernel headers
2024-10-30rpcrdma: Always release the rpcrdma_device's xa_arrayChuck Lever
Dai pointed out that the xa_init_flags() in rpcrdma_add_one() needs to have a matching xa_destroy() in rpcrdma_remove_one() to release underlying memory that the xarray might have accrued during operation. Reported-by: Dai Ngo <dai.ngo@oracle.com> Fixes: 7e86845a0346 ("rpcrdma: Implement generic device removal") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-10-30Merge branch 'fixes-for-bits-iterator'Alexei Starovoitov
Hou Tao says: ==================== The patch set fixes several issues in bits iterator. Patch #1 fixes the kmemleak problem of bits iterator. Patch #2~#3 fix the overflow problem of nr_bits. Patch #4 fixes the potential stack corruption when bits iterator is used on 32-bit host. Patch #5 adds more test cases for bits iterator. Please see the individual patches for more details. And comments are always welcome. --- v4: * patch #1: add ack from Yafang * patch #3: revert code-churn like changes: (1) compute nr_bytes and nr_bits before the check of nr_words. (2) use nr_bits == 64 to check for single u64, preventing build warning on 32-bit hosts. * patch #4: use "BITS_PER_LONG == 32" instead of "!defined(CONFIG_64BIT)" v3: https://lore.kernel.org/bpf/20241025013233.804027-1-houtao@huaweicloud.com/T/#t * split the bits-iterator related patches from "Misc fixes for bpf" patch set * patch #1: use "!nr_bits || bits >= nr_bits" to stop the iteration * patch #2: add a new helper for the overflow problem * patch #3: decrease the limitation from 512 to 511 and check whether nr_bytes is too large for bpf memory allocator explicitly * patch #5: add two more test cases for bit iterator v2: http://lore.kernel.org/bpf/d49fa2f4-f743-c763-7579-c3cab4dd88cb@huaweicloud.com ==================== Link: https://lore.kernel.org/r/20241030100516.3633640-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-30selftests/bpf: Add three test cases for bits_iterHou Tao
Add more test cases for bits iterator: (1) huge word test Verify the multiplication overflow of nr_bits in bits_iter. Without the overflow check, when nr_words is 67108865, nr_bits becomes 64, causing bpf_probe_read_kernel_common() to corrupt the stack. (2) max word test Verify correct handling of maximum nr_words value (511). (3) bad word test Verify early termination of bits iteration when bits iterator initialization fails. Also rename bits_nomem to bits_too_big to better reflect its purpose. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241030100516.3633640-6-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-30bpf: Use __u64 to save the bits in bits iteratorHou Tao
On 32-bit hosts (e.g., arm32), when a bpf program passes a u64 to bpf_iter_bits_new(), bpf_iter_bits_new() will use bits_copy to store the content of the u64. However, bits_copy is only 4 bytes, leading to stack corruption. The straightforward solution would be to replace u64 with unsigned long in bpf_iter_bits_new(). However, this introduces confusion and problems for 32-bit hosts because the size of ulong in bpf program is 8 bytes, but it is treated as 4-bytes after passed to bpf_iter_bits_new(). Fix it by changing the type of both bits and bit_count from unsigned long to u64. However, the change is not enough. The main reason is that bpf_iter_bits_next() uses find_next_bit() to find the next bit and the pointer passed to find_next_bit() is an unsigned long pointer instead of a u64 pointer. For 32-bit little-endian host, it is fine but it is not the case for 32-bit big-endian host. Because under 32-bit big-endian host, the first iterated unsigned long will be the bits 32-63 of the u64 instead of the expected bits 0-31. Therefore, in addition to changing the type, swap the two unsigned longs within the u64 for 32-bit big-endian host. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241030100516.3633640-5-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-30bpf: Check the validity of nr_words in bpf_iter_bits_new()Hou Tao
Check the validity of nr_words in bpf_iter_bits_new(). Without this check, when multiplication overflow occurs for nr_bits (e.g., when nr_words = 0x0400-0001, nr_bits becomes 64), stack corruption may occur due to bpf_probe_read_kernel_common(..., nr_bytes = 0x2000-0008). Fix it by limiting the maximum value of nr_words to 511. The value is derived from the current implementation of BPF memory allocator. To ensure compatibility if the BPF memory allocator's size limitation changes in the future, use the helper bpf_mem_alloc_check_size() to check whether nr_bytes is too larger. And return -E2BIG instead of -ENOMEM for oversized nr_bytes. Fixes: 4665415975b0 ("bpf: Add bits iterator") Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241030100516.3633640-4-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-30bpf: Add bpf_mem_alloc_check_size() helperHou Tao
Introduce bpf_mem_alloc_check_size() to check whether the allocation size exceeds the limitation for the kmalloc-equivalent allocator. The upper limit for percpu allocation is LLIST_NODE_SZ bytes larger than non-percpu allocation, so a percpu argument is added to the helper. The helper will be used in the following patch to check whether the size parameter passed to bpf_mem_alloc() is too big. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241030100516.3633640-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-30bpf: Free dynamically allocated bits in bpf_iter_bits_destroy()Hou Tao
bpf_iter_bits_destroy() uses "kit->nr_bits <= 64" to check whether the bits are dynamically allocated. However, the check is incorrect and may cause a kmemleak as shown below: unreferenced object 0xffff88812628c8c0 (size 32): comm "swapper/0", pid 1, jiffies 4294727320 hex dump (first 32 bytes): b0 c1 55 f5 81 88 ff ff f0 f0 f0 f0 f0 f0 f0 f0 ..U........... f0 f0 f0 f0 f0 f0 f0 f0 00 00 00 00 00 00 00 00 .............. backtrace (crc 781e32cc): [<00000000c452b4ab>] kmemleak_alloc+0x4b/0x80 [<0000000004e09f80>] __kmalloc_node_noprof+0x480/0x5c0 [<00000000597124d6>] __alloc.isra.0+0x89/0xb0 [<000000004ebfffcd>] alloc_bulk+0x2af/0x720 [<00000000d9c10145>] prefill_mem_cache+0x7f/0xb0 [<00000000ff9738ff>] bpf_mem_alloc_init+0x3e2/0x610 [<000000008b616eac>] bpf_global_ma_init+0x19/0x30 [<00000000fc473efc>] do_one_initcall+0xd3/0x3c0 [<00000000ec81498c>] kernel_init_freeable+0x66a/0x940 [<00000000b119f72f>] kernel_init+0x20/0x160 [<00000000f11ac9a7>] ret_from_fork+0x3c/0x70 [<0000000004671da4>] ret_from_fork_asm+0x1a/0x30 That is because nr_bits will be set as zero in bpf_iter_bits_next() after all bits have been iterated. Fix the issue by setting kit->bit to kit->nr_bits instead of setting kit->nr_bits to zero when the iteration completes in bpf_iter_bits_next(). In addition, use "!nr_bits || bits >= nr_bits" to check whether the iteration is complete and still use "nr_bits > 64" to indicate whether bits are dynamically allocated. The "!nr_bits" check is necessary because bpf_iter_bits_new() may fail before setting kit->nr_bits, and this condition will stop the iteration early instead of accessing the zeroed or freed kit->bits. Considering the initial value of kit->bits is -1 and the type of kit->nr_bits is unsigned int, change the type of kit->nr_bits to int. The potential overflow problem will be handled in the following patch. Fixes: 4665415975b0 ("bpf: Add bits iterator") Acked-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241030100516.3633640-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-30Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecsSungwoo Kim
Fix __hci_cmd_sync_sk() to return not NULL for unknown opcodes. __hci_cmd_sync_sk() returns NULL if a command returns a status event. However, it also returns NULL where an opcode doesn't exist in the hci_cc table because hci_cmd_complete_evt() assumes status = skb->data[0] for unknown opcodes. This leads to null-ptr-deref in cmd_sync for HCI_OP_READ_LOCAL_CODECS as there is no hci_cc for HCI_OP_READ_LOCAL_CODECS, which always assumes status = skb->data[0]. KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077] CPU: 1 PID: 2000 Comm: kworker/u9:5 Not tainted 6.9.0-ga6bcb805883c-dirty #10 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: hci7 hci_power_on RIP: 0010:hci_read_supported_codecs+0xb9/0x870 net/bluetooth/hci_codec.c:138 Code: 08 48 89 ef e8 b8 c1 8f fd 48 8b 75 00 e9 96 00 00 00 49 89 c6 48 ba 00 00 00 00 00 fc ff df 4c 8d 60 70 4c 89 e3 48 c1 eb 03 <0f> b6 04 13 84 c0 0f 85 82 06 00 00 41 83 3c 24 02 77 0a e8 bf 78 RSP: 0018:ffff888120bafac8 EFLAGS: 00010212 RAX: 0000000000000000 RBX: 000000000000000e RCX: ffff8881173f0040 RDX: dffffc0000000000 RSI: ffffffffa58496c0 RDI: ffff88810b9ad1e4 RBP: ffff88810b9ac000 R08: ffffffffa77882a7 R09: 1ffffffff4ef1054 R10: dffffc0000000000 R11: fffffbfff4ef1055 R12: 0000000000000070 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88810b9ac000 FS: 0000000000000000(0000) GS:ffff8881f6c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6ddaa3439e CR3: 0000000139764003 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <TASK> hci_read_local_codecs_sync net/bluetooth/hci_sync.c:4546 [inline] hci_init_stage_sync net/bluetooth/hci_sync.c:3441 [inline] hci_init4_sync net/bluetooth/hci_sync.c:4706 [inline] hci_init_sync net/bluetooth/hci_sync.c:4742 [inline] hci_dev_init_sync net/bluetooth/hci_sync.c:4912 [inline] hci_dev_open_sync+0x19a9/0x2d30 net/bluetooth/hci_sync.c:4994 hci_dev_do_open net/bluetooth/hci_core.c:483 [inline] hci_power_on+0x11e/0x560 net/bluetooth/hci_core.c:1015 process_one_work kernel/workqueue.c:3267 [inline] process_scheduled_works+0x8ef/0x14f0 kernel/workqueue.c:3348 worker_thread+0x91f/0xe50 kernel/workqueue.c:3429 kthread+0x2cb/0x360 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Fixes: abfeea476c68 ("Bluetooth: hci_sync: Convert MGMT_OP_START_DISCOVERY") Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-30Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two small fixes, both in drivers (ufs and scsi_debug)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: core: Fix another deadlock during RTC update scsi: scsi_debug: Fix do_device_access() handling of unexpected SG copy length
2024-10-30NFSD: Never decrement pending_async_copies on errorChuck Lever
The error flow in nfsd4_copy() calls cleanup_async_copy(), which already decrements nn->pending_async_copies. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-10-30drm/panthor: Report group as timedout when we fail to properly suspendBoris Brezillon
If we don't do that, the group is considered usable by userspace, but all further GROUP_SUBMIT will fail with -EINVAL. Changes in v3: - Add R-bs Changes in v2: - New patch Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241029152912.270346-3-boris.brezillon@collabora.com
2024-10-30drm/panthor: Fail job creation when the group is deadBoris Brezillon
Userspace can use GROUP_SUBMIT errors as a trigger to check the group state and recreate the group if it became unusable. Make sure we report an error when the group became unusable. Changes in v3: - None Changes in v2: - Add R-bs Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241029152912.270346-2-boris.brezillon@collabora.com
2024-10-30drm/panthor: Fix firmware initialization on systems with a page size > 4kBoris Brezillon
The system and GPU MMU page size might differ, which becomes a problem for FW sections that need to be mapped at explicit addresses since our PAGE_SIZE alignment might cover a VA range that's expected to be used for another section. Make sure we never map more than we need. Changes in v3: - Add R-bs Changes in v2: - Plan for per-VM page sizes so the MCU VM and user VM can have different pages sizes Fixes: 2718d91816ee ("drm/panthor: Add the FW logical block") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241030150231.768949-1-boris.brezillon@collabora.com
2024-10-30Documentation/gpu: Fix Panthor documentation build warningsAdrián Larumbe
Fix Panthor documentation build errors uncovered by the makedocs target when building with extra warnings enabled. Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Fixes: 6a797bdfde77 ("drm/panthor: add sysfs knob for enabling job profiling") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241009214346.2308917-1-adrian.larumbe@collabora.com