summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-11-07bcachefs: Fix bch_member.btree_bitmap_shift validationKent Overstreet
Needs to match the assert later when we resize... Reported-by: syzbot+e8eff054face85d7ea41@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-07bcachefs: bch2_btree_write_buffer_flush_going_ro()Kent Overstreet
The write buffer needs to be specifically flushed when going RO: keys in the journal that haven't yet been moved to the write buffer don't have a journal pin yet. This fixes numerous syzbot bugs, all with symptoms of still doing writes after we've got RO. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-07Merge branch 'net-phy-remove-genphy_config_eee_advert'Jakub Kicinski
Heiner Kallweit says: ==================== net: phy: remove genphy_config_eee_advert This series removes genphy_config_eee_advert(). Note: The change to bcm_config_lre_aneg() is compile-tested only as I don't have supported hardware. ==================== Link: https://patch.msgid.link/69d22b31-57d1-4b01-bfde-0c6a1df1e310@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07net: phy: remove genphy_config_eee_advertHeiner Kallweit
bcm_config_lre_aneg() doesn't use genphy_config_eee_advert() any longer. As this was the only user, we can remove genphy_config_eee_advert() now. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/37da7f3e-b883-4c07-9881-b8c0516822b7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07net: phy: broadcom: use genphy_c45_an_config_eee_aneg in bcm_config_lre_anegHeiner Kallweit
bcm_config_lre_aneg() is the only user of genphy_config_eee_advert(), therefore use genphy_c45_an_config_eee_aneg() instead. The resulting functionality is equivalent, and bcm_config_lre_aneg() follows the structure of __genphy_config_aneg(). In a follow-up step genphy_config_eee_advert() can be removed. Note: We preserve the current behavior to ignore errors. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/6e5cd4ab-28bb-4d82-b449-fec85f3d1e8a@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07net: phy: export genphy_c45_an_config_eee_anegHeiner Kallweit
We'll use this function in bcm_config_lre_aneg(), therefore export it. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/02bd7c39-7413-4433-bafc-a276089bd292@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07net: phy: make genphy_c45_write_eee_adv() staticHeiner Kallweit
genphy_c45_write_eee_adv() isn't used outside phy-c45.c, so make it static. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/d23bd784-44e6-4a15-af3a-b37379156521@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-08Merge tag 'amd-drm-fixes-6.12-2024-11-07' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.12-2024-11-07: amdgpu: - Brightness fix - DC vbios parsing fix - ACPI fix - SMU 14.x fix - Power workload profile fix - GC partitioning fix - Debugfs fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241107182722.14147-1-alexander.deucher@amd.com
2024-11-07Merge tag 'spi-fix-v6.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fix from Mark Brown: "An update for the maintainers of the AMD driver following some job changes there" * tag 'spi-fix-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: MAINTAINERS: update AMD SPI maintainer
2024-11-07Merge tag 'regulator-fix-v6.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A couple of small fixes for drivers, nothing particularly remarkable" * tag 'regulator-fix-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: rk808: Add apply_bit for BUCK3 on RK809 regulator: rtq2208: Fix uninitialized use of regulator_config
2024-11-07mailmap: add entry for Thorsten BlumThorsten Blum
Map my previously used email address to my @linux.dev address. Link: https://lkml.kernel.org/r/20241103234411.2522-2-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Cc: Alex Elder <elder@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Geliang Tang <geliang@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Mathieu Othacehe <m.othacehe@gmail.com> Cc: Matthieu Baerts (NGI0) <matttbe@kernel.org> Cc: Matt Ranostay <matt@ranostay.sg> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Cc: Quentin Monnet <qmo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()Andrew Kanner
Syzkaller is able to provoke null-ptr-dereference in ocfs2_xa_remove(): [ 57.319872] (a.out,1161,7):ocfs2_xa_remove:2028 ERROR: status = -12 [ 57.320420] (a.out,1161,7):ocfs2_xa_cleanup_value_truncate:1999 ERROR: Partial truncate while removing xattr overlay.upper. Leaking 1 clusters and removing the entry [ 57.321727] BUG: kernel NULL pointer dereference, address: 0000000000000004 [...] [ 57.325727] RIP: 0010:ocfs2_xa_block_wipe_namevalue+0x2a/0xc0 [...] [ 57.331328] Call Trace: [ 57.331477] <TASK> [...] [ 57.333511] ? do_user_addr_fault+0x3e5/0x740 [ 57.333778] ? exc_page_fault+0x70/0x170 [ 57.334016] ? asm_exc_page_fault+0x2b/0x30 [ 57.334263] ? __pfx_ocfs2_xa_block_wipe_namevalue+0x10/0x10 [ 57.334596] ? ocfs2_xa_block_wipe_namevalue+0x2a/0xc0 [ 57.334913] ocfs2_xa_remove_entry+0x23/0xc0 [ 57.335164] ocfs2_xa_set+0x704/0xcf0 [ 57.335381] ? _raw_spin_unlock+0x1a/0x40 [ 57.335620] ? ocfs2_inode_cache_unlock+0x16/0x20 [ 57.335915] ? trace_preempt_on+0x1e/0x70 [ 57.336153] ? start_this_handle+0x16c/0x500 [ 57.336410] ? preempt_count_sub+0x50/0x80 [ 57.336656] ? _raw_read_unlock+0x20/0x40 [ 57.336906] ? start_this_handle+0x16c/0x500 [ 57.337162] ocfs2_xattr_block_set+0xa6/0x1e0 [ 57.337424] __ocfs2_xattr_set_handle+0x1fd/0x5d0 [ 57.337706] ? ocfs2_start_trans+0x13d/0x290 [ 57.337971] ocfs2_xattr_set+0xb13/0xfb0 [ 57.338207] ? dput+0x46/0x1c0 [ 57.338393] ocfs2_xattr_trusted_set+0x28/0x30 [ 57.338665] ? ocfs2_xattr_trusted_set+0x28/0x30 [ 57.338948] __vfs_removexattr+0x92/0xc0 [ 57.339182] __vfs_removexattr_locked+0xd5/0x190 [ 57.339456] ? preempt_count_sub+0x50/0x80 [ 57.339705] vfs_removexattr+0x5f/0x100 [...] Reproducer uses faultinject facility to fail ocfs2_xa_remove() -> ocfs2_xa_value_truncate() with -ENOMEM. In this case the comment mentions that we can return 0 if ocfs2_xa_cleanup_value_truncate() is going to wipe the entry anyway. But the following 'rc' check is wrong and execution flow do 'ocfs2_xa_remove_entry(loc);' twice: * 1st: in ocfs2_xa_cleanup_value_truncate(); * 2nd: returning back to ocfs2_xa_remove() instead of going to 'out'. Fix this by skipping the 2nd removal of the same entry and making syzkaller repro happy. Link: https://lkml.kernel.org/r/20241103193845.2940988-1-andrew.kanner@gmail.com Fixes: 399ff3a748cf ("ocfs2: Handle errors while setting external xattr values.") Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com> Reported-by: syzbot+386ce9e60fa1b18aac5b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/671e13ab.050a0220.2b8c0f.01d0.GAE@google.com/T/ Tested-by: syzbot+386ce9e60fa1b18aac5b@syzkaller.appspotmail.com Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07signal: restore the override_rlimit logicRoman Gushchin
Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class of signals. However now it's enforced unconditionally, even if override_rlimit is set. This behavior change caused production issues. For example, if the limit is reached and a process receives a SIGSEGV signal, sigqueue_alloc fails to allocate the necessary resources for the signal delivery, preventing the signal from being delivered with siginfo. This prevents the process from correctly identifying the fault address and handling the error. From the user-space perspective, applications are unaware that the limit has been reached and that the siginfo is effectively 'corrupted'. This can lead to unpredictable behavior and crashes, as we observed with java applications. Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and skip the comparison to max there if override_rlimit is set. This effectively restores the old behavior. Link: https://lkml.kernel.org/r/20241104195419.3962584-1-roman.gushchin@linux.dev Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Co-developed-by: Andrei Vagin <avagin@google.com> Signed-off-by: Andrei Vagin <avagin@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Alexey Gladkov <legion@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07fs/proc: fix compile warning about variable 'vmcore_mmap_ops'Qi Xi
When build with !CONFIG_MMU, the variable 'vmcore_mmap_ops' is defined but not used: >> fs/proc/vmcore.c:458:42: warning: unused variable 'vmcore_mmap_ops' 458 | static const struct vm_operations_struct vmcore_mmap_ops = { Fix this by only defining it when CONFIG_MMU is enabled. Link: https://lkml.kernel.org/r/20241101034803.9298-1-xiqi2@huawei.com Fixes: 9cb218131de1 ("vmcore: introduce remap_oldmem_pfn_range()") Signed-off-by: Qi Xi <xiqi2@huawei.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/lkml/202410301936.GcE8yUos-lkp@intel.com/ Cc: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07ucounts: fix counter leak in inc_rlimit_get_ucounts()Andrei Vagin
The inc_rlimit_get_ucounts() increments the specified rlimit counter and then checks its limit. If the value exceeds the limit, the function returns an error without decrementing the counter. Link: https://lkml.kernel.org/r/20241101191940.3211128-1-roman.gushchin@linux.dev Fixes: 15bc01effefe ("ucounts: Fix signal ucount refcounting") Signed-off-by: Andrei Vagin <avagin@google.com> Co-developed-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Tested-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Alexey Gladkov <legion@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Andrei Vagin <avagin@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Gladkov <legion@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07selftests: hugetlb_dio: check for initial conditions to skip in the startMuhammad Usama Anjum
The test should be skipped if initial conditions aren't fulfilled in the start instead of failing and outputting non-compliant TAP logs. This kind of failure pollutes the results. The initial conditions are: - The test should only execute if /tmp file can be allocated. - The test should only execute if huge pages are free. Before: TAP version 13 1..4 Bail out! Error opening file : Read-only file system (30) # Planned tests != run tests (4 != 0) # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0 After: TAP version 13 1..0 # SKIP Unable to allocate file: Read-only file system Link: https://lkml.kernel.org/r/20241101141557.3159432-1-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Fixes: 3a103b5315b7 ("selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()") Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Donet Tom <donettom@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07mm: fix docs for the kernel parameter ``thp_anon=``Maíra Canal
If we add ``thp_anon=32,64K:always`` to the kernel command line, we will see the following error: [ 0.000000] huge_memory: thp_anon=32,64K:always: error parsing string, ignoring setting This happens because the correct format isn't ``thp_anon=<size>,<size>[KMG]:<state>```, as [KMG] must follow each number to especify its unit. So, the correct format is ``thp_anon=<size>[KMG],<size>[KMG]:<state>```. Therefore, adjust the documentation to reflect the correct format of the parameter ``thp_anon=``. Link: https://lkml.kernel.org/r/20241101165719.1074234-3-mcanal@igalia.com Fixes: dd4d30d1cdbe ("mm: override mTHP "enabled" defaults at kernel cmdline") Signed-off-by: Maíra Canal <mcanal@igalia.com> Acked-by: Barry Song <baohua@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07mm/damon/core: avoid overflow in damon_feed_loop_next_input()SeongJae Park
damon_feed_loop_next_input() is inefficient and fragile to overflows. Specifically, 'score_goal_diff_bp' calculation can overflow when 'score' is high. The calculation is actually unnecessary at all because 'goal' is a constant of value 10,000. Calculation of 'compensation' is again fragile to overflow. Final calculation of return value for under-achiving case is again fragile to overflow when the current score is under-achieving the target. Add two corner cases handling at the beginning of the function to make the body easier to read, and rewrite the body of the function to avoid overflows and the unnecessary bp value calcuation. Link: https://lkml.kernel.org/r/20241031161203.47751-1-sj@kernel.org Fixes: 9294a037c015 ("mm/damon/core: implement goal-oriented feedback-driven quota auto-tuning") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Guenter Roeck <linux@roeck-us.net> Closes: https://lore.kernel.org/944f3d5b-9177-48e7-8ec9-7f1331a3fea3@roeck-us.net Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: <stable@vger.kernel.org> [6.8.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07mm/damon/core: handle zero schemes apply intervalSeongJae Park
DAMON's logics to determine if this is the time to apply damos schemes assumes next_apply_sis is always set larger than current passed_sample_intervals. And therefore assume continuously incrementing passed_sample_intervals will make it reaches to the next_apply_sis in future. The logic hence does apply the scheme and update next_apply_sis only if passed_sample_intervals is same to next_apply_sis. If Schemes apply interval is set as zero, however, next_apply_sis is set same to current passed_sample_intervals, respectively. And passed_sample_intervals is incremented before doing the next_apply_sis check. Hence, next_apply_sis becomes larger than next_apply_sis, and the logic says it is not the time to apply schemes and update next_apply_sis. In other words, DAMON stops applying schemes until passed_sample_intervals overflows. Based on the documents and the common sense, a reasonable behavior for such inputs would be applying the schemes for every sampling interval. Handle the case by removing the assumption. Link: https://lkml.kernel.org/r/20241031183757.49610-3-sj@kernel.org Fixes: 42f994b71404 ("mm/damon/core: implement scheme-specific apply interval") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.7.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07mm/damon/core: handle zero {aggregation,ops_update} intervalsSeongJae Park
Patch series "mm/damon/core: fix handling of zero non-sampling intervals". DAMON's internal intervals accounting logic is not correctly handling non-sampling intervals of zero values for a wrong assumption. This could cause unexpected monitoring behavior, and even result in infinite hang of DAMON sysfs interface user threads in case of zero aggregation interval. Fix those by updating the intervals accounting logic. For details of the root case and solutions, please refer to commit messages of fixes. This patch (of 2): DAMON's logics to determine if this is the time to do aggregation and ops update assumes next_{aggregation,ops_update}_sis are always set larger than current passed_sample_intervals. And therefore it further assumes continuously incrementing passed_sample_intervals every sampling interval will make it reaches to the next_{aggregation,ops_update}_sis in future. The logic therefore make the action and update next_{aggregation,ops_updaste}_sis only if passed_sample_intervals is same to the counts, respectively. If Aggregation interval or Ops update interval are zero, however, next_aggregation_sis or next_ops_update_sis are set same to current passed_sample_intervals, respectively. And passed_sample_intervals is incremented before doing the next_{aggregation,ops_update}_sis check. Hence, passed_sample_intervals becomes larger than next_{aggregation,ops_update}_sis, and the logic says it is not the time to do the action and update next_{aggregation,ops_update}_sis forever, until an overflow happens. In other words, DAMON stops doing aggregations or ops updates effectively forever, and users cannot get monitoring results. Based on the documents and the common sense, a reasonable behavior for such inputs is doing an aggregation and an ops update for every sampling interval. Handle the case by removing the assumption. Note that this could incur particular real issue for DAMON sysfs interface users, in case of zero Aggregation interval. When user starts DAMON with zero Aggregation interval and asks online DAMON parameter tuning via DAMON sysfs interface, the request is handled by the aggregation callback. Until the callback finishes the work, the user who requested the online tuning just waits. Hence, the user will be stuck until the passed_sample_intervals overflows. Link: https://lkml.kernel.org/r/20241031183757.49610-1-sj@kernel.org Link: https://lkml.kernel.org/r/20241031183757.49610-2-sj@kernel.org Fixes: 4472edf63d66 ("mm/damon/core: use number of passed access sampling as a timer") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.7.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07mm/mlock: set the correct prev on failureWei Yang
After commit 94d7d9233951 ("mm: abstract the vma_merge()/split_vma() pattern for mprotect() et al."), if vma_modify_flags() return error, the vma is set to an error code. This will lead to an invalid prev be returned. Generally this shouldn't matter as the caller should treat an error as indicating state is now invalidated, however unfortunately apply_mlockall_flags() does not check for errors and assumes that mlock_fixup() correctly maintains prev even if an error were to occur. This patch fixes that assumption. [lorenzo.stoakes@oracle.com: provide a better fix and rephrase the log] Link: https://lkml.kernel.org/r/20241027123321.19511-1-richard.weiyang@gmail.com Fixes: 94d7d9233951 ("mm: abstract the vma_merge()/split_vma() pattern for mprotect() et al.") Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07objpool: fix to make percpu slot allocation more robustMasami Hiramatsu (Google)
Since gfp & GFP_ATOMIC == GFP_ATOMIC is true for GFP_KERNEL | GFP_HIGH, it will use kmalloc if user specifies that combination. Here the reason why combining the __vmalloc_node() and kmalloc_node() is that the vmalloc does not support all GFP flag, especially GFP_ATOMIC. So we should check if gfp & (GFP_ATOMIC | GFP_KERNEL) != GFP_ATOMIC for vmalloc first. This ensures caller can sleep. And for the robustness, even if vmalloc fails, it should retry with kmalloc to allocate it. Link: https://lkml.kernel.org/r/173008598713.1262174.2959179484209897252.stgit@mhiramat.roam.corp.google.com Fixes: aff1871bfc81 ("objpool: fix choosing allocation for percpu slots") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Closes: https://lore.kernel.org/all/CAHk-=whO+vSH+XVRio8byJU8idAWES0SPGVZ7KAVdc4qrV0VUA@mail.gmail.com/ Cc: Leo Yan <leo.yan@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Wu <wuqiang.matt@bytedance.com> Cc: Mikel Rychliski <mikel@mikelr.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07mm/page_alloc: keep track of free highatomicYu Zhao
OOM kills due to vastly overestimated free highatomic reserves were observed: ... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ... Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ... Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB The second line above shows that the OOM kill was due to the following condition: free (1482936kB) - reserved_highatomic (1073152kB) = 409784KB < min (410416kB) And the third line shows there were no free pages in any MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type 'H'. Therefore __zone_watermark_unusable_free() underestimated the usable free memory by over 1GB, which resulted in the unnecessary OOM kill above. The comments in __zone_watermark_unusable_free() warns about the potential risk, i.e., If the caller does not have rights to reserves below the min watermark then subtract the high-atomic reserves. This will over-estimate the size of the atomic reserve but it avoids a search. However, it is possible to keep track of free pages in reserved highatomic pageblocks with a new per-zone counter nr_free_highatomic protected by the zone lock, to avoid a search when calculating the usable free memory. And the cost would be minimal, i.e., simple arithmetics in the highatomic alloc/free/move paths. Note that since nr_free_highatomic can be relatively small, using a per-cpu counter might cause too much drift and defeat its purpose, in addition to the extra memory overhead. Dependson e0932b6c1f94 ("mm: page_alloc: consolidate free page accounting") - see [1] [akpm@linux-foundation.org: s/if/else if/, per Johannes, stealth whitespace tweak] Link: https://lkml.kernel.org/r/20241028182653.3420139-1-yuzhao@google.com Link: https://lkml.kernel.org/r/0d0ddb33-fcdc-43e2-801f-0c1df2031afb@suse.cz [1] Fixes: 0aaa29a56e4f ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand") Signed-off-by: Yu Zhao <yuzhao@google.com> Reported-by: Link Lin <linkl@google.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07vdpa/mlx5: Fix error path during device addDragos Tatulea
In the error recovery path of mlx5_vdpa_dev_add(), the cleanup is executed and at the end put_device() is called which ends up calling mlx5_vdpa_free(). This function will execute the same cleanup all over again. Most resources support being cleaned up twice, but the recent mlx5_vdpa_destroy_mr_resources() doesn't. This change drops the explicit cleanup from within the mlx5_vdpa_dev_add() and lets mlx5_vdpa_free() do its work. This issue was discovered while trying to add 2 vdpa devices with the same name: $> vdpa dev add name vdpa-0 mgmtdev auxiliary/mlx5_core.sf.2 $> vdpa dev add name vdpa-0 mgmtdev auxiliary/mlx5_core.sf.3 ... yields the following dump: BUG: kernel NULL pointer dereference, address: 00000000000000b8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP CPU: 4 UID: 0 PID: 2811 Comm: vdpa Not tainted 6.12.0-rc6 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:destroy_workqueue+0xe/0x2a0 Code: ... RSP: 0018:ffff88814920b9a8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff888105c10000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff888100400168 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff888100120c00 R09: ffffffff828578c0 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff888131fd99a0 R14: 0000000000000000 R15: ffff888105c10580 FS: 00007fdfa6b4f740(0000) GS:ffff88852ca00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000b8 CR3: 000000018db09006 CR4: 0000000000372eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __die+0x20/0x60 ? page_fault_oops+0x150/0x3e0 ? exc_page_fault+0x74/0x130 ? asm_exc_page_fault+0x22/0x30 ? destroy_workqueue+0xe/0x2a0 mlx5_vdpa_destroy_mr_resources+0x2b/0x40 [mlx5_vdpa] mlx5_vdpa_free+0x45/0x150 [mlx5_vdpa] vdpa_release_dev+0x1e/0x50 [vdpa] device_release+0x31/0x90 kobject_put+0x8d/0x230 mlx5_vdpa_dev_add+0x328/0x8b0 [mlx5_vdpa] vdpa_nl_cmd_dev_add_set_doit+0x2b8/0x4c0 [vdpa] genl_family_rcv_msg_doit+0xd0/0x120 genl_rcv_msg+0x180/0x2b0 ? __vdpa_alloc_device+0x1b0/0x1b0 [vdpa] ? genl_family_rcv_msg_dumpit+0xf0/0xf0 netlink_rcv_skb+0x54/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x1fc/0x2d0 netlink_sendmsg+0x1e4/0x410 __sock_sendmsg+0x38/0x60 ? sockfd_lookup_light+0x12/0x60 __sys_sendto+0x105/0x160 ? __count_memcg_events+0x53/0xe0 ? handle_mm_fault+0x100/0x220 ? do_user_addr_fault+0x40d/0x620 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x4c/0x100 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7fdfa6c66b57 Code: ... RSP: 002b:00007ffeace22998 EFLAGS: 00000202 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 000055a498608350 RCX: 00007fdfa6c66b57 RDX: 000000000000006c RSI: 000055a498608350 RDI: 0000000000000003 RBP: 00007ffeace229c0 R08: 00007fdfa6d35200 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000202 R12: 000055a4986082a0 R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffeace233f3 </TASK> Modules linked in: ... CR2: 00000000000000b8 Fixes: 62111654481d ("vdpa/mlx5: Postpone MR deletion") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20241105185101.1323272-2-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2024-11-07bcachefs: Fix UAF in __promote_alloc() error pathKent Overstreet
If we error in data_update_init() after adding to the rhashtable of outstanding promotes, kfree_rcu() is required. Reported-by: Reed Riley <reed@riley.engineer> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-07bcachefs: Change OPT_STR max to be 1 less than the size of choices arrayPiotr Zalewski
Change OPT_STR max value to be 1 less than the "ARRAY_SIZE" of "_choices" array. As a result, remove -1 from (opt->max-1) in bch2_opt_to_text. The "_choices" array is a null-terminated array, so computing the maximum using "ARRAY_SIZE" without subtracting 1 yields an incorrect result. Since bch2_opt_validate don't subtract 1, as bch2_opt_to_text does, values bigger than the actual maximum would pass through option validation. Reported-by: syzbot+bee87a0c3291c06aa8c6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=bee87a0c3291c06aa8c6 Fixes: 63c4b2545382 ("bcachefs: Better superblock opt validation") Suggested-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Piotr Zalewski <pZ010001011111@proton.me> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-07bcachefs: btree_cache.freeable list fixesKent Overstreet
When allocating new btree nodes, we were leaving them on the freeable list - unlocked - allowing them to be reclaimed: ouch. Additionally, bch2_btree_node_free_never_used() -> bch2_btree_node_hash_remove was putting it on the freelist, while bch2_btree_node_free_never_used() was putting it back on the btree update reserve list - ouch. Originally, the code was written to always keep btree nodes on a list - live or freeable - and this worked when new nodes were kept locked. But now with the cycle detector, we can't keep nodes locked that aren't tracked by the cycle detector; and this is fine as long as they're not reachable. We also have better and more robust leak detection now, with memory allocation profiling, so the original justification no longer applies. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-07bcachefs: check the invalid parameter for perf testHongbo Li
The perf_test does not check the number of iterations and threads when it is zero. If nr_thread is 0, the perf test will keep waiting for wakekup. If iteration is 0, it will cause exception of division by zero. This can be reproduced by: echo "rand_insert 0 1" > /sys/fs/bcachefs/${uuid}/perf_test or echo "rand_insert 1 0" > /sys/fs/bcachefs/${uuid}/perf_test Fixes: 1c6fdbd8f246 ("bcachefs: Initial commit") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-07bcachefs: add check NULL return of bio_kmalloc in journal_read_bucketPei Xiao
bio_kmalloc may return NULL, will cause NULL pointer dereference. Add check NULL return for bio_kmalloc in journal_read_bucket. Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn> Fixes: ac10a9611d87 ("bcachefs: Some fixes for building in userspace") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-07bcachefs: Ensure BCH_FS_may_go_rw is set before exiting recoveryKent Overstreet
If BCH_FS_may_go_rw is not yet set, it indicates to the transaction commit path that updates should be done via the list of journal replay keys. This must be set before multithreaded use commences. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-07bcachefs: Fix topology errors on split after mergeKent Overstreet
If a btree split picks a pivot that's being deleted by a btree node merge, we're going to have problems. Fix this by checking if the pivot is being deleted, the same as we check for deletions in journal replay keys. Found by single_devic.ktest small_nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-07bcachefs: Ancient versions with bad bkey_formats are no longer supportedKent Overstreet
Syzbot found an assertion pop, by generating an ancient filesystem version with an invalid bkey_format (with fields that can overflow) as well as packed keys that aren't representable unpacked. This breaks key comparisons in all sorts of painful ways. Filesystems have been automatically rewriting nodes with such invalid formats for years; we can safely drop support for them. Reported-by: syzbot+8a0109511de9d4b61217@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-07bcachefs: Fix error handling in bch2_btree_node_prefetch()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-07bcachefs: Fix null ptr deref in bucket_gen_get()Kent Overstreet
bucket_gen() checks if we're lookup up a valid bucket and returns NULL otherwise, but bucket_gen_get() was failing to check; other callers were correct. Also do a bit of cleanup on callers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-07selftests: net: add a test for closing a netlink socket ith dump in progressJakub Kicinski
Close a socket with dump in progress. We need a dump which generates enough info not to fit into a single skb. Policy dump fits the bill. Use the trick discovered by syzbot for keeping a ref on the socket longer than just close, with mqueue. TAP version 13 1..3 # Starting 3 tests from 1 test cases. # RUN global.test_sanity ... # OK global.test_sanity ok 1 global.test_sanity # RUN global.close_in_progress ... # OK global.close_in_progress ok 2 global.close_in_progress # RUN global.close_with_ref ... # OK global.close_with_ref ok 3 global.close_with_ref # PASSED: 3 / 3 tests passed. # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0 Note that this test is not expected to fail but rather crash the kernel if we get the cleanup wrong. Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241106015235.2458807-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07netlink: terminate outstanding dump on socket closeJakub Kicinski
Netlink supports iterative dumping of data. It provides the families the following ops: - start - (optional) kicks off the dumping process - dump - actual dump helper, keeps getting called until it returns 0 - done - (optional) pairs with .start, can be used for cleanup The whole process is asynchronous and the repeated calls to .dump don't actually happen in a tight loop, but rather are triggered in response to recvmsg() on the socket. This gives the user full control over the dump, but also means that the user can close the socket without getting to the end of the dump. To make sure .start is always paired with .done we check if there is an ongoing dump before freeing the socket, and if so call .done. The complication is that sockets can get freed from BH and .done is allowed to sleep. So we use a workqueue to defer the call, when needed. Unfortunately this does not work correctly. What we defer is not the cleanup but rather releasing a reference on the socket. We have no guarantee that we own the last reference, if someone else holds the socket they may release it in BH and we're back to square one. The whole dance, however, appears to be unnecessary. Only the user can interact with dumps, so we can clean up when socket is closed. And close always happens in process context. Some async code may still access the socket after close, queue notification skbs to it etc. but no dumps can start, end or otherwise make progress. Delete the workqueue and flush the dump state directly from the release handler. Note that further cleanup is possible in -next, for instance we now always call .done before releasing the main module reference, so dump doesn't have to take a reference of its own. Reported-by: syzkaller <syzkaller@googlegroups.com> Fixes: ed5d7788a934 ("netlink: Do not schedule work from sk_destruct") Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241106015235.2458807-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.12-rc7). Conflicts: drivers/net/ethernet/freescale/enetc/enetc_pf.c e15c5506dd39 ("net: enetc: allocate vf_state during PF probes") 3774409fd4c6 ("net: enetc: build enetc_pf_common.c as a separate module") https://lore.kernel.org/20241105114100.118bd35e@canb.auug.org.au Adjacent changes: drivers/net/ethernet/ti/am65-cpsw-nuss.c de794169cf17 ("net: ethernet: ti: am65-cpsw: Fix multi queue Rx on J7") 4a7b2ba94a59 ("net: ethernet: ti: am65-cpsw: Use tstats instead of open coded version") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07Merge tag 'net-6.12-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from can and netfilter. Things are slowing down quite a bit, mostly driver fixes here. No known ongoing investigations. Current release - new code bugs: - eth: ti: am65-cpsw: - fix multi queue Rx on J7 - fix warning in am65_cpsw_nuss_remove_rx_chns() Previous releases - regressions: - mptcp: do not require admin perm to list endpoints, got missed in a refactoring - mptcp: use sock_kfree_s instead of kfree Previous releases - always broken: - sctp: properly validate chunk size in sctp_sf_ootb() fix OOB access - virtio_net: make RSS interact properly with queue number - can: mcp251xfd: mcp251xfd_get_tef_len(): fix length calculation - can: mcp251xfd: mcp251xfd_ring_alloc(): fix coalescing configuration when switching CAN modes Misc: - revert earlier hns3 fixes, they were ignoring IOMMU abstractions and need to be reworked - can: {cc770,sja1000}_isa: allow building on x86_64" * tag 'net-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (42 commits) drivers: net: ionic: add missed debugfs cleanup to ionic_probe() error path net/smc: do not leave a dangling sk pointer in __smc_create() rxrpc: Fix missing locking causing hanging calls net/smc: Fix lookup of netdev by using ib_device_get_netdev() net: arc: rockchip: fix emac mdio node support net: arc: fix the device for dma_map_single/dma_unmap_single virtio_net: Update rss when set queue virtio_net: Sync rss config to device when virtnet_probe virtio_net: Add hash_key_length check virtio_net: Support dynamic rss indirection table size netfilter: nf_tables: wait for rcu grace period on net_device removal net: stmmac: Fix unbalanced IRQ wake disable warning on single irq case net: vertexcom: mse102x: Fix possible double free of TX skb mptcp: use sock_kfree_s instead of kfree mptcp: no admin perm to list endpoints net: phy: ti: add PHY_RST_AFTER_CLK_EN flag net: ethernet: ti: am65-cpsw: fix warning in am65_cpsw_nuss_remove_rx_chns() net: ethernet: ti: am65-cpsw: Fix multi queue Rx on J7 net: hns3: fix kernel crash when uninstalling driver Revert "Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'" ...
2024-11-07Merge tag 'nvme-6.12-2024-11-07' of git://git.infradead.org/nvme into block-6.12Jens Axboe
Pull NVMe fix from Keith: "nvme fix for Linux 6.13 - Use correct list traversal for srcu lists (Breno)" * tag 'nvme-6.12-2024-11-07' of git://git.infradead.org/nvme: nvme/host: Fix RCU list traversal to use SRCU primitive
2024-11-07drivers: net: ionic: add missed debugfs cleanup to ionic_probe() error pathWentao Liang
The ionic_setup_one() creates a debugfs entry for ionic upon successful execution. However, the ionic_probe() does not release the dentry before returning, resulting in a memory leak. To fix this bug, we add the ionic_debugfs_del_dev() to release the resources in a timely manner before returning. Fixes: 0de38d9f1dba ("ionic: extract common bits from ionic_probe") Signed-off-by: Wentao Liang <Wentao_liang_g@163.com> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20241107021756.1677-1-liangwentao@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07net/smc: do not leave a dangling sk pointer in __smc_create()Eric Dumazet
Thanks to commit 4bbd360a5084 ("socket: Print pf->create() when it does not clear sock->sk on failure."), syzbot found an issue with AF_SMC: smc_create must clear sock->sk on failure, family: 43, type: 1, protocol: 0 WARNING: CPU: 0 PID: 5827 at net/socket.c:1565 __sock_create+0x96f/0xa30 net/socket.c:1563 Modules linked in: CPU: 0 UID: 0 PID: 5827 Comm: syz-executor259 Not tainted 6.12.0-rc6-next-20241106-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 RIP: 0010:__sock_create+0x96f/0xa30 net/socket.c:1563 Code: 03 00 74 08 4c 89 e7 e8 4f 3b 85 f8 49 8b 34 24 48 c7 c7 40 89 0c 8d 8b 54 24 04 8b 4c 24 0c 44 8b 44 24 08 e8 32 78 db f7 90 <0f> 0b 90 90 e9 d3 fd ff ff 89 e9 80 e1 07 fe c1 38 c1 0f 8c ee f7 RSP: 0018:ffffc90003e4fda0 EFLAGS: 00010246 RAX: 099c6f938c7f4700 RBX: 1ffffffff1a595fd RCX: ffff888034823c00 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 00000000ffffffe9 R08: ffffffff81567052 R09: 1ffff920007c9f50 R10: dffffc0000000000 R11: fffff520007c9f51 R12: ffffffff8d2cafe8 R13: 1ffffffff1a595fe R14: ffffffff9a789c40 R15: ffff8880764298c0 FS: 000055557b518380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa62ff43225 CR3: 0000000031628000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> sock_create net/socket.c:1616 [inline] __sys_socket_create net/socket.c:1653 [inline] __sys_socket+0x150/0x3c0 net/socket.c:1700 __do_sys_socket net/socket.c:1714 [inline] __se_sys_socket net/socket.c:1712 [inline] For reference, see commit 2d859aff775d ("Merge branch 'do-not-leave-dangling-sk-pointers-in-pf-create-functions'") Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ignat Korchagin <ignat@cloudflare.com> Cc: D. Wythe <alibuda@linux.alibaba.com> Cc: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://patch.msgid.link/20241106221922.1544045-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07rxrpc: Fix missing locking causing hanging callsDavid Howells
If a call gets aborted (e.g. because kafs saw a signal) between it being queued for connection and the I/O thread picking up the call, the abort will be prioritised over the connection and it will be removed from local->new_client_calls by rxrpc_disconnect_client_call() without a lock being held. This may cause other calls on the list to disappear if a race occurs. Fix this by taking the client_call_lock when removing a call from whatever list its ->wait_link happens to be on. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Reported-by: Marc Dionne <marc.dionne@auristor.com> Fixes: 9d35d880e0e4 ("rxrpc: Move client call connection to the I/O thread") Link: https://patch.msgid.link/726660.1730898202@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07net/smc: Fix lookup of netdev by using ib_device_get_netdev()Wenjia Zhang
The SMC-R variant of the SMC protocol used direct call to function ib_device_ops.get_netdev() to lookup netdev. As we used mlx5 device driver to run SMC-R, it failed to find a device, because in mlx5_ib the internal net device management for retrieving net devices was replaced by a common interface ib_device_get_netdev() in commit 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions"). Since such direct accesses to the internal net device management is not recommended at all, update the SMC-R code to use proper API ib_device_get_netdev(). Fixes: 54903572c23c ("net/smc: allow pnetid-less configuration") Reported-by: Aswin K <aswin@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://patch.msgid.link/20241106082612.57803-1-wenjia@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07Merge tag 'pwm/for-6.12-rc7-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux Pull pwm fix from Uwe Kleine-König: "Fix period setting in imx-tpm driver and a maintainer update Erik Schumacher found and fixed a problem in the calculation of the PWM period setting yielding too long periods. Trevor Gamblin - who already cared about mainlining the pwm-axi-pwmgen driver - stepped forward as an additional reviewer. Thanks to Erik and Trevor" * tag 'pwm/for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux: MAINTAINERS: add self as reviewer for AXI PWM GENERATOR pwm: imx-tpm: Use correct MODULO value for EPWM mode
2024-11-07proc/softirqs: replace seq_printf with seq_put_decimal_ull_widthDavid Wang
seq_printf is costy, on a system with n CPUs, reading /proc/softirqs would yield 10*n decimal values, and the extra cost parsing format string grows linearly with number of cpus. Replace seq_printf with seq_put_decimal_ull_width have significant performance improvement. On an 8CPUs system, reading /proc/softirqs show ~40% performance gain with this patch. Signed-off-by: David Wang <00107082@163.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-11-07drm/panthor: Be stricter about IO mapping flagsJann Horn
The current panthor_device_mmap_io() implementation has two issues: 1. For mapping DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET, panthor_device_mmap_io() bails if VM_WRITE is set, but does not clear VM_MAYWRITE. That means userspace can use mprotect() to make the mapping writable later on. This is a classic Linux driver gotcha. I don't think this actually has any impact in practice: When the GPU is powered, writes to the FLUSH_ID seem to be ignored; and when the GPU is not powered, the dummy_latest_flush page provided by the driver is deliberately designed to not do any flushes, so the only thing writing to the dummy_latest_flush could achieve would be to make *more* flushes happen. 2. panthor_device_mmap_io() does not block MAP_PRIVATE mappings (which are mappings without the VM_SHARED flag). MAP_PRIVATE in combination with VM_MAYWRITE indicates that the VMA has copy-on-write semantics, which for VM_PFNMAP are semi-supported but fairly cursed. In particular, in such a mapping, the driver can only install PTEs during mmap() by calling remap_pfn_range() (because remap_pfn_range() wants to **store the physical address of the mapped physical memory into the vm_pgoff of the VMA**); installing PTEs later on with a fault handler (as panthor does) is not supported in private mappings, and so if you try to fault in such a mapping, vmf_insert_pfn_prot() splats when it hits a BUG() check. Fix it by clearing the VM_MAYWRITE flag (userspace writing to the FLUSH_ID doesn't make sense) and requiring VM_SHARED (copy-on-write semantics for the FLUSH_ID don't make sense). Reproducers for both scenarios are in the notes of my patch on the mailing list; I tested that these bugs exist on a Rock 5B machine. Note that I only compile-tested the patch, I haven't tested it; I don't have a working kernel build setup for the test machine yet. Please test it before applying it. Cc: stable@vger.kernel.org Fixes: 5fe909cae118 ("drm/panthor: Add the device logical block") Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241105-panthor-flush-page-fixes-v1-1-829aaf37db93@google.com
2024-11-07Merge tag 'nf-24-11-07' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fix for net The following series contains a Netfilter fix: 1) Wait for rcu grace period after netdevice removal is reported via event. * tag 'nf-24-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: wait for rcu grace period on net_device removal ==================== Link: https://patch.msgid.link/20241107113212.116634-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07ASoC: SOF: sof-client-probes-ipc4: Set param_size extension bitsJyri Sarha
Write the size of the optional payload of SOF_IPC4_MOD_INIT_INSTANCE message to extension param_size-bits. The previous IPC4 version does not set these bits that should indicate the size of the optional payload (struct sof_ipc4_probe_cfg). The old firmware side component code works well without these bits, but when the probes are converted to use the generic module API, this does not work anymore. Fixes: f5623593060f ("ASoC: SOF: IPC4: probes: Implement IPC4 ops for probes client device") Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Link: https://patch.msgid.link/20241107132840.17386-1-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-07drm/panthor: Lock XArray when getting entries for the VMLiviu Dudau
Similar to commit cac075706f29 ("drm/panthor: Fix race when converting group handle to group object") we need to use the XArray's internal locking when retrieving a vm pointer from there. v2: Removed part of the patch that was trying to protect fetching the heap pointer from XArray, as that operation is protected by the @pool->lock. Fixes: 647810ec2476 ("drm/panthor: Add the MMU/VM logical block") Reported-by: Jann Horn <jannh@google.com> Cc: stable@vger.kernel.org Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241106185806.389089-1-liviu.dudau@arm.com
2024-11-07drm: panel-orientation-quirks: Make Lenovo Yoga Tab 3 X90F DMI match less strictHans de Goede
There are 2G and 4G RAM versions of the Lenovo Yoga Tab 3 X90F and it turns out that the 2G version has a DMI product name of "CHERRYVIEW D1 PLATFORM" where as the 4G version has "CHERRYVIEW C0 PLATFORM". The sys-vendor + product-version check are unique enough that the product-name check is not necessary. Drop the product-name check so that the existing DMI match for the 4G RAM version also matches the 2G RAM version. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240825132131.6643-1-hdegoede@redhat.com