summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-03-26mailmap: update entry for Leonard CrestezLeonard Crestez
Put my personal email first because NXP employment ended some time ago. Also add my old intel email address. Link: https://lkml.kernel.org/r/f568faa0-2380-4e93-a312-b80c1e367645@gmail.com Signed-off-by: Leonard Crestez <cdleonard@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-26init: open /initrd.image with O_LARGEFILEJohn Sperbeck
If initrd data is larger than 2Gb, we'll eventually fail to write to the /initrd.image file when we hit that limit, unless O_LARGEFILE is set. Link: https://lkml.kernel.org/r/20240317221522.896040-1-jsperbeck@google.com Signed-off-by: John Sperbeck <jsperbeck@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-26selftests/mm: Fix build with _FORTIFY_SOURCEVitaly Chikunov
Add missing flags argument to open(2) call with O_CREAT. Some tests fail to compile if _FORTIFY_SOURCE is defined (to any valid value) (together with -O), resulting in similar error messages such as: In file included from /usr/include/fcntl.h:342, from gup_test.c:1: In function 'open', inlined from 'main' at gup_test.c:206:10: /usr/include/bits/fcntl2.h:50:11: error: call to '__open_missing_mode' declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments 50 | __open_missing_mode (); | ^~~~~~~~~~~~~~~~~~~~~~ _FORTIFY_SOURCE is enabled by default in some distributions, so the tests are not built by default and are skipped. open(2) man-page warns about missing flags argument: "if it is not supplied, some arbitrary bytes from the stack will be applied as the file mode." Link: https://lkml.kernel.org/r/20240318023445.3192922-1-vt@altlinux.org Fixes: aeb85ed4f41a ("tools/testing/selftests/vm/gup_benchmark.c: allow user specified file") Fixes: fbe37501b252 ("mm: huge_memory: debugfs for file-backed THP split") Fixes: c942f5bd17b3 ("selftests: soft-dirty: add test for mprotect") Signed-off-by: Vitaly Chikunov <vt@altlinux.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-26mm/memory: fix missing pte marker for !page on pte zapsPeter Xu
Commit 0cf18e839f64 of large folio zap work broke uffd-wp. Now mm's uffd unit test "wp-unpopulated" will trigger this WARN_ON_ONCE(). The WARN_ON_ONCE() asserts that an VMA cannot be registered with userfaultfd-wp if it contains a !normal page, but it's actually possible. One example is an anonymous vma, register with uffd-wp, read anything will install a zero page. Then when zap on it, this should trigger. What's more, removing that WARN_ON_ONCE may not be enough either, because we should also not rely on "whether it's a normal page" to decide whether pte marker is needed. For example, one can register wr-protect over some DAX regions to track writes when UFFD_FEATURE_WP_ASYNC enabled, in which case it can have page==NULL for a devmap but we may want to keep the marker around. Link: https://lkml.kernel.org/r/20240313213107.235067-1-peterx@redhat.com Fixes: 0cf18e839f64 ("mm/memory: handle !page case in zap_present_pte() separately") Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-26Merge tag 'printk-for-6.9-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk fix from Petr Mladek: - Prevent scheduling in an atomic context when printk() takes over the console flushing duty * tag 'printk-for-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: Update @console_may_schedule in console_trylock_spinning()
2024-03-26Merge tag 'pwm/for-6.9-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux Pull pwm fix from Uwe Kleine-König: "This contains a single fix for a regression introduced in v5.18-rc1 which made the img pwm driver fail to bind" * tag 'pwm/for-6.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux: pwm: img: fix pwm clock lookup
2024-03-26i40e: fix vf may be used uninitialized in this function warningAleksandr Loktionov
To fix the regression introduced by commit 52424f974bc5, which causes servers hang in very hard to reproduce conditions with resets races. Using two sources for the information is the root cause. In this function before the fix bumping v didn't mean bumping vf pointer. But the code used this variables interchangeably, so stale vf could point to different/not intended vf. Remove redundant "v" variable and iterate via single VF pointer across whole function instead to guarantee VF pointer validity. Fixes: 52424f974bc5 ("i40e: Fix VF hang when reset is triggered on another VF") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-26i40e: fix i40e_count_filters() to count only active/new filtersAleksandr Loktionov
The bug usually affects untrusted VFs, because they are limited to 18 MACs, it affects them badly, not letting to create MAC all filters. Not stable to reproduce, it happens when VF user creates MAC filters when other MACVLAN operations are happened in parallel. But consequence is that VF can't receive desired traffic. Fix counter to be bumped only for new or active filters. Fixes: 621650cabee5 ("i40e: Refactoring VF MAC filters counting to make more reliable") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-26btrfs: fix race in read_extent_buffer_pages()Tavian Barnes
There are reports from tree-checker that detects corrupted nodes, without any obvious pattern so possibly an overwrite in memory. After some debugging it turns out there's a race when reading an extent buffer the uptodate status can be missed. To prevent concurrent reads for the same extent buffer, read_extent_buffer_pages() performs these checks: /* (1) */ if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags)) return 0; /* (2) */ if (test_and_set_bit(EXTENT_BUFFER_READING, &eb->bflags)) goto done; At this point, it seems safe to start the actual read operation. Once that completes, end_bbio_meta_read() does /* (3) */ set_extent_buffer_uptodate(eb); /* (4) */ clear_bit(EXTENT_BUFFER_READING, &eb->bflags); Normally, this is enough to ensure only one read happens, and all other callers wait for it to finish before returning. Unfortunately, there is a racey interleaving: Thread A | Thread B | Thread C ---------+----------+--------- (1) | | | (1) | (2) | | (3) | | (4) | | | (2) | | | (1) When this happens, thread B kicks of an unnecessary read. Worse, thread C will see UPTODATE set and return immediately, while the read from thread B is still in progress. This race could result in tree-checker errors like this as the extent buffer is concurrently modified: BTRFS critical (device dm-0): corrupted node, root=256 block=8550954455682405139 owner mismatch, have 11858205567642294356 expect [256, 18446744073709551360] Fix it by testing UPTODATE again after setting the READING bit, and if it's been set, skip the unnecessary read. Fixes: d7172f52e993 ("btrfs: use per-buffer locking for extent_buffer reading") Link: https://lore.kernel.org/linux-btrfs/CAHk-=whNdMaN9ntZ47XRKP6DBes2E5w7fi-0U3H2+PS18p+Pzw@mail.gmail.com/ Link: https://lore.kernel.org/linux-btrfs/f51a6d5d7432455a6a858d51b49ecac183e0bbc9.1706312914.git.wqu@suse.com/ Link: https://lore.kernel.org/linux-btrfs/c7241ea4-fcc6-48d2-98c8-b5ea790d6c89@gmx.com/ CC: stable@vger.kernel.org # 6.5+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tavian Barnes <tavianator@tavianator.com> Reviewed-by: David Sterba <dsterba@suse.com> [ minor update of changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26btrfs: return accurate error code on open failure in open_fs_devices()Anand Jain
When attempting to exclusive open a device which has no exclusive open permission, such as a physical device associated with the flakey dm device, the open operation will fail, resulting in a mount failure. In this particular scenario, we erroneously return -EINVAL instead of the correct error code provided by the bdev_open_by_path() function, which is -EBUSY. Fix this, by returning error code from the bdev_open_by_path() function. With this correction, the mount error message will align with that of ext4 and xfs. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26btrfs: zoned: don't skip block groups with 100% zone unusableJohannes Thumshirn
Commit f4a9f219411f ("btrfs: do not delete unused block group if it may be used soon") changed the behaviour of deleting unused block-groups on zoned filesystems. Starting with this commit, we're using btrfs_space_info_used() to calculate the number of used bytes in a space_info. But btrfs_space_info_used() also accounts btrfs_space_info::bytes_zone_unusable as used bytes. So if a block group is 100% zone_unusable it is skipped from the deletion step. In order not to skip fully zone_unusable block-groups, also check if the block-group has bytes left that can be used on a zoned filesystem. Fixes: f4a9f219411f ("btrfs: do not delete unused block group if it may be used soon") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26btrfs: use btrfs_warn() to log message at btrfs_add_extent_mapping()Filipe Manana
At btrfs_add_extent_mapping(), if we failed to merge the extent map, which is unexpected and theoretically should never happen, we use WARN_ONCE() to log a message which is not great because we don't get information about which filesystem it relates to in case we have multiple btrfs filesystems mounted. So change this to use btrfs_warn() and surround the error check with WARN_ON() so we always get a useful stack trace and the condition is flagged as "unlikely" since it's not expected to ever happen. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26btrfs: fix message not properly printing interval when adding extent mapFilipe Manana
At btrfs_add_extent_mapping(), if we are unable to merge the existing extent map, we print a warning message that suggests interval ranges in the form "[X, Y)", where the first element is the inclusive start offset of a range and the second element is the exclusive end offset. However we end up printing the length of the ranges instead of the exclusive end offsets. So fix this by printing the range end offsets. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26btrfs: fix warning messages not printing interval at unpin_extent_range()Filipe Manana
At unpin_extent_range() we print warning messages that are supposed to print an interval in the form "[X, Y)", with the first element being an inclusive start offset and the second element being the exclusive end offset of a range. However we end up printing the range's length instead of the range's exclusive end offset, so fix that to avoid having confusing and non-sense messages in case we hit one of these unexpected scenarios. Fixes: 00deaf04df35 ("btrfs: log messages at unpin_extent_range() during unexpected cases") Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26btrfs: fix extent map leak in unexpected scenario at unpin_extent_cache()Filipe Manana
At unpin_extent_cache() if we happen to find an extent map with an unexpected start offset, we jump to the 'out' label and never release the reference we added to the extent map through the call to lookup_extent_mapping(), therefore resulting in a leak. So fix this by moving the free_extent_map() under the 'out' label. Fixes: c03c89f821e5 ("btrfs: handle errors returned from unpin_extent_cache()") Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26btrfs: validate device maj:min during openAnand Jain
Boris managed to create a device capable of changing its maj:min without altering its device path. Only multi-devices can be scanned. A device that gets scanned and remains in the btrfs kernel cache might end up with an incorrect maj:min. Despite the temp-fsid feature patch did not introduce this bug, it could lead to issues if the above multi-device is converted to a single device with a stale maj:min. Subsequently, attempting to mount the same device with the correct maj:min might mistake it for another device with the same fsid, potentially resulting in wrongly auto-enabling the temp-fsid feature. To address this, this patch validates the device's maj:min at the time of device open and updates it if it has changed since the last scan. CC: stable@vger.kernel.org # 6.7+ Fixes: a5b8a5f9f835 ("btrfs: support cloned-device mount capability") Reported-by: Boris Burkov <boris@bur.io> Co-developed-by: Boris Burkov <boris@bur.io> Reviewed-by: Boris Burkov <boris@bur.io># Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26btrfs: zoned: fix use-after-free in do_zone_finish()Johannes Thumshirn
Shinichiro reported the following use-after-free triggered by the device replace operation in fstests btrfs/070. BTRFS info (device nullb1): scrub: finished on devid 1 with status: 0 ================================================================== BUG: KASAN: slab-use-after-free in do_zone_finish+0x91a/0xb90 [btrfs] Read of size 8 at addr ffff8881543c8060 by task btrfs-cleaner/3494007 CPU: 0 PID: 3494007 Comm: btrfs-cleaner Tainted: G W 6.8.0-rc5-kts #1 Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020 Call Trace: <TASK> dump_stack_lvl+0x5b/0x90 print_report+0xcf/0x670 ? __virt_addr_valid+0x200/0x3e0 kasan_report+0xd8/0x110 ? do_zone_finish+0x91a/0xb90 [btrfs] ? do_zone_finish+0x91a/0xb90 [btrfs] do_zone_finish+0x91a/0xb90 [btrfs] btrfs_delete_unused_bgs+0x5e1/0x1750 [btrfs] ? __pfx_btrfs_delete_unused_bgs+0x10/0x10 [btrfs] ? btrfs_put_root+0x2d/0x220 [btrfs] ? btrfs_clean_one_deleted_snapshot+0x299/0x430 [btrfs] cleaner_kthread+0x21e/0x380 [btrfs] ? __pfx_cleaner_kthread+0x10/0x10 [btrfs] kthread+0x2e3/0x3c0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> Allocated by task 3493983: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 __kasan_kmalloc+0xaa/0xb0 btrfs_alloc_device+0xb3/0x4e0 [btrfs] device_list_add.constprop.0+0x993/0x1630 [btrfs] btrfs_scan_one_device+0x219/0x3d0 [btrfs] btrfs_control_ioctl+0x26e/0x310 [btrfs] __x64_sys_ioctl+0x134/0x1b0 do_syscall_64+0x99/0x190 entry_SYSCALL_64_after_hwframe+0x6e/0x76 Freed by task 3494056: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3f/0x60 poison_slab_object+0x102/0x170 __kasan_slab_free+0x32/0x70 kfree+0x11b/0x320 btrfs_rm_dev_replace_free_srcdev+0xca/0x280 [btrfs] btrfs_dev_replace_finishing+0xd7e/0x14f0 [btrfs] btrfs_dev_replace_by_ioctl+0x1286/0x25a0 [btrfs] btrfs_ioctl+0xb27/0x57d0 [btrfs] __x64_sys_ioctl+0x134/0x1b0 do_syscall_64+0x99/0x190 entry_SYSCALL_64_after_hwframe+0x6e/0x76 The buggy address belongs to the object at ffff8881543c8000 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 96 bytes inside of freed 1024-byte region [ffff8881543c8000, ffff8881543c8400) The buggy address belongs to the physical page: page:00000000fe2c1285 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1543c8 head:00000000fe2c1285 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 0017ffffc0000840 ffff888100042dc0 ffffea0019e8f200 dead000000000002 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881543c7f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8881543c7f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8881543c8000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8881543c8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8881543c8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb This UAF happens because we're accessing stale zone information of a already removed btrfs_device in do_zone_finish(). The sequence of events is as follows: btrfs_dev_replace_start btrfs_scrub_dev btrfs_dev_replace_finishing btrfs_dev_replace_update_device_in_mapping_tree <-- devices replaced btrfs_rm_dev_replace_free_srcdev btrfs_free_device <-- device freed cleaner_kthread btrfs_delete_unused_bgs btrfs_zone_finish do_zone_finish <-- refers the freed device The reason for this is that we're using a cached pointer to the chunk_map from the block group, but on device replace this cached pointer can contain stale device entries. The staleness comes from the fact, that btrfs_block_group::physical_map is not a pointer to a btrfs_chunk_map but a memory copy of it. Also take the fs_info::dev_replace::rwsem to prevent btrfs_dev_replace_update_device_in_mapping_tree() from changing the device underneath us again. Note: btrfs_dev_replace_update_device_in_mapping_tree() is holding fs_info::mapping_tree_lock, but as this is a spinning read/write lock we cannot take it as the call to blkdev_zone_mgmt() requires a memory allocation which may not sleep. But btrfs_dev_replace_update_device_in_mapping_tree() is always called with the fs_info::dev_replace::rwsem held in write mode. Many thanks to Shinichiro for analyzing the bug. Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> CC: stable@vger.kernel.org # 6.8 Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26i40e: Enforce software interrupt during busy-poll exitIvan Vecera
As for ice bug fixed by commit b7306b42beaf ("ice: manage interrupts during poll exit") followed by commit 23be7075b318 ("ice: fix software generating extra interrupts") I'm seeing the similar issue also with i40e driver. In certain situation when busy-loop is enabled together with adaptive coalescing, the driver occasionally misses that there are outstanding descriptors to clean when exiting busy poll. Try to catch the remaining work by triggering a software interrupt when exiting busy poll. No extra interrupts will be generated when busy polling is not used. The issue was found when running sockperf ping-pong tcp test with adaptive coalescing and busy poll enabled (50 as value busy_pool and busy_read sysctl knobs) and results in huge latency spikes with more than 100000us. The fix is inspired from the ice driver and do the following: 1) During napi poll exit in case of busy-poll (napo_complete_done() returns false) this is recorded to q_vector that we were in busy loop. 2) Extends i40e_buildreg_itr() to be able to add an enforced software interrupt into built value 2) In i40e_update_enable_itr() enforces a software interrupt trigger if we are exiting busy poll to catch any pending clean-ups 3) Reuses unused 3rd ITR (interrupt throttle) index and set it to 20K interrupts per second to limit the number of these sw interrupts. Test results ============ Prior: [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120 sockperf: == version #3.10-no.git == sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s) [ 0] IP = 10.9.9.1 PORT = 11111 # TCP sockperf: Warmup stage (sending a few dummy messages)... sockperf: Starting test... sockperf: Test end (interrupted by timer) sockperf: Test ended sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2438563; ReceivedMessages=2438562 sockperf: ========= Printing statistics for Server No: 0 sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2429473; ReceivedMessages=2429473 sockperf: ====> avg-latency=24.571 (std-dev=93.297, mean-ad=4.904, median-ad=1.510, siqr=1.063, cv=3.797, std-error=0.060, 99.0% ci=[24.417, 24.725]) sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0 sockperf: Summary: Latency is 24.571 usec sockperf: Total 2429473 observations; each percentile contains 24294.73 observations sockperf: ---> <MAX> observation = 103294.331 sockperf: ---> percentile 99.999 = 45.633 sockperf: ---> percentile 99.990 = 37.013 sockperf: ---> percentile 99.900 = 35.910 sockperf: ---> percentile 99.000 = 33.390 sockperf: ---> percentile 90.000 = 28.626 sockperf: ---> percentile 75.000 = 27.741 sockperf: ---> percentile 50.000 = 26.743 sockperf: ---> percentile 25.000 = 25.614 sockperf: ---> <MIN> observation = 12.220 After: [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120 sockperf: == version #3.10-no.git == sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s) [ 0] IP = 10.9.9.1 PORT = 11111 # TCP sockperf: Warmup stage (sending a few dummy messages)... sockperf: Starting test... sockperf: Test end (interrupted by timer) sockperf: Test ended sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2400055; ReceivedMessages=2400054 sockperf: ========= Printing statistics for Server No: 0 sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2391186; ReceivedMessages=2391186 sockperf: ====> avg-latency=24.965 (std-dev=5.934, mean-ad=4.642, median-ad=1.485, siqr=1.067, cv=0.238, std-error=0.004, 99.0% ci=[24.955, 24.975]) sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0 sockperf: Summary: Latency is 24.965 usec sockperf: Total 2391186 observations; each percentile contains 23911.86 observations sockperf: ---> <MAX> observation = 195.841 sockperf: ---> percentile 99.999 = 45.026 sockperf: ---> percentile 99.990 = 39.009 sockperf: ---> percentile 99.900 = 35.922 sockperf: ---> percentile 99.000 = 33.482 sockperf: ---> percentile 90.000 = 28.902 sockperf: ---> percentile 75.000 = 27.821 sockperf: ---> percentile 50.000 = 26.860 sockperf: ---> percentile 25.000 = 25.685 sockperf: ---> <MIN> observation = 12.277 Fixes: 0bcd952feec7 ("ethernet/intel: consolidate NAPI and NAPI exit") Reported-by: Hugo Ferreira <hferreir@redhat.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-26Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'Paolo Abeni
Jijie Shao says: ==================== There are some bugfix for the HNS3 ethernet driver ==================== Link: https://lore.kernel.org/r/20240325124311.1866197-1-shaojijie@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26net: hns3: mark unexcuted loopback test result as UNEXECUTEDJian Shen
Currently, loopback test may be skipped when resetting, but the test result will still show as 'PASS', because the driver doesn't set ETH_TEST_FL_FAILED flag. Fix it by setting the flag and initializating the value to UNEXECUTED. Fixes: 4c8dab1c709c ("net: hns3: reconstruct function hns3_self_test") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26net: hns3: fix kernel crash when devlink reload during pf initializationYonglong Liu
The devlink reload process will access the hardware resources, but the register operation is done before the hardware is initialized. So, processing the devlink reload during initialization may lead to kernel crash. This patch fixes this by taking devl_lock during initialization. Fixes: b741269b2759 ("net: hns3: add support for registering devlink for PF") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26net: hns3: fix index limit to support all queue statsJie Wang
Currently, hns hardware supports more than 512 queues and the index limit in hclge_comm_tqps_update_stats is wrong. So this patch removes it. Fixes: 287db5c40d15 ("net: hns3: create new set of common tqp stats APIs for PF and VF reuse") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26MAINTAINERS: wifi: mwifiex: add Francesco as reviewerFrancesco Dolcini
As discussed on the mailing list, add myself as mwifiex driver reviewer. Link: https://lore.kernel.org/all/20240318112830.GA9565@francesco-nb/ Signed-off-by: Francesco Dolcini <francesco@dolcini.it> Acked-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://msgid.link/20240321163420.11158-1-francesco@dolcini.it
2024-03-26Merge tag 'for-netdev' of ↵Paolo Abeni
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-03-25 The following pull-request contains BPF updates for your *net* tree. We've added 17 non-merge commits during the last 12 day(s) which contain a total of 19 files changed, 184 insertions(+), 61 deletions(-). The main changes are: 1) Fix an arm64 BPF JIT bug in BPF_LDX_MEMSX implementation's offset handling found via test_bpf module, from Puranjay Mohan. 2) Various fixups to the BPF arena code in particular in the BPF verifier and around BPF selftests to match latest corresponding LLVM implementation, from Puranjay Mohan and Alexei Starovoitov. 3) Fix xsk to not assume that metadata is always requested in TX completion, from Stanislav Fomichev. 4) Fix riscv BPF JIT's kfunc parameter incompatibility between BPF and the riscv ABI which requires sign-extension on int/uint, from Pu Lehui. 5) Fix s390x BPF JIT's bpf_plt pointer arithmetic which triggered a crash when testing struct_ops, from Ilya Leoshkevich. 6) Fix libbpf's arena mmap handling which had incorrect u64-to-pointer cast on 32-bit architectures, from Andrii Nakryiko. 7) Fix libbpf to define MFD_CLOEXEC when not available, from Arnaldo Carvalho de Melo. 8) Fix arm64 BPF JIT implementation for 32bit unconditional bswap which resulted in an incorrect swap as indicated by test_bpf, from Artem Savkov. 9) Fix BPF man page build script to use silent mode, from Hangbin Liu. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: riscv, bpf: Fix kfunc parameters incompatibility between bpf and riscv abi bpf: verifier: reject addr_space_cast insn without arena selftests/bpf: verifier_arena: fix mmap address for arm64 bpf: verifier: fix addr_space_cast from as(1) to as(0) libbpf: Define MFD_CLOEXEC if not available arm64: bpf: fix 32bit unconditional bswap bpf, arm64: fix bug in BPF_LDX_MEMSX libbpf: fix u64-to-pointer cast on 32-bit arches s390/bpf: Fix bpf_plt pointer arithmetic xsk: Don't assume metadata is always requested in TX completion selftests/bpf: Add arena test case for 4Gbyte corner case selftests/bpf: Remove hard coded PAGE_SIZE macro. libbpf, selftests/bpf: Adjust libbpf, bpftool, selftests to match LLVM bpf: Clarify bpf_arena comments. MAINTAINERS: Update email address for Quentin Monnet scripts/bpf_doc: Use silent mode when exec make cmd bpf: Temporarily disable atomic operations in BPF arena ==================== Link: https://lore.kernel.org/r/20240325213520.26688-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26selftests: vxlan_mdb: Fix failures with old libnetIdo Schimmel
Locally generated IP multicast packets (such as the ones used in the test) do not perform routing and simply egress the bound device. However, as explained in commit 8bcfb4ae4d97 ("selftests: forwarding: Fix failing tests with old libnet"), old versions of libnet (used by mausezahn) do not use the "SO_BINDTODEVICE" socket option. Specifically, the library started using the option for IPv6 sockets in version 1.1.6 and for IPv4 sockets in version 1.2. This explains why on Ubuntu - which uses version 1.1.6 - the IPv4 overlay tests are failing whereas the IPv6 ones are passing. Fix by specifying the source and destination MAC of the packets which will cause mausezahn to use a packet socket instead of an IP socket. Fixes: 62199e3f1658 ("selftests: net: Add VXLAN MDB test") Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr> Closes: https://lore.kernel.org/netdev/5bb50349-196d-4892-8ed2-f37543aa863f@alu.unizg.hr/ Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20240325075030.2379513-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26MAINTAINERS: split Renesas Ethernet drivers entrySergey Shtylyov
Since the Renesas Ethernet Switch driver was added by Yoshihiro Shimoda, I started receiving the patches to review for it -- which I was unable to do, as I don't know this hardware and don't even have the manuals for it. Fortunately, Shimoda-san has volunteered to be a reviewer for this new driver, thus let's now split the single entry into 3 per-driver entries, each with its own reviewer... Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Link: https://lore.kernel.org/r/de0ccc1d-6fc0-583f-4f80-f70e6461d62d@omp.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26net: dsa: mt7530: fix improper frames on all 25MHz and 40MHz XTAL MT7530Arınç ÜNAL
The MT7530 switch after reset initialises with a core clock frequency that works with a 25MHz XTAL connected to it. For 40MHz XTAL, the core clock frequency must be set to 500MHz. The mt7530_pll_setup() function is responsible of setting the core clock frequency. Currently, it runs on MT7530 with 25MHz and 40MHz XTAL. This causes MT7530 switch with 25MHz XTAL to egress and ingress frames improperly. Introduce a check to run it only on MT7530 with 40MHz XTAL. The core clock frequency is set by writing to a switch PHY's register. Access to the PHY's register is done via the MDIO bus the switch is also on. Therefore, it works only when the switch makes switch PHYs listen on the MDIO bus the switch is on. This is controlled either by the state of the ESW_P1_LED_1 pin after reset deassertion or modifying bit 5 of the modifiable trap register. When ESW_P1_LED_1 is pulled high, PHY indirect access is used. That means accessing PHY registers via the PHY indirect access control register of the switch. When ESW_P1_LED_1 is pulled low, PHY direct access is used. That means accessing PHY registers via the MDIO bus the switch is on. For MT7530 switch with 40MHz XTAL on a board with ESW_P1_LED_1 pulled high, the core clock frequency won't be set to 500MHz, causing the switch to egress and ingress frames improperly. Run mt7530_pll_setup() after PHY direct access is set on the modifiable trap register. With these two changes, all MT7530 switches with 25MHz and 40MHz, and P1_LED_1 pulled high or low, will egress and ingress frames properly. Link: https://github.com/BPI-SINOVOIP/BPI-R2-bsp/blob/4a5dd143f2172ec97a2872fa29c7c4cd520f45b5/linux-mt/drivers/net/ethernet/mediatek/gsw_mt7623.c#L1039 Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Link: https://lore.kernel.org/r/20240320-for-net-mt7530-fix-25mhz-xtal-with-direct-phy-access-v1-1-d92f605f1160@arinc9.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-25net: wwan: t7xx: Split 64bit accesses to fix alignment issuesBjørn Mork
Some of the registers are aligned on a 32bit boundary, causing alignment faults on 64bit platforms. Unable to handle kernel paging request at virtual address ffffffc084a1d004 Mem abort info: ESR = 0x0000000096000061 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x21: alignment fault Data abort info: ISV = 0, ISS = 0x00000061, ISS2 = 0x00000000 CM = 0, WnR = 1, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000046ad6000 [ffffffc084a1d004] pgd=100000013ffff003, p4d=100000013ffff003, pud=100000013ffff003, pmd=0068000020a00711 Internal error: Oops: 0000000096000061 [#1] SMP Modules linked in: mtk_t7xx(+) qcserial pppoe ppp_async option nft_fib_inet nf_flow_table_inet mt7921u(O) mt7921s(O) mt7921e(O) mt7921_common(O) iwlmvm(O) iwldvm(O) usb_wwan rndis_host qmi_wwan pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7996e(O) mt792x_usb(O) mt792x_lib(O) mt7915e(O) mt76_usb(O) mt76_sdio(O) mt76_connac_lib(O) mt76(O) mac80211(O) iwlwifi(O) huawei_cdc_ncm cfg80211(O) cdc_ncm cdc_ether wwan usbserial usbnet slhc sfp rtc_pcf8563 nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 mt6577_auxadc mdio_i2c libcrc32c compat(O) cdc_wdm cdc_acm at24 crypto_safexcel pwm_fan i2c_gpio i2c_smbus industrialio i2c_algo_bit i2c_mux_reg i2c_mux_pca954x i2c_mux_pca9541 i2c_mux_gpio i2c_mux dummy oid_registry tun sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes cbc authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd nvme nvme_core gpio_button_hotplug(O) dm_mirror dm_region_hash dm_log dm_crypt dm_mod dax usbcore usb_common ptp aquantia pps_core mii tpm encrypted_keys trusted CPU: 3 PID: 5266 Comm: kworker/u9:1 Tainted: G O 6.6.22 #0 Hardware name: Bananapi BPI-R4 (DT) Workqueue: md_hk_wq t7xx_fsm_uninit [mtk_t7xx] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : t7xx_cldma_hw_set_start_addr+0x1c/0x3c [mtk_t7xx] lr : t7xx_cldma_start+0xac/0x13c [mtk_t7xx] sp : ffffffc085d63d30 x29: ffffffc085d63d30 x28: 0000000000000000 x27: 0000000000000000 x26: 0000000000000000 x25: ffffff80c804f2c0 x24: ffffff80ca196c05 x23: 0000000000000000 x22: ffffff80c814b9b8 x21: ffffff80c814b128 x20: 0000000000000001 x19: ffffff80c814b080 x18: 0000000000000014 x17: 0000000055c9806b x16: 000000007c5296d0 x15: 000000000f6bca68 x14: 00000000dbdbdce4 x13: 000000001aeaf72a x12: 0000000000000001 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : ffffff80ca1ef6b4 x7 : ffffff80c814b818 x6 : 0000000000000018 x5 : 0000000000000870 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 000000010a947000 x1 : ffffffc084a1d004 x0 : ffffffc084a1d004 Call trace: t7xx_cldma_hw_set_start_addr+0x1c/0x3c [mtk_t7xx] t7xx_fsm_uninit+0x578/0x5ec [mtk_t7xx] process_one_work+0x154/0x2a0 worker_thread+0x2ac/0x488 kthread+0xe0/0xec ret_from_fork+0x10/0x20 Code: f9400800 91001000 8b214001 d50332bf (f9000022) ---[ end trace 0000000000000000 ]--- The inclusion of io-64-nonatomic-lo-hi.h indicates that all 64bit accesses can be replaced by pairs of nonatomic 32bit access. Fix alignment by forcing all accesses to be 32bit on 64bit platforms. Link: https://forum.openwrt.org/t/fibocom-fm350-gl-support/142682/72 Fixes: 39d439047f1d ("net: wwan: t7xx: Add control DMA interface") Signed-off-by: Bjørn Mork <bjorn@mork.no> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Tested-by: Liviu Dudau <liviu@dudau.co.uk> Link: https://lore.kernel.org/r/20240322144000.1683822-1-bjorn@mork.no Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25tcp: properly terminate timers for kernel socketsEric Dumazet
We had various syzbot reports about tcp timers firing after the corresponding netns has been dismantled. Fortunately Josef Bacik could trigger the issue more often, and could test a patch I wrote two years ago. When TCP sockets are closed, we call inet_csk_clear_xmit_timers() to 'stop' the timers. inet_csk_clear_xmit_timers() can be called from any context, including when socket lock is held. This is the reason it uses sk_stop_timer(), aka del_timer(). This means that ongoing timers might finish much later. For user sockets, this is fine because each running timer holds a reference on the socket, and the user socket holds a reference on the netns. For kernel sockets, we risk that the netns is freed before timer can complete, because kernel sockets do not hold reference on the netns. This patch adds inet_csk_clear_xmit_timers_sync() function that using sk_stop_timer_sync() to make sure all timers are terminated before the kernel socket is released. Modules using kernel sockets close them in their netns exit() handler. Also add sock_not_owned_by_me() helper to get LOCKDEP support : inet_csk_clear_xmit_timers_sync() must not be called while socket lock is held. It is very possible we can revert in the future commit 3a58f13a881e ("net: rds: acquire refcount on TCP sockets") which attempted to solve the issue in rds only. (net/smc/af_smc.c and net/mptcp/subflow.c have similar code) We probably can remove the check_net() tests from tcp_out_of_resources() and __tcp_close() in the future. Reported-by: Josef Bacik <josef@toxicpanda.com> Closes: https://lore.kernel.org/netdev/20240314210740.GA2823176@perftesting/ Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.") Fixes: 8a68173691f0 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket") Link: https://lore.kernel.org/bpf/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Josef Bacik <josef@toxicpanda.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/20240322135732.1535772-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25net: hsr: hsr_slave: Fix the promiscuous mode in offload modeRavi Gunasekaran
commit e748d0fd66ab ("net: hsr: Disable promiscuous mode in offload mode") disables promiscuous mode of slave devices while creating an HSR interface. But while deleting the HSR interface, it does not take care of it. It decreases the promiscuous mode count, which eventually enables promiscuous mode on the slave devices when creating HSR interface again. Fix this by not decrementing the promiscuous mode count while deleting the HSR interface when offload is enabled. Fixes: e748d0fd66ab ("net: hsr: Disable promiscuous mode in offload mode") Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240322100447.27615-1-r-gunasekaran@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25net: ll_temac: platform_get_resource replaced by wrong functionClaus Hansen Ries
The function platform_get_resource was replaced with devm_platform_ioremap_resource_byname and is called using 0 as name. This eventually ends up in platform_get_resource_byname in the call stack, where it causes a null pointer in strcmp. if (type == resource_type(r) && !strcmp(r->name, name)) It should have been replaced with devm_platform_ioremap_resource. Fixes: bd69058f50d5 ("net: ll_temac: Use devm_platform_ioremap_resource_byname()") Signed-off-by: Claus Hansen Ries <chr@terma.com> Cc: stable@vger.kernel.org Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/cca18f9c630a41c18487729770b492bb@terma.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25s390/qeth: handle deferred cc1Alexandra Winter
The IO subsystem expects a driver to retry a ccw_device_start, when the subsequent interrupt response block (irb) contains a deferred condition code 1. Symptoms before this commit: On the read channel we always trigger the next read anyhow, so no different behaviour here. On the write channel we may experience timeout errors, because the expected reply will never be received without the retry. Other callers of qeth_send_control_data() may wrongly assume that the ccw was successful, which may cause problems later. Note that since commit 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers") and commit 5ef1dc40ffa6 ("s390/cio: fix invalid -EBUSY on ccw_device_start") deferred CC1s are much more likely to occur. See the commit message of the latter for more background information. Fixes: 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Co-developed-by: Thorsten Winkler <twinkler@linux.ibm.com> Signed-off-by: Thorsten Winkler <twinkler@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Link: https://lore.kernel.org/r/20240321115337.3564694-1-wintera@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25dpll: indent DPLL option type by a tabPrasad Pandit
Indent config option type by a tab. It helps Kconfig parsers to read file without error. Fixes: 9431063ad323 ("dpll: core: Add DPLL framework base functions") Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240322114819.1801795-1-ppandit@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25riscv, bpf: Fix kfunc parameters incompatibility between bpf and riscv abiPu Lehui
We encountered a failing case when running selftest in no_alu32 mode: The failure case is `kfunc_call/kfunc_call_test4` and its source code is like bellow: ``` long bpf_kfunc_call_test4(signed char a, short b, int c, long d) __ksym; int kfunc_call_test4(struct __sk_buff *skb) { ... tmp = bpf_kfunc_call_test4(-3, -30, -200, -1000); ... } ``` And its corresponding asm code is: ``` 0: r1 = -3 1: r2 = -30 2: r3 = 0xffffff38 # opcode: 18 03 00 00 38 ff ff ff 00 00 00 00 00 00 00 00 4: r4 = -1000 5: call bpf_kfunc_call_test4 ``` insn 2 is parsed to ld_imm64 insn to emit 0x00000000ffffff38 imm, and converted to int type and then send to bpf_kfunc_call_test4. But since it is zero-extended in the bpf calling convention, riscv jit will directly treat it as an unsigned 32-bit int value, and then fails with the message "actual 4294966063 != expected -1234". The reason is the incompatibility between bpf and riscv abi, that is, bpf will do zero-extension on uint, but riscv64 requires sign-extension on int or uint. We can solve this problem by sign extending the 32-bit parameters in kfunc. The issue is related to [0], and thanks to Yonghong and Alexei. Link: https://github.com/llvm/llvm-project/pull/84874 [0] Fixes: d40c3847b485 ("riscv, bpf: Add kfunc support for RV64") Signed-off-by: Pu Lehui <pulehui@huawei.com> Tested-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20240324103306.2202954-1-pulehui@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-25Merge tag 'gfs2-v6.8-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Andreas Gruenbacher: - Fix boundary check in punch_hole * tag 'gfs2-v6.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix invalid metadata access in punch_hole
2024-03-25Merge tag 'v6.9-p2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes a regression that broke iwd as well as a divide by zero in iaa" * tag 'v6.9-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: iaa - Fix nr_cpus < nr_iaa case Revert "crypto: pkcs7 - remove sha1 support"
2024-03-25igc: Remove stale comment about Tx timestampingKurt Kanzenbach
The initial igc Tx timestamping implementation used only one register for retrieving Tx timestamps. Commit 3ed247e78911 ("igc: Add support for multiple in-flight TX timestamps") added support for utilizing all four of them e.g., for multiple domain support. Remove the stale comment/FIXME. Fixes: 3ed247e78911 ("igc: Add support for multiple in-flight TX timestamps") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-25ixgbe: avoid sleeping allocation in ixgbe_ipsec_vf_add_sa()Przemek Kitszel
Change kzalloc() flags used in ixgbe_ipsec_vf_add_sa() to GFP_ATOMIC, to avoid sleeping in IRQ context. Dan Carpenter, with the help of Smatch, has found following issue: The patch eda0333ac293: "ixgbe: add VF IPsec management" from Aug 13, 2018 (linux-next), leads to the following Smatch static checker warning: drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c:917 ixgbe_ipsec_vf_add_sa() warn: sleeping in IRQ context The call tree that Smatch is worried about is: ixgbe_msix_other() <- IRQ handler -> ixgbe_msg_task() -> ixgbe_rcv_msg_from_vf() -> ixgbe_ipsec_vf_add_sa() Fixes: eda0333ac293 ("ixgbe: add VF IPsec management") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/intel-wired-lan/db31a0b0-4d9f-4e6b-aed8-88266eb5665c@moroto.mountain Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-25ice: fix memory corruption bug with suspend and rebuildJesse Brandeburg
The ice driver would previously panic after suspend. This is caused from the driver *only* calling the ice_vsi_free_q_vectors() function by itself, when it is suspending. Since commit b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload") the driver has zeroed out num_q_vectors, and only restored it in ice_vsi_cfg_def(). This further causes the ice_rebuild() function to allocate a zero length buffer, after which num_q_vectors is updated, and then the new value of num_q_vectors is used to index into the zero length buffer, which corrupts memory. The fix entails making sure all the code referencing num_q_vectors only does so after it has been reset via ice_vsi_cfg_def(). I didn't perform a full bisect, but I was able to test against 6.1.77 kernel and that ice driver works fine for suspend/resume with no panic, so sometime since then, this problem was introduced. Also clean up an un-needed init of a local variable in the function being modified. PANIC from 6.8.0-rc1: [1026674.915596] PM: suspend exit [1026675.664697] ice 0000:17:00.1: PTP reset successful [1026675.664707] ice 0000:17:00.1: 2755 msecs passed between update to cached PHC time [1026675.667660] ice 0000:b1:00.0: PTP reset successful [1026675.675944] ice 0000:b1:00.0: 2832 msecs passed between update to cached PHC time [1026677.137733] ixgbe 0000:31:00.0 ens787: NIC Link is Up 1 Gbps, Flow Control: None [1026677.190201] BUG: kernel NULL pointer dereference, address: 0000000000000010 [1026677.192753] ice 0000:17:00.0: PTP reset successful [1026677.192764] ice 0000:17:00.0: 4548 msecs passed between update to cached PHC time [1026677.197928] #PF: supervisor read access in kernel mode [1026677.197933] #PF: error_code(0x0000) - not-present page [1026677.197937] PGD 1557a7067 P4D 0 [1026677.212133] ice 0000:b1:00.1: PTP reset successful [1026677.212143] ice 0000:b1:00.1: 4344 msecs passed between update to cached PHC time [1026677.212575] [1026677.243142] Oops: 0000 [#1] PREEMPT SMP NOPTI [1026677.247918] CPU: 23 PID: 42790 Comm: kworker/23:0 Kdump: loaded Tainted: G W 6.8.0-rc1+ #1 [1026677.257989] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022 [1026677.269367] Workqueue: ice ice_service_task [ice] [1026677.274592] RIP: 0010:ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice] [1026677.281421] Code: 0f 84 3a ff ff ff 41 0f b7 74 ec 02 66 89 b0 22 02 00 00 81 e6 ff 1f 00 00 e8 ec fd ff ff e9 35 ff ff ff 48 8b 43 30 49 63 ed <41> 0f b7 34 24 41 83 c5 01 48 8b 3c e8 66 89 b7 aa 02 00 00 81 e6 [1026677.300877] RSP: 0018:ff3be62a6399bcc0 EFLAGS: 00010202 [1026677.306556] RAX: ff28691e28980828 RBX: ff28691e41099828 RCX: 0000000000188000 [1026677.314148] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ff28691e41099828 [1026677.321730] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [1026677.329311] R10: 0000000000000007 R11: ffffffffffffffc0 R12: 0000000000000010 [1026677.336896] R13: 0000000000000000 R14: 0000000000000000 R15: ff28691e0eaa81a0 [1026677.344472] FS: 0000000000000000(0000) GS:ff28693cbffc0000(0000) knlGS:0000000000000000 [1026677.353000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1026677.359195] CR2: 0000000000000010 CR3: 0000000128df4001 CR4: 0000000000771ef0 [1026677.366779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1026677.374369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1026677.381952] PKRU: 55555554 [1026677.385116] Call Trace: [1026677.388023] <TASK> [1026677.390589] ? __die+0x20/0x70 [1026677.394105] ? page_fault_oops+0x82/0x160 [1026677.398576] ? do_user_addr_fault+0x65/0x6a0 [1026677.403307] ? exc_page_fault+0x6a/0x150 [1026677.407694] ? asm_exc_page_fault+0x22/0x30 [1026677.412349] ? ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice] [1026677.418614] ice_vsi_rebuild+0x34b/0x3c0 [ice] [1026677.423583] ice_vsi_rebuild_by_type+0x76/0x180 [ice] [1026677.429147] ice_rebuild+0x18b/0x520 [ice] [1026677.433746] ? delay_tsc+0x8f/0xc0 [1026677.437630] ice_do_reset+0xa3/0x190 [ice] [1026677.442231] ice_service_task+0x26/0x440 [ice] [1026677.447180] process_one_work+0x174/0x340 [1026677.451669] worker_thread+0x27e/0x390 [1026677.455890] ? __pfx_worker_thread+0x10/0x10 [1026677.460627] kthread+0xee/0x120 [1026677.464235] ? __pfx_kthread+0x10/0x10 [1026677.468445] ret_from_fork+0x2d/0x50 [1026677.472476] ? __pfx_kthread+0x10/0x10 [1026677.476671] ret_from_fork_asm+0x1b/0x30 [1026677.481050] </TASK> Fixes: b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload") Reported-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-25ice: Refactor FW data type and fix bitmap casting issueSteven Zou
According to the datasheet, the recipe association data is an 8-byte little-endian value. It is described as 'Bitmap of the recipe indexes associated with this profile', it is from 24 to 31 byte area in FW. Therefore, it is defined to '__le64 recipe_assoc' in struct ice_aqc_recipe_to_profile. And then fix the bitmap casting issue, as we must never ever use castings for bitmap type. Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Andrii Staikov <andrii.staikov@intel.com> Reviewed-by: Jan Sokolowski <jan.sokolowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steven Zou <steven.zou@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-25kunit: fix wireless test dependenciesJohannes Berg
For the wireless tests, CONFIG_WLAN and CONFIG_NETDEVICES are needed, though seem to be available by default on ARCH=um, so we didn't notice this before. Add them to fix kunit running on other architectures. Fixes: 28b3df1fe6ba ("kunit: add wireless unit tests") Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/r/b743a5ec-3d07-4747-85e0-2fb2ef69db7c@sirena.org.uk/ Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25net: mark racy access on sk->sk_rcvbuflinke li
sk->sk_rcvbuf in __sock_queue_rcv_skb() and __sk_receive_skb() can be changed by other threads. Mark this as benign using READ_ONCE(). This patch is aimed at reducing the number of benign races reported by KCSAN in order to focus future debugging effort on harmful races. Signed-off-by: linke li <lilinke99@qq.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-25wifi: iwlwifi: mvm: include link ID when releasing framesBenjamin Berg
When releasing frames from the reorder buffer, the link ID was not included in the RX status information. This subsequently led mac80211 to drop the frame. Change it so that the link information is set immediately when possible so that it doesn't not need to be filled in anymore when submitting the frame to mac80211. Fixes: b8a85a1d42d7 ("wifi: iwlwifi: mvm: rxmq: report link ID to mac80211") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Tested-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320232419.bbbd5e9bfe80.Iec1bf5c884e371f7bc5ea2534ed9ea8d3f2c0bf6@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: iwlwifi: mvm: handle debugfs names more carefullyJohannes Berg
With debugfs=off, we can get here with the dbgfs_dir being an ERR_PTR(). Instead of checking for all this, which is often flagged as a mistake, simply handle the names here more carefully by printing them, then we don't need extra checks. Also, while checking, I noticed theoretically 'buf' is too small, so fix that size as well. Cc: stable@vger.kernel.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218422 Fixes: c36235acb34f ("wifi: iwlwifi: mvm: rework debugfs handling") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320232419.4dc1eb3dd015.I32f308b0356ef5bcf8d188dd98ce9b210e3ab9fd@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: iwlwifi: mvm: guard against invalid STA ID on removalBenjamin Berg
Guard against invalid station IDs in iwl_mvm_mld_rm_sta_id as that would result in out-of-bounds array accesses. This prevents issues should the driver get into a bad state during error handling. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320232419.d523167bda9c.I1cffd86363805bf86a95d8bdfd4b438bb54baddc@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: iwlwifi: read txq->read_ptr under lockJohannes Berg
If we read txq->read_ptr without lock, we can read the same value twice, then obtain the lock, and reclaim from there to two different places, but crucially reclaim the same entry twice, resulting in the WARN_ONCE() a little later. Fix that by reading txq->read_ptr under lock. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240319100755.bf4c62196504.I978a7ca56c6bd6f1bf42c15aa923ba03366a840b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: iwlwifi: fw: don't always use FW dump trigJohannes Berg
Since the dump_data (struct iwl_fwrt_dump_data) is a union, it's not safe to unconditionally access and use the 'trig' member, it might be 'desc' instead. Access it only if it's known to be 'trig' rather than 'desc', i.e. if ini-debug is present. Cc: stable@vger.kernel.org Fixes: 0eb50c674a1e ("iwlwifi: yoyo: send hcmd to fw after dump collection completes.") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240319100755.e2976bc58b29.I72fbd6135b3623227de53d8a2bb82776066cb72b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: iwlwifi: mvm: rfi: fix potential response leaksJohannes Berg
If the rx payload length check fails, or if kmemdup() fails, we still need to free the command response. Fix that. Fixes: 21254908cbe9 ("iwlwifi: mvm: add RFI-M support") Co-authored-by: Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240319100755.db2fa0196aa7.I116293b132502ac68a65527330fa37799694b79c@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: correctly set active links upon TTLMAyala Beker
Fix ieee80211_ttlm_set_links() to not set all active links, but instead let the driver know that valid links status changed and select the active links properly. Fixes: 8f500fbc6c65 ("wifi: mac80211: process and save negotiated TID to Link mapping request") Signed-off-by: Ayala Beker <ayala.beker@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.acddbbf39584.Ide858f95248fcb3e483c97fcaa14b0cd4e964b10@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: iwlwifi: mvm: Configure the link mapping for non-MLD FWIlan Peer
In the non MLD firmware flows, although the deflink is used, the mapping of link ID to BSS configuration was missing, which causes flows that need this mapping to crash. Fix this by adding the link ID to BSS configuration mapping to non MLD flows as well. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240311081938.0b5c361e8f0c.Ib11f41815d2efa5d1ec57f855de4c8563142987b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>