summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-02Merge branch 'bpf-udp-exactly-once-socket-iteration'Martin KaFai Lau
Jordan Rife says: ==================== bpf: udp: Exactly-once socket iteration Both UDP and TCP socket iterators use iter->offset to track progress through a bucket, which is a measure of the number of matching sockets from the current bucket that have been seen or processed by the iterator. On subsequent iterations, if the current bucket has unprocessed items, we skip at least iter->offset matching items in the bucket before adding any remaining items to the next batch. However, iter->offset isn't always an accurate measure of "things already seen" when the underlying bucket changes between reads which can lead to repeated or skipped sockets. Instead, this series remembers the cookies of the sockets we haven't seen yet in the current bucket and resumes from the first cookie in that list that we can find on the next iteration. This series focuses on UDP socket iterators, but a later series will apply a similar approach to TCP socket iterators. To be more specific, this series replaces struct sock **batch inside struct bpf_udp_iter_state with union bpf_udp_iter_batch_item *batch, where union bpf_udp_iter_batch_item can contain either a pointer to a socket or a socket cookie. During reads, batch contains pointers to all sockets in the current batch while between reads batch contains all the cookies of the sockets in the current bucket that have yet to be processed. On subsequent reads, when iteration resumes, bpf_iter_udp_batch finds the first saved cookie that matches a socket in the bucket's socket list and picks up from there to construct the next batch. On average, assuming it's rare that the next socket disappears before the next read occurs, we should only need to scan as much as we did with the offset-based approach to find the starting point. In the case that the next socket is no longer there, we keep scanning through the saved cookies list until we find a match. The worst case is when none of the sockets from last time exist anymore, but again, this should be rare. CHANGES ======= v6 -> v7: * Move initialization of iter->state.bucket to -1 from patch five ("bpf: udp: Avoid socket skips and repeats during iteration") to patch three ("bpf: udp: Get rid of st_bucket_done") to avoid skipping the first bucket in the patch three and four (Martin). * Rename sock to sk in bpf_iter_batch_item (Martin). * Use ASSERT_OK_PTR in do_resume_test to check if counts is NULL (Martin). * goto done in do_resume_test when calloc or sock_iter_batch__open fails to make sure things are cleaned up properly, and initialize pointers to NULL explicitly to silence warnings from llvm 20 in CI. v5 -> v6: * Rework the logic in patch two ("bpf: udp: Make sure iter->batch always contains a full bucket snapshot") again to simplify it: * Only try realloc with GFP_USER one time instead of two (Alexei). * v5 introduced a second call to bpf_iter_udp_realloc_batch inside the loop to handle the GFP_ATOMIC case. In v6, move the GFP_USER case inside the loop as well, so it's all in once place. This, I feel, makes it a bit easier to understand the control flow. Consequently, it also simplifies the logic outside the loop. * Use GFP_NOWAIT instead of GFP_ATOMIC to avoid depleting memory reserves, since iterators are not critical operation (Alexei). Alexei suggested using __GFP_NOWARN as well with GFP_NOWAIT, but this is already set inside bpf_iter_udp_realloc_batch, so no change was needed there. * Introduce patch three ("bpf: udp: Get rid of st_bucket_done") to simplify things further, since with patch two, st_bucket_done == true is equivalent to iter->cur_sk == iter->end_sk. * In patch five ("bpf: udp: Avoid socket skips and repeats during iteration"), initialize iter->state.bucket to -1 so that on the first call to bpf_iter_udp_batch, the resume_bucket condition is not hit. This avoids adding a special case to the condition around bpf_iter_udp_resume for bucket zero. v4 -> v5: * Rework the logic from patch two ("bpf: udp: Make sure iter->batch always contains a full bucket snapshot") to move the handling of the GFP_ATOMIC case inside the main loop and get rid of the extra lock variable. This makes the logic clearer and makes it clearer that the bucket lock is always released (Martin). * Introduce udp_portaddr_for_each_entry_from in patch two instead of patch four ("bpf: udp: Avoid socket skips and repeats during iteration"), since patch two now needs to be able to resume list iteration from an arbitrary point in the GFP_ATOMIC case. * Similarly, introduce the memcpy inside bpf_iter_udp_realloc_batch in patch two instead of patch four, since in the GFP_ATOMIC case the new batch needs to remember the sockets from the old batch. * Use sock_gen_cookie instead of __sock_gen_cookie inside bpf_iter_udp_put_batch, since it can be called from a preemptible context (Martin). v3 -> v4: * Explicitly assign sk = NULL on !iter->end_sk exit condition (Kuniyuki). * Reword the commit message of patch two ("bpf: udp: Make sure iter->batch always contains a full bucket snapshot") to make the reasoning for GFP_ATOMIC more clear. v2 -> v3: * Guarantee that iter->batch is always a full snapshot of a bucket to prevent socket repeat scenarios [3]. This supercedes the patch from v2 that simply propagated ENOMEM up from bpf_iter_udp_batch and covers the scenario where the batch size is still too small after a realloc. * Fix up self tests (Martin) * ASSERT_EQ(nread, sizeof(out), "nread") instead of ASSERT_GE(nread, 1, "nread) in read_n. * Use ASSERT_OK and ASSERT_OK_FD in several places. * Add missing free(counts) to do_resume_test. * Move int local_port declaration to the top of do_resume_test. * Remove unnecessary guards before close and free. v1 -> v2: * Drop WARN_ON_ONCE from bpf_iter_udp_realloc_batch (Kuniyuki). * Fixed memcpy size parameter in bpf_iter_udp_realloc_batch; before it was missing sizeof(elem) * (Kuniyuki). * Move "bpf: udp: Propagate ENOMEM up from bpf_iter_udp_batch" to patch two in the series (Kuniyuki). rfc [1] -> v1: * Use hlist_entry_safe directly to retrieve the first socket in the current bucket's linked list instead of immediately breaking from udp_portaddr_for_each_entry (Martin). * Cancel iteration if bpf_iter_udp_realloc_batch() can't grab enough memory to contain a full snapshot of the current bucket to prevent unwanted skips or repeats [2]. [1]: https://lore.kernel.org/bpf/20250404220221.1665428-1-jordan@jrife.io/ [2]: https://lore.kernel.org/bpf/CABi4-ogUtMrH8-NVB6W8Xg_F_KDLq=yy-yu-tKr2udXE2Mu1Lg@mail.gmail.com/ [3]: https://lore.kernel.org/bpf/d323d417-3e8b-48af-ae94-bc28469ac0c1@linux.dev/ ==================== Link: https://patch.msgid.link/20250502161528.264630-1-jordan@jrife.io Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-05-02selftests/bpf: Add tests for bucket resume logic in UDP socket iteratorsJordan Rife
Introduce a set of tests that exercise various bucket resume scenarios: * remove_seen resumes iteration after removing a socket from the bucket that we've already processed. Before, with the offset-based approach, this test would have skipped an unseen socket after resuming iteration. With the cookie-based approach, we now see all sockets exactly once. * remove_unseen exercises the condition where the next socket that we would have seen is removed from the bucket before we resume iteration. This tests the scenario where we need to scan past the first cookie in our remembered cookies list to find the socket from which to resume iteration. * remove_all exercises the condition where all sockets we remembered were removed from the bucket to make sure iteration terminates and returns no more results. * add_some exercises the condition where a few, but not enough to trigger a realloc, sockets are added to the head of the current bucket between reads. Before, with the offset-based approach, this test would have repeated sockets we've already seen. With the cookie-based approach, we now see all sockets exactly once. * force_realloc exercises the condition that we need to realloc the batch on a subsequent read, since more sockets than can be held in the current batch array were added to the current bucket. This exercies the logic inside bpf_iter_udp_realloc_batch that copies cookies into the new batch to make sure nothing is skipped or repeated. Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-05-02selftests/bpf: Return socket cookies from sock_iter_batch progsJordan Rife
Extend the iter_udp_soreuse and iter_tcp_soreuse programs to write the cookie of the current socket, so that we can track the identity of the sockets that the iterator has seen so far. Update the existing do_test function to account for this change to the iterator program output. At the same time, teach both programs to work with AF_INET as well. Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-05-02bpf: udp: Avoid socket skips and repeats during iterationJordan Rife
Replace the offset-based approach for tracking progress through a bucket in the UDP table with one based on socket cookies. Remember the cookies of unprocessed sockets from the last batch and use this list to pick up where we left off or, in the case that the next socket disappears between reads, find the first socket after that point that still exists in the bucket and resume from there. This approach guarantees that all sockets that existed when iteration began and continue to exist throughout will be visited exactly once. Sockets that are added to the table during iteration may or may not be seen, but if they are they will be seen exactly once. Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-05-02irqchip/qcom-mpm: Prevent crash when trying to handle non-wake GPIOsStephan Gerhold
On Qualcomm chipsets not all GPIOs are wakeup capable. Those GPIOs do not have a corresponding MPM pin and should not be handled inside the MPM driver. The IRQ domain hierarchy is always applied, so it's required to explicitly disconnect the hierarchy for those. The pinctrl-msm driver marks these with GPIO_NO_WAKE_IRQ. qcom-pdc has a check for this, but irq-qcom-mpm is currently missing the check. This is causing crashes when setting up interrupts for non-wake GPIOs: root@rb1:~# gpiomon -c gpiochip1 10 irq: IRQ159: trimming hierarchy from :soc@0:interrupt-controller@f200000-1 Unable to handle kernel paging request at virtual address ffff8000a1dc3820 Hardware name: Qualcomm Technologies, Inc. Robotics RB1 (DT) pc : mpm_set_type+0x80/0xcc lr : mpm_set_type+0x5c/0xcc Call trace: mpm_set_type+0x80/0xcc (P) qcom_mpm_set_type+0x64/0x158 irq_chip_set_type_parent+0x20/0x38 msm_gpio_irq_set_type+0x50/0x530 __irq_set_trigger+0x60/0x184 __setup_irq+0x304/0x6bc request_threaded_irq+0xc8/0x19c edge_detector_setup+0x260/0x364 linereq_create+0x420/0x5a8 gpio_ioctl+0x2d4/0x6c0 Fix this by copying the check for GPIO_NO_WAKE_IRQ from qcom-pdc.c, so that MPM is removed entirely from the hierarchy for non-wake GPIOs. Fixes: a6199bb514d8 ("irqchip: Add Qualcomm MPM controller driver") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Alexey Klimov <alexey.klimov@linaro.org> Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20250502-irq-qcom-mpm-fix-no-wake-v1-1-8a1eafcd28d4@linaro.org
2025-05-02bpf: udp: Use bpf_udp_iter_batch_item for bpf_udp_iter_state batch itemsJordan Rife
Prepare for the next patch that tracks cookies between iterations by converting struct sock **batch to union bpf_udp_iter_batch_item *batch inside struct bpf_udp_iter_state. Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
2025-05-02Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two minor updates, both in drivers" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: core: Remove redundant query_complete trace scsi: myrb: Fix spelling mistake "statux" -> "status"
2025-05-02bpf: udp: Get rid of st_bucket_doneJordan Rife
Get rid of the st_bucket_done field to simplify UDP iterator state and logic. Before, st_bucket_done could be false if bpf_iter_udp_batch returned a partial batch; however, with the last patch ("bpf: udp: Make sure iter->batch always contains a full bucket snapshot"), st_bucket_done == true is equivalent to iter->cur_sk == iter->end_sk. Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-05-02bpf: udp: Make sure iter->batch always contains a full bucket snapshotJordan Rife
Require that iter->batch always contains a full bucket snapshot. This invariant is important to avoid skipping or repeating sockets during iteration when combined with the next few patches. Before, there were two cases where a call to bpf_iter_udp_batch may only capture part of a bucket: 1. When bpf_iter_udp_realloc_batch() returns -ENOMEM [1]. 2. When more sockets are added to the bucket while calling bpf_iter_udp_realloc_batch(), making the updated batch size insufficient [2]. In cases where the batch size only covers part of a bucket, it is possible to forget which sockets were already visited, especially if we have to process a bucket in more than two batches. This forces us to choose between repeating or skipping sockets, so don't allow this: 1. Stop iteration and propagate -ENOMEM up to userspace if reallocation fails instead of continuing with a partial batch. 2. Try bpf_iter_udp_realloc_batch() with GFP_USER just as before, but if we still aren't able to capture the full bucket, call bpf_iter_udp_realloc_batch() again while holding the bucket lock to guarantee the bucket does not change. On the second attempt use GFP_NOWAIT since we hold onto the spin lock. Introduce the udp_portaddr_for_each_entry_from macro and use it instead of udp_portaddr_for_each_entry to make it possible to continue iteration from an arbitrary socket. This is required for this patch in the GFP_NOWAIT case to allow us to fill the rest of a batch starting from the middle of a bucket and the later patch which skips sockets that were already seen. Testing all scenarios directly is a bit difficult, but I did some manual testing to exercise the code paths where GFP_NOWAIT is used and where ERR_PTR(err) is returned. I used the realloc test case included later in this series to trigger a scenario where a realloc happens inside bpf_iter_udp_batch and made a small code tweak to force the first realloc attempt to allocate a too-small batch, thus requiring another attempt with GFP_NOWAIT. Some printks showed both reallocs with the tests passing: Apr 25 23:16:24 crow kernel: go again GFP_USER Apr 25 23:16:24 crow kernel: go again GFP_NOWAIT With this setup, I also forced each of the bpf_iter_udp_realloc_batch calls to return -ENOMEM to ensure that iteration ends and that the read() in userspace fails. [1]: https://lore.kernel.org/bpf/CABi4-ogUtMrH8-NVB6W8Xg_F_KDLq=yy-yu-tKr2udXE2Mu1Lg@mail.gmail.com/ [2]: https://lore.kernel.org/bpf/7ed28273-a716-4638-912d-f86f965e54bb@linux.dev/ Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-05-02bpf: udp: Make mem flags configurable through bpf_iter_udp_realloc_batchJordan Rife
Prepare for the next patch which needs to be able to choose either GFP_USER or GFP_NOWAIT for calls to bpf_iter_udp_realloc_batch. Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
2025-05-02Merge tag 'block-6.15-20250502' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request via Christoph: - fix queue unquiesce check on PCI slot_reset (Keith Busch) - fix premature queue removal and I/O failover in nvme-tcp (Michael Liang) - don't restore null sk_state_change (Alistair Francis) - select CONFIG_TLS where needed (Alistair Francis) - always free derived key data (Hannes Reinecke) - more quirks (Wentao Guan) - ublk zero copy fix - ublk selftest fix for UBLK_F_NEED_GET_DATA * tag 'block-6.15-20250502' of git://git.kernel.dk/linux: nvmet-auth: always free derived key data nvmet-tcp: don't restore null sk_state_change nvmet-tcp: select CONFIG_TLS from CONFIG_NVME_TARGET_TCP_TLS nvme-tcp: select CONFIG_TLS from CONFIG_NVME_TCP_TLS nvme-tcp: fix premature queue removal and I/O failover nvme-pci: add quirks for WDC Blue SN550 15b7:5009 nvme-pci: add quirks for device 126f:1001 nvme-pci: fix queue unquiesce check on slot_reset ublk: remove the check of ublk_need_req_ref() from __ublk_check_and_get_req ublk: enhance check for register/unregister io buffer command ublk: decouple zero copy from user copy selftests: ublk: fix UBLK_F_NEED_GET_DATA
2025-05-02Merge tag 'io_uring-6.15-20250502' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fix from Jens Axboe: "Just a single fix, annotating the fdinfo side SQ/CQ head/tail reads with data_race() as they are known racy. Only serves to silence syzbot testing, by definition these debug outputs are going to be racy as they may change as soon as we've read them" * tag 'io_uring-6.15-20250502' of git://git.kernel.dk/linux: io_uring/fdinfo: annotate racy sq/cq head/tail reads
2025-05-02Merge tag 'bcachefs-2025-05-01' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Lots of assorted small fixes... - Some repair path fixes, a fix for -ENOMEM when reconstructing lots of alloc info on large filesystems, upgrade for ancient 0.14 filesystems, etc. - Various assert tweaks; assert -> ERO, ERO -> log the error in the superblock and continue - casefolding now uses d_ops like on other casefolding filesystems - fix device label create on device add, fix bucket array resize on filesystem resize - fix xattrs with FORTIFY_SOURCE builds with gcc-15/clang" * tag 'bcachefs-2025-05-01' of git://evilpiepirate.org/bcachefs: (22 commits) bcachefs: Remove incorrect __counted_by annotation bcachefs: add missing sched_annotate_sleep() bcachefs: Fix __bch2_dev_group_set() bcachefs: Kill ERO for i_blocks check in truncate bcachefs: check for inode.bi_sectors underflow bcachefs: Kill ERO in __bch2_i_sectors_acct() bcachefs: readdir fixes bcachefs: improve missing journal write device error message bcachefs: Topology error after insert is now an ERO bcachefs: Use bch2_kvmalloc() for journal keys array bcachefs: More informative error message when shutting down due to error bcachefs: btree_root_unreadable_and_scan_found_nothing autofix for non data btrees bcachefs: btree_node_data_missing is now autofix bcachefs: Don't generate alloc updates to invalid buckets bcachefs: Improve bch2_dev_bucket_missing() bcachefs: fix bch2_dev_buckets_resize() bcachefs: Add upgrade table entry from 0.14 bcachefs: Run BCH_RECOVERY_PASS_reconstruct_snapshots on missing subvol -> snapshot bcachefs: Add missing utf8_unload() bcachefs: Emit unicode version message on startup ...
2025-05-02Merge tag 'pinctrl-v6.15-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: - Fix potential NULL dereference in the i.MX driver - Fix the pull up/down resistor values in the Meson driver - Fix the mapping of the PHY LED pins in the Airhoa driver - Fix EINT interrupts on older controllers and a debounce value issue in the Mediatek driver - Fix an erronoeus PINGROUP define in the Qualcomm driver * tag 'pinctrl-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: qcom: Fix PINGROUP definition for sm8750 pinctrl: mediatek: common-v1: Fix error checking in mtk_eint_init() pinctrl: mediatek: Fix new design debounce issue pinctrl: mediatek: common-v1: Fix EINT breakage on older controllers pinctrl: airoha: fix wrong PHY LED mapping and PHY2 LED defines pinctrl: meson: define the pull up/down resistor value as 60 kOhm pinctrl: imx: Return NULL if no group is matched and found
2025-05-02Merge tag 'iommu-fixes-v6.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: "ARM-SMMU fixes: - Fix broken detection of the S2FWB feature - Ensure page-size bitmap is initialised for SVA domains - Fix handling of SMMU client devices with duplicate Stream IDs - Don't fail SMMU probe if Stream IDs are aliased across clients Intel VT-d fixes: - Add quirk for IGFX device - Revert an ATS change to fix a boot failure AMD IOMMU: - Fix potential buffer overflow Core: - Fix for iommu_copy_struct_from_user()" * tag 'iommu-fixes-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu/vt-d: Apply quirk_iommu_igfx for 8086:0044 (QM57/QS57) iommu/vt-d: Revert ATS timing change to fix boot failure iommu: Fix two issues in iommu_copy_struct_from_user() iommu/amd: Fix potential buffer overflow in parse_ivrs_acpihid iommu/arm-smmu-v3: Fail aliasing StreamIDs more gracefully iommu/arm-smmu-v3: Fix iommu_device_probe bug due to duplicated stream ids iommu/arm-smmu-v3: Fix pgsize_bit for sva domains iommu/arm-smmu-v3: Add missing S2FWB feature detection
2025-05-02Merge tag 'slab-for-6.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fix from Vlastimil Babka: - Stable fix to avoid bugs due to leftover obj_ext after allocation profiling is disabled at runtime (Zhenhua Huang) * tag 'slab-for-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm, slab: clean up slab->obj_exts always
2025-05-02Merge tag 'i2c-host-fixes-6.15-rc5' of ↵Wolfram Sang
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current i2c-host-fixes for v6.15-rc5 - imx-lpi2c: fix error handling sequence in probe
2025-05-02btrfs: open code folio_index() in btree_clear_folio_dirty_tag()Kairui Song
The folio_index() helper is only needed for mixed usage of page cache and swap cache, for pure page cache usage, the caller can just use folio->index instead. It can't be a swap cache folio here. Swap mapping may only call into fs through 'swap_rw' but btrfs does not use that method for swap. Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-02Revert "btrfs: canonicalize the device path before adding it"Qu Wenruo
This reverts commit 7e06de7c83a746e58d4701e013182af133395188. Commit 7e06de7c83a7 ("btrfs: canonicalize the device path before adding it") tries to make btrfs to use "/dev/mapper/*" name first, then any filename inside "/dev/" as the device path. This is mostly fine when there is only the root namespace involved, but when multiple namespace are involved, things can easily go wrong for the d_path() usage. As d_path() returns a file path that is namespace dependent, the resulted string may not make any sense in another namespace. Furthermore, the "/dev/" prefix checks itself is not reliable, one can still make a valid initramfs without devtmpfs, and fill all needed device nodes manually. Overall the userspace has all its might to pass whatever device path for mount, and we are not going to win the war trying to cover every corner case. So just revert that commit, and do no extra d_path() based file path sanity check. CC: stable@vger.kernel.org # 6.12+ Link: https://lore.kernel.org/linux-fsdevel/20250115185608.GA2223535@zen.localdomain/ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-02btrfs: avoid NULL pointer dereference if no valid csum treeQu Wenruo
[BUG] When trying read-only scrub on a btrfs with rescue=idatacsums mount option, it will crash with the following call trace: BUG: kernel NULL pointer dereference, address: 0000000000000208 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page CPU: 1 UID: 0 PID: 835 Comm: btrfs Tainted: G O 6.15.0-rc3-custom+ #236 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022 RIP: 0010:btrfs_lookup_csums_bitmap+0x49/0x480 [btrfs] Call Trace: <TASK> scrub_find_fill_first_stripe+0x35b/0x3d0 [btrfs] scrub_simple_mirror+0x175/0x290 [btrfs] scrub_stripe+0x5f7/0x6f0 [btrfs] scrub_chunk+0x9a/0x150 [btrfs] scrub_enumerate_chunks+0x333/0x660 [btrfs] btrfs_scrub_dev+0x23e/0x600 [btrfs] btrfs_ioctl+0x1dcf/0x2f80 [btrfs] __x64_sys_ioctl+0x97/0xc0 do_syscall_64+0x4f/0x120 entry_SYSCALL_64_after_hwframe+0x76/0x7e [CAUSE] Mount option "rescue=idatacsums" will completely skip loading the csum tree, so that any data read will not find any data csum thus we will ignore data checksum verification. Normally call sites utilizing csum tree will check the fs state flag NO_DATA_CSUMS bit, but unfortunately scrub does not check that bit at all. This results in scrub to call btrfs_search_slot() on a NULL pointer and triggered above crash. [FIX] Check both extent and csum tree root before doing any tree search. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-02btrfs: handle empty eb->folios in num_extent_folios()Boris Burkov
num_extent_folios() unconditionally calls folio_order() on eb->folios[0]. If that is NULL this will be a segfault. It is reasonable for it to return 0 as the number of folios in the eb when the first entry is NULL, so do that instead. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-02btrfs: correct the order of prelim_ref arguments in btrfs__prelim_refGoldwyn Rodrigues
btrfs_prelim_ref() calls the old and new reference variables in the incorrect order. This causes a NULL pointer dereference because oldref is passed as NULL to trace_btrfs_prelim_ref_insert(). Note, trace_btrfs_prelim_ref_insert() is being called with newref as oldref (and oldref as NULL) on purpose in order to print out the values of newref. To reproduce: echo 1 > /sys/kernel/debug/tracing/events/btrfs/btrfs_prelim_ref_insert/enable Perform some writeback operations. Backtrace: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 115949067 P4D 115949067 PUD 11594a067 PMD 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 1 UID: 0 PID: 1188 Comm: fsstress Not tainted 6.15.0-rc2-tester+ #47 PREEMPT(voluntary) 7ca2cef72d5e9c600f0c7718adb6462de8149622 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014 RIP: 0010:trace_event_raw_event_btrfs__prelim_ref+0x72/0x130 Code: e8 43 81 9f ff 48 85 c0 74 78 4d 85 e4 0f 84 8f 00 00 00 49 8b 94 24 c0 06 00 00 48 8b 0a 48 89 48 08 48 8b 52 08 48 89 50 10 <49> 8b 55 18 48 89 50 18 49 8b 55 20 48 89 50 20 41 0f b6 55 28 88 RSP: 0018:ffffce44820077a0 EFLAGS: 00010286 RAX: ffff8c6b403f9014 RBX: ffff8c6b55825730 RCX: 304994edf9cf506b RDX: d8b11eb7f0fdb699 RSI: ffff8c6b403f9010 RDI: ffff8c6b403f9010 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000010 R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8c6b4e8fb000 R13: 0000000000000000 R14: ffffce44820077a8 R15: ffff8c6b4abd1540 FS: 00007f4dc6813740(0000) GS:ffff8c6c1d378000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 000000010eb42000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> prelim_ref_insert+0x1c1/0x270 find_parent_nodes+0x12a6/0x1ee0 ? __entry_text_end+0x101f06/0x101f09 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 btrfs_is_data_extent_shared+0x167/0x640 ? fiemap_process_hole+0xd0/0x2c0 extent_fiemap+0xa5c/0xbc0 ? __entry_text_end+0x101f05/0x101f09 btrfs_fiemap+0x7e/0xd0 do_vfs_ioctl+0x425/0x9d0 __x64_sys_ioctl+0x75/0xc0 Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-02btrfs: compression: adjust cb->compressed_folios allocation typeKees Cook
In preparation for making the kmalloc() family of allocators type aware, we need to make sure that the returned type from the allocation matches the type of the variable being assigned. (Before, the allocator would always return "void *", which can be implicitly cast to any pointer type.) The assigned type is "struct folio **" but the returned type will be "struct page **". These are the same allocation size (pointer size), but the types don't match. Adjust the allocation type to match the assignment. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-02Merge branch 'tools-ynl-gen-additional-c-types-and-classic-netlink-handling'Paolo Abeni
Jakub Kicinski says: ==================== tools: ynl-gen: additional C types and classic netlink handling This series is a bit of a random grab bag adding things we need to generate code for rt-link. First two patches are pretty random code cleanups. Patch 3 adds default values if the spec is missing them. Patch 4 adds support for setting Netlink request flags (NLM_F_CREATE, NLM_F_REPLACE etc.). Classic netlink uses those quite a bit. Patches 5 and 6 extend the notification handling for variations used in classic netlink. Patch 6 adds support for when notification ID is the same as the ID of the response message to GET. Next 4 patches add support for handling a couple of complex types. These are supported by the schema and Python but C code gen wasn't there. Patch 11 is a bit of a hack, it skips code related to kernel policy generation, since we don't need it for classic netlink. Patch 12 adds support for having different fixed headers per op. Something we could avoid in previous rtnetlink specs but some specs do mix. v2: https://lore.kernel.org/20250425024311.1589323-1-kuba@kernel.org v1: https://lore.kernel.org/20250424021207.1167791-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250429154704.2613851-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-02tools: ynl: allow fixed-header to be specified per opJakub Kicinski
rtnetlink has variety of ops with different fixed headers. Detect that op fixed header is not the same as family one, and use sizeof() directly. For reverse parsing we need to pass the fixed header len along the policy (in the socket state). Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-13-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-02tools: ynl-gen: don't init enum checks for classic netlinkJakub Kicinski
rt-link has a vlan-protocols enum with: name: 8021q value: 33024 name: 8021ad value: 34984 It's nice to have, since it converts the values to strings in Python. For C, however, the codegen is trying to use enums to generate strict policy checks. Parsing such sparse enums is not possible via policies. Since for classic netlink we don't support kernel codegen and policy generation - skip the auto-generation of checks from enums. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-12-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-02tools: ynl-gen: array-nest: support binary array with exact-lenJakub Kicinski
IPv6 addresses are expressed as binary arrays since we don't have u128. Since they are not variable length, however, they are relatively easy to represent as an array of known size. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-11-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-02tools: ynl-gen: array-nest: support put for scalarJakub Kicinski
C codegen supports ArrayNest AKA indexed-array carrying scalars, but only for the netlink -> struct parsing. Support rendering from struct to netlink. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-10-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-02tools: ynl-gen: mutli-attr: support binary types with structJakub Kicinski
Binary types with struct are fixed size, relatively easy to handle for multi attr. Declare the member as a pointer. Count the members, allocate an array, copy in the data. Allow the netlink attr to be smaller or larger than our view of the struct in case the build headers are newer or older than the running kernel. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-9-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-02tools: ynl-gen: multi-attr: type gen for stringJakub Kicinski
Add support for multi attr strings (needed for link alt_names). We record the length individual strings in a len member, to do the same for multi-attr create a struct ynl_string in ynl.h and use it as a layer holding both the string and its length. Since strings may be arbitrary length dynamically allocate each individual one. Adjust arg_member and struct member to avoid spacing the double pointers to get "type **name;" rather than "type * *name;" Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-8-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-02tools: ynl-gen: support CRUD-like notifications for classic NetlinkJakub Kicinski
Allow CRUD-style notification where the notification is more like the response to the request, which can optionally be looped back onto the requesting socket. Since the notification and request are different ops in the spec, for example: - name: delrule doc: Remove an existing FIB rule attribute-set: fib-rule-attrs do: request: value: 33 attributes: *fib-rule-all - name: delrule-ntf doc: Notify a rule deletion value: 33 notify: getrule We need to find the request by ID. Ideally we'd detect this model from the spec properties, rather than assume that its what all classic netlink families do. But maybe that'd cause this model to spread and its easy to get wrong. For now assume CRUD == classic. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-7-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-02tools: ynl-gen: support using dump types for ntfJakub Kicinski
Classic Netlink has GET callbacks with no doit support, just dumps. Support using their responses in notifications. If notification points at a type which only has a dump - use the dump's type. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-6-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-02tools: ynl: let classic netlink requests specify extra nlflagsJakub Kicinski
Classic netlink makes extensive use of flags. Support specifying them the same way as attributes are specified (using a helper), for example: rt_link_newlink_req_set_nlflags(req, NLM_F_CREATE | NLM_F_ECHO); Wrap the code up in a RenderInfo predicate. I think that some genetlink families may want this, too. It should be easy to add a spec property later. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-5-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-02tools: ynl-gen: fill in missing empty attr listsJakub Kicinski
The C codegen refers to op attribute lists all over the place, without checking if they are present, even tho attribute list is technically an optional property. Add them automatically at init if missing so that we don't have to make specs longer. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-4-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-02tools: ynl-gen: factor out free_needs_iter for a structJakub Kicinski
Instead of walking the entries in the code gen add a method for the struct class to return if any of the members need an iterator. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-3-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-02tools: ynl-gen: fix comment about nested struct dictJakub Kicinski
The dict stores struct objects (of class Struct), not just a trivial set with directions. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-2-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-02Merge tag 'drm-xe-fixes-2025-05-01' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Eustall locking fix and disabling on VF - Documentation fix kernel version supporting hwmon entries - SVM fixes on error handling Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/fqkoqvo62fbkvw6xoxoxutzozqksxxudbmqacjm3durid2pkak@imlxghgrk3ob
2025-05-01drm/gpusvm: set has_dma_mapping inside mapping loopDafna Hirschfeld
The 'has_dma_mapping' flag should be set once there is a mapping so it could be unmapped in case of error. v2: - Resend for CI Fixes: 99624bdff867 ("drm/gpusvm: Add support for GPU Shared Virtual Memory") Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250428024752.881292-1-matthew.brost@intel.com (cherry picked from commit f64cf7b681af72d3f715c0d0fd72091a54471c1a) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-01drm/xe/hwmon: Fix kernel version documentation for temperatureLucas De Marchi
The version in the sysfs attribute should correspond to the version in which this is enabled and visible for end users. It usually doesn't correspond to the version in which the patch was developed, but rather a release that will contain it. Update them to 6.15. Fixes: dac328dea701 ("drm/xe/hwmon: expose package and vram temperature") Reported-by: Ulisses Furquim <ulisses.furquim@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4840 Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250421-hwmon-doc-fix-v1-1-9f68db702249@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 8500393a8e6c58e5e7c135133ad792fc6fd5b6f4) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-01tracing: Do not take trace_event_sem in print_event_fields()Steven Rostedt
On some paths in print_event_fields() it takes the trace_event_sem for read, even though it should always be held when the function is called. Remove the taking of that mutex and add a lockdep_assert_held_read() to make sure the trace_event_sem is held when print_event_fields() is called. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250501224128.0b1f0571@batman.local.home Fixes: 80a76994b2d88 ("tracing: Add "fields" option to show raw trace event fields") Reported-by: syzbot+441582c1592938fccf09@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/6813ff5e.050a0220.14dd7d.001b.GAE@google.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-01dt-bindings: net: via-rhine: Convert to YAMLAlexey Charkov
Rewrite the textual description for the VIA Rhine platform Ethernet controller as YAML schema, and switch the filename to follow the compatible string. These are used in several VIA/WonderMedia SoCs Signed-off-by: Alexey Charkov <alchark@gmail.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250430-rhine-binding-v2-1-4290156c0f57@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01ipv6: sr: switch to GFP_ATOMIC flag to allocate memory during seg6local LWT ↵Andrea Mayer
setup Recent updates to the locking mechanism that protects IPv6 routing tables [1] have affected the SRv6 networking subsystem. Such changes cause problems with some SRv6 Endpoints behaviors, like End.B6.Encaps and also impact SRv6 counters. Starting from commit 169fd62799e8 ("ipv6: Get rid of RTNL for SIOCADDRT and RTM_NEWROUTE."), the inet6_rtm_newroute() function no longer needs to acquire the RTNL lock for creating and configuring IPv6 routes and set up lwtunnels. The RTNL lock can be avoided because the ip6_route_add() function finishes setting up a new route in a section protected by RCU. This makes sure that no dev/nexthops can disappear during the operation. Because of this, the steps for setting up lwtunnels - i.e., calling lwtunnel_build_state() - are now done in a RCU lock section and not under the RTNL lock anymore. However, creating and configuring a lwtunnel instance in an RCU-protected section can be problematic when that tunnel needs to allocate memory using the GFP_KERNEL flag. For example, the following trace shows what happens when an SRv6 End.B6.Encaps behavior is instantiated after commit 169fd62799e8 ("ipv6: Get rid of RTNL for SIOCADDRT and RTM_NEWROUTE."): [ 3061.219696] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321 [ 3061.226136] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 445, name: ip [ 3061.232101] preempt_count: 0, expected: 0 [ 3061.235414] RCU nest depth: 1, expected: 0 [ 3061.238622] 1 lock held by ip/445: [ 3061.241458] #0: ffffffff83ec64a0 (rcu_read_lock){....}-{1:3}, at: ip6_route_add+0x41/0x1e0 [ 3061.248520] CPU: 1 UID: 0 PID: 445 Comm: ip Not tainted 6.15.0-rc3-micro-vm-dev-00590-ge527e891492d #2058 PREEMPT(full) [ 3061.248532] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 3061.248549] Call Trace: [ 3061.248620] <TASK> [ 3061.248633] dump_stack_lvl+0xa9/0xc0 [ 3061.248846] __might_resched+0x218/0x360 [ 3061.248871] __kmalloc_node_track_caller_noprof+0x332/0x4e0 [ 3061.248889] ? rcu_is_watching+0x3a/0x70 [ 3061.248902] ? parse_nla_srh+0x56/0xa0 [ 3061.248938] kmemdup_noprof+0x1c/0x40 [ 3061.248952] parse_nla_srh+0x56/0xa0 [ 3061.248969] seg6_local_build_state+0x2e0/0x580 [ 3061.248992] ? __lock_acquire+0xaff/0x1cd0 [ 3061.249013] ? do_raw_spin_lock+0x111/0x1d0 [ 3061.249027] ? __pfx_seg6_local_build_state+0x10/0x10 [ 3061.249068] ? lwtunnel_build_state+0xe1/0x3a0 [ 3061.249274] lwtunnel_build_state+0x10d/0x3a0 [ 3061.249303] fib_nh_common_init+0xce/0x1e0 [ 3061.249337] ? __pfx_fib_nh_common_init+0x10/0x10 [ 3061.249352] ? in6_dev_get+0xaf/0x1f0 [ 3061.249369] ? __rcu_read_unlock+0x64/0x2e0 [ 3061.249392] fib6_nh_init+0x290/0xc30 [ 3061.249422] ? __pfx_fib6_nh_init+0x10/0x10 [ 3061.249447] ? __lock_acquire+0xaff/0x1cd0 [ 3061.249459] ? _raw_spin_unlock_irqrestore+0x22/0x70 [ 3061.249624] ? ip6_route_info_create+0x423/0x520 [ 3061.249641] ? rcu_is_watching+0x3a/0x70 [ 3061.249683] ip6_route_info_create_nh+0x190/0x390 [ 3061.249715] ip6_route_add+0x71/0x1e0 [ 3061.249730] ? __pfx_inet6_rtm_newroute+0x10/0x10 [ 3061.249743] inet6_rtm_newroute+0x426/0xc50 [ 3061.249764] ? avc_has_perm_noaudit+0x13d/0x360 [ 3061.249853] ? __pfx_inet6_rtm_newroute+0x10/0x10 [ 3061.249905] ? __lock_acquire+0xaff/0x1cd0 [ 3061.249962] ? rtnetlink_rcv_msg+0x52f/0x890 [ 3061.249996] ? __pfx_inet6_rtm_newroute+0x10/0x10 [ 3061.250012] rtnetlink_rcv_msg+0x551/0x890 [ 3061.250040] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 3061.250065] ? __lock_acquire+0xaff/0x1cd0 [ 3061.250092] netlink_rcv_skb+0xbd/0x1f0 [ 3061.250108] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 3061.250124] ? __pfx_netlink_rcv_skb+0x10/0x10 [ 3061.250179] ? netlink_deliver_tap+0x10b/0x700 [ 3061.250210] netlink_unicast+0x2e7/0x410 [ 3061.250232] ? __pfx_netlink_unicast+0x10/0x10 [ 3061.250241] ? __lock_acquire+0xaff/0x1cd0 [ 3061.250280] netlink_sendmsg+0x366/0x670 [ 3061.250306] ? __pfx_netlink_sendmsg+0x10/0x10 [ 3061.250313] ? find_held_lock+0x2d/0xa0 [ 3061.250344] ? import_ubuf+0xbc/0xf0 [ 3061.250370] ? __pfx_netlink_sendmsg+0x10/0x10 [ 3061.250381] __sock_sendmsg+0x13e/0x150 [ 3061.250420] ____sys_sendmsg+0x33d/0x450 [ 3061.250442] ? __pfx_____sys_sendmsg+0x10/0x10 [ 3061.250453] ? __pfx_copy_msghdr_from_user+0x10/0x10 [ 3061.250489] ? __pfx_slab_free_after_rcu_debug+0x10/0x10 [ 3061.250514] ___sys_sendmsg+0xe5/0x160 [ 3061.250530] ? __pfx____sys_sendmsg+0x10/0x10 [ 3061.250568] ? __lock_acquire+0xaff/0x1cd0 [ 3061.250617] ? find_held_lock+0x2d/0xa0 [ 3061.250678] ? __virt_addr_valid+0x199/0x340 [ 3061.250704] ? preempt_count_sub+0xf/0xc0 [ 3061.250736] __sys_sendmsg+0xca/0x140 [ 3061.250750] ? __pfx___sys_sendmsg+0x10/0x10 [ 3061.250786] ? syscall_exit_to_user_mode+0xa2/0x1e0 [ 3061.250825] do_syscall_64+0x62/0x140 [ 3061.250844] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 3061.250855] RIP: 0033:0x7f0b042ef914 [ 3061.250868] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 48 8d 05 e9 5d 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53 [ 3061.250876] RSP: 002b:00007ffc2d113ef8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 3061.250885] RAX: ffffffffffffffda RBX: 00000000680f93fa RCX: 00007f0b042ef914 [ 3061.250891] RDX: 0000000000000000 RSI: 00007ffc2d113f60 RDI: 0000000000000003 [ 3061.250897] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000008 [ 3061.250902] R10: fffffffffffff26d R11: 0000000000000246 R12: 0000000000000001 [ 3061.250907] R13: 000055a961f8a520 R14: 000055a961f63eae R15: 00007ffc2d115270 [ 3061.250952] </TASK> To solve this issue, we replace the GFP_KERNEL flag with the GFP_ATOMIC one in those SRv6 Endpoints that need to allocate memory during the setup phase. This change makes sure that memory allocations are handled in a way that works with RCU critical sections. [1] - https://lore.kernel.org/all/20250418000443.43734-1-kuniyu@amazon.com/ Fixes: 169fd62799e8 ("ipv6: Get rid of RTNL for SIOCADDRT and RTM_NEWROUTE.") Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250429132453.31605-1-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01net: phy: factor out provider part from mdio_bus.cHeiner Kallweit
After 52358dd63e34 ("net: phy: remove function stubs") there's a problem if CONFIG_MDIO_BUS is set, but CONFIG_PHYLIB is not. mdiobus_scan() uses phylib functions like get_phy_device(). Bringing back the stub wouldn't make much sense, because it would allow to compile mdiobus_scan(), but the function would be unusable. The stub returned NULL, and we have the following in mdiobus_scan(): phydev = get_phy_device(bus, addr, c45); if (IS_ERR(phydev)) return phydev; So calling mdiobus_scan() w/o CONFIG_PHYLIB would cause a crash later in mdiobus_scan(). In general the PHYLIB functionality isn't optional here. Consequently, MDIO bus providers depend on PHYLIB. Therefore factor it out and build it together with the libphy core modules. In addition make all MDIO bus providers under /drivers/net/mdio depend on PHYLIB. Same applies to enetc MDIO bus provider. Note that PHYLIB selects MDIO_DEVRES, therefore we can omit this here. Fixes: 52358dd63e34 ("net: phy: remove function stubs") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504270639.mT0lh2o1-lkp@intel.com/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/c74772a9-dab6-44bf-a657-389df89d85c2@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01net: ethernet: mtk_eth_soc: add support for MT7988 internal 2.5G PHYDaniel Golle
The MediaTek MT7988 SoC comes with an single built-in Ethernet PHY for 2500Base-T/1000Base-T/100Base-TX/10Base-T link partners in addition to the built-in 1GE switch. The built-in PHY only supports full duplex. Add muxes allowing to select GMAC2->2.5G PHY path and add basic support for XGMAC as the built-in 2.5G PHY is internally connected via XGMII. The XGMAC features will also be used by 5GBase-R, 10GBase-R and USXGMII SerDes modes which are going to be added once support for standalone PCS drivers is in place. In order to make use of the built-in 2.5G PHY the appropriate PHY driver as well as (proprietary) PHY firmware has to be present as well. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/9072cefbff6db969720672ec98ed5cef65e8218c.1745715380.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01r8152: use SHA-256 library API instead of crypto_shash APIEric Biggers
This user of SHA-256 does not support any other algorithm, so the crypto_shash abstraction provides no value. Just use the SHA-256 library API instead, which is much simpler and easier to use. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://patch.msgid.link/20250428191606.856198-1-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01net: phy: realtek: Add support for WOL magic packet on RTL8211FDaniel Braunwarth
The RTL8211F supports multiple WOL modes. This patch adds support for magic packets. The PHY notifies the system via the INTB/PMEB pin when a WOL event occurs. Signed-off-by: Daniel Braunwarth <daniel.braunwarth@kuka.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250429-realtek_wol-v2-1-8f84def1ef2c@kuka.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01selftests: drv-net: rss_input_xfrm: Check test prerequisites before runningGal Pressman
Ensure the following prerequisites before executing the test: 1. 'socat' is installed on the remote host. 2. Python version supports socket.SO_INCOMING_CPU (available since v3.11). Skip the test if either prerequisite is not met. Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250430054801.750646-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01dt-bindings: net: sun8i-emac: Add A523 EMAC0 compatibleYixun Lan
Allwinner A523 SoC variant (A527/T527) contains an "EMAC0" Ethernet MAC compatible to the A64 version. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Yixun Lan <dlan@gentoo.org> Link: https://patch.msgid.link/20250430-01-sun55i-emac0-v3-2-6fc000bbccbd@gentoo.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01Merge branch '1GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-04-29 (igb, igc, ixgbe, idpf) For igb: Kurt Kanzenbach adds linking of IRQs and queues to NAPI instances and adds persistent NAPI config. Lastly, he removes undesired IRQs that occur while busy polling. For igc: Kurt Kanzenbach switches the Tx mode for MQPRIO offload to harmonize the current implementation with TAPRIO. For ixgbe: Jedrzej adds separate ethtool ops for E610 devices to account for device differences. Slawomir adds devlink region support for E610 devices. For idpf: Mateusz assigns and utilizes the ptype field out of libeth_rqe_info. Michal removes unreachable code. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: idpf: remove unreachable code from setting mailbox idpf: assign extracted ptype to struct libeth_rqe_info field ixgbe: devlink: add devlink region support for E610 ixgbe: add E610 .set_phys_id() callback implementation ixgbe: apply different rules for setting FC on E610 ixgbe: add support for ACPI WOL for E610 ixgbe: create E610 specific ethtool_ops structure igc: Change Tx mode for MQPRIO offloading igc: Limit netdev_tc calls to MQPRIO igb: Get rid of spurious interrupts igb: Add support for persistent NAPI config igb: Link queues to NAPI instances igb: Link IRQs to NAPI instances ==================== Link: https://patch.msgid.link/20250429234651.3982025-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01ksmbd: fix memory leak in parse_lease_state()Wang Zhaolong
The previous patch that added bounds check for create lease context introduced a memory leak. When the bounds check fails, the function returns NULL without freeing the previously allocated lease_ctx_info structure. This patch fixes the issue by adding kfree(lreq) before returning NULL in both boundary check cases. Fixes: bab703ed8472 ("ksmbd: add bounds check for create lease context") Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>