summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-04-23crypto: scomp - Fix off-by-one bug when calculating last pageHerbert Xu
Fix off-by-one bug in the last page calculation for src and dst. Reported-by: Nhat Pham <nphamcs@gmail.com> Fixes: 2d3553ecb4e3 ("crypto: scomp - Remove support for some non-trivial SG lists") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-22virtio-net: disable delayed refill when pausing rxBui Quang Minh
When pausing rx (e.g. set up xdp, xsk pool, rx resize), we call napi_disable() on the receive queue's napi. In delayed refill_work, it also calls napi_disable() on the receive queue's napi. When napi_disable() is called on an already disabled napi, it will sleep in napi_disable_locked while still holding the netdev_lock. As a result, later napi_enable gets stuck too as it cannot acquire the netdev_lock. This leads to refill_work and the pause-then-resume tx are stuck altogether. This scenario can be reproducible by binding a XDP socket to virtio-net interface without setting up the fill ring. As a result, try_fill_recv will fail until the fill ring is set up and refill_work is scheduled. This commit adds virtnet_rx_(pause/resume)_all helpers and fixes up the virtnet_rx_resume to disable future and cancel all inflights delayed refill_work before calling napi_disable() to pause the rx. Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()") Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250417072806.18660-2-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: phy: leds: fix memory leakQingfang Deng
A network restart test on a router led to an out-of-memory condition, which was traced to a memory leak in the PHY LED trigger code. The root cause is misuse of the devm API. The registration function (phy_led_triggers_register) is called from phy_attach_direct, not phy_probe, and the unregister function (phy_led_triggers_unregister) is called from phy_detach, not phy_remove. This means the register and unregister functions can be called multiple times for the same PHY device, but devm-allocated memory is not freed until the driver is unbound. This also prevents kmemleak from detecting the leak, as the devm API internally stores the allocated pointer. Fix this by replacing devm_kzalloc/devm_kcalloc with standard kzalloc/kcalloc, and add the corresponding kfree calls in the unregister path. Fixes: 3928ee6485a3 ("net: phy: leds: Add support for "link" trigger") Fixes: 2e0bc452f472 ("net: phy: leds: add support for led triggers on phy link state change") Signed-off-by: Hao Guan <hao.guan@siflower.com.cn> Signed-off-by: Qingfang Deng <qingfang.deng@siflower.com.cn> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250417032557.2929427-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: phylink: mac_link_(up|down)() clarificationsRussell King (Oracle)
As a result of an email from the fbnic author, I reviewed the phylink documentation, and I have decided to clarify the wording in the mac_link_(up|down)() kernel documentation as this was written from the point of view of mvneta/mvpp2 and is misleading. The documentation talks about forcing the link - indeed, this is what is done in the mvneta and mvpp2 drivers but not at the physical layer but the MACs idea, which has the effect of only allowing or stopping packet flow at the MAC. This "link" needs to be controlled when using a PHY or fixed link to start or stop packet flow at the MAC. However, as the MAC and PCS are tightly integrated, if the MACs idea of the link is forced down, it has the side effect that there is no way to determine that the media link has come up - in this mode, the MAC must be allowed to follow its built-in PCS so we can read the link state. Frame the documentation in more generic terms, to avoid the thought that the physical media link to the partner needs in some way to be forced up or down with these calls; it does not. If that were to be done, it would be a self-fulfilling prophecy - e.g. if the media link goes down, then mac_link_down() will be called, and if the media link is then placed into a forced down state, there is no possibility that the media link will ever come up again - clearly this is a wrong interpretation. These methods are notifications to the MAC about what has happened to the media link state - either from the PHY, or a PCS, or whatever mechanism fixed-link is using. Thus, reword them to get away from talking about changing link state to avoid confusion with media link state. This is not a change of any requirements of these methods. Also, remove the obsolete references to EEE for these methods, we now have the LPI functions for configuring the EEE parameters which renders this redundant, and also makes the passing of "phy" to the mac_link_up() function obsolete. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/E1u5Ah5-001GO1-7E@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: phylink: fix suspend/resume with WoL enabled and link downRussell King (Oracle)
When WoL is enabled, we update the software state in phylink to indicate that the link is down, and disable the resolver from bringing the link back up. On resume, we attempt to bring the overall state into consistency by calling the .mac_link_down() method, but this is wrong if the link was already down, as phylink strictly orders the .mac_link_up() and .mac_link_down() methods - and this would break that ordering. Fixes: f97493657c63 ("net: phylink: add suspend/resume support") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1u55Qf-0016RN-PA@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22drm/amd/display: do not copy invalid CRTC timing infoGergo Koteles
Since b255ce4388e0, it is possible that the CRTC timing information for the preferred mode has not yet been calculated while amdgpu_dm_connector_mode_valid() is running. In this case use the CRTC timing information of the actual mode. Fixes: b255ce4388e0 ("drm/amdgpu: don't change mode in amdgpu_dm_connector_mode_valid()") Closes: https://lore.kernel.org/all/ed09edb167e74167a694f4854102a3de6d2f1433.camel@irl.hu/ Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4085 Signed-off-by: Gergo Koteles <soyer@irl.hu> Reviewed-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Mark Broadworth <mark.broadworth@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 20232192a5044d1f66dcbef0a24de1bb8157738d) Cc: stable@vger.kernel.org
2025-04-22drm/amd/display: Default IPS to RCG_IN_ACTIVE_IPS2_IN_OFFLeo Li
[Why] Recent findings show negligible power savings between IPS2 and RCG during static desktop. In fact, DCN related clocks are higher when IPS2 is enabled vs RCG. RCG_IN_ACTIVE is also the default policy for another OS supported by DC, and it has faster entry/exit. [How] Remove previous logic that checked for IPS2 support, and just default to `DMUB_IPS_RCG_IN_ACTIVE_IPS2_IN_OFF`. Fixes: 199888aa25b3 ("drm/amd/display: Update IPS default mode for DCN35/DCN351") Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Mark Broadworth <mark.broadworth@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8f772d79ef39b463ead00ef6f009bebada3a9d49) Cc: stable@vger.kernel.org
2025-04-22drm/amd/display: Use 16ms AUX read interval for LTTPR with old sinksGeorge Shen
[Why/How] LTTPR are required to program DPCD 0000Eh to 0x4 (16ms) upon AUX read reply to this register. Since old Sinks witih DPCD rev 1.1 and earlier may not support this register, assume the mandatory value is programmed by the LTTPR to avoid AUX timeout issues. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Mark Broadworth <mark.broadworth@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1594b60d74959c0680ddf777a74963c98afcdd7e)
2025-04-22drm/amd/display: Fix ACPI edid parsing on some Lenovo systemsMario Limonciello
[Why] The ACPI EDID in the BIOS of a Lenovo laptop includes 3 blocks, but dm_helpers_probe_acpi_edid() has a start that is 'char'. The 3rd block index starts after 255, so it can't be indexed properly. This leads to problems with the display when the EDID is parsed. [How] Change the variable type to 'short' so that larger values can be indexed. Cc: Renjith Pananchikkal <renjith.pananchikkal@amd.com> Reported-by: Mark Pearson <mpearson@lenovo.com> Suggested-by: David Ober <dober@lenovo.com> Fixes: c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if available for eDP") Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Mark Broadworth <mark.broadworth@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a918bb4a90d423ced2976a794f2724c362c1f063) Cc: stable@vger.kernel.org
2025-04-22drm/amdgpu: Allow P2P access through XGMIFelix Kuehling
If peer memory is accessible through XGMI, allow leaving it in VRAM rather than forcing its migration to GTT on DMABuf attachment. Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Tested-by: Hao (Claire) Zhou <hao.zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 372c8d72c3680fdea3fbb2d6b089f76b4a6d596a)
2025-04-22drm/amd/display: Enable urgent latency adjustment on DCN35Nicholas Susanto
[Why] Urgent latency adjustment was disabled on DCN35 due to issues with P0 enablement on some platforms. Without urgent latency, underflows occur when doing certain high timing configurations. After testing, we found that reenabling urgent latency didn't reintroduce p0 support on multiple platforms. [How] renable urgent latency on DCN35 and setting it to 3000 Mhz. This reverts commit 3412860cc4c0c484f53f91b371483e6e4440c3e5. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Nicholas Susanto <nsusanto@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Mark Broadworth <mark.broadworth@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cd74ce1f0cddffb3f36d0995d0f61e89f0010738)
2025-04-22drm/amd/display: Force full update in gpu resetRoman Li
[Why] While system undergoing gpu reset always do full update to sync the dc state before and after reset. [How] Return true in should_reset_plane() if gpu reset detected Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Mark Broadworth <mark.broadworth@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2ba8619b9a378ad218ad6c2e2ccaee8f531e08de) Cc: stable@vger.kernel.org
2025-04-22drm/amd/display: Fix gpu reset in multidisplay configRoman Li
[Why] The indexing of stream_status in dm_gpureset_commit_state() is incorrect. That leads to asserts in multi-display configuration after gpu reset. [How] Adjust the indexing logic to align stream_status with surface_updates. Fixes: cdaae8371aa9 ("drm/amd/display: Handle GPU reset for DC block") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3808 Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Mark Broadworth <mark.broadworth@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d91bc901398741d317d9b55c59ca949d4bc7394b) Cc: stable@vger.kernel.org
2025-04-22drm/amdgpu: Don't pin VRAM without DMABUF_MOVE_NOTIFYFelix Kuehling
Pinning of VRAM is for peer devices that don't support dynamic attachment and move notifiers. But it requires that all such peer devices are able to access VRAM via PCIe P2P. Any device without P2P access requires migration to GTT, which fails if the memory is already pinned for another peer device. Sharing between GPUs should not require pinning in VRAM. However, if DMABUF_MOVE_NOTIFY is disabled in the kernel build, even DMABufs shared between GPUs must be pinned, which can lead to failures and functional regressions on systems where some peer GPUs are not P2P accessible. Disable VRAM pinning if move notifiers are disabled in the kernel build to fix regressions when sharing BOs between GPUs. Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Tested-by: Hao (Claire) Zhou <hao.zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 05185812ae3695fe049c14847ce3cbeccff1bf2e)
2025-04-22drm/amdgpu: Use allowed_domains for pinning dmabufsFelix Kuehling
When determining the domains for pinning DMABufs, filter allowed_domains and fail with a warning if VRAM is forbidden and GTT is not an allowed domain. Fixes: f5e7fabd1f5c ("drm/amdgpu: allow pinning DMA-bufs into VRAM if all importers can do P2P") Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3940796a6eefa555fec688a4adee5659ef9fa431)
2025-04-22Merge tag 'for-6.15-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - subpage mode fixes: - access correct object (folio) when looking up bit offset - fix assertion condition for number of blocks per folio - fix upper boundary of locking range in hole punch - zoned fixes: - fix potential deadlock caught by lockdep when zone reporting and device freeze run in parallel - fix zone write pointer mismatch and NULL pointer dereference when metadata are converted from DUP to RAID1 - fix error handling when reloc inode creation fails - in tree-checker, unify error code for header level check - block layer: add helpers to read zone capacity * tag 'for-6.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: skip reporting zone for new block group block: introduce zone capacity helper btrfs: tree-checker: adjust error code for header level check btrfs: fix invalid inode pointer after failure to create reloc inode btrfs: zoned: return EIO on RAID1 block group write pointer mismatch btrfs: fix the ASSERT() inside GET_SUBPAGE_BITMAP() btrfs: avoid page_lockend underflow in btrfs_punch_hole_lock_range() btrfs: subpage: access correct object when reading bitmap start in subpage_calc_start_bit()
2025-04-22Merge tag 'integrity-6.15-rc3-fix' of https://github.com/linux-integrity/linuxLinus Torvalds
Pull integrity fix from Roberto Sassu: "One performance fix to avoid unnecessarily taking the inode lock" * tag 'integrity-6.15-rc3-fix' of https://github.com/linux-integrity/linux: ima: process_measurement() needlessly takes inode_lock() on MAY_READ
2025-04-22fs: fall back to file_ref_put() for non-last referenceMateusz Guzik
This reduces the slowdown in face of multiple callers issuing close on what turns out to not be the last reference. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/20250418125756.59677-1-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202504171513.6d6f8a16-lkp@intel.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-22Merge patch series "fs/buffer: split pagecache lookups into atomic or blocking"Christian Brauner
Davidlohr Bueso <dave@stgolabs.net> says: This is a respin of the series[0] to address the sleep in atomic scenarios for noref migration with large folios, introduced in: 3c20917120ce61 ("block/bdev: enable large folio support for large logical block sizes") The main difference is that it removes the first patch and moves the fix (reducing the i_private_lock critical region in the migration path) to the final patch, which also introduces the new BH_Migrate flag. It also simplifies the locking scheme in patch 1 to avoid folio trylocking in the atomic lookup cases. So essentially blocking users will take the folio lock and hence wait for migration, and otherwise nonblocking callers will bail the lookup if a noref migration is on-going. Blocking callers will also benefit from potential performance gains by reducing contention on the spinlock for bdev mappings. * patches from https://lore.kernel.org/20250418015921.132400-1-dave@stgolabs.net: mm/migrate: fix sleep in atomic for large folios and buffer heads fs/ext4: use sleeping version of sb_find_get_block() fs/jbd2: use sleeping version of __find_get_block() fs/ocfs2: use sleeping version of __find_get_block() fs/buffer: use sleeping version of __find_get_block() fs/buffer: introduce sleeping flavors for pagecache lookups fs/buffer: split locking for pagecache lookups Link: https://lore.kernel.org/20250418015921.132400-1-dave@stgolabs.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-22mm/migrate: fix sleep in atomic for large folios and buffer headsDavidlohr Bueso
The large folio + buffer head noref migration scenarios are being naughty and blocking while holding a spinlock. As a consequence of the pagecache lookup path taking the folio lock this serializes against migration paths, so they can wait for each other. For the private_lock atomic case, a new BH_Migrate flag is introduced which enables the lookup to bail. This allows the critical region of the private_lock on the migration path to be reduced to the way it was before ebdf4de5642fb6 ("mm: migrate: fix reference check race between __find_get_block() and migration"), that is covering the count checks. The scope is always noref migration. Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: syzbot+f3c6fda1297c748a7076@syzkaller.appspotmail.com Closes: https://lore.kernel.org/oe-lkp/202503101536.27099c77-lkp@intel.com Fixes: 3c20917120ce61 ("block/bdev: enable large folio support for large logical block sizes") Reviewed-by: Jan Kara <jack@suse.cz> Co-developed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://kdevops.org/ext4/v6.15-rc2.html # [0] Link: https://lore.kernel.org/all/aAAEvcrmREWa1SKF@bombadil.infradead.org/ # [1] Link: https://lore.kernel.org/20250418015921.132400-8-dave@stgolabs.net Tested-by: kdevops@lists.linux.dev # [0] [1] Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-22fs/ext4: use sleeping version of sb_find_get_block()Davidlohr Bueso
Enable ext4_free_blocks() to use it, which has a cond_resched to begin with. Convert to the new nonatomic flavor to benefit from potential performance benefits and adapt in the future vs migration such that semantics are kept. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://kdevops.org/ext4/v6.15-rc2.html # [0] Link: https://lore.kernel.org/all/aAAEvcrmREWa1SKF@bombadil.infradead.org/ # [1] Link: https://lore.kernel.org/20250418015921.132400-7-dave@stgolabs.net Tested-by: kdevops@lists.linux.dev Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-22fs/jbd2: use sleeping version of __find_get_block()Davidlohr Bueso
Convert to the new nonatomic flavor to benefit from potential performance benefits and adapt in the future vs migration such that semantics are kept. - jbd2_journal_revoke(): can sleep (has might_sleep() in the beginning) - jbd2_journal_cancel_revoke(): only used from do_get_write_access() and do_get_create_access() which do sleep. So can sleep. - jbd2_clear_buffer_revoked_flags() - only called from journal commit code which sleeps. So can sleep. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://kdevops.org/ext4/v6.15-rc2.html # [0] Link: https://lore.kernel.org/all/aAAEvcrmREWa1SKF@bombadil.infradead.org/ # [1] Link: https://lore.kernel.org/20250418015921.132400-6-dave@stgolabs.net Tested-by: kdevops@lists.linux.dev Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-22fs/ocfs2: use sleeping version of __find_get_block()Davidlohr Bueso
This is a path that allows for blocking as it does IO. Convert to the new nonatomic flavor to benefit from potential performance benefits and adapt in the future vs migration such that semantics are kept. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://kdevops.org/ext4/v6.15-rc2.html # [0] Link: https://lore.kernel.org/all/aAAEvcrmREWa1SKF@bombadil.infradead.org/ # [1] Link: https://lore.kernel.org/20250418015921.132400-5-dave@stgolabs.net Tested-by: kdevops@lists.linux.dev Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-22fs/buffer: use sleeping version of __find_get_block()Davidlohr Bueso
Convert to the new nonatomic flavor to benefit from potential performance benefits and adapt in the future vs migration such that semantics are kept. Convert write_boundary_block() which already takes the buffer lock as well as bdev_getblk() depending on the respective gpf flags. There are no changes in semantics. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://kdevops.org/ext4/v6.15-rc2.html # [0] Link: https://lore.kernel.org/all/aAAEvcrmREWa1SKF@bombadil.infradead.org/ # [1] Link: https://lore.kernel.org/20250418015921.132400-4-dave@stgolabs.net Tested-by: kdevops@lists.linux.dev # [0] [1] Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-22fs/buffer: introduce sleeping flavors for pagecache lookupsDavidlohr Bueso
Add __find_get_block_nonatomic() and sb_find_get_block_nonatomic() calls for which users will be converted where safe. These versions will take the folio lock instead of the mapping's private_lock. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://kdevops.org/ext4/v6.15-rc2.html # [0] Link: https://lore.kernel.org/all/aAAEvcrmREWa1SKF@bombadil.infradead.org/ # [1] Link: https://lore.kernel.org/20250418015921.132400-3-dave@stgolabs.net Tested-by: kdevops@lists.linux.dev Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-22MAINTAINERS: add HFS/HFS+ maintainersViacheslav Dubeyko
Both the hfs and hfsplus filesystem have been orphaned since at least 2014, i.e., over 10 years. However, HFS/HFS+ driver needs to stay for Debian Ports as otherwise we won't be able to boot PowerMacs using GRUB because GRUB won't be usable anymore on PowerMacs with HFS/HFS+ being removed from the kernel. This patch proposes to add Viacheslav Dubeyko and John Paul Adrian Glaubitz as maintainers of HFS/HFS+ driver. Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/20250417223507.1097186-1-slava@dubeyko.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-22fs/buffer: split locking for pagecache lookupsDavidlohr Bueso
Callers of __find_get_block() may or may not allow for blocking semantics, and is currently assumed that it will not. Layout two paths based on this. The the private_lock scheme will continued to be used for atomic contexts. Otherwise take the folio lock instead, which protects the buffers, such as vs migration and try_to_free_buffers(). Per the "hack idea", the latter can alleviate contention on the private_lock for bdev mappings. For reasons of determinism and avoid making bugs hard to reproduce, the trylocking is not attempted. No change in semantics. All lookup users still take the spinlock. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://kdevops.org/ext4/v6.15-rc2.html # [0] Link: https://lore.kernel.org/all/aAAEvcrmREWa1SKF@bombadil.infradead.org/ # [1] Link: https://lore.kernel.org/20250418015921.132400-2-dave@stgolabs.net Tested-by: kdevops@lists.linux.dev Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-22lib: Ensure prime numbers tests are included in KUnit test runsMark Brown
When the select of PRIME_MUMBERS was removed from it's KUnit test Kconfig nothing was added to the KUnit configs, meaning that when run via the KUnit runner the tests are neither built nor run. Add PRIME_NUMBERS to all_tests.config so they are enabled when the KUnit runner builds the kernel. Fixes: 3f2925174f8b ("lib/prime_numbers: KUnit test should not select PRIME_NUMBERS") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250422-lib-fix-prime-numbers-kunit-v1-1-4278c1d4a4ae@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-04-22dma-coherent: Warn if OF reserved memory is beyond current coherent DMA maskChen-Yu Tsai
When a reserved memory region described in the device tree is attached to a device, it is expected that the device's limitations are correctly included in that description. However, if the device driver failed to implement DMA address masking or addressing beyond the default 32 bits (on arm64), then bad things could happen because the DMA address was truncated, such as playing back audio with no actual audio coming out, or DMA overwriting random blocks of kernel memory. Check against the coherent DMA mask when the memory regions are attached to the device. Give a warning when the memory region can not be covered by the mask. A warning instead of a hard error was chosen, because it is possible that existing drivers could be working fine even if they forgot to extend the coherent DMA mask. Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250421083930.374173-1-wenst@chromium.org
2025-04-22ima: process_measurement() needlessly takes inode_lock() on MAY_READFrederick Lawler
On IMA policy update, if a measure rule exists in the policy, IMA_MEASURE is set for ima_policy_flags which makes the violation_check variable always true. Coupled with a no-action on MAY_READ for a FILE_CHECK call, we're always taking the inode_lock(). This becomes a performance problem for extremely heavy read-only workloads. Therefore, prevent this only in the case there's no action to be taken. Signed-off-by: Frederick Lawler <fred@cloudflare.com> Acked-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2025-04-22xfs: remove duplicate Zoned Filesystems sections in admin-guideHans Holmberg
Remove the duplicated section and while at it, turn spaces into tabs. Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Fixes: c7b67ddc3c99 ("xfs: document zoned rt specifics in admin-guide") Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-04-22XFS: fix zoned gc threshold math for 32-bit archesCarlos Maiolino
xfs_zoned_need_gc makes use of mult_frac() to calculate the threshold for triggering the zoned garbage collector, but, turns out mult_frac() doesn't properly work with 64-bit data types and this caused build failures on some 32-bit architectures. Fix this by essentially open coding mult_frac() in a 64-bit friendly way. Notice we don't need to bother with counters underflow here because xfs_estimate_freecounter() will always return a positive value, as it leverages percpu_counter_read_positive to read such counters. Fixes: 845abeb1f06a ("xfs: add tunable threshold parameter for triggering zone GC") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504181233.F7D9Atra-lkp@intel.com/ Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-04-22net: lwtunnel: disable BHs when requiredJustin Iurman
In lwtunnel_{output|xmit}(), dev_xmit_recursion() may be called in preemptible scope for PREEMPT kernels. This patch disables BHs before calling dev_xmit_recursion(). BHs are re-enabled only at the end, since we must ensure the same CPU is used for both dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other recursion levels in some cases) in order to maintain valid per-cpu counters. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Closes: https://lore.kernel.org/netdev/CAADnVQJFWn3dBFJtY+ci6oN1pDFL=TzCmNbRgey7MdYxt_AP2g@mail.gmail.com/ Reported-by: Eduard Zingerman <eddyz87@gmail.com> Closes: https://lore.kernel.org/netdev/m2h62qwf34.fsf@gmail.com/ Fixes: 986ffb3a57c5 ("net: lwtunnel: fix recursion loops") Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250416160716.8823-1-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22net: selftests: initialize TCP header and skb payload with zeroOleksij Rempel
Zero-initialize TCP header via memset() to avoid garbage values that may affect checksum or behavior during test transmission. Also zero-fill allocated payload and padding regions using memset() after skb_put(), ensuring deterministic content for all outgoing test packets. Fixes: 3e1e58d64c3d ("net: add generic selftest support") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250416160125.2914724-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22dma-mapping: Fix warning reported for missing prototypeBalbir Singh
lkp reported a warning about missing prototype for a recent patch. The kernel-doc style comments are out of sync, move them to the right function. Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Christoph Hellwig <hch@lst.de> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504190615.g9fANxHw-lkp@intel.com/ Signed-off-by: Balbir Singh <balbirs@nvidia.com> [mszyprow: reformatted subject] Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250422114034.3535515-1-balbirs@nvidia.com
2025-04-22net: phy: microchip: force IRQ polling mode for lan88xxFiona Klute
With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] https://github.com/raspberrypi/linux/issues/2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/0901d90d-3f20-4a10-b680-9c978e04ddda@lunn.ch Fixes: 792aec47d59d ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <fiona.klute@gmx.de> Cc: kernel-list@raspberrypi.com Cc: stable@vger.kernel.org Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250416102413.30654-1-fiona.klute@gmx.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22nvmet: fix out-of-bounds access in nvmet_enable_portRichard Weinberger
When trying to enable a port that has no transport configured yet, nvmet_enable_port() uses NVMF_TRTYPE_MAX (255) to query the transports array, causing an out-of-bounds access: [ 106.058694] BUG: KASAN: global-out-of-bounds in nvmet_enable_port+0x42/0x1da [ 106.058719] Read of size 8 at addr ffffffff89dafa58 by task ln/632 [...] [ 106.076026] nvmet: transport type 255 not supported Since commit 200adac75888, NVMF_TRTYPE_MAX is the default state as configured by nvmet_ports_make(). Avoid this by checking for NVMF_TRTYPE_MAX before proceeding. Fixes: 200adac75888 ("nvme: Add PCI transport type") Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
2025-04-22drm: panel: jd9365da: fix reset signal polarity in unprepareHugo Villeneuve
commit a8972d5a49b4 ("drm: panel: jd9365da-h3: fix reset signal polarity") fixed reset signal polarity in jadard_dsi_probe() and jadard_prepare(). It was not done in jadard_unprepare() because of an incorrect assumption about reset line handling in power off mode. After looking into the datasheet, it now appears that before disabling regulators, the reset line is deasserted first, and if reset_before_power_off_vcioo is true, then the reset line is asserted. Fix reset polarity by inverting gpiod_set_value() second argument in in jadard_unprepare(). Fixes: 6b818c533dd8 ("drm: panel: Add Jadard JD9365DA-H3 DSI panel") Fixes: 2b976ad760dc ("drm/panel: jd9365da: Support for kd101ne3-40ti MIPI-DSI panel") Fixes: a8972d5a49b4 ("drm: panel: jd9365da-h3: fix reset signal polarity") Cc: stable@vger.kernel.org Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250417195507.778731-1-hugo@hugovil.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250417195507.778731-1-hugo@hugovil.com
2025-04-22drm/meson: use unsigned long long / Hz for frequency typesMartin Blumenstingl
Christian reports that 4K output using YUV420 encoding fails with the following error: Fatal Error, invalid HDMI vclk freq 593406 Modetest shows the following: 3840x2160 59.94 3840 4016 4104 4400 2160 2168 2178 2250 593407 flags: xxxx, xxxx, drm calculated value -------------------------------------^ This indicates that there's a (1kHz) mismatch between the clock calculated by the drm framework and the meson driver. Relevant function call stack: (drm framework) -> meson_encoder_hdmi_atomic_enable() -> meson_encoder_hdmi_set_vclk() -> meson_vclk_setup() The video clock requested by the drm framework is 593407kHz. This is passed by meson_encoder_hdmi_atomic_enable() to meson_encoder_hdmi_set_vclk() and the following formula is applied: - the frequency is halved (which would be 296703.5kHz) and rounded down to the next full integer, which is 296703kHz - TMDS clock is calculated (296703kHz * 10) - video encoder clock is calculated - this needs to match a table from meson_vclk.c and so it doubles the previously halved value again (resulting in 593406kHz) - meson_vclk_setup() can't find (either directly, or by deriving it from 594000kHz * 1000 / 1001 and rounding to the closest integer value - which is 593407kHz as originally requested by the drm framework) a matching clock in it's internal table and errors out with "invalid HDMI vclk freq" Fix the division precision by switching the whole meson driver to use unsigned long long (64-bit) Hz values for clock frequencies instead of unsigned int (32-bit) kHz to fix the rouding error. Fixes: e5fab2ec9ca4 ("drm/meson: vclk: add support for YUV420 setup") Reported-by: Christian Hewitt <christianshewitt@gmail.com> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250421201300.778955-3-martin.blumenstingl@googlemail.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250421201300.778955-3-martin.blumenstingl@googlemail.com
2025-04-22Revert "drm/meson: vclk: fix calculation of 59.94 fractional rates"Christian Hewitt
This reverts commit bfbc68e. The patch does permit the offending YUV420 @ 59.94 phy_freq and vclk_freq mode to match in calculations. It also results in all fractional rates being unavailable for use. This was unintended and requires the patch to be reverted. Fixes: bfbc68e4d869 ("drm/meson: vclk: fix calculation of 59.94 fractional rates") Cc: stable@vger.kernel.org Signed-off-by: Christian Hewitt <christianshewitt@gmail.com> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250421201300.778955-2-martin.blumenstingl@googlemail.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250421201300.778955-2-martin.blumenstingl@googlemail.com
2025-04-22scsi: Improve CDL controlDamien Le Moal
With ATA devices supporting the CDL feature, using CDL requires that the feature be enabled with a SET FEATURES command. This command is issued as the translated command for the MODE SELECT command issued by scsi_cdl_enable() when the user enables CDL through the device cdl_enable sysfs attribute. However, the implementation of scsi_cdl_enable() always issues a MODE SELECT command for ATA devices when the enable argument is true, even if CDL is already enabled on the device. While this does not cause any issue with using CDL descriptors with read/write commands (the CDL feature will be enabled on the drive), issuing the MODE SELECT command even when the device CDL feature is already enabled will cause a reset of the ATA device CDL statistics log page (as defined in ACS, any CDL enable action must reset the device statistics). Avoid this needless actions (and the implied statistics log page reset) by modifying scsi_cdl_enable() to issue the MODE SELECT command to enable CDL if and only if CDL is not reported as already enabled on the device. And while at it, simplify the initialization of the is_ata boolean variable and move the declaration of the scsi mode data and sense header variables to within the scope of ATA device handling. Fixes: 1b22cfb14142 ("scsi: core: Allow enabling and disabling command duration limits") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Igor Pylypiv <ipylypiv@google.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-22ata: libata-scsi: Improve CDL controlDamien Le Moal
With ATA devices supporting the CDL feature, using CDL requires that the feature be enabled with a SET FEATURES command. This command is issued as the translated command for the MODE SELECT command issued by scsi_cdl_enable() when the user enables CDL through the device cdl_enable sysfs attribute. Currently, ata_mselect_control_ata_feature() always translates a MODE SELECT command for the ATA features subpage of the control mode page to a SET FEATURES command to enable or disable CDL based on the cdl_ctrl field. However, there is no need to issue the SET FEATURES command if: 1) The MODE SELECT command requests disabling CDL and CDL is already disabled. 2) The MODE SELECT command requests enabling CDL and CDL is already enabled. Fix ata_mselect_control_ata_feature() to issue the SET FEATURES command only when necessary. Since enabling CDL also implies a reset of the CDL statistics log page, avoiding useless CDL enable operations also avoids clearing the CDL statistics log. Also add debug messages to clearly signal when CDL is being enabled or disabled using a SET FEATURES command. Fixes: df60f9c64576 ("scsi: ata: libata: Add ATA feature control sub-page translation") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
2025-04-22ata: libata-scsi: Fix ata_msense_control_ata_feature()Damien Le Moal
For the ATA features subpage of the control mode page, the T10 SAT-6 specifications state that: For a MODE SENSE command, the SATL shall return the CDL_CTRL field value that was last set by an application client. However, the function ata_msense_control_ata_feature() always sets the CDL_CTRL field to the 0x02 value to indicate support for the CDL T2A and T2B pages. This is thus incorrect and the value 0x02 must be reported only after the user enables the CDL feature, which is indicated with the ATA_DFLAG_CDL_ENABLED device flag. When this flag is not set, the CDL_CTRL field of the ATA feature subpage of the control mode page must report a value of 0x00. Fix ata_msense_control_ata_feature() to report the correct values for the CDL_CTRL field, according to the enable/disable state of the device CDL feature. Fixes: df60f9c64576 ("scsi: ata: libata: Add ATA feature control sub-page translation") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
2025-04-22ata: libata-scsi: Fix ata_mselect_control_ata_feature() return typeDamien Le Moal
The function ata_mselect_control_ata_feature() has a return type defined as unsigned int but this function may return negative error codes, which are correctly propagated up the call chain as integers. Fix ata_mselect_control_ata_feature() to have the correct int return type. While at it, also fix a typo in this function description comment. Fixes: df60f9c64576 ("scsi: ata: libata: Add ATA feature control sub-page translation") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
2025-04-21Merge tag 'sched_ext-for-6.15-rc3-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - Use kvzalloc() so that large exit_dump buffer allocations don't fail easily - Remove cpu.weight / cpu.idle unimplemented warnings which are more annoying than helpful. This makes SCX_OPS_HAS_CGROUP_WEIGHT unnecessary. Mark it for deprecation * tag 'sched_ext-for-6.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Mark SCX_OPS_HAS_CGROUP_WEIGHT for deprecation sched_ext: Remove cpu.weight / cpu.idle unimplemented warnings sched_ext: Use kvzalloc for large exit_dump allocation
2025-04-21Merge tag 'cgroup-for-6.15-rc3-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Fix compilation in CONFIG_LOCKDEP && !CONFIG_PROVE_RCU configurations - Allow "cpuset_v2_mode" mount option for "cpuset" filesystem type to make life easier for android * tag 'cgroup-for-6.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/cpuset-v1: Add missing support for cpuset_v2_mode cgroup: Fix compilation issue due to cgroup_mutex not being exported
2025-04-21Merge branch 'enetc-bug-fixes-for-bpf_xdp_adjust_head-and-bpf_xdp_adjust_tail'Jakub Kicinski
Vladimir Oltean says: ==================== ENETC bug fixes for bpf_xdp_adjust_head() and bpf_xdp_adjust_tail() It has been reported that on the ENETC driver, bpf_xdp_adjust_head() and bpf_xdp_adjust_tail() are broken in combination with the XDP_PASS verdict. I have constructed a series a simple XDP programs and tested with various packet sizes and confirmed that this is the case. Patch 3/3 fixes the core issue, which is that the sk_buff created on XDP_PASS is created by the driver as if XDP never ran, but in fact the geometry needs to be adjusted according to the delta applied by the program on the original xdp_buff. It depends on commit 539c1fba1ac7 ("xdp: add generic xdp_build_skb_from_buff()") which is not available in "stable" but perhaps should be. Patch 2/3 is a small refactor necessary for 3/3. Patch 1/3 fixes a related issue I noticed, which is that bpf_xdp_adjust_tail() with a positive offset works for linear XDP buffers, but returns an error for non-linear ones, even if there is plenty of space in the final page fragment. ==================== Link: https://patch.msgid.link/20250417120005.3288549-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-21net: enetc: fix frame corruption on bpf_xdp_adjust_head/tail() and XDP_PASSVladimir Oltean
Vlatko Markovikj reported that XDP programs attached to ENETC do not work well if they use bpf_xdp_adjust_head() or bpf_xdp_adjust_tail(), combined with the XDP_PASS verdict. A typical use case is to add or remove a VLAN tag. The resulting sk_buff passed to the stack is corrupted, because the algorithm used by the driver for XDP_PASS is to unwind the current buffer pointer in the RX ring and to re-process the current frame with enetc_build_skb() as if XDP hadn't run. That is incorrect because XDP may have modified the geometry of the buffer, which we then are completely unaware of. We are looking at a modified buffer with the original geometry. The initial reaction, both from me and from Vlatko, was to shop around the kernel for code to steal that would calculate a delta between the old and the new XDP buffer geometry, and apply that to the sk_buff too. We noticed that veth and generic xdp have such code. The headroom adjustment is pretty uncontroversial, but what turned out severely problematic is the tailroom. veth has this snippet: __skb_put(skb, off); /* positive on grow, negative on shrink */ which on first sight looks decent enough, except __skb_put() takes an "unsigned int" for the second argument, and the arithmetic seems to only work correctly by coincidence. Second issue, __skb_put() contains a SKB_LINEAR_ASSERT(). It's not a great pattern to make more widespread. The skb may still be nonlinear at that point - it only becomes linear later when resetting skb->data_len to zero. To avoid the above, bpf_prog_run_generic_xdp() does this instead: skb_set_tail_pointer(skb, xdp->data_end - xdp->data); skb->len += off; /* positive on grow, negative on shrink */ which is more open-coded, uses lower-level functions and is in general a bit too much to spread around in driver code. Then there is the snippet: if (xdp_buff_has_frags(xdp)) skb->data_len = skb_shinfo(skb)->xdp_frags_size; else skb->data_len = 0; One would have expected __pskb_trim() to be the function of choice for this task. But it's not used in veth/xdpgeneric because the extraneous fragments were _already_ freed by bpf_xdp_adjust_tail() -> bpf_xdp_frags_shrink_tail() -> ... -> __xdp_return() - the backing memory for the skb frags and the xdp frags is the same, but they don't keep individual references. In fact, that is the biggest reason why this snippet cannot be reused as-is, because ENETC temporarily constructs an skb with the original len and the original number of frags. Because the extraneous frags are already freed by bpf_xdp_adjust_tail() and returned to the page allocator, it means the entire approach of using enetc_build_skb() is questionable for XDP_PASS. To avoid that, one would need to elevate the page refcount of all frags before calling bpf_prog_run_xdp() and drop it after XDP_PASS. There are other things that are missing in ENETC's handling of XDP_PASS, like for example updating skb_shinfo(skb)->meta_len. These are all handled correctly and cleanly in commit 539c1fba1ac7 ("xdp: add generic xdp_build_skb_from_buff()"), added to net-next in Dec 2024, and in addition might even be quicker that way. I have a very strong preference towards backporting that commit for "stable", and that is what is used to fix the handling bugs. It is way too messy to go this deep into the guts of an sk_buff from the code of a device driver. Fixes: d1b15102dd16 ("net: enetc: add support for XDP_DROP and XDP_PASS") Reported-by: Vlatko Markovikj <vlatko.markovikj@etas.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20250417120005.3288549-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-21net: enetc: refactor bulk flipping of RX buffers to separate functionVladimir Oltean
This small snippet of code ensures that we do something with the array of RX software buffer descriptor elements after passing the skb to the stack. In this case, we see if the other half of the page is reusable, and if so, we "turn around" the buffers, making them directly usable by enetc_refill_rx_ring() without going to enetc_new_page(). We will need to perform this kind of buffer flipping from a new code path, i.e. from XDP_PASS. Currently, enetc_build_skb() does it there buffer by buffer, but in a subsequent change we will stop using enetc_build_skb() for XDP_PASS. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20250417120005.3288549-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-21net: enetc: register XDP RX queues with frag_sizeVladimir Oltean
At the time when bpf_xdp_adjust_tail() gained support for non-linear buffers, ENETC was already generating this kind of geometry on RX, due to its use of 2K half page buffers. Frames larger than 1472 bytes (without FCS) are stored as multi-buffer, presenting a need for multi buffer support to work properly even in standard MTU circumstances. Allow bpf_xdp_frags_increase_tail() to know the allocation size of paged data, so it can safely permit growing the tailroom of the buffer from XDP programs. Fixes: bf25146a5595 ("bpf: add frags support to the bpf_xdp_adjust_tail() API") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20250417120005.3288549-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>