linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2025-09-16	scsi: lpfc: Remove ndlp kref decrement clause for F_Port_Ctrl in lpfc_cleanup	Justin Tee
	In lpfc_cleanup, there is an extraneous nlp_put for NPIV ports on the F_Port_Ctrl ndlp object. In cases when an ABTS is issued, the outstanding kref is needed for when a second XRI_ABORTED CQE is received. The final kref for the ndlp is designed to be decremented in lpfc_sli4_els_xri_aborted instead. Also, add a new log message to allow for future diagnostics when debugging related issues. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Message-ID: <20250915180811.137530-5-justintee8345@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-16	scsi: lpfc: Clean up allocated queues when queue setup mbox commands fail	Justin Tee
	lpfc_sli4_queue_setup() does not allocate memory and is used for submitting CREATE_QUEUE mailbox commands. Thus, if such mailbox commands fail we should clean up by also freeing the memory allocated for the queues with lpfc_sli4_queue_destroy(). Change the intended clean up label for the lpfc_sli4_queue_setup() error case to out_destroy_queue. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Message-ID: <20250915180811.137530-4-justintee8345@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-16	scsi: lpfc: Abort outstanding ELS WQEs regardless of if rmmod is in progress	Justin Tee
	Driver rmmod may take a long time when in a very large SAN environment. This is because outstanding ELS WQEs may end up taking E_D_TOV seconds to complete causing long delays. Speed this up by issuing aborts with the ia bit set so that outstanding ELS WQEs complete faster. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Message-ID: <20250915180811.137530-3-justintee8345@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-16	scsi: lpfc: Remove unused member variables in struct lpfc_hba and lpfc_vport	Justin Tee
	There are variables defined in struct lpfc_hba and lpfc_vport that are not used anywhere. Delete the unused variables from struct lpfc_hba and lpfc_vport. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Message-ID: <20250915180811.137530-2-justintee8345@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-16	Merge patch series "scsi: qla2xxx: Fix incorrect sign of error code"	Martin K. Petersen
	Qianfeng Rong <rongqianfeng@vivo.com> says: qla2x00_start_sp() returns only negative error codes or QLA_SUCCESS. Therefore, comparing its return value with positive error codes (e.g., if (_rval == EAGAIN)) causes logical errors. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-16	scsi: qla2xxx: Fix incorrect sign of error code in qla_nvme_xmt_ls_rsp()	Qianfeng Rong
	Change the error code EAGAIN to -EAGAIN in qla_nvme_xmt_ls_rsp() to align with qla2x00_start_sp() returning negative error codes or QLA_SUCCESS, preventing logical errors. Fixes: 875386b98857 ("scsi: qla2xxx: Add Unsolicited LS Request and Response Support for NVMe") Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Message-ID: <20250905075446.381139-4-rongqianfeng@vivo.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-16	scsi: qla2xxx: Fix incorrect sign of error code in START_SP_W_RETRIES()	Qianfeng Rong
	Change the error code EAGAIN to -EAGAIN in START_SP_W_RETRIES() to align with qla2x00_start_sp() returning negative error codes or QLA_SUCCESS, preventing logical errors. Additionally, the '_rval' variable should store negative error codes to conform to Linux kernel error code conventions. Fixes: 9803fb5d2759 ("scsi: qla2xxx: Fix task management cmd failure") Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Message-ID: <20250905075446.381139-3-rongqianfeng@vivo.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-16	scsi: qla2xxx: edif: Fix incorrect sign of error code	Qianfeng Rong
	Change the error code EAGAIN to -EAGAIN in qla24xx_sadb_update() and qla_edif_process_els() to align with qla2x00_start_sp() returning negative error codes or QLA_SUCCESS, preventing logical errors. Fixes: 0b3f3143d473 ("scsi: qla2xxx: edif: Add retry for ELS passthrough") Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Message-ID: <20250905075446.381139-2-rongqianfeng@vivo.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-16	scsi: ufs: core: Disable timestamp functionality if not supported	Bart Van Assche
	Some Kioxia UFS 4 devices do not support the qTimestamp attribute. Set the UFS_DEVICE_QUIRK_NO_TIMESTAMP_SUPPORT for these devices such that no error messages appear in the kernel log about failures to set the qTimestamp attribute. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Avri Altman <avri.altman@sandisk.com> Tested-by: Nitin Rawat <quic_nitirawa@quicinc.com> # on SM8650-QRD Reviewed-by: Nitin Rawat <quic_nitirawa@quicinc.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Message-ID: <20250909190614.3531435-1-bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-16	net/mlx5: Lag, add net namespace support	Shay Drory
	Update the LAG implementation to support net namespace isolation. Recent devcom changes added namespace-aware client matching. Align LAG with this model so that hardware LAG forms only between mlx5 interfaces that share the same network namespace. This avoids cross-namespace interference and matches user expectations when devices are placed in different netns. Make LAG netns-aware by storing the device’s namespace in mlx5_lag and registering the devcom client with that namespace. As a result, only peers in the same netns are eligible to form a LAG. Adjust reload handling so LAG teardown/re-evaluation happens in the correct namespace context. Remove the blanket restriction that prevented devlink reload when LAG was active. Remove the reload restriction here allowing devlink reload in LAG mode is part of delivering complete netns aware LAG support: With per-netns devcom registration, reload no longer risks cross-namespace coupling. The devcom client is torn down and re-registered in the device’s current netns, and LAG is re-evaluated within that scope. The change is trivial and self-contained, and keeping it in this patch avoids splitting a feature that is functionally one unit. Only devices in same netns can form hardware LAG. devlink reload no longer fails just because LAG is active. LAG is torn down/re-created as needed within the correct namespace. No change for setups that don’t use namespaces. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1757940070-618661-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16	net/mlx5: Add net namespace support to devcom	Shay Drory
	Extend the devcom framework to support namespace-aware components. The existing devcom matching logic was based solely on numeric keys, limiting its use to the global (init_net) scope or requiring clients to ignore namespaces altogether, both of which are incorrect in multi-namespace environments. This patch introduces namespace support by allowing devcom clients to provide a namespace match attribute. The devcom pairing mechanism is updated to compare the namespace, enabling proper isolation and interaction of components across different net namespaces. With this change, components that require namespace aware pairing, such as SD groups or LAG, can now work correctly in multi-namespace scenarios. In particular, this opens the way to support hardware LAG within a net namespace. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1757940070-618661-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16	net/mlx5: Lag, move devcom registration to LAG layer	Shay Drory
	Move the devcom registration for the HCA_PORTS component from the core initialization path into the LAG logic. This better reflects the logical ownership of this component and ensures proper alignment with the LAG lifecycle. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1757940070-618661-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16	net/mlx5: Refactor devcom to use match attributes	Shay Drory
	Refactor the devcom interface to use a match attribute structure instead of passing raw keys. This change lays the groundwork for extending devcom matching logic with additional fields like net namespace, improving its flexibility and robustness. No functional changes. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1757940070-618661-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16	net/mlx5e: Add a miss level for ipsec crypto offload	Lama Kayal
	The cited commit adds a miss table for switchdev mode. But it uses the same level as policy table. Will hit the following error when running command: # ip xfrm state add src 192.168.1.22 dst 192.168.1.21 proto \ esp spi 1001 reqid 10001 aead 'rfc4106(gcm(aes))' \ 0x3a189a7f9374955d3817886c8587f1da3df387ff 128 \ mode tunnel offload dev enp8s0f0 dir in Error: mlx5_core: Device failed to offload this state. The dmesg error is: mlx5_core 0000:03:00.0: ipsec_miss_create:578:(pid 311797): fail to create IPsec miss_rule err=-22 Fix it by adding a new miss level to avoid the error. Fixes: 7d9e292ecd67 ("net/mlx5e: Move IPSec policy check after decryption") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Lama Kayal <lkayal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1757939074-617281-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16	net/mlx5e: Harden uplink netdev access against device unbind	Jianbo Liu
	The function mlx5_uplink_netdev_get() gets the uplink netdevice pointer from mdev->mlx5e_res.uplink_netdev. However, the netdevice can be removed and its pointer cleared when unbound from the mlx5_core.eth driver. This results in a NULL pointer, causing a kernel panic. BUG: unable to handle page fault for address: 0000000000001300 at RIP: 0010:mlx5e_vport_rep_load+0x22a/0x270 [mlx5_core] Call Trace: <TASK> mlx5_esw_offloads_rep_load+0x68/0xe0 [mlx5_core] esw_offloads_enable+0x593/0x910 [mlx5_core] mlx5_eswitch_enable_locked+0x341/0x420 [mlx5_core] mlx5_devlink_eswitch_mode_set+0x17e/0x3a0 [mlx5_core] devlink_nl_eswitch_set_doit+0x60/0xd0 genl_family_rcv_msg_doit+0xe0/0x130 genl_rcv_msg+0x183/0x290 netlink_rcv_skb+0x4b/0xf0 genl_rcv+0x24/0x40 netlink_unicast+0x255/0x380 netlink_sendmsg+0x1f3/0x420 __sock_sendmsg+0x38/0x60 __sys_sendto+0x119/0x180 do_syscall_64+0x53/0x1d0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Ensure the pointer is valid before use by checking it for NULL. If it is valid, immediately call netdev_hold() to take a reference, and preventing the netdevice from being freed while it is in use. Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1757939074-617281-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-17	firewire: core: shrink critical section of fw_card spinlock in bm_work	Takashi Sakamoto
	Now fw_core_handle_bus_reset() and bm_work() are serialized. Some members of fw_card are free to access in bm_work() This commit shrinks critical section of fw_card spinlock in bm_work() Link: https://lore.kernel.org/r/20250917000347.52369-4-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2025-09-17	firewire: core: disable bus management work temporarily during updating topology	Takashi Sakamoto
	When processing selfID sequence, bus topology tree is (re)built, and some members of fw_card are determined. Once determined, the members are valid during the bus generation. The above operations are in the critical section of fw_card spin lock. Before building the bus topology, a work item is scheduled for bus manager work. The bm_work() function is invoked by the work item. The function tries to acquire the spin lock, then can be stalled until the bus topology building finishes. The bus manager should work once the members of fw_card are determined. This commit suppresses the above situation by disabling the work item during processing selfID sequence. Link: https://lore.kernel.org/r/20250917000347.52369-3-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2025-09-17	firewire: core: schedule bm_work item outside of spin lock	Takashi Sakamoto
	Before (re)building topology tree, fw_core_handle_bus_reset() schedules a work item under acquiring fw_card spin lock. The work item invokes bm_work() which acquires the spin lock at first, then can be stalled to wait until the building tree finishes. This is inconvenient. This commit moves the timing to schedule the work item after releasing the spin lock. Link: https://lore.kernel.org/r/20250917000347.52369-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2025-09-16	net: dsa: mv88e6xxx: clean up PTP clock during setup failure	Russell King (Oracle)
	If an error occurs during mv88e6xxx_setup() and the PTP clock has been registered, the clock will not be unregistered as mv88e6xxx_ptp_free() will not be called. mv88e6xxx_hwtstamp_free() also is not called. As mv88e6xxx_ptp_free() can cope with being called without a successful call to mv88e6xxx_ptp_setup(), and mv88e6xxx_hwtstamp_free() is empty, add both these _free() calls to the error cleanup paths in mv88e6xxx_setup(). Moreover, mv88e6xxx_teardown() should teardown setup done in mv88e6xxx_setup() - see dsa_switch_setup(). However, instead _free() are called from mv88e6xxx_remove() function that is only called when a device is unbound, which omits cleanup should a failure occur later in dsa_switch_setup(). Move the *_free() calls from mv88e6xxx_remove() to mv88e6xxx_teardown(). Note that mv88e6xxx_ptp_setup() must be called holding the reg_lock, but mv88e6xxx_ptp_free() must never be. This is especially true after commit "ptp: rework ptp_clock_unregister() to disable events". This patch does not change this, but adds a comment to that effect. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/E1uy84w-00000005Spi-46iF@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16	net: mvpp2: add support for hardware timestamps	Russell King
	Add support for hardware timestamps in (e.g.) the PHY by calling skb_tx_timestamp() as close as reasonably possible to the point that the hardware is instructed to send the queued packets. As this also introduces software timestamping support, report those capabilities via the .get_ts_info() method. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1uy82E-00000005Sll-0SSy@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16	libie: fix linking with libie_{adminq,fwlog} when CONFIG_LIBIE=n	Alexander Lobakin
	Initially, libie contained only 1 module and I assumed that new modules in its folder would depend on it. However, Michał did a good job and libie_{adminq,fwlog} are completely independent, but libie/ is still traversed by Kbuild only under CONFIG_LIBIE != n. This results in undefined references with certain kernel configs. Tell Kbuild to always descend to libie/ to be able to build each module regardless of whether the basic one is enabled. If none of CONFIG_LIBIE* is set, Kbuild will just create an empty built-in.a there with no side effects. Fixes: 641585bc978e ("ixgbe: fwlog support for e610") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/202509140606.j8z3rE73-lkp@intel.com Reported-by: Breno Leitao <leitao@debian.org> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Closes: https://lore.kernel.org/CA+G9fYvH8d6pJRbHpOCMZFjgDCff3zcL_AsXL-nf5eB2smS8SA@mail.gmail.com Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250916160118.2209412-1-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16	drm/xe: Allow error injection for xe_pxp_exec_queue_add	Daniele Ceraolo Spurio
	This will allow us to simulate this function returning an error like we do for other functions called in the exec_queue_create path. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250909221240.3711023-4-daniele.ceraolospurio@intel.com
2025-09-16	drm/xe: Fix error handling if PXP fails to start	Daniele Ceraolo Spurio
	Since the PXP start comes after __xe_exec_queue_init() has completed, we need to cleanup what was done in that function in case of a PXP start error. __xe_exec_queue_init calls the submission backend init() function, so we need to introduce an opposite for that. Unfortunately, while we already have a fini() function pointer, it performs other operations in addition to cleaning up what was done by the init(). Therefore, for clarity, the existing fini() has been renamed to destroy(), while a new fini() has been added to only clean up what was done by the init(), with the latter being called by the former (via xe_exec_queue_fini). Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250909221240.3711023-3-daniele.ceraolospurio@intel.com
2025-09-16	drm/amd: Only restore cached manual clock settings in restore if OD enabled	Mario Limonciello
	If OD is not enabled then restoring cached clock settings doesn't make sense and actually leads to errors in resume. Check if enabled before restoring settings. Fixes: 4e9526924d09 ("drm/amd: Restore cached manual clock settings during resume") Reported-by: Jérôme Lécuyer <jerome.4a4c@gmail.com> Closes: https://lore.kernel.org/amd-gfx/0ffe2692-7bfa-4821-856e-dd0f18e2c32b@amd.com/T/#me6db8ddb192626360c462b7570ed7eba0c6c9733 Suggested-by: Jérôme Lécuyer <jerome.4a4c@gmail.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1a4dd33cc6e1baaa81efdbe68227a19f51c50f20) Cc: stable@vger.kernel.org
2025-09-16	drm/amdgpu: re-order and document VM code	Christian König
	Re-order fields in the VM structure and try to improve the documentation a bit. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amdgpu: remove check for BO reservation add assert instead	Christian König
	We should leave such checks to lockdep and not implement something manually. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amd/pm: Update pmfw headers for smu_v13_0_12	Asad Kamal
	Update pmfw headers for smu_v13_0_12 to include node power limit Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amd/pm: Rename amdgpu_hwmon_get_sensor_generic	Asad Kamal
	Rename amdgpu_hwmon_get_sensor_generic to use for generic pm interfaces Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amd: Only restore cached manual clock settings in restore if OD enabled	Mario Limonciello
	If OD is not enabled then restoring cached clock settings doesn't make sense and actually leads to errors in resume. Check if enabled before restoring settings. Fixes: 4e9526924d09 ("drm/amd: Restore cached manual clock settings during resume") Reported-by: Jérôme Lécuyer <jerome.4a4c@gmail.com> Closes: https://lore.kernel.org/amd-gfx/0ffe2692-7bfa-4821-856e-dd0f18e2c32b@amd.com/T/#me6db8ddb192626360c462b7570ed7eba0c6c9733 Suggested-by: Jérôme Lécuyer <jerome.4a4c@gmail.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amd/pm: Use devm_i2c_add_adapter() in the V14_0_2 smu	Rodrigo Siqueira
	The I2C init for V14_0_2 uses i2c_add_adapter() and i2c_del_adapter(), this commit replaces the use of these two functions with devm_i2c_add_adapter(). Notice that V14_0_2 init initializes multiple I2C buses in a loop; if something goes wrong, the previous adapters are removed, and the amdgpu load is interrupted. Since I2C init is required for the correct load of amdgpu, it is safe to rely on devm_i2c_add_adapter() to handle any previously initialized I2C adapter. Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amd/pm: Use devm_i2c_add_adapter() in the V13_0_6 smu	Rodrigo Siqueira
	The I2C init for V13_0_6 uses i2c_add_adapter() and i2c_del_adapter(), this commit replaces the use of these two functions with devm_i2c_add_adapter(). Notice that V13_0_6 init initializes multiple I2C buses in a loop; if something goes wrong, the previous adapters are removed, and the amdgpu load is interrupted. Since I2C init is required for the correct load of amdgpu, it is safe to rely on devm_i2c_add_adapter() to handle any previously initialized I2C adapter. Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amd/pm: Use devm_i2c_add_adapter() in the V13 smu	Rodrigo Siqueira
	The I2C init for SMU_V13 uses i2c_add_adapter() and i2c_del_adapter(), this commit replaces the use of these two functions with devm_i2c_add_adapter(). Notice that SMU_V13 init initializes multiple I2C buses in a loop; if something goes wrong, the previous adapters are removed, and the amdgpu load is interrupted. Since I2C init is required for the correct load of amdgpu, it is safe to rely on devm_i2c_add_adapter() to handle any previously initialized I2C adapter. Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amd/pm: Use devm_i2c_add_adapter() in the Sienna smu	Rodrigo Siqueira
	The I2C init for Sienna Cichlid uses i2c_add_adapter() and i2c_del_adapter(), this commit replaces the use of these two functions with devm_i2c_add_adapter(). Notice that Sienna Cichlid init initializes multiple I2C buses in a loop; if something goes wrong, the previous adapters are removed, and the amdgpu load is interrupted. Since I2C init is required for the correct load of amdgpu, it is safe to rely on devm_i2c_add_adapter() to handle any previously initialized I2C adapter. Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amd/pm: Use devm_i2c_add_adapter() in the Navi10 smu	Rodrigo Siqueira
	The I2C init for Navi10 uses i2c_add_adapter() and i2c_del_adapter(), this commit replaces the use of these two functions with devm_i2c_add_adapter(). Notice that Navi10 init initializes multiple I2C buses in a loop; if something goes wrong, the previous adapters are removed, and the amdgpu load is interrupted. Since I2C init is required for the correct load of amdgpu, it is safe to rely on devm_i2c_add_adapter() to handle any previously initialized I2C adapter. Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amd/pm: Use devm_i2c_add_adapter() in the Arcturus smu	Rodrigo Siqueira
	The I2C init for Arcturus uses i2c_add_adapter() and i2c_del_adapter(), this commit replaces the use of these two functions with devm_i2c_add_adapter(). Notice that Arcturus init initializes multiple I2C buses in a loop; if something goes wrong, the previous adapters are removed, and the amdgpu load is interrupted. Since I2C init is required for the correct load of amdgpu, it is safe to rely on devm_i2c_add_adapter() to handle any previously initialized I2C adapter. Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amd/pm: Use devm_i2c_add_adapter() in the i2c init	Rodrigo Siqueira
	Instead of using i2c_add_adapter() and i2c_del_adapter(), replace them with devm_i2c_add_adapter() to simplify the i2c logic. Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amdgpu: Use devm_i2c_add_adapter() in SMU V11	Rodrigo Siqueira
	Instead of using i2c_add_adapter() and i2c_del_adapter() in the SMU V11, use devm_i2c_add_adapter() to simplify the code path. Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amdgpu/amdgpu_i2c: Use devm_i2c_add_adapter instead of i2c_add_adapter	Rodrigo Siqueira
	This commit replaces i2c_add_adapter() with devm_i2c_add_adapter() and removes part of the cleanup logic since the new function handles the i2c removal. Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amd/display: Use devm_i2c_add_adapter to simplify i2c cleanup logic	Rodrigo Siqueira
	This commit replaces the utilization of i2c_add/del_adapter() with devm_i2c_add_adapter() to reduce the amount of boilerplate. Using devm_i2c_add_adapter() has the advantage of removing the manual manipulation of the I2C adapter. Suggested-by: Robert Beckett <bob.beckett@collabora.com> Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amd/display: Use kmalloc_array() instead of kmalloc()	James Flowers
	Documentation/process/deprecated.rst recommends against the use of kmalloc with dynamic size calculations due to the risk of overflow and smaller allocation being made than the caller was expecting. This could lead to buffer overflow in code similar to the memcpy in amdgpu_dm_plane_add_modifier(). Signed-off-by: James Flowers <bold.zone2373@fastmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amdgpu: fix userq VM validation v4	Christian König
	That was actually complete nonsense and not validating the BOs at all. The code just cleared all VM areas were it couldn't grab the lock for a BO. Try to fix this. Only compile tested at the moment. v2: fix fence slot reservation as well as pointed out by Sunil. also validate PDs, PTs, per VM BOs and update PDEs v3: grab the status_lock while working with the done list. v4: rename functions, add some comments, fix waiting for updates to complete. v4: rename amdgpu_vm_lock_done_list(), add some more comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	drm/amdgpu: reject gang submissions under SRIOV	Christian König
	Gang submission means that the kernel driver guarantees that multiple submissions are executed on the HW at the same time on different engines. Background is that those submissions then depend on each other and each can't finish stand alone. SRIOV now uses world switch to preempt submissions on the engines to allow sharing the HW resources between multiple VFs. The problem is now that the SRIOV world switch can't know about such inter dependencies and will cause a timeout if it waits for a partially running gang submission. To conclude SRIOV and gang submissions are fundamentally incompatible at the moment. For now just disable them. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16	igc: don't fail igc_probe() on LED setup error	Kohei Enju
	When igc_led_setup() fails, igc_probe() fails and triggers kernel panic in free_netdev() since unregister_netdev() is not called. [1] This behavior can be tested using fault-injection framework, especially the failslab feature. [2] Since LED support is not mandatory, treat LED setup failures as non-fatal and continue probe with a warning message, consequently avoiding the kernel panic. [1] kernel BUG at net/core/dev.c:12047! Oops: invalid opcode: 0000 [#1] SMP NOPTI CPU: 0 UID: 0 PID: 937 Comm: repro-igc-led-e Not tainted 6.17.0-rc4-enjuk-tnguy-00865-gc4940196ab02 #64 PREEMPT(voluntary) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:free_netdev+0x278/0x2b0 [...] Call Trace: <TASK> igc_probe+0x370/0x910 local_pci_probe+0x3a/0x80 pci_device_probe+0xd1/0x200 [...] [2] #!/bin/bash -ex FAILSLAB_PATH=/sys/kernel/debug/failslab/ DEVICE=0000:00:05.0 START_ADDR=$(grep " igc_led_setup" /proc/kallsyms \ \| awk '{printf("0x%s", $1)}') END_ADDR=$(printf "0x%x" $((START_ADDR + 0x100))) echo $START_ADDR > $FAILSLAB_PATH/require-start echo $END_ADDR > $FAILSLAB_PATH/require-end echo 1 > $FAILSLAB_PATH/times echo 100 > $FAILSLAB_PATH/probability echo N > $FAILSLAB_PATH/ignore-gfp-wait echo $DEVICE > /sys/bus/pci/drivers/igc/bind Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226") Signed-off-by: Kohei Enju <enjuk@amazon.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-16	ixgbe: destroy aci.lock later within ixgbe_remove path	Jedrzej Jagielski
	There's another issue with aci.lock and previous patch uncovers it. aci.lock is being destroyed during removing ixgbe while some of the ixgbe closing routines are still ongoing. These routines use Admin Command Interface which require taking aci.lock which has been already destroyed what leads to call trace. [ +0.000004] DEBUG_LOCKS_WARN_ON(lock->magic != lock) [ +0.000007] WARNING: CPU: 12 PID: 10277 at kernel/locking/mutex.c:155 mutex_lock+0x5f/0x70 [ +0.000002] Call Trace: [ +0.000003] <TASK> [ +0.000006] ixgbe_aci_send_cmd+0xc8/0x220 [ixgbe] [ +0.000049] ? try_to_wake_up+0x29d/0x5d0 [ +0.000009] ixgbe_disable_rx_e610+0xc4/0x110 [ixgbe] [ +0.000032] ixgbe_disable_rx+0x3d/0x200 [ixgbe] [ +0.000027] ixgbe_down+0x102/0x3b0 [ixgbe] [ +0.000031] ixgbe_close_suspend+0x28/0x90 [ixgbe] [ +0.000028] ixgbe_close+0xfb/0x100 [ixgbe] [ +0.000025] __dev_close_many+0xae/0x220 [ +0.000005] dev_close_many+0xc2/0x1a0 [ +0.000004] ? kernfs_should_drain_open_files+0x2a/0x40 [ +0.000005] unregister_netdevice_many_notify+0x204/0xb00 [ +0.000006] ? __kernfs_remove.part.0+0x109/0x210 [ +0.000006] ? kobj_kset_leave+0x4b/0x70 [ +0.000008] unregister_netdevice_queue+0xf6/0x130 [ +0.000006] unregister_netdev+0x1c/0x40 [ +0.000005] ixgbe_remove+0x216/0x290 [ixgbe] [ +0.000021] pci_device_remove+0x42/0xb0 [ +0.000007] device_release_driver_internal+0x19c/0x200 [ +0.000008] driver_detach+0x48/0x90 [ +0.000003] bus_remove_driver+0x6d/0xf0 [ +0.000006] pci_unregister_driver+0x2e/0xb0 [ +0.000005] ixgbe_exit_module+0x1c/0xc80 [ixgbe] Same as for the previous commit, the issue has been highlighted by the commit 337369f8ce9e ("locking/mutex: Add MUTEX_WARN_ON() into fast path"). Move destroying aci.lock to the end of ixgbe_remove(), as this simply fixes the issue. Fixes: 4600cdf9f5ac ("ixgbe: Enable link management in E610 device") Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-16	ixgbe: initialize aci.lock before it's used	Jedrzej Jagielski
	Currently aci.lock is initialized too late. A bunch of ACI callbacks using the lock are called prior it's initialized. Commit 337369f8ce9e ("locking/mutex: Add MUTEX_WARN_ON() into fast path") highlights that issue what results in call trace. [ 4.092899] DEBUG_LOCKS_WARN_ON(lock->magic != lock) [ 4.092910] WARNING: CPU: 0 PID: 578 at kernel/locking/mutex.c:154 mutex_lock+0x6d/0x80 [ 4.098757] Call Trace: [ 4.098847] <TASK> [ 4.098922] ixgbe_aci_send_cmd+0x8c/0x1e0 [ixgbe] [ 4.099108] ? hrtimer_try_to_cancel+0x18/0x110 [ 4.099277] ixgbe_aci_get_fw_ver+0x52/0xa0 [ixgbe] [ 4.099460] ixgbe_check_fw_error+0x1fc/0x2f0 [ixgbe] [ 4.099650] ? usleep_range_state+0x69/0xd0 [ 4.099811] ? usleep_range_state+0x8c/0xd0 [ 4.099964] ixgbe_probe+0x3b0/0x12d0 [ixgbe] [ 4.100132] local_pci_probe+0x43/0xa0 [ 4.100267] work_for_cpu_fn+0x13/0x20 [ 4.101647] </TASK> Move aci.lock mutex initialization to ixgbe_sw_init() before any ACI command is sent. Along with that move also related SWFW semaphore in order to reduce size of ixgbe_probe() and that way all locks are initialized in ixgbe_sw_init(). Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Fixes: 4600cdf9f5ac ("ixgbe: Enable link management in E610 device") Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-16	i40e: remove redundant memory barrier when cleaning Tx descs	Maciej Fijalkowski
	i40e has a feature which writes to memory location last descriptor successfully sent. Memory barrier in i40e_clean_tx_irq() was used to avoid forward-reading descriptor fields in case DD bit was not set. Having mentioned feature in place implies that such situation will not happen as we know in advance how many descriptors HW has dealt with. Besides, this barrier placement was wrong. Idea is to have this protection after reading DD bit from HW descriptor, not before. Digging through git history showed me that indeed barrier was before DD bit check, anyways the commit introducing i40e_get_head() should have wiped it out altogether. Also, there was one commit doing s/read_barrier_depends/smp_rmb when get head feature was already in place, but it was only theoretical based on ixgbe experiences, which is different in these terms as that driver has to read DD bit from HW descriptor. Fixes: 1943d8ba9507 ("i40e/i40evf: enable hardware feature head write back") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-16	ice: fix Rx page leak on multi-buffer frames	Jacob Keller
	The ice_put_rx_mbuf() function handles calling ice_put_rx_buf() for each buffer in the current frame. This function was introduced as part of handling multi-buffer XDP support in the ice driver. It works by iterating over the buffers from first_desc up to 1 plus the total number of fragments in the frame, cached from before the XDP program was executed. If the hardware posts a descriptor with a size of 0, the logic used in ice_put_rx_mbuf() breaks. Such descriptors get skipped and don't get added as fragments in ice_add_xdp_frag. Since the buffer isn't counted as a fragment, we do not iterate over it in ice_put_rx_mbuf(), and thus we don't call ice_put_rx_buf(). Because we don't call ice_put_rx_buf(), we don't attempt to re-use the page or free it. This leaves a stale page in the ring, as we don't increment next_to_alloc. The ice_reuse_rx_page() assumes that the next_to_alloc has been incremented properly, and that it always points to a buffer with a NULL page. Since this function doesn't check, it will happily recycle a page over the top of the next_to_alloc buffer, losing track of the old page. Note that this leak only occurs for multi-buffer frames. The ice_put_rx_mbuf() function always handles at least one buffer, so a single-buffer frame will always get handled correctly. It is not clear precisely why the hardware hands us descriptors with a size of 0 sometimes, but it happens somewhat regularly with "jumbo frames" used by 9K MTU. To fix ice_put_rx_mbuf(), we need to make sure to call ice_put_rx_buf() on all buffers between first_desc and next_to_clean. Borrow the logic of a similar function in i40e used for this same purpose. Use the same logic also in ice_get_pgcnts(). Instead of iterating over just the number of fragments, use a loop which iterates until the current index reaches to the next_to_clean element just past the current frame. Unlike i40e, the ice_put_rx_mbuf() function does call ice_put_rx_buf() on the last buffer of the frame indicating the end of packet. For non-linear (multi-buffer) frames, we need to take care when adjusting the pagecnt_bias. An XDP program might release fragments from the tail of the frame, in which case that fragment page is already released. Only update the pagecnt_bias for the first descriptor and fragments still remaining post-XDP program. Take care to only access the shared info for fragmented buffers, as this avoids a significant cache miss. The xdp_xmit value only needs to be updated if an XDP program is run, and only once per packet. Drop the xdp_xmit pointer argument from ice_put_rx_mbuf(). Instead, set xdp_xmit in the ice_clean_rx_irq() function directly. This avoids needing to pass the argument and avoids an extra bit-wise OR for each buffer in the frame. Move the increment of the ntc local variable to ensure its updated before all calls to ice_get_pgcnts() or ice_put_rx_mbuf(), as the loop logic requires the index of the element just after the current frame. Now that we use an index pointer in the ring to identify the packet, we no longer need to track or cache the number of fragments in the rx_ring. Cc: Christoph Petrausch <christoph.petrausch@deepl.com> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Closes: https://lore.kernel.org/netdev/CAK8fFZ4hY6GUJNENz3wY9jaYLZXGfpr7dnZxzGMYoE44caRbgw@mail.gmail.com/ Fixes: 743bbd93cf29 ("ice: put Rx buffers after being done with current frame") Tested-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Tested-by: Priya Singh <priyax.singh@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-16	drm/xe: Remove duplicate header files	Yang Li
	Fix some duplicate includes in xe: ./drivers/gpu/drm/xe/xe_tlb_inval.c: xe_tlb_inval.h is included more than once. ./drivers/gpu/drm/xe/xe_pt.c: xe_tlb_inval_job.h is included more than once. While at it, also sort the include lines alphabetically. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=24705 Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=24706 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> [Reword commit message] Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916021039.1632766-1-yang.lee@linux.alibaba.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-16	drm/xe/guc: Return an error code if the GuC load fails	John Harrison
	Due to multiple explosion issues in the early days of the Xe driver, the GuC load was hacked to never return a failure. That prevented kernel panics and such initially, but now all it achieves is creating more confusing errors when the driver tries to submit commands to a GuC it already knows is not there. So fix that up. As a stop-gap and to help with debug of load failures due to invalid GuC init params, a wedge call had been added to the inner GuC load function. The reason being that it leaves the GuC log accessible via debugfs. However, for an end user, simply aborting the module load is much cleaner than wedging and trying to continue. The wedge blocks user submissions but it seems that various bits of the driver itself still try to submit to a dead GuC and lots of subsequent errors occur. And with regards to developers debugging why their particular code change is being rejected by the GuC, it is trivial to either add the wedge back in and hack the return code to zero again or to just do a GuC log dump to dmesg. v2: Add support for error injection testing and drop the now redundant wedge call. CC: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250909224132.536320-1-John.C.Harrison@Intel.com
2025-09-16	nvme-pci: Add TUXEDO IBS Gen8 to Samsung sleep quirk	Georg Gottleuber
	On the TUXEDO InfinityBook S Gen8, a Samsung 990 Evo NVMe leads to a high power consumption in s2idle sleep (3.5 watts). This patch applies 'Force No Simple Suspend' quirk to achieve a sleep with a lower power consumption, typically around 1 watts. Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Cc: stable@vger.kernel.org Signed-off-by: Keith Busch <kbusch@kernel.org>