linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2025-07-10	drm/xe/bmg: Don't use WA 16023588340 and 22019338487 on VF	Michal Wajdeczko
	These workarounds are not applicable for use by the VFs. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Tested-by: Jakub Kolakowski <jakub1.kolakowski@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Signed-off-by: Jakub Kolakowski <jakub1.kolakowski@intel.com> Link: https://lore.kernel.org/r/20250710103040.375610-2-jakub1.kolakowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 1d2e2503e506ddc499cbb7afdc8b70bcf6fe241f) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10	drm/xe/guc: Recommend GuC v70.46.2 for BMG, LNL, DG2	Julia Filipchuk
	UAPI compatibility version 1.22.2 Resolves various bugs. Recommend newer version. Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250626182805.1701096-13-daniele.ceraolospurio@intel.com (cherry picked from commit 0b64addcae7f04745bc5f62d41e27268052f812e) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10	drm/xe/pm: Correct comment of xe_pm_set_vram_threshold()	Shuicheng Lin
	The parameter threshold is with size in MiB, not in bits. Correct it to avoid any confusion. v2: s/mb/MiB, s/vram/VRAM, fix return section. (Michal) Fixes: 30c399529f4c ("drm/xe: Document Xe PM component") Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250708021450.3602087-2-shuicheng.lin@intel.com Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 0efec0500117947f924e5ac83be40f96378af85a) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10	drm/xe: Release runtime pm for error path of xe_devcoredump_read()	Shuicheng Lin
	xe_pm_runtime_put() is missed to be called for the error path in xe_devcoredump_read(). Add function description comments for xe_devcoredump_read() to help understand it. v2: more detail function comments and refine goto logic (Matt) Fixes: c4a2e5f865b7 ("drm/xe: Add devcoredump chunking") Cc: stable@vger.kernel.org Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250707004911.3502904-6-shuicheng.lin@intel.com (cherry picked from commit 017ef1228d735965419ff118fe1b89089e772c42) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10	drm/xe/pm: Restore display pm if there is error after display suspend	Shuicheng Lin
	xe_bo_evict_all() is called after xe_display_pm_suspend(). So if there is error with xe_bo_evict_all(), display pm should be restored. Fixes: 51462211f4a9 ("drm/xe/pxp: add PXP PM support") Fixes: cb8f81c17531 ("drm/xe/display: Make display suspend/resume work on discrete") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250708035424.3608190-2-shuicheng.lin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 83dcee17855c4e5af037ae3262809036de127903) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10	drm/xe/sriov: Mark BMG as SR-IOV capable	Michal Wajdeczko
	Enable SR-IOV support for BMG platforms. Note that as other flags from the platform descriptor, it only means it may have that capability: it still depends on runtime checks for the proper support in HW and firmware. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Tested-by: Jakub Kolakowski <jakub1.kolakowski@intel.com> Signed-off-by: Jakub Kolakowski <jakub1.kolakowski@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250710103040.375610-3-jakub1.kolakowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10	drm/xe: extend Wa_15015404425 to apply to PTL	Matt Atwood
	Wa_15015404425 only needs to be applied on PTL platforms with an A step compute die. There is no way to map PCI revid to the compute die stepping. The easiest way to figure out compute die stepping our end is to map the media IP's stepping to the compute die. For PTL, compute die has an A stepping if and only if the media IP's stepping is also A-step (This relationship is determined on a per platform basis and just happens to be this way on PTL). In addition this workaround is a chicken-and-egg problem. Wa_15015404425 requires that all register reads be preceded by four dummy MMIO writes (including during early driver init and even pre-OS firmware). The driver needs to perform some MMIO reads during init which include the GMD_ID register that contains the Media IPs stepping. To handle this in the safest manner assume the workaround applies to all of PTL during driver probe and deactivate the workaround after. The overall solution becomes a set of two workarounds: * 15015404425 - a Device OOB workaround that's always active for PTL * 15015404425_disable - a GT OOB workaround that applies to PTL platfroms with a B0 or later stepping The first of these workarounds issues dummy MMIO writes we do when reading registers. The second guards logic that disables the first once we have the necessary information later in the probe process. v2: rename SoC to device, avoid null pointer dereference, update commit message. v3: rebase v5: move disable check into xe_device_probe to avoid linking in xe_wa into xe_pci, reword commit message v6: squash extension and b0 support into 1 patch Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250709221605.172516-7-matthew.s.atwood@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10	drm/xe: Move Wa_15015404425 to use the new XE_DEVICE_WA macro	Matt Atwood
	Move Wa_15015404425 to use the new implemented OOB macro XE_DEVICE_WA() v2: rename from SoC to Device v5: move workaround call back into the flush call v6: remove redundant commenting Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250709221605.172516-6-matthew.s.atwood@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10	drm/xe: Add infrastructure for Device OOB workarounds	Matt Atwood
	Some workarounds need to be able to be applied ahead of any GT initialization for example 15015404425. This patch creates XE_DEVICE_WA macro, in the same vein as XE_WA. This macro can be used ahead of GT initialization, and can be tracked in sysfs. This should alleviate some of the complexities that exist in i915. v2: name change SoC to Device, address style issues v5: split into separate patch from RTP changes, put oob within a struct, move the initiation of oob workarounds into xe_device_probe_early(), clean up the comments around XE_WA. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250709221605.172516-5-matthew.s.atwood@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10	drm/xe: add new type to RTP context	Matt Atwood
	Prepare the RTP context to be used before GT init. Add the xe device as a type, put WARN_ONs to protect existing RTP_MATCHes. v5: split out into separate patch, change definition order v6: catch missing cases for checking gt init Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250709221605.172516-4-matthew.s.atwood@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10	drm/xe: add xe_device_wa infrastructure	Matt Atwood
	There are some workarounds that must be appplied before gt init, wa_15015404425 for example. Instead of sprinking them conditionally throughout the driver as we did for i915 generate an oob.rules file reusing the RTP infrastructure to make these easier to track. v2: rename xe_soc_wa to xe_device_wa v5: derive prefix from argument rather than hard coding the values. v6: split out xe_gen-wa_oob changes Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250709221605.172516-3-matthew.s.atwood@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10	drm/xe: prepare xe_gen_wa_oob to be multi-use	Matt Atwood
	There is a need for additional oob rules files. Make the current gen file more robust to support more files. Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250709221605.172516-2-matthew.s.atwood@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10	drm/amdgpu: Fix lifetime of struct amdgpu_task_info after ring reset	André Almeida
	When a ring reset happens, amdgpu calls drm_dev_wedged_event() using struct amdgpu_task_info ti as one of the arguments. After using ti, a call to amdgpu_vm_put_task_info(ti) is required to correctly track its lifetime. However, it's called from a place that the ring reset path never reaches due to a goto after drm_dev_wedged_event() is called. Move amdgpu_vm_put_task_info() bellow the exit label to make sure that it's called regardless of the code path. amdgpu_vm_put_task_info() can only accept a valid address or NULL as argument, so initialise ti to make sure we can call this function if ti isn't used. Fixes: a72002cb181f ("drm/amdgpu: Make use of drm_wedge_task_info") Reported-by: Dave Airlie <airlied@gmail.com> Closes: https://lore.kernel.org/dri-devel/CAPM=9tz0rQP8VZWKWyuF8kUMqRScxqoa6aVdwWw9=5yYxyYQ2Q@mail.gmail.com/ Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250704030629.1064397-1-andrealmeid@igalia.com Signed-off-by: André Almeida <andrealmeid@igalia.com>
2025-07-10	drm/xe/guc: Cancel ongoing H2G requests when stopping CT	Michal Wajdeczko
	Once we have started a GT reset sequence, which includes stopping GuC CTB communication, we should also cancel all ongoing H2G send- recv requests, as either GuC is already dead, or due to imminent reset GuC will not be able to reply, or due to internal cleanup we will lose pending fences. With this we will report dedicated -ECANCELED error instead of misleading -ETIME. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250709174038.1876-4-michal.wajdeczko@intel.com
2025-07-10	drm/xe/guc: Move state change logger to helper	Michal Wajdeczko
	In the state change helper we are already doing extra stuff, move debug state logger there to cover all state changes. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250709174038.1876-3-michal.wajdeczko@intel.com
2025-07-10	drm/xe/guc: Rename CT state change helper	Michal Wajdeczko
	In this helper we are already doing much more than just setting a new CT state and its name was little misleading. Rename it. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250709174038.1876-2-michal.wajdeczko@intel.com
2025-07-10	drm/xe/pm: Correct comment of xe_pm_set_vram_threshold()	Shuicheng Lin
	The parameter threshold is with size in MiB, not in bits. Correct it to avoid any confusion. v2: s/mb/MiB, s/vram/VRAM, fix return section. (Michal) Fixes: 30c399529f4c ("drm/xe: Document Xe PM component") Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250708021450.3602087-2-shuicheng.lin@intel.com Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-10	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR (net-6.16-rc6). No conflicts. Adjacent changes: Documentation/devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml 0a12c435a1d6 ("dt-bindings: net: sun8i-emac: Add A100 EMAC compatible") b3603c0466a8 ("dt-bindings: net: sun8i-emac: Rename A523 EMAC0 to GMAC0") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	drm/xe/bmg: Don't use WA 16023588340 and 22019338487 on VF	Michal Wajdeczko
	These workarounds are not applicable for use by the VFs. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Tested-by: Jakub Kolakowski <jakub1.kolakowski@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Signed-off-by: Jakub Kolakowski <jakub1.kolakowski@intel.com> Link: https://lore.kernel.org/r/20250710103040.375610-2-jakub1.kolakowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10	drm/amdgpu: do not resume device in thaw for normal hibernation	Samuel Zhang
	For normal hibernation, GPU do not need to be resumed in thaw since it is not involved in writing the hibernation image. Skip resume in this case can reduce the hibernation time. On VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory, this can save 50 minutes. Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://lore.kernel.org/r/20250710062313.3226149-6-guoqing.zhang@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2025-07-10	drm/amdgpu: move GTT to shmem after eviction for hibernation	Samuel Zhang
	When hibernate with data center dGPUs, huge number of VRAM BOs evicted to GTT and takes too much system memory. This will cause hibernation fail due to insufficient memory for creating the hibernation image. Move GTT BOs to shmem in KMD, then shmem to swap disk in kernel hibernation code to make room for hibernation image. Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250710062313.3226149-3-guoqing.zhang@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2025-07-10	drm/ttm: add new api ttm_device_prepare_hibernation()	Samuel Zhang
	This new api is used for hibernation to move GTT BOs to shmem after VRAM eviction. shmem will be flushed to swap disk later to reduce the system memory usage for hibernation. Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250710062313.3226149-2-guoqing.zhang@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2025-07-10	drm/i915/bios: Apply vlv_fixup_mipi_sequences() to v2 mipi-sequences too	Hans de Goede
	It turns out that the fixup from vlv_fixup_mipi_sequences() is necessary for some DSI panel's with version 2 mipi-sequences too. Specifically the Acer Iconia One 8 A1-840 (not to be confused with the A1-840FHD which is different) has the following sequences: BDB block 53 (1284 bytes) - MIPI sequence block: Sequence block version v2 Panel 0 * Sequence 2 - MIPI_SEQ_INIT_OTP GPIO index 9, source 0, set 0 (0x00) Delay: 50000 us GPIO index 9, source 0, set 1 (0x01) Delay: 6000 us GPIO index 9, source 0, set 0 (0x00) Delay: 6000 us GPIO index 9, source 0, set 1 (0x01) Delay: 25000 us Send DCS: Port A, VC 0, LP, Type 39, Length 5, Data ff aa 55 a5 80 Send DCS: Port A, VC 0, LP, Type 39, Length 3, Data 6f 11 00 ... Send DCS: Port A, VC 0, LP, Type 05, Length 1, Data 29 Delay: 120000 us Sequence 4 - MIPI_SEQ_DISPLAY_OFF Send DCS: Port A, VC 0, LP, Type 05, Length 1, Data 28 Delay: 105000 us Send DCS: Port A, VC 0, LP, Type 05, Length 2, Data 10 00 Delay: 10000 us Sequence 5 - MIPI_SEQ_ASSERT_RESET Delay: 10000 us GPIO index 9, source 0, set 0 (0x00) Notice how there is no MIPI_SEQ_DEASSERT_RESET, instead the deassert is done at the beginning of MIPI_SEQ_INIT_OTP, which is exactly what the fixup from vlv_fixup_mipi_sequences() fixes up. Extend it to also apply to v2 sequences, this fixes the panel not working on the Acer Iconia One 8 A1-840. Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14605 Signed-off-by: Hans de Goede <hansg@kernel.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20250703143824.7121-1-hansg@kernel.org Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 11895f375939d60efe7ed5dddc1cffe2e79f976c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-10	drm/i915/bios: Apply vlv_fixup_mipi_sequences() to v2 mipi-sequences too	Hans de Goede
	It turns out that the fixup from vlv_fixup_mipi_sequences() is necessary for some DSI panel's with version 2 mipi-sequences too. Specifically the Acer Iconia One 8 A1-840 (not to be confused with the A1-840FHD which is different) has the following sequences: BDB block 53 (1284 bytes) - MIPI sequence block: Sequence block version v2 Panel 0 * Sequence 2 - MIPI_SEQ_INIT_OTP GPIO index 9, source 0, set 0 (0x00) Delay: 50000 us GPIO index 9, source 0, set 1 (0x01) Delay: 6000 us GPIO index 9, source 0, set 0 (0x00) Delay: 6000 us GPIO index 9, source 0, set 1 (0x01) Delay: 25000 us Send DCS: Port A, VC 0, LP, Type 39, Length 5, Data ff aa 55 a5 80 Send DCS: Port A, VC 0, LP, Type 39, Length 3, Data 6f 11 00 ... Send DCS: Port A, VC 0, LP, Type 05, Length 1, Data 29 Delay: 120000 us Sequence 4 - MIPI_SEQ_DISPLAY_OFF Send DCS: Port A, VC 0, LP, Type 05, Length 1, Data 28 Delay: 105000 us Send DCS: Port A, VC 0, LP, Type 05, Length 2, Data 10 00 Delay: 10000 us Sequence 5 - MIPI_SEQ_ASSERT_RESET Delay: 10000 us GPIO index 9, source 0, set 0 (0x00) Notice how there is no MIPI_SEQ_DEASSERT_RESET, instead the deassert is done at the beginning of MIPI_SEQ_INIT_OTP, which is exactly what the fixup from vlv_fixup_mipi_sequences() fixes up. Extend it to also apply to v2 sequences, this fixes the panel not working on the Acer Iconia One 8 A1-840. Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14605 Signed-off-by: Hans de Goede <hansg@kernel.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20250703143824.7121-1-hansg@kernel.org Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-10	drm/nouveau: Remove waitque for sched teardown	Philipp Stanner
	struct nouveau_sched contains a waitque needed to prevent drm_sched_fini() from being called while there are still jobs pending. Doing so so far would have caused memory leaks. With the new memleak-free mode of operation switched on in drm_sched_fini() by providing the callback nouveau_sched_cancel_job() the waitque is not necessary anymore. Remove the waitque. Acked-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250710125412.128476-10-phasta@kernel.org
2025-07-10	drm/nouveau: Add new callback for scheduler teardown	Philipp Stanner
	There is a new callback for always tearing the scheduler down in a leak-free, deadlock-free manner. Port Nouveau as its first user by providing the scheduler with a callback that ensures the fence context gets killed in drm_sched_fini(). Acked-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250710125412.128476-9-phasta@kernel.org
2025-07-10	drm/nouveau: Make fence container helper usable driver-wide	Philipp Stanner
	In order to implement a new DRM GPU scheduler callback in Nouveau, a helper for obtaining a nouveau_fence from a dma_fence is necessary. Such a helper exists already inside nouveau_fence.c, called from_fence(). Make that helper available to other C files with a more precise name. Acked-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250710125412.128476-8-phasta@kernel.org
2025-07-10	drm/sched: Warn if pending_list is not empty	Philipp Stanner
	drm_sched_fini() can leak jobs under certain circumstances. Warn if that happens. Acked-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250710125412.128476-7-phasta@kernel.org
2025-07-10	drm/sched/tests: Add unit test for cancel_job()	Philipp Stanner
	The scheduler unit tests now provide a new callback, cancel_job(). This callback gets used by drm_sched_fini() for all still pending jobs to cancel them. Implement a new unit test to test this. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250710125412.128476-6-phasta@kernel.org
2025-07-10	drm/sched/tests: Implement cancel_job() callback	Philipp Stanner
	The GPU Scheduler now supports a new callback, cancel_job(), which lets the scheduler cancel all jobs which might not yet be freed when drm_sched_fini() runs. Using this callback allows for significantly simplifying the mock scheduler teardown code. Implement the cancel_job() callback and adjust the code where necessary. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250710125412.128476-5-phasta@kernel.org
2025-07-10	drm/sched: Avoid memory leaks with cancel_job() callback	Philipp Stanner
	Since its inception, the GPU scheduler can leak memory if the driver calls drm_sched_fini() while there are still jobs in flight. The simplest way to solve this in a backwards compatible manner is by adding a new callback, drm_sched_backend_ops.cancel_job(), which instructs the driver to signal the hardware fence associated with the job. Afterwards, the scheduler can safely use the established free_job() callback for freeing the job. Implement the new backend_ops callback cancel_job(). Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/dri-devel/20250418113211.69956-1-tvrtko.ursulin@igalia.com/ Reviewed-by: Maíra Canal <mcanal@igalia.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250710125412.128476-4-phasta@kernel.org
2025-07-10	drm/xe/bo: add GPU memory trace points	Juston Li
	Add TRACE_GPU_MEM tracepoints for tracking global GPU memory usage. These are required by VSR on Android 12+ for reporting GPU driver memory allocations. v5: - Drop process_mem tracking - Set the gpu_id field to dev->primary->index (Lucas, Tvrtko) - Formatting cleanup under 80 columns v3: - Use now configurable CONFIG_TRACE_GPU_MEM instead of adding a per-driver Kconfig (Lucas) v2: - Use u64 as preferred by checkpatch (Tvrtko) - Fix errors in comments/Kconfig description (Tvrtko) - drop redundant "CONFIG" in Kconfig Signed-off-by: Juston Li <justonli@chromium.org> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250709192313.479336-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10	drm/xe/xe_i2c: Add support for i2c in survivability mode	Riana Tauro
	Initialize i2c in survivability mode to allow firmware update of Add-In Management Controller (AMC) in survivability mode. Signed-off-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20250701122252.2590230-6-heikki.krogerus@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-10	drm/xe/pm: Wire up suspend/resume for I2C controller	Raag Jadav
	Wire up suspend/resume handles for I2C controller to match its power state with SGUnit. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20250701122252.2590230-5-heikki.krogerus@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-10	drm/xe: Support for I2C attached MCUs	Heikki Krogerus
	Adding adaption/glue layer where the I2C host adapter (Synopsys DesignWare I2C adapter) and the I2C clients (the microcontroller units) are enumerated. The microcontroller units (MCU) that are attached to the GPU depend on the OEM. The initially supported MCU will be the Add-In Management Controller (AMC). Co-developed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20250701122252.2590230-4-heikki.krogerus@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Rodrigo fixed the co-developed tags and SPDX format in the .c file]
2025-07-10	drm/xe/guc: Don't allocate temporary policies object	Michal Wajdeczko
	Since we are already using reusable buffer objects from the GuC buffer cache, we can directly write into their CPU pointers and spare unnecessary temporary allocation. While around, also make sure to clear obtained buffer, to avoid sending some stale data. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250702142504.1656-1-michal.wajdeczko@intel.com
2025-07-10	drm/xe/pf: Print configuration KLVs using debug printer	Michal Wajdeczko
	While we print VF's configuration KLVs only under DEBUG_SRIOV config, we should be doing it at debug level, not info level. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://lore.kernel.org/r/20250703145709.1832-1-michal.wajdeczko@intel.com
2025-07-10	drm/xe/pf: Print runtime registers using debug printer	Michal Wajdeczko
	While we already print VF's runtime registers only under DEBUG_SRIOV config, we should be still doing it at debug level, not info. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://lore.kernel.org/r/20250604190021.725-2-michal.wajdeczko@intel.com
2025-07-10	drm/gpu: Remove dead checks on wbinvd_on_all_cpus()'s return value	Sean Christopherson
	Remove the checks and associated pr_err() on wbinvd_on_all_cpus() failure, as the helper has unconditionally returned 0/success since commit caa759323c73 ("smp: Remove smp_call_function() and on_each_cpu() return values"). Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250522233733.3176144-2-seanjc@google.com
2025-07-10	drm/panthor: Fix UAF in panthor_gem_create_with_handle() debugfs code	Simona Vetter
	The object is potentially already gone after the drm_gem_object_put(). In general the object should be fully constructed before calling drm_gem_handle_create(), except the debugfs tracking uses a separate lock and list and separate flag to denotate whether the object is actually initialized. Since I'm touching this all anyway simplify this by only adding the object to the debugfs when it's ready for that, which allows us to delete that separate flag. panthor_gem_debugfs_bo_rm() already checks whether we've actually been added to the list or this is some error path cleanup. v2: Fix build issues for !CONFIG_DEBUGFS (Adrián) v3: Add linebreak and remove outdated comment (Liviu) Fixes: a3707f53eb3f ("drm/panthor: show device-wide list of DRM GEM objects over DebugFS") Cc: Adrián Larumbe <adrian.larumbe@collabora.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Simona Vetter <simona.vetter@intel.com> Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20250709135220.1428931-1-simona.vetter@ffwll.ch
2025-07-09	relayfs: abolish prev_padding	Jason Xing
	Patch series "relayfs: misc changes", v5. The series mostly focuses on the error counters which helps every user debug their own kernel module. This patch (of 5): prev_padding represents the unused space of certain subbuffer. If the content of a call of relay_write() exceeds the limit of the remainder of this subbuffer, it will skip storing in the rest space and record the start point as buf->prev_padding in relay_switch_subbuf(). Since the buf is a per-cpu big buffer, the point of prev_padding as a global value for the whole buffer instead of a single subbuffer (whose padding info is stored in buf->padding[]) seems meaningless from the real use cases, so we don't bother to record it any more. Link: https://lkml.kernel.org/r/20250612061201.34272-1-kerneljasonxing@gmail.com Link: https://lkml.kernel.org/r/20250612061201.34272-2-kerneljasonxing@gmail.com Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Yushan Zhou <katrinzhou@tencent.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09	mm: remove callers of pfn_t functionality	Alistair Popple
	All PFN_* pfn_t flags have been removed. Therefore there is no longer a need for the pfn_t type and all uses can be replaced with normal pfns. Link: https://lkml.kernel.org/r/bbedfa576c9822f8032494efbe43544628698b1f.1750323463.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Björn Töpel <bjorn@kernel.org> Cc: Björn Töpel <bjorn@rivosinc.com> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Inki Dae <m.szyprowski@samsung.com> Cc: John Groves <john@groves.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09	mm: remove remaining uses of PFN_DEV	Alistair Popple
	PFN_DEV was used by callers of dax_direct_access() to figure out if the returned PFN is associated with a page using pfn_t_has_page() or not. However all DAX PFNs now require an assoicated ZONE_DEVICE page so can assume a page exists. Other users of PFN_DEV were setting it before calling vmf_insert_mixed(). This is unnecessary as it is no longer checked, instead relying on pfn_valid() to determine if there is an associated page or not. Link: https://lkml.kernel.org/r/74b293aebc21b941090bc3e7aeafa91b57c821a5.1750323463.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Björn Töpel <bjorn@kernel.org> Cc: Björn Töpel <bjorn@rivosinc.com> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Inki Dae <m.szyprowski@samsung.com> Cc: John Groves <john@groves.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09	mm: remove the for_reclaim field from struct writeback_control	Christoph Hellwig
	This field is now only set to one in the i915 gem code that only calls writeback_iter on it, which ignores the flag. All other checks are thuse dead code and the field can be removed. Link: https://lkml.kernel.org/r/20250610054959.2057526-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09	mm: stop passing a writeback_control structure to shmem_writeout	Christoph Hellwig
	shmem_writeout only needs the swap_iocb cookie and the split folio list. Pass those explicitly and remove the now unused list member from struct writeback_control. Link: https://lkml.kernel.org/r/20250610054959.2057526-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09	drm/xe: Expose fan control and voltage regulator version	Raag Jadav
	Add sysfs attributes for late binding features which expose bound version to the user. v2: Rework attribute and macro naming (Badal) v3: Drop fancy formatting (Rodrigo) v4: Form version string using local variables (Rodrigo) Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250709164224.2676086-1-raag.jadav@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-09	drm/gem: Fix race in drm_gem_handle_create_tail()	Simona Vetter
	Object creation is a careful dance where we must guarantee that the object is fully constructed before it is visible to other threads, and GEM buffer objects are no difference. Final publishing happens by calling drm_gem_handle_create(). After that the only allowed thing to do is call drm_gem_object_put() because a concurrent call to the GEM_CLOSE ioctl with a correctly guessed id (which is trivial since we have a linear allocator) can already tear down the object again. Luckily most drivers get this right, the very few exceptions I've pinged the relevant maintainers for. Unfortunately we also need drm_gem_handle_create() when creating additional handles for an already existing object (e.g. GETFB ioctl or the various bo import ioctl), and hence we cannot have a drm_gem_handle_create_and_put() as the only exported function to stop these issues from happening. Now unfortunately the implementation of drm_gem_handle_create() isn't living up to standards: It does correctly finishe object initialization at the global level, and hence is safe against a concurrent tear down. But it also sets up the file-private aspects of the handle, and that part goes wrong: We fully register the object in the drm_file.object_idr before calling drm_vma_node_allow() or obj->funcs->open, which opens up races against concurrent removal of that handle in drm_gem_handle_delete(). Fix this with the usual two-stage approach of first reserving the handle id, and then only registering the object after we've completed the file-private setup. Jacek reported this with a testcase of concurrently calling GEM_CLOSE on a freshly-created object (which also destroys the object), but it should be possible to hit this with just additional handles created through import or GETFB without completed destroying the underlying object with the concurrent GEM_CLOSE ioctl calls. Note that the close-side of this race was fixed in f6cd7daecff5 ("drm: Release driver references to handle before making it available again"), which means a cool 9 years have passed until someone noticed that we need to make this symmetry or there's still gaps left :-/ Without the 2-stage close approach we'd still have a race, therefore that's an integral part of this bugfix. More importantly, this means we can have NULL pointers behind allocated id in our drm_file.object_idr. We need to check for that now: - drm_gem_handle_delete() checks for ERR_OR_NULL already - drm_gem.c:object_lookup() also chekcs for NULL - drm_gem_release() should never be called if there's another thread still existing that could call into an IOCTL that creates a new handle, so cannot race. For paranoia I added a NULL check to drm_gem_object_release_handle() though. - most drivers (etnaviv, i915, msm) are find because they use idr_find(), which maps both ENOENT and NULL to NULL. - drivers using idr_for_each_entry() should also be fine, because idr_get_next does filter out NULL entries and continues the iteration. - The same holds for drm_show_memory_stats(). v2: Use drm_WARN_ON (Thomas) Reported-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Tested-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: stable@vger.kernel.org Cc: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Signed-off-by: Simona Vetter <simona.vetter@intel.com> Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20250707151814.603897-1-simona.vetter@ffwll.ch
2025-07-09	drm/ast: Gen7: Switch default registers to gen4+ state	Thomas Zimmermann
	Change the default register settings for Gen7 to mach Gen4 and later. Gen7 currently uses the settings for Gen1, which is most likely incorrect. Using Gen4+ settings enables E2M linear-access modes in VGACRA2. It appears to be related to the chip's PCIE2MBOX feature, which is unused. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20250706162816.211552-11-tzimmermann@suse.de
2025-07-09	drm/ast: Gen7: Disable VGASR0[1] as on Gen4+	Thomas Zimmermann
	Set VGACRB6[5], which disables asynchronous sequencer resets via VGASR0[1]. This was most likely an oversight when adding support for Gen7. Aligns Gen7 with the earlier Gen4+. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20250706162816.211552-10-tzimmermann@suse.de
2025-07-09	drm/ast: Split ast_set_def_ext_reg() by chip generation	Thomas Zimmermann
	Duplicate ast_set_def_ext_reg() for individual chip generations and move call it into per-chip source files. Remove the original code. AST2100 and AST2500 reuse the function from earlier chips. AST2600 appears to be incorrect as it uses an older function. Keep this behavior for now. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20250706162816.211552-9-tzimmermann@suse.de