linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2025-03-07	drm/amd/display: Fix HPD after gpu reset	Roman Li
	commit 4de141b8b1b7991b607f77e5f4580e1c67c24717 upstream. [Why] DC is not using amdgpu_irq_get/put to manage the HPD interrupt refcounts. So when amdgpu_irq_gpu_reset_resume_helper() reprograms all of the IRQs, HPD gets disabled. [How] Use amdgpu_irq_get/put() for HPD init/fini in DM in order to sync refcounts Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f3dde2ff7fcaacd77884502e8f572f2328e9c745) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07	drm/amd/display: add a quirk to enable eDP0 on DP1	Yilin Chen
	commit b5f7242e49b927cfe488b369fa552f2eff579ef1 upstream. [why] some board designs have eDP0 connected to DP1, need a way to enable support_edp0_on_dp1 flag, otherwise edp related features cannot work [how] do a dmi check during dm initialization to identify systems that require support_edp0_on_dp1. Optimize quirk table with callback functions to set quirk entries, retrieve_dmi_info can set quirks according to quirk entries Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Yilin Chen <Yilin.Chen@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f6d17270d18a6a6753fff046330483d43f8405e4) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07	drm/amd/display: Disable PSR-SU on eDP panels	Tom Chung
	commit e8863f8b0316d8ee1e7e5291e8f2f72c91ac967d upstream. [Why] PSR-SU may cause some glitching randomly on several panels. [How] Temporarily disable the PSR-SU and fallback to PSR1 for all eDP panels. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3388 Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6deeefb820d0efb0b36753622fb982d03b37b3ad) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07	drm/amdgpu: init return value in amdgpu_ttm_clear_buffer	Pierre-Eric Pelloux-Prayer
	commit d3c7059b6a8600fc62cd863f1ea203b8675e63e1 upstream. Otherwise an uninitialized value can be returned if amdgpu_res_cleared returns true for all regions. Possibly closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812 Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7c62aacc3b452f73a1284198c81551035fac6d71) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07	drm/amdgpu: disable BAR resize on Dell G5 SE	Alex Deucher
	commit 099bffc7cadff40bfab1517c3461c53a7a38a0d7 upstream. There was a quirk added to add a workaround for a Sapphire RX 5600 XT Pulse that didn't allow BAR resizing. However, the quirk caused a regression with runtime pm on Dell laptops using those chips, rather than narrowing the scope of the resizing quirk, add a quirk to prevent amdgpu from resizing the BAR on those Dell platforms unless runtime pm is disabled. v2: update commit message, add runpm check Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707 Fixes: 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5235053f443cef4210606e5fb71f99b915a9723d) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07	drm/amdkfd: Preserve cp_hqd_pq_control on update_mqd	David Yat Sin
	commit 3502ab5022bb5ef1edd063bdb6465a8bf3b46e66 upstream. When userspace applications call AMDKFD_IOC_UPDATE_QUEUE. Preserve bitfields that do not need to be modified as they contain flags to track queue states that are used by CP FW. Signed-off-by: David Yat Sin <David.YatSin@amd.com> Reviewed-by: Jay Cornwall <jay.cornwall@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8150827990b709ab5a40c46c30d21b7f7b9e9440) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27	drm/amdgpu: bump version for RV/PCO compute fix	Alex Deucher
	commit 55ed2b1b50d029dd7e49a35f6628ca64db6d75d8 upstream. Bump the driver version for RV/PCO compute stability fix so mesa can use this check to enable compute queues on RV/PCO. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27	drm/amdgpu/gfx9: manually control gfxoff for CS on RV	Alex Deucher
	commit b35eb9128ebeec534eed1cefd6b9b1b7282cf5ba upstream. When mesa started using compute queues more often we started seeing additional hangs with compute queues. Disabling gfxoff seems to mitigate that. Manually control gfxoff and gfx pg with command submissions to avoid any issues related to gfxoff. KFD already does the same thing for these chips. v2: limit to compute v3: limit to APUs v4: limit to Raven/PCO v5: only update the compute ring_funcs v6: Disable GFX PG v7: adjust order Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Suggested-by: Błażej Szczygieł <mumei6102@gmail.com> Suggested-by: Sergey Kovalenko <seryoga.engineering@gmail.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3861 Link: https://lists.freedesktop.org/archives/amd-gfx/2025-January/119116.html Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27	drm/amdkfd: Ensure consistent barrier state saved in gfx12 trap handler	Lancelot SIX
	[ Upstream commit d584198a6fe4c51f4aa88ad72f258f8961a0f11c ] It is possible for some waves in a workgroup to finish their save sequence before the group leader has had time to capture the workgroup barrier state. When this happens, having those waves exit do impact the barrier state. As a consequence, the state captured by the group leader is invalid, and is eventually incorrectly restored. This patch proposes to have all waves in a workgroup wait for each other at the end of their save sequence (just before calling s_endpgm_saved). Signed-off-by: Lancelot SIX <lancelot.six@amd.com> Reviewed-by: Jay Cornwall <jay.cornwall@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-27	drm/amdkfd: Move gfx12 trap handler to separate file	Jay Cornwall
	[ Upstream commit 62498e797aeb2bfa92a823ee1a8253f96d1cbe3f ] gfx12 derivatives will have substantially different trap handler implementations from gfx10/gfx11. Add a separate source file for gfx12+ and remove unneeded conditional code. No functional change. v2: Revert copyright date to 2018, minor comment fixes Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Lancelot Six <lancelot.six@amd.com> Cc: Jonathan Kim <jonathan.kim@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: d584198a6fe4 ("drm/amdkfd: Ensure consistent barrier state saved in gfx12 trap handler") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-27	drm/amd/display: Correct register address in dcn35	loanchen
	[ Upstream commit f88192d2335b5a911fcfa09338cc00624571ec5e ] [Why] the offset address of mmCLK5_spll_field_8 was incorrect for dcn35 which causes SSC not to be enabled. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Lo-An Chen <lo-an.chen@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-27	drm/amd/display: update dcn351 used clock offset	Charlene Liu
	[ Upstream commit a1fc2837f4960e84e9375e12292584ad2ae472da ] [why] hw register offset delta Reviewed-by: Martin Leung <martin.leung@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: f88192d2335b ("drm/amd/display: Correct register address in dcn35") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-27	drm/amd/display: Refactoring if and endif statements to enable DC_LOGGER	Lohita Mudimela
	[ Upstream commit b04200432c4730c9bb730a66be46551c83d60263 ] [Why] For Header related changes for core [How] Refactoring if and endif statements to enable DC_LOGGER Reviewed-by: Mounika Adhuri <mounika.adhuri@amd.com> Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Lohita Mudimela <lohita.mudimela@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: f88192d2335b ("drm/amd/display: Correct register address in dcn35") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-21	drm/amdgpu: avoid buffer overflow attach in smu_sys_set_pp_table()	Jiang Liu
	commit 1abb2648698bf10783d2236a6b4a7ca5e8021699 upstream. It malicious user provides a small pptable through sysfs and then a bigger pptable, it may cause buffer overflow attack in function smu_sys_set_pp_table(). Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21	drm/amdgpu: bail out when failed to load fw in psp_init_cap_microcode()	Jiang Liu
	[ Upstream commit a0a455b4bc7483ad60e8b8a50330c1e05bb7bfcf ] In function psp_init_cap_microcode(), it should bail out when failed to load firmware, otherwise it may cause invalid memory access. Fixes: 07dbfc6b102e ("drm/amd: Use `amdgpu_ucode_*` helpers for PSP") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-21	amdkfd: properly free gang_ctx_bo when failed to init user queue	Zhu Lingshan
	[ Upstream commit a33f7f9660705fb2ecf3467b2c48965564f392ce ] The destructor of a gtt bo is declared as void amdgpu_amdkfd_free_gtt_mem(struct amdgpu_device adev, void mem_obj); Which takes void* as the second parameter. GCC allows passing void* to the function because void* can be implicitly casted to any other types, so it can pass compiling. However, passing this void* parameter into the function's execution process(which expects void and dereferencing void) will result in errors. Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Fixes: fb91065851cd ("drm/amdkfd: Refactor queue wptr_bo GART mapping") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-17	Revert "drm/amd/display: Fix green screen issue after suspend"	Rodrigo Siqueira
	commit 04d6273faed083e619fc39a738ab0372b6a4db20 upstream. This reverts commit 87b7ebc2e16c14d32a912f18206a4d6cc9abc3e8. A long time ago, we had an issue with the Raven system when it was connected to two displays: one with DP and another with HDMI. After the system woke up from suspension, we saw a solid green screen caused by an underflow generated by bad DCC metadata. To workaround this issue, the 'commit 87b7ebc2e16c ("drm/amd/display: Fix green screen issue after suspend")' was introduced to disable the DCC for a few frames after in the resume phase. However, in hindsight, this solution was probably a workaround at the kernel level for some issues from another part (probably other driver components or user space). After applying this patch and trying to reproduce the green issue in a similar hardware system but using the latest kernel and userspace, we cannot see the issue, which makes this workaround obsolete and creates extra unnecessary complexity to the code; for all of this reason, this commit reverts the original change. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17	drm/amd/display: Fix seamless boot sequence	Lo-an Chen
	commit e01f07cb92513ca4b9b219ab9caa34d607bc1e2d upstream. [WHY] When the system powers up eDP with external monitors in seamless boot sequence, stutter get enabled before TTU and HUBP registers being programmed, which resulting in underflow. [HOW] Enable TTU in hubp_init. Change the sequence that do not perpare_bandwidth and optimize_bandwidth while having seamless boot streams. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Lo-an Chen <lo-an.chen@amd.com> Signed-off-by: Paul Hsieh <paul.hsieh@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17	drm/amdgpu: add a BO metadata flag to disable write compression for Vulkan	Marek Olšák
	commit 2255b40cacc2e5ef1b127770fc1808c60de4a2fc upstream. Vulkan can't support DCC and Z/S compression on GFX12 without WRITE_COMPRESS_DISABLE in this commit or a completely different DCC interface. AMDGPU_TILING_GFX12_SCANOUT is added because it's already used by userspace. Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17	Revert "drm/amd/display: Use HW lock mgr for PSR1"	Tom Chung
	commit f245b400a223a71d6d5f4c72a2cb9b573a7fc2b6 upstream. This reverts commit a2b5a9956269 ("drm/amd/display: Use HW lock mgr for PSR1") Because it may cause system hang while connect with two edp panel. Acked-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17	drm/amdkfd: Block per-queue reset when halt_if_hws_hang=1	Jay Cornwall
	commit f214b7beb00621b983e67ce97477afc3ab4b38f4 upstream. The purpose of halt_if_hws_hang is to preserve GPU state for driver debugging when queue preemption fails. Issuing per-queue reset may kill wavefronts which caused the preemption failure. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Jonathan Kim <Jonathan.Kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17	drm/amdkfd: only flush the validate MES contex	Prike Liang
	commit 9078a5bfa21e78ae68b6d7c365d1b92f26720c55 upstream. The following page fault was observed duringthe KFD process release. In this particular error case, the HIP test (./MemcpyPerformance -h) does not require the queue. As a result, the process_context_addr was not assigned when the KFD process was released, ultimately leading to this page fault during the execution of the function kfd_process_dequeue_from_all_devices(). [345962.294891] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0) [345962.295333] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10 [345962.295775] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B33 [345962.296097] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5) [345962.296394] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1 [345962.296633] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x1 [345962.296876] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3 [345962.297135] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1 [345962.297377] amdgpu 0000:03:00.0: amdgpu: RW: 0x0 [345962.297682] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:169 vmid:0 pasid:0) Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17	drm/amd/amdgpu: change the config of cgcg on gfx12	Kenneth Feng
	commit 5cda56bd86c455341087dca29c65dc7c87f84340 upstream. change the config of cgcg on gfx12 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17	drm/amd/pm: Mark MM activity as unsupported	Lijo Lazar
	commit 819bf6662b93a5a8b0c396d2c7e7fab6264c9808 upstream. Aldebaran doesn't support querying MM activity percentage. Keep the field as 0xFFs to mark it as unsupported. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17	drm/amd/display: Optimize cursor position updates	Aric Cyr
	commit 024771f3fb75dc817e9429d5763f1a6eb84b6f21 upstream. [why] Updating the cursor enablement register can be a slow operation and accumulates when high polling rate cursors cause frequent updates asynchronously to the cursor position. [how] Since the cursor enable bit is cached there is no need to update the enablement register if there is no change to it. This removes the read-modify-write from the cursor position programming path in HUBP and DPP, leaving only the register writes. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Sung Lee <sung.lee@amd.com> Signed-off-by: Aric Cyr <Aric.Cyr@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17	drm/amdgpu: Fix Circular Locking Dependency in AMDGPU GFX Isolation	Srinivasan Shanmugam
	[ Upstream commit 1e8c193f8ca7ab7dff4f4747b45a55dca23c00f4 ] This commit addresses a circular locking dependency issue within the GFX isolation mechanism. The problem was identified by a warning indicating a potential deadlock due to inconsistent lock acquisition order. - The `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` functions previously acquired `enforce_isolation_mutex` and called `amdgpu_gfx_kfd_sch_ctrl`, leading to potential deadlocks. ie., If `amdgpu_gfx_kfd_sch_ctrl` is called while `enforce_isolation_mutex` is held, and `amdgpu_gfx_enforce_isolation_handler` is called while `kfd_sch_mutex` is held, it can create a circular dependency. By ensuring consistent lock usage, this fix resolves the issue: [ 606.297333] ====================================================== [ 606.297343] WARNING: possible circular locking dependency detected [ 606.297353] 6.10.0-amd-mlkd-610-311224-lof #19 Tainted: G OE [ 606.297365] ------------------------------------------------------ [ 606.297375] kworker/u96:3/3825 is trying to acquire lock: [ 606.297385] ffff9aa64e431cb8 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}, at: __flush_work+0x232/0x610 [ 606.297413] but task is already holding lock: [ 606.297423] ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.297725] which lock already depends on the new lock. [ 606.297738] the existing dependency chain (in reverse order) is: [ 606.297749] -> #2 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}: [ 606.297765] __mutex_lock+0x85/0x930 [ 606.297776] mutex_lock_nested+0x1b/0x30 [ 606.297786] amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.298007] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.298225] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.298412] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.298603] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.298866] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.298880] process_one_work+0x21e/0x680 [ 606.298890] worker_thread+0x190/0x350 [ 606.298899] kthread+0xe7/0x120 [ 606.298908] ret_from_fork+0x3c/0x60 [ 606.298919] ret_from_fork_asm+0x1a/0x30 [ 606.298929] -> #1 (&adev->enforce_isolation_mutex){+.+.}-{3:3}: [ 606.298947] __mutex_lock+0x85/0x930 [ 606.298956] mutex_lock_nested+0x1b/0x30 [ 606.298966] amdgpu_gfx_enforce_isolation_handler+0x87/0x370 [amdgpu] [ 606.299190] process_one_work+0x21e/0x680 [ 606.299199] worker_thread+0x190/0x350 [ 606.299208] kthread+0xe7/0x120 [ 606.299217] ret_from_fork+0x3c/0x60 [ 606.299227] ret_from_fork_asm+0x1a/0x30 [ 606.299236] -> #0 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}: [ 606.299257] __lock_acquire+0x16f9/0x2810 [ 606.299267] lock_acquire+0xd1/0x300 [ 606.299276] __flush_work+0x250/0x610 [ 606.299286] cancel_delayed_work_sync+0x71/0x80 [ 606.299296] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.299509] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.299723] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.299909] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.300101] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.300355] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.300369] process_one_work+0x21e/0x680 [ 606.300378] worker_thread+0x190/0x350 [ 606.300387] kthread+0xe7/0x120 [ 606.300396] ret_from_fork+0x3c/0x60 [ 606.300406] ret_from_fork_asm+0x1a/0x30 [ 606.300416] other info that might help us debug this: [ 606.300428] Chain exists of: (work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work) --> &adev->enforce_isolation_mutex --> &adev->gfx.kfd_sch_mutex [ 606.300458] Possible unsafe locking scenario: [ 606.300468] CPU0 CPU1 [ 606.300476] ---- ---- [ 606.300484] lock(&adev->gfx.kfd_sch_mutex); [ 606.300494] lock(&adev->enforce_isolation_mutex); [ 606.300508] lock(&adev->gfx.kfd_sch_mutex); [ 606.300521] lock((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)); [ 606.300536] * DEADLOCK * [ 606.300546] 5 locks held by kworker/u96:3/3825: [ 606.300555] #0: ffff9aa5aa1f5d58 ((wq_completion)comp_1.1.0){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 606.300577] #1: ffffaa53c3c97e40 ((work_completion)(&sched->work_run_job)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 606.300600] #2: ffff9aa64e463c98 (&adev->enforce_isolation_mutex){+.+.}-{3:3}, at: amdgpu_gfx_enforce_isolation_ring_begin_use+0x1c3/0x5d0 [amdgpu] [ 606.300837] #3: ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.301062] #4: ffffffff8c1a5660 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x70/0x610 [ 606.301083] stack backtrace: [ 606.301092] CPU: 14 PID: 3825 Comm: kworker/u96:3 Tainted: G OE 6.10.0-amd-mlkd-610-311224-lof #19 [ 606.301109] Hardware name: Gigabyte Technology Co., Ltd. X570S GAMING X/X570S GAMING X, BIOS F7 03/22/2024 [ 606.301124] Workqueue: comp_1.1.0 drm_sched_run_job_work [gpu_sched] [ 606.301140] Call Trace: [ 606.301146] <TASK> [ 606.301154] dump_stack_lvl+0x9b/0xf0 [ 606.301166] dump_stack+0x10/0x20 [ 606.301175] print_circular_bug+0x26c/0x340 [ 606.301187] check_noncircular+0x157/0x170 [ 606.301197] ? register_lock_class+0x48/0x490 [ 606.301213] __lock_acquire+0x16f9/0x2810 [ 606.301230] lock_acquire+0xd1/0x300 [ 606.301239] ? __flush_work+0x232/0x610 [ 606.301250] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301261] ? mark_held_locks+0x54/0x90 [ 606.301274] ? __flush_work+0x232/0x610 [ 606.301284] __flush_work+0x250/0x610 [ 606.301293] ? __flush_work+0x232/0x610 [ 606.301305] ? __pfx_wq_barrier_func+0x10/0x10 [ 606.301318] ? mark_held_locks+0x54/0x90 [ 606.301331] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301345] cancel_delayed_work_sync+0x71/0x80 [ 606.301356] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.301661] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.302050] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.302069] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.302452] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.302862] ? drm_sched_entity_error+0x82/0x190 [gpu_sched] [ 606.302890] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.303366] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.303388] process_one_work+0x21e/0x680 [ 606.303409] worker_thread+0x190/0x350 [ 606.303424] ? __pfx_worker_thread+0x10/0x10 [ 606.303437] kthread+0xe7/0x120 [ 606.303449] ? __pfx_kthread+0x10/0x10 [ 606.303463] ret_from_fork+0x3c/0x60 [ 606.303476] ? __pfx_kthread+0x10/0x10 [ 606.303489] ret_from_fork_asm+0x1a/0x30 [ 606.303512] </TASK> v2: Refactor lock handling to resolve circular dependency (Alex) - Introduced a `sched_work` flag to defer the call to `amdgpu_gfx_kfd_sch_ctrl` until after releasing `enforce_isolation_mutex`. - This change ensures that `amdgpu_gfx_kfd_sch_ctrl` is called outside the critical section, preventing the circular dependency and deadlock. - The `sched_work` flag is set within the mutex-protected section if conditions are met, and the actual function call is made afterward. - This approach ensures consistent lock acquisition order. Fixes: afefd6f24502 ("drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-17	drm/amd/display: Limit Scaling Ratio on DCN3.01	Gabe Teeger
	[ Upstream commit abc0ad6d08440761b199988c329ad7ac83f41c9b ] [why] Underflow and flickering was occuring due to high scaling ratios when resizing videos. [how] Limit the scaling ratios by increasing the max scaling factor Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Gabe Teeger <Gabe.Teeger@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-17	drm/amd/display: Increase sanitizer frame larger than limit when compile ↵	Nathan Chancellor
	testing with clang [ Upstream commit e4479aecf6581af81bc0908575447878d2a07e01 ] Commit 24909d9ec7c3 ("drm/amd/display: Overwriting dualDPP UBF values before usage") added a new warning in dml2/display_mode_core.c when building allmodconfig with clang: drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:6268:13: error: stack frame size (3128) exceeds limit (3072) in 'dml_prefetch_check' [-Werror,-Wframe-larger-than] 6268 \| static void dml_prefetch_check(struct display_mode_lib_st mode_lib) \| ^ Commit be4e3509314a ("drm/amd/display: DML21 Reintegration For Various Fixes") introduced one in dml2_core/dml2_core_dcn4_calcs.c with the same configuration: drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c:7236:13: error: stack frame size (3256) exceeds limit (3072) in 'dml_core_mode_support' [-Werror,-Wframe-larger-than] 7236 \| static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex in_out_params) \| ^ In the case of the first warning, the stack usage was already at the limit at the parent change, so the offending change was rather innocuous. In the case of the second warning, there was a rather dramatic increase in stack usage compared to the parent: drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c:7032:13: error: stack frame size (2696) exceeds limit (2048) in 'dml_core_mode_support' [-Werror,-Wframe-larger-than] 7032 \| static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out_params) \| ^ This is an unfortunate interaction between an issue with stack slot reuse in LLVM that gets exacerbated by sanitization (which gets enabled with all{mod,yes}config) and function calls using a much higher number of parameters than is typical in the kernel, necessitating passing most of these values on the stack. While it is possible that there should be source code changes to address these warnings, this code is difficult to modify for various reasons, as has been noted in other changes that have occurred for similar reasons, such as commit 6740ec97bcdb ("drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml2"). Increase the frame larger than limit when compile testing with clang and the sanitizers enabled to avoid this breakage in all{mod,yes}config, as they are commonly used and valuable testing targets. While it is not the best to hide this issue, it is not really relevant when compile testing, as the sanitizers are commonly stressful on optimizations and they are only truly useful at runtime, which COMPILE_TEST states will not occur with the current build. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202412121748.chuX4sap-lkp@intel.com/ Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-17	drm/amdkfd: Queue interrupt work to different CPU	Philip Yang
	[ Upstream commit 34db5a32617d102e8042151bb87590e43c97132e ] For CPX mode, each KFD node has interrupt worker to process ih_fifo to send events to user space. Currently all interrupt workers of same adev queue to same CPU, all workers execution are actually serialized and this cause KFD ih_fifo overflow when CPU usage is high. Use per-GPU unbounded highpri queue with number of workers equals to number of partitions, let queue_work select the next CPU round robin among the local CPUs of same NUMA. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-17	drm/amdgpu: Don't enable sdma 4.4.5 CTXEMPTY interrupt	Philip Yang
	[ Upstream commit b4b7271e5ca95b581f2fcc4ae852c4079215e92d ] The sdma context empty interrupt is dropped in amdgpu_irq_dispatch as unregistered interrupt src_id 243, this interrupt accounts to 1/3 of total interrupts and causes IH primary ring overflow when running stressful benchmark application. Disable this interrupt has no side effect found. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-17	drm/amd/display: Fix Mode Cutoff in DSC Passthrough to DP2.1 Monitor	Fangzhi Zuo
	[ Upstream commit e56ad45e991128bf4db160b75a1d9f647a341d8f ] Source --> DP2.1 MST hub --> DP1.4/2.1 monitor When change from DP1.4 to DP2.1 from monitor manual, modes higher than 4k120 are all cutoff by mode validation. Switch back to DP1.4 gets all the modes up to 4k240 available to be enabled by dsc passthrough. [why] Compared to DP1.4 link from hub to monitor, DP2.1 link has larger full_pbn value that causes overflow in the process of doing conversion from pbn to kbps. [how] Change the data type accordingly to fit into the data limit during conversion calculation. Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Signed-off-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-17	drm/amd/display: use eld_mutex to protect access to connector->eld	Dmitry Baryshkov
	[ Upstream commit 819bee01eea06282d7bda17d46caf29cae4f6d84 ] Reading access to connector->eld can happen at the same time the drm_edid_to_eld() updates the data. Take the newly added eld_mutex in order to protect connector->eld from concurrent access. Reviewed-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241206-drm-connector-eld-mutex-v2-4-c9bce1ee8bea@linaro.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-17	drm/amd/display: Overwriting dualDPP UBF values before usage	Ausef Yousof
	[ Upstream commit 24909d9ec7c3afa8da2f3c9afa312e7a4a61f250 ] [WHY] Right now in dml2 mode validation we are calculating UBF parameters for prefetch calculation for single and dual DPP scenarios. Data structure to store such values are just 1D arrays, the single DPP values are overwritten by the dualDPP values, and we end up using dualDPP for prefetch calculations twice (once in place of singleDPP support check and again for dual). This naturally leads to many problems, one of which validating a mode in "singleDPP" (when we used dual DPP parameters) and sending the singleDPP parameters to mode programming, if we cannot support then we observe the corruption as described in the ticket. [HOW] UBF values need to have 2d arrays to store values specific to single and dual DPP states to avoid single DPP values being overwritten. Other parameters are recorded on a per state basis such as prefetch UBF values but they are in the same loop used for calculation and at that point its fine to overwrite them, its not the case for plain UBF values. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Ausef Yousof <Ausef.Yousof@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-17	drm/amd/display: Populate chroma prefetch parameters, DET buffer fix	Ausef Yousof
	[ Upstream commit 70fec46519fca859aa209f5f02e7e0a0123aca4a ] [WHY] Soft hang/lag observed during 10bit playback + moving cursor, corruption observed in other tickets for same reason, also failing MPO. 1. Currently, we are always running calculate_lowest_supported_state_for_temp_read which is only necessary on dGPU 2. Fast validate path does not apply DET buffer allocation policy 3. Prefetch UrgBFactor chroma parameter not populated in prefetch calculation [HOW] 1. Add a check to see if we are on APU, if so, skip the code 2. Add det buffer alloc policy checks to fast validate path 3. Populate UrgentBurstChroma param in call to calculate UrgBChroma prefetch values -revision commits: small formatting/brackets/null check addition + remove test change + dGPU code Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Ausef Yousof <Ausef.Yousof@amd.com> Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08	drm/amd/display: Add hubp cache reset when powergating	Aric Cyr
	commit 01130f5260e5868fb6b15ab8c00dbc894139f48e upstream. [Why] When HUBP is power gated, the SW state can get out of sync with the hardware state causing cursor to not be programmed correctly. [How] Similar to DPP, add a HUBP reset function which is called wherever HUBP is initialized or powergated. This function will clear the cursor position and attribute cache allowing for proper programming when the HUBP is brought back up. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Sung Lee <sung.lee@amd.com> Signed-off-by: Aric Cyr <Aric.Cyr@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-08	drm/amd/display: Reduce accessing remote DPCD overhead	Wayne Lin
	commit adb4998f4928a17d91be054218a902ba9f8c1f93 upstream. [Why] Observed frame rate get dropped by tool like glxgear. Even though the output to monitor is 60Hz, the rendered frame rate drops to 30Hz lower. It's due to code path in some cases will trigger dm_dp_mst_is_port_support_mode() to read out remote Link status to assess the available bandwidth for dsc maniplation. Overhead of keep reading remote DPCD is considerable. [How] Store the remote link BW in mst_local_bw and use end-to-end full_pbn as an indicator to decide whether update the remote link bw or not. Whenever we need the info to assess the BW, visit the stored one first. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3720 Fixes: fa57924c76d9 ("drm/amd/display: Refactor function dm_dp_mst_is_port_support_mode()") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Jerry Zuo <jerry.zuo@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4a9a918545455a5979c6232fcf61ed3d8f0db3ae) Cc: stable@vger.kernel.org Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-08	drm/amdgpu: fix gpu recovery disable with per queue reset	Jonathan Kim
	[ Upstream commit 86bde64cb7957be393f84e5d35fb8dfc91e4ae7e ] Per queue reset should be bypassed when gpu recovery is disabled with module parameter. Fixes: ee0a469cf917 ("drm/amdkfd: support per-queue reset on gfx9") Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08	Revert "drm/amdgpu/gfx9: put queue resets behind a debug option"	Alex Deucher
	[ Upstream commit 32f00289698189b813942f37626218fd473e7302 ] This reverts commit 7c1a2d8aba6cadde0cc542b2d805edc0be667e79. Extended validation has completed successfully, so enable these features by default. Acked-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jonathan Kim <jonathan.kim@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Stable-dep-of: 86bde64cb795 ("drm/amdgpu: fix gpu recovery disable with per queue reset") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08	drm/amdgpu: tear down ttm range manager for doorbell in amdgpu_ttm_fini()	Jiang Liu
	[ Upstream commit 60a2c0c12b644450e420ffc42291d1eb248bacb7 ] Tear down ttm range manager for doorbell in function amdgpu_ttm_fini(), to avoid memory leakage. Fixes: 792b84fb9038 ("drm/amdgpu: initialize ttm for doorbells") Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08	drm/amdgpu/vcn: reset fw_shared under SRIOV	Bokun Zhang
	[ Upstream commit 3676f37a88432132bcff55a17dc48911239b6d98 ] - The previous patch only considered the case for baremetal and is not applicable for SRIOV code path. We also need to init fw_share for SRIOV VF Fixes: 928cd772e18f ("drm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Bokun Zhang <bokun.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08	drm/amdgpu: Fix potential NULL pointer dereference in ↵	Ivan Stepchenko
	atomctrl_get_smc_sclk_range_table [ Upstream commit 357445e28ff004d7f10967aa93ddb4bffa5c3688 ] The function atomctrl_get_smc_sclk_range_table() does not check the return value of smu_atom_get_data_table(). If smu_atom_get_data_table() fails to retrieve SMU_Info table, it returns NULL which is later dereferenced. Found by Linux Verification Center (linuxtesting.org) with SVACE. In practice this should never happen as this code only gets called on polaris chips and the vbios data table will always be present on those chips. Fixes: a23eefa2f461 ("drm/amd/powerplay: enable dpm for baffin.") Signed-off-by: Ivan Stepchenko <sid@itb.spb.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08	drm/amd/pm: Fix an error handling path in ↵	Christophe JAILLET
	vega10_enable_se_edc_force_stall_config() [ Upstream commit a3300782d5375e280ba7040f323d01960bfe3396 ] In case of error after a amdgpu_gfx_rlc_enter_safe_mode() call, it is not balanced by a corresponding amdgpu_gfx_rlc_exit_safe_mode() call. Add the missing call. Fixes: 9b7b8154cdb8 ("drm/amd/powerplay: added didt support for vega10") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-01	drm/amd/display: Initialize denominator defaults to 1	Alex Hung
	[ Upstream commit 36b23e3baf9129d5b6c3a3a85b6b7ffb75ae287c ] [WHAT & HOW] Variables, used as denominators and maybe not assigned to other values, should be initialized to non-zero to avoid DIVIDE_BY_ZERO, as reported by Coverity. Reviewed-by: Austin Zheng <austin.zheng@amd.com> Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e2c4c6c10542ccfe4a0830bb6c9fd5b177b7bbb7) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-01	drm/amd/display: Use HW lock mgr for PSR1	Tom Chung
	[ Upstream commit b5c764d6ed556c4e81fbe3fd976da77ec450c08e ] [Why] Without the dmub hw lock, it may cause the lock timeout issue while do modeset on PSR1 eDP panel. [How] Allow dmub hw lock for PSR1. Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a2b5a9956269f4c1a09537177f18ab0229fe79f7) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-23	drm/amd/display: Validate mdoe under MST LCT=1 case as well	Wayne Lin
	commit b5cd418f016fb801be413fd52fe4711d2d13018c upstream. [Why & How] Currently in dm_dp_mst_is_port_support_mode(), when valdidating mode under dsc decoding at the last DP link config, we only validate the case when there is an UFP. However, if the MSTB LCT=1, there is no UFP. Under this case, use root_link_bw_in_kbps as the available bw to compare. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3720 Fixes: fa57924c76d9 ("drm/amd/display: Refactor function dm_dp_mst_is_port_support_mode()") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Jerry Zuo <jerry.zuo@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a04d9534a8a75b2806c5321c387be450c364b55e) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-23	Revert "drm/amd/display: Enable urgent latency adjustments for DCN35"	Nicholas Susanto
	commit 3412860cc4c0c484f53f91b371483e6e4440c3e5 upstream. Revert commit 284f141f5ce5 ("drm/amd/display: Enable urgent latency adjustments for DCN35") [Why & How] Urgent latency increase caused 2.8K OLED monitor caused it to block this panel support P0. Reverting this change does not reintroduce the netflix corruption issue which it fixed. Fixes: 284f141f5ce5 ("drm/amd/display: Enable urgent latency adjustments for DCN35") Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Nicholas Susanto <Nicholas.Susanto@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c7ccfc0d4241a834c25a9a9e1e78b388b4445d23) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-23	drm/amd/display: Do not wait for PSR disable on vbl enable	Leo Li
	commit ff2e4d874726c549130308b6b46aa0f8a34e04cb upstream. [Why] Outside of a modeset/link configuration change, we should not have to wait for the panel to exit PSR. Depending on the panel and it's state, it may take multiple frames for it to exit PSR. Therefore, waiting in all scenarios may cause perceived stuttering, especially in combination with faster vblank shutdown. [How] PSR1 disable is hooked up to the vblank enable event, and vice versa. In case of vblank enable, do not wait for panel to exit PSR, but still wait in all other cases. We also avoid a call to unnecessarily change power_opts on disable - this ends up sending another command to dmcub fw. When testing against IGT, some crc tests like kms_plane_alpha_blend and amd_hotplug were failing due to CRC timeouts. This was found to be caused by the early return before HW has fully exited PSR1. Fix this by first making sure we grab a vblank reference, then waiting for panel to exit PSR1, before programming hw for CRC generation. Fixes: 58a261bfc967 ("drm/amd/display: use a more lax vblank enable policy for older ASICs") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3743 Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit aa6713fa2046f4c09bf3013dd1420ae15603ca6f) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-23	drm/amd/display: Disable replay and psr while VRR is enabled	Tom Chung
	commit 67edb81d6e9af43a0d58edf74630f82cfda4155d upstream. [Why] Replay and PSR will cause some video corruption while VRR is enabled. [How] 1. Disable the Replay and PSR while VRR is enabled. 2. Change the amdgpu_dm_crtc_vrr_active() parameter to const. Because the function will only read data from dm_crtc_state. Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d7879340e987b3056b8ae39db255b6c19c170a0d) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-23	drm/amd/display: Fix PSR-SU not support but still call the amdgpu_dm_psr_enable	Tom Chung
	commit b0a3e840ad287c33a86b5515d606451b7df86ad4 upstream. [Why] The enum DC_PSR_VERSION_SU_1 of psr_version is 1 and DC_PSR_VERSION_UNSUPPORTED is 0xFFFFFFFF. The original code may has chance trigger the amdgpu_dm_psr_enable() while psr version is set to DC_PSR_VERSION_UNSUPPORTED. [How] Modify the condition to psr->psr_version == DC_PSR_VERSION_SU_1 Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f765e7ce0417f8dc38479b4b495047c397c16902) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-23	drm/amdgpu: always sync the GFX pipe on ctx switch	Christian König
	commit af04b320c71c4b59971f021615876808a36e5038 upstream. That is needed to enforce isolation between contexts. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit def59436fb0d3ca0f211d14873d0273d69ebb405) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>