summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)Author
7 daysdrm/amdgpu: fix a memory leak in fence cleanup when unloadingAlex Deucher
commit 7838fb5f119191403560eca2e23613380c0e425e upstream. Commit b61badd20b44 ("drm/amdgpu: fix usage slab after free") reordered when amdgpu_fence_driver_sw_fini() was called after that patch, amdgpu_fence_driver_sw_fini() effectively became a no-op as the sched entities we never freed because the ring pointers were already set to NULL. Remove the NULL setting. Reported-by: Lin.Cao <lincao12@amd.com> Cc: Vitaly Prosyak <vitaly.prosyak@amd.com> Cc: Christian König <christian.koenig@amd.com> Fixes: b61badd20b44 ("drm/amdgpu: fix usage slab after free") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a525fa37aac36c4591cc8b07ae8957862415fbd5) Cc: stable@vger.kernel.org [ Adapt to conditional check ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 daysdrm/amdgpu/vcn4: Fix IB parsing with multiple engine info packagesDavid Rosca
commit 2b10cb58d7a3fd621ec9b2ba765a092e562ef998 upstream. There can be multiple engine info packages in one IB and the first one may be common engine, not decode/encode. We need to parse the entire IB instead of stopping after finding first engine info. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit dc8f9f0f45166a6b37864e7a031c726981d6e5fc) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 daysdrm/amdgpu/vcn: Allow limiting ctx to instance 0 for AV1 at any timeDavid Rosca
commit 3318f2d20ce48849855df5e190813826d0bc3653 upstream. There is no reason to require this to happen on first submitted IB only. We need to wait for the queue to be idle, but it can be done at any time (including when there are multiple video sessions active). Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8908fdce0634a623404e9923ed2f536101a39db5) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 daysdrm/amdgpu: Add back JPEG to video caps for carrizo and newerDavid Rosca
[ Upstream commit 2036be31741b00f030530381643a8b35a5a42b5c ] JPEG is not supported on Vega only. Fixes: 0a6e7b06bdbe ("drm/amdgpu: Remove JPEG from vega and carrizo video caps") Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0f4dfe86fe922c37bcec99dce80a15b4d5d4726d) Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-09drm/amd/amdgpu: Fix missing error return on kzalloc failureColin Ian King
[ Upstream commit 467e00b30dfe75c4cfc2197ceef1fddca06adc25 ] Currently the kzalloc failure check just sets reports the failure and sets the variable ret to -ENOMEM, which is not checked later for this specific error. Fix this by just returning -ENOMEM rather than setting ret. Fixes: 4fb930715468 ("drm/amd/amdgpu: remove redundant host to psp cmd buf allocations") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1ee9d1a0962c13ba5ab7e47d33a80e3b8dc4b52e) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-09Revert "drm/amdgpu: Avoid extra evict-restore process."Alex Deucher
This reverts commit 71598a5a7797f0052aaa7bcff0b8d4b8f20f1441 which is commit 1f02f2044bda1db1fd995bc35961ab075fa7b5a2 upstream. This commit introduced a regression, however the fix for the regression: aa5fc4362fac ("drm/amdgpu: fix task hang from failed job submission during process kill") depends on things not yet present in 6.12.y and older kernels. Since this commit is more of an optimization, just revert it for 6.12.y and older stable kernels. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x - 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-09drm/amdgpu: drop hw access in non-DC audio finiAlex Deucher
commit 71403f58b4bb6c13b71c05505593a355f697fd94 upstream. We already disable the audio pins in hw_fini so there is no need to do it again in sw_fini. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4481 Cc: oushixiong <oushixiong1025@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5eeb16ca727f11278b2917fd4311a7d7efb0bbd6) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-04Revert "drm/amdgpu: fix incorrect vm flags to map bo"Alex Deucher
commit ac4ed2da4c1305a1a002415058aa7deaf49ffe3e upstream. This reverts commit b08425fa77ad2f305fe57a33dceb456be03b653f. Revert this to align with 6.17 because the fixes tag was wrong on this commit. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit be33e8a239aac204d7e9e673c4220ef244eb1ba3) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28drm/amdgpu: update mmhub 4.1.0 client id mappingsAlex Deucher
commit a0b34e4c8663b13e45c78267b4de3004b1a72490 upstream. Update the client id mapping so the correct clients get printed when there is a mmhub page fault. Tested-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28drm/amdgpu: update mmhub 3.0.1 client id mappingsAlex Deucher
commit 0bae62cc989fa99ac9cb564eb573aad916d1eb61 upstream. Update the client id mapping so the correct clients get printed when there is a mmhub page fault. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2a2681eda73b99a2c1ee8cdb006099ea5d0c2505) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28drm/amdgpu: Update external revid for GC v9.5.0Lijo Lazar
commit 05c8b690511854ba31d8d1bff7139a13ec66b9e7 upstream. Use different external revid for GC v9.5.0 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 21c6764ed4bfaecad034bc4fd15dd64c5a436325) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28drm/amdgpu: Initialize data to NULL in imu_v12_0_program_rlc_ram()Nathan Chancellor
commit c90f2e1172c51fa25492471dc9910e2d7c1444b9 upstream. After a recent change in clang to expose uninitialized warnings from const variables and pointers [1], there is a warning in imu_v12_0_program_rlc_ram() because data is passed uninitialized to program_imu_rlc_ram(): drivers/gpu/drm/amd/amdgpu/imu_v12_0.c:374:30: error: variable 'data' is uninitialized when used here [-Werror,-Wuninitialized] 374 | program_imu_rlc_ram(adev, data, (const u32)size); | ^~~~ As this warning happens early in clang's frontend, it does not realize that due to the assignment of r to -EINVAL, program_imu_rlc_ram() is never actually called, and even if it were, data would not be dereferenced because size is 0. Just initialize data to NULL to silence the warning, as the commit that added program_imu_rlc_ram() mentioned it would eventually be used over the old method, at which point data can be properly initialized and used. Cc: stable@vger.kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issues/2107 Fixes: 56159fffaab5 ("drm/amdgpu: use new method to program rlc ram") Link: https://github.com/llvm/llvm-project/commit/2464313eef01c5b1edf0eccf57a32cdee01472c7 [1] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28drm/amdgpu: Avoid extra evict-restore process.Gang Ba
commit 1f02f2044bda1db1fd995bc35961ab075fa7b5a2 upstream. If vm belongs to another process, this is fclose after fork, wait may enable signaling KFD eviction fence and cause parent process queue evicted. [677852.634569] amdkfd_fence_enable_signaling+0x56/0x70 [amdgpu] [677852.634814] __dma_fence_enable_signaling+0x3e/0xe0 [677852.634820] dma_fence_wait_timeout+0x3a/0x140 [677852.634825] amddma_resv_wait_timeout+0x7f/0xf0 [amdkcl] [677852.634831] amdgpu_vm_wait_idle+0x2d/0x60 [amdgpu] [677852.635026] amdgpu_flush+0x34/0x50 [amdgpu] [677852.635208] filp_flush+0x38/0x90 [677852.635213] filp_close+0x14/0x30 [677852.635216] do_close_on_exec+0xdd/0x130 [677852.635221] begin_new_exec+0x1da/0x490 [677852.635225] load_elf_binary+0x307/0xea0 [677852.635231] ? srso_alias_return_thunk+0x5/0xfbef5 [677852.635235] ? ima_bprm_check+0xa2/0xd0 [677852.635240] search_binary_handler+0xda/0x260 [677852.635245] exec_binprm+0x58/0x1a0 [677852.635249] bprm_execve.part.0+0x16f/0x210 [677852.635254] bprm_execve+0x45/0x80 [677852.635257] do_execveat_common.isra.0+0x190/0x200 Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Gang Ba <Gang.Ba@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28drm/amdgpu/discovery: fix fw based ip discoveryAlex Deucher
commit 514678da56da089b756b4d433efd964fa22b2079 upstream. We only need the fw based discovery table for sysfs. No need to parse it. Additionally parsing some of the board specific tables may result in incorrect data on some boards. just load the binary and don't parse it on those boards. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441 Fixes: 80a0e8282933 ("drm/amdgpu/discovery: optionally use fw based ip discovery") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 62eedd150fa11aefc2d377fc746633fdb1baeb55) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28amdgpu/amdgpu_discovery: increase timeout limit for IFWI initXaver Hugl
commit 928587381b54b1b6c62736486b1dc6cb16c568c2 upstream. With a timeout of only 1 second, my rx 5700XT fails to initialize, so this increases the timeout to 2s. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3697 Signed-off-by: Xaver Hugl <xaver.hugl@kde.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9ed3d7bdf2dcdf1a1196630fab89a124526e9cc2) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-20drm/amdgpu: fix incorrect vm flags to map boJack Xiao
[ Upstream commit 040bc6d0e0e9c814c9c663f6f1544ebaff6824a8 ] It should use vm flags instead of pte flags to specify bo vm attributes. Fixes: 7946340fa389 ("drm/amdgpu: Move csa related code to separate file") Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b08425fa77ad2f305fe57a33dceb456be03b653f) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amdgpu: fix vram reservation issueYiPeng Chai
[ Upstream commit 10ef476aad1c848449934e7bec2ab2374333c7b6 ] The vram block allocation flag must be cleared before making vram reservation, otherwise reserving addresses within the currently freed memory range will always fail. Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu") Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d38eaf27de1b8584f42d6fb3f717b7ec44b3a7a1) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu/gfx10: fix kiq locking in KCQ resetAlex Deucher
[ Upstream commit a4b2ba8f631d3e44b30b9b46ee290fbfe608b7d0 ] The ring test needs to be inside the lock. Fixes: 097af47d3cfb ("drm/amdgpu/gfx10: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu/gfx9.4.3: fix kiq locking in KCQ resetAlex Deucher
[ Upstream commit 08f116c59310728ea8b7e9dc3086569006c861cf ] The ring test needs to be inside the lock. Fixes: 4c953e53cc34 ("drm/amdgpu/gfx_9.4.3: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu/gfx9: fix kiq locking in KCQ resetAlex Deucher
[ Upstream commit 730ea5074dac1b105717316be5d9c18b09829385 ] The ring test needs to be inside the lock. Fixes: fdbd69486b46 ("drm/amdgpu/gfx9: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu: Remove nbiov7.9 replay count reportingLijo Lazar
[ Upstream commit 0f566f0e9c614aa3d95082246f5b8c9e8a09c8b3 ] Direct pcie replay count reporting is not available on nbio v7.9. Reporting is done through firmware. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Fixes: 50709d18f4a6 ("drm/amdgpu: Add pci replay count to nbio v7.9") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-01drm/amdgpu: Reset the clear flag in buddy during resumeArunpravin Paneer Selvam
commit 95a16160ca1d75c66bf7a1c5e0bcaffb18e7c7fc upstream. - Added a handler in DRM buddy manager to reset the cleared flag for the blocks in the freelist. - This is necessary because, upon resuming, the VRAM becomes cluttered with BIOS data, yet the VRAM backend manager believes that everything has been cleared. v2: - Add lock before accessing drm_buddy_clear_reset_blocks()(Matthew Auld) - Force merge the two dirty blocks.(Matthew Auld) - Add a new unit test case for this issue.(Matthew Auld) - Having this function being able to flip the state either way would be good. (Matthew Brost) v3(Matthew Auld): - Do merge step first to avoid the use of extra reset flag. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: stable@vger.kernel.org Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812 Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250716075125.240637-2-Arunpravin.PaneerSelvam@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-24drm/amdgpu: Increase reset counter only on successLijo Lazar
commit 86790e300d8b7bbadaad5024e308c52f1222128f upstream. Increment the reset counter only if soft recovery succeeded. This is consistent with a ring hard reset behaviour where counter gets incremented only if hard reset succeeded. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 25c314aa3ec3d30e4ee282540e2096b5c66a2437) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-24drm/amdgpu/gfx8: reset compute ring wptr on the GPU on resumeEeli Haapalainen
commit 83261934015c434fabb980a3e613b01d9976e877 upstream. Commit 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") made the ring pointer always to be reset on resume from suspend. This caused compute rings to fail since the reset was done without also resetting it for the firmware. Reset wptr on the GPU to avoid a disconnect between the driver and firmware wptr. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3911 Fixes: 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") Signed-off-by: Eeli Haapalainen <eeli.haapalainen@protonmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2becafc319db3d96205320f31cc0de4ee5a93747) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid ↵Srinivasan Shanmugam
Priority Inversion in SRIOV commit dc0297f3198bd60108ccbd167ee5d9fa4af31ed0 upstream. RLCG Register Access is a way for virtual functions to safely access GPU registers in a virtualized environment., including TLB flushes and register reads. When multiple threads or VFs try to access the same registers simultaneously, it can lead to race conditions. By using the RLCG interface, the driver can serialize access to the registers. This means that only one thread can access the registers at a time, preventing conflicts and ensuring that operations are performed correctly. Additionally, when a low-priority task holds a mutex that a high-priority task needs, ie., If a thread holding a spinlock tries to acquire a mutex, it can lead to priority inversion. register access in amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical. The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being called, which attempts to acquire the mutex. This function is invoked from amdgpu_sriov_wreg, which in turn is called from gmc_v11_0_flush_gpu_tlb. The [ BUG: Invalid wait context ] indicates that a thread is trying to acquire a mutex while it is in a context that does not allow it to sleep (like holding a spinlock). Fixes the below: [ 253.013423] ============================= [ 253.013434] [ BUG: Invalid wait context ] [ 253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE [ 253.013464] ----------------------------- [ 253.013475] kworker/0:1/10 is trying to lock: [ 253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.013815] other info that might help us debug this: [ 253.013827] context-{4:4} [ 253.013835] 3 locks held by kworker/0:1/10: [ 253.013847] #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 253.013877] #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 253.013905] #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu] [ 253.014154] stack backtrace: [ 253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G U OE 6.12.0-amdstaging-drm-next-lol-050225 #14 [ 253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024 [ 253.014224] Workqueue: events work_for_cpu_fn [ 253.014241] Call Trace: [ 253.014250] <TASK> [ 253.014260] dump_stack_lvl+0x9b/0xf0 [ 253.014275] dump_stack+0x10/0x20 [ 253.014287] __lock_acquire+0xa47/0x2810 [ 253.014303] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014321] lock_acquire+0xd1/0x300 [ 253.014333] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014562] ? __lock_acquire+0xa6b/0x2810 [ 253.014578] __mutex_lock+0x85/0xe20 [ 253.014591] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014782] ? sched_clock_noinstr+0x9/0x10 [ 253.014795] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014808] ? local_clock_noinstr+0xe/0xc0 [ 253.014822] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015012] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.015029] mutex_lock_nested+0x1b/0x30 [ 253.015044] ? mutex_lock_nested+0x1b/0x30 [ 253.015057] amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015249] amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu] [ 253.015435] gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu] [ 253.015667] gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu] [ 253.015901] ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu] [ 253.016159] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.016173] ? smu_hw_init+0x18d/0x300 [amdgpu] [ 253.016403] amdgpu_device_init+0x29ad/0x36a0 [amdgpu] [ 253.016614] amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu] [ 253.017057] amdgpu_pci_probe+0x1c2/0x660 [amdgpu] [ 253.017493] local_pci_probe+0x4b/0xb0 [ 253.017746] work_for_cpu_fn+0x1a/0x30 [ 253.017995] process_one_work+0x21e/0x680 [ 253.018248] worker_thread+0x190/0x330 [ 253.018500] ? __pfx_worker_thread+0x10/0x10 [ 253.018746] kthread+0xe7/0x120 [ 253.018988] ? __pfx_kthread+0x10/0x10 [ 253.019231] ret_from_fork+0x3c/0x60 [ 253.019468] ? __pfx_kthread+0x10/0x10 [ 253.019701] ret_from_fork_asm+0x1a/0x30 [ 253.019939] </TASK> v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian). Fixes: e864180ee49b ("drm/amdgpu: Add lock around VF RLCG interface") Cc: lin cao <lin.cao@amd.com> Cc: Jingwen Chen <Jingwen.Chen2@amd.com> Cc: Victor Skvortsov <victor.skvortsov@amd.com> Cc: Zhigang Luo <zhigang.luo@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> [ Minor context change fixed. ] Signed-off-by: Wenshan Lan <jetlan9@163.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17drm/amdgpu/ip_discovery: add missing ip_discovery fwFlora Cui
commit 2f6dd741cdcdadb9e125cc66d4fcfbe5ab92d36a upstream. Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jonathan Gray <jsg@jsg.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17drm/amdgpu/discovery: use specific ip_discovery.bin for legacy asicsFlora Cui
commit 25f602fbbcc8271f6e72211b54808ba21e677762 upstream. vega10/vega12/vega20/raven/raven2/picasso/arcturus/aldebaran Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jonathan Gray <jsg@jsg.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-10drm/amdgpu/mes: add missing locking in helper functionsAlex Deucher
[ Upstream commit 40f970ba7a4ab77be2ffe6d50a70416c8876496a ] We need to take the MES lock. Reviewed-by: Michael Chen <michael.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-10drm/amdgpu: add kicker fws loading for gfx11/smu13/psp13Frank Min
[ Upstream commit 854171405e7f093532b33d8ed0875b9e34fc55b4 ] 1. Add kicker firmwares loading for gfx11/smu13/psp13 2. Register additional MODULE_FIRMWARE entries for kicker fws - gc_11_0_0_rlc_kicker.bin - gc_11_0_0_imu_kicker.bin - psp_13_0_0_sos_kicker.bin - psp_13_0_0_ta_kicker.bin - smu_13_0_0_kicker.bin Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fb5ec2174d70a8989bc207d257db90ffeca3b163) Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-10drm/amdgpu: VCN v5_0_1 to prevent FW checking RB during DPG pauseSonny Jiang
[ Upstream commit 46e15197b513e60786a44107759d6ca293d6288c ] Add a protection to ensure programming are all complete prior VCPU starting. This is a WA for an unintended VCPU running. Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c29521b529fa5e225feaf709d863a636ca0cbbfa) Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-06drm/amdgpu: switch job hw_fence to amdgpu_fenceAlex Deucher
commit ebe43542702c3d15d1a1d95e8e13b1b54076f05a upstream. Use the amdgpu fence container so we can store additional data in the fence. This also fixes the start_time handling for MCBP since we were casting the fence to an amdgpu_fence and it wasn't. Fixes: 3f4c175d62d8 ("drm/amdgpu: MCBP based on DRM scheduler (v9)") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bf1cd14f9e2e1fdf981eed273ddd595863f5288c) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06drm/amdgpu: Fix SDMA UTC_L1 handling during start/stop sequencesJesse Zhang
commit 7f3b16f3f229e37cc3e02e9e4e7106c523b119e9 upstream. This commit makes two key fixes to SDMA v4.4.2 handling: 1. disable UTC_L1 in sdma_cntl register when stopping SDMA engines by reading the current value before modifying UTC_L1_ENABLE bit. 2. Ensure UTC_L1_ENABLE is consistently managed by: - Adding the missing register write when enabling UTC_L1 during start - Keeping UTC_L1 enabled by default as per hardware requirements v2: Correct SDMA_CNTL setting (Philip) Suggested-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 375bf564654e85a7b1b0657b191645b3edca1bda) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06drm/amdgpu: Add kicker device detectionFrank Min
commit 0bbf5fd86c585d437b75003f11365b324360a5d6 upstream. 1. add kicker device list 2. add kicker device checking helper function Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 09aa2b408f4ab689c3541d22b0968de0392ee406) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vramJohn Olender
commit 4d2f6b4e4c7ed32e7fa39fcea37344a9eab99094 upstream. The drm_mm allocator tolerated being passed end > mm->size, but the drm_buddy allocator does not. Restore the pre-buddy-allocator behavior of allowing such placements. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448 Signed-off-by: John Olender <john.olender@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06drm/amd: Adjust output for discovery error handlingMario Limonciello
[ Upstream commit 73eab78721f7b85216f1ca8c7b732f13213b5b32 ] commit 017fbb6690c2 ("drm/amdgpu/discovery: check ip_discovery fw file available") added support for reading an amdgpu IP discovery bin file for some specific products. If it's not found then it will fallback to hardcoded values. However if it's not found there is also a lot of noise about missing files and errors. Adjust the error handling to decrease most messages to DEBUG and to show users less about missing files. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: Marcus Seyfarth <m.seyfarth@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4312 Tested-by: Marcus Seyfarth <m.seyfarth@gmail.com> Fixes: 017fbb6690c2 ("drm/amdgpu/discovery: check ip_discovery fw file available") Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250617183052.1692059-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 49f1f9f6c3c9febf8ba93f94a8d9c8d03e1ea0a1) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-06drm/amdgpu/discovery: optionally use fw based ip discoveryAlex Deucher
[ Upstream commit 80a0e828293389358f7db56adcdcb22b28df5e11 ] On chips without native IP discovery support, use the fw binary if available, otherwise we can continue without it. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Flora Cui <flora.cui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 73eab78721f7 ("drm/amd: Adjust output for discovery error handling") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-06drm/amdgpu: seq64 memory unmap uses uninterruptible lockPhilip Yang
[ Upstream commit a359288ccb4dd8edb086e7de8fdf6e36f544c922 ] To unmap and free seq64 memory when drm node close to free vm, if there is signal accepted, then taking vm lock failed and leaking seq64 va mapping, and then dmesg has error log "still active bo inside vm". Change to use uninterruptible lock fix the mapping leaking and no dmesg error log. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27drm/amdgpu: read back register after written for VCN v4.0.5David (Ming Qiang) Wu
commit ee7360fc27d6045510f8fe459b5649b2af27811a upstream. On VCN v4.0.5 there is a race condition where the WPTR is not updated after starting from idle when doorbell is used. Adding register read-back after written at function end is to ensure all register writes are done before they can be used. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528 Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 07c9db090b86e5211188e1b351303fbc673378cf) Cc: stable@vger.kernel.org (cherry picked from commit ee7360fc27d6045510f8fe459b5649b2af27811a) Hand modified for contextual changes where there is a for loop in 6.12 that was dropped later on. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-29drm/amdgpu: enlarge the VBIOS binary size limitShiwu Zhang
[ Upstream commit 667b96134c9e206aebe40985650bf478935cbe04 ] Some chips have a larger VBIOS file so raise the size limit to support the flashing tool. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29drm/amdgpu: Use active umc info from discoveryLijo Lazar
[ Upstream commit f7a594e40517fa2ab25d5ca10e7b6a158f529fb5 ] There could be configs where some UMC instances are harvested. This information is obtained through discovery data and populated in umc.active_mask. Avoid reassigning this as AID mask, instead use the mask directly while iterating through umc instances. This is to avoid accesses to harvested UMC instances. v2: fix warning (Alex) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29drm/amdgpu: reset psp->cmd to NULL after releasing the bufferJiang Liu
[ Upstream commit e92f3f94cad24154fd3baae30c6dfb918492278d ] Reset psp->cmd to NULL after releasing the buffer in function psp_sw_fini(). Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29drm/amdgpu: Set snoop bit for SDMA for MI seriesHarish Kasiviswanathan
[ Upstream commit 3394b1f76d3f8adf695ceed350a5dae49003eb37 ] SDMA writes has to probe invalidate RW lines. Set snoop bit in mmhub for this to happen. v2: Missed a few mmhub_v9_4. Added now. v3: Calculate hub offset once since it doesn't change inside the loop Modified function names based on review comments. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29drm/amdgpu/mes11: fix set_hw_resources_1 calculationAlex Deucher
[ Upstream commit 1350dd3691b5f757a948e5b9895d62c422baeb90 ] It's GPU page size not CPU page size. In most cases they are the same, but not always. This can lead to overallocation on systems with larger pages. Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Cc: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29drm/amdgpu: remove all KFD fences from the BO on releaseChristian König
[ Upstream commit cb0de06d1b0afb2d0c600ad748069f5ce27730ec ] Remove all KFD BOs from the private dma_resv object. This prevents the KFD from being evict unecessarily when an exported BO is released. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-and-tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29drm/amdgpu: Do not program AGP BAR regs under SRIOV in gfxhub_v1_0.cVictor Lu
[ Upstream commit 057fef20b8401110a7bc1c2fe9d804a8a0bf0d24 ] SRIOV VF does not have write access to AGP BAR regs. Skip the writes to avoid a dmesg warning. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29drm/amdgpu: Fix missing drain retry fault the last entryEmily Deng
[ Upstream commit fe2fa3be3d59ba67d6de54a0064441ec233cb50c ] While the entry get in svm_range_unmap_from_cpu is the last entry, and the entry is page fault, it also need to be dropped. So for equal case, it also need to be dropped. v2: Only modify the svm_range_restore_pages. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Xiaogang Chen<xiaogang.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29drm/amdgpu: Update SRIOV video codec capsDavid Rosca
[ Upstream commit 19478f2011f8b53dee401c91423c4e0b73753e4f ] There have been multiple fixes to the video caps that are missing for SRIOV. Update the SRIOV caps with correct values. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29drm/amdgpu/gfx11: don't read registers in mqd initAlex Deucher
[ Upstream commit e27b36ea6ba5f29e91fcfb375ea29503708fcf43 ] Just use the default values. There's not need to get the value from hardware and it could cause problems if we do that at runtime and gfxoff is active. Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29drm/amdgpu/gfx12: don't read registers in mqd initAlex Deucher
[ Upstream commit fc3c139cf0432b79fd08e23100a559ee51cd0be4 ] Just use the default values. There's not need to get the value from hardware and it could cause problems if we do that at runtime and gfxoff is active. Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29drm/amdgpu: adjust drm_firmware_drivers_only() handlingAlex Deucher
[ Upstream commit e00e5c223878a60e391e5422d173c3382d378f87 ] Move to probe so we can check the PCI device type and only apply the drm_firmware_drivers_only() check for PCI DISPLAY classes. Also add a module parameter to override the nomodeset kernel parameter as a workaround for platforms that have this hardcoded on their kernel command lines. Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>