summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)Author
2025-04-10drm/amdgpu/umsch: fix ucode checkAlex Deucher
[ Upstream commit c917e39cbdcd9fff421184db6cc461cc58d52c17 ] Return an error if the IP version doesn't match otherwise we end up passing a NULL string to amdgpu_ucode_request. We should never hit this in practice today since we only enable the umsch code on the supported IP versions, but add a check to be safe. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502130406.iWQ0eBug-lkp@intel.com/ Fixes: 020620424b27 ("drm/amd: Use a constant format string for amdgpu_ucode_request") Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-28drm/amdgpu: Fix JPEG video caps max size for navi1x and ravenDavid Rosca
commit ec33964d9d88488fa954a03d476a8b811efc6e85 upstream. 8192x8192 is the maximum supported resolution. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6e0d2fde3ae8fdb5b47e10389f23ed2cb4daec5d) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-28drm/amdgpu: Fix MPEG2, MPEG4 and VC1 video caps max sizeDavid Rosca
commit f0105e173103c9d30a2bb959f7399437d536c848 upstream. 1920x1088 is the maximum supported resolution. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1a0807feb97082bff2b1342dbbe55a2a9a8bdb88) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-28drm/amdgpu: Remove JPEG from vega and carrizo video capsDavid Rosca
commit 7fc0765208502e53297ce72c49ca43729f9d6ff3 upstream. JPEG is only supported for VCN1+. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0a6e7b06bdbead2e43d56a2274b7e0c9c86d536e) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-28drm/amdgpu: Restore uncached behaviour on GFX12David Belanger
commit 35b6162bb790555ad56b7f0d120e307b8334d778 upstream. Always use MTYPE_UC if UNCACHED flag is specified. This makes kernarg region uncached and it restores usermode cache disable debug flag functionality. Do not set MTYPE_UC for COHERENT flag, on GFX12 coherence is handled by shader code. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit eb6cdfb807d038d9b9986b5c87188f28a4071eae) Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-28drm/amdgpu/gfx12: correct cleanup of 'me' field with gfx_v12_0_me_fini()Wentao Liang
commit 86730b5261d4d8dae3f5b97709d40d694ecf1ddf upstream. In gfx_v12_0_cp_gfx_load_me_microcode_rs64(), gfx_v12_0_pfp_fini() is incorrectly used to free 'me' field of 'gfx', since gfx_v12_0_pfp_fini() can only release 'pfp' field of 'gfx'. The release function of 'me' field should be gfx_v12_0_me_fini(). Fixes: 52cb80c12e8a ("drm/amdgpu: Add gfx v12_0 ip block support (v6)") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ebdc52607a46cda08972888178c6aa9cd6965141) Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-22drm/amdgpu: NULL-check BO's backing store when determining GFX12 PTE flagsNatalie Vock
commit 6cc30748e17ea2a64051ceaf83a8372484e597f1 upstream. PRT BOs may not have any backing store, so bo->tbo.resource will be NULL. Check for that before dereferencing. Fixes: 0cce5f285d9a ("drm/amdkfd: Check correct memory types for is_system variable") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Natalie Vock <natalie.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3e3fcd29b505cebed659311337ea03b7698767fc) Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07drm/amdgpu: init return value in amdgpu_ttm_clear_bufferPierre-Eric Pelloux-Prayer
commit d3c7059b6a8600fc62cd863f1ea203b8675e63e1 upstream. Otherwise an uninitialized value can be returned if amdgpu_res_cleared returns true for all regions. Possibly closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812 Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7c62aacc3b452f73a1284198c81551035fac6d71) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07drm/amdgpu: disable BAR resize on Dell G5 SEAlex Deucher
commit 099bffc7cadff40bfab1517c3461c53a7a38a0d7 upstream. There was a quirk added to add a workaround for a Sapphire RX 5600 XT Pulse that didn't allow BAR resizing. However, the quirk caused a regression with runtime pm on Dell laptops using those chips, rather than narrowing the scope of the resizing quirk, add a quirk to prevent amdgpu from resizing the BAR on those Dell platforms unless runtime pm is disabled. v2: update commit message, add runpm check Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707 Fixes: 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5235053f443cef4210606e5fb71f99b915a9723d) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27drm/amdgpu: bump version for RV/PCO compute fixAlex Deucher
commit 55ed2b1b50d029dd7e49a35f6628ca64db6d75d8 upstream. Bump the driver version for RV/PCO compute stability fix so mesa can use this check to enable compute queues on RV/PCO. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27drm/amdgpu/gfx9: manually control gfxoff for CS on RVAlex Deucher
commit b35eb9128ebeec534eed1cefd6b9b1b7282cf5ba upstream. When mesa started using compute queues more often we started seeing additional hangs with compute queues. Disabling gfxoff seems to mitigate that. Manually control gfxoff and gfx pg with command submissions to avoid any issues related to gfxoff. KFD already does the same thing for these chips. v2: limit to compute v3: limit to APUs v4: limit to Raven/PCO v5: only update the compute ring_funcs v6: Disable GFX PG v7: adjust order Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Suggested-by: Błażej Szczygieł <mumei6102@gmail.com> Suggested-by: Sergey Kovalenko <seryoga.engineering@gmail.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3861 Link: https://lists.freedesktop.org/archives/amd-gfx/2025-January/119116.html Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21drm/amdgpu: bail out when failed to load fw in psp_init_cap_microcode()Jiang Liu
[ Upstream commit a0a455b4bc7483ad60e8b8a50330c1e05bb7bfcf ] In function psp_init_cap_microcode(), it should bail out when failed to load firmware, otherwise it may cause invalid memory access. Fixes: 07dbfc6b102e ("drm/amd: Use `amdgpu_ucode_*` helpers for PSP") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-17drm/amdgpu: add a BO metadata flag to disable write compression for VulkanMarek Olšák
commit 2255b40cacc2e5ef1b127770fc1808c60de4a2fc upstream. Vulkan can't support DCC and Z/S compression on GFX12 without WRITE_COMPRESS_DISABLE in this commit or a completely different DCC interface. AMDGPU_TILING_GFX12_SCANOUT is added because it's already used by userspace. Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17drm/amd/amdgpu: change the config of cgcg on gfx12Kenneth Feng
commit 5cda56bd86c455341087dca29c65dc7c87f84340 upstream. change the config of cgcg on gfx12 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17drm/amdgpu: Fix Circular Locking Dependency in AMDGPU GFX IsolationSrinivasan Shanmugam
[ Upstream commit 1e8c193f8ca7ab7dff4f4747b45a55dca23c00f4 ] This commit addresses a circular locking dependency issue within the GFX isolation mechanism. The problem was identified by a warning indicating a potential deadlock due to inconsistent lock acquisition order. - The `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` functions previously acquired `enforce_isolation_mutex` and called `amdgpu_gfx_kfd_sch_ctrl`, leading to potential deadlocks. ie., If `amdgpu_gfx_kfd_sch_ctrl` is called while `enforce_isolation_mutex` is held, and `amdgpu_gfx_enforce_isolation_handler` is called while `kfd_sch_mutex` is held, it can create a circular dependency. By ensuring consistent lock usage, this fix resolves the issue: [ 606.297333] ====================================================== [ 606.297343] WARNING: possible circular locking dependency detected [ 606.297353] 6.10.0-amd-mlkd-610-311224-lof #19 Tainted: G OE [ 606.297365] ------------------------------------------------------ [ 606.297375] kworker/u96:3/3825 is trying to acquire lock: [ 606.297385] ffff9aa64e431cb8 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}, at: __flush_work+0x232/0x610 [ 606.297413] but task is already holding lock: [ 606.297423] ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.297725] which lock already depends on the new lock. [ 606.297738] the existing dependency chain (in reverse order) is: [ 606.297749] -> #2 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}: [ 606.297765] __mutex_lock+0x85/0x930 [ 606.297776] mutex_lock_nested+0x1b/0x30 [ 606.297786] amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.298007] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.298225] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.298412] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.298603] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.298866] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.298880] process_one_work+0x21e/0x680 [ 606.298890] worker_thread+0x190/0x350 [ 606.298899] kthread+0xe7/0x120 [ 606.298908] ret_from_fork+0x3c/0x60 [ 606.298919] ret_from_fork_asm+0x1a/0x30 [ 606.298929] -> #1 (&adev->enforce_isolation_mutex){+.+.}-{3:3}: [ 606.298947] __mutex_lock+0x85/0x930 [ 606.298956] mutex_lock_nested+0x1b/0x30 [ 606.298966] amdgpu_gfx_enforce_isolation_handler+0x87/0x370 [amdgpu] [ 606.299190] process_one_work+0x21e/0x680 [ 606.299199] worker_thread+0x190/0x350 [ 606.299208] kthread+0xe7/0x120 [ 606.299217] ret_from_fork+0x3c/0x60 [ 606.299227] ret_from_fork_asm+0x1a/0x30 [ 606.299236] -> #0 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}: [ 606.299257] __lock_acquire+0x16f9/0x2810 [ 606.299267] lock_acquire+0xd1/0x300 [ 606.299276] __flush_work+0x250/0x610 [ 606.299286] cancel_delayed_work_sync+0x71/0x80 [ 606.299296] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.299509] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.299723] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.299909] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.300101] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.300355] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.300369] process_one_work+0x21e/0x680 [ 606.300378] worker_thread+0x190/0x350 [ 606.300387] kthread+0xe7/0x120 [ 606.300396] ret_from_fork+0x3c/0x60 [ 606.300406] ret_from_fork_asm+0x1a/0x30 [ 606.300416] other info that might help us debug this: [ 606.300428] Chain exists of: (work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work) --> &adev->enforce_isolation_mutex --> &adev->gfx.kfd_sch_mutex [ 606.300458] Possible unsafe locking scenario: [ 606.300468] CPU0 CPU1 [ 606.300476] ---- ---- [ 606.300484] lock(&adev->gfx.kfd_sch_mutex); [ 606.300494] lock(&adev->enforce_isolation_mutex); [ 606.300508] lock(&adev->gfx.kfd_sch_mutex); [ 606.300521] lock((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)); [ 606.300536] *** DEADLOCK *** [ 606.300546] 5 locks held by kworker/u96:3/3825: [ 606.300555] #0: ffff9aa5aa1f5d58 ((wq_completion)comp_1.1.0){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 606.300577] #1: ffffaa53c3c97e40 ((work_completion)(&sched->work_run_job)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 606.300600] #2: ffff9aa64e463c98 (&adev->enforce_isolation_mutex){+.+.}-{3:3}, at: amdgpu_gfx_enforce_isolation_ring_begin_use+0x1c3/0x5d0 [amdgpu] [ 606.300837] #3: ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.301062] #4: ffffffff8c1a5660 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x70/0x610 [ 606.301083] stack backtrace: [ 606.301092] CPU: 14 PID: 3825 Comm: kworker/u96:3 Tainted: G OE 6.10.0-amd-mlkd-610-311224-lof #19 [ 606.301109] Hardware name: Gigabyte Technology Co., Ltd. X570S GAMING X/X570S GAMING X, BIOS F7 03/22/2024 [ 606.301124] Workqueue: comp_1.1.0 drm_sched_run_job_work [gpu_sched] [ 606.301140] Call Trace: [ 606.301146] <TASK> [ 606.301154] dump_stack_lvl+0x9b/0xf0 [ 606.301166] dump_stack+0x10/0x20 [ 606.301175] print_circular_bug+0x26c/0x340 [ 606.301187] check_noncircular+0x157/0x170 [ 606.301197] ? register_lock_class+0x48/0x490 [ 606.301213] __lock_acquire+0x16f9/0x2810 [ 606.301230] lock_acquire+0xd1/0x300 [ 606.301239] ? __flush_work+0x232/0x610 [ 606.301250] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301261] ? mark_held_locks+0x54/0x90 [ 606.301274] ? __flush_work+0x232/0x610 [ 606.301284] __flush_work+0x250/0x610 [ 606.301293] ? __flush_work+0x232/0x610 [ 606.301305] ? __pfx_wq_barrier_func+0x10/0x10 [ 606.301318] ? mark_held_locks+0x54/0x90 [ 606.301331] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301345] cancel_delayed_work_sync+0x71/0x80 [ 606.301356] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.301661] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.302050] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.302069] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.302452] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.302862] ? drm_sched_entity_error+0x82/0x190 [gpu_sched] [ 606.302890] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.303366] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.303388] process_one_work+0x21e/0x680 [ 606.303409] worker_thread+0x190/0x350 [ 606.303424] ? __pfx_worker_thread+0x10/0x10 [ 606.303437] kthread+0xe7/0x120 [ 606.303449] ? __pfx_kthread+0x10/0x10 [ 606.303463] ret_from_fork+0x3c/0x60 [ 606.303476] ? __pfx_kthread+0x10/0x10 [ 606.303489] ret_from_fork_asm+0x1a/0x30 [ 606.303512] </TASK> v2: Refactor lock handling to resolve circular dependency (Alex) - Introduced a `sched_work` flag to defer the call to `amdgpu_gfx_kfd_sch_ctrl` until after releasing `enforce_isolation_mutex`. - This change ensures that `amdgpu_gfx_kfd_sch_ctrl` is called outside the critical section, preventing the circular dependency and deadlock. - The `sched_work` flag is set within the mutex-protected section if conditions are met, and the actual function call is made afterward. - This approach ensures consistent lock acquisition order. Fixes: afefd6f24502 ("drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-17drm/amdgpu: Don't enable sdma 4.4.5 CTXEMPTY interruptPhilip Yang
[ Upstream commit b4b7271e5ca95b581f2fcc4ae852c4079215e92d ] The sdma context empty interrupt is dropped in amdgpu_irq_dispatch as unregistered interrupt src_id 243, this interrupt accounts to 1/3 of total interrupts and causes IH primary ring overflow when running stressful benchmark application. Disable this interrupt has no side effect found. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08drm/amdgpu: fix gpu recovery disable with per queue resetJonathan Kim
[ Upstream commit 86bde64cb7957be393f84e5d35fb8dfc91e4ae7e ] Per queue reset should be bypassed when gpu recovery is disabled with module parameter. Fixes: ee0a469cf917 ("drm/amdkfd: support per-queue reset on gfx9") Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08Revert "drm/amdgpu/gfx9: put queue resets behind a debug option"Alex Deucher
[ Upstream commit 32f00289698189b813942f37626218fd473e7302 ] This reverts commit 7c1a2d8aba6cadde0cc542b2d805edc0be667e79. Extended validation has completed successfully, so enable these features by default. Acked-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jonathan Kim <jonathan.kim@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Stable-dep-of: 86bde64cb795 ("drm/amdgpu: fix gpu recovery disable with per queue reset") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08drm/amdgpu: tear down ttm range manager for doorbell in amdgpu_ttm_fini()Jiang Liu
[ Upstream commit 60a2c0c12b644450e420ffc42291d1eb248bacb7 ] Tear down ttm range manager for doorbell in function amdgpu_ttm_fini(), to avoid memory leakage. Fixes: 792b84fb9038 ("drm/amdgpu: initialize ttm for doorbells") Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08drm/amdgpu/vcn: reset fw_shared under SRIOVBokun Zhang
[ Upstream commit 3676f37a88432132bcff55a17dc48911239b6d98 ] - The previous patch only considered the case for baremetal and is not applicable for SRIOV code path. We also need to init fw_share for SRIOV VF Fixes: 928cd772e18f ("drm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Bokun Zhang <bokun.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-23drm/amdgpu: always sync the GFX pipe on ctx switchChristian König
commit af04b320c71c4b59971f021615876808a36e5038 upstream. That is needed to enforce isolation between contexts. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit def59436fb0d3ca0f211d14873d0273d69ebb405) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-23drm/amdgpu: disable gfxoff with the compute workload on gfx12Kenneth Feng
commit 90505894c4ed581318836b792c57723df491cb91 upstream. Disable gfxoff with the compute workload on gfx12. This is a workaround for the opencl test failure. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2affe2bbc997b3920045c2c434e480c81a5f9707) Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-23drm/amdgpu: fix fw attestation for MP0_14_0_{2/3}Gui Chengming
commit bd275e6cfc972329d39c6406a3c6d2ba2aba7db6 upstream. FW attestation was disabled on MP0_14_0_{2/3}. V2: Move check into is_fw_attestation_support func. (Frank) Remove DRM_WARN log info. (Alex) Fix format. (Christian) Signed-off-by: Gui Chengming <Jack.Gui@amd.com> Reviewed-by: Frank.Min <Frank.Min@amd.com> Reviewed-by: Christian König <Christian.Koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 62952a38d9bcf357d5ffc97615c48b12c9cd627c) Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-17drm/amdgpu: Add a lock when accessing the buddy trim functionArunpravin Paneer Selvam
commit 75c8b703e5bded1e33b08fb09b829e7c2c1ed50a upstream. When running YouTube videos and Steam games simultaneously, the tester found a system hang / race condition issue with the multi-display configuration setting. Adding a lock to the buddy allocator's trim function would be the solution. <log snip> [ 7197.250436] general protection fault, probably for non-canonical address 0xdead000000000108 [ 7197.250447] RIP: 0010:__alloc_range+0x8b/0x340 [amddrm_buddy] [ 7197.250470] Call Trace: [ 7197.250472] <TASK> [ 7197.250475] ? show_regs+0x6d/0x80 [ 7197.250481] ? die_addr+0x37/0xa0 [ 7197.250483] ? exc_general_protection+0x1db/0x480 [ 7197.250488] ? drm_suballoc_new+0x13c/0x93d [drm_suballoc_helper] [ 7197.250493] ? asm_exc_general_protection+0x27/0x30 [ 7197.250498] ? __alloc_range+0x8b/0x340 [amddrm_buddy] [ 7197.250501] ? __alloc_range+0x109/0x340 [amddrm_buddy] [ 7197.250506] amddrm_buddy_block_trim+0x1b5/0x260 [amddrm_buddy] [ 7197.250511] amdgpu_vram_mgr_new+0x4f5/0x590 [amdgpu] [ 7197.250682] amdttm_resource_alloc+0x46/0xb0 [amdttm] [ 7197.250689] ttm_bo_alloc_resource+0xe4/0x370 [amdttm] [ 7197.250696] amdttm_bo_validate+0x9d/0x180 [amdttm] [ 7197.250701] amdgpu_bo_pin+0x15a/0x2f0 [amdgpu] [ 7197.250831] amdgpu_dm_plane_helper_prepare_fb+0xb2/0x360 [amdgpu] [ 7197.251025] ? try_wait_for_completion+0x59/0x70 [ 7197.251030] drm_atomic_helper_prepare_planes.part.0+0x2f/0x1e0 [ 7197.251035] drm_atomic_helper_prepare_planes+0x5d/0x70 [ 7197.251037] drm_atomic_helper_commit+0x84/0x160 [ 7197.251040] drm_atomic_nonblocking_commit+0x59/0x70 [ 7197.251043] drm_mode_atomic_ioctl+0x720/0x850 [ 7197.251047] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 [ 7197.251049] drm_ioctl_kernel+0xb9/0x120 [ 7197.251053] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7197.251056] drm_ioctl+0x2d4/0x550 [ 7197.251058] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 [ 7197.251063] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu] [ 7197.251186] __x64_sys_ioctl+0xa0/0xf0 [ 7197.251190] x64_sys_call+0x143b/0x25c0 [ 7197.251193] do_syscall_64+0x7f/0x180 [ 7197.251197] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7197.251199] ? amdgpu_display_user_framebuffer_create+0x215/0x320 [amdgpu] [ 7197.251329] ? drm_internal_framebuffer_create+0xb7/0x1a0 [ 7197.251332] ? srso_alias_return_thunk+0x5/0xfbef5 Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Fixes: 4a5ad08f5377 ("drm/amdgpu: Add address alignment support to DCC buffers") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3318ba94e56b9183d0304577c74b33b6b01ce516) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-09drm/amdgpu: use sjt mec fw on gfx943 for sriovVictor Zhao
[ Upstream commit 9a4ab400f1fad0e6e8686b8f5fc5376383860ce8 ] Use second jump table in sriov for live migration or mulitple VF support so different VF can load different version of MEC as long as they support sjt Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-09drm/amdgpu: fix backport of commit 73dae652dcacAlex Deucher
Commit 73dae652dcac ("drm/amdgpu: rework resume handling for display (v2)") missed a small code change when it was backported resulting in an automatic backlight control breakage. Fix the backport. Note that this patch is not in Linus' tree as it is not required there; the bug was introduced in the backport. Fixes: 99a02eab8251 ("drm/amdgpu: rework resume handling for display (v2)") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3853 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-27drm/amdgpu/mmhub4.1: fix IP version checkAlex Deucher
commit 6ebc5b92190e01dd48313b68cbf752c9adcfefa8 upstream. Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 63bfd24088b42c6f55c2096bfc41b50213d419b2) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-27drm/amdgpu/gfx12: fix IP version checkAlex Deucher
commit 41be00f839e9ee7753892a73a36ce4c14c6f5cbf upstream. Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f1fd1d0f40272948aa6ab82a3a82ecbbc76dff53) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-27drm/amdgpu/nbio7.0: fix IP version checkAlex Deucher
commit 3abb660f9e18925468685591a3702bda05faba4f upstream. Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0ec43fbece784215d3c4469973e4556d70bce915) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-27drm/amdgpu/nbio7.7: fix IP version checkAlex Deucher
commit 458600da793da12e0f3724ecbea34a80703f4d5b upstream. Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 22b9555bc90df22b585bdd1f161b61584b13af51) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-27drm/amdgpu/nbio7.11: fix IP version checkAlex Deucher
commit 8c1ecc7197a88c6ae62de56e1c0887f220712a32 upstream. Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2c8eeaaa0fe5841ccf07a0eb51b1426f34ef39f7) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-27drm/amdgpu: don't access invalid schedPierre-Eric Pelloux-Prayer
[ Upstream commit a93b1020eb9386d7da11608477121b10079c076a ] Since 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()") accessing job->base.sched can produce unexpected results as the initialisation of (*job)->base.sched done in amdgpu_job_alloc is overwritten by the memset. This commit fixes an issue when a CS would fail validation and would be rejected after job->num_ibs is incremented. In this case, amdgpu_ib_free(ring->adev, ...) will be called, which would crash the machine because the ring value is bogus. To fix this, pass a NULL pointer to amdgpu_ib_free(): we can do this because the device is actually not used in this function. The next commit will remove the ring argument completely. Fixes: 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2ae520cb12831d264ceb97c61f72c59d33c0dbd7) Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-27drm/amdgpu: Handle NULL bo->tbo.resource (again) in amdgpu_vm_bo_updateMichel Dänzer
commit 85230ee36d88e7a09fb062d43203035659dd10a5 upstream. Third time's the charm, I hope? Fixes: d3116756a710 ("drm/ttm: rename bo->mem and make it a pointer") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3837 Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 695c2c745e5dff201b75da8a1d237ce403600d04) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-27drm/amdgpu: fix amdgpu_coredumpChristian König
commit 8d1a13816e59254bd3b18f5ae0895230922bd120 upstream. The VM pointer might already be outdated when that function is called. Use the PASID instead to gather the information instead. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 57f812d171af4ba233d3ed7c94dfa5b8e92dcc04) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-27drm/amd: Update strapping for NBIO 2.5.0Mario Limonciello
commit a7f9d98eb1202132014ba760c26ad8608ffc9caf upstream. This helps to avoid a spurious PME event on hotplug to Azalia. Cc: Vijendar Mukunda <Vijendar.Mukunda@amd.com> Reported-and-tested-by: ionut_n2001@yahoo.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215884 Tested-by: Gabriel Marcano <gabemarcano@yahoo.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20241211024414.7840-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3f6f237b9dd189e1fb85b8a3f7c97a8f27c1e49a) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-19amdgpu/uvd: get ring reference from rq schedulerDavid (Ming Qiang) Wu
[ Upstream commit 47f402a3e08113e0f5d8e1e6fcc197667a16022f ] base.sched may not be set for each instance and should not be used for cases such as non-IB tests. Fixes: 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()") Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-19drm/amdgpu: fix when the cleaner shader is emittedChristian König
commit f4df208177d02f1c90f3644da3a2453080b8c24f upstream. Emitting the cleaner shader must come after the check if a VM switch is necessary or not. Otherwise we will emit the cleaner shader every time and not just when it is necessary because we switched between applications. This can otherwise crash on gang submit and probably decreases performance quite a bit. v2: squash in fix from Srini (Alex) Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: ee7a846ea27b ("drm/amdgpu: Emit cleaner shader at end of IB submission") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-19drm/amdgpu: fix UVD contiguous CS mapping problemChristian König
commit 12f325bcd2411e571dbb500bf6862c812c479735 upstream. When starting the mpv player, Radeon R9 users are observing the below error in dmesg. [drm:amdgpu_uvd_cs_pass2 [amdgpu]] *ERROR* msg/fb buffer ff00f7c000-ff00f7e000 out of 256MB segment! The patch tries to set the TTM_PL_FLAG_CONTIGUOUS for both user flag(AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) set and not set cases. v2: Make the TTM_PL_FLAG_CONTIGUOUS mandatory for user BO's. v3: revert back to v1, but fix the check instead (chk). Closes:https://gitlab.freedesktop.org/drm/amd/-/issues/3599 Closes:https://gitlab.freedesktop.org/drm/amd/-/issues/3501 Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.10+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14drm/amdgpu: rework resume handling for display (v2)Alex Deucher
commit 73dae652dcac776296890da215ee7dec357a1032 upstream. Split resume into a 3rd step to handle displays when DCC is enabled on DCN 4.0.1. Move display after the buffer funcs have been re-enabled so that the GPU will do the move and properly set the DCC metadata for DCN. v2: fix fence irq resume ordering Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14drm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3Xiang Liu
[ Upstream commit 928cd772e18ffbd7723cb2361db4a8ccf2222235 ] It is not necessarily corrupted. When there is RAS fatal error, device memory access is blocked. Hence vcpu bo cannot be saved to system memory as in a regular suspend sequence before going for reset. In other full device reset cases, that gets saved and restored during resume. v2: Remove redundant code like vcn_v4_0 did v2: Refine commit message v3: Drop the volatile v3: Refine commit message Signed-off-by: Xiang Liu <xiang.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14drm/amdgpu: set the right AMDGPU sg segment limitationPrike Liang
[ Upstream commit e2e97435783979124ba92d6870415c57ecfef6a5 ] The driver needs to set the correct max_segment_size; otherwise debug_dma_map_sg() will complain about the over-mapping of the AMDGPU sg length as following: WARNING: CPU: 6 PID: 1964 at kernel/dma/debug.c:1178 debug_dma_map_sg+0x2dc/0x370 [ 364.049444] Modules linked in: veth amdgpu(OE) amdxcp drm_exec gpu_sched drm_buddy drm_ttm_helper ttm(OE) drm_suballoc_helper drm_display_helper drm_kms_helper i2c_algo_bit rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter br_netfilter nvme_fabrics overlay nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bridge stp llc amd_atl intel_rapl_msr intel_rapl_common sunrpc sch_fq_codel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg edac_mce_amd binfmt_misc snd_hda_codec snd_pci_acp6x snd_hda_core snd_acp_config snd_hwdep snd_soc_acpi kvm_amd snd_pcm kvm snd_seq_midi snd_seq_midi_event crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 snd_rawmidi sha256_ssse3 sha1_ssse3 aesni_intel snd_seq nls_iso8859_1 crypto_simd snd_seq_device cryptd snd_timer rapl input_leds snd [ 364.049532] ipmi_devintf wmi_bmof ccp serio_raw k10temp sp5100_tco soundcore ipmi_msghandler cm32181 industrialio mac_hid msr parport_pc ppdev lp parport drm efi_pstore ip_tables x_tables pci_stub crc32_pclmul nvme ahci libahci i2c_piix4 r8169 nvme_core i2c_designware_pci realtek i2c_ccgx_ucsi video wmi hid_generic cdc_ether usbnet usbhid hid r8152 mii [ 364.049576] CPU: 6 PID: 1964 Comm: rocminfo Tainted: G OE 6.10.0-custom #492 [ 364.049579] Hardware name: AMD Majolica-RN/Majolica-RN, BIOS RMJ1009A 06/13/2021 [ 364.049582] RIP: 0010:debug_dma_map_sg+0x2dc/0x370 [ 364.049585] Code: 89 4d b8 e8 36 b1 86 00 8b 4d b8 48 8b 55 b0 44 8b 45 a8 4c 8b 4d a0 48 89 c6 48 c7 c7 00 4b 74 bc 4c 89 4d b8 e8 b4 73 f3 ff <0f> 0b 4c 8b 4d b8 8b 15 c8 2c b8 01 85 d2 0f 85 ee fd ff ff 8b 05 [ 364.049588] RSP: 0018:ffff9ca600b57ac0 EFLAGS: 00010286 [ 364.049590] RAX: 0000000000000000 RBX: ffff88b7c132b0c8 RCX: 0000000000000027 [ 364.049592] RDX: ffff88bb0f521688 RSI: 0000000000000001 RDI: ffff88bb0f521680 [ 364.049594] RBP: ffff9ca600b57b20 R08: 000000000000006f R09: ffff9ca600b57930 [ 364.049596] R10: ffff9ca600b57928 R11: ffffffffbcb46328 R12: 0000000000000000 [ 364.049597] R13: 0000000000000001 R14: ffff88b7c19c0700 R15: ffff88b7c9059800 [ 364.049599] FS: 00007fb2d3516e80(0000) GS:ffff88bb0f500000(0000) knlGS:0000000000000000 [ 364.049601] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 364.049603] CR2: 000055610bd03598 CR3: 00000001049f6000 CR4: 0000000000350ef0 [ 364.049605] Call Trace: [ 364.049607] <TASK> [ 364.049609] ? show_regs+0x6d/0x80 [ 364.049614] ? __warn+0x8c/0x140 [ 364.049618] ? debug_dma_map_sg+0x2dc/0x370 [ 364.049621] ? report_bug+0x193/0x1a0 [ 364.049627] ? handle_bug+0x46/0x80 [ 364.049631] ? exc_invalid_op+0x1d/0x80 [ 364.049635] ? asm_exc_invalid_op+0x1f/0x30 [ 364.049642] ? debug_dma_map_sg+0x2dc/0x370 [ 364.049647] __dma_map_sg_attrs+0x90/0xe0 [ 364.049651] dma_map_sgtable+0x25/0x40 [ 364.049654] amdgpu_bo_move+0x59a/0x850 [amdgpu] [ 364.049935] ? srso_return_thunk+0x5/0x5f [ 364.049939] ? amdgpu_ttm_tt_populate+0x5d/0xc0 [amdgpu] [ 364.050095] ttm_bo_handle_move_mem+0xc3/0x180 [ttm] [ 364.050103] ttm_bo_validate+0xc1/0x160 [ttm] [ 364.050108] ? amdgpu_ttm_tt_get_user_pages+0xe5/0x1b0 [amdgpu] [ 364.050263] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0xa12/0xc90 [amdgpu] [ 364.050473] kfd_ioctl_alloc_memory_of_gpu+0x16b/0x3b0 [amdgpu] [ 364.050680] kfd_ioctl+0x3c2/0x530 [amdgpu] [ 364.050866] ? __pfx_kfd_ioctl_alloc_memory_of_gpu+0x10/0x10 [amdgpu] [ 364.051054] ? srso_return_thunk+0x5/0x5f [ 364.051057] ? tomoyo_file_ioctl+0x20/0x30 [ 364.051063] __x64_sys_ioctl+0x9c/0xd0 [ 364.051068] x64_sys_call+0x1219/0x20d0 [ 364.051073] do_syscall_64+0x51/0x120 [ 364.051077] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 364.051081] RIP: 0033:0x7fb2d2f1a94f Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14drm/amdgpu: skip amdgpu_device_cache_pci_state under sriovVictor Zhao
[ Upstream commit afe260df55ac280cd56306248cb6d8a6b0db095c ] Under sriov, host driver will save and restore vf pci cfg space during reset. And during device init, under sriov, pci_restore_state happens after fullaccess released, and it can have race condition with mmio protection enable from host side leading to missing interrupts. So skip amdgpu_device_cache_pci_state for sriov. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14drm/amdgpu: refine error handling in amdgpu_ttm_tt_pin_userptrLang Yu
[ Upstream commit 46186667f98fb7158c98f4ff5da62c427761ffcd ] Free sg table when dma_map_sgtable() failed to avoid memory leak. Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14drm/amdgpu: Dereference the ATCS ACPI bufferPrike Liang
[ Upstream commit 32e7ee293ff476c67b51be006e986021967bc525 ] Need to dereference the atcs acpi buffer after the method is executed, otherwise it will result in a memory leak. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts for vega20_ihVictor Lu
[ Upstream commit 8b22f048331dfd45fdfbf0efdfb1d43deff7518d ] Port this change to vega20_ih.c: commit afbf7955ff01 ("drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts") Original commit message: "Why: Setting IH_RB_WPTR register to 0 will not clear the RB_OVERFLOW bit if RB_ENABLE is not set. How to fix: Set WPTR_OVERFLOW_CLEAR bit after RB_ENABLE bit is set. The RB_ENABLE bit is required to be set, together with WPTR_OVERFLOW_ENABLE bit so that setting WPTR_OVERFLOW_CLEAR bit would clear the RB_OVERFLOW." Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.2Srinivasan Shanmugam
[ Upstream commit 9343b904e7198e4804685133327dece7fe709bc1 ] This commit adds the cleaner shader microcode for GFX9.4.2 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. Without the cleaner shader, residual data from a previous workload could potentially be accessed by a subsequent workload, leading to data leaks and incorrect computation results. The cleaner shader microcode is represented as an array of 32-bit words (`gfx_9_4_2_cleaner_shader_hex`). This array is the binary representation of the cleaner shader code, which is written in a low-level GPU instruction set. Also, this patch updates the `gfx_v9_0_sw_init` function to initialize the cleaner shader if the MEC firmware version is 88 or higher. It sets the `cleaner_shader_ptr` and `cleaner_shader_size` to the appropriate values and attempts to initialize the cleaner shader. When the cleaner shader feature is enabled, the AMDGPU driver loads this array into a specific location in the GPU memory. The GPU then reads this memory location to fetch and execute the cleaner shader instructions. The cleaner shader is executed automatically by the GPU at the end of each workload, before the next workload starts. This ensures that all GPU resources are in a clean state before the start of each workload. This change ensures that the GPU memory is properly cleared between different processes, preventing data leakage and enhancing security. It also aligns with the serialization mechanism between KGD and KFD, ensuring that the GPU state is consistent across different workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14drm/amdgpu/hdp5.2: do a posting read when flushing HDPAlex Deucher
commit f756dbac1ce1d5f9a2b35e3b55fa429cf6336437 upstream. Need to read back to make sure the write goes through. Cc: David Belanger <david.belanger@amd.com> Reviewed-by: Frank Min <frank.min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14drm/amdgpu/hdp7.0: do a posting read when flushing HDPAlex Deucher
commit 689275140cb8e9f8ae59e545086fce51fb0b994a upstream. Need to read back to make sure the write goes through. Cc: David Belanger <david.belanger@amd.com> Reviewed-by: Frank Min <frank.min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14drm/amdgpu/hdp5.0: do a posting read when flushing HDPAlex Deucher
commit cf424020e040be35df05b682b546b255e74a420f upstream. Need to read back to make sure the write goes through. Cc: David Belanger <david.belanger@amd.com> Reviewed-by: Frank Min <frank.min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14drm/amdgpu/hdp4.0: do a posting read when flushing HDPAlex Deucher
commit c9b8dcabb52afe88413ff135a0953e3cc4128483 upstream. Need to read back to make sure the write goes through. Cc: David Belanger <david.belanger@amd.com> Reviewed-by: Frank Min <frank.min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>