summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)Author
2024-09-18drm/amd/amdgpu: apply command submission parser for JPEG v1David (Ming Qiang) Wu
commit 8409fb50ce48d66cf9dc5391f03f05c56c430605 upstream. Similar to jpeg_v2_dec_ring_parse_cs() but it has different register ranges and a few other registers access. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3d5adbdf1d01708777f2eda375227cbf7a98b9fe) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-12drm/amdgpu: handle gfx12 in amdgpu_display_verify_sizesMarek Olšák
[ Upstream commit 8dd1426e2c80e32ac1995007330c8f95ffa28ebb ] It verified GFX9-11 swizzle modes on GFX12, which has undefined behavior. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12drm/amdgpu: reject gang submit on reserved VMIDsChristian König
[ Upstream commit 320debca1ba3a81c87247eac84eff976ead09ee0 ] A gang submit won't work if the VMID is reserved and we can't flush out VM changes from multiple engines at the same time. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12drm/amdgpu: Set no_hw_access when VF request full GPU failsYifan Zha
[ Upstream commit 33f23fc3155b13c4a96d94a0a22dc26db767440b ] [Why] If VF request full GPU access and the request failed, the VF driver can get stuck accessing registers for an extended period during the unload of KMS. [How] Set no_hw_access flag when VF request for full GPU access fails This prevents further hardware access attempts, avoiding the prolonged stuck state. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12drm/amdgpu: check for LINEAR_ALIGNED correctly in check_tiling_flags_gfx6Marek Olšák
[ Upstream commit 11317d2963fa79767cd7c6231a00a9d77f2e0f54 ] Fix incorrect check. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12drm/amdgpu: clear RB_OVERFLOW bit when enabling interruptsDanijel Slivka
[ Upstream commit afbf7955ff01e952dbdd465fa25a2ba92d00291c ] Why: Setting IH_RB_WPTR register to 0 will not clear the RB_OVERFLOW bit if RB_ENABLE is not set. How to fix: Set WPTR_OVERFLOW_CLEAR bit after RB_ENABLE bit is set. The RB_ENABLE bit is required to be set, together with WPTR_OVERFLOW_ENABLE bit so that setting WPTR_OVERFLOW_CLEAR bit would clear the RB_OVERFLOW. Signed-off-by: Danijel Slivka <danijel.slivka@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12drm/amdgpu: Fix smatch static checker warningHawking Zhang
[ Upstream commit bdbdc7cecd00305dc844a361f9883d3a21022027 ] adev->gfx.imu.funcs could be NULL Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amdgpu: add lock in amdgpu_gart_invalidate_tlbYunxiang Li
[ Upstream commit 18f2525d31401e5142db95ff3a6ec0f4147be818 ] We need to take the reset domain lock before flush hdp. We can't put the lock inside amdgpu_device_flush_hdp itself because it is used during reset where we already take the write side lock. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amdgpu: add skip_hw_access checks for sriovYunxiang Li
[ Upstream commit b3948ad1ac582f560e1f3aeaecf384619921c48d ] Accessing registers via host is missing the check for skip_hw_access and the lockdep check that comes with it. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amdgu: fix Unintentional integer overflow for mall sizeJesse Zhang
[ Upstream commit c09d2eff81a997c169e0cacacd6b60c5e3aa33f2 ] Potentially overflowing expression mall_size_per_umc * adev->gmc.num_umc with type unsigned int (32 bits, unsigned) is evaluated using 32-bit arithmetic,and then used in a context that expects an expression of type u64 (64 bits, unsigned). Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amdgpu: update type of buf size to u32 for eeprom functionsTao Zhou
[ Upstream commit 2aadb520bfacec12527effce3566f8df55e5d08e ] Avoid overflow issue. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/kfd: Correct pinned buffer handling at kfd restore and validate processXiaogang Chen
[ Upstream commit f326d7cc745683f53052b84382bd10567b45cd5d ] This reverts commit 8a774fe912ff ("drm/amdgpu: avoid restore process run into dead loop") since buffer got pinned is not related whether it needs mapping And skip buffer validation at kfd driver if the buffer has been pinned. Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amdgpu: the warning dereferencing obj for nbio_v7_4Jesse Zhang
[ Upstream commit d190b459b2a4304307c3468ed97477b808381011 ] if ras_manager obj null, don't print NBIO err data Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amdgpu: fix the waring dereferencing hiveJesse Zhang
[ Upstream commit 1940708ccf5aff76de4e0b399f99267c93a89193 ] Check the amdgpu_hive_info *hive that maybe is NULL. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amdgpu: fix dereference after null checkJesse Zhang
[ Upstream commit b1f7810b05d1950350ac2e06992982974343e441 ] check the pointer hive before use. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amdgpu: Fix the warning division or modulo by zeroJesse Zhang
[ Upstream commit 1a00f2ac82d6bc6689388c7edcd2a4bd82664f3c ] Checks the partition mode and returns an error for an invalid mode. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amdgpu: fix mc_data out-of-bounds read warningTim Huang
[ Upstream commit 51dfc0a4d609fe700750a62f41447f01b8c9ea50 ] Clear warning that read mc_data[i-1] may out-of-bounds. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amdgpu: fix ucode out-of-bounds read warningTim Huang
[ Upstream commit 8944acd0f9db33e17f387fdc75d33bb473d7936f ] Clear warning that read ucode[] may out-of-bounds. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amdgpu: Fix out-of-bounds read of df_v1_7_channel_numberMa Jun
[ Upstream commit d768394fa99467bcf2703bde74ddc96eeb0b71fa ] Check the fb_channel_number range to avoid the array out-of-bounds read error Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amdgpu: Fix out-of-bounds write warningMa Jun
[ Upstream commit be1684930f5262a622d40ce7a6f1423530d87f89 ] Check the ring type value to fix the out-of-bounds write warning Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amdgpu: Fix the uninitialized variable warningMa Jun
[ Upstream commit 7e39d7ec35883a168343ea02f40e260e176c6c63 ] Check the user input and phy_id value range to fix "Using uninitialized value phy_id" Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amd/amdgpu: Check tbo resource pointerAsad Kamal
[ Upstream commit 6cd2b872643bb29bba01a8ac739138db7bd79007 ] Validate tbo resource pointer, skip if NULL Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amdgpu: avoid reading vf2pf info size from FBZhigang Luo
[ Upstream commit 3bcc0ee14768d886cedff65da72d83d375a31a56 ] VF can't access FB when host is doing mode1 reset. Using sizeof to get vf2pf info size, instead of reading it from vf2pf header stored in FB. Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amdgpu: fix overflowed array index read warningTim Huang
[ Upstream commit ebbc2ada5c636a6a63d8316a3408753768f5aa9f ] Clear overflowed array index read warning by cast operation. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08drm/amdgpu: Fix uninitialized variable warning in amdgpu_afmt_acrMa Jun
[ Upstream commit c0d6bd3cd209419cc46ac49562bef1db65d90e70 ] Assign value to clock to fix the warning below: "Using uninitialized value res. Field res.clock is uninitialized" Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29drm/amdgpu/vcn: not pause dpg for unified queueBoyuan Zhang
commit 7d75ef3736a025db441be652c8cc8e84044a215f upstream. For unified queue, DPG pause for encoding is done inside VCN firmware, so there is no need to pause dpg based on ring type in kernel. For VCN3 and below, pausing DPG for encoding in kernel is still needed. v2: add more comments v3: update commit message Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-29drm/amdgpu/vcn: identify unified queue in sw initBoyuan Zhang
commit ecfa23c8df7ef3ea2a429dfe039341bf792e95b4 upstream. Determine whether VCN using unified queue in sw_init, instead of calling functions later on. v2: fix coding style Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-29drm/amdgpu: Validate TA binary sizeCandice Li
commit c99769bceab4ecb6a067b9af11f9db281eea3e2a upstream. Add TA binary size validation to avoid OOB write. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c0a04e3570d72aaf090962156ad085e37c62e442) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-29drm/amdkfd: reserve the BO before validating itLang Yu
[ Upstream commit 0c93bd49576677ae1a18817d5ec000ef031d5187 ] Fix a warning. v2: Avoid unmapping attachment repeatedly when ERESTARTSYS. v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix) [ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.708989] Call Trace: [ 41.708992] <TASK> [ 41.708996] ? show_regs+0x6c/0x80 [ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709008] ? __warn+0x93/0x190 [ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709024] ? report_bug+0x1f9/0x210 [ 41.709035] ? handle_bug+0x46/0x80 [ 41.709041] ? exc_invalid_op+0x1d/0x80 [ 41.709048] ? asm_exc_invalid_op+0x1f/0x30 [ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709337] ? srso_alias_return_thunk+0x5/0x7f [ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu] [ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu] [ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu] [ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu] [ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu] [ 41.709945] ? srso_alias_return_thunk+0x5/0x7f [ 41.709949] ? tomoyo_file_ioctl+0x20/0x30 [ 41.709959] __x64_sys_ioctl+0x9c/0xd0 [ 41.709967] do_syscall_64+0x3f/0x90 [ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: 101b8104307e ("drm/amdkfd: Move dma unmapping after TLB flush") Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29drm/amd/amdgpu: command submission parser for JPEGDavid (Ming Qiang) Wu
[ Upstream commit 470516c2925493594a690bc4d05b1f4471d9f996 ] Add JPEG IB command parser to ensure registers in the command are within the JPEG IP block. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a7f670d5d8e77b092404ca8a35bb0f8f89ed3117) Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29drm/amdgpu: fix dereference null return value for the function ↵Jesse Zhang
amdgpu_vm_pt_parent [ Upstream commit 511a623fb46a6cf578c61d4f2755783c48807c77 ] The pointer parent may be NULLed by the function amdgpu_vm_pt_parent. To make the code more robust, check the pointer parent. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29drm/amdkfd: Move dma unmapping after TLB flushPhilip Yang
[ Upstream commit 101b8104307eac734f2dfa4d3511430b0b631c73 ] Otherwise GPU may access the stale mapping and generate IOMMU IO_PAGE_FAULT. Move this to inside p->mutex to prevent multiple threads mapping and unmapping concurrently race condition. After kfd_mem_dmaunmap_attachment is removed from unmap_bo_from_gpuvm, kfd_mem_dmaunmap_attachment is called if failed to map to GPUs, and before free the mem attachment in case failed to unmap from GPUs. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29drm/amdgpu: access RLC_SPM_MC_CNTL through MMIO in SRIOV runtimeZhenGuo Yin
[ Upstream commit 9f05cfc78c6880e06940ea78fbc43f6392710f17 ] Register RLC_SPM_MC_CNTL is not blocked by L1 policy, VF can directly access it through MMIO during SRIOV runtime. v2: use SOC15 interface to access registers Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29drm/amd/amdgpu/imu_v11_0: Increase buffer size to ensure all possible values ↵Lee Jones
can be stored [ Upstream commit a728342ae4ec2a7fdab0038b11427579424f133e ] Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/imu_v11_0.c: In function ‘imu_v11_0_init_microcode’: drivers/gpu/drm/amd/amdgpu/imu_v11_0.c:52:54: warning: ‘_imu.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] drivers/gpu/drm/amd/amdgpu/imu_v11_0.c:52:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29drm/amdgpu/jpeg4: properly set atomics vmid fieldAlex Deucher
commit e6c6bd6253e792cee6c5c065e106e87b9f0d9ae9 upstream. This needs to be set as well if the IB uses atomics. Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c6c2e8b6a427d4fecc7c36cffccb908185afcab2) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-29drm/amdgpu/jpeg2: properly set atomics vmid fieldAlex Deucher
commit e414a304f2c5368a84f03ad34d29b89f965a33c9 upstream. This needs to be set as well if the IB uses atomics. Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 35c628774e50b3784c59e8ca7973f03bcb067132) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-29drm/amdgpu: Actually check flags for all context ops.Bas Nieuwenhuizen
commit 0573a1e2ea7e35bff08944a40f1adf2bb35cea61 upstream. Missing validation ... Checked libdrm and it clears all the structs, so we should be safe to just check everything. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c6b86421f1f9ddf9d706f2453159813ee39d0cf9) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-14drm/amdgpu: Forward soft recovery errors to userspaceJoshua Ashton
commit 829798c789f567ef6ba4b084c15b7b5f3bd98d51 upstream. As we discussed before[1], soft recovery should be forwarded to userspace, or we can get into a really bad state where apps will keep submitting hanging command buffers cascading us to a hard reset. 1: https://lore.kernel.org/all/bf23d5ed-9a6b-43e7-84ee-8cbfd0d60f18@froggi.es/ Signed-off-by: Joshua Ashton <joshua@froggi.es> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 434967aadbbbe3ad9103cc29e9a327de20fdba01) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-14drm/amdgpu: Add lock around VF RLCG interfaceVictor Skvortsov
[ Upstream commit e864180ee49b4d30e640fd1e1d852b86411420c9 ] flush_gpu_tlb may be called from another thread while device_gpu_recover is running. Both of these threads access registers through the VF RLCG interface during VF Full Access. Add a lock around this interface to prevent race conditions between these threads. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14drm/admgpu: fix dereferencing null pointer contextJesse Zhang
[ Upstream commit 030ffd4d43b433bc6671d9ec34fc12c59220b95d ] When user space sets an invalid ta type, the pointer context will be empty. So it need to check the pointer context before using it Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14drm/amdgpu: Fix the null pointer dereference to ras_managerMa Jun
[ Upstream commit 4c11d30c95576937c6c35e6f29884761f2dddb43 ] Check ras_manager before using it Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14drm/amdgpu: fix potential resource leak warningTim Huang
[ Upstream commit 22a5daaec0660dd19740c4c6608b78f38760d1e6 ] Clear resource leak warning that when the prepare fails, the allocated amdgpu job object will never be released. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03drm/amd/amdgpu: Fix uninitialized variable warningsMa Ke
commit df65aabef3c0327c23b840ab5520150df4db6b5f upstream. Return 0 to avoid returning an uninitialized variable r. Cc: stable@vger.kernel.org Fixes: 230dd6bb6117 ("drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6472de66c0aa18d50a4b5ca85f8272e88a737676) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03drm/amdgpu: reset vm state machine after gpu reset(vram lost)ZhenGuo Yin
commit 5659b0c93a1ea02c662a030b322093203f299185 upstream. [Why] Page table of compute VM in the VRAM will lost after gpu reset. VRAM won't be restored since compute VM has no shadows. [How] Use higher 32-bit of vm->generation to record a vram_lost_counter. Reset the VM state machine when vm->genertaion is not equal to the new generation token. v2: Check vm->generation instead of calling drm_sched_entity_error in amdgpu_vm_validate. v3: Use new generation token instead of vram_lost_counter for check. Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org (cherry picked from commit 47c0388b0589cb481c294dcb857d25a214c46eb3) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03drm/amdgpu/sdma5.2: Update wptr registers as well as doorbellAlex Deucher
commit a03ebf116303e5d13ba9a2b65726b106cb1e96f6 upstream. We seem to have a case where SDMA will sometimes miss a doorbell if GFX is entering the powergating state when the doorbell comes in. To workaround this, we can update the wptr via MMIO, however, this is only safe because we disallow gfxoff in begin_ring() for SDMA 5.2 and then allow it again in end_ring(). Enable this workaround while we are root causing the issue with the HW team. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/3440 Tested-by: Friedrich Vock <friedrich.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org (cherry picked from commit f2ac52634963fc38e4935e11077b6f7854e5d700) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1Tim Van Patten
[ Upstream commit 1446226d32a45bb7c4f63195a59be8c08defe658 ] The following commit updated gmc->noretry from 0 to 1 for GC HW IP 9.3.0: commit 5f3854f1f4e2 ("drm/amdgpu: add more cases to noretry=1") This causes the device to hang when a page fault occurs, until the device is rebooted. Instead, revert back to gmc->noretry=0 so the device is still responsive. Fixes: 5f3854f1f4e2 ("drm/amdgpu: add more cases to noretry=1") Signed-off-by: Tim Van Patten <timvp@google.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03drm/amdgpu: Check if NBIO funcs are NULL in amdgpu_device_baco_exitFriedrich Vock
[ Upstream commit 0cdb3f9740844b9d95ca413e3fcff11f81223ecf ] The special case for VM passthrough doesn't check adev->nbio.funcs before dereferencing it. If GPUs that don't have an NBIO block are passed through, this leads to a NULL pointer dereference on startup. Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de> Fixes: 1bece222eabe ("drm/amdgpu: Clear doorbell interrupt status for Sienna Cichlid") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03drm/amdgpu: Fix memory range calculationLijo Lazar
[ Upstream commit ce798376ef6764de51d8f4684ae525b55df295fa ] Consider the 16M reserved region also before range calculation for GMC 9.4.3 SOCs. Fixes: a433f1f59484 ("drm/amdgpu: Initialize memory ranges for GC 9.4.3") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-27drm/amdgpu: Fix signedness bug in sdma_v4_0_process_trap_irq()Dan Carpenter
commit 6769a23697f17f9bf9365ca8ed62fe37e361a05a upstream. The "instance" variable needs to be signed for the error handling to work. Fixes: 8b2faf1a4f3b ("drm/amdgpu: add error handle to avoid out-of-bounds") Reviewed-by: Bob Zhou <bob.zhou@amd.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Siddh Raman Pant <siddh.raman.pant@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-25drm/amdgpu: Indicate CU havest info to CPHarish Kasiviswanathan
[ Upstream commit 49c9ffabde555c841392858d8b9e6cf58998a50c ] To achieve full occupancy CP hardware needs to know if CUs in SE are symmetrically or asymmetrically harvested v2: Reset is_symmetric_cus for each loop Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>