summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-08-20drm/amdgpu: Enforce isolation as part of the jobSrinivasan Shanmugam
This patch adds a new parameter 'enforce_isolation' to the amdgpu_job structure. This parameter is used to determine whether shader isolation should be enforced for a job. The enforce_isolation parameter is then stored in the amdgpu_job structure and used when flushing the VM. The enforce_isolation field of the amdgpu_job structure is set directly after the job is allocated This change allows more fine-grained control over shader isolation, making it possible to enforce isolation on a per-job basis rather than globally. This can be useful in scenarios where only certain jobs require isolation. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>
2024-08-16drm/amdgpu: abort KIQ waits when there is a pending resetVictor Skvortsov
Stop waiting for the KIQ to return back when there is a reset pending. It's quite likely that the KIQ will never response. Signed-off-by: Koenig Christian <Christian.Koenig@amd.com> Suggested-by: Lazar Lijo <Lijo.Lazar@amd.com> Tested-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: Make enforce_isolation setting per GPUSrinivasan Shanmugam
This commit makes enforce_isolation setting to be per GPU and per partition by adding the enforce_isolation array to the adev structure. The adev variable is set based on the global enforce_isolation module parameter during device initialization. In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU is used to determine whether to enforce isolation between graphics and compute processes on that GPU. In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU and partition is used to determine whether to enforce isolation between graphics and compute processes on that GPU and partition. This allows the enforce_isolation setting to be controlled individually for each GPU and each partition, which is useful in a system with multiple GPUs and partitions where different isolation settings might be desired for different GPUs and partitions. v2: fix loop in amdgpu_vmid_mgr_init() (Alex) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>
2024-08-16drm/amdgpu: Emit cleaner shader at end of IB submissionAlex Deucher
This commit introduces the emission of a cleaner shader at the end of the IB submission process. This is achieved by adding a new function pointer, `emit_cleaner_shader`, to the `amdgpu_ring_funcs` structure. If the `emit_cleaner_shader` function is set in the ring functions, it is called during the VM flush process. The cleaner shader is only emitted if the `enable_cleaner_shader` flag is set in the `amdgpu_device` structure. This allows the cleaner shader emission to be controlled on a per-device basis. By emitting a cleaner shader at the end of the IB submission, we can ensure that the VM state is properly cleaned up after each submission. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>
2024-08-16drm/amdgpu: Add infrastructure for Cleaner Shader featureSrinivasan Shanmugam
The cleaner shader is used by the CP firmware to clean LDS and GPRs between processes on the CUs. This adds an internal API for GFX IP code to allocate and initialize the cleaner shader. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>
2024-08-16drm/amdgpu: handle enforce isolation on non-0 gfxhubAlex Deucher
Some chips have more than one gfxhub so check if we are a gfxhub rather than just gfxhub 0. Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1Alex Deucher
The workaround seems to cause stability issues on other SDMA 5.2.x IPs. Fixes: a03ebf116303 ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556 Acked-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add vcn ip dump support for vcn_v2_6Sunil Khatri
Add support for logging the registers in devcoredump buffer for vcn_v2_6. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add print support for vcn_v2_5 ip dumpSunil Khatri
Add support for logging the registers in devcoredump buffer for vcn_v2_5. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add vcn_v2_5 ip dump supportSunil Khatri
Add support of vcn ip dump in the devcoredump for vcn_v2_5. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add print support for vcn_v2_0 ip dumpSunil Khatri
Add support for logging the registers in devcoredump buffer for vcn_v2_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add vcn_v2_0 ip dump supportSunil Khatri
Add support of vcn ip dump in the devcoredump for vcn_v2_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add print support for vcn_v1_0 ip dumpSunil Khatri
Add support for logging the registers in devcoredump buffer for vcn_v1_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add vcn_v1_0 ip dump supportSunil Khatri
Add support of vcn ip dump in the devcoredump for vcn_v1_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add print support for vcn_v4_0_5 ip dumpSunil Khatri
Add support for logging the registers in devcoredump buffer for vcn_v4_0_5. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add print support for vcn_v4_0 ip dumpSunil Khatri
Add support for logging the registers in devcoredump buffer for vcn_v4_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add print support for vcn_v4_0_3 ip dumpSunil Khatri
Add support for logging the registers in devcoredump buffer for vcn_v4_0_3. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add vcn_v4_0_5 ip dump supportSunil Khatri
Add support of vcn ip dump in the devcoredump for vcn_v4_0_5. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add vcn_v4_0 ip dump supportSunil Khatri
Add support of vcn ip dump in the devcoredump for vcn_v4_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add vcn_v4_0_3 ip dump supportSunil Khatri
Add support of vcn ip dump in the devcoredump for vcn_v4_0_3. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add print support for vcn_v5_0 ip dumpSunil Khatri
Add support for logging the registers in devcoredump buffer for vcn_v5_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/mes12: add API for user queue resetAlex Deucher
Add API for resetting user queues. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/mes11: add API for user queue resetAlex Deucher
Add API for resetting user queues. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/mes: add API for user queue resetAlex Deucher
Add API for resetting user queues. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx11: export gfx_v11_0_request_gfx_index_mutex()Alex Deucher
It will be used by the queue reset code. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx11: add a mutex for the gfx semaphoreAlex Deucher
This will be used in more places in the future so add a mutex. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx11: enter safe mode before touching CP_INT_CNTLAlex Deucher
Need to enter safe mode before touching GC MMIO. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx7: add ring reset callback for gfxAlex Deucher
Add ring reset callback for gfx. v2: fix operator precedence (kernel test robot) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx8: add ring reset callback for gfxAlex Deucher
Add ring reset callback for gfx. v2: fix operator precedence (kernel test robot) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add vcn_v5_0 ip dump supportSunil Khatri
Add support of vcn ip dump in the devcoredump for vcn_v5_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add print support for vcn_v3_0 ip dumpSunil Khatri
Add support for logging the registers in devcoredump buffer for vcn_v3_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add vcn_v3_0 ip dump supportSunil Khatri
Add support of vcn ip dump in the devcoredump for vcn_v3_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: add vcn ip dump ptr in vcn global structSunil Khatri
Add pointer to the vcn ip dump in the vcn global structure to be accessible for all vcn version via global adev. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amd: Remove unused declarationsZhang Zekun
amdgpu_gart_table_vram_pin() and amdgpu_gart_table_vram_unpin() has been removed since commit 575e55ee4fbc ("drm/amdgpu: recover gart table at resume") remain the declarations untouched in the header files. Besides, amdgpu_dm_display_resume() has also beed removed since commit a80aa93de1a0 ("drm/amd/display: Unify dm resume sequence into a single call"). So, let's remove this unused declarations. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/radeon/evergreen_cs: fix int overflow errors in cs track offsetsNikita Zhandarovich
Several cs track offsets (such as 'track->db_s_read_offset') either are initialized with or plainly take big enough values that, once shifted 8 bits left, may be hit with integer overflow if the resulting values end up going over u32 limit. Same goes for a few instances of 'surf.layer_size * mslice' multiplications that are added to 'offset' variable - they may potentially overflow as well and need to be validated properly. While some debug prints in this code section take possible overflow issues into account, simply casting to (unsigned long) may be erroneous in its own way, as depending on CPU architecture one is liable to get different results. Fix said problems by: - casting 'offset' to fixed u64 data type instead of ambiguous unsigned long. - casting one of the operands in vulnerable to integer overflow cases to u64. - adjust format specifiers in debug prints to properly represent 'offset' values. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: 285484e2d55e ("drm/radeon: add support for evergreen/ni tiling informations v11") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: fixing rlc firmware loading failure issueYang Wang
Skip rlc firmware validation to ignore firmware header size mismatch issues. This restores the workaround added in commit 849e133c973c ("drm/amdgpu: Fix the null pointer when load rlc firmware") Fixes: 3af2c80ae2f5 ("drm/amdgpu: refine gfx10 firmware loading") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3551 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: remove ME0 registers from mi300 dumpSunil Khatri
Remove ME0 registers from MI300 gfx_9_4_3 ipdump MI300 does not have gfx ME and hence those register are just empty one and could be dropped. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx9: use rlc safe mode for soft recoveryAlex Deucher
Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx9.4.3: use rlc safe mode for soft recoveryAlex Deucher
Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx9.4.3: use proper rlc safe mode helpersAlex Deucher
Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx9: use proper rlc safe mode helpersAlex Deucher
Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx9: add ring reset callback for gfxAlex Deucher
Add ring reset callback for gfx. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx9: per queue reset only on bare metalAlex Deucher
It's not supported under SR-IOV at the moment. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx9.4.3: implement reset_hw_queue for gfx9.4.3Jiadong Zhu
Using mmio to do queue reset. Enter safe mode before writing mmio registers. v2: set register instance offset according to xcc id. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx9: implement reset_hw_queue for gfx9Jiadong Zhu
Using mmio to do queue reset. Enter safe mode when writing registers. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx: add a new kiq_pm4_funcs callback for reset_hw_queueJiadong Zhu
Add reset_hw_queue in kiq_pm4_funcs callbacks. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx_9.4.3: wait for reset done before remapJiadong Zhu
There is a racing condition that cp firmware modifies MQD in reset sequence after driver updates it for remapping. We have to wait till CP_HQD_ACTIVE becoming false then remap the queue. v2: fix KIQ locking (Alex) v3: fix KIQ locking harder Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx9.4.3: remap queue after reset successfullyJiadong Zhu
Kiq command unmap_queues only does the dequeueing action. We have to map the queue back with clean mqd. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx9.4.3: add ring reset callbackAlex Deucher
Add ring reset callback for compute. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx9: wait for reset done before remapJiadong Zhu
There is a racing condition that cp firmware modifies MQD in reset sequence after driver updates it for remapping. We have to wait till CP_HQD_ACTIVE becoming false then remap the queue. v2: fix KIQ locking (Alex) v3: fix KIQ locking harder Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>