summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)Author
2024-12-11drm/amdgpu/sdma7: Add queue reset sysfs for sdmav7Jesse.zhang@amd.com
sdmv7 queue reset already supports by mmio, add its sys file. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-11drm/amdgpu/mes12: Implement reset gfx/compute queue function by mmioJesse.zhang@amd.com
Reset gfx/compute queue through mmio based on me_id and queue_id. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-11drm/amdgpu/mes12: Implement reset sdmav7 queue function by mmioJesse.zhang@amd.com
Reset sdma queue through mmio based on me_id and queue_id. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-11drm/amdgpu: Avoid VF for RAS recovery source checkLijo Lazar
VF device sets the RAS flag when mailbox data can't be read properly. There is no conclusive way to tell if the real source is RAS error. Therefore VF schedules a KFD based reset which doesn't set RAS source. SKip checking RAS source for any VF scheduled recovery. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: Vojislav Tomasevic <vojislav.tomasevic@amd.com> Reviewed-by: Yiqing Yao <yiqing.yao@amd.com> Tested-by: Yiqing Yao <yiqing.yao@amd.com> Fixes: e1ee2111ca48 ("drm/amdgpu: Prefer RAS recovery for scheduler hang") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-11drm/amdgpu/sdma7: implement queue reset callback for sdma7Jesse.zhang@amd.com
Implement sdma queue reset callback by mes_reset_queue_mmio. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-11drm/amdgpu/sdma7: Implement resume function for each instanceJesse.zhang@amd.com
Extracts the resume sequence for per sdma instance from sdma_v7_0_gfx_resume. This function can be used in start or restart scenarios of specific instances. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Support nbif v6_3_1 fatal error handlingCandice Li
Add nbif v6_3_1 fatal error handling support. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Enable VCN_5_0_1 IP blockSonny Jiang
Add VCN_5_0_1 IP block to kernel boot Signed-off-by: Sonny Jiang <sonjiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Add VCN_5_0_1 supportSonny Jiang
Add vcn support for VCN_5_0_1 v2: rebase, squash in fixes (Alex) Signed-off-by: Sonny Jiang <sonjiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: enable JPEG5_0_1 ip blockSathishkumar S
enable JPEG5_0_1 ip block Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Add JPEG5_0_1 supportSathishkumar S
add support for JPEG5_0_1 v2: squash in updates, rebase on IP instance changes Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Add VCN_5_0_1 codec querySonny Jiang
Support VCN_5_0_1 codec query v2: squash in updates Signed-off-by: Sonny Jiang <sonjiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Add VCN_5_0_1 firmwareSonny Jiang
Add vcn_5_0_1 firmware support Signed-off-by: Sonny Jiang <sonjiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: update macro for maximum jpeg ringsSathishkumar S
Update the macro to accomdate more rings. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Sonny Jiang <sonjiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: update irq sec header for vcn 5.0.0Alex Deucher
No functional change. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: update irq sec header for jpeg 5.0.0Alex Deucher
No functional change. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Add umc v8_14 ras functionsCandice Li
Add umc v8_14 ras functions. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Add psp v14_0_3 ras supportCandice Li
Add psp v14_0_3 ras support. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amd/amdgpu: Add Annotations to Process Isolation functionsSrinivasan Shanmugam
This update adds explanations to key functions that manage how the Kernel Fusion Driver (KFD) and Kernel Graphics Driver (KGD) share the GPU. amdgpu_gfx_enforce_isolation_wait_for_kfd: Controls the waiting period for KFD to ensure it takes turns with KGD in using the GPU. It uses a mutex to safely manage shared data, like timing and state, and tracks when KFD starts and stops waiting. amdgpu_gfx_enforce_isolation_ring_begin_use: Ensures KFD has enough time to run before new tasks are submitted to the GPU ring. It uses a mutex to synchronize access and may adjust the KFD scheduler. amdgpu_gfx_enforce_isolation_ring_end_use: Handles cleanup and state updates when finishing the use of a GPU ring. It may also adjust the KFD scheduler, using a mutex to manage shared data access. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Init mmhub v1_8_1 ras funcHawking Zhang
reuse mmhub v1_8 ras functuion Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Enable xgmi for gfx v9_5_0Shiwu Zhang
Enable xgmi for gfx v9_5_0 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Fetch refclock for SMU v13.0.12Asad Kamal
Add support to fetch refclock value for SMU v13.0.12 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amd/pm: Add mode2 support for SMU v13.0.12Asad Kamal
Add mode2 reset support for smu version 13.0.12 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amd/amdgpu: Add Descriptions to Process Isolation and Cleaner Shader ↵Srinivasan Shanmugam
Sysfs Functions This update adds explanations to key functions related to process isolation and cleaner shader execution sysfs interfaces. - `amdgpu_gfx_set_run_cleaner_shader`: Describes how to manually run a cleaner shader, which clears the Local Data Store (LDS) and General Purpose Registers (GPRs) to ensure data isolation between GPU workloads. - `amdgpu_gfx_get_enforce_isolation`: Describes how to query the current settings of the 'enforce_isolation' feature for each GPU partition. - `amdgpu_gfx_set_enforce_isolation`: Describes how to enable or disable process isolation for GPU partitions through the sysfs interface. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Enable RAS for psp v13_0_12Hawking Zhang
Enable RAS Cap check and initialize RAS funcs for psp v13_0_12 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Load spdm_drv for psp v13_0_12Hawking Zhang
spdm_drv is a firmware that needs to be loaded in driver initialization phase. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Add psp v13_0_12 firmware specifiersHawking Zhang
Add psp v13_0_12 firmware specifiers for sos and ta Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Shiwu Zhang <shiwu.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: add psp 13_0_12 version supportLe Ma
Add support for new psp 13_0_12 version Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amd: Show an info message about optional firmware missingMario Limonciello
With the warning from the core about missing firmware gone, users still may be notified of missing optional firmware by a more friendly message to clarify it's optional. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: add ACA support for jpeg v4.0.3Yang Wang
Add ACA support for jpeg v4.0.3. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: add ACA support for vcn v4.0.3Yang Wang
v1: Add ACA support for vcn v4.0.3. v2: - split VCN ACA(v1) to 2 parts: vcn and jpeg. - move mmSMNAID_AID0_MCA_SMU to amdgpu_aca.h file. v3: - split JPEG ACA to another patch. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: move common ACA ipid defines into amdgpu_aca.hYang Wang
move common ACA ipid defines into amdgpu_aca.h file. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: add ih cam support for IH 4.4.4Alex Sierra
Same as IH 4.4.2. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: add initial support for sdma444Le Ma
add sdma444 basic support Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Increase FRU File Id buffer sizeLijo Lazar
Some boards use longer File Ids. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: correct the calculation of RAS bad pageTao Zhou
After the introduction of NPS RAS, one bad page record on eeprom may be related to 1 or 16 bad pages, so the bad page record and bad page are two different concepts, define a new variable to store bad page number. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: split ras_eeprom_init into init and check functionsTao Zhou
Init function is for ras table header read and check function is responsible for the validation of the header. Call them in different stages. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amd: Add the capability to mark certain firmware as "required"Mario Limonciello
Some of the firmware that is loaded by amdgpu is not actually required. For example the ISP firmware on some SoCs is optional, and if it's not present the ISP IP block just won't be initialized. The firmware loader core however will show a warning when this happens like this: ``` Direct firmware load for amdgpu/isp_4_1_0.bin failed with error -2 ``` To avoid confusion for non-required firmware, adjust the amd-ucode helper to take an extra argument indicating if the firmware is required or optional. On optional firmware use firmware_request_nowarn() instead of request_firmware() to avoid the warnings. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/amd-gfx/df71d375-7abd-4b32-97ce-15e57846eed8@amd.com/T/#t Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Apply gc v9_5_0 golden settingsHawking Zhang
Apply gc v9_5_0 golden settings. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amd: update mtype flags for gfx 9.5.0Alex Sierra
Update mtype flags to meet gfx 9.5.0 requirements for remote GPU memory and system memory. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: add initial support for gfx950Le Ma
add gfx950 basic support Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu/gfx: add gfx950 microcodeLe Ma
Add firmware declarations. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amd: define gc ip version local variableAlex Sierra
For better readability. Also leftover orphaned code. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Remove gfxoff usageLijo Lazar
GFXOFF is not valid for these IP versions. Also, SDMA v4.4.2 is not in GFX domain. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Avoid to release the FW twice in the validated errorPrike Liang
There will to release the FW twice when the FW validated error. Even if the release_firmware() will further validate the FW whether is empty, but that will be redundant and inefficient. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: device: fix spellos and punctuationRandy Dunlap
Make spelling and punctuation changes to ease reading of the comments. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Add amdgpu_vcn_sched_mask debugfsSathishkumar S
Add debugfs entry to enable or disable job submission to specific vcn instances. The entry is created only when there is more than an instance and is unified queue type. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: return error when eeprom checksum failedJinzhou Su
Return eeprom table checksum error result, otherwise it might be overwritten by next call. V2: replace DRM_ERROR with dev_err Signed-off-by: Jinzhou Su <jinzhou.su@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amd: Add Suspend/Hibernate notification callback supportMario Limonciello
As part of the suspend sequence VRAM needs to be evicted on dGPUs. In order to make suspend/resume more reliable we moved this into the pmops prepare() callback so that the suspend sequence would fail but the system could remain operational under high memory usage suspend. Another class of issues exist though where due to memory fragementation there isn't a large enough contiguous space and swap isn't accessible. Add support for a suspend/hibernate notification callback that could evict VRAM before tasks are frozen. This should allow paging out to swap if necessary. Link: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3476 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3781 Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://lore.kernel.org/r/20241128032656.2090059-2-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10drm/amdgpu: Simplify cleanup check for FRU sysfsLijo Lazar
FRU info is expected to be non-NULL if FRU sys files are created. Simplify the check. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>