summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd
AgeCommit message (Collapse)Author
2025-06-18drm/amdgpu/sdma5.2: init engine reset mutexAlex Deucher
Missing the mutex init. Fixes: 47454f2dc0bf ("drm/amdgpu: Register the new sdma function pointers for sdma_v5_2") Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdkfd: Fix race in GWS queue schedulingJay Cornwall
q->gws is not updated atomically with qpd->mapped_gws_queue. If a runlist is created between pqm_set_gws and update_queue it will contain a queue which uses GWS in a process with no GWS allocated. This will result in a scheduler hang. Use q->properties.is_gws which is changed while holding the DQM lock. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu/sdma5: init engine reset mutexAlex Deucher
Missing the mutex init. Fixes: e56d4bf57fab ("drm/amdgpu/: drm/amdgpu: Register the new sdma function pointers for sdma_v5_0") Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: switch job hw_fence to amdgpu_fenceAlex Deucher
Use the amdgpu fence container so we can store additional data in the fence. This also fixes the start_time handling for MCBP since we were casting the fence to an amdgpu_fence and it wasn't. Fixes: 3f4c175d62d8 ("drm/amdgpu: MCBP based on DRM scheduler (v9)") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: Add xgmi API to set max speed/widthLijo Lazar
Add an API to set the max possible xgmi speed/width. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: Deprecate xgmi_link_speed enumLijo Lazar
xgmi doesn't have discrete max speeds defined. Speed numbers can be arbitrary based on SOC. Deprecate the enum. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: Extend bus status check to more casesLijo Lazar
In case of unexpected errors, check if device is alive on the bus. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/pm: Report pldm version and board voltageLijo Lazar
Add support to report PLDM firmware version and board voltage on SMU v13.0.12 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/pm: Update SMU v13.0.12 pmfw headerLijo Lazar
Update PMFW metrics table definition to version 0x13 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: reclaim psp fw reservation memory regionFrank Min
PSP v14 fw update introduced changes on memory reservation region, according to the change driver reclaim some non-reserved region. 1. introduce 2 new psp commands to query fw reservation regions 2. add a new reservation region for psp 3. reclaim psp non-used region Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: refine usage of amdgpu_bad_page_thresholdganglxie
when amdgpu_bad_page_threshold == -1 or -2, driver will issue a warning message when threshold is reached and continue runtime services. Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: Fix SDMA UTC_L1 handling during start/stop sequencesJesse Zhang
This commit makes two key fixes to SDMA v4.4.2 handling: 1. disable UTC_L1 in sdma_cntl register when stopping SDMA engines by reading the current value before modifying UTC_L1_ENABLE bit. 2. Ensure UTC_L1_ENABLE is consistently managed by: - Adding the missing register write when enabling UTC_L1 during start - Keeping UTC_L1 enabled by default as per hardware requirements v2: Correct SDMA_CNTL setting (Philip) Suggested-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: Release reset locks during failuresLijo Lazar
Make sure to release reset domain lock in case of failures. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Fixes: 11bb33766f66 ("drm/amdgpu: refactor amdgpu_device_gpu_recover") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/pm: set pcie default dpm table when updating pcie dpm parametersKenneth Feng
set pcie default dpm table when updating pcie dpm parameters Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/pm: move the dpm table setting back after featureenablementKenneth Feng
move the dpm table setting back after featureenablemend due to dependancy. For SMUv13.0.6, there is no pptable. Those frequency tables are available through FW metrics and it needs DPM to be enabled. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu/sdma: handle paging queues in amdgpu_sdma_reset_engine()Alex Deucher
Need to properly start and stop paging queues if they are present. This is not an issue today since we don't support a paging queue on any chips with queue reset. Fixes: b22659d5d352 ("drm/amdgpu: switch amdgpu_sdma_reset_engine to use the new sdma function pointers") Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/display: Promote DC to 3.2.338Taimur Hassan
DC v3.2.338 summary: * DML bug fixes * Add pwait to DMCUB hang reporting * New definitions / changes to prep for new platforms. * Misc cleanups Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/display: Removing Unused DPP FunctionsRyan Seto
[Why & How] The functions in this commit are defined for dpp401 but never used. Removing them as they are not necessary. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Ryan Seto <ryanseto@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/display: add APG struct to stream_enc for future useCharlene Liu
some new asics will have an APG instance taking over certain functions. Reviewed-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Signed-off-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/display: prepare for new platformKarthi Kandasamy
Expose some functions for new platform use Reviewed-by: Nevenko Stupar <nevenko.stupar@amd.com> Signed-off-by: Karthi Kandasamy <karthi.kandasamy@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/display: Add pwait status to DMCUB diagnosticsNicholas Kazlauskas
[Why] To know if the firmware is idle when logging. [How] Add the pwait status to the DMCUB diagnostics. Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/display: Check dce_hwseq before dereferencing itAlex Hung
[WHAT] hws was checked for null earlier in dce110_blank_stream, indicating hws can be null, and should be checked whenever it is used. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/display: Disable common modes for eDPMario Limonciello
[Why] Common modes are added to eDP for compatibility in clone mode, but not all panels support them. Non-native modes were disabled in the past but this caused problems because compositors didn't use scaling for non native modes. Now non-native modes on eDP will enable the scaler by default. [How] Check the connector type. If the connector is eDP avoid adding common modes. Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/display: Use scaling for non-native resolutions on eDPMario Limonciello
[Why] Common resolutions are added to supported modes to enable compatibility scenarios that compositors may use to do things like clone displays. There is no guarantee however that the panel will natively support these modes. [How] If the compositor hasn't enabled scaling but a non-native resolution has been picked for an eDP panel turn the scaler on anyway. This will ensure compatibility. Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/display: remove use_native_pstate_optimizationYan Li
[Why] In DML2 (not DML2.1), DCN35 and DCN351 have the default value for use_native_pstate_optimization set to true. The code path where this bit is false is not used. [How] Remove the bit and the corresponding code path when it is set to false. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Yan Li <yan.li@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/display: apply two different methods to validate modesYan Li
[Why] In DML2, the current method to determine a mode is supported involves checking the voltage levels sequentially from the lowest, until one is found that can support the mode. It causes cursor lag due to low performance. [How] We apply two methods for mode validation. 1) DC_VALIDATE_MODE_ONLY: only the maximum voltage level is checked to determine whether the mode is supported, which improves performance and eliminate cursor lag. 2) DC_VALIDATE_MODE_AND_STATE_INDEX: when the optimal voltage level is required, check the voltage level from the lowest until a suitable one is found found and returns its index. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Yan Li <yan.li@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdkfd: Move the process suspend and resume out of full accessEmily Deng
For the suspend and resume process, exclusive access is not required. Therefore, it can be moved out of the full access section to reduce the duration of exclusive access. v3: Move suspend processes before hardware fini. Remove twice call for bare metal. v4: Refine code Signed-off-by: Emily Deng <Emily.Deng@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/pm: Use pointer type for typecheck()Lijo Lazar
typecheck creates local variables based on the type passed. That could result in stack frame size warnings like below in certain configs: drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/smu_v13_0_6_ppt.c:2885:1: error: the frame size of 8304 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Checking against the pointer type is sufficient for the purpose of getting a diagnostic message during build time. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com> Tested-by: Palmer Dabbelt <palmer@dabbelt.com> Link: https://lore.kernel.org/r/20250610212141.19445-1-palmer@dabbelt.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd: Allow printing Renoir OD SCLK levels without setting dpm to manualMario Limonciello
Several other ASICs allow printing OD SCLK levels without setting DPM control to manual. When OD is disabled it will show the range the hardware supports. When OD is enabled it will show what values have been programmed. Adjust Renoir to work the same. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reported-by: Vicki Pfau <vi@endrift.com> Link: https://lore.kernel.org/r/20250609031227.479079-2-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: VCN v5_0_1 to prevent FW checking RB during DPG pauseSonny Jiang
Add a protection to ensure programming are all complete prior VCPU starting. This is a WA for an unintended VCPU running. Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd: Allow printing VanGogh OD SCLK levels without setting dpm to manualMario Limonciello
Several other ASICs allow printing OD SCLK levels without setting DPM control to manual. When OD is disabled it will show the range the hardware supports. When OD is enabled it will show what values have been programmed. Adjust VanGogh to work the same. Cc: Pierre-Loup A. Griffais <pgriffais@valvesoftware.com> Reported-by: Vicki Pfau <vi@endrift.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250609031227.479079-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/display: Fix annotations for dc state functionsSrinivasan Shanmugam
This patch addresses inconsistencies in the annotations for the following functions: - **dc_get_power_profile_for_dc_state**: Standardized parameter and return value annotations. - **dc_get_det_buffer_size_from_state**: Clarified parameter documentation for better understanding. - **dc_get_host_router_index**: Corrected parameter descriptions to follow documentation conventions. Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:6386: warning: Cannot understand *************** Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/pm: update pcie dpm parameters before smu feature enablementKenneth Feng
update pcie dpm parameters before smu feature enablement Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/pm: override pcie dpm parameters only if it is necessaryKenneth Feng
override pcie dpm parameters only if it is necessary Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: Add soft reset callback to SDMA v4.4.xJesse Zhang
Implement soft reset engine callback for SDMA 4.4.x IPs. This avoids IP version check in generic implementation. V2: Correct physical instance ID calculation in soft_reset_engine (Jesse) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: Use logical instance ID for SDMA v4_4_2 queue operationsJesse Zhang
Simplify SDMA v4_4_2 queue reset and stop operations by: 1. Removing GET_INST(SDMA0) conversion for ring->me 2. Using the logical instance ID (ring->me) directly 3. Maintaining consistent behavior with other SDMA queue operations This change aligns with the existing queue handling logic where ring->me already represents the correct instance identifier. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: Fix SDMA engine reset with logical instance IDJesse Zhang
This commit makes the following improvements to SDMA engine reset handling: 1. Clarifies in the function documentation that instance_id refers to a logical ID 2. Adds conversion from logical to physical instance ID before performing reset using GET_INST(SDMA0, instance_id) macro 3. Improves error messaging to indicate when a logical instance reset fails 4. Adds better code organization with blank lines for readability The change ensures proper SDMA engine reset by using the correct physical instance ID while maintaining the logical ID interface for callers. V2: Remove harvest_config check and convert directly to physical instance (Lijo) Suggested-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: Suspend IH during mode-2 resetLijo Lazar
On multi-aid SOCs, there could be a continuous stream of interrupts from GC after poison consumption. Suspend IH to disable them before doing mode-2 reset. This avoids conflicts in hardware accesses during interrupt handlers while a reset is ongoing. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/display: Destroy cached state in complete() callbackMario Limonciello
[Why] When the suspend sequence has been aborted after prepare() but before suspend() the resume() callback never gets called. The PM core will call complete() when this happens. As the state has been cached in prepare() it needs to be destroyed in complete() if it's still around. [How] Create a helper for destroying cached state and call it both in resume() and complete() callbacks. If resume has been called the state will be destroyed and it's a no-op for complete(). If resume hasn't been called (such as an aborted suspend) then destroy the state in complete(). Fixes: 50e0bae34fa6 ("drm/amd/display: Add and use new dm_prepare_suspend() callback") Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250602014432.3538345-4-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/display: Stop storing failures into adev->dm.cached_stateMario Limonciello
If drm_atomic_helper_suspend() has failed for any reason, it's stored in adev->dm.cached_state. This isn't expected because the resume (or complete()) sequence will attempt to use the stored state to resume. Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250602014432.3538345-3-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd: Add support for a complete pmops actionMario Limonciello
complete() callbacks are supposed to handle reversing anything that occurred during prepare() callbacks. They'll be called on every power state transition, and will also be called if the sequence is failed (such as an aborted suspend). Add support for IP blocks to support this action. Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250602014432.3538345-2-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd/pm: Show default gfx clock levelsLijo Lazar
For SMU v13.0.6 SOCs, always show default clock levels for gfx in pp_dpm_sclk. Any custom min/max levels set by user will be available in pp_od_clk_voltage Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: Add debug mask to disable CE logsXiang Liu
Add debug mask to disable kernel logs of RAS correctable errors, including both ACA and CE error counter kernel messages. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: add kicker fws loading for gfx11/smu13/psp13Frank Min
1. Add kicker firmwares loading for gfx11/smu13/psp13 2. Register additional MODULE_FIRMWARE entries for kicker fws - gc_11_0_0_rlc_kicker.bin - gc_11_0_0_imu_kicker.bin - psp_13_0_0_sos_kicker.bin - psp_13_0_0_ta_kicker.bin - smu_13_0_0_kicker.bin Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: Add kicker device detectionFrank Min
1. add kicker device list 2. add kicker device checking helper function Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: Clear reset flags from ras contextLijo Lazar
Once RAS errors are cleared with appropriate recovery mechanism, clear reset flags also from RAS context. Otherwise, stale flag values could affect the subsequent RAS reset handling on the device. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu/gfx9: drop reset_kgqAlex Deucher
It doesn't work reliably and we have soft recover and full adapter reset so drop this. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu/gfx8: drop reset_kgqAlex Deucher
It doesn't work reliably and we have soft recover and full adapter reset so drop this. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu/gfx7: drop reset_kgqAlex Deucher
It doesn't work reliably and we have soft recover and full adapter reset so drop this. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdkfd: allow compute partition mode switch with cgroup exclusionsJonathan Kim
The KFD currently bars a compute partition mode switch while a KFD process exists. Since cgroup excluded devices remain excluded for the lifetime of a KFD process and user space is able to mode switch single devices, allow users to mode switch a device with any running process that has been cgroup excluded from this device. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>