summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-05drm/amd/display: Add a new dcdebugmask to allow turning off brightness curveMario Limonciello
Upgrading the kernel may cause some systems that were previously not using a firmware specified brightness curve to use one. In the event of problems with this curve (for example an interpolation error) add a new dcdebugmask value that can be used to turn it off. Also add an info message to show that custom brightness curves are currently in use. Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250228185145.186319-6-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Add support for custom brightness curveMario Limonciello
Some systems specify in the firmware a brightness curve that better reflects the characteristics of the panel used. This is done in the form of data points and matching luminance percentage. When converting a userspace requested brightness value use that curve to convert to a firmware intended brightness value. Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250228185145.186319-5-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Avoid operating on copies of backlight capsMario Limonciello
Making a copy of the backlight caps structure between uses is unnecessary. Refer to pointers to the same structure when using it. Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250228185145.186319-4-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd: Pass luminance data to amdgpu_dm_backlight_capsMario Limonciello
The ATIF method on some systems will provide a backlight curve. Pass this curve into amdgpu_dm add it to the structures. Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250228185145.186319-3-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd: Copy entire structure in amdgpu_acpi_get_backlight_caps()Mario Limonciello
As new members are introduced to the structure copying the entire structure will help avoid missing them. Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250228185145.186319-2-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Promote DAL to 3.2.323Taimur Hassan
This version brings along following fixes: - Various cleanups to amdgpu dm - Add DP tunneling IRQ handler - Fix display corruption for dcn35 - Fix dmcub reset problem - Adjust BW determination for PCON - DIO encoder refactor - Fix performance with SubVP under gaming Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Use drm_err() for handle_hpd_irq_helper()Mario Limonciello
drm_err() will show which device has the error. Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Use scoped guards for handle_hpd_irq_helper()Mario Limonciello
Scoped guards will release the mutex when they go out of scope. Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Use _free() macro for amdgpu_dm_update_connector_after_detect()Mario Limonciello
By using a _free() macro multiple duplicated snippets of code to free the sink can be dropped. The sink will be released when leaving scope. Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Use scoped guard for amdgpu_dm_update_connector_after_detect()Mario Limonciello
A scoped guard will release the mutex when it goes out of scope. Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Use _free(kfree) for dm_gpureset_commit_state()Mario Limonciello
Using a _free(kfree) macro drops the need for a goto statement as it will be freed when it goes out of scope. Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Change amdgpu_dm_irq_resume_*() to voidMario Limonciello
amdgpu_dm_irq_resume_early() and amdgpu_dm_irq_resume_late() don't have any error flows. Change the return type from integer to void. Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Change amdgpu_dm_irq_resume_*() to use drm_dbg()Mario Limonciello
drm_dbg() is helpful to show which device had the debug statement. Adjust to using this instead for debug messages. Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Use scoped guard for dm_resume()Mario Limonciello
Scoped guards will release the mutex when they go out of scope. Adjust the code to use these instead. Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Use drm_err() instead of DRM_ERROR in dm_resume()Mario Limonciello
drm_err() is helpful to show which device had the error. Adjust to using this instead for error messages. Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Use _free() macro for amdgpu_dm_commit_zero_streams()Mario Limonciello
All cases except a failure to create a copy of the current context will call dc_state_release() on the copied context. Use a _free() macro to free the context and then adjust the error handling flow to drop the unnecessary use of goto statements. Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Catch failures for amdgpu_dm_commit_zero_streams()Mario Limonciello
amdgpu_dm_commit_zero_streams() returns a DC error code that isn't checked. Add an explicit check to this and fail dm_suspend() if it is not DC_OK. Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Drop `ret` variable from dm_suspend()Mario Limonciello
The `ret` variable in dm_suspend() doesn't get set and is just used to return 0. Drop the needless declaration. Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Change amdgpu_dm_irq_suspend() to voidMario Limonciello
amdgpu_dm_irq_suspend() doesn't have any error flows and always returns zero. Change the function to void. Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Add tunneling IRQ handlerCruise Hung
USB4 DP BW Allocation uses DP_TUNNELING_IRQ to indicate the status update. The DP_TUNNELING_IRQ is defined in LINK_SERVICE_IRQ_VECTOR_ESI0. When receiving DP HPD IRQ in USB4, read the LINK_SERVICE_IRQ_VECTOR_ESI0. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Cruise Hung <Cruise.Hung@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Added visual confirm for DCCLeo Zeng
[WHY] We want to add a visual confirm mode for DCC and MCache for debugging purpose. [HOW] color pipes based on whether DCC is enabled and what MCache id is used. black - DCC disabled red - DCC enabled grey - 2 different MCaches used other colors - 1 MCache used Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Leo Zeng <Leo.Zeng@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Ensure DMCUB idle before reset on DCN31/DCN35Nicholas Kazlauskas
[Why] If we soft reset before halt finishes and there are outstanding memory transactions then the memory interface may produce unexpected results, such as out of order transactions when the firmware next runs. These can manifest as random or unexpected load/store violations. [How] Increase the timeout before soft reset to ensure the DMCUB has quiesced. This is effectively 1s maximum based on experimentation. Use the enable bit check on DCN31 like we're doing on DCN35 and reorder the reset writes to follow the HW programming guide. Ensure we're reading SCRATCH7 instead of SCRATCH8 for the HALT code. No current versions of DMCUB firmware use the SCRATCH8 boot bit to dynamically switch where the HALT code goes to maintain backwards compatibility with PSP. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Revert "Increase halt timeout for DMCUB to 1s"Nicholas Kazlauskas
This reverts commit 50f040c53ea9 ("drm/amd/display: Increase halt timeout for DMCUB to 1s") There's two issues here: 1. Each poll is closer to 10us than 1us so it stalls for 15s on PNP. 2. We're reading the wrong scratch register to check for the HALT code. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Check NULL connector before it is usedAlex Hung
[Why & How] amdgpu_dm_find_first_crtc_matching_connector can return NULL. It is necessary to the returned connector before passing it drm_atomic_get_new_connector_state which always assumes connector is not NULL. Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Remove unused struct definitionGeorge Shen
[Why/How] The struct is not and will not be used, as it is no longer relevant nor supported. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Skip checking FRL_MODE bit for PCON BW determinationGeorge Shen
[Why/How] Certain PCON will clear the FRL_MODE bit despite supporting the link BW indicated in the other bits. Thus, skip checking the FRL_MODE bit when interpreting the hdmi_encoded_link_bw struct. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: misc for dio encoder refactorPeichen Huang
[WHY] These are left required changes for dio encoder refactor. [HOW] 1. original logic is separated by config option 2. new link encoder dp enable/disable code for dcn35 3. process fec only for DP 8b10b encoding Reviewed-by: Cruise Hung <cruise.hung@amd.com> Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: read mso dpcd capsHansen Dsouza
[Why & How] Read if panel support multi-sst links Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Hansen Dsouza <Hansen.Dsouza@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Fix DMUB reset sequence for DCN401Dillon Varone
[WHY] It should no longer use DMCUB_SOFT_RESET as it can result in the memory request path becoming desynchronized. [HOW] To ensure robustness in the reset sequence: 1) Extend timeout on the "halt" command sent via gpint, and check for controller to enter "wait" as a stronger guarantee that there are no requests to memory still in flight. 2) Remove usage of DMCUB_SOFT_RESET 3) Rely on PSP to reset the controller safely Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Fix p-state type when p-state is unsupportedDillon Varone
[WHY&HOW] P-state type would remain on previously used when unsupported which causes confusion in logging and visual confirm, so set back to zero when unsupported. Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Request HW cursor on DCN3.2 with SubVPAric Cyr
[why] When SubVP is active the HW cursor size is limited to 64x64, and anything larger will force composition which is bad for gaming on DCN3.2 if the game uses a larger cursor. [how] If HW cursor is requested, typically by a fullscreen game, do not enable SubVP so that up to 256x256 cursor sizes are available for DCN3.2. Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Aric Cyr <Aric.Cyr@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: fix type mismatch in CalculateDynamicMetadataParameters()Vitaliy Shevtsov
There is a type mismatch between what CalculateDynamicMetadataParameters() takes and what is passed to it. Currently this function accepts several args as signed long but it's called with unsigned integers and integer. On some systems where long is 32 bits and one of these unsigned int params is greater than INT_MAX it may cause passing input params as negative values. Fix this by changing these argument types from long to unsigned int and to int respectively. Also this will align the function's definition with similar functions in other dcn* drivers. Found by Linux Verification Center (linuxtesting.org) with Svace. Fixes: 6725a88f88a7 ("drm/amd/display: Add DCN3 DML") Signed-off-by: Vitaliy Shevtsov <v.shevtsov@mt-integration.ru> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Avoid HDP flush on JPEG v5.0.1Lijo Lazar
Similar to JPEG v4.0.3, HDP flush shouldn't be performed by JPEG engine. Keep it empty. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Initialize RRMT status on JPEG v5.0.1Lijo Lazar
Initialize RRMT enablement status from register. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Update SDMA scheduler mask handling to include page queueJesse.zhang@amd.com
This patch updates the SDMA scheduler mask handling to include the page queue if it exists. The scheduler mask is calculated based on the number of SDMA instances and the presence of the page queue. The mask is updated to reflect the state of both the SDMA gfx ring and the page queue. Changes: - Add handling for the SDMA page queue in `amdgpu_debugfs_sdma_sched_mask_set`. - Update scheduler mask calculations to include the page queue. - Modify `amdgpu_debugfs_sdma_sched_mask_get` to return the correct mask value. This change is necessary to verify multiple queues (SDMA gfx queue + page queue) and ensure proper scheduling and state management for SDMA instances. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Add offset normalization in VCN v5.0.1Lijo Lazar
VCN v5.0.1 also will need register offset normalization. Reuse the logic from VCN v4.0.3. Also, avoid HDP flush similar to VCN v4.0.3 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Initialize RRMT status on VCN v5.0.1Lijo Lazar
Initialize RRMT status from register. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Free CPER entry after committing to ringXiang Liu
Free CPER entry when it's committed to CPER ring to avoid memory leak. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: fix spelling typos in SIAlexandre Demers
Fix typos Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/radeon: fix spelling typosAlexandre Demers
Found some typos while exploring radeon code. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: fix spelling typosAlexandre Demers
Found some typos while exploring amdgpu code. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdgpu: Fix parameter annotation in vcn_v5_0_0_is_idleSrinivasan Shanmugam
Update parameter description in the vcn_v5_0_0_is_idle function Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1231: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1231: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_is_idle' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdkfd: debugfs hang_hws skip GPU with MESPhilip Yang
debugfs hang_hws is used by GPU reset test with HWS, for MES this crash the kernel with NULL pointer access because dqm->packet_mgr is not setup for MES path. Skip GPU with MES for now, MES hang_hws debugfs interface will be supported later. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdkfd: Fix pqm_destroy_queue race with GPU resetPhilip Yang
If GPU in reset, destroy_queue return -EIO, pqm_destroy_queue should delete the queue from process_queue_list and free the resource. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdgpu: Fix parameter annotations for VCN clock gating functionsSrinivasan Shanmugam
The previous references to a non-existent `adev` parameter have been removed & corrected to reflect the use of the `vinst` pointer, which points to the VCN instance structure, in the below files: - vcn_v1_0.c - vcn_v2_0.c - vcn_v3_0.c Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:624: warning: Function parameter or struct member 'vinst' not described in 'vcn_v1_0_enable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:624: warning: Excess function parameter 'adev' description in 'vcn_v1_0_enable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:376: warning: Function parameter or struct member 'vinst' not described in 'vcn_v2_0_mc_resume' drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:376: warning: Excess function parameter 'adev' description in 'vcn_v2_0_mc_resume' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Function parameter or struct member 'vinst' not described in 'vcn_v3_0_disable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Excess function parameter 'adev' description in 'vcn_v3_0_disable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Excess function parameter 'inst' description in 'vcn_v3_0_disable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Function parameter or struct member 'vinst' not described in 'vcn_v3_0_enable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Excess function parameter 'adev' description in 'vcn_v3_0_enable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Excess function parameter 'inst' description in 'vcn_v3_0_enable_clock_gating' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdkfd: Fix mode1 reset crash issuePhilip Yang
If HW scheduler hangs and mode1 reset is used to recover GPU, KFD signal user space to abort the processes. After process abort exit, user queues still use the GPU to access system memory before h/w is reset while KFD cleanup worker free system memory and free VRAM. There is use-after-free race bug that KFD allocate and reuse the freed system memory, and user queue write to the same system memory to corrupt the data structure and cause driver crash. To fix this race, KFD cleanup worker terminate user queues, then flush reset_domain wq to wait for any GPU ongoing reset complete, and then free outstanding BOs. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdkfd: KFD release_work possible circular lockingPhilip Yang
If waiting for gpu reset done in KFD release_work, thers is WARNING: possible circular locking dependency detected #2 kfd_create_process kfd_process_mutex flush kfd release work #1 kfd release work wait for amdgpu reset work #0 amdgpu_device_gpu_reset kgd2kfd_pre_reset kfd_process_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&p->release_work)); lock((wq_completion)kfd_process_wq); lock((work_completion)(&p->release_work)); lock((wq_completion)amdgpu-reset-dev); To fix this, KFD create process move flush release work outside kfd_process_mutex. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdkfd: Remove kfd_process_hw_exception workerPhilip Yang
With GPU reset-domain worker implemented, KFD hw_exception worker is not needed any more, just call amdgpu_amdkfd_gpu_reset directly from kfd_hws_hang. Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amd/amdgpu: Add support for xgmi_v6_4_1Asad Kamal
Add support for xgmi_v6_4_1 and use it appropriate places Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdgpu: Add xgmi speed/width related infoLijo Lazar
Add APIs to initialize XGMI speed, width details and get to max bandwidth supported. It is assumed that a device only supports same generation of XGMI links with uniform width. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>