summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-02-25drm/amd/display: restore edid reading from a given i2c adapterMelissa Wen
When switching to drm_edid, we slightly changed how to get edid by removing the possibility of getting them from dc_link when in aux transaction mode. As MST doesn't initialize the connector with `drm_connector_init_with_ddc()`, restore the original behavior to avoid functional changes. v2: - Fix build warning of unchecked dereference (kernel test bot) CC: Alex Hung <alex.hung@amd.com> CC: Mario Limonciello <mario.limonciello@amd.com> CC: Roman Li <Roman.Li@amd.com> CC: Aurabindo Pillai <Aurabindo.Pillai@amd.com> Fixes: 48edb2a4256e ("drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid") Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu: Remove unused nbif_v6_3_1_sriov_funcsDr. David Alan Gilbert
The nbif_v6_3_1_sriov_funcs instance of amdgpu_nbio_funcs was added in commit 894c6d3522d1 ("drm/amdgpu: Add nbif v6_3_1 ip block support") but has remained unused. Alex has confirmed it wasn't needed. Remove it, together with the four unused stub functions: nbif_v6_3_1_sriov_ih_doorbell_range nbif_v6_3_1_sriov_gc_doorbell_init nbif_v6_3_1_sriov_vcn_doorbell_range nbif_v6_3_1_sriov_sdma_doorbell_range Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25mailmap: Add entry for Rodrigo SiqueiraRodrigo Siqueira
Map all of my previously used email addresses to my @igalia.com address. Acked-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu: Add ring reset callback for JPEG5_0_1Sathishkumar S
Add ring reset function callback for JPEG5_0_1 to recover from job timeouts without a full gpu reset. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25MAINTAINERS: Change my role from Maintainer to ReviewerRodrigo Siqueira
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu: Log after a successful ring resetAndré Almeida
When a ring reset happens, the kernel log shows only "amdgpu: Starting <ring name> ring reset", but when it finishes nothing appears in the log. Explicitly write in the log that the reset has finished correctly. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu: Log the creation of a coredump fileAndré Almeida
After a GPU reset happens, the driver creates a coredump file. However, the user might not be aware of it. Log the file creation the user can find more information about the device and add the file to bug reports. This is similar to what the xe driver does. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu/mes: keep enforce isolation up to dateAlex Deucher
Re-send the mes message on resume to make sure the mes state is up to date. Fixes: 8521e3c5f058 ("drm/amd/amdgpu: limit single process inside MES") Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Shaoyun Liu <shaoyun.liu@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amd/pm: Use separate metrics table for smu_v13_0_12Asad Kamal
Use separate metrics table for smu_v13_0_12 and fetch metrics data using that. v2: Fix jpeg busy indexing (Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu: Add core reset registers for JPEG5_0_1Sathishkumar S
Add core reset control register definitions and align all prior register definitions to end at 100 column length for uniformity. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu: Per-instance init func for JPEG5_0_1Sathishkumar S
Add helper functions to handle per-instance and per-core initialization and deinitialization in JPEG5_0_1. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amd/display: fix an indent issue in DML21Aurabindo Pillai
Remove extraneous tab and newline in dml2_core_dcn4.c that was reported by the bot Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502211920.txUfwtSj-lkp@intel.com/ Fixes: 70839da6360 ("drm/amd/display: Add new DCN401 sources") Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25MAINTAINERS: update amdgpu maintainers listAlex Deucher
Xinhui's email is no longer valid. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu: disable BAR resize on Dell G5 SEAlex Deucher
There was a quirk added to add a workaround for a Sapphire RX 5600 XT Pulse that didn't allow BAR resizing. However, the quirk caused a regression with runtime pm on Dell laptops using those chips, rather than narrowing the scope of the resizing quirk, add a quirk to prevent amdgpu from resizing the BAR on those Dell platforms unless runtime pm is disabled. v2: update commit message, add runpm check Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707 Fixes: 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amd/pm: Fetch fru product info for smu_v13_0_12Asad Kamal
Fetch fru product info for smu_v13_0_12 from static metrics table v2: Field by field copy for fru info(Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amd/pm: Fetch static metrics tableAsad Kamal
Fetch clock frequency table from static metrics table for smu_v13_0_12 v2: Move PPTable definition, remove unnecessary checks for getting static metrics table(Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amd/pm: Add GetStaticMetricTable messageAsad Kamal
Add GetStaticMetricTable message for smu_v13_0_12 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amd/pm: Update pmfw headers for smu_v13_0_12Asad Kamal
Update pmfw headers for smu_v13_0_12 new messages & metrics table. Static metrics table for frequency added, Separate metrics table for smu_v13_0_12 added. Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu: Update amdgpu_job_timedout to check if the ring is guiltyJesse.zhang@amd.com
This patch updates the `amdgpu_job_timedout` function to check if the ring is actually guilty of causing the timeout. If not, it skips error handling and fence completion. v2: move the is_guilty check down into the queue reset area (Alex) v3: need to call is_guilty before reset (Alex) v4: squash in is_guilty logic fixes (Alex) Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amd/pm: add support for checking SDMA reset capabilityJesse.zhang@amd.com
This patch introduces a new function to check if the SMU supports resetting the SDMA engine. This capability check ensures that the driver does not attempt to reset the SDMA engine on hardware that does not support it. The following changes are included: - New function `amdgpu_dpm_reset_sdma_is_supported` to check SDMA reset support at the AMDGPU driver level. - New function `smu_reset_sdma_is_supported` to check SDMA reset support at the SMU level. - Implementation of `smu_v13_0_6_reset_sdma_is_supported` for the specific SMU version v13.0.6. - Updated `smu_v13_0_6_reset_sdma` to use the new capability check before attempting to reset the SDMA engine. v2: change smu_reset_sdma_is_supported type to bool (Tim) Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu: Add reset function pointer for SDMA v4.4.2 page ringJesse.zhang@amd.com
This patch adds a reset function pointer to the SDMA v4.4.2 page ring functionality. The new function pointer `reset` is set to `sdma_v4_4_2_reset_queue`, which is responsible for resetting the SDMA queue. Changes: - Add `reset` function pointer to `sdma_v4_4_2_page_ring_funcs`. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu: Improve SDMA reset logic with guilty queue trackingJesse.zhang@amd.com
This patch includes the remaining improvements to the SDMA reset logic: - Added `gfx_guilty` and `page_guilty` flags to track guilty queues. - Updated the reset and resume functions to handle the guilty state. - Cached the `rptr` before reset. v2: 1.replace the caller with a guilty bool. If the queue is the guilty one, set the rptr and wptr to the saved wptr value, else, set the rptr and wptr to the saved rptr value. (Alex) 2. cache the rptr before the reset. (Alex) v3: Keeping intermediate variables like u64 rwptr simplifies resotre rptr/wptr.(Lijo) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu/sdma: Introduce is_guilty callbacks for sdma GFX and PAGE ringsJesse.zhang@amd.com
This patch introduces the `is_guilty` callbacks for the GFX and PAGE rings. These callbacks check if a ring is guilty of causing a timeout or error. Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu: Introduce cached_rptr and is_guilty callback in amdgpu_ringJesse.zhang@amd.com
This patch introduces the following changes: - Add `cached_rptr` to the `amdgpu_ring` structure to store the read pointer before a reset. - Add `is_guilty` callback to the `amdgpu_ring_funcs` structure to check if a ring is guilty of causing a timeout. Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu: Introduce conditional user queue suspension for SDMA resetsJesse.zhang@amd.com
- Modify the `amdgpu_sdma_reset_engine` function to accept a `suspend_user_queues` parameter. - This parameter allows the function to conditionally suspend and resume user queues during SDMA resets. - Ensure that user queues are suspended only when necessary to avoid unnecessary overhead and potential deadlocks. - Restart the scheduler's work queue for the GFX and page rings after the reset to allow new tasks to be submitted. This change improves synchronization between the KGD and the KFD during SDMA resets, ensuring proper handling of user queues and avoiding race conditions. V2: replace the ring_lock with the existed the scheduler locks for the queues (ring->sched) on the sdma engine.(Alex) v3: call drm_sched_wqueue_stop() rather than job_list_lock. If a GPU ring reset was already initiated for one ring at amdgpu_job_timedout, skip resetting that ring and call drm_sched_wqueue_stop() for the other rings (Alex) replace the common lock (sdma_reset_lock) with DQM lock to to resolve reset races between the two driver sections during KFD eviction.(Jon) Rename the caller to Reset_src and Change AMDGPU_RESET_SRC_SDMA_KGD/KFD to AMDGPU_RESET_SRC_SDMA_HWS/RING (Jon) v4: restart the wqueue if the reset was successful, or fall back to a full adapter reset. (Alex) move definition of reset source to enumeration AMDGPU_RESET_SRCS, and check reset src in amdgpu_sdma_reset_instance (Jon) v5: Call amdgpu_amdkfd_suspend/resume at the start/end of reset function respectively under !SRC_HWS conditions only (Jon) v6: replace the paramter src with a bool suspend_user_queues, remove the paramter src in pre/post func. (Jon) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Suggested-by: Jonathan Kim <Jonathan.Kim@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Acked-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu: Remove redundant logic in GC v9.4.3Lijo Lazar
GFXOFF check is not needed for GC v9.4.3. Also, save/restore list is available by default. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu: Do not poweroff UVDJ in JPEG4_0_3Sathishkumar S
Update power gate setting to not poweroff UVDJ in JPEG4_0_3. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25Documentation/gpu: Add acronyms for some firmware componentsRodrigo Siqueira
Users can check the file "/sys/kernel/debug/dri/0/amdgpu_firmware_info" to get information on the firmware loaded in the system. This file has multiple acronyms that are not documented in the glossary. This commit introduces some missing acronyms to the AMD glossary documentation. The meaning of each acronym in this commit was extracted from code documentation available in the following files: - drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c - drivers/gpu/drm/amd/include/amd_shared.h Changes since v1: - Expand acronym meanings based on Alex Deucher suggestions. Cc: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu/sdma: Refactor SDMA reset functionality and add callback supportJesse.zhang@amd.com
This patch refactors the SDMA reset functionality in the `sdma_v4_4_2` driver to improve modularity and support shared usage between AMDGPU and KFD. The changes include: 1. **Refactored SDMA Reset Logic**: - Split the `sdma_v4_4_2_reset_queue` function into two separate functions: - `sdma_v4_4_2_stop_queue`: Stops the SDMA queue before reset. - `sdma_v4_4_2_restore_queue`: Restores the SDMA queue after reset. - These functions are now used as callbacks for the shared reset mechanism. 2. **Added Callback Support**: - Introduced a new structure `sdma_v4_4_2_reset_funcs` to hold the stop and restore callbacks. - Added `sdma_v4_4_2_set_reset_funcs` to register these callbacks with the shared reset mechanism using `amdgpu_set_on_reset_callbacks`. 3. **Fixed Reset Queue Function**: - Modified `sdma_v4_4_2_reset_queue` to use the shared `amdgpu_sdma_reset_queue` function, ensuring consistency across the driver. This patch ensures that SDMA reset functionality is more modular, reusable, and aligned with the shared reset mechanism between AMDGPU and KFD. v2: Renamed sdma_v4_4_2_set_reset_funcs to sdma_v4_4_2_set_engine_reset_funcs. Renamed sdma_v4_4_2_reset_funcs to sdma_v4_4_2_engine_reset_funcs.(Alex) Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu/kfd: Add shared SDMA reset functionality with callback supportJesse.zhang@amd.com
This patch introduces shared SDMA reset functionality between AMDGPU and KFD. The implementation includes the following key changes: 1. Added `amdgpu_sdma_reset_queue`: - Resets a specific SDMA queue by instance ID. - Invokes registered pre-reset and post-reset callbacks to allow KFD and AMDGPU to save/restore their state during the reset process. 2. Added `amdgpu_set_on_reset_callbacks`: - Allows KFD and AMDGPU to register callback functions for pre-reset and post-reset operations. - Callbacks are stored in a global linked list and invoked in the correct order during SDMA reset. This patch ensures that both AMDGPU and KFD can handle SDMA reset events gracefully, with proper state saving and restoration. It also provides a flexible callback mechanism for future extensions. v2: fix CamelCase and put the SDMA helper into amdgpu_sdma.c (Alex) v3: rename the `amdgpu_register_on_reset_callbacks` function to `amdgpu_sdma_register_on_reset_callbacks` move global reset_callback_list to struct amdgpu_sdma (Alex) v4: Update the reset callback function description and rename the reset function to amdgpu_sdma_reset_engine (Alex) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu: correct the name of mes_pipe structureLikun Gao
Correct the structure name admgpu_mes_pipe to amdgpu_mes_pipe. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdkfd: Preserve cp_hqd_pq_control on update_mqdDavid Yat Sin
When userspace applications call AMDKFD_IOC_UPDATE_QUEUE. Preserve bitfields that do not need to be modified as they contain flags to track queue states that are used by CP FW. Signed-off-by: David Yat Sin <David.YatSin@amd.com> Reviewed-by: Jay Cornwall <jay.cornwall@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25amdgpu/pm/legacy: fix suspend/resume issueschr[]
resume and irq handler happily races in set_power_state() * amdgpu_legacy_dpm_compute_clocks() needs lock * protect irq work handler * fix dpm_enabled usage v2: fix clang build, integrate Lijo's comments (Alex) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2524 Fixes: 3712e7a49459 ("drm/amd/pm: unified lock protections in amdgpu_dpm.c") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Tested-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> # on Oland PRO Signed-off-by: chr[] <chris@rudorff.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25drm/amdgpu: update the handle ptr in is_idleSunil Khatri
Update the *handle to amdgpu_ip_block ptr for all functions pointers of is_idle. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-21drm/amdgpu: remove all KFD fences from the BO on releaseChristian König
Remove all KFD BOs from the private dma_resv object. This prevents the KFD from being evict unecessarily when an exported BO is released. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-and-tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amdgpu: update the handle ptr in get_clockgating_stateSunil Khatri
Update the *handle to amdgpu_ip_block ptr for all functions pointers of get_clockgating_state. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Add clear DCC and Tiling callback for DCERodrigo Siqueira
Introduce the DCC and Tiling reset callback to all DCE versions that can call it. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amdkfd: Fix error handling for missing PASID in 'kfd_process_device_init_vm'Srinivasan Shanmugam
In the kfd_process_device_init_vm function, a valid error code is now returned when the associated Process Address Space ID (PASID) is not present. If the address space virtual memory (avm) does not have an associated PASID, the function sets the ret variable to -EINVAL before proceeding to the error handling section. This ensures that the calling function, such as kfd_ioctl_acquire_vm, can appropriately handle the error, thereby preventing any issues during virtual memory initialization. Fixes the below: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:1694 kfd_process_device_init_vm() warn: missing error code 'ret' drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c 1647 int kfd_process_device_init_vm(struct kfd_process_device *pdd, 1648 struct file *drm_file) 1649 { ... 1690 1691 if (unlikely(!avm->pasid)) { 1692 dev_warn(pdd->dev->adev->dev, "WARN: vm %p has no pasid associated", 1693 avm); --> 1694 goto err_get_pasid; ret = -EINVAL? 1695 } Fixes: 8544374c0f82 ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver") Reported by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Xiaogang Chen <xiaogang.chen@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amdgpu: Remove redundant check of adevXiang Liu
There is no need to check adev for sure. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amdgpu: Check aca enabled inside cper init/fini funcXiang Liu
Move code about checking aca enabled to the cper init/fini function to make code clean. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amdgpu: Use firmware supported NPS modesLijo Lazar
If firmware supported NPS modes are available through CAP register, use those values for supported NPS modes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/pm: Fetch current power limit from PMFWLijo Lazar
On SMU v13.0.12, always query the firmware to get the current power limit as it could be updated through other means also. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amdgpu: Add ring reset callback for JPEG4_0_3Sathishkumar S
Add ring reset function callback for JPEG4_0_3 to recover from job timeouts without a full gpu reset. V2: - sched->ready flag shouldn't be modified by HW backend (Christian) V3: - Dont modifying sched/job-submission state from HW backend (Christian) - Implement per-core reset sequence V4: - Dont create reset_mask sysfs and return -EOPNOTSUPP on VFs (Lijo) Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amdgpu: Add JPEG4_0_3 core reset control regSathishkumar S
Add core reset control registers for JPEG4_0_3 Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid ↵Srinivasan Shanmugam
Priority Inversion in SRIOV RLCG Register Access is a way for virtual functions to safely access GPU registers in a virtualized environment., including TLB flushes and register reads. When multiple threads or VFs try to access the same registers simultaneously, it can lead to race conditions. By using the RLCG interface, the driver can serialize access to the registers. This means that only one thread can access the registers at a time, preventing conflicts and ensuring that operations are performed correctly. Additionally, when a low-priority task holds a mutex that a high-priority task needs, ie., If a thread holding a spinlock tries to acquire a mutex, it can lead to priority inversion. register access in amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical. The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being called, which attempts to acquire the mutex. This function is invoked from amdgpu_sriov_wreg, which in turn is called from gmc_v11_0_flush_gpu_tlb. The [ BUG: Invalid wait context ] indicates that a thread is trying to acquire a mutex while it is in a context that does not allow it to sleep (like holding a spinlock). Fixes the below: [ 253.013423] ============================= [ 253.013434] [ BUG: Invalid wait context ] [ 253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE [ 253.013464] ----------------------------- [ 253.013475] kworker/0:1/10 is trying to lock: [ 253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.013815] other info that might help us debug this: [ 253.013827] context-{4:4} [ 253.013835] 3 locks held by kworker/0:1/10: [ 253.013847] #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 253.013877] #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 253.013905] #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu] [ 253.014154] stack backtrace: [ 253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G U OE 6.12.0-amdstaging-drm-next-lol-050225 #14 [ 253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024 [ 253.014224] Workqueue: events work_for_cpu_fn [ 253.014241] Call Trace: [ 253.014250] <TASK> [ 253.014260] dump_stack_lvl+0x9b/0xf0 [ 253.014275] dump_stack+0x10/0x20 [ 253.014287] __lock_acquire+0xa47/0x2810 [ 253.014303] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014321] lock_acquire+0xd1/0x300 [ 253.014333] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014562] ? __lock_acquire+0xa6b/0x2810 [ 253.014578] __mutex_lock+0x85/0xe20 [ 253.014591] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014782] ? sched_clock_noinstr+0x9/0x10 [ 253.014795] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014808] ? local_clock_noinstr+0xe/0xc0 [ 253.014822] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015012] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.015029] mutex_lock_nested+0x1b/0x30 [ 253.015044] ? mutex_lock_nested+0x1b/0x30 [ 253.015057] amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015249] amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu] [ 253.015435] gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu] [ 253.015667] gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu] [ 253.015901] ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu] [ 253.016159] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.016173] ? smu_hw_init+0x18d/0x300 [amdgpu] [ 253.016403] amdgpu_device_init+0x29ad/0x36a0 [amdgpu] [ 253.016614] amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu] [ 253.017057] amdgpu_pci_probe+0x1c2/0x660 [amdgpu] [ 253.017493] local_pci_probe+0x4b/0xb0 [ 253.017746] work_for_cpu_fn+0x1a/0x30 [ 253.017995] process_one_work+0x21e/0x680 [ 253.018248] worker_thread+0x190/0x330 [ 253.018500] ? __pfx_worker_thread+0x10/0x10 [ 253.018746] kthread+0xe7/0x120 [ 253.018988] ? __pfx_kthread+0x10/0x10 [ 253.019231] ret_from_fork+0x3c/0x60 [ 253.019468] ? __pfx_kthread+0x10/0x10 [ 253.019701] ret_from_fork_asm+0x1a/0x30 [ 253.019939] </TASK> v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian). Fixes: e864180ee49b ("drm/amdgpu: Add lock around VF RLCG interface") Cc: lin cao <lin.cao@amd.com> Cc: Jingwen Chen <Jingwen.Chen2@amd.com> Cc: Victor Skvortsov <victor.skvortsov@amd.com> Cc: Zhigang Luo <zhigang.luo@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: 3.2.321Taimur Hassan
Summary: * Add support for disconnected eDP streams * Add log for MALL entry on DCN32x * Add DCC/Tiling reset helper for DCN and DCE * Guard against setting dispclk low when active * Other minor fixes Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Add support for disconnected eDP streamsHarry VanZyllDeJong
[Why] eDP may not be connected to the GPU on driver start causing fail enumeration. [How] Move the virtual signal type check before the eDP connector signal check. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Harry VanZyllDeJong <hvanzyll@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: dpia should avoid encoder used by dp2Peichen Huang
[WHY] In current HPO DP2 implementation, driver would enable/disable DIG encoder when configuring HPO DP2. Therefore, usb4 dp tunnelling should not use the DIG encoder if the corresponded phy is used by a HPO DP2 stream. [HOW] A DP2 stream is treated as a dig stream. Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Guard against setting dispclk low when activeNicholas Kazlauskas
[Why] We should never apply a minimum dispclk value while in prepare_bandwidth or while displays are active. This is always an optimization for when all displays are disabled. [How] Defer dispclk optimization until safe_to_lower = true and display_count reaches 0. Since 0 has a special value in this logic (ie. no dispclk required) we also need adjust the logic that clamps it for the actual request to PMFW. Reviewed-by: Gabe Teeger <gabe.teeger@amd.com> Reviewed-by: Leo Chen <leo.chen@amd.com> Reviewed-by: Syed Hassan <syed.hassan@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Fix BT2020 YCbCr limited/full range inputIlya Bakoulin
[Why] BT2020 YCbCr input is not handled properly when full range quantization is used and limited range is not supported at all. [How] - Add enums for BT2020 YCbCr limited/full range - Add limited range CSC matrix Reviewed-by: Krunoslav Kovac <krunoslav.kovac@amd.com> Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Robert Mader <robert.mader@collabora.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>