linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2025-06-18	drm/amdkfd: Fix race in GWS queue scheduling	Jay Cornwall
	q->gws is not updated atomically with qpd->mapped_gws_queue. If a runlist is created between pqm_set_gws and update_queue it will contain a queue which uses GWS in a process with no GWS allocated. This will result in a scheduler hang. Use q->properties.is_gws which is changed while holding the DQM lock. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b98370220eb3110e82248e3354e16a489a492cfb) Cc: stable@vger.kernel.org
2025-05-16	drm/amdkfd: Support chain runlists of XNACK+/XNACK-	Amber Lin
	If the MEC firmware supports chaining runlists of XNACK+/XNACK- processes, set SQ_CONFIG1 chicken bit and SET_RESOURCES bit 28. When the MEC/HWS supports it, KFD checks the XNACK+/XNACK- processes mix happens or not. If it does, enter over-subscription. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11	drm/amdgpu: adjust enforce_isolation handling	Alex Deucher
	Switch from a bool to an enum and allow more options for enforce isolation. There are now 3 modes of operation: - Disabled (0) - Enabled (serialization and cleaner shader) (1) - Enabled in legacy mode (no serialization or cleaner shader) (2) This provides better flexibility for more use cases. Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-19	drm/amdkfd: Fix bug in config_dequeue_wait_counts	Harish Kasiviswanathan
	For certain ASICs where dequeue_wait_count don't need to be initialized, pm_config_dequeue_wait_counts_v9 return without filling in the packet information. However, the calling function interprets this as a success and sends the uninitialized packet to firmware causing hang. Fix the above bug by not calling pm_config_dequeue_wait_counts_v9 for ASICs that don't need the value to be initialized. v2: Removed redudant code. Tidy up code based on review comments v3: Don't call pm_config_dequeue_wait_counts_v9 for certain ASICs Fixes: ed962f8d0603 ("drm/amdkfd: Add pm_config_dequeue_wait_counts API") Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13	drm/amdgpu: Reduce dequeue retry timeout for gfx9 family	Harish Kasiviswanathan
	Dequeue retry timeout controls the interval between checks for unmet conditions. On MI series, reduce this from 0x40 to 0x1 (~ 1 uS). The cost of additional bandwidth consumed by CP when polling memory shouldn't be substantial. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-10	drm/amdkfd: Add pm_config_dequeue_wait_counts API	Harish Kasiviswanathan
	Update pm_update_grace_period() to more cleaner pm_config_dequeue_wait_counts(). Previously, grace_period variable was overloaded as a variable and a macro, making it inflexible to configure additional dequeue wait times. pm_config_dequeue_wait_counts() now takes in a cmd / variable. This allows flexibility to update different dequeue wait times. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12	drm/amdkfd: Have kfd driver use same PASID values from graphic driver	Xiaogang Chen
	Current kfd driver has its own PASID value for a kfd process and uses it to locate vm at interrupt handler or mapping between kfd process and vm. That design is not working when a physical gpu device has multiple spatial partitions, ex: adev in CPX mode. This patch has kfd driver use same pasid values that graphic driver generated which is per vm per pasid. These pasid values are passed to fw/hardware. We do not need change interrupt handler though more pasid values are used. Also, pasid values at log are replaced by user process pid; pasid values are not exposed to user. Users see their process pids that have meaning in user space. Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-20	drm/amdkfd: Enable processes isolation on gfx9	Amber Lin
	When amdgpu enable enforce_isolation, KFD enables single-process mode in HWS and sets exec_cleaner_shader bit in MAP_PROCESS. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05	drm/amdkfd: Comment out the unused variable use_static in pm_map_queues_v9	Jesse Zhang
	To fix the warning about unused value, remove the use_static and use the parameter is_static directly. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20	drm/amdgpu: Use function for IP version check	Lijo Lazar
	Use an inline function for version check. Gives more flexibility to handle any format changes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11	drm/amdkfd: Fix reg offset for setting CWSR grace period	Mukul Joshi
	This patch fixes the case where the code currently passes absolute register address and not the reg offset, which HWS expects, when sending the PM4 packet to set/update CWSR grace period. Additionally, cleanup the signature of build_grace_period_packet_info function as it no longer needs the inst parameter. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15	drm/amdkfd: Add missing tba_hi programming on aldebaran	Jay Cornwall
	Previously asymptomatic because high 32 bits were zero. Fixes: 96c211f1f9ef ("drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole") Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12	drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3	Eric Huang
	Implement the similarities as GC v9.4.2, and the difference for GC v9.4.3 HW spec, i.e. xcc instance. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: prepare map process for multi-process debug devices	Jonathan Kim
	Unlike single process debug devices, multi-process debug devices allow debug mode setting per-VMID (non-device-global). Because the HWS manages PASID-VMID mapping, the new MAP_PROCESS API allows the KFD to forward the required SPI debug register write requests. To request a new debug mode setting change, the KFD must be able to preempt all queues then remap all queues with these new setting requests for MAP_PROCESS to take effect. Note that by default, trap enablement in non-debug mode must be disabled for performance reasons for multi-process debug devices due to setup overhead in FW. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdkfd: prepare map process for single process debug devices	Jonathan Kim
	Older HW only supports debugging on a single process because the SPI debug mode setting registers are device global. The HWS has supplied a single pinned VMID (0xf) for MAP_PROCESS for debug purposes. To pin the VMID, the KFD will remove the VMID from the HWS dynamic VMID allocation via SET_RESOUCES so that a debugged process will never migrate away from its pinned VMID. The KFD is responsible for reserving and releasing this pinned VMID accordingly whenever the debugger attaches and detaches respectively. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: add configurable grace period for unmap queues	Jonathan Kim
	The HWS schedule allows a grace period for wave completion prior to preemption for better performance by avoiding CWSR on waves that can potentially complete quickly. The debugger, on the other hand, will want to inspect wave status immediately after it actively triggers preemption (a suspend function to be provided). To minimize latency between preemption and debugger wave inspection, allow immediate preemption by setting the grace period to 0. Note that setting the preepmtion grace period to 0 will result in an infinite grace period being set due to a CP FW bug so set it to 1 for now. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdkfd: Update packet manager for GFX9.4.3	Mukul Joshi
	In GFX 9.4.3, there can be more than 8 SDMA engines. As a result, extended_engine_sel and engine_sel fields in MAP_QUEUES packet need to be updated to allow correct mapping of SDMA queues to these SDMA engines. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdkfd: Introduce kfd_node struct (v5)	Mukul Joshi
	Introduce a new structure, kfd_node, which will now represent a compute node. kfd_node is carved out of kfd_dev structure. kfd_dev struct now will become the parent of kfd_node, and will store common resources such as doorbells, GTT sub-alloctor etc. kfd_node struct will store all resources specific to a compute node, such as device queue manager, interrupt handling etc. This is the first step in adding compute partition support in KFD. v2: introduce kfd_node struct to gc v11 (Hawking) v3: make reference to kfd_dev struct through kfd_node (Morris) v4: use kfd_node instead for kfd isr/mqd functions (Morris) v5: rebase (Alex) Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Tested-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Morris Zhang <Shiwu.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-17	drm/amdkfd: Use proper enum in pm_unmap_queues_v9()	Nathan Chancellor
	Clang warns: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_packet_manager_v9.c:267:3: error: implicit conversion from enumeration type 'enum mes_map_queues_extended_engine_sel_enum' to different enumeration type 'enum mes_unmap_queues_extended_engine_sel_enum' [-Werror,-Wenum-conversion] extended_engine_sel__mes_map_queues__sdma0_to_7_sel : ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Use 'extended_engine_sel__mes_unmap_queues__sdma0_to_7_sel' to eliminate the warning, which is the same numeric value of the proper type. Fixes: 009e9a158505 ("drm/amdkfd: navi2x requires extended engines to map and unmap sdma queues") Link: https://github.com/ClangBuiltLinux/linux/issues/1596 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-14	drm/amdkfd: navi2x requires extended engines to map and unmap sdma queues	Jonathan Kim
	SDMA 5.2.x queues are required to be mapped and unmapped from the extended engines. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-14	drm/amdkfd: remove unneeded unmap single queue option	Jonathan Kim
	The KFD only unmaps all queues, all dynamics queues or all process queues since RUN_LIST is mapped with all KFD queues. There's no need to provide a single type unmap so remove this option. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-14	drm/amdkfd: update SPDX license header	Rajneesh Bhardwaj
	Update the SPDX License header for all the KFD files. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23	drm/amdkfd: add per-vmid-debug map_process_support	Jonathan Kim
	In order to support multi-process debugging, HWS PM4 packet MAP_PROCESS requires an extension of 5 DWORDS to support targeting of per-vmid SPI debug control registers as well as watch points per process. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09	drm/amdkfd: dqm fence memory corruption	Qu Huang
	Amdgpu driver uses 4-byte data type as DQM fence memory, and transmits GPU address of fence memory to microcode through query status PM4 message. However, query status PM4 message definition and microcode processing are all processed according to 8 bytes. Fence memory only allocates 4 bytes of memory, but microcode does write 8 bytes of memory, so there is a memory corruption. Changes since v1: * Change dqm->fence_addr as a u64 pointer to fix this issue, also fix up query_status and amdkfd_fence_wait_timeout function uses 64 bit fence value to make them consistent. Signed-off-by: Qu Huang <jinsdb@126.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-02	drm/amdkfd: Update hardware scheduling time quanta	Joseph Greathouse
	Update PROCESS_QUANTUM, the time the hardware scheduler allows processes to run before switching to other processes when it becomes over-subscribed. Increase this to 10ms, to allow processes to better amortize their task switch times. Update HQD Quantum, the amount of time that an active queue stays attached to the CP before we forcibly switch it for another active queue for fairness. Setting these so that HQD < PROCESS makes it easier to ensure that we get fairness when we have multiple active queues on the device. Otherwise we may start process-swapping before we get to all the queues in a CP. Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28	drm/amdkfd: Enable over-subscription with >1 GWS queue	Joseph Greathouse
	The current GWS usage model will only allows a single GWS-enabled process to be active on the GPU at once. This ensures that a barrier-using kernel gets a known amount of GPU hardware, to prevent deadlock due to inability to go beyond the GWS barrier. The HWS watches how many GWS entries are assigned to each process, and goes into over-subscription mode when two processes need more than the 64 that are available. The current KFD method for working with this is to allocate all 64 GWS entries to each GWS-capable process. When more than one GWS-enabled process is in the runlist, we must make sure the runlist is in over-subscription mode, so that the HWS gets a chained RUN_LIST packet and continues scheduling kernels. Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-19	drm/amdkfd: Rename kfd_kernel_queue_.c to kfd_packet_manager_.c	Yong Zhao
	After the recent cleanup, the functionalities provided by the previous kfd_kernel_queue_*.c are actually all packet manager related. So rename them to reflect that. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>