Age | Commit message (Collapse) | Author |
|
q->gws is not updated atomically with qpd->mapped_gws_queue. If a
runlist is created between pqm_set_gws and update_queue it will
contain a queue which uses GWS in a process with no GWS allocated.
This will result in a scheduler hang.
Use q->properties.is_gws which is changed while holding the DQM lock.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b98370220eb3110e82248e3354e16a489a492cfb)
Cc: stable@vger.kernel.org
|
|
If the MEC firmware supports chaining runlists of XNACK+/XNACK-
processes, set SQ_CONFIG1 chicken bit and SET_RESOURCES bit 28.
When the MEC/HWS supports it, KFD checks the XNACK+/XNACK- processes mix
happens or not. If it does, enter over-subscription.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Switch from a bool to an enum and allow more options
for enforce isolation. There are now 3 modes of operation:
- Disabled (0)
- Enabled (serialization and cleaner shader) (1)
- Enabled in legacy mode (no serialization or cleaner shader) (2)
This provides better flexibility for more use cases.
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
For certain ASICs where dequeue_wait_count don't need to be initialized,
pm_config_dequeue_wait_counts_v9 return without filling in the packet
information. However, the calling function interprets this as a success
and sends the uninitialized packet to firmware causing hang.
Fix the above bug by not calling pm_config_dequeue_wait_counts_v9 for
ASICs that don't need the value to be initialized.
v2: Removed redudant code.
Tidy up code based on review comments
v3: Don't call pm_config_dequeue_wait_counts_v9 for certain ASICs
Fixes: ed962f8d0603 ("drm/amdkfd: Add pm_config_dequeue_wait_counts API")
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Dequeue retry timeout controls the interval between checks for unmet
conditions. On MI series, reduce this from 0x40 to 0x1 (~ 1 uS). The
cost of additional bandwidth consumed by CP when polling memory
shouldn't be substantial.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Update pm_update_grace_period() to more cleaner
pm_config_dequeue_wait_counts(). Previously, grace_period variable was
overloaded as a variable and a macro, making it inflexible to configure
additional dequeue wait times.
pm_config_dequeue_wait_counts() now takes in a cmd / variable. This
allows flexibility to update different dequeue wait times.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Current kfd driver has its own PASID value for a kfd process and uses it to
locate vm at interrupt handler or mapping between kfd process and vm. That
design is not working when a physical gpu device has multiple spatial
partitions, ex: adev in CPX mode. This patch has kfd driver use same pasid
values that graphic driver generated which is per vm per pasid.
These pasid values are passed to fw/hardware. We do not need change interrupt
handler though more pasid values are used. Also, pasid values at log are
replaced by user process pid; pasid values are not exposed to user. Users see
their process pids that have meaning in user space.
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When amdgpu enable enforce_isolation, KFD enables single-process mode in
HWS and sets exec_cleaner_shader bit in MAP_PROCESS.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
To fix the warning about unused value,
remove the use_static and use the parameter is_static directly.
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use an inline function for version check. Gives more flexibility to
handle any format changes.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch fixes the case where the code currently passes
absolute register address and not the reg offset, which HWS
expects, when sending the PM4 packet to set/update CWSR grace
period. Additionally, cleanup the signature of
build_grace_period_packet_info function as it no longer needs
the inst parameter.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Previously asymptomatic because high 32 bits were zero.
Fixes: 96c211f1f9ef ("drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole")
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Implement the similarities as GC v9.4.2, and the difference
for GC v9.4.3 HW spec, i.e. xcc instance.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Unlike single process debug devices, multi-process debug devices allow
debug mode setting per-VMID (non-device-global).
Because the HWS manages PASID-VMID mapping, the new MAP_PROCESS API allows
the KFD to forward the required SPI debug register write requests.
To request a new debug mode setting change, the KFD must be able to
preempt all queues then remap all queues with these new setting
requests for MAP_PROCESS to take effect.
Note that by default, trap enablement in non-debug mode must be disabled
for performance reasons for multi-process debug devices due to setup
overhead in FW.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Older HW only supports debugging on a single process because the
SPI debug mode setting registers are device global.
The HWS has supplied a single pinned VMID (0xf) for MAP_PROCESS
for debug purposes. To pin the VMID, the KFD will remove the VMID from
the HWS dynamic VMID allocation via SET_RESOUCES so that a debugged
process will never migrate away from its pinned VMID.
The KFD is responsible for reserving and releasing this pinned VMID
accordingly whenever the debugger attaches and detaches respectively.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The HWS schedule allows a grace period for wave completion prior to
preemption for better performance by avoiding CWSR on waves that can
potentially complete quickly. The debugger, on the other hand, will
want to inspect wave status immediately after it actively triggers
preemption (a suspend function to be provided).
To minimize latency between preemption and debugger wave inspection, allow
immediate preemption by setting the grace period to 0.
Note that setting the preepmtion grace period to 0 will result in an
infinite grace period being set due to a CP FW bug so set it to 1 for now.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In GFX 9.4.3, there can be more than 8 SDMA engines.
As a result, extended_engine_sel and engine_sel fields
in MAP_QUEUES packet need to be updated to allow correct
mapping of SDMA queues to these SDMA engines.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Introduce a new structure, kfd_node, which will now represent
a compute node. kfd_node is carved out of kfd_dev structure.
kfd_dev struct now will become the parent of kfd_node, and will
store common resources such as doorbells, GTT sub-alloctor etc.
kfd_node struct will store all resources specific to a compute
node, such as device queue manager, interrupt handling etc.
This is the first step in adding compute partition support in KFD.
v2: introduce kfd_node struct to gc v11 (Hawking)
v3: make reference to kfd_dev struct through kfd_node (Morris)
v4: use kfd_node instead for kfd isr/mqd functions (Morris)
v5: rebase (Alex)
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Tested-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Morris Zhang <Shiwu.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Clang warns:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_packet_manager_v9.c:267:3:
error: implicit conversion from enumeration type 'enum
mes_map_queues_extended_engine_sel_enum' to different enumeration type
'enum mes_unmap_queues_extended_engine_sel_enum'
[-Werror,-Wenum-conversion]
extended_engine_sel__mes_map_queues__sdma0_to_7_sel :
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Use 'extended_engine_sel__mes_unmap_queues__sdma0_to_7_sel' to eliminate
the warning, which is the same numeric value of the proper type.
Fixes: 009e9a158505 ("drm/amdkfd: navi2x requires extended engines to map and unmap sdma queues")
Link: https://github.com/ClangBuiltLinux/linux/issues/1596
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
SDMA 5.2.x queues are required to be mapped and unmapped from the extended
engines.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The KFD only unmaps all queues, all dynamics queues or all process queues
since RUN_LIST is mapped with all KFD queues.
There's no need to provide a single type unmap so remove this option.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Update the SPDX License header for all the KFD files.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In order to support multi-process debugging, HWS PM4 packet
MAP_PROCESS requires an extension of 5 DWORDS to support targeting of
per-vmid SPI debug control registers as well as watch points per process.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Amdgpu driver uses 4-byte data type as DQM fence memory,
and transmits GPU address of fence memory to microcode
through query status PM4 message. However, query status
PM4 message definition and microcode processing are all
processed according to 8 bytes. Fence memory only allocates
4 bytes of memory, but microcode does write 8 bytes of memory,
so there is a memory corruption.
Changes since v1:
* Change dqm->fence_addr as a u64 pointer to fix this issue,
also fix up query_status and amdkfd_fence_wait_timeout function
uses 64 bit fence value to make them consistent.
Signed-off-by: Qu Huang <jinsdb@126.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Update PROCESS_QUANTUM, the time the hardware scheduler allows
processes to run before switching to other processes when it becomes
over-subscribed. Increase this to 10ms, to allow processes to better
amortize their task switch times.
Update HQD Quantum, the amount of time that an active queue stays
attached to the CP before we forcibly switch it for another active
queue for fairness.
Setting these so that HQD < PROCESS makes it easier to ensure that
we get fairness when we have multiple active queues on the device.
Otherwise we may start process-swapping before we get to all the
queues in a CP.
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The current GWS usage model will only allows a single GWS-enabled
process to be active on the GPU at once. This ensures that a
barrier-using kernel gets a known amount of GPU hardware, to
prevent deadlock due to inability to go beyond the GWS barrier.
The HWS watches how many GWS entries are assigned to each process,
and goes into over-subscription mode when two processes need more
than the 64 that are available. The current KFD method for working
with this is to allocate all 64 GWS entries to each GWS-capable
process.
When more than one GWS-enabled process is in the runlist, we must
make sure the runlist is in over-subscription mode, so that the
HWS gets a chained RUN_LIST packet and continues scheduling
kernels.
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
After the recent cleanup, the functionalities provided by the previous
kfd_kernel_queue_*.c are actually all packet manager related. So rename
them to reflect that.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|