summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdkfd
AgeCommit message (Collapse)Author
2022-09-19drm/amdkfd: Fix spelling mistake "detroyed" -> "destroyed"Colin Ian King
There is a spelling mistake in a pr_debug message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19drm/amdkfd: Use the consolidated MQD manager functions for GFX11Shiwu Zhang
To remove duplication for GFX11 as well, use the common MQD manager functions defined in kfd_mqd_manager.c for all version of managers Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-13drm/amdkfd: Migrate in CPU page fault use current mmPhilip Yang
migrate_vma_setup shows below warning because we don't hold another process mm mmap_lock. We should use current vmf->vma->vm_mm instead, the caller already hold current mmap lock inside CPU page fault handler. WARNING: CPU: 10 PID: 3054 at include/linux/mmap_lock.h:155 find_vma Call Trace: walk_page_range+0x76/0x150 migrate_vma_setup+0x18a/0x640 svm_migrate_vram_to_ram+0x245/0xa10 [amdgpu] svm_migrate_to_ram+0x36f/0x470 [amdgpu] do_swap_page+0xcfe/0xec0 __handle_mm_fault+0x96b/0x15e0 handle_mm_fault+0x13f/0x3e0 do_user_addr_fault+0x1e7/0x690 Fixes: e1f84eef313f ("drm/amdkfd: handle CPU fault on COW mapping") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-13drm/amdkfd: Remove prefault before migrating to VRAMPhilip Yang
Prefaulting potentially allocates system memory pages before a migration. This adds unnecessary overhead. Instead we can skip unallocated pages in the migration and just point migrate->dst to a 0-initialized VRAM page directly. Then the VRAM page will be inserted to the PTE. A subsequent CPU page fault will migrate the page back to system memory. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-13drm/amdkfd: handle CPU fault on COW mappingPhilip Yang
If CPU page fault in a page with zone_device_data svm_bo from another process, that means it is COW mapping in the child process and the range is migrated to VRAM by parent process. Migrate the parent process range back to system memory to recover the CPU page fault. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-13amd/amdkfd: fix repeated words in commentswangjianli
Delete the redundant word 'to'. Signed-off-by: wangjianli <wangjianli@cdjrlc.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-13drm/amdkfd: Fix CRIU restore op due to doorbell offsetRajneesh Bhardwaj
Recently introduced change to allocate doorbells only when the first queue is created or mapped for CPU / GPU access, did not consider Checkpoint Restore scenario completely. This fix allows the CRIU restore operation by extending the doorbell optimization to CRIU restore scenario. Fixes: 16f0013157bf ("drm/amdkfd: Allocate doorbells only when needed") Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-12Merge tag 'amd-drm-next-6.1-2022-09-08' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.1-2022-09-08: amdgpu: - Mode2 reset for RDNA2 - Lots of new DC documentation - Add documentation about different asic families - DSC improvements - Aldebaran fixes - Misc spelling and grammar fixes - GFXOFF stats support for vangogh - DC frame size fixes - NBIO 7.7 updates - DCN 3.2 updates - DCN 3.1.4 Updates - SMU 13.x updates - Misc bug fixes - Rework DC register offset handling - GC 11.x updates - PSP 13.x updates - SDMA 6.x updates - GMC 11.x updates - SR-IOV updates - PSP fixes for TA unloading - DSC passthrough support - Misc code cleanups amdkfd: - ISA fixes for some GC 10.3 IPs - Misc code cleanups radeon: - Delayed work flush fix - Use time_after for some jiffies calculations drm: - DSC passthrough aux support Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220908155202.57862-1-alexander.deucher@amd.com
2022-08-30drm/amdkfd: Added GFX 11.0.3 SupportDavid Belanger
Added missing cases for GFX 11.0.3 code in a few switch statements. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdkfd: remove redundant variables err and retJinpeng Cui
Return value from kfd_wait_on_events() and io_remap_pfn_range() directly instead of taking this in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Jinpeng Cui <cui.jinpeng2@zte.com.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdkfd: Fix isa version for the GC 10.3.7Prike Liang
Correct the isa version for handling KFD test. Fixes: 7c4f4f197e0c ("drm/amdkfd: Add GC 10.3.6 and 10.3.7 KFD definitions") Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdkfd: Allocate doorbells only when neededFelix Kuehling
Only allocate doorbells when the first queue is created on a GPU or the doorbells need to be mapped into CPU or GPU virtual address space. This avoids allocating doorbells unnecessarily and can allow more processes to use KFD on multi-GPU systems. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Kent Russell <kent.Russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-19Merge tag 'amd-drm-fixes-6.0-2022-08-17' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.0-2022-08-17: amdgpu: - Revert some DML stack changes - Rounding fixes in KFD allocations - atombios vram info table parsing fix - DCN 3.1.4 fixes - Clockgating fixes for various new IPs - SMU 13.0.4 fixes - DCN 3.1.4 FP fixes - TMDS fixes for YCbCr420 4k modes - DCN 3.2.x fixes - USB 4 fixes - SMU 13.0 fixes - SMU driver unload memory leak fixes - Display orientation fix - Regression fix for generic fbdev conversion - SDMA 6.x fixes - SR-IOV fixes - IH 6.x fixes - Use after free fix in bo list handling - Revert pipe1 support - XGMI hive reset fix amdkfd: - Fix potential crach in kfd_create_indirect_link_prop() Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220818025206.6463-1-alexander.deucher@amd.com
2022-08-16drm/amdkfd: potential crash in kfd_create_indirect_link_prop()Dan Carpenter
This code has two bugs. If kfd_topology_device_by_proximity_domain() failed on the first iteration through the loop then "cpu_link" is uninitialized and should not be dereferenced. The second bug is that we cannot dereference a list iterator when it points to the list head. In other words, if we exit the list_for_each_entry() loop exits without hitting a break then "cpu_link" is not a valid pointer and should not be dereferenced. Fix both of these problems by setting "cpu_link" to NULL when it is invalid and non-NULL when it is valid. That makes it easier to test for valid vs invalid. Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdkfd: reserve 2 queues for sdma 6.0.1 in bitmapYifan Zhang
There is only one engine in sdma 6.0.1, the total number of reserved queues should be 2, reflect this number in bitmap as well. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdkfd: Fix mm reference in SVM eviction workerFelix Kuehling
Use the mm reference from the fence. This allows removing the svm_bo->svms pointer, which was problematic because we cannot assume that the struct kfd_process containing the svms is still allocated without holding a refcount on the process. Use mmget_not_zero to ensure the mm is still valid, and drop the svm_bo reference if it isn't. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-10drm/amdkfd: Handle restart of kfd_ioctl_wait_eventsFelix Kuehling
When kfd_ioctl_wait_events needs to restart due to a signal, we need to update the timeout to account for the time already elapsed. We also need to undo auto_reset of events that have signaled already, so that the restarted ioctl will be able to count those signals again. This fixes infinite hangs when kfd_ioctl_wait_events is interrupted by a signal. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-and-tested-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-05Merge tag 'mm-stable-2022-08-03' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Most of the MM queue. A few things are still pending. Liam's maple tree rework didn't make it. This has resulted in a few other minor patch series being held over for next time. Multi-gen LRU still isn't merged as we were waiting for mapletree to stabilize. The current plan is to merge MGLRU into -mm soon and to later reintroduce mapletree, with a view to hopefully getting both into 6.1-rc1. Summary: - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place" [ XFS merge from hell as per Darrick Wong in https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ] * tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits) tools/testing/selftests/vm/hmm-tests.c: fix build mm: Kconfig: fix typo mm: memory-failure: convert to pr_fmt() mm: use is_zone_movable_page() helper hugetlbfs: fix inaccurate comment in hugetlbfs_statfs() hugetlbfs: cleanup some comments in inode.c hugetlbfs: remove unneeded header file hugetlbfs: remove unneeded hugetlbfs_ops forward declaration hugetlbfs: use helper macro SZ_1{K,M} mm: cleanup is_highmem() mm/hmm: add a test for cross device private faults selftests: add soft-dirty into run_vmtests.sh selftests: soft-dirty: add test for mprotect mm/mprotect: fix soft-dirty check in can_change_pte_writable() mm: memcontrol: fix potential oom_lock recursion deadlock mm/gup.c: fix formatting in check_and_migrate_movable_page() xfs: fail dax mount if reflink is enabled on a partition mm/memcontrol.c: remove the redundant updating of stats_flush_threshold userfaultfd: don't fail on unrecognized features hugetlb_cgroup: fix wrong hugetlb cgroup numa stat ...
2022-07-29drm/amdkfd: use time_is_before_jiffies(a + b) to replace "jiffies - a > b"Yu Zhe
time_is_before_jiffies deals with timer wrapping correctly. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28drm/amdgpu: add debugfs for kfd system and ttm mem usedAlex Sierra
This keeps track of kfd system mem used and kfd ttm mem used. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28drm/amdkfd: track unified memory reservation with xnack offAlex Sierra
[WHY] Unified memory with xnack off should be tracked, as userptr mappings and legacy allocations do. To avoid oversuscribe system memory when xnack off. [How] Exposing functions reserve_mem_limit and unreserve_mem_limit to SVM API and call them on every prange creation and free. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28drm/amdkfd: Split giant svm rangePhilip Yang
Giant svm range split to smaller ranges, align the range start address to max svm range pages to improve MMU TLB usage. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28drm/amdkfd: Set svm range max pagesPhilip Yang
This will be used to split giant svm range into smaller ranges, to support VRAM overcommitment by giant range and improve GPU retry fault recover on giant range. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25drm/amdkfd: Process notifier release callback don't take mutexPhilip Yang
Move process queues cleanup to deferred work kfd_process_wq_release, to avoid potential deadlock circular locking warning: WARNING: possible circular locking dependency detected the existing dependency chain (in reverse order) is: -> #2 ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}: __flush_work+0x343/0x4a0 svm_range_list_lock_and_flush_work+0x39/0xc0 svm_range_set_attr+0xe8/0x1080 [amdgpu] kfd_ioctl+0x19b/0x600 [amdgpu] __x64_sys_ioctl+0x81/0xb0 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #1 (&info->lock#2){+.+.}-{3:3}: __mutex_lock+0xa4/0x940 amdgpu_amdkfd_gpuvm_acquire_process_vm+0x2e3/0x590 kfd_process_device_init_vm+0x61/0x200 [amdgpu] kfd_ioctl_acquire_vm+0x83/0xb0 [amdgpu] kfd_ioctl+0x19b/0x600 [amdgpu] __x64_sys_ioctl+0x81/0xb0 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&process->mutex){+.+.}-{3:3}: __lock_acquire+0x1365/0x23d0 lock_acquire+0xc9/0x2e0 __mutex_lock+0xa4/0x940 kfd_process_notifier_release+0x96/0xe0 [amdgpu] __mmu_notifier_release+0x94/0x210 exit_mmap+0x35/0x1f0 mmput+0x63/0x120 svm_range_deferred_list_work+0x177/0x2c0 [amdgpu] process_one_work+0x2a4/0x600 worker_thread+0x39/0x3e0 kthread+0x16d/0x1a0 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&svms->deferred_list_work)); lock(&info->lock#2); lock((work_completion)(&svms->deferred_list_work)); lock(&process->mutex); Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25drm/amdkfd: Correct mmu_notifier_get failure handlingPhilip Yang
If process has signal pending, mmu_notifier_get_locked fails and calls ops->free_notifier, kfd_process_free_notifier will schedule kfd_process_wq_release as process refcount is 1, but process structure is already freed. This use after free bug causes system crash with different backtrace. The fix is to increase process refcount and then decrease the refcount after mmu_notifier_get success. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-17drm/amdkfd: add SPM support for SVMAlex Sierra
When CPU is connected throug XGMI, it has coherent access to VRAM resource. In this case that resource is taken from a table in the device gmc aperture base. This resource is used along with the device type, which could be DEVICE_PRIVATE or DEVICE_COHERENT to create the device page map region. Also, MIGRATE_VMA_SELECT_DEVICE_COHERENT flag is selected for coherent type case during migration to device. Link: https://lkml.kernel.org/r/20220715150521.18165-8-alex.sierra@amd.com Signed-off-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-13drm/amdkfd: correct the MEC atomic support firmware checking for GC 10.3.7Prike Liang
On the GC 10.3.7 platform the initial MEC release version #3 can support atomic operation,so need correct and set its MEC atomic support version to #3. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-07drm/amdgpu: Fix one list corruption when create queue failsxinhui pan
Queue would be freed when create_queue_cpsch fails So lets do queue cleanup otherwise various list and memory issues happen. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-07drm/amdkfd: optimize svm range evictEric Huang
It is to avoid unnecessary queue eviction when range is not mapped to gpu. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-07drm/amdkfd: change svm range evictEric Huang
Adding always evict queues when flag is set to KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED as if XNACK off. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-30drm/amdkfd: Asynchronously free smi_clientPhilip Yang
The synchronize_rcu may take several ms, which noticeably slows down applications close SMI event handle. Use call_rcu to free client->fifo and client asynchronously and eliminate the synchronize_rcu call in the user thread. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-30drm/amdkfd: Add unmap from GPU SMI eventPhilip Yang
SVM range unmapped from GPUs when range is unmapped from CPU, or with xnack on from MMU notifier when range is evicted or migrated. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-30drm/amdkfd: Add user queue eviction restore SMI eventPhilip Yang
Output user queue eviction and restore event. User queue eviction may be triggered by svm or userptr MMU notifier, TTM eviction, device suspend and CRIU checkpoint and restore. User queue restore may be rescheduled if eviction happens again while restore. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-30drm/amdkfd: Add migration SMI eventPhilip Yang
For migration start and end event, output timestamp when migration starts, ends, svm range address and size, GPU id of migration source and destination and svm range attributes, Migration trigger could be prefetch, CPU or GPU page fault and TTM eviction. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-30drm/amdkfd: Add GPU recoverable fault SMI eventPhilip Yang
Use ktime_get_boottime_ns() as timestamp to correlate with other APIs. Output timestamp when GPU recoverable fault starts and ends to recover the fault, if migration happened or only GPU page table is updated to recover, fault address, if read or write fault. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-30drm/amdkfd: Enable per process SMI eventPhilip Yang
Process receive event from same process by default. Add a flag to be able to receive event from all processes, this requires super user permission. Event using pid 0 to send the event to all processes, to keep the default behavior of existing SMI events. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-30drm/amdkfd: fix cu mask for asics with wgpsJonathan Kim
GFX10 and up have work group processors (WGP) and WGP mode is the native compile mode. KFD and ROCr have no visibility into whether a dispatch is operating in CU or WGP mode. Enforce CU masking to be pairwise continguous in enablement and round robin distribute CUs across the SEs in a pairwise manner to assume WGP mode at all times. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-28Revert "drm/amdkfd: Free queue after unmap queue success"Philip Yang
This reverts commit ab8529b0cdb271d9b222cbbddb2641f3fca5df8f. This causes KFDTest KFDMemoryTest.MemoryRegister test failed on gfx9. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-28drm/amdgpu: add mc wptr addr support for mesJack Xiao
MES requires mc wptr address for usermode queues. Export bo gart address for mc wptr address. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-23drm/amdgpu: Update mes_v11_api_def.hGraham Sider
Update MES API to support oversubscription without aggregated doorbell for usermode queues. v2: Change oversubscription_no_aggregated_en to is_kfd_process (align with MES) Signed-off-by: Graham Sider <Graham.Sider@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-23drm/amdkfd: Enable GFX11 usermode queue oversubscriptionGraham Sider
Starting with GFX11, MES requires wptr BOs to be GTT allocated/mapped to GART for usermode queues in order to support oversubscription. In the case that work is submitted to an unmapped queue, MES must have a GART wptr address to determine whether the queue should be mapped. This change is accompanied with changes in MES and is applicable for MES_API_VERSION >= 2. v3: - Use amdgpu_vm_bo_lookup_mapping for wptr_bo mapping lookup - Move wptr_bo refcount increment to amdgpu_amdkfd_map_gtt_bo_to_gart - Remove list_del_init from amdgpu_amdkfd_map_gtt_bo_to_gart - Cleanup/fix create_queue wptr_bo error handling v4: - Add MES version shift/mask defines to amdgpu_mes.h - Change version check from MES_VERSION to MES_API_VERSION - Add check in kfd_ioctl_create_queue before wptr bo pin/GART map to ensure bo is a single page. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-22drm/amdkfd: Free queue after unmap queue successPhilip Yang
After queue unmap or remove from MES successfully, free queue sysfs entries, doorbell and remove from queue list. Otherwise, application may destroy queue again, cause below kernel warning or crash backtrace. For outstanding queues, either application forget to destroy or failed to destroy, kfd_process_notifier_release will remove queue sysfs entries, kfd_process_wq_release will free queue doorbell. v2: decrement_queue_count for MES queue refcount_t: underflow; use-after-free. WARNING: CPU: 7 PID: 3053 at lib/refcount.c:28 Call Trace: kobject_put+0xd6/0x1a0 kfd_procfs_del_queue+0x27/0x30 [amdgpu] pqm_destroy_queue+0xeb/0x240 [amdgpu] kfd_ioctl_destroy_queue+0x32/0x70 [amdgpu] kfd_ioctl+0x27d/0x500 [amdgpu] do_syscall_64+0x35/0x80 WARNING: CPU: 2 PID: 3053 at drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:400 Call Trace: deallocate_doorbell.isra.0+0x39/0x40 [amdgpu] destroy_queue_cpsch+0xb3/0x270 [amdgpu] pqm_destroy_queue+0x108/0x240 [amdgpu] kfd_ioctl_destroy_queue+0x32/0x70 [amdgpu] kfd_ioctl+0x27d/0x500 [amdgpu] general protection fault, probably for non-canonical address 0xdead000000000108: Call Trace: pqm_destroy_queue+0xf0/0x200 [amdgpu] kfd_ioctl_destroy_queue+0x2f/0x60 [amdgpu] kfd_ioctl+0x19b/0x600 [amdgpu] Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Graham Sider <Graham.Sider@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-22drm/amdkfd: Add queue to MES if it becomes activePhilip Yang
We remove the user queue from MES scheduler to update queue properties. If the queue becomes active after updating, add the user queue to MES scheduler, to be able to handle command packet submission. v2: don't break pqm_set_gws Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Graham Sider <Graham.Sider@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-21drm/amdkfd: correct sdma queue number of sdma 6.0.1Yifan Zhang
sdma 6.0.1 has 8 queues instead of 2. Fixes: 26776a7031c423 ("drm/amdkfd: add GC 11.0.1 KFD support") Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-14drm/amdkfd: fix warning when CONFIG_HSA_AMD_P2P is not setAlex Deucher
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1542:11: warning: variable 'i' set but not used [-Wunused-but-set-variable] Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-14drm/amdkfd: Add available memory ioctlDaniel Phillips
Add a new KFD ioctl to return the largest possible memory size that can be allocated as a buffer object using kfd_ioctl_alloc_memory_of_gpu. It attempts to use exactly the same accept/reject criteria as that function so that allocating a new buffer object of the size returned by this new ioctl is guaranteed to succeed, barring races with other allocating tasks. This IOCTL will be used by libhsakmt: https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg75743.html Signed-off-by: Daniel Phillips <Daniel.Phillips@amd.com> Signed-off-by: David Yat Sin <David.YatSin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-10drm/amdkfd: Remove field io_link_count from struct kfd_topology_deviceRamesh Errabolu
The field is redundant and does not serve any functional role Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-08drm/amdkfd: Document and fix GTT BO kmap APIFelix Kuehling
Removed an unused parameter from two functions and added kernel-doc comments. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-08drm/amdkfd: Extend KFD device topology to surface peer-to-peer linksRamesh Errabolu
Extend KFD device topology to surface peer-to-peer links among GPU devices connected over PCIe or xGMI. Enabling HSA_AMD_P2P is REQUIRED to surface peer-to-peer links. Prior to this KFD did not expose to user mode any P2P links or indirect links that go over two or more direct hops. Old versions of the Thunk used to make up their own P2P and indirect links without the information about peer-accessibility and chipset support available to the kernel mode driver. In this patch we expose P2P links in a new sysfs directory to provide more reliable P2P link information to user mode. Old versions of the Thunk will continue to work as before and ignore the new directory. This avoids conflicts between P2P links exposed by KFD and P2P links created by the Thunk itself. New versions of the Thunk will use only the P2P links provided in the new p2p_links directory, if it exists, or fall back to the old code path on older KFDs that don't expose p2p_links. Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-08drm/amdkfd: Define config HSA_AMD_P2P to support peer-to-peerRamesh Errabolu
Extend current kernel config requirements of amdgpu by adding config HSA_AMD_P2P. Enabling HSA_AMD_P2P is REQUIRED to support peer-to-peer communication between AMD GPU devices over PCIe bus Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>