summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)Author
2025-06-03drm/amdgpu: Add userq fence support to SDMAv7.0Arunpravin Paneer Selvam
- Add userq fence support to SDMAv7.0. - GFX12's user fence irq src id differs from GFX11's, hence we need create a new irq srcid header file for GFX12. User fence irq src id information- GFX11 and SDMA6.0 - 0x43 GFX12 and SDMA7.0 - 0x46 Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-03drm/amdgpu: Fix integer overflow in amdgpu_gem_add_input_fence()Dan Carpenter
The "num_syncobj_handles" is a u32 value that comes from the user via the ioctl. On 32bit systems the "sizeof(uint32_t) * num_syncobj_handles" multiplication can have an integer overflow. Use size_mul() to fix that. Fixes: 38c67ec9aa4b ("drm/amdgpu: Add input fence to sync bo map/unmap") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-03drm/amdgpu: Fix integer overflow issues in amdgpu_userq_fence.cDan Carpenter
This patch only affects 32bit systems. There are several integer overflows bugs here but only the "sizeof(u32) * num_syncobj" multiplication is a problem at runtime. (The last lines of this patch). These variables are u32 variables that come from the user. The issue is the multiplications can overflow leading to us allocating a smaller buffer than intended. For the first couple integer overflows, the syncobj_handles = memdup_user() allocation is immediately followed by a kmalloc_array(): syncobj = kmalloc_array(num_syncobj_handles, sizeof(*syncobj), GFP_KERNEL); In that situation the kmalloc_array() works as a bounds check and we haven't accessed the syncobj_handlesp[] array yet so the integer overflow is harmless. But the "num_syncobj" multiplication doesn't have that and the integer overflow could lead to an out of bounds access. Fixes: a292fdecd728 ("drm/amdgpu: Implement userqueue signal/wait IOCTL") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-03drm/amdgpu: disable workload profile switching when OD is enabledAlex Deucher
Users have reported that they have to reduce the level of undervolting to acheive stability when dynamic workload profiles are enabled on GC 10.3.x. Disable dynamic workload profiles if the user has enabled OD. Fixes: b9467983b774 ("drm/amdgpu: add dynamic workload profile switching for gfx10") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4262 Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.15.x
2025-06-03drm/amdgpu/gfx10: Refine Cleaner Shader for GFX10.1.10Vitaly Prosyak
This patch updates the cleaner shader, which is responsible for initializing GPU resources such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Changes include adjustments to register clearing and shader configuration. - Updated GPU resource initialization addresses in the cleaner shader from `be803080` to `be803000`. - Simplified the logic in the SGPR clearing section, ensuring all SGPRs are set to zero. Fixes: 25961bad9212 ("drm/amdgpu/gfx10: Add cleaner shader for GFX10.1.10") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Manu Rastogi <manu.rastogi@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-03drm/amdgpu: Add more checks to discovery fetchLijo Lazar
Add more checks for valid vram size and log error, if any. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-03dma-fence: Use a flag for 64-bit seqnosTvrtko Ursulin
With the goal of reducing the need for drivers to touch (and dereference) fence->ops, we move the 64-bit seqnos flag from struct dma_fence_ops to the fence->flags. Drivers which were setting this flag are changed to use new dma_fence_init64() instead of dma_fence_init(). v2: * Streamlined init and added kerneldoc. * Rebase for amdgpu userq which landed since. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> # v1 Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://lore.kernel.org/r/20250515095004.28318-3-tvrtko.ursulin@igalia.com
2025-06-03Merge drm-next-2025-05-28 into drm-misc-nextMaxime Ripard
Christian needs a recent drm-next branch to merge fence patches. Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-05-29drm/amdkfd: Map wptr BO to GART unconditionallyLang Yu
For simulation C models that don't run CP FW where adev->mes.sched_version is not populated correctly. This causes NULL dereference in amdgpu_amdkfd_free_gtt_mem(dev->adev, (void **)&pqn->q->wptr_bo_gart) and warning on unpinned BO in amdgpu_bo_gpu_offset(q->properties.wptr_bo). Compared with adding version check here and there, always map wptr BO to GART simplifies things. v2: Add NULL check in amdgpu_amdkfd_free_gtt_mem.(Philip) Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/mes: remove some unused functionsAlex Deucher
Nothing uses them so remove them. Leftover from MES bring up. Reviewed-by: Michael Chen <michael.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/mes: add missing locking in helper functionsAlex Deucher
We need to take the MES lock. Reviewed-by: Michael Chen <michael.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-05-29drm/amd: Export DMCUB version to sysfsMario Limonciello
For supported ASICs DMCU version is exported, but ASICs that support DMCUB there is no information exported to sysfs. Add an attribute for DMCUB. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://lore.kernel.org/r/20250527155942.476354-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu: Get mca address for old eeprom recordsganglxie
after getting mca address for old eeprom records with 'address==0', it can be correctly parsed under none-nps1, or it will be dropped. Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu: handle old RAS eeprom data in non-nps1 modeganglxie
Get MCA address from PA in nps1, then convert MCA address to PA in specific nps mode. Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu: Add userq fence support to SDMAv6.0Arunpravin Paneer Selvam
Add userq fence support to SDMAv6.0 Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vramJohn Olender
The drm_mm allocator tolerated being passed end > mm->size, but the drm_buddy allocator does not. Restore the pre-buddy-allocator behavior of allowing such placements. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448 Signed-off-by: John Olender <john.olender@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-05-29drm/amdgpu/vcn5.0.1: read back register after writtenDavid (Ming Qiang) Wu
The addition of register read-back in VCN v5.0.1 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/vcn5: read back register after writtenDavid (Ming Qiang) Wu
The addition of register read-back in VCN v5.0.0 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/vcn4.0.5: read back register after writtenDavid (Ming Qiang) Wu
The addition of register read-back in VCN v4.0.5 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/vcn4.0.3: read back register after writtenDavid (Ming Qiang) Wu
The addition of register read-back in VCN v4.0.3 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/vcn4: read back register after writtenDavid (Ming Qiang) Wu
The addition of register read-back in VCN v4.0.0 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/vcn3: read back register after writtenDavid (Ming Qiang) Wu
The addition of register read-back in VCN v3.0 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/vcn2.5: read back register after writtenDavid (Ming Qiang) Wu
The addition of register read-back in VCN v2.5 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/vcn2: read back register after writtenDavid (Ming Qiang) Wu
The addition of register read-back in VCN v2.0 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/vcn1: read back register after writtenDavid (Ming Qiang) Wu
V3: drop changes where readbacks have implemented. This patch set is to add readbacks only. V2: use common register UVD_STATUS for readback (standard PCI MMIO behavior, i.e. readback post all writes to let the writes hit the hardware) add readback in ..._stop() for more coverage. Similar to the changes made for VCN v4.0.5 where readback to post the writes to avoid race with the doorbell, the addition of register readback support in other VCN versions is intended to prevent potential race conditions, even though such issues have not been observed yet. This change ensures consistency across different VCN variants and helps avoid similar issues. The overhead introduced is negligible. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-28drm/amd/amdgpu: Add GPIO resources required for amdispPratap Nirujogi
ISP is a child device to GFX, and its device specific information is not available in ACPI. Adding the 2 GPIO resources required for ISP_v4_1_1 in amdgpu_isp driver. - GPIO 0 to allow sensor driver to enable and disable sensor module. - GPIO 85 to allow ISP driver to enable and disable ISP RGB streaming mode. Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-28Merge tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm updates from Dave Airlie: "As part of building up nova-core/nova-drm pieces we've brought in some rust abstractions through this tree, aux bus being the main one, with devres changes also in the driver-core tree. Along with the drm core abstractions and enough nova-core/nova-drm to use them. This is still all stub work under construction, to build the nova driver upstream. The other big NVIDIA related one is nouveau adds support for Hopper/Blackwell GPUs, this required a new GSP firmware update to 570.144, and a bunch of rework in order to support multiple fw interfaces. There is also the introduction of an asahi uapi header file as a precursor to getting the real driver in later, but to unblock userspace mesa packages while the driver is trapped behind rust enablement. Otherwise it's the usual mixture of stuff all over, amdgpu, i915/xe, and msm being the main ones, and some changes to vsprintf. new drivers: - bring in the asahi uapi header standalone - nova-drm: stub driver rust dependencies (for nova-core): - auxiliary - bus abstractions - driver registration - sample driver - devres changes from driver-core - revocable changes core: - add Apple fourcc modifiers - add virtio capset definitions - extend EXPORT_SYNC_FILE for timeline syncobjs - convert to devm_platform_ioremap_resource - refactor shmem helper page pinning - DP powerup/down link helpers - extended %p4cc in vsprintf.c to support fourcc prints - change vsprintf %p4cn to %p4chR, remove %p4cn - Add drm_file_err function - IN_FORMATS_ASYNC property - move sitronix from tiny to their own subdir rust: - add drm core infrastructure rust abstractions (device/driver, ioctl, file, gem) dma-buf: - adjust sg handling to not cache map on attach - allow setting dma-device for import - Add a helper to sort and deduplicate dma_fence arrays docs: - updated drm scheduler docs - fbdev todo update - fb rendering - actual brightness ttm: - fix delayed destroy resv object bridge: - add kunit tests - convert tc358775 to atomic - convert drivers to devm_drm_bridge_alloc - convert rk3066_hdmi to bridge driver scheduler: - add kunit tests panel: - refcount panels to improve lifetime handling - Powertip PH128800T004-ZZA01 - NLT NL13676BC25-03F, Tianma TM070JDHG34-00 - Himax HX8279/HX8279-D DDIC - Visionox G2647FB105 - Sitronix ST7571 - ZOTAC rotation quirk vkms: - allow attaching more displays i915: - xe3lpd display updates - vrr refactor - intel_display struct conversions - xe2hpd memory type identification - add link rate/count to i915_display_info - cleanup VGA plane handling - refactor HDCP GSC - fix SLPC wait boosting reference counting - add 20ms delay to engine reset - fix fence release on early probe errors xe: - SRIOV updates - BMG PCI ID update - support separate firmware for each GT - SVM fix, prelim SVM multi-device work - export fan speed - temp disable d3cold on BMG - backup VRAM in PM notifier instead of suspend/freeze - update xe_ttm_access_memory to use GPU for non-visible access - fix guc_info debugfs for VFs - use copy_from_user instead of __copy_from_user - append PCIe gen5 limitations to xe_firmware document amdgpu: - DSC cleanup - DC Scaling updates - Fused I2C-over-AUX updates - DMUB updates - Use drm_file_err in amdgpu - Enforce isolation updates - Use new dma_fence helpers - USERQ fixes - Documentation updates - SR-IOV updates - RAS updates - PSP 12 cleanups - GC 9.5 updates - SMU 13.x updates - VCN / JPEG SR-IOV updates amdkfd: - Update error messages for SDMA - Userptr updates - XNACK fixes radeon: - CIK doorbell cleanup nouveau: - add support for NVIDIA r570 GSP firmware - enable Hopper/Blackwell support nova-core: - fix task list - register definition infrastructure - move firmware into own rust module - register auxiliary device for nova-drm nova-drm: - initial driver skeleton msm: - GPU: - ACD (adaptive clock distribution) for X1-85 - drop fictional address_space_size - improve GMU HFI response time out robustness - fix crash when throttling during boot - DPU: - use single CTL path for flushing on DPU 5.x+ - improve SSPP allocation code for better sharing - Enabled SmartDMA on SM8150, SC8180X, SC8280XP, SM8550 - Added SAR2130P support - Disabled DSC support on MSM8937, MSM8917, MSM8953, SDM660 - DP: - switch to new audio helpers - better LTTPR handling - DSI: - Added support for SA8775P - Added SAR2130P support - HDMI: - Switched to use new helpers for ACR data - Fixed old standing issue of HPD not working in some cases amdxdna: - add dma-buf support - allow empty command submits renesas: - add dma-buf support - add zpos, alpha, blend support panthor: - fail properly for NO_MMAP bos - add SET_LABEL ioctl - debugfs BO dumping support imagination: - update DT bindings - support TI AM68 GPU hibmc: - improve interrupt handling and HPD support virtio: - add panic handler support rockchip: - add RK3588 support - add DP AUX bus panel support ivpu: - add heartbeat based hangcheck mediatek: - prepares support for MT8195/99 HDMIv2/DDCv2 anx7625: - improve HPD tegra: - speed up firmware loading * tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernel: (1627 commits) drm/nouveau/tegra: Fix error pointer vs NULL return in nvkm_device_tegra_resource_addr() drm/xe: Default auto_link_downgrade status to false drm/xe/guc: Make creation of SLPC debugfs files conditional drm/i915/display: Add check for alloc_ordered_workqueue() and alloc_workqueue() drm/i915/dp_mst: Work around Thunderbolt sink disconnect after SINK_COUNT_ESI read drm/i915/ptl: Use everywhere the correct DDI port clock select mask drm/nouveau/kms: add support for GB20x drm/dp: add option to disable zero sized address only transactions. drm/nouveau: add support for GB20x drm/nouveau/gsp: add hal for fifo.chan.doorbell_handle drm/nouveau: add support for GB10x drm/nouveau/gf100-: track chan progress with non-WFI semaphore release drm/nouveau/nv50-: separate CHANNEL_GPFIFO handling out from CHANNEL_DMA drm/nouveau: add helper functions for allocating pinned/cpu-mapped bos drm/nouveau: add support for GH100 drm/nouveau: improve handling of 64-bit BARs drm/nouveau/gv100-: switch to volta semaphore methods drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES drm/nouveau/gsp: init client VMMs with NV0080_CTRL_DMA_SET_PAGE_DIRECTORY drm/nouveau/gsp: fetch level shift and PDE from BAR2 VMM ...
2025-05-28drm/amdgpu: update trace format to match gpu_scheduler_tracePierre-Eric Pelloux-Prayer
Log fences using the same format for coherency. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250526125505.2360-11-pierre-eric.pelloux-prayer@amd.com
2025-05-28drm: Get rid of drm_sched_job.idPierre-Eric Pelloux-Prayer
Its only purpose was for trace events, but jobs can already be uniquely identified using their fence. The downside of using the fence is that it's only available after 'drm_sched_job_arm' was called which is true for all trace events that used job.id so they can safely switch to using it. Suggested-by: Tvrtko Ursulin <tursulin@igalia.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250526125505.2360-9-pierre-eric.pelloux-prayer@amd.com
2025-05-28drm/sched: Store the drm client_id in drm_sched_fencePierre-Eric Pelloux-Prayer
This will be used in a later commit to trace the drm client_id in some of the gpu_scheduler trace events. This requires changing all the users of drm_sched_job_init to add an extra parameter. The newly added drm_client_id field in the drm_sched_fence is a bit of a duplicate of the owner one. One suggestion I received was to merge those 2 fields - this can't be done right now as amdgpu uses some special values (AMDGPU_FENCE_OWNER_*) that can't really be translated into a client id. Christian is working on getting rid of those; when it's done we should be able to squash owner/drm_client_id together. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250526125505.2360-3-pierre-eric.pelloux-prayer@amd.com
2025-05-27Merge tag 'irq-cleanups-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq cleanups from Thomas Gleixner: "A set of cleanups for the generic interrupt subsystem: - Consolidate on one set of functions for the interrupt domain code to get rid of pointlessly duplicated code with only marginal different semantics. - Update the documentation accordingly and consolidate the coding style of the irqdomain header" * tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) irqdomain: Consolidate coding style irqdomain: Fix kernel-doc and add it to Documentation Documentation: irqdomain: Update it Documentation: irq-domain.rst: Simple improvements Documentation: irq/concepts: Minor improvements Documentation: irq/concepts: Add commas and reflow irqdomain: Improve kernel-docs of functions irqdomain: Make struct irq_domain_info variables const irqdomain: Use irq_domain_instantiate()'s return value as initializers irqdomain: Drop irq_linear_revmap() pinctrl: keembay: Switch to irq_find_mapping() irqchip/armada-370-xp: Switch to irq_find_mapping() gpu: ipu-v3: Switch to irq_find_mapping() gpio: idt3243x: Switch to irq_find_mapping() sh: Switch to irq_find_mapping() powerpc: Switch to irq_find_mapping() irqdomain: Drop irq_domain_add_*() functions powerpc: Switch irq_domain_add_nomap() to use fwnode thermal: Switch to irq_domain_create_linear() soc: Switch to irq_domain_create_*() ...
2025-05-22drm/amdgpu: seq64 memory unmap uses uninterruptible lockPhilip Yang
To unmap and free seq64 memory when drm node close to free vm, if there is signal accepted, then taking vm lock failed and leaking seq64 va mapping, and then dmesg has error log "still active bo inside vm". Change to use uninterruptible lock fix the mapping leaking and no dmesg error log. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: update ras support checkMangesh Gadre
update ras support check for vcn 5.0.1 Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Enable RAS for jpeg 5.0.1Mangesh Gadre
Enable jpeg ras posion processing and aca error logging Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Enable RAS for vcn 5.0.1Mangesh Gadre
Enable vcn ras posion processing and aca error logging Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Remove duplicated "context still alive" checkTvrtko Ursulin
When amdgpu_ctx_mgr_fini() calls amdgpu_ctx_mgr_entity_fini() it contains the exact same "context still alive" check as it will do next. Remove the duplicated copy. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Make amdgpu_ctx_mgr_entity_fini staticTvrtko Ursulin
Function amdgpu_ctx_mgr_entity_fini() only has a single local caller so lets make it local. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Update runtime pm checksAlex Deucher
Don't enable BACO when in passthrough. PCI resets don't work correctly when in BACO. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/pm: Use external link order for xgmi dataLijo Lazar
xgmi_port_num interface reports external link number for port number. To be consistent, use the external link number for reporting other XGMI link data also. v2: For invalid link number return -EINVAL (Kevin) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Add sysfs nodes for partitionLijo Lazar
Add sysfs nodes to provide compute paritition specific data. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Register aqua vanjaram jpeg poison irqStanley.Yang
Register aqua vanjaram jpeg poison irq, add jpeg poison handle. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Register aqua vanjaram vcn poison irqStanley.Yang
Register aqua vanjaram vcn poison irq, add vcn poison handle. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Fix eviction fence worker race during fd closeJesse.Zhang
The current cleanup order during file descriptor close can lead to a race condition where the eviction fence worker attempts to access a destroyed mutex from the user queue manager: [ 517.294055] DEBUG_LOCKS_WARN_ON(lock->magic != lock) [ 517.294060] WARNING: CPU: 8 PID: 2030 at kernel/locking/mutex.c:564 [ 517.294094] Workqueue: events amdgpu_eviction_fence_suspend_worker [amdgpu] The issue occurs because: 1. We destroy the user queue manager (including its mutex) first 2. Then try to destroy eviction fences which may have pending work 3. The eviction fence worker may try to access the already-destroyed mutex Fix this by reordering the cleanup to: 1. First mark the fd as closing and destroy eviction fences, which flushes any pending work 2. Then safely destroy the user queue manager after we're certain no more fence work will be executed The copy in amdgpu_driver_postclose_kms() needs to be removed (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: lock the eviction fence for wq signals itPrike Liang
Lock and refer to the eviction fence before the eviction fence schedules work queue tries to signal it. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16gpu: Switch to irq_domain_create_linear()Jiri Slaby (SUSE)
irq_domain_add_linear() is going away as being obsolete now. Switch to the preferred irq_domain_create_linear(). That differs in the first parameter: It takes more generic struct fwnode_handle instead of struct device_node. Therefore, of_fwnode_handle() is added around the parameter. Note some of the users can likely use dev->fwnode directly instead of indirect of_fwnode_handle(dev->of_node). But dev->fwnode is not guaranteed to be set for all, so this has to be investigated on case to case basis (by people who can actually test with the HW). [ tglx: Fix up subject prefix ] Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250319092951.37667-19-jirislaby@kernel.org
2025-05-16drm/amdgpu/jpeg: sriov support for jpeg_v5_0_1fanhuang
initialization table handshake with mmsch Signed-off-by: fanhuang <FangSheng.Huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16drm/amdgpu/vcn: sriov support for vcn_v5_0_1fanhuang
initialization table handshake with mmsch Signed-off-by: fanhuang <FangSheng.Huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16drm/amdgpu: fix use-after-unlock in eviction fence destroyArvind Yadav
The eviction fence destroy path incorrectly calls dma_fence_put() on evf_mgr->ev_fence after releasing the ev_fence_lock. This introduces a potential use-after-unlock or race because another thread concurrently modifies evf_mgr->ev_fence. Fix this by grabbing a local reference to evf_mgr->ev_fence under the lock and using that for dma_fence_put() after waiting. Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16drm/amdgpu: Allow NPS2-CPX combination for VFsLijo Lazar
CPX partition mode is compatible with NPS2 on aquavanjaram VFs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16drm/amdgpu/mmsch: Add MMSCH v5_0 support for sriovfanhuang
These structures are basically ported from MMSCH v4_0 The structures are the same as v4_0 except for the init header Signed-off-by: fanhuang <FangSheng.Huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>