summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-07drm/amdkfd: Set per-process flags only once cik/viHarish Kasiviswanathan
Set per-process static sh_mem config only once during process initialization. Move all static changes from update_qpd() which is called each time a queue is created to set_cache_memory_policy() which is called once during process initialization. set_cache_memory_policy() is currently defined only for cik and vi family. So this commit only focuses on these two. A separate commit will address other asics. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amd: Keep display off while going into S4Mario Limonciello
When userspace invokes S4 the flow is: 1) amdgpu_pmops_prepare() 2) amdgpu_pmops_freeze() 3) Create hibernation image 4) amdgpu_pmops_thaw() 5) Write out image to disk 6) Turn off system Then on resume amdgpu_pmops_restore() is called. This flow has a problem that because amdgpu_pmops_thaw() is called it will call amdgpu_device_resume() which will resume all of the GPU. This includes turning the display hardware back on and discovering connectors again. This is an unexpected experience for the display to turn back on. Adjust the flow so that during the S4 sequence display hardware is not turned back on. Reported-by: Xaver Hugl <xaver.hugl@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2038 Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Link: https://lore.kernel.org/r/20250306185124.44780-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amd/amdgpu: Add missing GC 11.5.0 registerTom St Denis
Adds register needed for debugging purposes. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdkfd: clear F8_MODE for gfx950Alex Sierra
Default F8_MODE should be OCP format on gfx950. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: add defines for pin_offsets in DCE8Alexandre Demers
Define pin_offsets values in the same way it is done in DCE8 Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: Fix annotation for dce_v6_0_line_buffer_adjust functionSrinivasan Shanmugam
Updated description for the 'other_mode' parameter. This parameter is used to determine the display mode of another display controller that may be sharing the line buffer. Cc: Ken Wang <Qingqing.Wang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: handle amdgpu_cgs_create_device() errors in amd_powerplay_create()Wentao Liang
Add error handling to propagate amdgpu_cgs_create_device() failures to the caller. When amdgpu_cgs_create_device() fails, release hwmgr and return -ENOMEM to prevent null pointer dereference. [v1]->[v2]: Change error code from -EINVAL to -ENOMEM. Free hwmgr. Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amd/display: fix missing .is_two_pixels_per_containerAliaksei Urbanski
Starting from 6.11, AMDGPU driver, while being loaded with amdgpu.dc=1, due to lack of .is_two_pixels_per_container function in dce60_tg_funcs, causes a NULL pointer dereference on PCs with old GPUs, such as R9 280X. So this fix adds missing .is_two_pixels_per_container to dce60_tg_funcs. Reported-by: Rosen Penev <rosenp@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3942 Fixes: e6a901a00822 ("drm/amd/display: use even ODM slice width for two pixels per container") Signed-off-by: Aliaksei Urbanski <aliaksei.urbanski@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu/display: Allow DCC for video formats on GFX12David Rosca
We advertise DCC as supported for NV12/P010 formats on GFX12, but it would fail on this check on atomic commit. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: Use unique CPER record id across devicesXiang Liu
Encode socket id to CPER record id to be unique across devices. v2: add pointer check for adev->smuio.funcs->get_socket_id v2: set 0 if adev->smuio.funcs->get_socket_id is NULL Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: fix the gb_addr_config_fields init value mismatchShiwu Zhang
For gfx_v9_4_3 specifically, before regGB_ADDR_CONFIG is overwritten in gfx hw_init it is read out to popluate the gb_addr_config_fields in the sw_init stage, which causes mismatch. Fix it by using the golden value in sw_init as well. v2: This is a driver-set golden reg and keep as it is (Lijo) Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: retire ip init code specific for A0 revShiwu Zhang
For aqua_vanjaram, A0 HW is retired so remove the code specific for it in gfx ip init. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: increase RAS bad page thresholdTao Zhou
For default policy, driver will issue an RMA event when the number of bad pages is greater than 8 physical rows, rather than reaches 8 physical rows, don't rely on threshold configurable parameters in default mode. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: Fix missing drain retry fault the last entryEmily Deng
While the entry get in svm_range_unmap_from_cpu is the last entry, and the entry is page fault, it also need to be dropped. So for equal case, it also need to be dropped. v2: Only modify the svm_range_restore_pages. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Xiaogang Chen<xiaogang.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: Do not set power brake sequence for Aldebaran SRIOVVictor Lu
Aldebaran SRIOV VF cannot access the power brake feature regs. The accesses can be skipped to avoid a dmesg warning. v2: Remove redundant asic type check Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdkfd: remove unused debug gws support status variableJonathan Kim
Remove unused declaration of gws_debug_workaround. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Amber Lin <amber.lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: fix inconsistent indenting warningCharles Han
Fix below inconsistent indenting smatch warning. smatch warnings: drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c:582 amdgpu_sdma_reset_engine() warn: inconsistent indenting Signed-off-by: Charles Han <hanchunchao@inspur.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: Do not write to GRBM_CNTL if Aldebaran SRIOVVictor Lu
Aldebaran SRIOV VF does not have write permissions to GRBM_CTNL. This access can be skipped to avoid a dmesg warning. v2: Use GC IP version check instead of asic check Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07Merge tag 'drm-misc-next-2025-03-06' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.15: Cross-subsystem Changes: base: - component: Provide helper to query bound status fbdev: - fbtft: Remove access to page->index Core Changes: - Fix usage of logging macros in several places gem: - Add test function for imported dma-bufs and use it in core and helpers - Avoid struct drm_gem_object.import_attach tests: - Fix lockdep warnings ttm: - Add helpers for TTM shrinker Driver Changes: adp: - Add support for Apple Touch Bar displays on M1/M2 amdxdna: - Fix interrupt handling appletbdrm: - Add support for Apple Touch Bar displays on x86 bridge: - synopsys: Add HDMI audio support - ti-sn65dsi83: Support negative DE polarity ipu-v3: - Remove unused code nouveau: - Avoid multiple -Wflex-array-member-not-at-end warnings panthor: - Fix CS_STATUS_ defines - Improve locking rockchip: - analogix_dp: Add eDP support - lvds: Improve logging - vop2: Improve HDMI mode handling; Add support for RK3576 - Fix shutdown - Support rk3562-mali xe: - Use TTM shrinker Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250306130700.GA485504@linux.fritz.box
2025-03-06drm/gma500: fix inconsistent indenting warningCharles Han
Fix below inconsistent indenting smatch warning. smatch warnings: drivers/gpu/drm/gma500/cdv_device.c:218 cdv_errata() warn: inconsistent indenting Signed-off-by: Charles Han <hanchunchao@inspur.com> Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250305084911.6394-1-hanchunchao@inspur.com
2025-03-06drm/gma500: Replace deprecated strncpy() with strscpy()Thorsten Blum
strncpy() is deprecated for NUL-terminated destination buffers. Use strscpy() instead and remove the manual NUL-termination. Compile-tested only. Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250225203932.334123-1-thorsten.blum@linux.dev
2025-03-06drm/prime: Use dma_buf from GEM object instanceThomas Zimmermann
Avoid dereferencing struct drm_gem_object.import_attach for the imported dma-buf. The dma_buf field in the GEM object instance refers to the same buffer. Prepares to make import_attach optional. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Anusha Srivatsa <asrivats@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250226172457.217725-11-tzimmermann@suse.de
2025-03-06drm/mipi-dbi: Test for imported buffers with drm_gem_is_imported()Thomas Zimmermann
Instead of testing import_attach for imported GEM buffers, invoke drm_gem_is_imported() to do the test. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Anusha Srivatsa <asrivats@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250226172457.217725-10-tzimmermann@suse.de
2025-03-06drm/fb-dma-helper: Test for imported buffers with drm_gem_is_imported()Thomas Zimmermann
Instead of testing import_attach for imported GEM buffers, invoke drm_gem_is_imported() to do the test. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Anusha Srivatsa <asrivats@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250226172457.217725-9-tzimmermann@suse.de
2025-03-06drm/gem-framebuffer: Use dma_buf from GEM object instanceThomas Zimmermann
Avoid dereferencing struct drm_gem_object.import_attach for the imported dma-buf. The dma_buf field in the GEM object instance refers to the same buffer. Prepares to make import_attach optional. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Anusha Srivatsa <asrivats@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250226172457.217725-8-tzimmermann@suse.de
2025-03-06drm/gem-framebuffer: Test for imported buffers with drm_gem_is_imported()Thomas Zimmermann
Instead of testing import_attach for imported GEM buffers, invoke drm_gem_is_imported() to do the test. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Anusha Srivatsa <asrivats@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250226172457.217725-7-tzimmermann@suse.de
2025-03-06drm/gem-shmem: Use dma_buf from GEM object instanceThomas Zimmermann
Avoid dereferencing struct drm_gem_object.import_attach for the imported dma-buf. The dma_buf field in the GEM object instance refers to the same buffer. Prepares to make import_attach optional. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Anusha Srivatsa <asrivats@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250226172457.217725-6-tzimmermann@suse.de
2025-03-06drm/gem-shmem: Test for imported buffers with drm_gem_is_imported()Thomas Zimmermann
Instead of testing import_attach for imported GEM buffers, invoke drm_gem_is_imported() to do the test. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Anusha Srivatsa <asrivats@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250226172457.217725-5-tzimmermann@suse.de
2025-03-06drm/gem-dma: Use dma_buf from GEM object instanceThomas Zimmermann
Avoid dereferencing struct drm_gem_object.import_attach for the imported dma-buf. The dma_buf field in the GEM object instance refers to the same buffer. Prepares to make import_attach optional. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Anusha Srivatsa <asrivats@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250226172457.217725-4-tzimmermann@suse.de
2025-03-06drm/gem-dma: Test for imported buffers with drm_gem_is_imported()Thomas Zimmermann
Instead of testing import_attach for imported GEM buffers, invoke drm_gem_is_imported() to do the test. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Anusha Srivatsa <asrivats@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250226172457.217725-3-tzimmermann@suse.de
2025-03-06drm/gem: Test for imported GEM buffers with helperThomas Zimmermann
Add drm_gem_is_imported() that tests if a GEM object's buffer has been imported. Update the GEM code accordingly. GEM code usually tests for imports if import_attach has been set in struct drm_gem_object. But attaching a dma-buf on import requires a DMA-capable importer device, which is not the case for many serial busses like USB or I2C. The new helper tests if a GEM object's dma-buf has been created from the GEM object. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Anusha Srivatsa <asrivats@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250226172457.217725-2-tzimmermann@suse.de
2025-03-05drm/panel: fix Visionox RM692E5 dependenciesArnd Bergmann
The newly added driver uses the DSC helpers, so the corresponding Kconfig option must be enabled: ERROR: modpost: "drm_dsc_pps_payload_pack" [drivers/gpu/drm/panel/panel-visionox-rm692e5.ko] undefined! Fixes: 7cb3274341bf ("drm/panel: Add Visionox RM692E5 panel driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250304142907.732196-1-arnd@kernel.org Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-03-05drm/xe: Increase the XE_PL_TT watermarkThomas Hellström
The XE_PL_TT watermark was set to 50% of system memory. The idea behind that was unclear since the net effect is that TT memory will be evicted to TTM_PL_SYSTEM memory if that watermark is exceeded, requiring PPGTT rebinds and dma remapping. But there is no similar watermark for TTM_PL_1SYSTEM memory. The TTM functionality that tries to swap out system memory to shmem objects if a 50% limit of total system memory is reached is orthogonal to this, and with the shrinker added, it's no longer in effect. Replace the 50% TTM_PL_TT limit with a 100% limit, in effect allowing all graphics memory to be bound to the device unless it has been swapped out by the shrinker. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/intel-xe/20250305092220.123405-8-thomas.hellstrom@linux.intel.com
2025-03-05drm/xe: Add a shrinker for xe bosThomas Hellström
Rather than relying on the TTM watermark accounting add a shrinker for xe_bos in TT or system memory. Leverage the newly added TTM per-page shrinking and shmem backup support. Although xe doesn't fully support WONTNEED (purgeable) bos yet, introduce and add shrinker support for purgeable ttm_tts. v2: - Cleanups bugfixes and a KUNIT shrinker test. - Add writeback support, and activate if kswapd. v3: - Move the try_shrink() helper to core TTM. - Minor cleanups. v4: - Add runtime pm for the shrinker. Shrinking may require an active device for CCS metadata copying. v5: - Separately purge ghost- and zombie objects in the shrinker. - Fix a format specifier - type inconsistency. (Kernel test robot). v7: - s/long/s64/ (Christian König) - s/sofar/progress/ (Matt Brost) v8: - Rebase on Xe KUNIT update. - Add content verifying to the shrinker kunit test. - Split out TTM changes to a separate patch. - Get rid of multiple bool arguments for clarity (Matt Brost) - Avoid an error pointer dereference (Matt Brost) - Avoid an integer overflow (Matt Auld) - Address misc review comments by Matt Brost. v9: - Fix a compliation error. - Rebase. v10: - Update to new LRU walk interface. - Rework ghost-, zombie and purged object shrinking. - Rebase. v11: - Use additional TTM helpers. - Honor __GFP_FS and __GFP_IO - Rebase. v13: - Use ttm_tt_setup_backup(). v14: - Don't set up backup on imported bos. v15: - Rebase on backup interface changes. Cc: Christian König <christian.koenig@amd.com> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/intel-xe/20250305092220.123405-7-thomas.hellstrom@linux.intel.com
2025-03-05drm/ttm: Add helpers for shrinkingThomas Hellström
Add a number of helpers for shrinking that access core TTM and core MM functionality in a way that make them unsuitable for driver open-coding. v11: - New patch (split off from previous) and additional helpers. v13: - Adapt to ttm_backup interface change. - Take resource off LRU when backed up. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/intel-xe/20250305092220.123405-6-thomas.hellstrom@linux.intel.com
2025-03-05drm/ttm: Add a macro to perform LRU iterationThomas Hellström
Following the design direction communicated here: https://lore.kernel.org/linux-mm/b7491378-defd-4f1c-31e2-29e4c77e2d67@amd.com/T/#ma918844aa8a6efe8768fdcda0c6590d5c93850c9 Export a LRU walker for driver shrinker use. The walker initially supports only trylocking, since that's the method used by shrinkes. The walker makes use of scoped_guard() to allow exiting from the LRU walk loop without performing any explicit unlocking or cleanup. v8: - Split out from another patch. - Use a struct for bool arguments to increase readability (Matt Brost). - Unmap user-space cpu-mappings before shrinking pages. - Explain non-fatal error codes (Matt Brost) v10: - Instead of using the existing helper, Wrap the interface inside out and provide a loop to de-midlayer things the LRU iteration (Christian König). - Removing the R-B by Matt Brost since the patch was significantly changed. v11: - Split the patch up to include just the LRU walk helper. v12: - Indent after scoped_guard() (Matt Brost) v15: - Adapt to new definition of scoped_guard() Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/intel-xe/20250305092220.123405-5-thomas.hellstrom@linux.intel.com
2025-03-05drm/ttm: Use fault-injection to test error pathsThomas Hellström
Use fault-injection to test partial TTM swapout and interrupted swapin. Return -EINTR for swapin to test the callers ability to handle and restart the swapin, and on swapout perform a partial swapout to test that the swapin and release_shrunken functionality. v8: - Use the core fault-injection system. v9: - Fix compliation failure for !CONFIG_FAULT_INJECTION Cc: Christian König <christian.koenig@amd.com> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/intel-xe/20250305092220.123405-4-thomas.hellstrom@linux.intel.com
2025-03-05drm/ttm/pool, drm/ttm/tt: Provide a helper to shrink pagesThomas Hellström
Provide a helper to shrink ttm_tt page-vectors on a per-page basis. A ttm_backup backend could then in theory get away with allocating a single temporary page for each struct ttm_tt. This is accomplished by splitting larger pages before trying to back them up. In the future we could allow ttm_backup to handle backing up large pages as well, but currently there's no benefit in doing that, since the shmem backup backend would have to split those anyway to avoid allocating too much temporary memory, and if the backend instead inserts pages into the swap-cache, those are split on reclaim by the core. Due to potential backup- and recover errors, allow partially swapped out struct ttm_tt's, although mark them as swapped out stopping them from being swapped out a second time. More details in the ttm_pool.c DOC section. v2: - A couple of cleanups and error fixes in ttm_pool_back_up_tt. - s/back_up/backup/ - Add a writeback parameter to the exported interface. v8: - Use a struct for flags for readability (Matt Brost) - Address misc other review comments (Matt Brost) v9: - Update the kerneldoc for the ttm_tt::backup field. v10: - Rebase. v13: - Rebase on ttm_backup interface change. Update kerneldoc. - Rebase and adjust ttm_tt_is_swapped(). v15: - Rebase on ttm_backup return value change. - Rebase on previous restructuring of ttm_pool_alloc() - Rework the ttm_pool backup interface (Christian König) - Remove cond_resched() (Christian König) - Get rid of the need to allocate an intermediate page array when restoring a multi-order page (Christian König) - Update documentation. Cc: Christian König <christian.koenig@amd.com> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Christian Koenig <christian.koenig@amd.com> Link: https://lore.kernel.org/intel-xe/20250305092220.123405-3-thomas.hellstrom@linux.intel.com
2025-03-05drm/ttm: Provide a shmem backup implementationThomas Hellström
Provide a standalone shmem backup implementation. Given the ttm_backup interface, this could later on be extended to providing other backup implementation than shmem, with one use-case being GPU swapout to a user-provided fd. v5: - Fix a UAF. (kernel test robot, Dan Carptenter) v6: - Rename ttm_backup_shmem_copy_page() function argument (Matthew Brost) - Add some missing documentation v8: - Use folio_file_page to get to the page we want to writeback instead of using the first page of the folio. v13: - Remove the base class abstraction (Christian König) - Include ttm_backup_bytes_avail(). v14: - Fix kerneldoc for ttm_backup_bytes_avail() (0-day) - Work around casting of __randomize_layout struct pointer (0-day) v15: - Return negative error code from ttm_backup_backup_page() (Christian König) - Doc fixes. (Christian König). Cc: Christian König <christian.koenig@amd.com> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/intel-xe/20250305092220.123405-2-thomas.hellstrom@linux.intel.com
2025-03-05drm/amdgpu: Fix core reset sequence for JPEG5_0_1Sathishkumar S
For cores 1 through 9 repair the core reset sequence by adjusting offsets to access the expected registers. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdkfd: flag per-sdma queue reset supported to user spaceJonathan Kim
Similar to compute queue reset, flag SDMA queue reset capabilities to user space for safe testing. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdkfd: implement per queue sdma reset for gfx 9.4+Jonathan Kim
To reset hung SDMA queues on GFX 9.4+ for the GFX9 family, a soft reset must be issued through SMU. Since soft resets will reset an entire SDMA engine, use a common KGD call to do the reset as the KGD will handle avoiding a reset of in flight GFX and paging queues on that engine. In addition, create a common call for all reset types to simplify the handling of module parameter settings that block gpu resets. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Do not program AGP BAR regs under SRIOV in gfxhub_v1_0.cVictor Lu
SRIOV VF does not have write access to AGP BAR regs. Skip the writes to avoid a dmesg warning. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Fix core reset sequence for JPEG4_0_3Sathishkumar S
For cores 1 through 7 repair the core reset sequence by adjusting offsets to access the expected registers. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Add support for CPERs on virtualizationTony Yi
Add support for CPERs on VFs. VFs do not receive PMFW messages directly; as such, they need to query them from the host. To avoid hitting host event guard, CPER queries need to be rate limited. CPER queries share the same RAS telemetry buffer as error count query, so a mutex protecting the shared buffer was added as well. For readability, the amdgpu_detect_virtualization was refactored into multiple individual functions. Signed-off-by: Tony Yi <Tony.Yi@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdkfd: remove unnecessary cpu domain validationJames Zhu
before move to GTT domain. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: use drm_* instead of DRM_ in apply_edid_quirks()Aurabindo Pillai
drm_* macros are more helpful that DRM_* macros since the former indicates the associated DRM device that prints the error, which maybe helpful when debugging. Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Add workaround for a panelAurabindo Pillai
Implement w/a for a panel which requires 10s delay after link detect. Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Update headers for CPER support on SRIOVTony Yi
Update amdgv_sriovmsg.h and mxgpu_nv.h to add new definitions for CPER support on VFs. PMFW ACA messages are not available on VFs, and VFs must query CPERs from host. Signed-off-by: Tony Yi <Tony.Yi@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/pm: always allow ih interrupt from fwKenneth Feng
always allow ih interrupt from fw on smu v14 based on the interface requirement Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>