linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2023-06-09	drm/amdgpu: disable sdma ecc irq only when sdma RAS is enabled in suspend	Guchun Chen
	sdma_v4_0_ip is shared on a few asics, but in sdma_v4_0_hw_fini, driver unconditionally disables ecc_irq which is only enabled on those asics enabling sdma ecc. This will introduce a warning in suspend cycle on those chips with sdma ip v4.0, while without sdma ecc. So this patch correct this. [ 7283.166354] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu] [ 7283.167001] RSP: 0018:ffff9a5fc3967d08 EFLAGS: 00010246 [ 7283.167019] RAX: ffff98d88afd3770 RBX: 0000000000000001 RCX: 0000000000000000 [ 7283.167023] RDX: 0000000000000000 RSI: ffff98d89da30390 RDI: ffff98d89da20000 [ 7283.167025] RBP: ffff98d89da20000 R08: 0000000000036838 R09: 0000000000000006 [ 7283.167028] R10: ffffd5764243c008 R11: 0000000000000000 R12: ffff98d89da30390 [ 7283.167030] R13: ffff98d89da38978 R14: ffffffff999ae15a R15: ffff98d880130105 [ 7283.167032] FS: 0000000000000000(0000) GS:ffff98d996f00000(0000) knlGS:0000000000000000 [ 7283.167036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7283.167039] CR2: 00000000f7a9d178 CR3: 00000001c42ea000 CR4: 00000000003506e0 [ 7283.167041] Call Trace: [ 7283.167046] <TASK> [ 7283.167048] sdma_v4_0_hw_fini+0x38/0xa0 [amdgpu] [ 7283.167704] amdgpu_device_ip_suspend_phase2+0x101/0x1a0 [amdgpu] [ 7283.168296] amdgpu_device_suspend+0x103/0x180 [amdgpu] [ 7283.168875] amdgpu_pmops_freeze+0x21/0x60 [amdgpu] [ 7283.169464] pci_pm_freeze+0x54/0xc0 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2)	Lin.Cao
	v1: Vmbo->shadow is used to back vram bo up when vram lost. So that we should set shadow as vmbo->shadow to recover vmbo->bo v2: Modify if(vmbo->shadow) shadow = vmbo->shadow as if(!vmbo->shadow) continue; Fixes: e18aaea733da ("drm/amdgpu: move shadow_list to amdgpu_bo_vm") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lin.Cao <lincao12@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: drop gfx_v11_0_cp_ecc_error_irq_funcs	Horatio Zhang
	The gfx.cp_ecc_error_irq is retired in gfx11. In gfx_v11_0_hw_fini still use amdgpu_irq_put to disable this interrupt, which caused the call trace in this function. [ 102.873958] Call Trace: [ 102.873959] <TASK> [ 102.873961] gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu] [ 102.874019] gfx_v11_0_suspend+0xe/0x20 [amdgpu] [ 102.874072] amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [ 102.874122] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 102.874172] amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [ 102.874223] amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [ 102.874321] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 102.874375] process_one_work+0x21f/0x3f0 [ 102.874377] worker_thread+0x200/0x3e0 [ 102.874378] ? process_one_work+0x3f0/0x3f0 [ 102.874379] kthread+0xfd/0x130 [ 102.874380] ? kthread_complete_and_exit+0x20/0x20 [ 102.874381] ret_from_fork+0x22/0x30 v2: - Handle umc and gfx ras cases in separated patch - Retired the gfx_v11_0_cp_ecc_error_irq_funcs in gfx11 v3: - Improve the subject and code comments - Add judgment on gfx11 in the function of amdgpu_gfx_ras_late_init v4: - Drop the define of CP_ME1_PIPE_INST_ADDR_INTERVAL and SET_ECC_ME_PIPE_STATE which using in gfx_v11_0_set_cp_ecc_error_state - Check cp_ecc_error_irq.funcs rather than ip version for a more sustainable life v5: - Simplify judgment conditions Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: set default num_kcq to 2 under sriov	YuBiao Wang
	The number of kernel queues has impact on the latency under sriov usecase. So to reduce the latency we set the default num_kcq = 2 under sriov if not set manually. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: install stub fence into potential unused fence pointers	Lang Yu
	When using cpu to update page tables, vm update fences are unused. Install stub fence into these fence pointers instead of NULL to avoid NULL dereference when calling dma_fence_wait() on them. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: Remove the unused variable golden_settings_gc_9_4_3	Jiapeng Chong
	Variable golden_settings_gc_9_4_3 is not effectively used, so delete it. drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:48:38: warning: ‘golden_settings_gc_9_4_3’ defined but not used. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4877 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdkfd: Don't trigger evictions unmapping dmabuf attachments	Felix Kuehling
	Don't move DMABuf attachments for PCIe P2P mappings to the SYSTEM domain when unmapping. This avoids triggering eviction fences unnecessarily. Instead do the move to SYSTEM and back to GTT when mapping these attachments to ensure the SG table gets updated after evictions. This may still trigger unnecessary evictions if user mode unmaps and remaps the same BO. However, this is unlikely in real applications. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Eric Huang <jinhuieric.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: remove unneeded semicolon	Jiapeng Chong
	No functional modification involved. ./drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c:146:2-3: Unneeded semicolon. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4871 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: do gfxhub init for all XCDs	Le Ma
	Each XCD needs to do gfxhub init Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: fix an amdgpu_irq_put() issue in gmc_v9_0_hw_fini()	Hamza Mahfooz
	As made mention of in commit c56edea58c31 ("drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_fini") and commit aa6ac247ed7d ("drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini"). It is meaningless to call amdgpu_irq_put() for gmc.ecc_irq. So, remove it from gmc_v9_0_hw_fini(). Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Fixes: c8b5a95b5709 ("drm/amdgpu: Fix desktop freezed after gpu-reset") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: unlock on error in gfx_v9_4_3_kiq_resume()	Dan Carpenter
	Smatch complains that we need to drop this lock before returning. drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:1838 gfx_v9_4_3_kiq_resume() warn: inconsistent returns 'ring->mqd_obj->tbo.base.resv'. Fixes: 86301129698b ("drm/amdgpu: split gc v9_4_3 functionality from gc v9_0") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: unlock the correct lock in amdgpu_gfx_enable_kcq()	Dan Carpenter
	We changed which lock we are supposed to take but this error path was accidentally over looked so it still drops the old lock. Fixes: def799c6596d ("drm/amdgpu: add multi-xcc support to amdgpu_gfx interfaces (v4)") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: drop unused function	Alex Deucher
	amdgpu_discovery_get_ip_version() has not been used since commit c40bdfb2ffa4 ("drm/amdgpu: fix incorrect VCN revision in SRIOV") so drop it. Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: drop invalid IP revision	Alex Deucher
	This was already fixed and dropped in: commit baf3f8f37406 ("drm/amdgpu: handle SRIOV VCN revision parsing") commit c40bdfb2ffa4 ("drm/amdgpu: fix incorrect VCN revision in SRIOV") But seems to have been accidently been left around in a merge. Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: put MQDs in VRAM	Alex Deucher
	Reduces preemption latency. Only enable this for gfx10 and 11 for now to avoid changing behavior on gfx 8 and 9. v2: move MES MQDs into VRAM as well (YuBiao) v3: enable on gfx10, 11 only (Alex) v4: minor style changes, document why gfx10/11 only (Alex) Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: drop redundant sched job cleanup when cs is aborted	Guchun Chen
	Once command submission failed due to userptr invalidation in amdgpu_cs_submit, legacy code will perform cleanup of scheduler job. However, it's not needed at all, as former commit has integrated job cleanup stuff into amdgpu_job_free. Otherwise, because of double free, a NULL pointer dereference will occur in such scenario. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2457 Fixes: f7d66fb2ea43 ("drm/amdgpu: cleanup scheduler job initialization v2") Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amd/amdgpu: Fix errors & warnings in amdgpu _bios, _cs, _dma_buf, _fence.c	Srinivasan Shanmugam
	The following checkpatch errors & warning is removed. ERROR: else should follow close brace '}' ERROR: trailing statements should be on next line WARNING: Prefer 'unsigned int' to bare use of 'unsigned' WARNING: Possible repeated word: 'Fences' WARNING: Missing a blank line after declarations WARNING: braces {} are not necessary for single statement blocks WARNING: Comparisons should place the constant on the right side of the test WARNING: printk() should include KERN_<LEVEL> facility level Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: set gfx9 onwards APU atomics support to be true	Yifan Zhang
	APUs w/ gfx9 onwards doesn't reply on PCIe atomics, rather it is internal path w/ native atomic support. Set have_atomics_support to true. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu/gfx11: always restore kcq/kgq MQDs	Alex Deucher
	Always restore the MQD not just when we do a reset. This allows us to move the MQD to VRAM if we want. v2: always reset ring pointer as well (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu/nv: update VCN 3 max HEVC encoding resolution	Thong Thai
	Update the maximum resolution reported for HEVC encoding on VCN 3 devices to reflect its 8K encoding capability. v2: Also update the max height for H.264 encoding to match spec. (Ruijing) Signed-off-by: Thong Thai <thong.thai@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu/gfx10: always restore kcq/kgq MQDs	Alex Deucher
	Always restore the MQD not just when we do a reset. This allows us to move the MQD to VRAM if we want. v2: always reset ring pointer as well (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu/gfx9: always restore kcq MQDs	Alex Deucher
	Always restore the MQD not just when we do a reset. This allows us to move the MQD to VRAM if we want. v2: always reset ring pointer as well (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu/gfx8: always restore kcq MQDs	Alex Deucher
	Always restore the MQD not just when we do a reset. This allows us to move the MQD to VRAM if we want. v2: always reset ring pointer as well (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu/gfx11: drop unused variable	Alex Deucher
	Just check the return value directly. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu/gfx10: drop unused variable	Alex Deucher
	Just check the return value directly. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu/gfx11: use generic [en/dis]able_kgq() helpers	Alex Deucher
	And remove the duplicate local variants. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu/gfx10: use generic [en/dis]able_kgq() helpers	Alex Deucher
	And remove the duplicate local variants. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: add [en/dis]able_kgq() functions	Alex Deucher
	To replace the IP specific variants which are largely duplicate. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amd/amdgpu: Fix style problems in amdgpu_psp.c	Srinivasan Shanmugam
	Fix the following checkpatch warnings & error in amdgpu_psp.c WARNING: Comparisons should place the constant on the right side of the test WARNING: braces {} are not necessary for single statement blocks WARNING: please, no space before tabs WARNING: braces {} are not necessary for single statement blocks ERROR: that open brace { should be on the previous line Suggested-by: Christian König <christian.koenig@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu/gfx10: drop old bring up code	Alex Deucher
	No longer used. Remove it. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu/gfx11: drop old bring up code	Alex Deucher
	No longer used. Remove it. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amd/amdgpu: Fix style problems in amdgpu_debugfs.c	Srinivasan Shanmugam
	Fix the following issues reported by checkpatch: WARNING: please, no space before tabs WARNING: Prefer 'unsigned int' to bare use of 'unsigned' WARNING: sizeof rd should be sizeof(rd) WARNING: Missing a blank line after declarations WARNING: sizeof rd->id should be sizeof(rd->id) WARNING: static const char * array should probably be static const char * const WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. WARNING: Prefer seq_puts to seq_printf ERROR: space prohibited after that open parenthesis '(' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: Enable mcbp under sriov by default	YuBiao Wang
	Enable mcbp under sriov by default. Asics with soc21 supports mcbp now so we should set it enabled. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Reviewed-by: Horace Chen <horace.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: remove pasid_src field from IV entry	Xiaomeng Hou
	PASID_SRC is not actually present in the Interrupt Packet, the field is taken as reserved bits now. So remove it from IV entry to avoid misuse. Signed-off-by: Xiaomeng Hou <Xiaomeng.Hou@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amd/amdgpu: Simplify switch case statements in amdgpu_connectors.c	Srinivasan Shanmugam
	Fix the following checkpatch errors: ERROR: trailing statements should be on next line ERROR: space required after that ',' (ctx:VxV) ERROR: code indent should use tabs where possible Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: disable SDMA WPTR_POLL_ENABLE for SR-IOV	Horace Chen
	[Why] This WPTR_POLL_ENABLE is a hardware contigious polling which will cause FCLK and UCLK to keep on a high level. Mostly its case can be covered by F32_WPTR_POLL_ENABLE which polls by firmware. So to save power, SR-IOV also needs to disable this bit Signed-off-by: Horace Chen <horace.chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: Add SDMA_UTCL1_WR_FIFO_SED field for sdma_v4_4_ras_field	Stanley.Yang
	Query sdma_utcl1_wr_fifo_sed fiel to detect UTCL1_WR_FIFO SED error counts Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: add a missing lock for AMDGPU_SCHED	Chia-I Wu
	mgr->ctx_handles should be protected by mgr->lock. v2: improve commit message v3: add a Fixes tag Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Fixes: 52c6a62c64fa ("drm/amdgpu: add interface for editing a foreign process's priority v3") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdkfd: Update KFD TTM mem limit	Mukul Joshi
	Use the helper function in TTM to get TTM memory limit and set KFD's internal mem limit. This ensures that KFD's TTM mem limit and actual TTM mem limit are exactly same. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: Set GTT size equal to TTM mem limit	Mukul Joshi
	Use the helper function in TTM to get TTM mem limit and set GTT size to be equal to TTL mem limit. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: mark gfx_v9_4_3_disable_gpa_mode() static	Guchun Chen
	This was left global by accident, the corresponding functions for other hardware types are already static: drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:1072:6: error: no previous prototype for function 'gfx_v9_4_3_disable_gpa_mode' [-Werror,-Wmissing-prototypes] Fixes: 86301129698b ("drm/amdgpu: split gc v9_4_3 functionality from gc v9_0") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: check correct allocated mqd_backup object after alloc	Guchun Chen
	Instead of the default one, check the right mqd_backup object. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Cc: Le Ma <le.ma@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: fix a build warning by a typo in amdgpu_gfx.c	Guchun Chen
	This should be a typo when intruducing multi-xx support. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Cc: Le Ma <le.ma@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_fini	Horatio Zhang
	The gmc.ecc_irq is enabled by firmware per IFWI setting, and the host driver is not privileged to enable/disable the interrupt. So, it is meaningless to use the amdgpu_irq_put function in gmc_v10_0_hw_fini, which also leads to the call trace. [ 82.340264] Call Trace: [ 82.340265] <TASK> [ 82.340269] gmc_v10_0_hw_fini+0x83/0xa0 [amdgpu] [ 82.340447] gmc_v10_0_suspend+0xe/0x20 [amdgpu] [ 82.340623] amdgpu_device_ip_suspend_phase2+0x127/0x1c0 [amdgpu] [ 82.340789] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 82.340955] amdgpu_device_pre_asic_reset+0xdd/0x2b0 [amdgpu] [ 82.341122] amdgpu_device_gpu_recover.cold+0x4dd/0xbb2 [amdgpu] [ 82.341359] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 82.341529] process_one_work+0x21d/0x3f0 [ 82.341535] worker_thread+0x1fa/0x3c0 [ 82.341538] ? process_one_work+0x3f0/0x3f0 [ 82.341540] kthread+0xff/0x130 [ 82.341544] ? kthread_complete_and_exit+0x20/0x20 [ 82.341547] ret_from_fork+0x22/0x30 Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini	Horatio Zhang
	The gmc.ecc_irq is enabled by firmware per IFWI setting, and the host driver is not privileged to enable/disable the interrupt. So, it is meaningless to use the amdgpu_irq_put function in gmc_v11_0_hw_fini, which also leads to the call trace. [ 102.980303] Call Trace: [ 102.980303] <TASK> [ 102.980304] gmc_v11_0_hw_fini+0x54/0x90 [amdgpu] [ 102.980357] gmc_v11_0_suspend+0xe/0x20 [amdgpu] [ 102.980409] amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [ 102.980459] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 102.980520] amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [ 102.980573] amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [ 102.980687] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 102.980740] process_one_work+0x21f/0x3f0 [ 102.980741] worker_thread+0x200/0x3e0 [ 102.980742] ? process_one_work+0x3f0/0x3f0 [ 102.980743] kthread+0xfd/0x130 [ 102.980743] ? kthread_complete_and_exit+0x20/0x20 [ 102.980744] ret_from_fork+0x22/0x30 Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: Enable doorbell selfring after resize FB BAR	Shane Xiao
	[Why] The selfring doorbell aperture will change when resize FB BAR successfully during gmc sw init, we should reorder the sequence of enabling doorbell selfring aperture. [How] Move enable_doorbell_selfring_aperture from _common_hw_init to _common_late_init. This fixes the potential issue that GPU ring its own doorbell when this device is in translated mode when iommu is on. v2: Remove *_enable_doorbell_aperture functions (Christian) v3: Add comments to note that why we need enable doorbell selfring late (Christian) Signed-off-by: Shane Xiao <shane.xiao@amd.com> Signed-off-by: Aaron Liu <aaron.liu@amd.com> Tested-by: Xiaomeng Hou <Xiaomeng.Hou@amd.com> Reviewed-by: Christian K�nig <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amd/amdgpu: Fix style errors in amdgpu_display.c	Srinivasan Shanmugam
	Fix following checkpatch errors in amdgpu_display.c ERROR: spaces required around that '=' (ctx:VxW) ERROR: that open brace { should be on the previous line ERROR: else should follow close brace '}' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: Use the default reset when loading or reloading the driver	lyndonli
	Below call trace and errors are observed when reloading amdgpu driver with the module parameter reset_method=3. It should do a default reset when loading or reloading the driver, regardless of the module parameter reset_method. v2: add comments inside and modify commit messages. [ +2.180243] [drm] psp gfx command ID_LOAD_TOC(0x20) failed and response status is (0x0) [ +0.000011] [drm:psp_hw_start [amdgpu]] ERROR Failed to load toc [ +0.000890] [drm:psp_hw_start [amdgpu]] ERROR PSP tmr init failed! [ +0.020683] [drm:amdgpu_fill_buffer [amdgpu]] ERROR Trying to clear memory with ring turned off. [ +0.000003] RIP: 0010:amdgpu_bo_release_notify+0x1ef/0x210 [amdgpu] [ +0.000004] Call Trace: [ +0.000003] <TASK> [ +0.000008] ttm_bo_release+0x2c4/0x330 [amdttm] [ +0.000026] amdttm_bo_put+0x3c/0x70 [amdttm] [ +0.000020] amdgpu_bo_free_kernel+0xe6/0x140 [amdgpu] [ +0.000728] psp_v11_0_ring_destroy+0x34/0x60 [amdgpu] [ +0.000826] psp_hw_init+0xe7/0x2f0 [amdgpu] [ +0.000813] amdgpu_device_fw_loading+0x1ad/0x2d0 [amdgpu] [ +0.000731] amdgpu_device_init.cold+0x108e/0x2002 [amdgpu] [ +0.001071] ? do_pci_enable_device+0xe1/0x110 [ +0.000011] amdgpu_driver_load_kms+0x1a/0x160 [amdgpu] [ +0.000729] amdgpu_pci_probe+0x179/0x3a0 [amdgpu] Signed-off-by: lyndonli <Lyndon.Li@amd.com> Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: Fix mode2 reset for sienna cichlid	lyndonli
	Before this change, sienna_cichlid_get_reset_handler will always return NULL, although the module parameter reset_method is 3 when loading amdgpu driver. Signed-off-by: lyndonli <Lyndon.Li@amd.com> Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09	drm/amdgpu: add new flag to AMDGPU_CTX_QUERY2	Pierre-Eric Pelloux-Prayer
	OpenGL EXT_robustness extension expects the driver to stop reporting GUILTY_CONTEXT_RESET when the reset has completed and the GPU is ready to accept submission again. This commit adds a AMDGPU_CTX_QUERY2_FLAGS_RESET_IN_PROGRESS flag, that let the UMD know that the reset is still not finished. Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22290 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>