summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)Author
2022-08-29drm/amdgpu: declare firmware for new SDMA 6.0.3Hawking Zhang
To support new sdma ip block Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Frank Min <Frank.Min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: enable smu block for smu 13.0.10John Clements
Force to enable smu block for SMU v13.0.10 Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: add new ip block for PSP 13.0Frank Min
Add ip block support for psp v13_0_10. Signed-off-by: Frank Min <Frank.Min@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: added firmware module for psp 13.0.10John Clements
added missing firmware module Signed-off-by: John Clements <john.clements@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: support psp v13_0_10 ip blockFrank Min
Add psp v13_0_10 ip block, initialize firmware and psp functions Signed-off-by: Frank Min <Frank.Min@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: add new ip block for SOC21Hawking Zhang
Add ip block support for soc21_common. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Frank Min <Frank.Min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: Enable pg/cg flags on GC11_0_3 for VCNSonny Jiang
This enable VCN PG, CG, DPG and JPEG PG, CG Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: initialize common sw config for v11_0_3Hawking Zhang
init cp/pg_flags and extenal_rev_id Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Frank Min <Frank.Min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: drop gc 11_0_0 golden settingsHawking Zhang
driver doesn't need to program any gc 11_0_0 golden Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: ensure no PCIe peer access for CPU XGMI iolinksAlex Sierra
[Why] Devices with CPU XGMI iolink do not support PCIe peer access. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: Fix use-after-free in amdgpu_cs_ioctlYuBiao Wang
[Why] In amdgpu_cs_ioctl, amdgpu_job_free could be performed ealier if there is -ERESTARTSYS error. In this case, job->hw_fence could be not initialized yet. Putting hw_fence during amdgpu_job_free could lead to a use-after-free warning. [How] Check if drm_sched_job_init is performed before job_free by checking s_fence. v2: Check hw_fence.ops instead since it could be NULL if fence is not initialized. Reverse the condition since !=NULL check is discouraged in kernel. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: add missing pci_disable_device() in amdgpu_pmops_runtime_resume()Yang Yingliang
Add missing pci_disable_device() if amdgpu_device_resume() fails. Fixes: 8e4d5d43cc6c ("drm/amdgpu: Handling of amdgpu_device_resume return value for graceful teardown") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: Remove the unneeded result variableye xingchen
Return the value sdma_v5_2_start() directly instead of storing it in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: Update mes_v11_api_def.hGraham Sider
New GFX11 MES FW adds the trap_en bit. For now hardcode to 1 (traps enabled). Signed-off-by: Graham Sider <Graham.Sider@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: disable FRU access on special SIENNA CICHLID cardGuchun Chen
Below driver load error will be printed, not friendly to end user. amdgpu: ATOM BIOS: 113-D603GLXE-077 [drm] FRU: Failed to get size field [drm:amdgpu_fru_get_product_info [amdgpu]] *ERROR* Failed to read FRU Manufacturer, ret:-5 Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdgpu: mmVM_L2_CNTL3 register not initialized correctlyQu Huang
The mmVM_L2_CNTL3 register is not assigned an initial value Signed-off-by: Qu Huang <jinsdb@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm: amd: amdgpu: ACPI: Add comment about ACPI_FADT_LOW_POWER_S0Rafael J. Wysocki
According to the ACPI specification [1], the ACPI_FADT_LOW_POWER_S0 flag merely means that it is better to use low-power S0 idle on the given platform than S3 (provided that the latter is supported) and it doesn't preclude using either of them (which of them will be used depends on the choices made by user space). However, on some systems that flag is used to indicate whether or not to enable special firmware mechanics allowing the system to save more energy when suspended to idle. If that flag is unset, doing so is generally risky. Accordingly, add a comment to explain the ACPI_FADT_LOW_POWER_S0 check in amdgpu_acpi_is_s0ix_active(), the purpose of which is otherwise somewhat unclear. Link: https://uefi.org/specs/ACPI/6.4/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html#fixed-acpi-description-table-fadt # [1] Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdgpu: use dev_info to benefit mGPU caseGuchun Chen
'free PSP TMR buffer' happens in suspend, but sometimes in mGPU config, it mixes with PSP resume log printing from another GPU, which is confusing. So use dev_info instead of DRM_INFO for printing. [drm] PSP is resuming... [drm] reserve 0xa00000 from 0x877e000000 for PSP TMR amdgpu 0000:e3:00.0: amdgpu: GECC is enabled amdgpu 0000:e3:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available amdgpu 0000:e3:00.0: amdgpu: SMU is resuming... amdgpu 0000:e3:00.0: amdgpu: smu driver if version = 0x00000040, smu fw if version = 0x00000041, smu fw program = 0, version = 0x003a5400 (58.84.0) amdgpu 0000:e3:00.0: amdgpu: SMU driver if version not matched amdgpu 0000:e3:00.0: amdgpu: dpm has been enabled amdgpu 0000:e3:00.0: amdgpu: SMU is resumed successfully! [drm] DMUB hardware initialized: version=0x02020014 [drm] free PSP TMR buffer [drm] kiq ring mec 2 pipe 1 q 0 Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdgpu: use adev_to_drm to get drm deviceGuchun Chen
adev_to_drm is used everywhere in amdgpu code, so modify it to keep consistency. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdgpu: add MGCG perfmon setting for gfx11Likun Gao
Enable GFX11 MGCG perfmon setting. V2: set rlc to saft mode before setting. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amd/amdgpu: avoid soft reset check when gpu recovery disabledChengming Gui
Avoid soft reset, even ip hang check (ring/ib test) when gpu recovery disabled. v2: add missing "}" Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdgpu: Fix page table setup on ArcturusMukul Joshi
When translate_further is enabled, page table depth needs to be updated. This was missing on Arcturus MMHUB init. This was causing address translations to fail for SDMA user-mode queues. Fixes: 352e683b72e7 ("drm/amdgpu: Enable translate_further to extend UTCL2 reach") Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdgpu: skip set_topology_info for VFVignesh Chander
Skip set_topology_info as xgmi TA will now block it and host needs to program it. Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com> Reviewed-By : Shaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdgpu: add sdma instance check for gfx11 CGCGTim Huang
For some ASICs, like GFX IP v11.0.1, only have one SDMA instance, so not need to configure SDMA1_RLC_CGCG_CTRL for this case. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdgpu: Don't register backlight when another backlight should be used (v3)Hans de Goede
Before this commit when we want userspace to use the acpi_video backlight device we register both the GPU's native backlight device and acpi_video's firmware acpi_video# backlight device. This relies on userspace preferring firmware type backlight devices over native ones. Registering 2 backlight devices for a single display really is undesirable, don't register the GPU's native backlight device when another backlight device should be used. Changes in v2: - To avoid linker errors when amdgpu is builtin and video_detect.c is in a module, select ACPI_VIDEO and its deps if ACPI is enabled. When ACPI is disabled, ACPI_VIDEO is also always disabled, ensuring the stubs from acpi/video.h will be used. Changes in v3: - Use drm_info(drm_dev, "...") to log messages - ACPI_VIDEO can now be enabled on non X86 too, adjust the Kconfig changes to match this. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-08-22drm/amdgpu: enable NBIO IP v7.7.0 Clock GatingTim Huang
Enable AMD_CG_SUPPORT_BIF_MGCG and AMD_CG_SUPPORT_BIF_LS support. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-22drm/amdgpu: add NBIO IP v7.7.0 Clock Gating supportTim Huang
Add BIF Clock Gating MGCG and LS support for NBIO IP v7.7.0. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-22drm/amdgpu: Remove the additional kfd pre reset call for sriovshaoyunl
The additional call is caused by merge conflict Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-22drm/amdgpu: Check num_gfx_rings for gfx v9_0 rb setup.Candice Li
No need to set up rb when no gfx rings. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-22drm/amdgpu: fix hive reference leak when adding xgmi deviceYiPeng Chai
Only amdgpu_get_xgmi_hive but no amdgpu_put_xgmi_hive which will leak the hive reference. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-22drm/amdgpu: Move psp_xgmi_terminate call from amdgpu_xgmi_remove_device to ↵YiPeng Chai
psp_hw_fini V1: The amdgpu_xgmi_remove_device function will send unload command to psp through psp ring to terminate xgmi, but psp ring has been destroyed in psp_hw_fini. V2: 1. Change the commit title. 2. Restore amdgpu_xgmi_remove_device to its original calling location. Move psp_xgmi_terminate call from amdgpu_xgmi_remove_device to psp_hw_fini. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-22drm/amdgpu: enable GFXOFF allow control for GC IP v11.0.1Tim Huang
Enable GFXOFF allow control when set the GFX power gating. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-22drm/ttm: Switch to using the new res callbackArunpravin Paneer Selvam
Apply new intersect and compatible callback instead of having a generic placement range verfications. v2: Added a separate callback for compatiblilty checks (Christian) v3: Cleanups and removal of workarounds Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220820073304.178444-6-Arunpravin.PaneerSelvam@amd.com
2022-08-22drm/amdgpu: Implement intersect/compatible functionsArunpravin Paneer Selvam
Implemented a new intersect and compatible callback function fetching start offset from backend drm buddy allocator. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220820073304.178444-3-Arunpravin.PaneerSelvam@amd.com
2022-08-19Merge tag 'amd-drm-fixes-6.0-2022-08-17' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.0-2022-08-17: amdgpu: - Revert some DML stack changes - Rounding fixes in KFD allocations - atombios vram info table parsing fix - DCN 3.1.4 fixes - Clockgating fixes for various new IPs - SMU 13.0.4 fixes - DCN 3.1.4 FP fixes - TMDS fixes for YCbCr420 4k modes - DCN 3.2.x fixes - USB 4 fixes - SMU 13.0 fixes - SMU driver unload memory leak fixes - Display orientation fix - Regression fix for generic fbdev conversion - SDMA 6.x fixes - SR-IOV fixes - IH 6.x fixes - Use after free fix in bo list handling - Revert pipe1 support - XGMI hive reset fix amdkfd: - Fix potential crach in kfd_create_indirect_link_prop() Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220818025206.6463-1-alexander.deucher@amd.com
2022-08-16drm/amdgpu: Document gfx_off members of struct amdgpu_gfxAndré Almeida
Add comments to document gfx_off related members of struct amdgpu_gfx. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amd: Add detailed GFXOFF stats to debugfsAndré Almeida
Add debugfs interface to log GFXOFF statistics: - Read amdgpu_gfxoff_count to get the total GFXOFF entry count at the time of query since system power-up - Write 1 to amdgpu_gfxoff_residency to start logging, and 0 to stop. Read it to get average GFXOFF residency % multiplied by 100 during the last logging interval. Both features are designed to be keep the values persistent between suspends. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdgpu: use sjt mec fw on aldebaran for sriovshaoyunl
The second jump table is required on live migration or mulitple VF configuration on Aldebaran. With this implemented, the first level jump table(hw used) will be same, mec fw internal will use the second level jump table jump to the real functionality implementation. so the different VF can load different version of MEC as long as they support sjt Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16Revert "drm/amd/amdgpu: add pipe1 hardware support"Michel Dänzer
This reverts commit 4c7631800e6bf0eced08dd7b4f793fcd972f597d. Triggered GFX hangs with GNOME Wayland on Navi 21. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2117 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutexMaíra Canal
If amdgpu_cs_vm_handling returns r != 0, then it will unlock the bo_list_mutex inside the function amdgpu_cs_vm_handling and again on amdgpu_cs_parser_fini. This problem results in the following use-after-free problem: [ 220.280990] ------------[ cut here ]------------ [ 220.281000] refcount_t: underflow; use-after-free. [ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110 [ 220.281029] ------------[ cut here ]------------ [ 220.281415] CPU: 1 PID: 3746 Comm: chrome:cs0 Tainted: G W L ------- --- 5.20.0-0.rc0.20220812git7ebfc85e2cd7.10.fc38.x86_64 #1 [ 220.281421] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022 [ 220.281426] RIP: 0010:refcount_warn_saturate+0xba/0x110 [ 220.281431] Code: 01 01 e8 79 4a 6f 00 0f 0b e9 42 47 a5 00 80 3d de 7e be 01 00 75 85 48 c7 c7 f8 98 8e 98 c6 05 ce 7e be 01 01 e8 56 4a 6f 00 <0f> 0b e9 1f 47 a5 00 80 3d b9 7e be 01 00 0f 85 5e ff ff ff 48 c7 [ 220.281437] RSP: 0018:ffffb4b0d18d7a80 EFLAGS: 00010282 [ 220.281443] RAX: 0000000000000026 RBX: 0000000000000003 RCX: 0000000000000000 [ 220.281448] RDX: 0000000000000001 RSI: ffffffff988d06dc RDI: 00000000ffffffff [ 220.281452] RBP: 00000000ffffffff R08: 0000000000000000 R09: ffffb4b0d18d7930 [ 220.281457] R10: 0000000000000003 R11: ffffa0672e2fffe8 R12: ffffa058ca360400 [ 220.281461] R13: ffffa05846c50a18 R14: 00000000fffffe00 R15: 0000000000000003 [ 220.281465] FS: 00007f82683e06c0(0000) GS:ffffa066e2e00000(0000) knlGS:0000000000000000 [ 220.281470] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 220.281475] CR2: 00003590005cc000 CR3: 00000001fca46000 CR4: 0000000000350ee0 [ 220.281480] Call Trace: [ 220.281485] <TASK> [ 220.281490] amdgpu_cs_ioctl+0x4e2/0x2070 [amdgpu] [ 220.281806] ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu] [ 220.282028] drm_ioctl_kernel+0xa4/0x150 [ 220.282043] drm_ioctl+0x21f/0x420 [ 220.282053] ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu] [ 220.282275] ? lock_release+0x14f/0x460 [ 220.282282] ? _raw_spin_unlock_irqrestore+0x30/0x60 [ 220.282290] ? _raw_spin_unlock_irqrestore+0x30/0x60 [ 220.282297] ? lockdep_hardirqs_on+0x7d/0x100 [ 220.282305] ? _raw_spin_unlock_irqrestore+0x40/0x60 [ 220.282317] amdgpu_drm_ioctl+0x4a/0x80 [amdgpu] [ 220.282534] __x64_sys_ioctl+0x90/0xd0 [ 220.282545] do_syscall_64+0x5b/0x80 [ 220.282551] ? futex_wake+0x6c/0x150 [ 220.282568] ? lock_is_held_type+0xe8/0x140 [ 220.282580] ? do_syscall_64+0x67/0x80 [ 220.282585] ? lockdep_hardirqs_on+0x7d/0x100 [ 220.282592] ? do_syscall_64+0x67/0x80 [ 220.282597] ? do_syscall_64+0x67/0x80 [ 220.282602] ? lockdep_hardirqs_on+0x7d/0x100 [ 220.282609] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 220.282616] RIP: 0033:0x7f8282a4f8bf [ 220.282639] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00 [ 220.282644] RSP: 002b:00007f82683df410 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 220.282651] RAX: ffffffffffffffda RBX: 00007f82683df588 RCX: 00007f8282a4f8bf [ 220.282655] RDX: 00007f82683df4d0 RSI: 00000000c0186444 RDI: 0000000000000018 [ 220.282659] RBP: 00007f82683df4d0 R08: 00007f82683df5e0 R09: 00007f82683df4b0 [ 220.282663] R10: 00001d04000a0600 R11: 0000000000000246 R12: 00000000c0186444 [ 220.282667] R13: 0000000000000018 R14: 00007f82683df588 R15: 0000000000000003 [ 220.282689] </TASK> [ 220.282693] irq event stamp: 6232311 [ 220.282697] hardirqs last enabled at (6232319): [<ffffffff9718cd7e>] __up_console_sem+0x5e/0x70 [ 220.282704] hardirqs last disabled at (6232326): [<ffffffff9718cd63>] __up_console_sem+0x43/0x70 [ 220.282709] softirqs last enabled at (6232072): [<ffffffff970ff669>] __irq_exit_rcu+0xf9/0x170 [ 220.282716] softirqs last disabled at (6232061): [<ffffffff970ff669>] __irq_exit_rcu+0xf9/0x170 [ 220.282722] ---[ end trace 0000000000000000 ]--- Therefore, remove the mutex_unlock from the amdgpu_cs_vm_handling function, so that amdgpu_cs_submit and amdgpu_cs_parser_fini can handle the unlock. Fixes: 90af0ca047f3 ("drm/amdgpu: Protect the amdgpu_bo_list list with a mutex v2") Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Maíra Canal <mairacanal@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdgpu: Fix interrupt handling on ih_soft ringMukul Joshi
There are no backing hardware registers for ih_soft ring. As a result, don't try to access hardware registers for read and write pointers when processing interrupts on the IH soft ring. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdgpu: reduce reset timeVictor Zhao
In multi container use case, reset time is important, so skip ring tests and cp halt wait during ip suspending for reset as they are going to fail and cost more time on reset v2: add a hang flag to indicate the reset comes from a job timeout, skip ring test and cp halt wait in this case v3: move hang flag to adev Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdgpu: revert context to stop engine before mode2 resetVictor Zhao
For some hang caused by slow tests, engine cannot be stopped which may cause resume failure after reset. In this case, force halt engine by reverting context addresses Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdgpu: save and restore gc hub regsVictor Zhao
Save and restore gfxhub regs as they will be reset during mode 2 Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdgpu: add debugfs amdgpu_reset_levelVictor Zhao
Introduce amdgpu_reset_level debugfs in order to help debug and test specific type of reset. Also helps blocking unwanted type of resets. By default, mode2 reset will not be enabled v2: make this debugfs in adev and use debugfs_create_u32 Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdgpu: let mode2 reset fallback to default when failureVictor Zhao
- introduce AMDGPU_SKIP_MODE2_RESET flag - let mode2 reset fallback to default reset method if failed v2: move this part out from the asic specific part Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdgpu: add mode2 reset for sienna_cichlidVictor Zhao
To meet the requirement for multi container usecase which needs a quicker reset and not causing VRAM lost, adding the Mode2 reset handler for sienna_cichlid. v2: move skip mode2 flag part separately v3: remove the use of asic_reset_res Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdgpu: Add secure display TA load for RenoirShane Xiao
Add secure display TA load for Renoir Signed-off-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdgpu/vcn: Return void from the stop_dbg_modeKhalid Masum
There is no point in returning an int here. It only returns 0 which the caller never uses. Therefore return void and remove the unnecessary assignment. Addresses-Coverity: 1504988 ("Unused value") Fixes: 8da1170a16e4 ("drm/amdgpu: add VCN4 ip block support") Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Suggested-by: Ruijing Dong <ruijing.dong@amd.com> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Khalid Masum <khalid.masum.92@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdgpu: remove useless condition in amdgpu_job_stop_all_jobs_on_sched()Andrey Strachuk
Local variable 'rq' is initialized by an address of field of drm_sched_job, so it does not make sense to compare 'rq' with NULL. Found by Linux Verification Center (linuxtesting.org) with SVACE. Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Andrey Strachuk <strochuk@ispras.ru> Fixes: 7c6e68c777f1 ("drm/amdgpu: Avoid HW GPU reset for RAS.") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>