From 033c56474acf567a450f8bafca50e0b610f2b716 Mon Sep 17 00:00:00 2001 From: YuBiao Wang Date: Thu, 16 Mar 2023 11:30:32 +0800 Subject: drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs [Why] For engines not supporting soft reset, i.e. VCN, there will be a failed ib test before mode 1 reset during asic reset. The fences in this case are never signaled and next time when we try to free the sa_bo, kernel will hang. [How] During pre_asic_reset, driver will clear job fences and afterwards the fences' refcount will be reduced to 1. For drm_sched_jobs it will be released in job_free_cb, and for non-sched jobs like ib_test, it's meant to be released in sa_bo_free but only when the fences are signaled. So we have to force signal the non_sched bad job's fence during pre_asic_reset or the clear is not complete. Signed-off-by: YuBiao Wang Acked-by: Luben Tuikov Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c') diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index faff4a3f96e6e..f52d0ba91a770 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -678,6 +678,15 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring) ptr = &ring->fence_drv.fences[i]; old = rcu_dereference_protected(*ptr, 1); if (old && old->ops == &amdgpu_job_fence_ops) { + struct amdgpu_job *job; + + /* For non-scheduler bad job, i.e. failed ib test, we need to signal + * it right here or we won't be able to track them in fence_drv + * and they will remain unsignaled during sa_bo free. + */ + job = container_of(old, struct amdgpu_job, hw_fence); + if (!job->base.s_fence && !dma_fence_is_signaled(old)) + dma_fence_signal(old); RCU_INIT_POINTER(*ptr, NULL); dma_fence_put(old); } -- cgit v1.2.3 From c1a322a7a4a96cd0a3dde32ce37af437a78bf8cd Mon Sep 17 00:00:00 2001 From: Guchun Chen Date: Tue, 9 May 2023 16:15:27 +0800 Subject: drm/amdgpu: skip disabling fence driver src_irqs when device is unplugged When performing device unbind or halt, we have disabled all irqs at the very begining like amdgpu_pci_remove or amdgpu_device_halt. So amdgpu_irq_put for irqs stored in fence driver should not be called any more, otherwise, below calltrace will arrive. [ 139.114088] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:616 amdgpu_irq_put+0xf6/0x110 [amdgpu] [ 139.114655] Call Trace: [ 139.114655] [ 139.114657] amdgpu_fence_driver_hw_fini+0x93/0x130 [amdgpu] [ 139.114836] amdgpu_device_fini_hw+0xb6/0x350 [amdgpu] [ 139.114955] amdgpu_driver_unload_kms+0x51/0x70 [amdgpu] [ 139.115075] amdgpu_pci_remove+0x63/0x160 [amdgpu] [ 139.115193] ? __pm_runtime_resume+0x64/0x90 [ 139.115195] pci_device_remove+0x3a/0xb0 [ 139.115197] device_remove+0x43/0x70 [ 139.115198] device_release_driver_internal+0xbd/0x140 Signed-off-by: Guchun Chen Acked-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c') diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index f52d0ba91a770..a7d250809da99 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -582,7 +582,8 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev) if (r) amdgpu_fence_driver_force_completion(ring); - if (ring->fence_drv.irq_src) + if (!drm_dev_is_unplugged(adev_to_drm(adev)) && + ring->fence_drv.irq_src) amdgpu_irq_put(adev, ring->fence_drv.irq_src, ring->fence_drv.irq_type); -- cgit v1.2.3