summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-04-09drm/amd/display: Set max TTU on DPG enableWesley Chalmers
[WHY] There is a bug in HW that causes P-State to hang when DPG is enabled in certain conditions. [HOW] The solution is to force MIN_TTU_VBLANK register to maximum value whenever DPG has been enabled. Make stream do a full update on test pattern change, so that the TTUs get updated. When DPG is enabled, update the ttu_regs.min_ttu_vblank field of each pipe in the stream's topology to the maximum value (0xffffff). v2: squash in build fix for when DCN is not defined (Alex) Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com> Reviewed-by: Tony Cheng <Tony.Cheng@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: New path for enabling DPGWesley Chalmers
[WHY] We want to make enabling test pattern a part of the stream update code path. This change is the first step towards that goal. Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com> Reviewed-by: Aric Cyr <Aric.Cyr@amd.com> Reviewed-by: Tony Cheng <Tony.Cheng@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: Update display endpoint control path.Jimmy Kizito
[Why] Some display endpoints may be dynamically mapped to the link encoders which drive them. [How] Update the code paths for display enabling/disabling to accommodate the dynamic association between links and link encoders. Signed-off-by: Jimmy Kizito <Jimmy.Kizito@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: Add dynamic link encoder selection.Jimmy Kizito
[Why] Some display endpoints may be programmably mapped to compatible link encoders. The assignment of link encoders to links has to be dynamic to accommodate the increased flexibility in comparison to conventional display endpoints. [How] - Add link encoder assignment tracking variables. - Execute link encoder assignment algorithm before enabling link and release link encoders from links once they are disabled. Signed-off-by: Jimmy Kizito <Jimmy.Kizito@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: Fix MST topology debugfsEryk Brol
[why] The drm dump_topology function was previously called on all DP connectors. This resulted in empty topology dumps for those connectors which weren't root MST nodes. [how] Make sure we only dump topology from the root MST node. Signed-off-by: Eryk Brol <eryk.brol@amd.com> Reviewed-by: Aurabindo Jayamohanan Pillai <Aurabindo.Pillai@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: LTTPR config logicWesley Chalmers
[WHY] Some systems can enable LTTPR through bits in BIOS, while other systems can be configured at boot to enable LTTPR. Some configs enable Non-Transparent mode, while others enable Transparent mode. Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: Enumerate LTTPR modesWesley Chalmers
[WHY] There are three possible modes for LTTPR: - Non-LTTPR mode, where AUX timeout is 400 us and no per-hop link training is done - LTTPR Transparent mode, where AUX timeout is 3200 us and no per-hop link training is done - LTTPR Non-Transparent mode, where AUX timeout is 3200 us and per-hop link training is done [HOW] Use an enum instead of a bool to track LTTPR state; modify comparisons accordingly. Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: Interface for LTTPR interopWesley Chalmers
[WHY] The logic to toggle LTTPR transparent/non-transparent requires 2 flags provided by BIOS [HOW] Repurpose the interface to get dce caps so both LTTPR querying functions can use them. Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: Rename fs_params to hdr_tm_paramsKrunoslav Kovac
[Why&How] Renaming structure to better indicate its meaning. Signed-off-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com> Reviewed-by: Aric Cyr <Aric.Cyr@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Acked-by: Anthony Koo <Anthony.Koo@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: Fix typo for variable nameVladimir Stempen
[why] Word "remainder" was misspelled as "reminder" in reduceSizeAndFraction method variable. [how] Fix the spelling. Signed-off-by: Vladimir Stempen <vladimir.stempen@amd.com> Reviewed-by: Alexander Deucher <alexander.deucher@amd.com> Reviewed-by: Bindu R <Bindu.R@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: add mod hdcp interface for supporting encryption state queryWenjing Liu
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Reviewed-by: George Shen <George.Shen@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: define mod_hdcp_display_disable_option structWenjing Liu
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Reviewed-by: George Shen <George.Shen@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: enable DP DSC Compliance automationQingqing Zhuo
[Why] Color depth data is not parsed during test requests. [How] Update display color depth according to color depth request from the test equipment. Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: Guard ASSR with internal display flagStylon Wang
[Why] ASSR enabling only considers capability declared in DPCD. We also need to check whether the connector is internal. [How] ASSR enabling need to check both DPCD capability and internal display flag. Signed-off-by: Stylon Wang <stylon.wang@amd.com> Reviewed-by: Harry Wentland <Harry.Wentland@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: Fix static checker warnings on tracebuff_fbLeo (Hanghong) Ma
[Why] Static analysis on linux-next has found a potential null pointer dereference; [How] Refactor the function, add ASSERT and remove the unnecessary check. Signed-off-by: Leo (Hanghong) Ma <hanghong.ma@amd.com> Reviewed-by: Harry Wentland <Harry.Wentland@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: Add refresh rate traceRodrigo Siqueira
When we have to debug VRR issues, we usually want to know the current refresh rate; for this reason, it is handy to have a way to check in real-time the refresh rate value. This commit introduces a kernel trace that can provide such information. Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Reviewed-by: Harry Wentland <Harry.Wentland@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: BIOS LTTPR Caps InterfaceWesley Chalmers
[WHY] Some platforms will have LTTPR capabilities forced on by VBIOS flags; the functions added here will access those flags. Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: skip PP_MP1_STATE_UNLOAD on aldebaranFeifei Xu
This message is not needed on Aldebaran. Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/amdgpu: set MP1 state to UNLOAD before reload its FW for ↵Chengming Gui
vega20/ALDEBARAN When resume from gpu reset, need set MP1 state to UNLOAD before reload SMU FW otherwise will cause following errors: [ 121.642772] [drm] reserve 0x400000 from 0x87fec00000 for PSP TMR [ 123.801051] [drm] failed to load ucode id (24) [ 123.801055] [drm] psp command (0x6) failed and response status is (0x0) [ 123.801214] [drm:psp_load_smu_fw [amdgpu]] *ERROR* PSP load smu failed! [ 123.801398] [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed [ 123.801536] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22 [ 123.801632] amdgpu 0000:04:00.0: amdgpu: GPU reset(9) failed [ 123.801691] amdgpu 0000:07:00.0: amdgpu: GPU reset(9) failed [ 123.802899] amdgpu 0000:04:00.0: amdgpu: GPU reset end with ret = -22 v2: add error info and including ALDEBARAN also Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-and-tested-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: Reset error code for 'no handler' caseLijo Lazar
If reset handler is not implemented, reset error before proceeding. Fixes issue with the following trace - [ 106.508592] amdgpu 0000:b1:00.0: amdgpu: ASIC reset failed with error, -38 for drm dev, 0000:b1:00.0 [ 106.508972] amdgpu 0000:b1:00.0: amdgpu: GPU reset succeeded, trying to resume [ 106.509116] [drm] PCIE GART of 512M enabled. [ 106.509120] [drm] PTB located at 0x0000008000000000 [ 106.509136] [drm] VRAM is lost due to GPU reset! [ 106.509332] [drm] PSP is resuming... Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-and-tested-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/display: Fix black screen with scaled modes on some eDP panelsNikola Cornij
[why] This was a regression introduced by commit: drm/amd/display: Skip modeset for front porch change Due to the change how timing parameters were set, scaled modes would cause a black screen on some eDP panels. Would probably apply to other displays (i.e. even non-eDP) that only have scaled modes, but such case is not that usual for external displays. [how] Pick up crtc frame dimensions when programming the timing unless it's FreeSync video mode. Fixes: 6f59f229f8ed7a ("drm/amd/display: Skip modeset for front porch change") Signed-off-by: Nikola Cornij <nikola.cornij@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: ih reroute for newer asics than vega20Alex Sierra
Starting Arcturus, it supports ih reroute through mmio directly in bare metal environment. This is also valid for newer asics such as Aldebaran. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdkfd: dqm fence memory corruptionQu Huang
Amdgpu driver uses 4-byte data type as DQM fence memory, and transmits GPU address of fence memory to microcode through query status PM4 message. However, query status PM4 message definition and microcode processing are all processed according to 8 bytes. Fence memory only allocates 4 bytes of memory, but microcode does write 8 bytes of memory, so there is a memory corruption. Changes since v1: * Change dqm->fence_addr as a u64 pointer to fix this issue, also fix up query_status and amdkfd_fence_wait_timeout function uses 64 bit fence value to make them consistent. Signed-off-by: Qu Huang <jinsdb@126.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: fix offset calculation in amdgpu_vm_bo_clear_mappings()Nirmoy Das
Offset calculation wasn't correct as start addresses are in pfn not in bytes. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/amdgpu: Add CP_IB1_BASE_* to gc_10_3_0 headersTom St Denis
Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/pm: Fix DPM level count on aldebaranLijo Lazar
Firmware returns zero-based max level, increment by one to get total levels. This fixes the issue of not showing all levels and current frequency when frequency is at max DPM level. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/pm: unify the interface for gfx state settingEvan Quan
No need to have special handling for swSMU supported ASICs. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/pm: unify the interface for power gatingEvan Quan
No need to have special handling for swSMU supported ASICs. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/pm: fix missing static declarationsEvan Quan
Add "static" declarations for those APIs used internally. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/pm: unify the interface for loading SMU microcodeEvan Quan
No need to have special handling for swSMU supported ASICs. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/pm: no need to force MCLK to highest when no display connectedEvan Quan
Correct the check for vblank short. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: Fix build warningsLijo Lazar
Fix header guard and make internal functions static. Fixes the below warnings: drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu_reset.h:24:9: warning: '__AMDUGPU_RESET_H__' is used as a header guard here, followed by #define of a different macro [-Wheader-guard] drivers/gpu/drm/amd/amdgpu/aldebaran.c:110:6: warning: no previous prototype for function 'aldebaran_async_reset' [-Wmissing-prototypes] drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/aldebaran_ppt.c:1435:5: warning: no previous prototype for function 'aldebaran_mode2_reset' [-Wmissing-prototypes] Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: Enable recovery on aldebaranLijo Lazar
Add aldebaran to devices which support recovery Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: Add mode2 reset support for aldebaranLijo Lazar
v1: Aldebaran uses reset control to support mode2 reset. The sequences to reset and restore hardware context are specific to a particular configuration. v2: Clear bus mastering before reset. Fix coding style issues, drop unwanted variables and info log. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: Make set PG/CG state functions publicLijo Lazar
Expose PG/CG set states functions for other clients Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: Add PSP public function to load a list of FWsLijo Lazar
v1: Adds a function to load a list of FWs as passed by the caller. This is needed as only a select need to loaded for some use cases. v2: Omit unrelated change, remove info log, fix return value when count is 0 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: Add reset control handling to reset workflowLijo Lazar
This prefers reset control based handling if it's implemented for a particular ASIC. If not, it takes the legacy path. It uses the legacy method of preparing environment (job, scheduler tasks) and restoring environment. v2: remove unused variable (Alex) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: Add reset control to amdgpu_deviceLijo Lazar
v1: Add generic amdgpu_reset_control to handle different types of resets. It may be added at device, hive or ip level. Each reset control has a list of handlers associated with it to handle different types of reset. Reset control is responsible for choosing the right handler given a particular reset context. Handler objects may implement a set of functions on how to handle a particular type of reset. prepare_env = Prepare environment/software context (not used currently). prepare_hwcontext = Prepare hardware context for the reset. perform_reset = Perform the type of reset. restore_hwcontext = Restore the hw context after reset. restore_env = Restore the environment after reset (not used currently). Reset context carries the context of reset, as of now this is based on the parameters used for current set of resets. v2: Fix coding style Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/pm: Add support for reset completion on aldebaranLijo Lazar
v1: On aldebaran, after hardware context restore, another handshake needs to happen with PMFW so that reset recovery is complete from PMFW side. Treat this as RESET_COMPLETE event for aldebaran. v2: Cleanup coding style, info logs Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/pm: Add function to wait for smu eventsLijo Lazar
v1: Add function to wait for specific event/states from PMFW v2: Add mutex lock, simplify sequence Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/pm: Modify mode2 msg sequence on aldebaranLijo Lazar
v1: During mode2 reset, PCI space is lost after message is sent. Restore PCI space before waiting for response from firmware. v2: Move mode2 sequence to aldebaran and update PMFW version. Handle generic sequence in smu13 without PMFW version check. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/amdgpu implement tdr advanced modeJack Zhang
[Why] Previous tdr design treats the first job in job_timeout as the bad job. But sometimes a later bad compute job can block a good gfx job and cause an unexpected gfx job timeout because gfx and compute ring share internal GC HW mutually. [How] This patch implements an advanced tdr mode.It involves an additinal synchronous pre-resubmit step(Step0 Resubmit) before normal resubmit step in order to find the real bad job. 1. At Step0 Resubmit stage, it synchronously submits and pends for the first job being signaled. If it gets timeout, we identify it as guilty and do hw reset. After that, we would do the normal resubmit step to resubmit left jobs. 2. For whole gpu reset(vram lost), do resubmit as the old way. v2: squash in build fix (Alex) Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: make BO type check less restrictiveNirmoy Das
BO with ttm_bo_type_sg type can also have tiling_flag and metadata. So so BO type check for only ttm_bo_type_kernel. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reported-by: Tom StDenis <Tom.StDenis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: use amdgpu_bo_user bo for metadata and tiling flagNirmoy Das
Tiling flag and metadata are only needed for BOs created by amdgpu_gem_object_create(), so we can remove those from the base class. v2: * squash tiling_flags and metadata relared patches into one * use BUG_ON for non ttm_bo_type_device type when accessing tiling_flags and metadata._ v3: *include to_amdgpu_bo_user Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: use amdgpu_bo_create_user() for when possibleNirmoy Das
Use amdgpu_bo_create_user() for all the BO allocations for ttm_bo_type_device type. v2: include amdgpu_amdkfd_alloc_gws() as well it calls amdgpu_bo_create() for ttm_bo_type_device Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: introduce struct amdgpu_bo_userNirmoy Das
Implement a new struct amdgpu_bo_user as subclass of struct amdgpu_bo and a function to created amdgpu_bo_user bo with a flag to identify the owner. v2: amdgpu_bo_to_amdgpu_bo_user -> to_amdgpu_bo_user() Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: allow variable BO struct creationNirmoy Das
Allow allocating BO structures with different structure size than struct amdgpu_bo. v2: Check bo_ptr_size in all amdgpu_bo_create() caller. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: load balance VCN3 decode as well v8Christian König
Add VCN3 IB parsing to figure out to which instance we can send the stream for decode. v2: remove VCN instance limit as well, fix amdgpu_cs_find_mapping, check supported formats instead of unsupported. v3: fix typo and error handling v4: make sure the message BO is CPU accessible v5: fix addr calculation once more v6: only check message buffers v7: fix constant and use defines v8: fix create msg calculation Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: share scheduler score on VCN3 instancesChristian König
The VCN3 instances can do both decode as well as encode. Share the scheduler load balancing score and remove fixing encode to only the second instance. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: add the sched_score to amdgpu_ring_initChristian König
Allow separate ring to share the same scheduler score. No functional change. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>