summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-08-02drm/amdgpu: Fix pcie_bw on Vega20Kent Russell
The registers used for VG20 are different in that certain performance counters were split off to TXCLK3/4. Vega10/12 doesn't have this, so add a new vg20_get_pcie_usage to reflect this change. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: Update NBIO headers to add TXCLK3/4Kent Russell
These are added for VG20, and are needed for PCIe bandwidth. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: Add amdgpu_asic_funcs.reset_method for Vega20Andrey Grodzovsky
Fixes GPU reset crash. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: Mark KFD VRAM allocations for wipe on releaseFelix Kuehling
Memory used by KFD applications can contain sensitive information that should not be leaked to other processes. The current approach to prevent leaks is to clear VRAM at allocation time. This is not effective because memory can be reused in other ways without being cleared. Synchronously clearing memory on the allocation path also carries a significant performance penalty. Stop clearing memory at allocation time. Instead mark the memory for wipe on release. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: Implement VRAM wipe on releaseFelix Kuehling
Wipe VRAM memory containing sensitive data when moving or releasing BOs. Clearing the memory is pipelined to minimize any impact on subsequent memory allocation latency. Use of a poison value should help debug future use-after-free bugs. When moving BOs, the existing ttm_bo_pipelined_move ensures that the memory won't be reused before being wiped. When releasing BOs, the BO is fenced with the memory fill operation, which results in queuing the BO for a delayed delete. v2: Move amdgpu_amdkfd_unreserve_memory_limit into amdgpu_bo_release_notify so that KFD can use memory that's still being cleared in the background Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: Add flag to wipe VRAM on releaseFelix Kuehling
This memory allocation flag will be used to indicate BOs containing sensitive data that should not be leaked to other processes. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/ttm: Add release_notify callback to ttm_bo_driverFelix Kuehling
This notifies the driver that a BO is about to be released. Releasing a BO also invokes the move_notify callback from ttm_bo_cleanup_memtype_use, but that happens too late for anything that would add fences to the BO and require a delayed delete. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amd/display: Use switch table for dc_to_smu_clock_typeLeo Li
Using a static int array will cause errors if the given dm_pp_clk_type is out-of-bounds. For robustness, use a switch table, with a default case to handle all invalid values. v2: 0 is a valid clock type for smu_clk_type. Return SMU_CLK_COUNT instead on invalid mapping. Signed-off-by: Leo Li <sunpeng.li@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amd/display: Use proper enum conversion functionsNathan Chancellor
clang warns: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_pp_smu.c:336:8: warning: implicit conversion from enumeration type 'enum smu_clk_type' to different enumeration type 'enum amd_pp_clock_type' [-Wenum-conversion] dc_to_smu_clock_type(clk_type), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_pp_smu.c:421:14: warning: implicit conversion from enumeration type 'enum amd_pp_clock_type' to different enumeration type 'enum smu_clk_type' [-Wenum-conversion] dc_to_pp_clock_type(clk_type), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ There are functions to properly convert between all of these types, use them so there are no longer any warnings. Fixes: a43913ea50a5 ("drm/amd/powerplay: add function get_clock_by_type_with_latency for navi10") Fixes: e5e4e22391c2 ("drm/amd/powerplay: add interface to get clock by type with latency for display (v2)") Link: https://github.com/ClangBuiltLinux/linux/issues/586 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: fix double ucode load by PSP(v3)Monk Liu
previously the ucode loading of PSP was repreated, one executed in phase_1 init/re-init/resume and the other in fw_loading routine Avoid this double loading by clearing ip_blocks.status.hw in suspend or reset prior to the FW loading and any block's hw_init/resume v2: still do the smu fw loading since it is needed by bare-metal v3: drop the change in reinit_early_sriov, just clear all block's status.hw in the head place and set the status.hw after hw_init done is enough Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: fix incorrect judge on sos fw versionMonk Liu
for SRIOV the SOS fw of PSP is loaded in hypervisor thus guest won't tell the version of it, and judging feature by reading the sos fw version in guest side is completely wrong Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: cleanup vega10 SRIOV code pathMonk Liu
we can simplify all those unnecessary function under SRIOV for vega10 since: 1) PSP L1 policy is by force enabled in SRIOV 2) original logic always set all flags which make itself a dummy step besides, 1) the ih_doorbell_range set should also be skipped for VEGA10 SRIOV. 2) the gfx_common registers should also be skipped for VEGA10 SRIOV. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amd/powerplay: sort feature status index by asic feature id for smuKevin Wang
before this change, the pp_feature sysfs show feature enable state by logic feature id, it is not easy to read. this change will sort pp_features show index by asic feature id. before: features high: 0x00000623 low: 0xb3cdaffb 00. DPM_PREFETCHER ( 0) : enabeld 01. DPM_GFXCLK ( 1) : enabeld 02. DPM_UCLK ( 3) : enabeld 03. DPM_SOCCLK ( 4) : enabeld 04. DPM_MP0CLK ( 5) : enabeld 05. DPM_LINK ( 6) : enabeld 06. DPM_DCEFCLK ( 7) : enabeld 07. DS_GFXCLK (10) : enabeld 08. DS_SOCCLK (11) : enabeld 09. DS_LCLK (12) : disabled 10. PPT (23) : enabeld 11. TDC (24) : enabeld 12. THERMAL (33) : enabeld 13. RM (35) : disabled ...... after: features high: 0x00000623 low: 0xb3cdaffb 00. DPM_PREFETCHER ( 0) : enabeld 01. DPM_GFXCLK ( 1) : enabeld 02. DPM_GFX_PACE ( 2) : disabled 03. DPM_UCLK ( 3) : enabeld 04. DPM_SOCCLK ( 4) : enabeld 05. DPM_MP0CLK ( 5) : enabeld 06. DPM_LINK ( 6) : enabeld 07. DPM_DCEFCLK ( 7) : enabeld 08. MEM_VDDCI_SCALING ( 8) : enabeld 09. MEM_MVDD_SCALING ( 9) : enabeld 10. DS_GFXCLK (10) : enabeld 11. DS_SOCCLK (11) : enabeld 12. DS_LCLK (12) : disabled 13. DS_DCEFCLK (13) : enabeld ...... Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdkfd: enable KFD support for navi14Alex Deucher
Same as navi10. Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: disable inject for failed subblocks of gfxDennis Li
some subblocks of gfx fail in inject test, disable them Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: support gfx ras error injection and err_cnt queryDennis Li
check gfx error count in both ras querry function and ras interrupt handler. gfx ras is still disabled by default due to known stability issue found in gpu reset. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add RAS callback for gfxDennis Li
Add functions for RAS error inject and query error counter Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add define for gfx ras subblockDennis Li
Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amd/include: add define of TCP_EDC_CNT_NEWDennis Li
Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amd/include: add bitfield define for EDC registersDennis Li
Add EDC registers to support VEGA20 RAS Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: remove ras_reserve_vram in ras injectionTao Zhou
error injection address is not in gpu address space Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add check for ras error typeTao Zhou
only ue and ce errors are supported Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: update interrupt callback for all ras clientsTao Zhou
add err_data parameter in interrupt cb for ras clients Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: allow ras interrupt callback to return error dataTao Zhou
add error data as parameter for ras interrupt cb and process it Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: query umc ras error addressTao Zhou
query umc ras error address, translate it to gpu 4k page view and save it. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add structures for umc error address translationTao Zhou
add related registers, callback function and channel index table Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add support for recording ras error addressTao Zhou
more than one error address may be recorded in one query Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: update algorithm of umc uncorrectable error countingTao Zhou
remove the check of ErrorCodeExt v2: refine the if condition for ue counting Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: switch to amdgpu_umc structureTao Zhou
create new amdgpu_umc structure to for more umc settings in future and switch to the new structure Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: use 64bit operation macros for umcTao Zhou
replace some 32bit macros with 64bit operations to simplify code Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add RREG64/WREG64(_PCIE) operationsTao Zhou
add 64 bits register access functions v2: implement 64 bit functions in low level Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add ras error count after each query (v2)Tao Zhou
v1: increase ras ce/ue error count v2: log the number of correctable and uncorrectable errors Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: querry umc error countHawking Zhang
check umc error count in both ras querry function and ras interrupt handler Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: init umc v6_1 functions for vega20Hawking Zhang
init umc callback function for vega20 in sw early init phase Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add umc v6_1 query error count supportHawking Zhang
Implement umc query_ras_error_count function to support querry both correctable and uncorrectable error Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add umc v6_1_1 IP headersHawking Zhang
the change introduces IP headers for unified memory controller (umc) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add rsmu v_0_0_2 ip headersHawking Zhang
remote smu (rsmu) is a sub-block used as ip register interface, error handling, reset generation.etc Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add amdgpu_umc_functions structureHawking Zhang
This is common structure as UMC callback function Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: init RSMU and UMC ip base address for vega20Hawking Zhang
the driver needs to program RSMU and UMC registers to support vega20 RAS feature Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: move some ras data structure to amdgpu_ras.hHawking Zhang
These are common structures that can be included by IP specific source files Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: drop drmP.h from vcn_v2_5.cAlex Deucher
Unused. Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: drop drmP.h from vcn_v2_0.cAlex Deucher
And fix the fallout. Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: drop drmP.h from sdma_v5_0.cAlex Deucher
And fix the fallout. Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: drop drmP.h from nv.cAlex Deucher
And fix up the fallout. Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: drop drmP.h from navi10_ih.cAlex Deucher
And fix the fallout. Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: drop drmP.h in gfx_v10_0.cAlex Deucher
And fix the fallout. Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: drop drmP.h from amdgpu_amdkfd_gfx_v10.cAlex Deucher
Unused. Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: drop drmP.h in amdgpu_amdkfd_arcturus.cAlex Deucher
Unused. Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-30drm/amd/powerplay: determine the features to enable by pptable onlyEvan Quan
Per current logics, the features to enable are determined together by driver and pptable. This is not efficient in co-debug with firmware team. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-30drm/amdgpu: correct irq type used for sdma eccHawking Zhang
we should pass irq type, instead of irq client id, to irq_get/put interface Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>