summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-02-07drm/amd/display: Add delay before logging clks from hwEthan Bitnun
Add a small delay before reading clks from hw, to ensure correct values are used for logging. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Ethan Bitnun <etbitnun@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amd/display: Fix MST Null Ptr for RVFangzhi Zuo
The change try to fix below error specific to RV platform: BUG: kernel NULL pointer dereference, address: 0000000000000008 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 4 PID: 917 Comm: sway Not tainted 6.3.9-arch1-1 #1 124dc55df4f5272ccb409f39ef4872fc2b3376a2 Hardware name: LENOVO 20NKS01Y00/20NKS01Y00, BIOS R12ET61W(1.31 ) 07/28/2022 RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x260 [drm_display_helper] Code: 01 00 00 48 8b 85 60 05 00 00 48 63 80 88 00 00 00 3b 43 28 0f 8d 2e 01 00 00 48 8b 53 30 48 8d 04 80 48 8d 04 c2 48 8b 40 18 <48> 8> RSP: 0018:ffff960cc2df77d8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8afb87e81280 RCX: 0000000000000224 RDX: ffff8afb9ee37c00 RSI: ffff8afb8da1a578 RDI: ffff8afb87e81280 RBP: ffff8afb83d67000 R08: 0000000000000001 R09: ffff8afb9652f850 R10: ffff960cc2df7908 R11: 0000000000000002 R12: 0000000000000000 R13: ffff8afb8d7688a0 R14: ffff8afb8da1a578 R15: 0000000000000224 FS: 00007f4dac35ce00(0000) GS:ffff8afe30b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000010ddc6000 CR4: 00000000003506e0 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x171/0x4e0 ? plist_add+0xbe/0x100 ? exc_page_fault+0x7c/0x180 ? asm_exc_page_fault+0x26/0x30 ? drm_dp_atomic_find_time_slots+0x5e/0x260 [drm_display_helper 0e67723696438d8e02b741593dd50d80b44c2026] ? drm_dp_atomic_find_time_slots+0x28/0x260 [drm_display_helper 0e67723696438d8e02b741593dd50d80b44c2026] compute_mst_dsc_configs_for_link+0x2ff/0xa40 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054] ? fill_plane_buffer_attributes+0x419/0x510 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054] compute_mst_dsc_configs_for_state+0x1e1/0x250 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054] amdgpu_dm_atomic_check+0xecd/0x1190 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054] drm_atomic_check_only+0x5c5/0xa40 drm_mode_atomic_ioctl+0x76e/0xbc0 ? _copy_to_user+0x25/0x30 ? drm_ioctl+0x296/0x4b0 ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 drm_ioctl_kernel+0xcd/0x170 drm_ioctl+0x26d/0x4b0 ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 amdgpu_drm_ioctl+0x4e/0x90 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054] __x64_sys_ioctl+0x94/0xd0 do_syscall_64+0x60/0x90 ? do_syscall_64+0x6c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f4dad17f76f Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c> RSP: 002b:00007ffd9ae859f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 000055e255a55900 RCX: 00007f4dad17f76f RDX: 00007ffd9ae85a90 RSI: 00000000c03864bc RDI: 000000000000000b RBP: 00007ffd9ae85a90 R08: 0000000000000003 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c03864bc R13: 000000000000000b R14: 000055e255a7fc60 R15: 000055e255a01eb0 </TASK> Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device ccm cmac algif_hash algif_skcipher af_alg joydev mousedev bnep > typec libphy k10temp ipmi_msghandler roles i2c_scmi acpi_cpufreq mac_hid nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_mas> CR2: 0000000000000008 ---[ end trace 0000000000000000 ]--- RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x260 [drm_display_helper] Code: 01 00 00 48 8b 85 60 05 00 00 48 63 80 88 00 00 00 3b 43 28 0f 8d 2e 01 00 00 48 8b 53 30 48 8d 04 80 48 8d 04 c2 48 8b 40 18 <48> 8> RSP: 0018:ffff960cc2df77d8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8afb87e81280 RCX: 0000000000000224 RDX: ffff8afb9ee37c00 RSI: ffff8afb8da1a578 RDI: ffff8afb87e81280 RBP: ffff8afb83d67000 R08: 0000000000000001 R09: ffff8afb9652f850 R10: ffff960cc2df7908 R11: 0000000000000002 R12: 0000000000000000 R13: ffff8afb8d7688a0 R14: ffff8afb8da1a578 R15: 0000000000000224 FS: 00007f4dac35ce00(0000) GS:ffff8afe30b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000010ddc6000 CR4: 00000000003506e0 With a second DP monitor connected, drm_atomic_state in dm atomic check sequence does not include the connector state for the old/existing/first DP monitor. In such case, dsc determination policy would hit a null ptr when it tries to iterate the old/existing stream that does not have a valid connector state attached to it. When that happens, dm atomic check should call drm_atomic_get_connector_state for a new connector state. Existing dm has already done that, except for RV due to it does not have official support of dsc where .num_dsc is not defined in dcn10 resource cap, that prevent from getting drm_atomic_get_connector_state called. So, skip dsc determination policy for ASICs that don't have DSC support. Cc: stable@vger.kernel.org # 6.1+ Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2314 Reviewed-by: Wayne Lin <wayne.lin@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amd/display: correct comment in set_default_brightness_aux()Camille Cho
0 nits is a valid default value for OLED panels. So, update the relevant comment to account for that fact. Reviewed-by: Krunoslav Kovac <krunoslav.kovac@amd.com> Signed-off-by: Camille Cho <camille.cho@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amd/display: Add left edge pixel for YCbCr422/420 + ODM pipe splitGeorge Shen
[Why] Currently 3-tap chroma subsampling is used for YCbCr422/420. When ODM pipesplit is used, pixels on the left edge of ODM slices need one extra pixel from the right edge of the previous slice to calculate the correct chroma value. Without this change, the chroma value is slightly different than expected. This is usually imperceptible visually, but it impacts test pattern CRCs for compliance test automation. [How] Update logic to use the register for adding extra left edge pixel for YCbCr422/420 ODM cases. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amd/display: Add debug option to force 1-tap chroma subsamplingGeorge Shen
[Why] Default driver behaviour is 3-tap subsampling, so we should keep it the same for test patterns as well. However, it is also useful to force 1-tap subsampling for testing purposes. Reviewed-by: Michael Strauss <michael.strauss@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amd/display: Disable idle reallow as part of command/gpint executionNicholas Kazlauskas
[Why] Workaroud for a race condition where DMCUB is in the process of committing to IPS1 during the handshake causing us to miss the transition into IPS2 and touch the INBOX1 RPTR causing a HW hang. [How] Disable the reallow to ensure that we have enough of a gap between entry and exit and we're not seeing back-to-back wake_and_executes. Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amdgpu: Fix shared buff copy to userStanley.Yang
ta if invoke node buffer |-------- ta type ----------| |-------- ta id ----------| |-------- cmd id ----------| |------ shared buf len -----| |------ shared buffer ------| ta if invoke node buffer is as above, copy shared buffer data to correct location Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amd/display: Increase eval/entry delay for DCN35Nicholas Kazlauskas
[Why] To match firmware measurements and avoid hanging when accessing HW that's in idle. [How] Increase the delays to what we've measured. Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amd/display: Disable timeout in more places for dc_dmub_srvNicholas Kazlauskas
[Why] We're still missing a few and we'd like to avoid continuining when a hang occurs for debug purposes. [How] Add the loop anywhere we try to wait on rptr == wptr in dc_dmub_srv. Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amdgpu: Fix potential out-of-bounds access in ↵Srinivasan Shanmugam
'amdgpu_discovery_reg_base_init()' The issue arises when the array 'adev->vcn.vcn_config' is accessed before checking if the index 'adev->vcn.num_vcn_inst' is within the bounds of the array. The fix involves moving the bounds check before the array access. This ensures that 'adev->vcn.num_vcn_inst' is within the bounds of the array before it is used as an index. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1289 amdgpu_discovery_reg_base_init() error: testing array offset 'adev->vcn.num_vcn_inst' after use. Fixes: a0ccc717c4ab ("drm/amdgpu/discovery: validate VCN and SDMA instances") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amd/pm: Retrieve UMC ODECC error count from aca bankCandice Li
Instead of software managed counters. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amd/display: Add more checks for exiting idle in DCNicholas Kazlauskas
[Why] Any interface that touches registers needs to wake up the system. [How] Add a new interface dc_exit_ips_for_hw_access that wraps the check for IPS support and insert it into the public DC interfaces that touch registers. We don't re-enter, since we expect that the enter/exit to have been done on the DM side. Cc: stable@vger.kernel.org # 6.1+ Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amd/display: correct static screen event maskAllen Pan
[Why] Hardware register definition changed Reviewed-by: Charlene Liu <charlene.liu@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Allen Pan <allen.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amdgpu: remove asymmetrical irq disabling in jpeg 4.0.5 suspendLi Ma
A supplement to commit: 615dd56ac5379f4239940be69139a33e79e59c67 There is an irq warning of jpeg during resume in s2idle process. No irq enabled in jpeg 4.0.5 resume. Fixes: 615dd56ac537 ("drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend") Signed-off-by: Li Ma <li.ma@amd.com> Acked-By: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amdgpu: reset gpu for s3 suspend abort casePrike Liang
In the s3 suspend abort case some type of gfx9 power rail not turn off from FCH side and this will put the GPU in an unknown power status, so let's reset the gpu to a known good power state before reinitialize gpu device. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amdgpu: skip to program GFXDEC registers for suspend abortPrike Liang
In the suspend abort cases, the gfx power rail doesn't turn off so some GFXDEC registers/CSB can't reset to default value and at this moment reinitialize GFXDEC/CSB will result in an unexpected error. So let skip those program sequence for the suspend abort case. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amd/display: set odm_combine_policy based on context in dcn32 resourceWenjing Liu
[why] When populating dml pipes, odm combine policy should be assigned based on the pipe topology of the context passed in. DML pipes could be repopulated multiple times during single validate bandwidth attempt. We need to make sure that whenever we repopulate the dml pipes it is always aligned with the updated context. There is a case where DML pipes get repopulated during FPO optimization after ODM combine policy is changed. Since in the current code we reinitlaize ODM combine policy, even though the current context has ODM combine enabled, we overwrite it despite the pipes are already split. This causes DML to think that MPC combine is used so we mistakenly enable MPC combine because we apply pipe split with ODM combine policy reset. This issue doesn't impact non windowed MPO with ODM case because the legacy policy has restricted use cases. We don't encounter the case where both ODM and FPO optimizations are enabled together. So we decide to leave it as is because it is about to be replaced anyway. Cc: stable@vger.kernel.org # 6.6+ Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com> Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amd/display: Don't perform rate toggle on DP2-capable FIXED_VS retimersMichael Strauss
[WHY] Only required if FIXED_VS retimer does not support DP2-capable. [HOW] Gate link rate toggle with DP 128b/132b LTTPR channel coding cap check. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amdkfd: Add cache line sizes to KFD topologyJoseph Greathouse
The KFD topology includes cache line size, but we have not been filling that information out unless we are parsing a CRAT table. Fill in this information for the devices where we have cache information structs, and pipe this information to the topology sysfs files. v2: squash in fix from Joe (Alex) Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amd/display: Remove Legacy FIXED_VS Transparent LT SequenceMichael Strauss
The New sequence has been in use in DCN314 with no regressions introduced. Therefore, it is safe to enable this sequence for all devices using FIXED_VS retimers. So, remove the legacy codepath and its associated config flag. Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amd/display: add panel_power_savings sysfs entry to eDP connectorsHamza Mahfooz
We want programs besides the compositor to be able to enable or disable panel power saving features. However, since they are currently only configurable through DRM properties, that isn't possible. So, to remedy that issue introduce a new "panel_power_savings" sysfs attribute. v2: squash in fix from Hamza (Alex) Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amdgpu: Clear the hotplug interrupt ack bit before hpd initializationQiang Ma
Problem: The computer in the bios initialization process, unplug the HDMI display, wait until the system up, plug in the HDMI display, did not enter the hotplug interrupt function, the display is not bright. Fix: After the above problem occurs, and the hpd ack interrupt bit is 1, the interrupt should be cleared during hpd_init initialization so that when the driver is ready, it can respond to the hpd interrupt normally. Signed-off-by: Qiang Ma <maqianga@uniontech.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amdgpu: fix typo in parameter descriptionAlex Deucher
Missing space. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amdgpu: Only create mes event log debugfs when mes is enabledshaoyunl
Skip the debugfs file creation for mes event log if the GPU doesn't use MES. This to prevent potential kernel oops when user try to read the event log in debugfs on a GPU without MES Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amd/display: Add NULL test for 'timing generator' in 'dcn21_set_pipe()'Srinivasan Shanmugam
In "u32 otg_inst = pipe_ctx->stream_res.tg->inst;" pipe_ctx->stream_res.tg could be NULL, it is relying on the caller to ensure the tg is not NULL. Fixes: 474ac4a875ca ("drm/amd/display: Implement some asic specific abm call backs.") Cc: Yongqiang Sun <yongqiang.sun@amd.com> Cc: Anthony Koo <Anthony.Koo@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Anthony Koo <Anthony.Koo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amd/display: Fix 'panel_cntl' could be null in 'dcn21_set_backlight_level()'Srinivasan Shanmugam
'panel_cntl' structure used to control the display panel could be null, dereferencing it could lead to a null pointer access. Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn21/dcn21_hwseq.c:269 dcn21_set_backlight_level() error: we previously assumed 'panel_cntl' could be null (see line 250) Fixes: 474ac4a875ca ("drm/amd/display: Implement some asic specific abm call backs.") Cc: Yongqiang Sun <yongqiang.sun@amd.com> Cc: Anthony Koo <Anthony.Koo@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Anthony Koo <Anthony.Koo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07drm/amdgpu/pm: Use inline function for IP version checkMa Jun
Use existing inline function for IP version check. Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: Reset IH OVERFLOW_CLEAR bitFriedrich Vock
Allows us to detect subsequent IH ring buffer overflows as well. Cc: Joshua Ashton <joshua@froggi.es> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspendYifan Zhang
There is no irq enabled in vcn 4.0.5 resume, causing wrong amdgpu_irq_src status. Beside, current set function callbacks are empty with no real effect. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: use PSP address query commandTao Zhou
Get UMC physical address from PSP in RAS error address coversion. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: add PSP RAS address query commandTao Zhou
Convert mca address to physical address or vice versa via RAS TA. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: drm/amdgpu: remove golden setting for gfx 11.5.0Yifan Zhang
No need to set GC golden settings in driver from gfx 11.5.0 onwards. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdkfd: reserve the BO before validating itLang Yu
Fix a warning. v2: Avoid unmapping attachment repeatedly when ERESTARTSYS. v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix) [ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.708989] Call Trace: [ 41.708992] <TASK> [ 41.708996] ? show_regs+0x6c/0x80 [ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709008] ? __warn+0x93/0x190 [ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709024] ? report_bug+0x1f9/0x210 [ 41.709035] ? handle_bug+0x46/0x80 [ 41.709041] ? exc_invalid_op+0x1d/0x80 [ 41.709048] ? asm_exc_invalid_op+0x1f/0x30 [ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709337] ? srso_alias_return_thunk+0x5/0x7f [ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu] [ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu] [ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu] [ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu] [ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu] [ 41.709945] ? srso_alias_return_thunk+0x5/0x7f [ 41.709949] ? tomoyo_file_ioctl+0x20/0x30 [ 41.709959] __x64_sys_ioctl+0x9c/0xd0 [ 41.709967] do_syscall_64+0x3f/0x90 [ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: 101b8104307e ("drm/amdkfd: Move dma unmapping after TLB flush") Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amd/display: Fix buffer overflow in 'get_host_router_total_dp_tunnel_bw()'Srinivasan Shanmugam
The error message buffer overflow 'dc->links' 12 <= 12 suggests that the code is trying to access an element of the dc->links array that is beyond its bounds. In C, arrays are zero-indexed, so an array with 12 elements has valid indices from 0 to 11. Trying to access dc->links[12] would be an attempt to access the 13th element of a 12-element array, which is a buffer overflow. To fix this, ensure that the loop does not go beyond the last valid index when accessing dc->links[i + 1] by subtracting 1 from the loop condition. This would ensure that i + 1 is always a valid index in the array. Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_dpia_bw.c:208 get_host_router_total_dp_tunnel_bw() error: buffer overflow 'dc->links' 12 <= 12 Fixes: 59f1622a5f05 ("drm/amd/display: Add dpia display mode validation logic") Cc: PeiChen Huang <peichen.huang@amd.com> Cc: Aric Cyr <aric.cyr@amd.com> Cc: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amd/display: Add NULL check for kzalloc in 'amdgpu_dm_atomic_commit_tail()'Srinivasan Shanmugam
Add a NULL check for the kzalloc call that allocates memory for dummy_updates in the amdgpu_dm_atomic_commit_tail function. Previously, if kzalloc failed to allocate memory and returned NULL, the code would attempt to use the NULL pointer. The fix is to check if kzalloc returns NULL, and if so, log an error message and skip the rest of the current loop iteration with the continue statement. This prevents the code from attempting to use the NULL pointer. Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reported-by: Julia Lawall <julia.lawall@inria.fr> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202401300629.ICnCt983-lkp@intel.com/ Fixes: 135fd1b35690 ("drm/amd/display: Reduce stack size") Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'Srinivasan Shanmugam
Return 0 for success scenairos in 'gmc_v6/7/8/9_0_hw_init()' Fixes the below: drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing error code? 'r' Fixes: fac4ebd79fed ("drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: Need to resume ras during gpu reset for gfx v9_4_3 sriovYiPeng Chai
Need to resume ras during gpu reset for gfx v9_4_3 sriov Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: disable RAS feature when finiTao Zhou
Send RAS disable feature command in fini. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: Update boot time errors polling sequenceHawking Zhang
Update boot time errors polling sequence to align with the latest firmware change. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Frank Min <Frank.Min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amd: Don't init MEC2 firmware when it fails to loadDavid McFarland
The same calls are made directly above, but conditional on the firmware loading and validating successfully. Cc: stable@vger.kernel.org Fixes: 9931b67690cf ("drm/amd: Load GFX10 microcode during early_init") Signed-off-by: David McFarland <corngood@gmail.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: Fix the warning info in mode1 resetMa Jun
Fix the warning info below during mode1 reset. [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000006] ? show_regs+0x6e/0x80 [ +0.000011] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000005] ? __warn+0x91/0x150 [ +0.000009] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000006] ? report_bug+0x19d/0x1b0 [ +0.000013] ? handle_bug+0x46/0x80 [ +0.000012] ? exc_invalid_op+0x1d/0x80 [ +0.000011] ? asm_exc_invalid_op+0x1f/0x30 [ +0.000014] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000007] ? __flush_work.isra.0+0x208/0x390 [ +0.000007] ? _prb_read_valid+0x216/0x290 [ +0.000008] __cancel_work_timer+0x11d/0x1a0 [ +0.000007] ? try_to_grab_pending+0xe8/0x190 [ +0.000012] cancel_work_sync+0x14/0x20 [ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched] [ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu] This warning info was printed after applying the patch "drm/sched: Convert drm scheduler to use a work queue rather than kthread". The root cause is that amdgpu driver tries to use the uninitialized work_struct in the struct drm_gpu_scheduler v2: - Rename the function to amdgpu_ring_sched_ready and move it to amdgpu_ring.c (Alex) v3: - Fix a few more checks based on Vitaly's patch (Alex) v4: - squash in fix noticed by Bert in https://gitlab.freedesktop.org/drm/amd/-/issues/3139 Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-29drm/amdgpu: use helper macro HW_ERR instead of Hardware error stringYang Wang
use helper macro HW_ERR to instead of Hardware error string. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-29drm/amdgpu/pm: Use macro definitions in the smu IH process functionMa Jun
Replace the hard-coded numbers with macro definition Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-29drm/amd/display: 3.2.270Aric Cyr
- Add control flag for IPS residency profiling - Populate invalid split index to be 0xF - Fix dcn35 8k30 Underflow/Corruption Issue - Fix DP audio settings - Use correct phantom pipe when populating subvp pipe info - Fix incorrect mpc_combine array size - Fix DPSTREAM CLK on and off sequence - Fix USB-C flag update after enc10 feature init - Add debugfs disallow edp psr - Unify optimize_required flags and VRR adjustments - Increased min_dcfclk_mhz and min_fclk_mhz - Fix static screen event mask definition change Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-29drm/amd/display: [FW Promotion] Release 0.0.202.0Anthony Koo
- Add control flag for IPS residency profiling Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Anthony Koo <anthony.koo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-29drm/amd/display: Populate invalid split index to be 0xFAlvin Lee
[why] There exists scenarios where the split index for subvp can be pipe index 0. The assumption in FW is that the split index won't be 0 but this is incorrect. [how] Instead populate non-split cases to be 0xF to differentiate between split and non-split. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alvin Lee <alvin.lee2@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-29drm/amd/display: Fix dcn35 8k30 Underflow/Corruption IssueFangzhi Zuo
[why] odm calculation is missing for pipe split policy determination and cause Underflow/Corruption issue. [how] Add the odm calculation. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Charlene Liu <charlene.liu@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-29drm/amd/display: clkmgr unittest with removal of warn & rename DCN35 ips ↵Mounika Adhuri
handshake for idle [why] To Remove warnings of clk_mgr. [How] Added code to remove warnings by resolving redefinations. Reviewed-by: Martin Leung <martin.leung@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Mounika Adhuri <moadhuri@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-29drm/amd/display: fix DP audio settingsCharlene Liu
[why] Audio channel layout for 5.1ch is not correct [how] Add the audio layout for 5.1ch (channel_count = 6). Add divided by zero check. Reviewed-by: Zhan Liu <zhan.liu@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Charlene Liu <charlene.liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-29drm/amd/display: Underflow workaround by increasing SR exit latencyNicholas Susanto
[Why] On 14us for exit latency time causes underflow for 8K monitor with HDR on. Increasing the latency to 28us fixes the underflow. [How] Increase the latency to 28us. This workaround should be sufficient before we figure out why SR exit so long. Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Nicholas Susanto <nicholas.susanto@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>