summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/gt/uc
AgeCommit message (Collapse)Author
2024-01-09drm/i915/guc: Avoid circular locking issue on busyness flushJohn Harrison
Avoid the following lockdep complaint: <4> [298.856498] ====================================================== <4> [298.856500] WARNING: possible circular locking dependency detected <4> [298.856503] 6.7.0-rc5-CI_DRM_14017-g58ac4ffc75b6+ #1 Tainted: G N <4> [298.856505] ------------------------------------------------------ <4> [298.856507] kworker/4:1H/190 is trying to acquire lock: <4> [298.856509] ffff8881103e9978 (&gt->reset.backoff_srcu){++++}-{0:0}, at: _intel_gt_reset_lock+0x35/0x380 [i915] <4> [298.856661] but task is already holding lock: <4> [298.856663] ffffc900013f7e58 ((work_completion)(&(&guc->timestamp.work)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x264/0x530 <4> [298.856671] which lock already depends on the new lock. The complaint is not actually valid. The busyness worker thread does indeed hold the worker lock and then attempt to acquire the reset lock (which may have happened in reverse order elsewhere). However, it does so with a trylock that exits if the reset lock is not available (specifically to prevent this and other similar deadlocks). Unfortunately, lockdep does not understand the trylock semantics (the lock is an i915 specific custom implementation for resets). Not doing a synchronous flush of the worker thread when a reset is in progress resolves the lockdep splat by never even attempting to grab the lock in this particular scenario. There are situatons where a synchronous cancel is required, however. So, always do the synchronous cancel if not in reset. And add an extra synchronous cancel to the end of the reset flow to account for when a reset is occurring at driver shutdown and the cancel is no longer synchronous but could lead to unallocated memory accesses if the worker is not stopped. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20231219195957.212600-1-John.C.Harrison@Intel.com
2024-01-09drm/i915/guc: Close deregister-context race against CT-lossAlan Previn
If we are at the end of suspend or very early in resume its possible an async fence signal (via rcu_call) is triggered to free_engines which could lead us to the execution of the context destruction worker (after a prior worker flush). Thus, when suspending, insert rcu_barriers at the start of i915_gem_suspend (part of driver's suspend prepare) and again in i915_gem_suspend_late so that all such cases have completed and context destruction list isn't missing anything. In destroyed_worker_func, close the race against CT-loss by checking that CT is enabled before calling into deregister_destroyed_contexts. Based on testing, guc_lrc_desc_unpin may still race and fail as we traverse the GuC's context-destroy list because the CT could be disabled right before calling GuC's CT send function. We've witnessed this race condition once every ~6000-8000 suspend-resume cycles while ensuring workloads that render something onscreen is continuously started just before we suspend (and the workload is small enough to complete and trigger the queued engine/context free-up either very late in suspend or very early in resume). In such a case, we need to unroll the entire process because guc-lrc-unpin takes a gt wakeref which only gets released in the G2H IRQ reply that never comes through in this corner case. Without the unroll, the taken wakeref is leaked and will cascade into a kernel hang later at the tail end of suspend in this function: intel_wakeref_wait_for_idle(&gt->wakeref) (called by) - intel_gt_pm_wait_for_idle (called by) - wait_for_suspend Thus, do an unroll in guc_lrc_desc_unpin and deregister_destroyed_- contexts if guc_lrc_desc_unpin fails due to CT send falure. When unrolling, keep the context in the GuC's destroy-list so it can get picked up on the next destroy worker invocation (if suspend aborted) or get fully purged as part of a GuC sanitization (end of suspend) or a reset flow. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Tested-by: Mousumi Jana <mousumi.jana@intel.com> Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231229215143.581619-1-alan.previn.teres.alexis@intel.com
2024-01-09drm/i915/guc: Flush context destruction worker at suspendAlan Previn
When suspending, flush the context-guc-id deregistration worker at the final stages of intel_gt_suspend_late when we finally call gt_sanitize that eventually leads down to __uc_sanitize so that the deregistration worker doesn't fire off later as we reset the GuC microcontroller. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Tested-by: Mousumi Jana <mousumi.jana@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231228045558.536585-2-alan.previn.teres.alexis@intel.com
2024-01-05drm/i915/huc: Allow for very slow HuC loadingJohn Harrison
A failure to load the HuC is occasionally observed where the cause is believed to be a low GT frequency leading to very long load times. So a) increase the timeout so that the user still gets a working system even in the case of slow load. And b) report the frequency during the load to see if that is the cause of the slow down. Also update the similar code on the GuC load to not use uncore->gt when there is a local gt available. The two should match, but no need for unnecessary de-referencing. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240102222202.310495-1-John.C.Harrison@Intel.com
2024-01-02drm/i915/guc: Change wa and EU_PERF_CNTL registers to MCR typeShuicheng Lin
Some of the wa registers are MCR register, and EU_PERF_CNTL registers are MCR register. MCR register needs extra process for read/write. As normal MMIO register also could work with the MCR register process, change all wa registers to MCR type for code simplicity. Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240102010231.843778-1-shuicheng.lin@intel.com
2023-12-29drm/i915/guc: reconcile Excess struct member kernel-doc warningsRandy Dunlap
Document nested struct members with full names as described in Documentation/doc-guide/kernel-doc.rst. intel_guc.h:305: warning: Excess struct member 'lock' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'guc_ids' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'num_guc_ids' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'guc_ids_bitmap' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'guc_id_list' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'guc_ids_in_use' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'destroyed_contexts' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'destroyed_worker' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'reset_fail_worker' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'reset_fail_mask' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'sched_disable_delay_ms' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'sched_disable_gucid_threshold' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'lock' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'gt_stamp' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'ping_delay' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'work' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'shift' description in 'intel_guc' intel_guc.h:305: warning: Excess struct member 'last_stat_jiffies' description in 'intel_guc' 18 warnings as Errors Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: intel-gfx@lists.freedesktop.org Cc: Jonathan Corbet <corbet@lwn.net> Cc: dri-devel@lists.freedesktop.org Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231226195432.10891-3-rdunlap@infradead.org
2023-12-15drm/i915: Use memcpy_from_page() in gt/uc/intel_uc_fw.cZhao Liu
The use of kmap_atomic() is being deprecated in favor of kmap_local_page()[1], and this patch converts the call from kmap_atomic() to kmap_local_page(). The main difference between atomic and local mappings is that local mappings doesn't disable page faults or preemption (the preemption is disabled for !PREEMPT_RT case, otherwise it only disables migration). With kmap_local_page(), we can avoid the often unwanted side effect of unnecessary page faults or preemption disables. In drm/i915/gt/uc/intel_us_fw.c, the function intel_uc_fw_copy_rsa() just use the mapping to do memory copy so it doesn't need to disable pagefaults and preemption for mapping. Thus the local mapping without atomic context (not disable pagefaults / preemption) is enough. Therefore, intel_uc_fw_copy_rsa() is a function where the use of memcpy_from_page() with kmap_local_page() in place of memcpy() with kmap_atomic() is correctly suited. Convert the calls of memcpy() with kmap_atomic() / kunmap_atomic() to memcpy_from_page() which uses local mapping to copy. [1]: https://lore.kernel.org/all/20220813220034.806698-1-ira.weiny@intel.com/T/#u Suggested-by: Ira Weiny <ira.weiny@intel.com> Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231203132947.2328805-8-zhao1.liu@linux.intel.com
2023-12-08drm/i915/guc: Create the guc_to_i915() wrapperAndi Shyti
Given a reference to "guc", the guc_to_i915() returns the pointer to "i915" private data. Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231206184322.57111-1-andi.shyti@linux.intel.com
2023-12-07drm/i915: Use internal class when counting engine resetsTvrtko Ursulin
Commit 503579448db9 ("drm/i915/gsc: Mark internal GSC engine with reserved uabi class") made the GSC0 engine not have a valid uabi class and so broke the engine reset counting, which in turn was made class based in cb823ed9915b ("drm/i915/gt: Use intel_gt as the primary object for handling resets"). Despite the title and commit text of the latter is not mentioning it (and has left the storage array incorrectly sized), tracking by class, despite it adding aliasing in hypthotetical multi-tile systems, is handy for virtual engines which for instance do not have a valid engine->id. Therefore we keep that but just change it to use the internal class which is always valid. We also add a helper to increment the count, which aligns with the existing getter. What was broken without this fix were out of bounds reads every time a reset would happen on the GSC0 engine, or during selftests when storing and cross-checking the counts in igt_live_test_begin and igt_live_test_end. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: dfed6b58d54f ("drm/i915/gsc: Mark internal GSC engine with reserved uabi class") [tursulin: fixed Fixes tag] Reported-by: Alan Previn Teres Alexis <alan.previn.teres.alexis@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231201122109.729006-2-tvrtko.ursulin@linux.intel.com
2023-11-30drm/i915/guc: Add a selftest for FAST_REQUEST errorsJohn Harrison
There is a mechanism for reporting errors from fire and forget H2G messages. This is the only way to find out about almost any error in the GuC backend submission path. So it would be useful to know that it is working. v2: Fix some dumb over-complications and a couple of typos - review feedback from Daniele. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231114010016.234570-3-John.C.Harrison@Intel.com
2023-11-30drm/i915/guc: Fix for potential false positives in GuC hang selftestJohn Harrison
Noticed that the hangcheck selftest is submitting a non-preemptoble spinner. That means that even if the GuC does not die, the heartbeat will still kick in and trigger a reset. Which is rather defeating the purpose of the test - to verify that the heartbeat will kick in if the GuC itself has died. The test is deliberately killing the GuC, so it should never hit the case of a non-dead GuC. But it is not impossible that the kill might fail at some future point due to other driver re-work. So, make the spinner pre-emptible. That way the heartbeat can get through if the GuC is alive and context switching. Thus a reset only happens if the GuC dies. Thus, if the kill should stop working the test will now fail rather than claim to pass. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231114010016.234570-2-John.C.Harrison@Intel.com
2023-11-29drm/i915/pxp: Add drm_dbgs for critical PXP events.Alan Previn
Debugging PXP issues can't even begin without understanding precedding sequence of important events. Add drm_dbg into the most important PXP events. v5 : - rebase. v4 : - rebase. v3 : - move gt_dbg to after mutex block in function i915_gsc_proxy_component_bind. (Vivaik) v2 : - remove __func__ since drm_dbg covers that (Jani). - add timeout dbg of the restart from front-end (Alan). Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Vivaik Balasubrawmanian <vivaik.balasubrawmanian@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231122191523.58379-1-alan.previn.teres.alexis@intel.com
2023-11-20drm/i915: Track gt pm wakerefsAndrzej Hajda
Track every intel_gt_pm_get() until its corresponding release in intel_gt_pm_put() by returning a cookie to the caller for acquire that must be passed by on released. When there is an imbalance, we can see who either tried to free a stale wakeref, or who forgot to free theirs. v2: track recently added calls in gen8_ggtt_bind_get_ce and destroyed_worker_func Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231030-ref_tracker_i915-v1-2-006fe6b96421@intel.com
2023-11-16drm/i915/huc: Stop printing about unsupported HuC on MTLDaniele Ceraolo Spurio
On MTL, the HuC is only supported on the media GT, so our validation check on the module parameter detects an inconsistency on the root GT (the modparams asks to enable HuC, but the support is not there) and prints the following info message: [drm] GT0: Incompatible option enable_guc=3 - HuC is not supported! This can be confusing to the user and make them think that something is wrong when it isn't, so we need to silence it. Given that any platform that supports HuC also supports GuC, if a user tries to enable HuC on a platform that really doesn't support it they'll already see a message about GuC not being supported, so instead of just silencing the HuC message on newer platforms we can just get rid of it entirely. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231109235436.2349963-1-daniele.ceraolospurio@intel.com
2023-10-20Merge tag 'drm-intel-gt-next-2023-10-19' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next Driver Changes: Fixes/improvements/new stuff: - Retry gtt fault when out of fence registers (Ville Syrjälä) - Determine context valid in OA reports [perf] (Umesh Nerlige Ramappa) Future platform enablement: - GuC based TLB invalidation for Meteorlake (Jonathan Cavitt, Prathap Kumar Valsan) - Don't set PIPE_CONTROL_FLUSH_L3 [mtl] (Vinay Belgaumkar) Miscellaneous: - Clean up zero initializers [guc,pxp] (Ville Syrjälä) - Prevent potential null-ptr-deref in engine_init_common (Nirmoy Das) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZTFDFSbd/U7YP+hI@tursulin-desk
2023-10-18drm/i915: No TLB invalidation on wedged GTJonathan Cavitt
It is not an error for GuC TLB invalidations to fail when the GT is wedged or disabled, so do not process a wait failure as one in guc_send_invalidate_tlb. Signed-off-by: Fei Yang <fei.yang@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> CC: John Harrison <john.c.harrison@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231017180806.3054290-6-jonathan.cavitt@intel.com
2023-10-18drm/i915: No TLB invalidation on suspended GTJonathan Cavitt
In case of GT is suspended, don't allow submission of new TLB invalidation request and cancel all pending requests. The TLB entries will be invalidated either during GuC reload or on system resume. Signed-off-by: Fei Yang <fei.yang@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> CC: John Harrison <john.c.harrison@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231017180806.3054290-5-jonathan.cavitt@intel.com
2023-10-18drm/i915: Define and use GuC and CTB TLB invalidation routinesPrathap Kumar Valsan
The GuC firmware had defined the interface for Translation Look-Aside Buffer (TLB) invalidation. We should use this interface when invalidating the engine and GuC TLBs. Add additional functionality to intel_gt_invalidate_tlb, invalidating the GuC TLBs and falling back to GT invalidation when the GuC is disabled. The invalidation is done by sending a request directly to the GuC tlb_lookup that invalidates the table. The invalidation is submitted as a wait request and is performed in the CT event handler. This means we cannot perform this TLB invalidation path if the CT is not enabled. If the request isn't fulfilled in two seconds, this would constitute an error in the invalidation as that would constitute either a lost request or a severe GuC overload. With this new invalidation routine, we can perform GuC-based GGTT invalidations. GuC-based GGTT invalidation is incompatible with MMIO invalidation so we should not perform MMIO invalidation when GuC-based GGTT invalidation is expected. The additional complexity incurred in this patch will be necessary for range-based tlb invalidations, which will be platformed in the future. Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Signed-off-by: Fei Yang <fei.yang@intel.com> CC: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231017180806.3054290-4-jonathan.cavitt@intel.com
2023-10-18drm/i915/guc: Add CT size delay helperJonathan Cavitt
As of now, there is no mechanism for tracking a given request's progress through the queue. Instead, add a helper that returns an estimated maximum time the queue should take to drain if completely full. Suggested-by: John Harrison <john.c.harrison@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: John Harrison <john.c.harrison@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231017180806.3054290-3-jonathan.cavitt@intel.com
2023-10-17drm/i915/guc: Clean up zero initializersVille Syrjälä
Just use a simple {} to zero initialize arrays/structs instead of the hodgepodge of stuff we are using currently. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231012122442.15718-4-ville.syrjala@linux.intel.com Reviewed-by: Jani Nikula <jani.nikula@intel.com>
2023-10-17Merge tag 'drm-intel-gt-next-2023-10-12' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next Driver Changes: Fixes/improvements/new stuff: - Register engines early to avoid type confusion (Mathias Krause) - Suppress 'ignoring reset notification' message [guc] (John Harrison) - Update 'recommended' version to 70.12.1 for DG2/ADL-S/ADL-P/MTL [guc] (John Harrison) - Enable WA 14018913170 [guc, dg2] (Daniele Ceraolo Spurio) Future platform enablement: - Clean steer semaphore on resume (Nirmoy Das) - Skip MCR ops for ring fault register [mtl] (Nirmoy Das) - Make i915_gem_shrinker multi-gt aware [gem] (Jonathan Cavitt) - Enable GGTT updates with binder in MTL (Nirmoy Das, Chris Wilson) - Invalidate the TLBs on each GT (Chris Wilson) Miscellaneous: - Clarify type evolution of uabi_node/uabi_engines (Mathias Krause) - Annotate struct ct_incoming_msg with __counted_by [guc] (Kees Cook) - More use of GT specific print helpers [gt] (John Harrison) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZSfKotZVdypU6NaX@tursulin-desk
2023-10-10drm/i915: More use of GT specific print helpersJohn Harrison
Update a bunch of GT related print messages in non-GT files to use the GT specific helpers. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231009183802.673882-3-John.C.Harrison@Intel.com
2023-10-09drm/i915/guc: Enable WA 14018913170Daniele Ceraolo Spurio
The GuC handles the WA, the KMD just needs to set the flag to enable it on the appropriate platforms. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231006013553.1339418-1-John.C.Harrison@Intel.com
2023-10-09drm/i915/guc: Annotate struct ct_incoming_msg with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct ct_incoming_msg. Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: intel-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-hardening@vger.kernel.org Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231006201744.work.135-kees@kernel.org
2023-10-07drm/i915/guc: Update 'recommended' version to 70.12.1 for DG2/ADL-S/ADL-P/MTLJohn Harrison
The latest GuC has new features and new workarounds that we wish to enable. So let the universe know that it is useful to update their firmware. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231006145801.161868-1-John.C.Harrison@Intel.com
2023-10-03drm/i915/guc: Suppress 'ignoring reset notification' messageJohn Harrison
If an active context has been banned (e.g. Ctrl+C killed) then it is likely to be reset as part of evicting it from the hardware. That results in a 'ignoring context reset notification: banned = 1' message at info level. This confuses/concerns people and makes them think something has gone wrong when it hasn't. There is already a debug level message with essentially the same information. So drop the 'ignore' info level one and just add the 'ignore' flag to the debug level one instead (which will therefore not appear by default but will still show up in CI runs). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230921182033.135448-1-John.C.Harrison@Intel.com
2023-10-03Merge tag 'drm-intel-gt-next-2023-09-28' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next Driver Changes: Fixes/improvements/new stuff: - Fix TLB-Invalidation seqno store [mtl] (Alan Previn) - Force a reset on internal GuC error [guc] (John Harrison) - Define GSC fw [gsc] (Daniele Ceraolo Spurio) - Update workaround 14016712196 [dg2/mtl] (Tejas Upadhyay) - Mark requests for GuC virtual engines to avoid use-after-free (Andrzej Hajda) - Add Wa_14015150844 [dg2/mtl] (Shekhar Chauhan) - Prevent error pointer dereference (Dan Carpenter) - Add Wa_18022495364 [tgl,adl,rpl] (Dnyaneshwar Bhadane) - Fix GuC PMU by moving execlist stats initialization to execlist specific setup (Umesh Nerlige Ramappa) - Fix PXP firmware load [pxp/mtl] (Alan Previn) - Fix execution/context state of PXP contexts (Alan Previn) - Limit the length of an sg list to the requested length (Matthew Wilcox) - Fix reservation address in ggtt_reserve_guc_top [guc] (Javier Pello) - Add Wa_18028616096 [dg2] (Shekhar Chauhan) - Get runtime pm in busyness worker only if already active [guc/pmu] (Umesh Nerlige Ramappa) - Don't set PIPE_CONTROL_FLUSH_L3 for aux inval (Nirmoy Das) Future platform enablement: - Fix and consolidate some workaround checks, make others IP version based [mtl] (Matt Roper) - Replace Meteorlake subplatforms with IP version checks (Matt Roper) - Adding DeviceID for Arrowlake-S under MTL [mtl] (Nemesa Garg) - Run relevant bits of debugfs drop_caches per GT (Tvrtko Ursulin) Miscellaneous: - Remove Wa_15010599737 [dg2] (Shekhar Chauhan) - Align igt_spinner_create_request with hangcheck [selftests] (Jonathan Cavitt) - Remove pre-production workarounds [dg2] (Matt Roper) - Tidy some workaround definitions (Matt Roper) - Wait longer for tasks in migrate selftest [gt] (Jonathan Cavitt) - Skip WA verification for GEN7_MISCCPCTL on DG2 [gt] (Andrzej Hajda) - Silence injected failure in the load via GSC path [huc] (Daniele Ceraolo Spurio) - Refactor deprecated strncpy (Justin Stitt) - Update RC6 mask for mtl_drpc [debugfs/mtl] (Badal Nilawar) - Remove a static inline that requires including i915_drv.h [gt] (Jani Nikula) - Remove inlines from i915_gem_execbuffer.c [gem] (Jani Nikula) - Remove gtt_offset from stream->oa_buffer.head/.tail [perf] (Ashutosh Dixit) - Do not disable preemption for resets (Tvrtko Ursulin) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZRVzL02VFuwIkcGl@tursulin-desk
2023-09-26i915/guc: Get runtime pm in busyness worker only if already activeUmesh Nerlige Ramappa
Ideally the busyness worker should take a gt pm wakeref because the worker only needs to be active while gt is awake. However, the gt_park path cancels the worker synchronously and this complicates the flow if the worker is also running at the same time. The cancel waits for the worker and when the worker releases the wakeref, that would call gt_park and would lead to a deadlock. The resolution is to take the global pm wakeref if runtime pm is already active. If not, we don't need to update the busyness stats as the stats would already be updated when the gt was parked. Note: - We do not requeue the worker if we cannot take a reference to runtime pm since intel_guc_busyness_unpark would requeue the worker in the resume path. - If the gt was parked longer than time taken for GT timestamp to roll over, we ignore those rollovers since we don't care about tracking the exact GT time. We only care about roll overs when the gt is active and running workloads. - There is a window of time between gt_park and runtime suspend, where the worker may run. This is acceptable since the worker will not find any new data to update busyness. v2: (Daniele) - Edit commit message and code comment - Use runtime pm in the worker - Put runtime pm after enabling the worker - Use Link tag and add Fixes tag v3: (Daniele) - Reword commit and comments and add details Link: https://gitlab.freedesktop.org/drm/intel/-/issues/7077 Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230925192117.2497058-1-umesh.nerlige.ramappa@intel.com
2023-09-19drm/i915/pxp/mtl: Update pxp-firmware response timeoutAlan Previn
Update the max GSC-fw response time to match updated internal fw specs. Because this response time is an SLA on the firmware, not inclusive of i915->GuC->HW handoff latency, when submitting requests to the GSC fw via intel_gsc_uc_heci_cmd_submit helpers, start the count after the request hits the GSC command streamer. Also, move GSC_REPLY_LATENCY_MS definition from pxp header to intel_gsc_uc_heci_cmd_submit.h since its for any GSC HECI packet. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Vivaik Balasubrawmanian <vivaik.balasubrawmanian@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230917211933.1407559-2-alan.previn.teres.alexis@intel.com
2023-08-30drm/i915: mark requests for GuC virtual engines to avoid use-after-freeAndrzej Hajda
References to i915_requests may be trapped by userspace inside a sync_file or dmabuf (dma-resv) and held indefinitely across different proceses. To counter-act the memory leaks, we try to not to keep references from the request past their completion. On the other side on fence release we need to know if rq->engine is valid and points to hw engine (true for non-virtual requests). To make it possible extra bit has been added to rq->execution_mask, for marking virtual engines. Fixes: bcb9aa45d5a0 ("Revert "drm/i915: Hold reference to intel_context over life of i915_request"") Signed-off-by: Chris Wilson <chris.p.wilson@linux.intel.com> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230821153035.3903006-1-andrzej.hajda@intel.com (cherry picked from commit 280410677af763f3871b93e794a199cfcf6fb580) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-08-30drm/i915: mark requests for GuC virtual engines to avoid use-after-freeAndrzej Hajda
References to i915_requests may be trapped by userspace inside a sync_file or dmabuf (dma-resv) and held indefinitely across different proceses. To counter-act the memory leaks, we try to not to keep references from the request past their completion. On the other side on fence release we need to know if rq->engine is valid and points to hw engine (true for non-virtual requests). To make it possible extra bit has been added to rq->execution_mask, for marking virtual engines. Fixes: bcb9aa45d5a0 ("Revert "drm/i915: Hold reference to intel_context over life of i915_request"") Signed-off-by: Chris Wilson <chris.p.wilson@linux.intel.com> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230821153035.3903006-1-andrzej.hajda@intel.com
2023-08-28drm/i915/gsc: define gsc fwDaniele Ceraolo Spurio
Add FW definition and the matching override modparam. The GSC FW has both a release version, based on platform and a rolling counter, and a compatibility version, which is the one tracking interface changes. Since what we care about is the interface, we use the compatibility version in the binary names. Same as with the GuC, a major version bump indicate a backward-incompatible change, while a minor version bump indicates a backward-compatible one, so we use only the former in the file name. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230825162754.1949838-1-daniele.ceraolospurio@intel.com
2023-08-22drm/i915/guc: Force a reset on internal GuC errorJohn Harrison
If GuC hits an internal error (and survives long enough to report it to the KMD), it is basically toast and will stop until a GT reset and subsequent GuC reload is performed. Previously, the KMD just printed an error message and then waited for the heartbeat to eventually kick in and trigger a reset (assuming the heartbeat had not been disabled). Instead, force the reset immediately to guarantee that it happens and to eliminate the very long heartbeat delay. The captured error state is also more likely to be useful if captured at the time of the error rather than many seconds later. Note that it is not possible to trigger a reset from with the G2H handler itself. The reset prepare process involves flushing outstanding G2H contents. So a deadlock could result. Instead, the G2H handler queues a worker thread to do the reset asynchronously. v2: Flush the worker on suspend and shutdown. Add rate limiting to prevent spam from a totally dead system (review feedback from Daniele). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230816003957.3572654-1-John.C.Harrison@Intel.com
2023-08-21drm/i915: Eliminate IS_MTL_GRAPHICS_STEPMatt Roper
Several workarounds are guarded by IS_MTL_GRAPHICS_STEP. However none of these workarounds are actually tied to MTL as a platform; they only relate to the Xe_LPG graphics IP, regardless of what platform it appears in. At the moment MTL is the only platform that uses Xe_LPG with IP versions 12.70 and 12.71, but we can't count on this being true in the future. Switch these to use a new IS_GFX_GT_IP_STEP() macro instead that is purely based on IP version. IS_GFX_GT_IP_STEP() is also GT-based rather than device-based, which will help prevent mistakes where we accidentally try to apply Xe_LPG graphics workarounds to the Xe_LPM+ media GT and vice-versa. v2: - Switch to a more generic and shorter IS_GT_IP_STEP macro that can be used for both graphics and media IP (and any other kind of GTs that show up in the future). v3: - Switch back to long-form IS_GFX_GT_IP_STEP macro. (Jani) - Move macro to intel_gt.h. (Andi) v4: - Build IS_GFX_GT_IP_STEP on top of IS_GFX_GT_IP_RANGE and IS_GRAPHICS_STEP building blocks and name the parameters from/until rather than begin/fixed. (Jani) - Fix usage examples in comment. v5: - Tweak comment on macro. (Gustavo) Cc: Gustavo Sousa <gustavo.sousa@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230821180619.650007-15-matthew.d.roper@intel.com
2023-08-21drm/i915: Consolidate condition for Wa_22011802037Matt Roper
The workaround bounds for Wa_22011802037 are somewhat complex and are replicated in several places throughout the code. Pull the condition out to a helper function to prevent mistakes if this condition needs to change again in the future. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230821180619.650007-12-matthew.d.roper@intel.com
2023-08-17drm/i915/dg2: Drop Wa_16011777198Matt Roper
Wa_16011777198 only applies to pre-production steppings of DG2, which we're no longer supporting. Remove the workaround and override_gucrc handling, which is no longer needed. Since this was the final use of IS_DG2_GRAPHICS_STEP, that macro can also be removed now. v2: - Include the promised removal of override_gucrc handling. Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230816214824.548575-2-matthew.d.roper@intel.com
2023-08-17drm/i915/dg2: Drop pre-production GT workaroundsMatt Roper
DG2 first production steppings were C0 (for DG2-G10), B1 (for DG2-G11), and A1 (for DG2-G12). Several workarounds that apply onto to pre-production hardware can be dropped. Furthermore, several workarounds that apply to all production steppings can have their conditions simplified to no longer check the GT stepping. v2: - Keep Wa_16011777198 in place for now; it will be removed separately in a follow-up patch to keep review easier. Bspec: 44477 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230816214201.534095-10-matthew.d.roper@intel.com
2023-08-15Merge tag 'drm-intel-gt-next-2023-08-11' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next Cross-subsystem Changes: - Backmerge of drm-next Driver Changes: - Apply workaround 22016122933 correctly (Jonathan, Matt R) - Simplify shmem_create_from_object map_type selection (Jonathan, Tvrtko) - Make i915_coherent_map_type GT-centric (Jonathan, Matt R) - Selftest improvements (John) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZNYR3bKFquGc7u9w@jlahtine-mobl.ger.corp.intel.com
2023-08-10drm/i915/guc: Fix potential null pointer deref in GuC 'steal id' testJohn Harrison
It was noticed that if the very first 'stealing' request failed to create for some reason then the 'steal all ids' loop would immediately exit with 'last' still being NULL. The test would attempt to continue but using a null pointer. Fix that by aborting the test if it fails to create any requests at all. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230802184940.911753-1-John.C.Harrison@Intel.com
2023-08-10drm/i915/gt: Apply workaround 22016122933 correctlyJonathan Cavitt
WA_22016122933 was recently applied to all MeteorLake engines, which is simultaneously too broad (should only apply to Media engines) and too specific (should apply to all platforms that use the same media engine as MeteorLake). Correct this in cases where coherency settings are modified. There were also two additional places where the workaround was applied unconditionally. The change was confirmed as necessary for all platforms, so the workaround label was removed. Suggested-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Acked-by: Fei Yang <fei.yang@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230801153242.2445478-4-jonathan.cavitt@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230807121957.598420-4-andi.shyti@linux.intel.com
2023-08-10drm/i915: Make i915_coherent_map_type GT-centricJonathan Cavitt
Refactor i915_coherent_map_type to be GT-centric rather than device-centric. Each GT may require different coherency handling due to hardware workarounds. Since the function now takes a GT instead of the i915, the function is renamed and moved to the gt folder. Suggested-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Acked-by: Fei Yang <fei.yang@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230801153242.2445478-3-jonathan.cavitt@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230807121957.598420-3-andi.shyti@linux.intel.com
2023-08-07drm/i915/adls: s/ADLS_RPLS/RAPTORLAKE_S in platform and subplatform definesDnyaneshwar Bhadane
Driver refers to the platform Alderlake S as ADLS_RPLS in places and RAPTORLAKE_S in some. v2: - Unrolled wrapper IS_ADLS_GRAPHICS_STEP v3: - Replace IS_RAPTORLAKE_S instead of IS_ADLS_RPLS. (Tvrtko/Lucas). - Remove unused macro IS_ADLS_GRAPHICS/DISPLAY_STEP - Change the subject Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Anusha Srivatsa <anusha.srivatsa@intel.com> Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Reviewed-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230801135344.3797924-15-dnyaneshwar.bhadane@intel.com
2023-08-07drm/i915/adln: s/ADLP/ALDERLAKE_P in ADLN definesAnusha Srivatsa
Follow consistent naming convention. Replace ADLP with ALDERLAKE_P Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230801135344.3797924-14-dnyaneshwar.bhadane@intel.com
2023-08-07Merge tag 'drm-intel-gt-next-2023-08-04' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next Driver Changes: - Avoid infinite GPU waits by avoidin premature release of request's reusable memory (Chris, Janusz) - Expose RPS thresholds in sysfs (Tvrtko) - Apply GuC SLPC min frequency softlimit correctly (Vinay) - Restore SLPC efficient freq earlier (Vinay) - Consider OA buffer boundary when zeroing out reports (Umesh) - Extend Wa_14015795083 to TGL, RKL, DG1 and ADL (Matt R) - Fix context workarounds with non-masked regs on MTL/DG2 (Lucas) - Enable the CCS_FLUSH bit in the pipe control and in the CS for MTL+ (Andi) - Update MTL workarounds 14018778641, 22016122933 (Tejas, Zhanjun) - Ensure memory quiesced before AUX CCS invalidation (Jonathan) - Add a gsc_info debugfs (Daniele) - Invalidate the TLBs on each GT on multi-GT device (Chris) - Fix a VMA UAF for multi-gt platform (Nirmoy) - Do not use stolen on MTL due to HW bug (Nirmoy) - Check HuC and GuC version compatibility on MTL (Daniele) - Dump perf_limit_reasons for slow GuC init debug (Vinay) - Replace kmap() with kmap_local_page() (Sumitra, Ira) - Add sentinel to xehp_oa_b_counters for KASAN (Andrzej) - Add the gen12_needs_ccs_aux_inv helper (Andi) - Fixes and updates for GSC memory allocation (Daniele) - Fix one wrong caching mode enum usage (Tvrtko) - Fixes for GSC wakeref (Alan) - Static checker fixes (Harshit, Arnd, Dan, Cristophe, David, Andi) - Rename flags with bit_group_X according to the datasheet (Andi) - Use direct alias for i915 in requests (Andrzej) - Replace i915->gt0 with to_gt(i915) (Andi) - Use the i915_vma_flush_writes helper (Tvrtko) - Selftest improvements (Alan) - Remove dead code (Tvrtko) Signed-off-by: Dave Airlie <airlied@redhat.com> # Conflicts: # drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZMy6kDd9npweR4uy@jlahtine-mobl.ger.corp.intel.com
2023-08-07Merge tag 'drm-intel-next-2023-08-03' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next - Removing unused declarations (Arnd, Gustavo) - ICL+ DSI modeset sequence fixes (Ville) - Improvements on HDCP (Suraj) - Fixes and clean up on MTL Display (Mika Kahola, Lee, RK, Nirmoy, Chaitanya) - Restore HSW/BDW PSR1 (Ville) - Other PSR Fixes (Jouni) - Fixes around DC states and other Display Power (Imre) - Init DDI ports in VBT order (Ville) - General documentation fixes (Jani) - General refactor for better organization (Jani) - Bigjoiner fix (Stanislav) - VDSC Fixes and improvements (Stanialav, Suraj) - Hotplug fixes and improvements (Simon, Suraj) - Start using plane scale factor for relative data rate (Stanislav) - Use shmem for dpt objects (RK) - Simplify expression &to_i915(dev)->drm (Uwe) - Do not access i915_gem_object members from frontbuffer tracking (Jouni) - Fix uncore race around i915->params.mmio_debug (Jani) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZMv4RCzGyCmG/BDe@intel.com
2023-08-02drm/i915/guc/slpc: Restore efficient freq earlierVinay Belgaumkar
This should be done before the soft min/max frequencies are restored. When we disable the "Ignore efficient frequency" flag, GuC does not actually bring the requested freq down to RPn. Specifically, this scenario- - ignore efficient freq set to true - reduce min to RPn (from efficient) - suspend - resume (includes GuC load, restore soft min/max, restore efficient freq) - validate min freq has been resored to RPn This will fail if we didn't first restore(disable, in this case) efficient freq flag before setting the soft min frequency. v2: Bring the min freq down to RPn when we disable efficient freq (Rodrigo) Also made the change to set the min softlimit to RPn at init. Otherwise, we were storing RPe there. Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8736 Fixes: 55f9720dbf23 ("drm/i915/guc/slpc: Provide sysfs for efficient freq") Fixes: 95ccf312a1e4 ("drm/i915/guc/slpc: Allow SLPC to use efficient frequency") Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230726010044.3280402-1-vinay.belgaumkar@intel.com
2023-07-31drm/i915/huc: fix intel_huc.c doc bulleted list format errorDavid Reaver
Fix the following make htmldocs errors/warnings: ./drivers/gpu/drm/i915/gt/uc/intel_huc.c:29: ERROR: Unexpected indentation. ./drivers/gpu/drm/i915/gt/uc/intel_huc.c:30: WARNING: Block quote ends without a blank line; unexpected unindent. ./drivers/gpu/drm/i915/gt/uc/intel_huc.c:35: WARNING: Bullet list ends without a blank line; unexpected unindent. This output is a bit misleading. The real issue here is we need a blank line before and after the bulleted list. Link: https://www.kernel.org/doc/html/latest/gpu/i915.html#huc Link: https://lore.kernel.org/dri-devel/20230530152958.1384061-1-daniele.ceraolospurio@intel.com/ Signed-off-by: David Reaver <me@davidreaver.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230727025400.372965-1-me@davidreaver.com
2023-07-27drm/i915/selftest/gsc: Ensure GSC Proxy init completes before selftestsAlan Previn
On MTL, if the GSC Proxy init flows haven't completed, submissions to the GSC engine will fail. Those init flows are dependent on the mei's gsc_proxy component that is loaded in parallel with i915 and a worker that could potentially start after i915 driver init is done. That said, all subsytems that access the GSC engine today does check for such init flow completion before using the GSC engine. However, selftests currently don't wait on anything before starting. To fix this, add a waiter function at the start of __run_selftests that waits for gsc-proxy init flows to complete. Selftests shouldn't care if the proxy-init failed as that should be flagged elsewhere. Difference from prior versions: v7: - Change the fw status to INTEL_UC_FIRMWARE_LOAD_FAIL if the proxy-init fails so that intel_gsc_uc_fw_proxy_get_status catches it. (Daniele) v6: - Add a helper that returns something more than a boolean so we selftest can stop waiting if proxy-init hadn't completed but failed (Daniele). v5: - Move the call to __wait_gsc_proxy_completed from common __run_selftests dispatcher to the group-level selftest function (Trvtko). - change the pr_info to pr_warn if we hit the timeout. v4: - Remove generalized waiters function table framework (Tvrtko). - Remove mention of CI-framework-timeout from comments (Tvrtko). v3: - Rebase to latest drm-tip. v2: - Based on internal testing, increase the timeout for gsc-proxy specific case to 8 seconds. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230720230126.375566-1-alan.previn.teres.alexis@intel.com
2023-07-20drm/i915/huc: check HuC and GuC version compatibility on MTLDaniele Ceraolo Spurio
Due to a change in the auth flow on MTL, GuC 70.7.0 and newer will only be able to authenticate HuC 8.5.1 and newer. The plan is to update the 2 binaries synchronously in linux-firmware so that the fw repo always has a matching pair that works; still, it's better to check in the kernel so we can print an error message and abort HuC loading if the binaries are out of sync instead of failing the authentication. v2: Add clarification comment, fix typo in commit msg, clean up variable declaration (John) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> #v1 Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230717234905.117114-1-daniele.ceraolospurio@intel.com
2023-07-06drm/i915/guc: Dump perf_limit_reasons for debugVinay Belgaumkar
GuC load takes longer sometimes due to GT frequency not ramping up. Add perf_limit_reasons to the existing warn print to see if frequency is being throttled. v2: Review comments (Ashutosh) Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230627191336.319381-1-vinay.belgaumkar@intel.com