summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/gt/intel_reset.c
AgeCommit message (Collapse)Author
2022-05-12i915/guc/reset: Make __guc_reset_context aware of guilty enginesUmesh Nerlige Ramappa
There are 2 ways an engine can get reset in i915 and the method of reset affects how KMD labels a context as guilty/innocent. (1) GuC initiated engine-reset: GuC resets a hung engine and notifies KMD. The context that hung on the engine is marked guilty and all other contexts are innocent. The innocent contexts are resubmitted. (2) GT based reset: When an engine heartbeat fails to tick, KMD initiates a gt/chip reset. All active contexts are marked as guilty and discarded. In order to correctly mark the contexts as guilty/innocent, pass a mask of engines that were reset to __guc_reset_context. Fixes: eb5e7da736f3 ("drm/i915/guc: Reset implementation for new GuC interface") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220426003045.3929439-1-umesh.nerlige.ramappa@intel.com
2022-05-06drm/i915: Drop has_reset_engine from device infoJosé Roberto de Souza
No need to have this parameter in intel_device_info struct as all platforms with graphics version 7 or newer can reset engines. As a side effect of the of removal this flag, it will not be printed in dmesg during driver load anymore and developers will have to rely on to check the macro and compare with platform being used and IP versions of it. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220505193524.276400-3-jose.souza@intel.com
2022-04-21Merge drm/drm-next into drm-intel-gt-nextRodrigo Vivi
In order to get the GSC Support merged on drm-intel-gt-next in a clean fashion we needed this ATS-M patch to avoid conflict in i915_pci.c: commit 412c942bdfae ("drm/i915/ats-m: add ATS-M platform info") -- Fixing a silent conflict on drivers/gpu/drm/i915/gt/intel_gt_gmch.c: - if (!intel_vtd_active(i915)) + if (!i915_vtd_active(i915)) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2022-04-19drm/i915/guc: Enable Wa_22011802037 for gen12 GuC based platformsUmesh Nerlige Ramappa
Initiating a reset when the command streamer is not idle or in the middle of executing an MI_FORCE_WAKE can result in a hang. Multiple command streamers can be part of a single reset domain, so resetting one would mean resetting all command streamers in that domain. To workaround this, before initiating a reset, ensure that all command streamers within that reset domain are either IDLE or are not executing a MI_FORCE_WAKE. Enable GuC PRE_PARSER WA bit so that GuC follows the WA sequence when initiating engine-resets. For gt-resets, ensure that i915 applies the WA sequence. Opens to address in future patches: - The part of the WA to wait for pending forcewakes is also applicable to execlists backend. - The WA also needs to be applied for gen11 Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220415224025.3693037-3-umesh.nerlige.ramappa@intel.com
2022-03-22drm/i915/guc: Plumb GuC-capture into gpu_coredumpAlan Previn
Add a flags parameter through all of the coredump creation functions. Add a bitmask flag to indicate if the top level gpu_coredump event is triggered in response to a GuC context reset notification. Using that flag, ensure all coredump functions that read or print mmio-register values related to work submission or command-streamer engines are skipped and replaced with a calls guc-capture module equivalent functions to retrieve or print the register dump. While here, split out display related register reading and printing into its own function that is called agnostic to whether GuC had triggered the reset. For now, introduce an empty printing function that can filled in on a subsequent patch just to handle formatting. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-13-alan.previn.teres.alexis@intel.com
2022-03-02drm/i915: Use str_yes_no()Lucas De Marchi
Remove the local yesno() implementation and adopt the str_yes_no() from linux/string_helpers.h. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220225234631.3725943-1-lucas.demarchi@intel.com
2022-02-23Merge tag 'drm-intel-gt-next-2022-02-17' of ↵Rodrigo Vivi
git://anongit.freedesktop.org/drm/drm-intel into drm-intel-next UAPI Changes: - Weak parallel submission support for execlists Minimal implementation of the parallel submission support for execlists backend that was previously only implemented for GuC. Support one sibling non-virtual engine. Core Changes: - Two backmerges of drm/drm-next for header file renames/changes and i915_regs reorganization Driver Changes: - Add new DG2 subplatform: DG2-G12 (Matt R) - Add new DG2 workarounds (Matt R, Ram, Bruce) - Handle pre-programmed WOPCM registers for DG2+ (Daniele) - Update guc shim control programming on XeHP SDV+ (Daniele) - Add RPL-S C0/D0 stepping information (Anusha) - Improve GuC ADS initialization to work on ARM64 on dGFX (Lucas) - Fix KMD and GuC race on accessing PMU busyness (Umesh) - Use PM timestamp instead of RING TIMESTAMP for reference in PMU with GuC (Umesh) - Report error on invalid reset notification from GuC (John) - Avoid WARN splat by holding RPM wakelock during PXP unbind (Juston) - Fixes to parallel submission implementation (Matt B.) - Improve GuC loading status check/error reports (John) - Tweak TTM LRU priority hint selection (Matt A.) - Align the plane_vma to min_page_size of stolen mem (Ram) - Introduce vma resources and implement async unbinding (Thomas) - Use struct vma_resource instead of struct vma_snapshot (Thomas) - Return some TTM accel move errors instead of trying memcpy move (Thomas) - Fix a race between vma / object destruction and unbinding (Thomas) - Remove short-term pins from execbuf (Maarten) - Update to GuC version 69.0.3 (John, Michal Wa.) - Improvements to GT reset paths in GuC backend (Matt B.) - Use shrinker_release_pages instead of writeback in shmem object hooks (Matt A., Tvrtko) - Use trylock instead of blocking lock when freeing GEM objects (Maarten) - Allocate intel_engine_coredump_alloc with ALLOW_FAIL (Matt B.) - Fixes to object unmapping and purging (Matt A) - Check for wedged device in GuC backend (John) - Avoid lockdep splat by locking dpt_obj around set_cache_level (Maarten) - Allow dead vm to unbind vma's without lock (Maarten) - s/engine->i915/i915/ for DG2 engine workarounds (Matt R) - Use to_gt() helper for GGTT accesses (Michal Wi.) - Selftest improvements (Matt B., Thomas, Ram) - Coding style and compiler warning fixes (Matt B., Jasmine, Andi, Colin, Gustavo, Dan) From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Yg4i2aCZvvee5Eai@jlahtine-mobl.ger.corp.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Fixed conflicts while applying, using the fixups/drm-intel-gt-next.patch from drm-rerere's 1f2b1742abdd ("2022y-02m-23d-16h-07m-57s UTC: drm-tip rerere cache update")]
2022-02-16drm/i915: Move MCHBAR registers to their own headerMatt Roper
Registers that exist within the MCH BAR and are mirrored into the GPU's MMIO space are a good candidate to separate out into their own header. For reference, the mirror of the MCH BAR starts at the following locations in the graphics MMIO space (the end of the MCHBAR range differs slightly on each platform): * Pre-gen6: 0x10000 * Gen6-Gen11 + RKL: 0x140000 v2: - Create separate patch to swtich a few register definitions to be relative to the MCHBAR mirror base. - Drop upper bound of MCHBAR mirror from commit message; there are too many different combinations between various platforms to list out, and the documentation is spotty for the older pre-gen6 platforms anyway. Bspec: 134, 51771 Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220215061342.2055952-2-matthew.d.roper@intel.com
2022-02-16drm/i915/gt: Move SFC lock bits to intel_engine_regs.hMatt Roper
These SFC registers were defined in an unusual way, taking an engine as a parameter rather than an engine MMIO base offset. Let's adjust them to match the style used by other per-engine registers and move them to intel_engine_regs.h. While doing this move, we can drop GEN12_HCP_SFC_FORCED_LOCK completely; it was intended for use in an early version of a hardware workaround, but was no longer necessary by the time the workaround was finalized. It is not used anywhere in the driver. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220209051140.1599643-3-matthew.d.roper@intel.com
2022-02-14drm/i915: split out i915_file_private.h from i915_drv.hJani Nikula
Limit the scope of struct drm_i915_file_private to the files that actually need it. Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/e375859dc1729a1b988036e4103e5b1bd48caa00.1644507885.git.jani.nikula@intel.com
2022-02-11drm/i915/dg2: Add Wa_22011100796Bruce Chang
Whenever Full soft reset is required, reset all individual engines first, and then do a full soft reset. Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com> cc: Matt Roper <matthew.d.roper@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220128185209.18077-5-ramalingam.c@intel.com
2022-02-02drm/i915: Move GT registers to their own header fileMatt Roper
This is a huge, chaotic mass of registers copied over as-is without any real cleanup. We'll come back and organize these better, align on consistent coding style, remove dead code, etc. in separate patches later that will be easier to review. v2: - Add missing include in intel_pxp_irq.c v3: - Correct a few indentation errors (Lucas) - Minor conflict resolution Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220127234334.4016964-6-matthew.d.roper@intel.com
2022-01-31Merge drm/drm-next into drm-intel-nextRodrigo Vivi
Catch-up with 5.17-rc2 and trying to align with drm-intel-gt-next for a possible topic branch for merging the split of i915_regs... Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2022-01-11drm/i915/gt: Move engine registers to their own headerMatt Roper
Let's continue breaking up and cleaning up the massive i915_reg.h file by moving all registers that are defined in relation to an engine base to their own header. There are probably a bunch of other "engine registers" that we haven't moved yet (especially those that belong to the render engine in the 0x2??? range), but this is a relatively straightforward first step. Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220111051600.3429104-8-matthew.d.roper@intel.com
2022-01-10drm/i915: split out PCI config space registers from i915_reg.hJani Nikula
The PCI config space registers don't really belong next to the MMIO register definitions. v2: Fix copyright year (Matt) Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220110095740.166078-1-jani.nikula@intel.com
2021-12-17Merge tag 'drm-intel-next-2021-12-14' of ↵Dave Airlie
ssh://git.freedesktop.org/git/drm/drm-intel into drm-next drm/i915 feature pull #2 for v5.17: Features and functionality: - Add eDP privacy screen support (Hans) - Add Raptor Lake S (RPL-S) support (Anusha) - Add CD clock squashing support (Mika) - Properly support ADL-P without force probe (Clint) - Enable pipe color support (10 bit gamma) for display 13 platforms (Uma) - Update ADL-P DMC firmware to v2.14 (Madhumitha) Refactoring and cleanups: - More FBC refactoring preparing for multiple FBC instances (Ville) - Plane register cleanups (Ville) - Header refactoring and include cleanups (Jani) - Crtc helper and vblank wait function cleanups (Jani, Ville) - Move pipe/transcoder/abox masks under intel_device_info.display (Ville) Fixes: - Add a delay to let eDP source OUI write take effect (Lyude) - Use div32 version of MPLLB word clock for UHBR on SNPS PHY (Jani) - Fix DMC firmware loader overflow check (Harshit Mogalapalli) - Fully disable FBC on FIFO underruns (Ville) - Disable FBC with double wide pipe as mutually exclusive (Ville) - DG2 workarounds (Matt) - Non-x86 build fixes (Siva) - Fix HDR plane max width for NV12 (Vidya) - Disable IRQ for selftest timestamp calculation (Anshuman) - ADL-P VBT DDC pin mapping fix (Tejas) Merges: - Backmerge drm-next for privacy screen plumbing (Jani) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87ee6f5h9u.fsf@intel.com
2021-12-13drm/i915/reset: include intel_display.h instead of intel_display_types.hJani Nikula
Use the more specific include that's needed. v2: Include intel_display.h (Ville) Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/e1e195b8ca1252d1e149c8185d107a5517496973.1639142167.git.jani.nikula@intel.com
2021-12-08drm/i915/gt: Use hw_engine_masks as reset_domainsTejas Upadhyay
We need a way to reset engines by their reset domains. This change sets up way to fetch reset domains of each engine globally. Changes since V1: - Use static reset domain array - Ville and Tvrtko - Use BUG_ON at appropriate place - Tvrtko Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211206081026.4024401-1-tejaskumarx.surendrakumar.upadhyay@intel.com
2021-11-16drm/i915: Skip error capture when wedged on initTvrtko Ursulin
Trying to capture uninitialised engines when we wedged on init ends in tears. Skip that together with uC capture, since failure to initialise the latter can actually be one of the reasons for wedging on init. v2: * Use i915_disable_error_state when wedging on init/fini. v3: * Handle mock tests. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> # v1 Link: https://patchwork.freedesktop.org/patch/msgid/20211111130634.266098-1-tvrtko.ursulin@linux.intel.com
2021-11-09drm/i915/resets: Don't set / test for per-engine reset bits with GuC submissionMatthew Brost
Don't set, test for, or clear per-engine reset bits with GuC submission as the GuC owns the per engine resets not the i915. Setting, testing for, and clearing these bits is causing issues with the hangcheck selftest. Rather than change to test to not use these bits, rip the use of these bits out from the reset code. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211028224224.32693-1-matthew.brost@intel.com
2021-07-27drm/i915/guc: Implement banned contexts for GuC submissionMatthew Brost
When using GuC submission, if a context gets banned disable scheduling and mark all inflight requests as complete. Cc: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-25-matthew.brost@intel.com
2021-07-27drm/i915/guc: Fix for error capture after full GPU reset with GuCJohn Harrison
In the case of a full GPU reset (e.g. because GuC has died or because GuC's hang detection has been disabled), the driver can't rely on GuC reporting the guilty context. Instead, the driver needs to scan all active contexts and find one that is currently executing, as per the execlist mode behaviour. In GuC mode, this scan is different to execlist mode as the active request list is handled very differently. Similarly, the request state dump in debugfs needs to be handled differently when in GuC submission mode. Also refactured some of the request scanning code to avoid duplication across the multiple code paths that are now replicating it. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-20-matthew.brost@intel.com
2021-07-27drm/i915/guc: Reset implementation for new GuC interfaceMatthew Brost
Reset implementation for new GuC interface. This is the legacy reset implementation which is called when the i915 owns the engine hang check. Future patches will offload the engine hang check to GuC but we will continue to maintain this legacy path as a fallback and this code path is also required if the GuC dies. With the new GuC interface it is not possible to reset individual engines - it is only possible to reset the GPU entirely. This patch forces an entire chip reset if any engine hangs. v2: (Michal) - Check for -EPIPE rather than -EIO (CT deadlock/corrupt check) v3: (John H) - Split into a series of smaller patches v4: (John H) - Fix typo - Add braces around if statements in reset code v5: (Checkpatch) - Fix warnings Cc: John Harrison <john.c.harrison@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <john.c.harrison@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-9-matthew.brost@intel.com
2021-07-24drm/i915/xehp: Extra media engines - Part 3 (reset)John Harrison
Xe_HP can have a lot of extra media engines. This patch adds the reset support for them. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210723174239.1551352-5-matthew.d.roper@intel.com
2021-06-05drm/i915/gt: replace IS_GEN and friends with GRAPHICS_VERLucas De Marchi
This was done by the following semantic patch: @@ expression i915; @@ - INTEL_GEN(i915) + GRAPHICS_VER(i915) @@ expression i915; expression E; @@ - INTEL_GEN(i915) >= E + GRAPHICS_VER(i915) >= E @@ expression dev_priv; expression E; @@ - !IS_GEN(dev_priv, E) + GRAPHICS_VER(dev_priv) != E @@ expression dev_priv; expression E; @@ - IS_GEN(dev_priv, E) + GRAPHICS_VER(dev_priv) == E @@ expression dev_priv; expression from, until; @@ - IS_GEN_RANGE(dev_priv, from, until) + IS_GRAPHICS_VER(dev_priv, from, until) @def@ expression E; identifier id =~ "^gen$"; @@ - id = GRAPHICS_VER(E) + ver = GRAPHICS_VER(E) @@ identifier def.id; @@ - id + ver It also takes care of renaming the variable we assign to GRAPHICS_VER() so to use "ver" rather than "gen". Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210605155356.4183026-2-lucas.demarchi@intel.com
2021-05-27drm/i915: Add Wa_14010733141Aditya Swarup
The WA requires the following procedure for VDBox SFC reset: If (MFX-SFC usage is 1) { 1.Issue a MFX-SFC forced lock 2.Wait for MFX-SFC forced lock ack 3.Check the MFX-SFC usage bit If (MFX-SFC usage bit is 1) Reset VDBOX and SFC else Reset VDBOX Release the force lock MFX-SFC } else if(HCP+SFC usage is 1) { 1.Issue a VE-SFC forced lock 2.Wait for SFC forced lock ack 3.Check the VE-SFC usage bit If (VE-SFC usage bit is 1) Reset VDBOX else Reset VDBOX and SFC Release the force lock VE-SFC. } else Reset VDBOX - Restructure: the changes to the original code flow should stay relatively minimal; we only need to do an extra HCP check after the usual VD-MFX check and, if true, switch the register/bit we're performing the lock on.(MattR) v2: - Assign unlock mask using paired_engine->mask instead of using BIT(paired_vecs->id). (Daniele) Bspec: 52890, 53509 Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Aditya Swarup <aditya.swarup@intel.com> Co-developed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210526094852.286424-2-aditya.swarup@intel.com
2021-05-25drm/i915/gt: Move submission_method into intel_gtChris Wilson
Since we setup the submission method for the engines once, it is easy to assign an enum and use that instead of probing into the backends. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210521183215.65451-3-matthew.brost@intel.com
2021-04-08Merge tag 'drm-intel-gt-next-2021-04-06' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next Driver Changes: - Prepare for local/device memory support on DG1 by starting to use it for kernel internal allocations: context, ring and engine scratch (Matt A, CQ, Abdiel, Imre) - Sandybridge fix to avoid hard hang on ring resume (Chris) - Limit imported dma-buf size to int32 (Matt A) - Double check heartbeat timeout before resetting (Chris) - Use new tasklet API for execution list (Emil) - Fix SPDX checkpats warnings (Chris) - Fixes for various checkpatch warnings (Chris) - Selftest improvements (Chris) - Move the defer_request waiter active assertion to correct spot (Chris) - Make local-memory probing a GT operation (Matt, Tvrtko) - Protect against request freeing during cancellation on wedging (Chris) - Retire unexpected starting state error dumping (Chris) - Distinction of memory regions in debugging (Zbigniew) - Always flush the submission queue on checking for idle (Chris) - Consolidate 2big error check to helper (Matt) - Decrease number of subplatform bits (Tvrtko) - Remove unused internal request priority levels (Chris) - Document the unused internal header bits in buddy allocator (Matt) - Cleanup the region class/instance encoding (Matt) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YGxksaZGXHnFxlwg@jlahtine-mobl.ger.corp.intel.com
2021-03-24drm/i915: Protect against request freeing during cancellation on wedgingChris Wilson
As soon as we mark a request as completed, it may be retired. So when cancelling a request and marking it complete, make sure we first keep a reference to the request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210201085715.27435-4-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2021-03-24drm/i915/gt: SPDX cleanupChris Wilson
Clean up the SPDX licence declarations to comply with checkpatch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210122192913.4518-1-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2021-03-24drm/i915: Move gt_revoke() slightlyMaarten Lankhorst
We get a lockdep splat when the reset mutex is held, because it can be taken from fence_wait. This conflicts with the mmu notifier we have, because we recurse between reset mutex and mmap lock -> mmu notifier. Remove this recursion by calling revoke_mmaps before taking the lock. The reset code still needs fixing, as taking mmap locks during reset is not allowed. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> [danvet: Add FIXME.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-64-maarten.lankhorst@linux.intel.com
2021-03-11Merge drm/drm-next into drm-intel-nextJani Nikula
Sync up with upstream. Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2021-02-02drm/i915/gt: Remove references to struct drm_device.pdevThomas Zimmermann
Using struct drm_device.pdev is deprecated. Convert i915 to struct drm_device.dev. No functional changes. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210128133127.2311-3-tzimmermann@suse.de
2021-01-15drm/i915: Mark up protected uses of 'i915_request_completed'Chris Wilson
When we know that we are inside the timeline mutex, or inside the submission flow (under active.lock or the holder's rcu lock), we know that the rq->hwsp is stable and we can use the simpler direct version. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210114135612.13210-1-chris@chris-wilson.co.uk
2021-01-15Merge drm/drm-next into drm-intel-gt-nextJoonas Lahtinen
Backmerging to get a common base for merging topic branches between drm-intel-next and drm-intel-gt-next. Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2021-01-15Merge tag 'drm-intel-gt-next-2021-01-14' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next UAPI Changes: - Deprecate I915_PMU_LAST and optimize state tracking (Tvrtko) Avoid relying on last item ABI marker in i915_drm.h, add a comment to mark as deprecated. Cross-subsystem Changes: Core Changes: Driver Changes: - Restore clear residuals security mitigations for Ivybridge and Baytrail (Chris) - Close #1858: Allow sysadmin to choose applied GPU security mitigations through i915.mitigations=... similar to CPU (Chris) - Fix for #2024: GPU hangs on HSW GT1 (Chris) - Fix for #2707: Driver hang when editing UVs in Blender (Chris, Ville) - Fix for #2797: False positive GuC loading error message (Chris) - Fix for #2859: Missing GuC firmware for older Cometlakes (Chris) - Lessen probability of GPU hang due to DMAR faults [reason 7, next page table ptr is invalid] on Tigerlake (Chris) - Fix REVID macros for TGL to fetch correct stepping (Aditya) - Limit frequency drop to RPe on parking (Chris, Edward) - Limit W/A 1406941453 to TGL, RKL and DG1 (Swathi) - Make W/A 22010271021 permanent on DG1 (Lucas) - Implement W/A 16011163337 to prevent a HS/DS hang on DG1 (Swathi) - Only disable preemption on gen8 render engines (Chris) - Disable arbitration around Braswell's PDP updates (Chris) - Disable arbitration on no-preempt requests (Chris) - Check for arbitration after writing start seqno before busywaiting (Chris) - Retain default context state across shrinking (Venkata, CQ) - Fix mismatch between misplaced vma check and vma insert for 32-bit addressing userspaces (Chris, CQ) - Propagate error for vmap() failure instead kernel NULL deref (Chris) - Propagate error from cancelled submit due to context closure immediately (Chris) - Fix RCU race on HWSP tracking per request (Chris) - Clear CMD parser shadow and GPU reloc batches (Matt A) - Populate logical context during first pin (Maarten) - Optimistically prune dma-resv from the shrinker (Chris) - Fix for virtual engine ownership race (Chris) - Remove timeslice suppression to restore fairness for virtual engines (Chris) - Rearrange IVB/HSW workarounds properly between GT and engine (Chris) - Taint the reset mutex with the shrinker (Chris) - Replace direct submit with direct call to tasklet (Chris) - Multiple corrections to virtual engine dequeue and breadcrumbs code (Chris) - Avoid wakeref from potentially hard IRQ context in PMU (Tvrtko) - Use raw clock for RC6 time estimation in PMU (Tvrtko) - Differentiate OOM failures from invalid map types (Chris) - Fix Gen9 to have 64 MOCS entries similar to Gen11 (Chris) - Ignore repeated attempts to suspend request flow across reset (Chris) - Remove livelock from "do_idle_maps" VT-d W/A (Chris) - Cancel the preemption timeout early in case engine reset fails (Chris) - Code flow optimization in the scheduling code (Chris) - Clear the execlists timers upon reset (Chris) - Drain the breadcrumbs just once (Chris, Matt A) - Track the overall GT awake/busy time (Chris) - Tweak submission tasklet flushing to avoid starvation (Chris) - Track timelines created using the HWSP to restore on resume (Chris) - Use cmpxchg64 for 32b compatilibity for active tracking (Chris) - Prefer recycling an idle GGTT fence to avoid GPU wait (Chris) - Restructure GT code organization for clearer split between GuC and execlists (Chris, Daniele, John, Matt A) - Remove GuC code that will remain unused by new interfaces (Matt B) - Restructure the CS timestamp clocks code to local to GT (Chris) - Fix error return paths in perf code (Zhang) - Replace idr_init() by idr_init_base() in perf (Deepak) - Fix shmem_pin_map error path (Colin) - Drop redundant free_work worker for GEM contexts (Chris, Mika) - Increase readability and understandability of intel_workarounds.c (Lucas) - Defer enabling the breadcrumb interrupt to after submission (Chris) - Deal with buddy alloc block sizes beyond 4G (Venkata, Chris) - Encode fence specific waitqueue behaviour into the wait.flags (Chris) - Don't cancel the breadcrumb interrupt shadow too early (Chris) - Cancel submitted requests upon context reset (Chris) - Use correct locks in GuC code (Tvrtko) - Prevent use of engine->wa_ctx after error (Chris, Matt R) - Fix build warning on 32-bit (Arnd) - Avoid memory leak if platform would have more than 16 W/A (Tvrtko) - Avoid unnecessary #if CONFIG_PM in PMU code (Chris, Tvrtko) - Improve debugging output (Chris, Tvrtko, Matt R) - Make file local variables static (Jani) - Avoid uint*_t types in i915 (Jani) - Selftest improvements (Chris, Matt A, Dan) - Documentation fixes (Chris, Jose) Signed-off-by: Dave Airlie <airlied@redhat.com> # Conflicts: # drivers/gpu/drm/i915/gt/intel_breadcrumbs.c # drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h # drivers/gpu/drm/i915/gt/intel_lrc.c # drivers/gpu/drm/i915/gvt/mmio_context.h # drivers/gpu/drm/i915/i915_drv.h From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210114152232.GA21588@jlahtine-mobl.ger.corp.intel.com
2021-01-14drm/i915/gt: Prune inlinesChris Wilson
Remove all the manual inlines from non-critical sections in gt/ add/remove: 2/0 grow/shrink: 0/3 up/down: 762/-1473 (-711) Function old new delta mi_set_context.isra - 602 +602 write_dma_entry - 160 +160 __set_pd_entry 214 69 -145 clear_pd_entry 190 42 -148 ring_request_alloc 2021 841 -1180 Total: Before=1605086, After=1604375, chg -0.04% Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210113152224.29794-1-chris@chris-wilson.co.uk
2021-01-05drm/i915/gt: Allow failed resets without assertionChris Wilson
If the engine reset fails, we will attempt to resume with the current inflight submissions. When that happens, we cannot assert that the engine reset cleared the pending submission, so do not. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2878 Fixes: 16f2941ad307 ("drm/i915/gt: Replace direct submit with direct call to tasklet") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210104115145.24460-3-chris@chris-wilson.co.uk
2020-12-30drm/i915/gt: Taint the reset mutex with the shrinkerChris Wilson
Declare that, under extreme circumstances, the shrinker may need to wait upon a request, in which case reset must not itself deadlock in order to ensure forward progress of the driver. That is since the shrinker may depend upon a reset, any reset cannot touch the shrinker. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201229141626.4773-1-chris@chris-wilson.co.uk
2020-12-24drm/i915/gt: Replace direct submit with direct call to taskletChris Wilson
Rather than having special case code for opportunistically calling process_csb() and performing a direct submit while holding the engine spinlock for submitting the request, simply call the tasklet directly. This allows us to retain the direct submission path, including the CS draining to allow fast/immediate submissions, without requiring any duplicated code paths, and most importantly greatly simplifying the control flow by removing reentrancy. This will enable us to close a few races in the virtual engines in the next few patches. The trickiest part here is to ensure that paired operations (such as schedule_in/schedule_out) remain under consistent locking domains, e.g. when pulled outside of the engine->active.lock v2: Use bh kicking, see commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous"). v3: Update engine-reset to be tasklet aware Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201224135544.1713-1-chris@chris-wilson.co.uk
2020-12-04drm/i915/gt: Include reset failures in the traceChris Wilson
The GT and engine reset failures are completely invisible when looking at a trace for a bug, but are vital to understanding the incomplete flow. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-3-chris@chris-wilson.co.uk
2020-12-03Merge tag 'drm-intel-next-queued-2020-11-27' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next drm/i915 features for v5.11: Highlights: - Enable big joiner to join two pipes to one port to overcome pipe restrictions (Manasi, Ville, Maarten) Display: - More DG1 enabling (Lucas, Aditya) - Fixes to cases without display (Lucas, José, Jani) - Initial PSR state improvements (José) - JSL eDP vswing updates (Tejas) - Handle EDID declared max 16 bpc (Ville) - Display refactoring (Ville) Other: - GVT features - Backmerge Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87czzzkk1s.fsf@intel.com
2020-11-11drm/i915/display: add namespace to intel_finish_resetLucas De Marchi
Rename intel_finish_reset to intel_display_finish_reset, so it's clear from gt/ that we are calling out the display code. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201106225531.920641-2-lucas.demarchi@intel.com
2020-11-11drm/i915/display: add namespace to intel_prepare_resetLucas De Marchi
Rename intel_prepare_reset to intel_display_prepare_reset, so it's clear from gt/ that we are calling out the display code. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201106225531.920641-1-lucas.demarchi@intel.com
2020-11-09drm/i915: Improve record of hung engines in error stateTvrtko Ursulin
Between events which trigger engine and GPU resets and capturing the error state we lose information on which engine triggered the reset. Improve this by passing in the hung engine mask down to error capture. Result is that the list of engines in user visible "GPU HANG: ecode <gen>:<engines>:<ecode>, <process>" is now a list of hanging and not just active engines. Most importantly the displayed process is now the one which was actually hung. v2: * Stub prototype. (Chris) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201104134743.916027-1-tvrtko.ursulin@linux.intel.com
2020-09-30drm/i915/gt: Retire cancelled requests on unloadChris Wilson
If we manage to hit the intel_gt_set_wedged_on_fini() while active, i.e. module unload during a stress test, we may cancel the requests but not clean up. This leads to a very slow module unload as we wait for something or other to trigger the retirement flushing, or timeout and unload with a bunch of warnings. Instead if we explicitly cancel then cleanup on an active unload, it should be instant and quiet. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200930163253.2789-3-chris@chris-wilson.co.uk
2020-09-07drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbsChris Wilson
On the virtual engines, we only use the intel_breadcrumbs for tracking signaling of stale breadcrumbs from the irq_workers. They do not have any associated interrupt handling, active requests are passed to a physical engine and associated breadcrumb interrupt handler. This causes issues for us as we need to ensure that we do not actually try and enable interrupts and the powermanagement required for them on the virtual engine, as they will never be disabled. Instead, let's specify the physical engine used for interrupt handler on a particular breadcrumb. v2: Drop b->irq_armed = true mocking for no interrupt HW Fixes: 4fe6abb8f513 ("drm/i915/gt: Ignore irq enabling on the virtual engines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-4-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-07-08drm/i915: Move the engine mask to intel_gt_infoDaniele Ceraolo Spurio
Since the engines belong to the GT, move the runtime-updated list of available engines to the intel_gt struct. The original mask has been renamed to indicate it contains the maximum engine list that can be found on a matching device. In preparation for other info being moved to the gt in follow up patches (sseu), introduce an intel_gt_info structure to group all gt-related runtime info. v2: s/max_engine_mask/platform_engine_mask (tvrtko), fix selftest Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Andi Shyti <andi.shyti@intel.com> Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-5-daniele.ceraolospurio@intel.com
2020-07-06drm/i915: Print caller when tainting for CIMichał Winiarski
We can add taint from multiple places, printing the caller allows us to have a better overview of what exactly caused us to do the tainting. v2: Tweak format and print the device (Chris) v3: Move things around (Chris) Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Petri Latvala <petri.latvala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200706144107.204821-2-michal@hardline.pl
2020-07-06drm/i915: Reboot CI if we get wedged during driver initMichał Winiarski
Getting wedged device on driver init is pretty much unrecoverable. Since we're running various scenarios that may potentially hit this in CI (module reload / selftests / hotunplug), and if it happens, it means that we can't trust any subsequent CI results, we should just apply the taint to let the CI know that it should reboot (CI checks taint between test runs). v2: Comment that WEDGED_ON_INIT is non-recoverable, distinguish WEDGED_ON_INIT from WEDGED_ON_FINI (Chris) v3: Appease checkpatch, fixup search-replace logic expression mindbomb in assert (Chris) Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Petri Latvala <petri.latvala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200706144107.204821-1-michal@hardline.pl