summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/gt/intel_gtt.h
AgeCommit message (Collapse)Author
2021-04-27drm/i915/gtt/dgfx: place the PD in LMEMMatthew Auld
It's a requirement that for dgfx we place all the paging structures in device local-memory. v2: use i915_coherent_map_type() v3: improve the shared dma-resv object comment Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210427085417.120246-4-matthew.auld@intel.com
2021-04-27drm/i915/gtt: map the PD up frontMatthew Auld
We need to generalise our accessor for the page directories and tables from using the simple kmap_atomic to support local memory, and this setup must be done on acquisition of the backing storage prior to entering fence execution contexts. Here we replace the kmap with the object mapping code that for simple single page shmemfs object will return a plain kmap, that is then kept for the lifetime of the page directory. Note that keeping the mapping around is a potential concern here, since while the vma is pinned the mapping remains there for the PDs underneath, or at least until the used_count reaches zero, at which point we can safely destroy the mapping. For 32b this will be even worse since the address space is more limited, but since this change mostly impacts full ppGTT platforms, the justification is that for modern platforms we shouldn't care too much about 32b. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210427085417.120246-3-matthew.auld@intel.com
2021-04-08Merge tag 'drm-intel-gt-next-2021-04-06' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next Driver Changes: - Prepare for local/device memory support on DG1 by starting to use it for kernel internal allocations: context, ring and engine scratch (Matt A, CQ, Abdiel, Imre) - Sandybridge fix to avoid hard hang on ring resume (Chris) - Limit imported dma-buf size to int32 (Matt A) - Double check heartbeat timeout before resetting (Chris) - Use new tasklet API for execution list (Emil) - Fix SPDX checkpats warnings (Chris) - Fixes for various checkpatch warnings (Chris) - Selftest improvements (Chris) - Move the defer_request waiter active assertion to correct spot (Chris) - Make local-memory probing a GT operation (Matt, Tvrtko) - Protect against request freeing during cancellation on wedging (Chris) - Retire unexpected starting state error dumping (Chris) - Distinction of memory regions in debugging (Zbigniew) - Always flush the submission queue on checking for idle (Chris) - Consolidate 2big error check to helper (Matt) - Decrease number of subplatform bits (Tvrtko) - Remove unused internal request priority levels (Chris) - Document the unused internal header bits in buddy allocator (Matt) - Cleanup the region class/instance encoding (Matt) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YGxksaZGXHnFxlwg@jlahtine-mobl.ger.corp.intel.com
2021-03-24drm/i915/gtt/dg1: add PTE_LM plumbing for GGTTMatthew Auld
For the PTEs we get an LM bit, to signal whether the page resides in SMEM or LMEM. Based on a patch from Michel Thierry. BSpec: 45015 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20210203171231.551338-3-matthew.auld@intel.com Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2021-03-24drm/i915/gtt/dg1: add PTE_LM plumbing for ppGTTMatthew Auld
For the PTEs we get an LM bit, to signal whether the page resides in SMEM or LMEM. BSpec: 45040 v2: just use gen8_pte_encode for dg1 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20210203171231.551338-2-matthew.auld@intel.com Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2021-03-24drm/i915: Use a single page table lock for each gtt.Maarten Lankhorst
We may create page table objects on the fly, but we may need to wait with the ww lock held. Instead of waiting on a freed obj lock, ensure we have the same lock for each object to keep -EDEADLK working. This ensures that i915_vma_pin_ww can lock the page tables when required. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-41-maarten.lankhorst@linux.intel.com
2021-03-24drm/i915: Move pinning to inside engine_wa_list_verify()Maarten Lankhorst
This should be done as part of the ww loop, in order to remove a i915_vma_pin that needs ww held. Now only i915_ggtt_pin() callers remaining. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-25-maarten.lankhorst@linux.intel.com
2020-12-21drm/i915/gt: Provide a utility to create a scratch bufferChris Wilson
Primarily used by selftests, but also by runtime debugging of engine w/a, is a routine to create a temporarily bound buffer for readback. Almagamate the duplicated routines into one. Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201219020343.22681-2-chris@chris-wilson.co.uk
2020-10-06drm/i915: Fix DMA mapped scatterlist walksTvrtko Ursulin
When walking DMA mapped scatterlists sg_dma_len has to be used since it can be different (coalesced) from the backing store entry. This also means we have to end the walk when encountering a zero length DMA entry and cannot rely on the normal sg list end marker. Both issues were there in theory for some time but were hidden by the fact Intel IOMMU driver was never coalescing entries. As there are ongoing efforts to change this we need to start handling it. v2: * Use unsigned int for local storing sg_dma_len. (Logan) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> References: 85d1225ec066 ("drm/i915: Introduce & use new lightweight SGL iterators") References: b31144c0daa8 ("drm/i915: Micro-optimise gen6_ppgtt_insert_entries()") Reported-by: Tom Murphy <murphyt7@tcd.ie> Suggested-by: Tom Murphy <murphyt7@tcd.ie> # __sgt_iter Suggested-by: Logan Gunthorpe <logang@deltatee.com> # __sgt_iter Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201006092508.1064287-1-tvrtko.ursulin@linux.intel.com
2020-09-07drm/i915/gt: Shrink i915_page_directory's slab bucketChris Wilson
kmalloc uses power-of-two slab buckets for small allocations (up to a few pages). Since i915_page_directory is a page of pointers, plus a couple more, this is rounded up to 8K, and we waste nearly 50% of that allocation. Long terms this leads to poor memory utilisation, bloating the kernel footprint, but the problem is exacerbated by our conservative preallocation scheme for binding VMA. As we are required to allocate all levels for each vma just in case we need to insert them upon binding, this leads to a large multiplication factor for a single page vma. By halving the allocation we need for the page directory structure, we halve the impact of that factor, bringing workloads that once fitted into memory, hopefully back to fitting into memory. We maintain the split between i915_page_directory and i915_page_table as we only need half the allocation for the lowest, most populous, level. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-3-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07drm/i915/gt: Switch to object allocations for page directoriesChris Wilson
The GEM object is grossly overweight for the practicality of tracking large numbers of individual pages, yet it is currently our only abstraction for tracking DMA allocations. Since those allocations need to be reserved upfront before an operation, and that we need to break away from simple system memory, we need to ditch using plain struct page wrappers. In the process, we drop the WC mapping as we ended up clflushing everything anyway due to various issues across a wider range of platforms. Though in a future step, we need to drop the kmap_atomic approach which suggests we need to pre-map all the pages and keep them mapped. v2: Verify our large scratch page is suitably DMA aligned; and manually clear the scratch since we are allocating plain struct pages full of prior content. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-2-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07drm/i915: Preallocate stashes for vma page-directoriesChris Wilson
We need to make the DMA allocations used for page directories to be performed up front so that we can include those allocations in our memory reservation pass. The downside is that we have to assume the worst case, even before we know the final layout, and always allocate enough page directories for this object, even when there will be overlap. This unfortunately can be quite expensive, especially as we have to clear/reset the page directories and DMA pages, but it should only be required during early phases of a workload when new objects are being discovered, or after memory/eviction pressure when we need to rebind. Once we reach steady state, the objects should not be moved and we no longer need to preallocating the pages tables. It should be noted that the lifetime for the page directories DMA is more or less decoupled from individual fences as they will be shared across objects across timelines. v2: Only allocate enough PD space for the PTE we may use, we do not need to allocate PD that will be left as scratch. v3: Store the shift unto the first PD level to encapsulate the different PTE counts for gen6/gen8. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-07-03drm/i915: Export ppgtt_bind_vmaChris Wilson
Reuse the ppgtt_bind_vma() for aliasing_ppgtt_bind_vma() so we can reduce some code near-duplication. The catch is that we need to then pass along the i915_address_space and not rely on vma->vm, as they differ with the aliasing-ppgtt. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200703102519.26539-1-chris@chris-wilson.co.uk
2020-03-16drm/i915/gt: Allocate i915_fence_reg arrayChris Wilson
Since the number of fence regs can vary dramactically between platforms, allocate the array on demand so we don't waste as much space. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200316113846.4974-4-chris@chris-wilson.co.uk
2020-03-16drm/i915: Move GGTT fence registers under gt/Chris Wilson
Since the fence registers control HW detiling through the GGTT aperture, make them a part of the intel_ggtt under gt/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200316113846.4974-1-chris@chris-wilson.co.uk
2020-02-28drm/i915/gt: Pull marking vm as closed underneath the vm->mutexChris Wilson
Pull the final atomic_dec of vm->open (marking the vm as closed) underneath the same vm->mutex as used to close it. This is required to correctly serialise with attempting to reuse the vma as the vm is closed by a second thread. References: 00de702c6c6f ("drm/i915: Check that the vma hasn't been closed before we insert it") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-10-chris@chris-wilson.co.uk
2020-02-27drm/i915/ggtt: do not set bits 1-11 in gen12 ptesDaniele Ceraolo Spurio
On TGL, bits 2-4 in the GGTT PTE are not ignored anymore and are instead used for some extra VT-d capabilities. We don't (yet?) have support for those capabilities, but, given that we shared the pte_encode function betweed GGTT and PPGTT, we still set those bits to the PPGTT PPAT values. The DMA engine gets very confused when those bits are set while the iommu is enabled, leading to errors. E.g. when loading the GuC we get: [ 9.796218] DMAR: DRHD: handling fault status reg 2 [ 9.796235] DMAR: [DMA Write] Request device [00:02.0] PASID ffffffff fault addr 0 [fault reason 02] Present bit in context entry is clear [ 9.899215] [drm:intel_guc_fw_upload [i915]] *ERROR* GuC firmware signature verification failed To fix this, just have dedicated gen8_pte_encode function per type of gtt. Also, explicitly set vm->pte_encode for gen8_ppgtt, even if we don't use it, to make sure we don't accidentally assign it to the GGTT one, like we do for gen6_ppgtt, in case we need it in the future. Reported-by: "Sodhi, Vunny" <vunny.sodhi@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200226185657.26445-1-daniele.ceraolospurio@intel.com
2020-01-30drm/i915/gt: Rename i915_gem_restore_ggtt_mappings() for its new placementChris Wilson
The i915_ggtt now sits beneath gt/ outside of the auspices of gem/ and should be given a fresh name to reflect that. We also want to give it a name that reflects its role in the system suspend/resume, with the intention of pulling together all the GGTT operations (e.g. restoring the fence registers once they are pulled under gt/intel_ggtt_detiler.c) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Rreviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200130181710.2030251-2-chris@chris-wilson.co.uk
2020-01-10drm/i915: Start chopping up the GPU error captureChris Wilson
In the near future, we will want to start a GPU error capture from a new context, from inside the softirq region of a forced preemption. To do so requires us to break up the monolithic error capture to provide new entry points with finer control; in particular focusing on one engine/gt, and being able to compose an error state from little pieces of HW capture. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200110123059.1348712-1-chris@chris-wilson.co.uk
2020-01-07drm/i915/gtt: split up i915_gem_gttMatthew Auld
Attempt to split i915_gem_gtt.[ch] into more manageable chunks. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200107134009.3255354-1-chris@chris-wilson.co.uk