summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/intel_lrc.c
AgeCommit message (Collapse)Author
2017-05-12drm/i915: set initialised only when init_context callback is NULLChuanxiao Dong
During execlist_context_deferred_alloc() we presumed that the context is uninitialised (we only just allocated the state object for it!) and chose to optimise away the later call to engine->init_context() if engine->init_context were NULL. This breaks with GVT's contexts that are marked as pre-initialised to avoid us annoyingly calling engine->init_context(). The fix is to not override ce->initialised if it is already true. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1494497262-24855-1-git-send-email-chuanxiao.dong@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-04drm/i915: Use engine->context_pin() to report the intel_ringChris Wilson
Since unifying ringbuffer/execlist submission to use engine->pin_context, we ensure that the intel_ring is available before we start constructing the request. We can therefore move the assignment of the request->ring to the central i915_gem_request_alloc() and not require it in every engine->request_alloc() callback. Another small step towards simplification (of the core, but at a cost of handling error pointers in less important callers of engine->pin_context). v2: Rearrange a few branches to reduce impact of PTR_ERR() on gcc's code generation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170504093308.4137-1-chris@chris-wilson.co.uk
2017-04-28drm/i915: Sanitize engine context sizesJoonas Lahtinen
Pre-calculate engine context size based on engine class and device generation and store it in the engine instance. v2: - Squash and get rid of hw_context_size (Chris) v3: - Move after MMIO init for probing on Gen7 and 8 (Chris) - Retained rounding (Tvrtko) v4: - Rebase for deferred legacy context allocation Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: intel-gvt-dev@lists.freedesktop.org Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-25drm/i915: Differentiate between sw write location into ring and last hw readChris Wilson
We need to keep track of the last location we ask the hw to read up to (RING_TAIL) separately from our last write location into the ring, so that in the event of a GPU reset we do not tell the HW to proceed into a partially written request (which can happen if that request is waiting for an external signal before being executed). v2: Refactor intel_ring_reset() (Mika) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100144 Testcase: igt/gem_exec_fence/await-hang Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests") Fixes: d55ac5bf97c6 ("drm/i915: Defer transfer onto execution timeline to actual hw submission") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170425130049.26147-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-04-25drm/i915: Report request restarts for both execlists/gucChris Wilson
As we now share the execlist_port[] tracking for both execlists/guc, we can reset the inflight count on both and report which requests are being restarted. Suggested-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170425103835.31871-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-04-12drm/i915/execlists: Document runtime pm for intel_lrc_irq_handler()Chris Wilson
We indirectly hold the runtime-pm for the intel_lrc_irq_handler() by virtue of dev_priv->gt.awake keeping a wakeref whilst the requests are busy. As this is not obvious from the code, add a comment. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170411175850.2470-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-04-11drm/i915: Use the engine class to get the context sizeDaniele Ceraolo Spurio
Technically speaking, the context size is per engine class, not per instance. v2: Add MISSING_CASE (Tvrtko) v3: Rebased v4: Restore the interface back to hiding the class lookup (Chris) Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1491905472-16189-1-git-send-email-oscar.mateo@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-03drm/i915: intel_ring.engine is unusedChris Wilson
Or rather it is used only by intel_ring_pin() to extract the drm_i915_private which we can easily pass in. As this is a relatively rare operation, save the space in the struct, and as such it is even break even in the extra code for passing around the parameter: add/remove: 0/0 grow/shrink: 2/3 up/down: 15/-15 (0) function old new delta intel_init_ring_buffer 906 918 +12 execlists_context_pin 1308 1311 +3 mock_engine 407 403 -4 intel_engine_create_ring 367 363 -4 intel_ring_pin 326 319 -7 Total: Before=1261794, After=1261794, chg +0.00% v2: Reorder intel_init_ring_buffer to keep the ring setup together: add/remove: 0/0 grow/shrink: 2/3 up/down: 9/-15 (-6) function old new delta intel_init_ring_buffer 906 912 +6 execlists_context_pin 1308 1311 +3 mock_engine 407 403 -4 intel_engine_create_ring 367 363 -4 intel_ring_pin 326 319 -7 Total: Before=1261794, After=1261788, chg -0.00% Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170403113426.25707-1-chris@chris-wilson.co.uk
2017-03-29Revert "drm/i915: Skip execlists_dequeue() early if the list is empty"Chris Wilson
This reverts commit 6c943de6686f ("drm/i915: Skip execlists_dequeue() early if the list is empty"). The validity of using READ_ONCE there depends upon having a mb to coordinate the assignment of engine->execlist_first inside submit_request() and checking prior to taking the spinlock in execlists_dequeue(). We wrote "the update to TASKLET_SCHED incurs a memory barrier making this cross-cpu checking safe", but failed to notice that this mb was *conditional* on the execlists being ready, i.e. there wasn't the required mb when it was most necessary! We could install an unconditional memory barrier to fixup the READ_ONCE(): diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 7dd732cb9f57..1ed164b16d44 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -616,6 +616,7 @@ static void execlists_submit_request(struct drm_i915_gem_request *request) if (insert_request(&request->priotree, &engine->execlist_queue)) { engine->execlist_first = &request->priotree.node; + smp_wmb(); if (execlists_elsp_ready(engine)) But we have opted to remove the race as it should be rarely effective, and saves us having to explain the necessary memory barriers which we quite clearly failed at. Reported-and-tested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Fixes: 6c943de6686f ("drm/i915: Skip execlists_dequeue() early if the list is empty") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170329100052.29505-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
2017-03-29drm/i915: Avoid lock dropping between reschedulingChris Wilson
Unlocking is dangerous. In this case we combine an early update to the out-of-queue request, because we know that it will be inserted into the correct FIFO priority-ordered slot when it becomes ready in the future. However, given sufficient enthusiasm, it may become ready as we are continuing to reschedule, and so may gazump the FIFO if we have since dropped its spinlock. The result is that it may be executed too early, before its dependencies. v2: Move all work into the second phase over the topological sort. This removes the shortcut on the out-of-rbtree request to ensure that we only adjust its priority after adjusting all of its dependencies. Fixes: 20311bd35060 ("drm/i915/scheduler: Execute requests in order of priorities") Testcase: igt/gem_exec_whisper Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v4.10+ Link: http://patchwork.freedesktop.org/patch/msgid/20170327202143.7972-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-03-27drm/i915: Refactor tests for validity of RING_TAILChris Wilson
Whilst I like having the assertions clearly visible in the code, they are quite repetitious! As we find new limits we want to incorporate into the set of assertions, it make sense to refactor them to a common routine. v2: Add a guc holdout. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170327131412.20293-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-27drm/i915: Assert that the request->tail fits within the ringChris Wilson
In addition to being qword-aligned, the RING_TAIL offset must be within the ring! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170327130009.4678-2-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-27drm/i915/execlists: Wrap tail pointer after reset tweakingChris Wilson
If the request->wa_tail is 0 (because it landed exactly on the end of the ringbuffer), when we reconstruct request->tail following a reset we fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is never able to catch up with RING_TAIL and the GPU spins endlessly. If the ring contains a couple of breadcrumbs, even our hangcheck is unable to catch the busy-looping as the ACTHD and seqno continually advance. v2: Move the wrap into a common intel_ring_wrap(). Fixes: a3aabe86a340 ("drm/i915/execlists: Reinitialise context image after GPU hang") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: <stable@vger.kernel.org> # v4.10+ Link: http://patchwork.freedesktop.org/patch/msgid/20170327130009.4678-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-27drm/i915/execlists: Trim irq handlerChris Wilson
I noticed that gcc was spilling the CSB to the stack, so rearrange the code to be more compact. Spilling in this function is slightly more interesting due to the mmio reads acting as memory barriers and so end up flushing the stack spills. Still miniscule to having to do at least the pair of uncached reads :( function old new delta intel_lrc_irq_handler 1039 878 -161 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170325201053.21306-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-03-23drm/i915/execlists: Relax the locked clear_bit(IRQ_EXECLIST)Chris Wilson
We only need to care about the ordering of the clearing of the bit with the uncached CSB read in order to correctly detect a new interrupt before the read completes. The uncached read itself acts as a full memory barrier, so we do not need to enforce another in the form of a locked clear_bit. v2: Clarify why the split and unlocked test/clear is harmless. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170323134803.10418-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-03-21drm/i915: Spinlocks in tasklets can use spin_(un)lock_irqTvrtko Ursulin
The tasklets callbacks are only called from tasklet context so it is safe do to this. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170321105511.18269-1-tvrtko.ursulin@linux.intel.com
2017-03-21drm/i915: Remove intel_ring.last_retired_headChris Wilson
Storing the position of the breadcrumb of the last retired request as a separate last_retired_head is superfluous as we always copy that into head prior to recalculation of the intel_ring.space. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170321102552.24357-1-chris@chris-wilson.co.uk
2017-03-21drm/i915/execlists: Split the atomic test_and_clear_bit for irq handlerChris Wilson
Rather than impose the cost of a locked test before queuing a new request, reduce it to a simple test_bit() with a following clear_bit() prior to doing the CSB check. This ensure that if an interrupt does occur whilst reading from the CSB, we still detect it (the interrupt would trigger a rescheduling of the tasklet anyway). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170321113320.2603-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-03-20drm/i915: Reset tasklet back to execlists after disabling gucChris Wilson
When switching back to execlists, we also now need to restore the tasklet handler. Reported-by: Oscar Mateo <oscar.mateo@intel.com> Fixes: 31de73501ac9 ("drm/i915/scheduler: emulate a scheduler for guc") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170318102859.24101-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-03-17drm/i915: Skip execlists_dequeue() early if the list is emptyChris Wilson
Do an early read of the execlists' queue before we take the spinlock and start checking. This is safe as the first writer to the execlists queue will cause the tasklet to be run again after a memory barrier. v2: Keep guc in sync with execlists queue changes v3: Explain the mb between the tasklet running on one cpu and the execlist_first update and schedule from a second cpu. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170317120716.17191-1-chris@chris-wilson.co.uk
2017-03-16drm/i915: Assert that the context pin_counts do not overflowChris Wilson
This should be impossible, but let's assert that we do not pin a context 4 billion times before retiring! v2: Fix the assertion -- the patch had just one job to do! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170316171628.3228-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-16drm/i915: Move engine->submit_request selection to a vfuncChris Wilson
It turns out that we may want to restore the original engine->submit_request (and engine->schedule) callbacks from more than just the guc <-> execlists transition. Move this to a vfunc so we can have a common interface. v2: Move initial selection to intel_engines_init_common(), repaint vfunc with engine->set_default_submission (and a similar colour for the helper). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-2-chris@chris-wilson.co.uk
2017-03-16drm/i915: make context status notifier head be per engineChangbin Du
GVTg has introduced the context status notifier to schedule the GVTg workload. At that time, the notifier is bound to GVTg context only, so GVTg is not aware of host workloads. Now we are going to improve GVTg's guest workload scheduler policy, and add Guc emulation support for new Gen graphics. Both these two features require acknowledgment for all contexts running on hardware. (But will not alter host workload.) So here try to make some change. The change is simple: 1. Move the context status notifier head from i915_gem_context to intel_engine_cs. Which means there is a notifier head per engine instead of per context. Execlist driver still call notifier for each context sched-in/out events of current engine. 2. At GVTg side, it binds a notifier_block for each physical engine at GVTg initialization period. Then GVTg can hear all context status events. In this patch, GVTg do nothing for host context event, but later will add a function there. But in any case, the notifier callback is a noop if this is no active vGPU. Since intel_gvt_init() is called at early initialization stage and require the status notifier head has been initiated, I initiate it in intel_engine_setup(). v2: remove a redundant newline. (chris) Fixes: 3c7ba6359d70 ("drm/i915: Introduce execlist context status change notification") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100232 Signed-off-by: Changbin Du <changbin.du@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170313024711.28591-1-changbin.du@intel.com Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-03-16drm/i915/scheduler: emulate a scheduler for gucChris Wilson
This emulates execlists on top of the GuC in order to defer submission of requests to the hardware. This deferral allows time for high priority requests to gazump their way to the head of the queue, however it nerfs the GuC by converting it back into a simple execlist (where the CPU has to wake up after every request to feed new commands into the GuC). v2: Drop hack status - though iirc there is still a lockdep inversion between fence and engine->timeline->lock (which is impossible as the nesting only occurs on different fences - hopefully just requires some judicious lockdep annotation) v3: Apply lockdep nesting to enabling signaling on the request, using the pattern we already have in __i915_gem_request_submit(); v4: Replaying requests after a hang also now needs the timeline spinlock, to disable the interrupts at least v5: Hold wq lock for completeness, and emit a tracepoint for enabling signal v6: Reorder interrupt checking for a happier gcc. v7: Only signal the tasklet after a user-interrupt if using guc scheduling v8: Restore lost update of rq through the i915_guc_irq_handler (Tvrtko) v9: Avoid re-initialising the engine->irq_tasklet from inside a reset v10: Hook up the execlists-style tracepoints v11: Clear the execlists irq_posted bit after taking over the interrupt/tasklet Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170316125619.6856-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-03-03drm/i915: Avoid using word legacy with ppgttMika Kuoppala
The term legacy is subjective. Use 3lvl and 4lvl where appropriate. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1488295691-9404-4-git-send-email-mika.kuoppala@intel.com
2017-03-03drm/i915: Don't mark pdps clear if pdps are not submittedMika Kuoppala
Don't mark pdps clear if never do the necessary actions with the hardware to make them clear. v2: totally get rid of confusing ppgtt bool (Chris) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1488295691-9404-2-git-send-email-mika.kuoppala@intel.com
2017-03-03drm/i915: Generalise wait for execlists to be idleChris Wilson
The code to check for execlists completion is generic, so move it to intel_engine_cs.c, where we can reuse the new intel_engine_is_idle(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170303121947.20482-2-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-27drm/i915/bdw: Do not write the replay bit of the ring mode registerKelvin Gardiner
The replay bit of the ring mode register is not a valid bit for Gen8+. Do not write to this bit. Signed-off-by: Kelvin Gardiner <kelvin.gardiner@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> [Joonas: Fixed commit message line to be under 72 chars] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1487963724-4824-1-git-send-email-kelvin.gardiner@intel.com
2017-02-24drm/i915/execlists: Detect an out-of-order context switchChris Wilson
We require that the request is completed before the context is switched away. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170223145031.26210-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-21drm/i915/tracepoints: Add backend level request in and out tracepointsTvrtko Ursulin
Two new tracepoints placed at the call sites where requests are actually passed to the GPU enable userspace to track engine utilisation. These tracepoints are only enabled when the DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option is enabled. v2: Fix compilation with !CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS. v3: Name global seqno consistently across tracepoints. v4: Remove port info from request out tracepoint. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-02-21drm/i915: Tidy execlists_init_reg_stateTvrtko Ursulin
Compact the name of the macro and reg_state variable, and cache some data in local variables to make the function more compact and more readable. v2: Fixup some checkpatch warnings. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170221095839.30525-1-tvrtko.ursulin@linux.intel.com
2017-02-20drm/i915: Assert that the request->tail is always qword alignedChris Wilson
The hardware requires that the tail pointer only advance in qword units, so assert that the value we write is aligned to qwords, and similarly enforce this restriction onto the request->tail. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170217163833.731-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-02-17drm/i915: Consolidate gen8_emit_pipe_controlTvrtko Ursulin
We have a few open coded instances in the execlists code and an almost suitable helper in intel_ringbuf.c We can consolidate to a single helper if we change the existing helper to emit directly to ring buffer memory and move the space reservation outside it. v2: Drop memcpy for memset. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170216122325.31391-2-tvrtko.ursulin@linux.intel.com
2017-02-17drm/i915: Tidy workaround batch buffer emissionTvrtko Ursulin
Use the "*batch++ = " style as in the ring emission for better readability and also simplify the logic a bit by consolidating the offset and size calculations and overflow checking. The latter is a programming error so it is not required to check for it after each write to the object, but rather do it once the whole state has been written and fail the driver if something went wrong. v2: Rebase. v3: Keep track of offsets and sizes in bytes for simplicity and rename function pointer variable to _fn suffix. (Chris Wilson) v4: Fix size calc broken in v3 and add alignment warning. (Chris Wilson) v5: Fix return code. v6: I added an exit from loop in v5 but forgot to put back the object teardown. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5) Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-02-15drm/i915: Remove duplicate intel_logical_ring_workarounds_emitTvrtko Ursulin
intel_ring_workarounds_emit is exactly the same code. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170214150017.16058-1-tvrtko.ursulin@linux.intel.com
2017-02-14drm/i915: Emit to ringbuffer directlyTvrtko Ursulin
This removes the usage of intel_ring_emit in favour of directly writing to the ring buffer. intel_ring_emit was preventing the compiler for optimising fetch and increment of the current ring buffer pointer and therefore generating very verbose code for every write. It had no useful purpose since all ringbuffer operations are started and ended with intel_ring_begin and intel_ring_advance respectively, with no bail out in the middle possible, so it is fine to increment the tail in intel_ring_begin and let the code manage the pointer itself. Useless instruction removal amounts to approximately two and half kilobytes of saved text on my build. Not sure if this has any measurable performance implications but executing a ton of useless instructions on fast paths cannot be good. v2: * Change return from intel_ring_begin to error pointer by popular demand. * Move tail increment to intel_ring_advance to enable some error checking. v3: * Move tail advance back into intel_ring_begin. * Rebase and tidy. v4: * Complete rebase after a few months since v3. v5: * Remove unecessary cast and fix !debug compile. (Chris Wilson) v6: * Make intel_ring_offset take request as well. * Fix recording of request postfix plus a sprinkle of asserts. (Chris Wilson) v7: * Use intel_ring_offset to get the postfix. (Chris Wilson) * Convert GVT code as well. v8: * Rename *out++ to *cs++. v9: * Fix GVT out to cs conversion in GVT. v10: * Rebase for new intel_ring_begin in selftests. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170214113242.29241-1-tvrtko.ursulin@linux.intel.com
2017-02-10drm/i915: Rename conditional GEM execution macrosChris Wilson
After a brief discussion, we settled on a naming convention for the conditional GEM debugging data that should be clearer to the casual user: GEM_DEBUG Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170207102319.10910-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-02-10drm/i915: Always pin contexts into the high GGTTChris Wilson
Now that we have fast top-down insertion into the drm_mm, we can use it for frequent runtime operations like insertion of the context object, whereas before we limited it to the one-off insertion of the pinned kernel context. Keeping the active context objects out of the mappable region of the global GTT (except under memory pressure) improves our ability to allocate mappable aperture region without triggering a GPU stall. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170210101422.1598-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-09drm/i915: Use the size/type of address space to make decisionsChris Wilson
Once the address space has been created (using 3 or 4 levels of page tables), we should use that to program the appropriate type into the contexts. This gives us the flexibility to handle different types of address spaces at runtime. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170209144036.23664-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-07drm/i915: Restore context and pd for ringbuffer submission after resetChris Wilson
Following a reset, the context and page directory registers are lost. However, the queue of requests that we resubmit after the reset may depend upon them - the registers are restored from a context image, but that restore may be inhibited and may simply be absent from the request if it was in the middle of a sequence using the same context. If we prime the CCID/PD registers with the first request in the queue (even for the hung request), we prevent invalid memory access for the following requests (and continually hung engines). v2: Magic BIT(8), reserved for future use but still appears unused. v3: Some commentary on handling innocent vs guilty requests v4: Add a wait for PD_BASE fetch. The reload appears to be instant on my Ivybridge, but this bit probably exists for a reason. Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170207152437.4252-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-06drm/i915: Avoid unguarded reads from the request pointerChris Wilson
In commit 86aa7e760a67 ("drm/i915: Assert that the context-switch completion matches our context") I added a read to the irq tasklet handler that compared the on-chip status with that of our sw tracking, using an unguarded read of the request pointer to get the context and beyond. Whilst we hold a reference to the request, we do not hold anything on the context and if we are unlucky it may be reaped from a second thread retiring the request (since it may retire the request as soon as the breadcrumb is complete, even before we finish processing the context switch) as we try to read from the context pointer. Avoid the racy read from underneath the request by storing the expected result in the execlist_port[]. v2: Include commentary about port[].request being unprotected. Fixes: 86aa7e760a67 ("drm/i915: Assert that the context-switch completion matches our context") Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> Testcase: igt/gem_ctx_create Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170206170502.30944-2-chris@chris-wilson.co.uk
2017-02-06drm/i915: Let execlist_update_context() cover !FULL_PPGTT mode.Zhi Wang
execlist_update_context() will try to update PDPs in a context before a ELSP submission only for full PPGTT mode, while PDPs was populated during context initialization. Now the latter code path is removed. Let execlist_update_context() also cover !FULL_PPGTT mode. Fixes: 34869776c76b ("drm/i915: check ppgtt validity when init reg state") Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhiyuan Lv <zhiyuan.lv@intel.com> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1486377436-15380-1-git-send-email-zhi.a.wang@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-02-06drm/i915: Print execlists restart after resetChris Wilson
After resetting, show the requests that each engine restarts from in the debug log. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170204110519.7645-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-01drm/i915/execlists: Add interrupt-pending check to intel_execlists_idle()Chris Wilson
Primarily this serves as a sanity check that the bit has been cleared before we suspend (and hasn't reappeared after resume). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Link: http://patchwork.freedesktop.org/patch/msgid/20170201131222.11882-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-01drm/i915/execlists: Skip resetting RING_CONTEXT_STATUS_PTRChris Wilson
As we now flag when the GPU signals a context-switch and do not read the status register before we see that signal, we do not have to ensure that it is cleared upon reset (and can leave it to the GPU to reset it from the power context). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Link: http://patchwork.freedesktop.org/patch/msgid/20170201125338.12932-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-01-30drm/i915: Create context desc template when context is createdMika Kuoppala
Move the invariant parts of context desc setup from execlist init to context creation. This is advantageous when we need to create different templates based on the context parametrization, ie. for svm capable contexts. v2: s/create/default, remove engine->ctx_desc_template Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1485522189-31984-1-git-send-email-mika.kuoppala@intel.com
2017-01-30drm/i915/glk: Turn on workarounds that apply to Geminilake tooAnder Conselvan de Oliveira
Apply workarounds to Geminilake, and annotate those that are applied unconditionally when they apply to GLK based on the workaround database. v2: Fix commit message typos. (David) v3: Rebase. Cc: David Weinehall <david.weinehall@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1485422218-9102-1-git-send-email-ander.conselvan.de.oliveira@intel.com
2017-01-24drm/i915: Dequeue execlists on a new request if any port is availableChris Wilson
If the second ELSP port is available, schedule the execlists tasklet to see if the new request can occupy it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170124110009.28947-7-chris@chris-wilson.co.uk
2017-01-24drm/i915: Only attempt to pass the first request to execlistsChris Wilson
Only the first request added to the execlist queue can be submitted. If this request is not the first request on the queue, it means that there are already higher priority requests waiting upon the tasklet and kicking it will make no difference. This is more relevant for a later patch, where we more eagerly try and kick the tasklet to handle the submission of new requests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170124110009.28947-6-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-01-24drm/i915: Skip the execlists CSB scan and rewrite if the ring is untouchedChris Wilson
If the CSB head/tail pointers are unchanged, we can skip the update of the CSB register afterwards. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170124110009.28947-5-chris@chris-wilson.co.uk