summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/xe/xe_ring_ops.c
AgeCommit message (Collapse)Author
2025-07-17drm/xe: Dont skip TLB invalidations on VFTejas Upadhyay
Skipping TLB invalidations on VF causing unrecoverable faults. Probable reason for skipping TLB invalidations on SRIOV could be lack of support for instruction MI_FLUSH_DW_STORE_INDEX. Add back TLB flush with some additional handling. Helps in resolving, [ 704.913454] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]] ASID: 0 VFID: 0 PDATA: 0x0d92 Faulted Address: 0x0000000002fa0000 FaultType: 0 AccessType: 1 FaultLevel: 0 EngineClass: 3 bcs EngineInstance: 8 [ 704.913551] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]] Fault response: Unsuccessful -22 V2: - Use Xmas tree (MichalW) Suggested-by: Matthew Brost <matthew.brost@intel.com> Fixes: 97515d0b3ed92 ("drm/xe/vf: Don't emit access to Global HWSP if VF") Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250710045945.1023840-1-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> (cherry picked from commit b528e896fa570844d654b5a4617a97fa770a1030) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-14drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC valueUmesh Nerlige Ramappa
For determining actual job execution time, save the current value of the CTX_TIMESTAMP register rather than the value saved in LRC since the current register value is the closest to the start time of the job. v2: Define MI_STORE_REGISTER_MEM to fix compile error v3: Place MI_STORE_REGISTER_MEM sorted by MI_INSTR (Lucas) Fixes: 65921374c48f ("drm/xe: Emit ctx timestamp copy in ring ops") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250509161159.2173069-6-umesh.nerlige.ramappa@intel.com (cherry picked from commit 38b14233e5deff51db8faec287b4acd227152246) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-07drm/xe: Invalidate L3 read-only cachelines for geometry streams tooKenneth Graunke
Historically, the Vertex Fetcher unit has not been an L3 client. That meant that, when a buffer containing vertex data was written to, it was necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any VF L2 cachelines associated with that buffer, so the new value would be properly read from memory. Since Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER have included an "L3 Bypass Enable" bit which userspace drivers can set to request that the vertex fetcher unit snoop L3. However, unlike most true L3 clients, the "VF Cache Invalidate" bit continues to only invalidate the VF L2 cache - and not any associated L3 lines. To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation Bit", which according to the docs, "controls the invalidation of the Geometry streams cached in L3 cache at the top of the pipe." In other words, the vertex and index buffer data that gets cached in L3 when "L3 Bypass Disable" is set. Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and whenever it issues a VF Cache Invalidate, it also issues a L3 Read Only Cache Invalidate so that both L2 and L3 vertex data is invalidated. xe is issuing VF cache invalidates too (which handles cases like CPU writes to a buffer between GPU batches). Because userspace may enable L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well. Fixes significant flickering in Firefox on Meteorlake, which was writing to vertex buffers via the CPU between batches; the missing L3 Read Only invalidates were causing the vertex fetcher to read stale data from L3. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460 Fixes: 6ef3bb60557d ("drm/xe: enable lite restore") Cc: stable@vger.kernel.org # v6.13+ Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250330165923.56410-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 61672806b579dd5a150a042ec9383be2bbc2ae7e) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-03-12drm/xe: Pass flags directly to emit_flush_imm_ggttTvrtko Ursulin
This is more readable than the nameless booleans and will also come handy later. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307111402.26577-4-tvrtko.ursulin@igalia.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 52a237e8d6c4abcda40c71268ee6cec75aa62799) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-03-12drm/xe: Fix ring flush invalidationTvrtko Ursulin
Emit_flush_invalidate() is incorrectly marking the write to LRC_PPHWSP as a GGTT write and also writing an atypical ~0 dword as the payload. Fix it. While at it drop the unused flags argument. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307111402.26577-3-tvrtko.ursulin@igalia.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 08ea901d0b8f6ea261d9936e03fa690540af0126) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-26drm/xe: Add Wa_16021333562 and Wa_14016712196Aradhya Bhatia
Wa_16021333562 and Wa_14016712196 are permanent workarounds that apply to multiple platforms. Wa_16021333562 applies to platforms ranging from TGL (12.00) to Xe_LPM (13.00), while Wa_14016712196 from DG2 (12.55) to Xe_LPG (12.74). Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Aradhya Bhatia <aradhya.bhatia@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250220094645.358647-2-aradhya.bhatia@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-03drm/xe/pxp: Add VCS inline termination supportDaniele Ceraolo Spurio
The key termination is done with a specific submission to the VCS engine. This flow will be triggered in response to a termination interrupt, whose handling is coming in a follow-up patch in the series. v2: clean up defines and command emission code. (John) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-4-daniele.ceraolospurio@intel.com
2025-01-02Revert "drm/xe: Force write completion of MI_STORE_DATA_IMM"José Roberto de Souza
This reverts commit 1460bb1fef9ccf7390af0d74a15252442fd6effd. In all places the MI_STORE_DATA_IMM are not followed by a read of the same memory address in the same batch buffer and the posted writes are flushed with PIPE_CONTROL or MI_FLUSH_DW in xe_ring_ops.c functions so there is no need to set this register. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Fixes: 1460bb1fef9c ("drm/xe: Force write completion of MI_STORE_DATA_IMM") Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241227183230.101334-1-jose.souza@intel.com
2024-12-23xe/oa: Fix query mode of operation for OAR/OACUmesh Nerlige Ramappa
This is a set of squashed commits to facilitate smooth applying to stable. Each commit message is retained for reference. 1) Allow a GGTT mapped batch to be submitted to user exec queue For a OA use case, one of the HW registers needs to be modified by submitting an MI_LOAD_REGISTER_IMM command to the users exec queue, so that the register is modified in the user's hardware context. In order to do this a batch that is mapped in GGTT, needs to be submitted to the user exec queue. Since all user submissions use q->vm and hence PPGTT, add some plumbing to enable submission of batches mapped in GGTT. v2: ggtt is zero-initialized, so no need to set it false (Matt Brost) 2) xe/oa: Use MI_LOAD_REGISTER_IMMEDIATE to enable OAR/OAC To enable OAR/OAC, a bit in RING_CONTEXT_CONTROL needs to be set. Setting this bit cause the context image size to change and if not done correct, can cause undesired hangs. Current code uses a separate exec_queue to modify this bit and is error-prone. As per HW recommendation, submit MI_LOAD_REGISTER_IMM to the target hardware context to modify the relevant bit. In v2 version, an attempt to submit everything to the user-queue was made, but it failed the unprivileged-single-ctx-counters test. It appears that the OACTXCONTROL must be modified from a remote context. In v3 version, all context specific register configurations were moved to use LOAD_REGISTER_IMMEDIATE and that seems to work well. This is a cleaner way, since we can now submit all configuration to user exec_queue and the fence handling is simplified. v2: (Matt) - set job->ggtt to true if create job is successful - unlock vm on job error (Ashutosh) - don't wait on job submission - use kernel exec queue where possible v3: (Ashutosh) - Fix checkpatch issues - Remove extra spaces/new-lines - Add Fixes: and Cc: tags - Reset context control bit when OA stream is closed - Submit all config via MI_LOAD_REGISTER_IMMEDIATE (Umesh) - Update commit message for v3 experiment - Squash patches for easier port to stable v4: (Ashutosh) - No need to pass q to xe_oa_submit_bb - Do not support exec queues with width > 1 - Fix disabling of CTX_CTRL_OAC_CONTEXT_ENABLE v5: (Ashutosh) - Drop reg_lri related comments - Use XE_OA_SUBMIT_NO_DEPS in xe_oa_load_with_lri Fixes: 8135f1c09dd2 ("drm/xe/oa: Don't reset OAC_CONTEXT_ENABLE on OA stream close") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> # commit 1 Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241220171919.571528-2-umesh.nerlige.ramappa@intel.com
2024-12-18drm/xe: Force write completion of MI_STORE_DATA_IMMJosé Roberto de Souza
With Force write completion unset there is no guarantees of when the write will be globally visible what is not the behavior wanted. Fixes: 9c57bc08652a ("drm/xe/lnl: Drop force_probe requirement") Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241217160732.46280-1-jose.souza@intel.com
2024-06-12drm/xe: Emit ctx timestamp copy in ring opsMatthew Brost
Copy ctx timestamp at beginning of every GPU job to a saved location. Used to determine how long a job has been running on the hardware. v2: - - s/ctx_timestamp_job/ctx_job_timestamp Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240611144053.2805091-4-matthew.brost@intel.com
2024-06-05drm/xe: flush engine buffers before signalling user fence on all enginesAndrzej Hajda
Tests show that user fence signalling requires kind of write barrier, otherwise not all writes performed by the workload will be available to userspace. It is already done for render and compute, we need it also for the rest: video, gsc, copy. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240605-fix_user_fence_posted-v3-2-06e7932f784a@intel.com
2024-06-05Revert "drm/xe: flush gtt before signalling user fence on all engines"Andrzej Hajda
This reverts commit 38007fa96419a9db9719f170b9e8a7877821cdd1. Signaling user-fence after seqno write does not seem to be good solution. Instead of changing order separate barrier should be put before user-fence, this will be done in separate patch. v2: added fixes tag in case reverted patch gets backported to stable Fixes: 38007fa96419 ("drm/xe: flush gtt before signalling user fence on all engines") Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240605-fix_user_fence_posted-v3-1-06e7932f784a@intel.com
2024-05-29drm/xe: Decouple xe_exec_queue and xe_lrcNiranjana Vishwanathapura
Decouple xe_lrc from xe_exec_queue and reference count xe_lrc. Removing hard coupling between xe_exec_queue and xe_lrc allows flexible design where the user interface xe_exec_queue can be destroyed independent of the hardware/firmware interface xe_lrc. v2: Fix lrc indexing in wq_item_append() Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240530032211.29299-1-niranjana.vishwanathapura@intel.com
2024-05-28drm/xe: flush gtt before signalling user fence on all enginesAndrzej Hajda
Tests show that user fence signalling requires kind of write barrier, otherwise not all writes performed by the workload will be available to userspace. It is already done for render and compute, we need it also for the rest: video, gsc, copy. v2: added gsc and copy engines, added fixes and r-b tags Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1488 Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522-xu_flush_vcs_before_ufence-v2-1-9ac3e9af0323@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
2024-05-27drm/xe: Don't initialize fences at xe_sched_job_create()Thomas Hellström
Pre-allocate but don't initialize fences at xe_sched_job_create(), and initialize / arm them instead at xe_sched_job_arm(). This makes it possible to move xe_sched_job_create() with its memory allocation out of any lock that is required for fence initialization, and that may not allow memory allocation under it. Replaces the struct dma_fence_array for parallell jobs with a struct dma_fence_chain, since the former doesn't allow a split-up between allocation and initialization. v2: - Rebase. - Don't always use the first lrc when initializing parallel lrc fences. - Use dma_fence_chain_contained() to access the lrc fences. v4: - Add an assert that job->lrc_seqno == fence->seqno. (Matthew Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> #v2 Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-4-thomas.hellstrom@linux.intel.com
2024-05-27drm/xe: Decouple job seqno and lrc seqnoMatthew Brost
Tightly coupling these seqno presents problems if alternative fences for jobs are used. Decouple these for correctness. v2: - Slightly reword commit message (Thomas) - Make sure the lrc fence ops are used in comparison (Thomas) - Assume seqno is unsigned rather than signed in format string (Thomas) Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-2-thomas.hellstrom@linux.intel.com
2024-05-09drm/xe: Move xe_gpu_commands.h file to instructions/Michal Wajdeczko
All other files with commands definitions are in instructions/ folder. Move xe_gpu_commands.h also there. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240508174856.1908-1-michal.wajdeczko@intel.com
2024-04-08drm/xe/vf: Don't emit access to Global HWSP if VFMichal Wajdeczko
VFs can't access Global HWSP, don't emit questionable MI_FLUSH_DW while processing a migration job. Bspec: 52398 Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240405133936.891-2-michal.wajdeczko@intel.com
2024-03-28drm/xe: Use ring ops TLB invalidation for rebindsThomas Hellström
For each rebind we insert a GuC TLB invalidation and add a corresponding unordered TLB invalidation fence. This might add a huge number of TLB invalidation fences to wait for so rather than doing that, defer the TLB invalidation to the next ring ops for each affected exec queue. Since the TLB is invalidated on exec_queue switch, we need to invalidate once for each affected exec_queue. v2: - Simplify if-statements around the tlb_flush_seqno. (Matthew Brost) - Add some comments and asserts. Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after rebinds issued from execs") Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240327091136.3271-2-thomas.hellstrom@linux.intel.com
2024-02-21drm/xe: Do not include current dir for generated/xe_wa_oob.hDafna Hirschfeld
The generated file 'generated/xe_wa_oob.h' is included using: "generated/xe_wa_oob.h" which first look inside the source code. But the file resides in the build directory and should therefore be included using: <generated/xe_wa_oob.h> Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240221083622.1584492-1-dhirschfeld@habana.ai
2024-01-30drm/xe: Use function to emit PIPE_CONTROLJosé Roberto de Souza
This reduces code duplication in xe_ring_ops. v2: - fix flags of emit_pipe_imm_ggtt() - reduce to only one function v3: - fix emit_pipe_imm_ggtt() stall_only check Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240130132249.8615-1-jose.souza@intel.com
2023-12-21drm/xe: Drop some unnecessary header includesMatt Roper
Several files were including register headers that they no longer require. Drop the unnecessary includes to reduce build dependencies. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20231214184659.2249559-18-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe/xe2: Add workaround 16020292621Tejas Upadhyay
Workaround applies to Graphics 20.04 as part of ring submission V4(MattR): - Rule for engine in oob WA not supported, add explicitly V3(MattR): - Pass hwe and rename API name to hint end of ring work - Use existing RING_NOPID API V2: - Marking this WA for 20.04 instead of 20.00 Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe/migrate: fix MI_ARB_ON_OFF usageMatthew Auld
Spec says: "This is a privileged command; it will not be effective (will be converted to a no-op) if executed from within a non-privileged batch buffer." However here it looks like we are just emitting it inside some bb which was jumped to via the ppGTT, which should be considered a non-privileged address space. It looks like we just need some way of preventing things like the emit_pte() and later copy/clear being preempted in-between so rather just emit directly in the ring for migration jobs. Bspec: 45716 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe: Extract MI_* instructions to their own headerMatt Roper
Extracting the common MI_* instructions that can be used with any engine to their own header will make it easier as we add additional engine instructions in upcoming patches. Also, since the majority of GPU instructions (both MI and non-MI) have a "length" field in bits 7:0 of the instruction header, a common define is added for that. Instruction-specific length fields are still defined for special case instructions that have larger/smaller length fields. v2: - Use "instr" instead of "inst" as the short form of "instruction" everywhere. (Lucas) - Include xe_reg_defs.h instead of the i915 compat header. (Lucas) Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20231016163449.1300701-12-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe: Clarify number of dwords/qwords stored by MI_STORE_DATA_IMMMatt Roper
MI_STORE_DATA_IMM can store either dword values or qword values, and can store more than one value if the instruction's length field is large enough. Create explicit defines to specify the number of dwords/qwords to be stored, which will set the instruction length correctly and, if necessary, turn on the 'store qword' bit. While we're here, also replace an open-coded version of MI_STORE_DATA_IMM with the common macros. Bspec: 60246 Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20231016163449.1300701-11-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe: Separate number of registers from MI_LRI opcodeMatt Roper
Keeping the number of registers to be loaded as a separate macro from the instruction opcode will simplify some upcoming LRC parsing code. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20231016163449.1300701-10-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe: Make MI_FLUSH_DW immediate size more explicitMatt Roper
Despite its name, MI_FLUSH_DW instruction can write an immediate value of either dword size or qword size, depending on the 'length' field of the instruction. Since "length" excludes the first two dwords of the instruction, a value of 2 in the length field implies a dword write and a value of 3 implies a qword write. Even in cases where the flush instruction's post-sync operation is set to "no write" we're still expected to size the overall instruction as if we were doing a dword or qword write (i.e., a length of 1 shouldn't be used on modern platforms). Rather than baking a size of "1" into the #define and then adding another unexplained "+ 1" at all the spots where the definition gets used, lets just create MI_FLUSH_IMM_DW and MI_FLUSH_IMM_QW definitions that should be OR'd into the instruction header to make it more explicit what behavior we're requesting. Bspec: 60229 Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20231016163449.1300701-9-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe: Use Xe assert macros instead of XE_WARN_ON macroFrancois Dugast
The XE_WARN_ON macro maps to WARN_ON which is not justified in many cases where only a simple debug check is needed. Replace the use of the XE_WARN_ON macro with the new xe_assert macros which relies on drm_*. This takes a struct drm_device argument, which is one of the main changes in this commit. The other main change is that the condition is reversed, as with XE_WARN_ON a message is displayed if the condition is true, whereas with xe_assert it is if the condition is false. v2: - Rebase - Keep WARN splats in xe_wopcm.c (Matt Roper) v3: - Rebase Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe: standardize vm-less kernel submissionsDaniele Ceraolo Spurio
The current only submission in the driver that doesn't use a vm is the WA setup. We still pass a vm structure (the migration one), but we don't actually use it at submission time and we instead have an hack to use GGTT for this particular engine. Instead of special-casing the WA engine, we can skip providing a VM and use that as selector for whether to use GGTT or PPGTT. As part of this change, we can drop the special engine flag for the WA engine and switch the WA submission to use the standard job functions instead of dedicated ones. v2: rebased on s/engine/exec_queue Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20230822173334.1664332-4-daniele.ceraolospurio@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe: fix submissions without vmDaniele Ceraolo Spurio
Kernel queues can submit privileged batches directly in GGTT, so they don't always need a vm. The submission front-end already supports creating and submitting jobs without a vm, but some parts of the back-end assume the vm is always there. Fix this by handling a lack of vm in the back-end as well. v2: s/XE_BUG_ON/XE_WARN_ON, s/engine/exec_queue Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20230822173334.1664332-2-daniele.ceraolospurio@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe/xe2: AuxCCS is no longer usedMatt Roper
Starting with Xe2, all platforms (including igpu platforms) use FlatCCS compression rather than AuxCCS. Similar to PVC, any future platforms that don't support FlatCCS should not attempt to fall back to AuxCCS programming. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe: add GSCCS ring opsDaniele Ceraolo Spurio
Like the BCS, the GSCCS doesn't have any special HW that needs handling when emitting commands, so we can re-use the same emit_job code. To make it clear that this is now a shared low-level function, it has been renamed to use the "simple" postfix, instead of "copy", to indicate that it applies to all engines that don't need any additional engine-specific handling. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230817201831.1583172-5-daniele.ceraolospurio@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe: Rename engine to exec_queueFrancois Dugast
Engine was inappropriately used to refer to execution queues and it also created some confusion with hardware engines. Where it applies the exec_queue variable name is changed to q and comments are also updated. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/162 Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe: Prefer WARN() over BUG() to avoid crashing the kernelFrancois Dugast
Replace calls to XE_BUG_ON() with calls XE_WARN_ON() which in turn calls WARN() instead of BUG(). BUG() crashes the kernel and should only be used when it is absolutely unavoidable in case of catastrophic and unrecoverable failures, which is not the case here. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19drm/xe: Emit a render cache flush after each rcs/ccs batchThomas Hellström
We need to flush render caches before fence signalling, where we might release the memory for reuse. We can't rely on userspace doing this, so flush render caches after the batch, but before user fence- and dma_fence signalling. Copy the cache flush from i915, but omit PIPE_CONTROL_FLUSH_L3, since it should be implied by the other flushes. Also omit PIPE_CONTROL_TLB_INVALIDATE since there should be no apparent need to invalidate TLB after batch completion. v2: - Update Makefile for OOB WA. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Tested-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> #1 Reported-by: José Roberto de Souza <jose.souza@intel.com> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/291 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/291 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19drm/xe: Invalidate TLB also on bind if in scratch page modeThomas Hellström
For scratch table mode we need to cover the case where a scratch PTE might have been pre-fetched and cached and used instead of that of the newly bound vma. For compute vms, invalidate TLB globally using GuC before signalling bind complete. For !long-running vms, invalidate TLB at batch start. Also document how TLB invalidation works. v2: - Fix a pointer to the comment about TLB invalidation (Jose Souza). - Add a bool to the vm whether we want to invalidate TLB at batch start. - Invalidate TLB also on BCS- and video engines at batch start where needed. - Use BIT() macro instead of explicit shift. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Tested-by: José Roberto de Souza <jose.souza@intel.com> #v1 Reported-by: José Roberto de Souza <jose.souza@intel.com> #v1 Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/291 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/291 Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19drm/xe: Introduce xe_tileMatt Roper
Create a new xe_tile structure to begin separating the concept of "tile" from "GT." A tile is effectively a complete GPU, and a GT is just one part of that. On platforms like MTL, there's only a single full GPU (tile) which has its IP blocks provided by two GTs. In contrast, a "multi-tile" platform like PVC is basically multiple complete GPUs packed behind a single PCI device. For now, just create xe_tile as a simple wrapper around xe_gt. The items in xe_gt that are truly tied to the tile rather than the GT will be moved in future patches. Support for multiple GTs per tile (i.e., the MTL standalone media case) will also be re-introduced in a future patch. v2: - Fix kunit test build - Move hunk from next patch to use local tile variable rather than direct xe->tiles[id] accesses. (Lucas) - Mention compute in kerneldoc. (Rodrigo) Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20230601215244.678611-3-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19drm/xe: Replace PVC check by engine type checkJosé Roberto de Souza
__emit_job_gen12_render_compute() masks some PIPE_CONTROL bits that do not exist in platforms without render engine. So here replacing the PVC check by something more generic that will support any future platforms without render engine. Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19drm/xe/pvc: Don't try to invalidate AuxCCS TLBMatt Roper
Generally !has_flatccs implies that a platform has AuxCCS compression and thus needs to invalidate the AuxCCS TLB. However PVC is a special case because it has no compression of either type (FlatCCS or AuxCCS) so we should avoid writing to non-existent AuxCCS registers. Reviewed-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Link: https://lore.kernel.org/r/20230524192635.673293-1-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19drm/xe: Rename reg field to addrLucas De Marchi
Rename the address field to "addr" rather than "reg" so it's easier to understand what it is. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20230508225322.2692066-4-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19drm/xe/mmio: Use struct xe_regLucas De Marchi
Convert all the callers to deal with xe_mmio_*() using struct xe_reg instead of plain u32. In a few places there was also a rename s/reg/reg_val/ when dealing with the value returned so it doesn't get mixed up with the register address. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20230508225322.2692066-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19drm/xe: Do not mark 1809175790 as a WALucas De Marchi
Additional programming annotated with Wa_<number> should be reserved to those that have a official workaround. Just pointing to a bug or additional reference can be done with something else. Copy what i915 does and refer to it as "hsdes: ....". Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20230504073250.1436293-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19drm/xe: Drop gen afixes from registersLucas De Marchi
The defines for the registers were brought over from i915 while bootstrapping the driver. As xe supports TGL and later only, it doesn't make sense to keep the GEN* prefixes and suffixes in the registers: TGL is graphics version 12, previously called "GEN12". So drop the prefix everywhere. v2: - Also drop _TGL suffix and reword commit message as suggested by Matt Roper. While at it, rename VSUNIT_CLKGATE_DIS_TGL to VSUNIT_CLKGATE2_DIS with the additional "2", so it doesn't clash with the define for the other register Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230427223256.1432787-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19drm/xe: Reinstate render / compute cache invalidation in ring opsMatthew Brost
Render / compute engines have additional caches (not just TLBs) that need to be invalidated each batch, reinstate these invalidations in ring ops. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Suggested-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19drm/xe: Remove dependency on i915_reg.hLucas De Marchi
Copy the macros used by xe in i915_reg.h to regs/xe_regs.h. A minimal cleanup is done while copying so they adhere minimally to the coding style. Further reordering and cleaning is left for later. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19drm/xe: Remove dependency on intel_gpu_commands.hLucas De Marchi
Copy the macros used by xe in intel_gpu_commands.h to regs/xe_gpu_commands.h. PIPE_CONTROL_3D_ENGINE_FLAGS and PIPE_CONTROL_3D_ARCH_FLAGS were already defined in drivers/gpu/drm/xe/xe_ring_ops.c and only used there. So let that define to be used instead of also adding to the new header. v2: Let PIPE_CONTROL_3D_ENGINE_FLAGS/PIPE_CONTROL_3D_ARCH_FLAGS in the only .c that uses it instead of redefining (Matt Roper) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19drm/xe: Remove dependency on intel_lrc_reg.hLucas De Marchi
Create regs/xe_lrc_layout.h file with all the offsets used by the xe driver. Eventually the xe driver may use a different way to define them since it doesn't supported below gen12. v2: Rename file to intel_lrc_layout.h since it's not really about registers (Matt Roper) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19drm/xe: Remove dependency on intel_gt_regs.hLucas De Marchi
Create regs/xe_gt_regs.h file with all the registers and bit definitions used by the xe driver. Eventually the registers may be defined in a different way and since xe doesn't supported below gen12, the number of registers touched is much smaller, so create a new header. The definitions themselves are direct copy from the gt/intel_gt_regs.h file, just sorting the registers by address. Cleaning those up and adhering to a common coding style is left for later. v2: Make the change to MCR_REG location in a separate patch to go through the i915 branch (Matt Roper / Rodrigo) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>