Age | Commit message (Collapse) | Author |
|
Split NV01_ROOT handling out into its own module.
Aside from moving the function pointers, no code change is intended.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Split base RM_ALLOC handling out into its own module.
Aside from moving the function pointers, no code change is intended.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Split base RM_CONTROL handling out into its own module.
Aside from moving the function pointers, no code change is intended.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Later patches in the series add HALs around various RM APIs in order to
support a newer version of GSP-RM firmware. In order to do this, begin
by splitting the code up into "modules" that roughly represent RM's API
boundaries so they can be more easily managed.
Aside from moving the RPC function pointers, no code change is indended.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
In order to specify a channel ID to RM during channel allocation, the
channel ID is broken down into a "userd page" index and an index into
that page.
It was assumed that RM would enforce that the same physical block of
memory be used for all CHIDs within a "userd page", and the GSP paths
override NVKM's normal CHID allocation to handle this.
However, none of that turns out to be necessary.
Remove the GSP-specific code and use the regular CHID allocation path.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Though the initial upstreamed GSP-RM version in nouveau was 535.113.01,
the code was developed against earlier versions.
535.42.02 modified the mailbox value used by GSP-RM to signal shutdown
has completed, which was missed at the time.
I'm not aware of any issues caused by this, but noticed the bug while
working on GB20x support.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Some GSP RPC commands need a new reply policy: "caller don't care about
the message content but want to make sure a reply is received". To
support this case, a new reply policy is introduced.
NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY is a large GSP RPC command. The actual
required policy is NVKM_GSP_RPC_REPLY_POLL. This can be observed from the
dump of the GSP message queue. After the large GSP RPC command is issued,
GSP will write only an empty RPC header in the queue as the reply.
Without this change, the policy "receiving the entire message" is used
for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY. This causes the timeout of receiving
the returned GSP message in the suspend/resume path.
Introduce the new reply policy NVKM_GSP_RPC_REPLY_POLL, which waits for
the returned GSP message but discards it for the caller. Use the new policy
NVKM_GSP_RPC_REPLY_POLL on the GSP RPC command
NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY.
Fixes: 50f290053d79 ("drm/nouveau: support handling the return of large GSP message")
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250227013554.8269-3-zhiw@nvidia.com
|
|
There can be multiple cases of handling the GSP RPC messages, which are
the reply of GSP RPC commands according to the requirement of the
callers and the nature of the GSP RPC commands.
The current supported reply policies are "callers don't care" and "receive
the entire message" according to the requirement of the callers. To
introduce a new policy, factor out the current RPC command reply polices.
Also, centralize the handling of the reply in a single function.
Factor out NVKM_GSP_RPC_REPLY_NOWAIT as "callers don't care" and
NVKM_GSP_RPC_REPLY_RECV as "receive the entire message". Introduce a
kernel doc to document the policies. Factor out
r535_gsp_rpc_handle_reply().
No functional change is intended for small GSP RPC commands. For large GSP
commands, the caller decides the policy of how to handle the returned GSP
RPC message.
Cc: Ben Skeggs <bskeggs@nvidia.com>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250227013554.8269-2-zhiw@nvidia.com
|
|
If "rpc" is an error pointer then return directly. Otherwise it leads
to an error pointer dereference.
Fixes: 50f290053d79 ("drm/nouveau: support handling the return of large GSP message")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/b7052ac0-98e4-433b-ad58-f563bf51858c@stanley.mountain
|
|
As the GSP message recv path is able to handle the return of large GSP
message, consume the return of large GSP message in the sending path.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-16-zhiw@nvidia.com
|
|
The max GSP message element size is 16 pages (including the headers). To
send a message larger than 16 pages, nvkm should split it into multiple
and send them accordingly. The first element has the expected function
number, while the rest are sent with function number as
NV_VGPU_MSG_FUNCTION_CONTINUATION_RECORD. GSP consumes the elements from
the cmdq and always writes the result back to the msgq. The result is also
formed as split elements.
However, nvkm is able to split the large GSP message and send them, but
totally not aware of handling the return of the large GSP message, which
are the split elements in the msgq. Thus, it keeps dumping the unknown RPC
messages from msgq, which is actually CONTINUATION_RECORD message,
discard them unexpectedly. Thus, the caller will not be able to consume
the result from GSP.
Introduce the handling of the return of large GSP message on the msgq path.
Slightly re-factor the low-level part of msg receiving routines. Merge the
split elements back into a large element before handling it to the upper
level. Thus, the upper-level of GSP RPC APIs don't need to be heavily
changed.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-15-zhiw@nvidia.com
|
|
Prepare for supporting receive the large GSP RPC message.
Factor out r535_gsp_msgq_recv_one_elem(). Fold its params into a data
structure of params. Move the allocation of the GSP RPC message to its
caller. Refine the variable names in the re-factor.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-14-zhiw@nvidia.com
|
|
To receive a GSP message queue element from the GSP status queue, the
driver needs to make sure there are available elements in the queue.
The previous r535_gsp_msgq_wait() consists of three functions, which is
a little too complicated for a single function:
- wait for an available element.
- peek the message element header in the queue.
- recevice the element from the queue.
Factor out r535_gsp_msgq_peek() and divide the functions in
r535_gsp_msgq_wait() into three functions.
No functional change is intended.
Cc: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-13-zhiw@nvidia.com
|
|
Refine the name to align with the terms in the kernel doc.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-12-zhiw@nvidia.com
|
|
The variable "msg" in r535_gsp_msg_recv() actually means the GSP RPC.
Refine the names to align with the terms in the kernel doc.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-11-zhiw@nvidia.com
|
|
The variable names in r535_gsp_rpc_push() are quite confusing and some
of them are not representing what they really are.
Update the names and explanations in the decoder section of the
kernel doc. Refine the names to align with the terms in the kernel doc.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-10-zhiw@nvidia.com
|
|
There has been a GSP_MSG_MAX_SIZE which represents the max size of a GSP
message element header. Use it instead of a magic number.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-9-zhiw@nvidia.com
|
|
The macro GSP_MSG_MAX_SIZE refers to another macro that doesn't exist.
It represents the max GSP message element size.
Fix the broken marco so it can be used to replace some magic numbers in
the code.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-8-zhiw@nvidia.com
|
|
The name "argc" has different meanings in different functions.
To improve the readability, it's better to refine it to a name that
reflects what it represents.
Rename "argc" to what it represents. Add terms in the decoder section to
explain their meaning.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
[ Fix indentation. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-7-zhiw@nvidia.com
|
|
The name "argv" has different meanings in different functions.
To improve the readability, it's better to refine it to a name that
reflects what it represents.
Rename "argv" to what it represents. Wrap the long container_of() into
to_payload_header() to denote a clear meaning and make checkpatch.pl
happy.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-6-zhiw@nvidia.com
|
|
The user of *rm_alloc_push() always pass 0 in repc.
Remove unused param repc since no user actually uses it.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-5-zhiw@nvidia.com
|
|
The name "argv" has different meanings in different functions.
To improve the readability, it's better to refine it to a name that
reflects what it represents.
Rename "repc" to what it represents in the GSP message send path.
Wrap the long container_of() into to_gsp_hdr() to make checkpatch.pl
happy.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-4-zhiw@nvidia.com
|
|
The name "repc" has different meanings in different contexts.
To improve the readability, it's better to refine it to a name that
reflects what it actually represents.
Rename "repc" to "gsp_rpc_len" in the GSP message recv path. Add an
section in the doc to explain the terms.
No functional change is intended.
Cc: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-3-zhiw@nvidia.com
|
|
In order to explain the name clean-ups in GSP RPC routines, a kernel
doc to explain the memory layout and terms is required.
Add a kernel doc to introduce the GSP RPC.
Cc: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
[ Fix bullet list indentation; add SPDX-License-Identifier. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-2-zhiw@nvidia.com
|
|
Fix some malformed kernel-doc comments that were added in a recent commit.
Also, kernel-doc does not support global variables, so change those
kernel-doc comments into regular comments.
Fixes: 214c9539cf2f ("drm/nouveau: expose GSP-RM logging buffers via debugfs")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202412310834.jtCJj4oz-lkp@intel.com/
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Reviewed-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250108234329.842256-1-ttabi@nvidia.com
|
|
The LOGINIT, LOGINTR, LOGRM, and LOGPMU buffers are circular buffers
that have printf-like logs from GSP-RM and PMU encoded in them.
LOGINIT, LOGINTR, and LOGRM are allocated by Nouveau and their DMA
addresses are passed to GSP-RM during initialization. The buffers are
required for GSP-RM to initialize properly.
LOGPMU is also allocated by Nouveau, but its contents are updated
when Nouveau receives an NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT RPC from
GSP-RM. Nouveau then copies the RPC to the buffer.
The messages are encoded as an array of variable-length structures that
contain the parameters to an NV_PRINTF call. The format string and
parameter count are stored in a special ELF image that contains only
logging strings. This image is not currently shipped with the Nvidia
driver.
There are two methods to extract the logs.
OpenRM tries to load the logging ELF, and if present, parses the log
buffers in real time and outputs the strings to the kernel console.
Alternatively, and this is the method used by this patch, the buffers
can be exposed to user space, and a user-space tool (along with the
logging ELF image) can parse the buffer and dump the logs.
This method has the advantage that it allows the buffers to be parsed
even when the logging ELF file is not available to the user. However,
it has the disadvantage the debugfs entries need to remain until the
driver is unloaded.
The buffers are exposed via debugfs. If GSP-RM fails to initialize, then
Nouveau immediately shuts down the GSP interface. This would normally
also deallocate the logging buffers, thereby preventing the user from
capturing the debug logs.
To avoid this, introduce the keep-gsp-logging command line parameter. If
specified, and if at least one logging buffer has content, then Nouveau
will migrate these buffers into new debugfs entries that are retained
until the driver unloads.
An end-user can capture the logs using the following commands:
cp /sys/kernel/debug/nouveau/<path>/loginit loginit
cp /sys/kernel/debug/nouveau/<path>/logrm logrm
cp /sys/kernel/debug/nouveau/<path>/logintr logintr
cp /sys/kernel/debug/nouveau/<path>/logpmu logpmu
where (for a PCI device) <path> is the PCI ID of the GPU (e.g.
0000:65:00.0).
Since LOGPMU is not needed for normal GSP-RM operation, it is only
created if debugfs is available. Otherwise, the
NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT RPCs are ignored.
A simple way to test the buffer migration feature is to have
nvkm_gsp_init() return an error code.
Tested-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20241030202952.694055-2-ttabi@nvidia.com
|
|
Store the struct device pointer used to allocate the DMA buffer in
the nvkm_gsp_mem object. This allows nvkm_gsp_mem_dtor() to release
the buffer without needing the nvkm_gsp. This is needed so that
we can retain DMA buffers even after the nvkm_gsp object is deleted.
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20241030202952.694055-1-ttabi@nvidia.com
|
|
Kickstart 6.14 cycle.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
r535_gsp_cmdq_push() waits for the available page in the GSP cmdq
buffer when handling a large RPC request. When it sees at least one
available page in the cmdq, it quits the waiting with the amount of
free buffer pages in the queue.
Unfortunately, it always takes the [write pointer, buf_size) as
available buffer pages before rolling back and wrongly calculates the
size of the data should be copied. Thus, it can overwrite the RPC
request that GSP is currently reading, which causes GSP hang due
to corrupted RPC request:
[ 549.209389] ------------[ cut here ]------------
[ 549.214010] WARNING: CPU: 8 PID: 6314 at drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:116 r535_gsp_msgq_wait+0xd0/0x190 [nvkm]
[ 549.225678] Modules linked in: nvkm(E+) gsp_log(E) snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_timer(E) snd_seq_device(E) snd(E) soundcore(E) rfkill(E) qrtr(E) vfat(E) fat(E) ipmi_ssif(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) mlx5_ib(E) amd64_edac(E) edac_mce_amd(E) kvm_amd(E) ib_uverbs(E) kvm(E) ib_core(E) acpi_ipmi(E) ipmi_si(E) mxm_wmi(E) ipmi_devintf(E) rapl(E) i2c_piix4(E) wmi_bmof(E) joydev(E) ptdma(E) acpi_cpufreq(E) k10temp(E) pcspkr(E) ipmi_msghandler(E) xfs(E) libcrc32c(E) ast(E) i2c_algo_bit(E) crct10dif_pclmul(E) drm_shmem_helper(E) nvme_tcp(E) crc32_pclmul(E) ahci(E) drm_kms_helper(E) libahci(E) nvme_fabrics(E) crc32c_intel(E) nvme(E) cdc_ether(E) mlx5_core(E) nvme_core(E) usbnet(E) drm(E) libata(E) ccp(E) ghash_clmulni_intel(E) mii(E) t10_pi(E) mlxfw(E) sp5100_tco(E) psample(E) pci_hyperv_intf(E) wmi(E) dm_multipath(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) be2iscsi(E) bnx2i(E) cnic(E) uio(E) cxgb4i(E) cxgb4(E) tls(E) libcxgbi(E) libcxgb(E) qla4xxx(E)
[ 549.225752] iscsi_boot_sysfs(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) fuse(E) [last unloaded: gsp_log(E)]
[ 549.326293] CPU: 8 PID: 6314 Comm: insmod Tainted: G E 6.9.0-rc6+ #1
[ 549.334039] Hardware name: ASRockRack 1U1G-MILAN/N/ROMED8-NL, BIOS L3.12E 09/06/2022
[ 549.341781] RIP: 0010:r535_gsp_msgq_wait+0xd0/0x190 [nvkm]
[ 549.347343] Code: 08 00 00 89 da c1 e2 0c 48 8d ac 11 00 10 00 00 48 8b 0c 24 48 85 c9 74 1f c1 e0 0c 4c 8d 6d 30 83 e8 30 89 01 e9 68 ff ff ff <0f> 0b 49 c7 c5 92 ff ff ff e9 5a ff ff ff ba ff ff ff ff be c0 0c
[ 549.366090] RSP: 0018:ffffacbccaaeb7d0 EFLAGS: 00010246
[ 549.371315] RAX: 0000000000000000 RBX: 0000000000000012 RCX: 0000000000923e28
[ 549.378451] RDX: 0000000000000000 RSI: 0000000055555554 RDI: ffffacbccaaeb730
[ 549.385590] RBP: 0000000000000001 R08: ffff8bd14d235f70 R09: ffff8bd14d235f70
[ 549.392721] R10: 0000000000000002 R11: ffff8bd14d233864 R12: 0000000000000020
[ 549.399854] R13: ffffacbccaaeb818 R14: 0000000000000020 R15: ffff8bb298c67000
[ 549.406988] FS: 00007f5179244740(0000) GS:ffff8bd14d200000(0000) knlGS:0000000000000000
[ 549.415076] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 549.420829] CR2: 00007fa844000010 CR3: 00000001567dc005 CR4: 0000000000770ef0
[ 549.427963] PKRU: 55555554
[ 549.430672] Call Trace:
[ 549.433126] <TASK>
[ 549.435233] ? __warn+0x7f/0x130
[ 549.438473] ? r535_gsp_msgq_wait+0xd0/0x190 [nvkm]
[ 549.443426] ? report_bug+0x18a/0x1a0
[ 549.447098] ? handle_bug+0x3c/0x70
[ 549.450589] ? exc_invalid_op+0x14/0x70
[ 549.454430] ? asm_exc_invalid_op+0x16/0x20
[ 549.458619] ? r535_gsp_msgq_wait+0xd0/0x190 [nvkm]
[ 549.463565] r535_gsp_msg_recv+0x46/0x230 [nvkm]
[ 549.468257] r535_gsp_rpc_push+0x106/0x160 [nvkm]
[ 549.473033] r535_gsp_rpc_rm_ctrl_push+0x40/0x130 [nvkm]
[ 549.478422] nvidia_grid_init_vgpu_types+0xbc/0xe0 [nvkm]
[ 549.483899] nvidia_grid_init+0xb1/0xd0 [nvkm]
[ 549.488420] ? srso_alias_return_thunk+0x5/0xfbef5
[ 549.493213] nvkm_device_pci_probe+0x305/0x420 [nvkm]
[ 549.498338] local_pci_probe+0x46/0xa0
[ 549.502096] pci_call_probe+0x56/0x170
[ 549.505851] pci_device_probe+0x79/0xf0
[ 549.509690] ? driver_sysfs_add+0x59/0xc0
[ 549.513702] really_probe+0xd9/0x380
[ 549.517282] __driver_probe_device+0x78/0x150
[ 549.521640] driver_probe_device+0x1e/0x90
[ 549.525746] __driver_attach+0xd2/0x1c0
[ 549.529594] ? __pfx___driver_attach+0x10/0x10
[ 549.534045] bus_for_each_dev+0x78/0xd0
[ 549.537893] bus_add_driver+0x112/0x210
[ 549.541750] driver_register+0x5c/0x120
[ 549.545596] ? __pfx_nvkm_init+0x10/0x10 [nvkm]
[ 549.550224] do_one_initcall+0x44/0x300
[ 549.554063] ? do_init_module+0x23/0x240
[ 549.557989] do_init_module+0x64/0x240
Calculate the available buffer page before rolling back based on
the result from the waiting.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017071922.2518724-3-zhiw@nvidia.com
|
|
A GSP event message consists three parts: message header, RPC header,
message body. GSP calculates the number of pages to write from the
total size of a GSP message. This behavior can be observed from the
movement of the write pointer.
However, nvkm takes only the size of RPC header and message body as
the message size when advancing the read pointer. When handling a
two-page GSP message in the non rollback case, It wrongly takes the
message body of the previous message as the message header of the next
message. As the "message length" tends to be zero, in the calculation of
size needs to be copied (0 - size of (message header)), the size needs to
be copied will be "0xffffffxx". It also triggers a kernel panic due to a
NULL pointer error.
[ 547.614102] msg: 00000f90: ff ff ff ff ff ff ff ff 40 d7 18 fb 8b 00 00 00 ........@.......
[ 547.622533] msg: 00000fa0: 00 00 00 00 ff ff ff ff ff ff ff ff 00 00 00 00 ................
[ 547.630965] msg: 00000fb0: ff ff ff ff ff ff ff ff 00 00 00 00 ff ff ff ff ................
[ 547.639397] msg: 00000fc0: ff ff ff ff 00 00 00 00 ff ff ff ff ff ff ff ff ................
[ 547.647832] nvkm 0000:c1:00.0: gsp: peek msg rpc fn:0 len:0x0/0xffffffffffffffe0
[ 547.655225] nvkm 0000:c1:00.0: gsp: get msg rpc fn:0 len:0x0/0xffffffffffffffe0
[ 547.662532] BUG: kernel NULL pointer dereference, address: 0000000000000020
[ 547.669485] #PF: supervisor read access in kernel mode
[ 547.674624] #PF: error_code(0x0000) - not-present page
[ 547.679755] PGD 0 P4D 0
[ 547.682294] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 547.686643] CPU: 22 PID: 322 Comm: kworker/22:1 Tainted: G E 6.9.0-rc6+ #1
[ 547.694893] Hardware name: ASRockRack 1U1G-MILAN/N/ROMED8-NL, BIOS L3.12E 09/06/2022
[ 547.702626] Workqueue: events r535_gsp_msgq_work [nvkm]
[ 547.707921] RIP: 0010:r535_gsp_msg_recv+0x87/0x230 [nvkm]
[ 547.713375] Code: 00 8b 70 08 48 89 e1 31 d2 4c 89 f7 e8 12 f5 ff ff 48 89 c5 48 85 c0 0f 84 cf 00 00 00 48 81 fd 00 f0 ff ff 0f 87 c4 00 00 00 <8b> 55 10 41 8b 46 30 85 d2 0f 85 f6 00 00 00 83 f8 04 76 10 ba 05
[ 547.732119] RSP: 0018:ffffabe440f87e10 EFLAGS: 00010203
[ 547.737335] RAX: 0000000000000010 RBX: 0000000000000008 RCX: 000000000000003f
[ 547.744461] RDX: 0000000000000000 RSI: ffffabe4480a8030 RDI: 0000000000000010
[ 547.751585] RBP: 0000000000000010 R08: 0000000000000000 R09: ffffabe440f87bb0
[ 547.758707] R10: ffffabe440f87dc8 R11: 0000000000000010 R12: 0000000000000000
[ 547.765834] R13: 0000000000000000 R14: ffff9351df1e5000 R15: 0000000000000000
[ 547.772958] FS: 0000000000000000(0000) GS:ffff93708eb00000(0000) knlGS:0000000000000000
[ 547.781035] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 547.786771] CR2: 0000000000000020 CR3: 00000003cc220002 CR4: 0000000000770ef0
[ 547.793896] PKRU: 55555554
[ 547.796600] Call Trace:
[ 547.799046] <TASK>
[ 547.801152] ? __die+0x20/0x70
[ 547.804211] ? page_fault_oops+0x75/0x170
[ 547.808221] ? print_hex_dump+0x100/0x160
[ 547.812226] ? exc_page_fault+0x64/0x150
[ 547.816152] ? asm_exc_page_fault+0x22/0x30
[ 547.820341] ? r535_gsp_msg_recv+0x87/0x230 [nvkm]
[ 547.825184] r535_gsp_msgq_work+0x42/0x50 [nvkm]
[ 547.829845] process_one_work+0x196/0x3d0
[ 547.833861] worker_thread+0x2fc/0x410
[ 547.837613] ? __pfx_worker_thread+0x10/0x10
[ 547.841885] kthread+0xdf/0x110
[ 547.845031] ? __pfx_kthread+0x10/0x10
[ 547.848775] ret_from_fork+0x30/0x50
[ 547.852354] ? __pfx_kthread+0x10/0x10
[ 547.856097] ret_from_fork_asm+0x1a/0x30
[ 547.860019] </TASK>
[ 547.862208] Modules linked in: nvkm(E) gsp_log(E) snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_timer(E) snd_seq_device(E) snd(E) soundcore(E) rfkill(E) qrtr(E) vfat(E) fat(E) ipmi_ssif(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) mlx5_ib(E) edac_mce_amd(E) kvm_amd(E) ib_uverbs(E) kvm(E) ib_core(E) acpi_ipmi(E) ipmi_si(E) ipmi_devintf(E) mxm_wmi(E) joydev(E) rapl(E) ptdma(E) i2c_piix4(E) acpi_cpufreq(E) wmi_bmof(E) pcspkr(E) k10temp(E) ipmi_msghandler(E) xfs(E) libcrc32c(E) ast(E) i2c_algo_bit(E) drm_shmem_helper(E) crct10dif_pclmul(E) drm_kms_helper(E) ahci(E) crc32_pclmul(E) nvme_tcp(E) libahci(E) nvme(E) crc32c_intel(E) nvme_fabrics(E) cdc_ether(E) nvme_core(E) usbnet(E) mlx5_core(E) ghash_clmulni_intel(E) drm(E) libata(E) ccp(E) mii(E) t10_pi(E) mlxfw(E) sp5100_tco(E) psample(E) pci_hyperv_intf(E) wmi(E) dm_multipath(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) be2iscsi(E) bnx2i(E) cnic(E) uio(E) cxgb4i(E) cxgb4(E) tls(E) libcxgbi(E) libcxgb(E) qla4xxx(E)
[ 547.862283] iscsi_boot_sysfs(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) fuse(E) [last unloaded: gsp_log(E)]
[ 547.962691] CR2: 0000000000000020
[ 547.966003] ---[ end trace 0000000000000000 ]---
[ 549.012012] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 1370499158 wd_nsec: 1370498904
[ 549.043676] pstore: backend (erst) writing error (-28)
[ 549.050924] RIP: 0010:r535_gsp_msg_recv+0x87/0x230 [nvkm]
[ 549.056389] Code: 00 8b 70 08 48 89 e1 31 d2 4c 89 f7 e8 12 f5 ff ff 48 89 c5 48 85 c0 0f 84 cf 00 00 00 48 81 fd 00 f0 ff ff 0f 87 c4 00 00 00 <8b> 55 10 41 8b 46 30 85 d2 0f 85 f6 00 00 00 83 f8 04 76 10 ba 05
[ 549.075138] RSP: 0018:ffffabe440f87e10 EFLAGS: 00010203
[ 549.080361] RAX: 0000000000000010 RBX: 0000000000000008 RCX: 000000000000003f
[ 549.087484] RDX: 0000000000000000 RSI: ffffabe4480a8030 RDI: 0000000000000010
[ 549.094609] RBP: 0000000000000010 R08: 0000000000000000 R09: ffffabe440f87bb0
[ 549.101733] R10: ffffabe440f87dc8 R11: 0000000000000010 R12: 0000000000000000
[ 549.108857] R13: 0000000000000000 R14: ffff9351df1e5000 R15: 0000000000000000
[ 549.115982] FS: 0000000000000000(0000) GS:ffff93708eb00000(0000) knlGS:0000000000000000
[ 549.124061] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 549.129807] CR2: 0000000000000020 CR3: 00000003cc220002 CR4: 0000000000770ef0
[ 549.136940] PKRU: 55555554
[ 549.139653] Kernel panic - not syncing: Fatal exception
[ 549.145054] Kernel Offset: 0x18c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 549.165074] ---[ end Kernel panic - not syncing: Fatal exception ]---
Also, nvkm wrongly advances the read pointer when handling a two-page GSP
message in the rollback case. In the rollback case, the GSP message will
be copied in two rounds. When handling a two-page GSP message, nvkm first
copies amount of (GSP_PAGE_SIZE - header) data into the buffer, then
advances the read pointer by the result of DIV_ROUND_UP(size,
GSP_PAGE_SIZE). Thus, the read pointer is advanced by 1.
Next, nvkm copies the amount of (total size - (GSP_PAGE_SIZE -
header)) data into the buffer. The left amount of the data will be always
larger than one page since the message header is not taken into account
in the first copy. Thus, the read pointer is advanced by DIV_ROUND_UP(
size(larger than one page), GSP_PAGE_SIZE) = 2.
In the end, the read pointer is wrongly advanced by 3 when handling a
two-page GSP message in the rollback case.
Fix the problems by taking the total size of the message into account
when advancing the read pointer and calculate the read pointer in the end
of the all copies for the rollback case.
BTW: the two-page GSP message can be observed in the msgq when vGPU is
enabled.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017071922.2518724-2-zhiw@nvidia.com
|
|
The upper layer transfer functions expect EBUSY as a return
for when retries should be done.
Fix the AUX error translation, but also check for both errors
in a few places.
Fixes: eb284f4b3781 ("drm/nouveau/dp: Honor GSP link training retry timeouts")
Cc: stable@vger.kernel.org
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241111034126.2028401-1-airlied@gmail.com
|
|
This aligns with what open gpu does, the 0x15 hex is just to trick you.
Fixes: 176fdcbddfd2 ("drm/nouveau/gsp/r535: add support for booting GSP-RM")
Reviewed-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240828023720.1596602-1-airlied@gmail.com
|
|
Let's start the new release cycle.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
Currently we allocate all 3 levels of radix3 page tables using
nvkm_gsp_mem_ctor(), which uses dma_alloc_coherent() for allocating all of
the relevant memory. This can end up failing in scenarios where the system
has very high memory fragmentation, and we can't find enough contiguous
memory to allocate level 2 of the page table.
Currently, this can result in runtime PM issues on systems where memory
fragmentation is high - as we'll fail to allocate the page table for our
suspend/resume buffer:
kworker/10:2: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL),
nodemask=(null),cpuset=/,mems_allowed=0
CPU: 10 PID: 479809 Comm: kworker/10:2 Not tainted
6.8.6-201.ChopperV6.fc39.x86_64 #1
Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024
Workqueue: pm pm_runtime_work
Call Trace:
<TASK>
dump_stack_lvl+0x64/0x80
warn_alloc+0x165/0x1e0
? __alloc_pages_direct_compact+0xb3/0x2b0
__alloc_pages_slowpath.constprop.0+0xd7d/0xde0
__alloc_pages+0x32d/0x350
__dma_direct_alloc_pages.isra.0+0x16a/0x2b0
dma_direct_alloc+0x70/0x270
nvkm_gsp_radix3_sg+0x5e/0x130 [nouveau]
r535_gsp_fini+0x1d4/0x350 [nouveau]
nvkm_subdev_fini+0x67/0x150 [nouveau]
nvkm_device_fini+0x95/0x1e0 [nouveau]
nvkm_udevice_fini+0x53/0x70 [nouveau]
nvkm_object_fini+0xb9/0x240 [nouveau]
nvkm_object_fini+0x75/0x240 [nouveau]
nouveau_do_suspend+0xf5/0x280 [nouveau]
nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau]
pci_pm_runtime_suspend+0x67/0x1e0
? __pfx_pci_pm_runtime_suspend+0x10/0x10
__rpm_callback+0x41/0x170
? __pfx_pci_pm_runtime_suspend+0x10/0x10
rpm_callback+0x5d/0x70
? __pfx_pci_pm_runtime_suspend+0x10/0x10
rpm_suspend+0x120/0x6a0
pm_runtime_work+0x98/0xb0
process_one_work+0x171/0x340
worker_thread+0x27b/0x3a0
? __pfx_worker_thread+0x10/0x10
kthread+0xe5/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
Luckily, we don't actually need to allocate coherent memory for the page
table thanks to being able to pass the GPU a radix3 page table for
suspend/resume data. So, let's rewrite nvkm_gsp_radix3_sg() to use the sg
allocator for level 2. We continue using coherent allocations for lvl0 and
1, since they only take a single page.
V2:
* Don't forget to actually jump to the next scatterlist when we reach the
end of the scatterlist we're currently on when writing out the page table
for level 2
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ben Skeggs <bskeggs@nvidia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-2-lyude@redhat.com
|
|
Add the missing break statement that causes the following build error
CC [M] drivers/gpu/drm/i915/display/intel_display_device.o
../drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c: In function ‘build_registry’:
../drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:1266:3: error: label at end of compound statement
1266 | default:
| ^~~~~~~
CC [M] drivers/gpu/drm/amd/amdgpu/gfx_v10_0.o
HDRTEST drivers/gpu/drm/xe/compat-i915-headers/i915_reg.h
CC [M] drivers/gpu/drm/amd/amdgpu/imu_v11_0.o
make[7]: *** [../scripts/Makefile.build:244: drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.o] Error 1
make[7]: *** Waiting for unfinished jobs....
Fixes: b58a0bc904ff ("nouveau: add command-line GSP-RM registry support")
Closes: https://lore.kernel.org/all/913052ca6c0988db1bab293cfae38529251b4594.camel@nvidia.com/T/#m3c9acebac754f2e74a85b76c858c093bb1aacaf0
Closes: https://lore.kernel.org/all/CA+G9fYu7Ug0K8h9QJT0WbtWh_LL9Juc+VC0WMU_Z_vSSPDNymg@mail.gmail.com/
Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240430131840.742924-1-chaitanya.kumar.borah@intel.com
|
|
Add the NVreg_RegistryDwords command line parameter, which allows
specifying additional registry keys to be sent to GSP-RM. This
allows additional configuration, debugging, and experimentation
with GSP-RM, which uses these keys to alter its behavior.
Note that these keys are passed as-is to GSP-RM, and Nouveau does
not parse them. This is in contrast to the Nvidia driver, which may
parse some of the keys to configure some functionality in concert with
GSP-RM. Therefore, any keys which also require action by the driver
may not function correctly when passed by Nouveau. Caveat emptor.
The name and format of NVreg_RegistryDwords is the same as used by
the Nvidia driver, to maintain compatibility.
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417215317.3490856-1-ttabi@nvidia.com
|
|
Using the end of rpc->entries[] for addressing runs into both compile-time
and run-time detection of accessing beyond the end of the array. Use the
base pointer instead, since was allocated with the additional bytes for
storing the strings. Avoids the following warning in future GCC releases
with support for __counted_by:
In function 'fortify_memcpy_chk',
inlined from 'r535_gsp_rpc_set_registry' at ../drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:1123:3:
../include/linux/fortify-string.h:553:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
553 | __write_overflow_field(p_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
for this code:
strings = (char *)&rpc->entries[NV_GSP_REG_NUM_ENTRIES];
...
memcpy(strings, r535_registry_entries[i].name, name_len);
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240330141159.work.063-kees@kernel.org
|
|
Pull drm fixes from Dave Airlie:
"Fixes from the last week (or 3 weeks in amdgpu case), after amdgpu,
it's xe and nouveau then a few scattered core fixes.
core:
- fix rounding in drm_fixp2int_round()
bridge:
- fix documentation for DRM_BRIDGE_OP_EDID
sun4i:
- fix 64-bit division on 32-bit architectures
tests:
- fix dependency on DRM_KMS_HELPER
probe-helper:
- never return negative values from .get_modes() plus driver fixes
xe:
- invalidate userptr vma on page pin fault
- fail early on sysfs file creation error
- skip VMA pinning on xe_exec if no batches
nouveau:
- clear bo resource bus after eviction
- documentation fixes
- don't check devinit disable on GSP
amdgpu:
- Freesync fixes
- UAF IOCTL fixes
- Fix mmhub client ID mapping
- IH 7.0 fix
- DML2 fixes
- VCN 4.0.6 fix
- GART bind fix
- GPU reset fix
- SR-IOV fix
- OD table handling fixes
- Fix TA handling on boards without display hardware
- DML1 fix
- ABM fix
- eDP panel fix
- DPPCLK fix
- HDCP fix
- Revert incorrect error case handling in ioremap
- VPE fix
- HDMI fixes
- SDMA 4.4.2 fix
- Other misc fixes
amdkfd:
- Fix duplicate BO handling in process restore"
* tag 'drm-next-2024-03-22' of https://gitlab.freedesktop.org/drm/kernel: (50 commits)
drm/amdgpu/pm: Don't use OD table on Arcturus
drm/amdgpu: drop setting buffer funcs in sdma442
drm/amd/display: Fix noise issue on HDMI AV mute
drm/amd/display: Revert Remove pixle rate limit for subvp
Revert "drm/amdgpu/vpe: don't emit cond exec command under collaborate mode"
Revert "drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init()"
drm/amd/display: Add a dc_state NULL check in dc_state_release
drm/amd/display: Return the correct HDCP error code
drm/amd/display: Implement wait_for_odm_update_pending_complete
drm/amd/display: Lock all enabled otg pipes even with no planes
drm/amd/display: Amend coasting vtotal for replay low hz
drm/amd/display: Fix idle check for shared firmware state
drm/amd/display: Update odm when ODM combine is changed on an otg master pipe with no plane
drm/amd/display: Init DPPCLK from SMU on dcn32
drm/amd/display: Add monitor patch for specific eDP
drm/amd/display: Allow dirty rects to be sent to dmub when abm is active
drm/amd/display: Override min required DCFCLK in dml1_validate
drm/amdgpu: Bypass display ta if display hw is not available
drm/amdgpu: correct the KGQ fallback message
drm/amdgpu/pm: Check the validity of overdiver power limit
...
|
|
kernel test robot complains about missing kerneldoc entries:
drivers-gpu-drm-nouveau-nvkm-subdev-gsp-r535.c:warning:
Function-parameter-or-struct-member-gsp-not-described-in-nvkm_gsp_radix3_sg
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240210002900.148982-1-ttabi@nvidia.com
|
|
Nouveau deallocates a few buffers post GPU init which are required for GPU suspend/resume to function correctly.
This is likely not as big an issue on systems where the NVGPU is the only GPU, but on multi-GPU set ups it leads to a regression where the kernel module errors and results in a system-wide rendering freeze.
This commit addresses that regression by moving the two buffers required for suspend and resume to be deallocated at driver unload instead of post init.
Fixes: 042b5f83841fb ("drm/nouveau: fix several DMA buffer leaks")
Signed-off-by: Sid Pranjale <sidpranjale127@protonmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Turing and Ampere will continue to use the old paths by default,
but we should allow distros to decide what the policy is.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240214040632.661069-1-airlied@gmail.com
|
|
Function nvkm_gsp_radix3_sg() uses nvkm_gsp_mem objects to allocate the
radix3 tables, but it unnecessarily creates those objects manually
instead of using the standard nvkm_gsp_mem_ctor() function like the
rest of the code does.
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240202230608.1981026-2-ttabi@nvidia.com
|
|
Nouveau manages GSP-RM DMA buffers with nvkm_gsp_mem objects. Several of
these buffers are never dealloced. Some of them can be deallocated
right after GSP-RM is initialized, but the rest need to stay until the
driver unloads.
Also futher bullet-proof these objects by poisoning the buffer and
clearing the nvkm_gsp_mem object when it is deallocated. Poisoning
the buffer should trigger an error (or crash) from GSP-RM if it tries
to access the buffer after we've deallocated it, because we were wrong
about when it is safe to deallocate.
Finally, change the mem->size field to a size_t because that's the same
type that dma_alloc_coherent expects.
Cc: <stable@vger.kernel.org> # v6.7
Fixes: 176fdcbddfd2 ("drm/nouveau/gsp/r535: add support for booting GSP-RM")
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240202230608.1981026-1-ttabi@nvidia.com
|
|
Timur pointed this out before, and it just slipped my mind,
but this might help some things work better, around pcie power
management.
Cc: <stable@vger.kernel.org> # v6.7
Fixes: 8d55b0a940bb ("nouveau/gsp: add some basic registry entries.")
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240130032643.2498315-1-airlied@gmail.com
|
|
Pull drm fixes from Dave Airlie:
"This is just a wrap up of fixes from the last few days. It has the
proper fix to the i915/xe collision, we can clean up what you did
later once rc1 lands.
Otherwise it's a few other i915, a v3d, rockchip and a nouveau fix to
make GSP load on some original Turing GPUs.
i915:
- Fixes for kernel-doc warnings enforced in linux-next
- Another build warning fix for string formatting of intel_wakeref_t
- Display fixes for DP DSC BPC and C20 PLL state verification
v3d:
- register readout fix
rockchip:
- two build warning fixes
nouveau:
- fix GSP loading on Turing with different nvdec configuration"
* tag 'drm-next-2024-01-15-1' of git://anongit.freedesktop.org/drm/drm:
nouveau/gsp: handle engines in runl without nonstall interrupts.
drm/i915/perf: reconcile Excess struct member kernel-doc warnings
drm/i915/guc: reconcile Excess struct member kernel-doc warnings
drm/i915/gt: reconcile Excess struct member kernel-doc warnings
drm/i915/gem: reconcile Excess struct member kernel-doc warnings
drm/i915/dp: Fix the max DSC bpc supported by source
drm/i915: don't make assumptions about intel_wakeref_t type
drm/i915/dp: Fix the PSR debugfs entries wrt. MST connectors
drm/i915/display: Fix C20 pll selection for state verification
drm/v3d: Fix support for register debugging on the RPi 4
drm/rockchip: vop2: Drop unused if_dclk_rate variable
drm/rockchip: vop2: Drop superfluous include
|
|
It appears on TU106 GPUs (2070), that some of the nvdec engines
are in the runlist but have no valid nonstall interrupt, nouveau
didn't handle that too well.
This should let nouveau/gsp work on those.
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://lore.kernel.org/all/20240110011826.3996289-1-airlied@gmail.com/
|
|
Fixes a memory leak seen with kmemleak.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-10-airlied@gmail.com
|
|
It looks like for some messages the upper layers need to get access to the
results of the message so we can interpret it.
Rework the ctrl push interface to not free things and cleanup properly
whereever it errors out.
Requested-by: Lyude
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-9-airlied@gmail.com
|
|
This should let the upper layers retry as needed on EAGAIN.
There may be other values we will care about in the future, but
this covers our present needs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-8-airlied@gmail.com
|
|
Currently we get an error from ACPI because both of these arguments expect
a single argument, and we don't provide one. I'm not totally clear on what
that argument does, but we're able to find the missing value from
_acpiCacheMethodData() in src/kernel/platform/acpi_common.c in nvidia's
driver. So, let's add that - which doesn't get eDP displays to power on
quite yet, but gets rid of the argument warning at least.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-7-airlied@gmail.com
|