Age | Commit message (Collapse) | Author |
|
The encode_dma() function has some validation on in_trans->size but it
would be more clear to move those checks to find_and_map_user_pages().
The encode_dma() had two checks:
if (in_trans->addr + in_trans->size < in_trans->addr || !in_trans->size)
return -EINVAL;
The in_trans->addr variable is the starting address. The in_trans->size
variable is the total size of the transfer. The transfer can occur in
parts and the resources->xferred_dma_size tracks how many bytes we have
already transferred.
This patch introduces a new variable "remaining" which represents the
amount we want to transfer (in_trans->size) minus the amount we have
already transferred (resources->xferred_dma_size).
I have modified the check for if in_trans->size is zero to instead check
if in_trans->size is less than resources->xferred_dma_size. If we have
already transferred more bytes than in_trans->size then there are negative
bytes remaining which doesn't make sense. If there are zero bytes
remaining to be copied, just return success.
The check in encode_dma() checked that "addr + size" could not overflow
and barring a driver bug that should work, but it's easier to check if
we do this in parts. First check that "in_trans->addr +
resources->xferred_dma_size" is safe. Then check that "xfer_start_addr +
remaining" is safe.
My final concern was that we are dealing with u64 values but on 32bit
systems the kmalloc() function will truncate the sizes to 32 bits. So
I calculated "total = in_trans->size + offset_in_page(xfer_start_addr);"
and returned -EINVAL if it were >= SIZE_MAX. This will not affect 64bit
systems.
Fixes: 129776ac2e38 ("accel/qaic: Add control path")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/24d3348b-25ac-4c1b-b171-9dae7c43e4e0@moroto.mountain
|
|
The temporary buffer storing slicing configuration data from user is only
freed on error. This is a memory leak. Free the buffer unconditionally.
Fixes: ff13be830333 ("accel/qaic: Add datapath")
Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230802145937.14827-1-quic_jhugo@quicinc.com
|
|
Pull virtio fixes from Michael Tsirkin:
"Just a bunch of bugfixes all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (26 commits)
virtio-mem: check if the config changed before fake offlining memory
virtio-mem: keep retrying on offline_and_remove_memory() errors in Sub Block Mode (SBM)
virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY
virtio-mem: remove unsafe unplug in Big Block Mode (BBM)
pds_vdpa: fix up debugfs feature bit printing
pds_vdpa: alloc irq vectors on DRIVER_OK
pds_vdpa: clean and reset vqs entries
pds_vdpa: always allow offering VIRTIO_NET_F_MAC
pds_vdpa: reset to vdpa specified mac
virtio-net: Zero max_tx_vq field for VIRTIO_NET_CTRL_MQ_HASH_CONFIG case
vdpa/mlx5: Fix crash on shutdown for when no ndev exists
vdpa/mlx5: Delete control vq iotlb in destroy_mr only when necessary
vdpa/mlx5: Fix mr->initialized semantics
vdpa/mlx5: Correct default number of queues when MQ is on
virtio-vdpa: Fix cpumask memory leak in virtio_vdpa_find_vqs()
vduse: Use proper spinlock for IRQ injection
vdpa: Enable strict validation for netlinks ops
vdpa: Add max vqp attr to vdpa_nl_policy for nlattr length check
vdpa: Add queue index attr to vdpa_nl_policy for nlattr length check
vdpa: Add features attr to vdpa_nl_policy for nlattr length check
...
|
|
David Vernet says:
====================
The struct bpf_struct_ops structure in BPF is a framework that allows
subsystems to extend themselves using BPF. In commit 68b04864ca425
("bpf: Create links for BPF struct_ops maps") and commit aef56f2e918bf
("bpf: Update the struct_ops of a bpf_link"), the structure was updated
to include new ->validate() and ->update() callbacks respectively in
support of allowing struct_ops maps to be created with BPF_F_LINK.
The intention was that struct bpf_struct_ops implementations could
support map updates through the link. Because map validation and
registration would take place in two separate steps for struct_ops
maps managed by the link (the first in map update elem, and the latter
in link create), the ->validate() callback was added, and any struct_ops
implementation that wished to use BPF_F_LINK, even just for lifetime
management, would then be required to define both it and ->update().
Not all struct_ops implementations can or will support update, however.
For example, the sched_ext struct_ops implementation proposed in [0]
will not be able to support atomic map updates because it can race with
sysrq, has to cycle tasks through various states in order to safely
transition, etc. It can, however, benefit from letting the BPF link
automatically evict the struc_ops map when the application exits (e.g.
if it crashes).
This patch set therefore:
1. Updates the struct_ops implementation to support default values for
->validate() and ->update() so that struct_ops implementations can
benefit from BPF_F_LINK management even if they can't support
updates.
2. Documents struct bpf_struct_ops so that the semantics are clear and
well defined.
---
v2: https://lore.kernel.org/bpf/0f5ea3de-c6e7-490f-b5ec-b5c7cd288687@gmail.com/T/
Changes from v2 -> v3:
- Add patch 2/2 that documents the struct bpf_struct_ops structure.
- Add Kui-Feng's Acked-by tag to patch 1/2.
v1: https://lore.kernel.org/lkml/20230811150934.GA542801@maniforge/
Changes from v1 -> v2:
- Move the if (!st_map->st_ops->update) check outside of the critical
section before we acquire the update_mutex.
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Subsystems that want to implement a struct bpf_struct_ops structure to
enable struct_ops maps must currently reverse engineer how the structure
works. Given that this is meant to be a way for subsystem maintainers to
extend their subsystems using BPF, let's document it to make it a bit
easier on them.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230814185908.700553-3-void@manifault.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Currently, if a struct_ops map is loaded with BPF_F_LINK, it must also
define the .validate() and .update() callbacks in its corresponding
struct bpf_struct_ops in the kernel. Enabling struct_ops link is useful
in its own right to ensure that the map is unloaded if an application
crashes. For example, with sched_ext, we want to automatically unload
the host-wide scheduler if the application crashes. We would likely
never support updating elements of a sched_ext struct_ops map, so we'd
have to implement these callbacks showing that they _can't_ support
element updates just to benefit from the basic lifetime management of
struct_ops links.
Let's enable struct_ops maps to work with BPF_F_LINK even if they
haven't defined these callbacks, by assuming that a struct_ops map
element cannot be updated by default.
Acked-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230814185908.700553-2-void@manifault.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Add several new tcx test cases to improve test coverage. This also includes
a few new tests with ingress instead of clsact qdisc, to cover the fix from
commit dc644b540a2d ("tcx: Fix splat in ingress_destroy upon tcx_entry_free").
# ./test_progs -t tc
[...]
#234 tc_links_after:OK
#235 tc_links_append:OK
#236 tc_links_basic:OK
#237 tc_links_before:OK
#238 tc_links_chain_classic:OK
#239 tc_links_chain_mixed:OK
#240 tc_links_dev_cleanup:OK
#241 tc_links_dev_mixed:OK
#242 tc_links_ingress:OK
#243 tc_links_invalid:OK
#244 tc_links_prepend:OK
#245 tc_links_replace:OK
#246 tc_links_revision:OK
#247 tc_opts_after:OK
#248 tc_opts_append:OK
#249 tc_opts_basic:OK
#250 tc_opts_before:OK
#251 tc_opts_chain_classic:OK
#252 tc_opts_chain_mixed:OK
#253 tc_opts_delete_empty:OK
#254 tc_opts_demixed:OK
#255 tc_opts_detach:OK
#256 tc_opts_detach_after:OK
#257 tc_opts_detach_before:OK
#258 tc_opts_dev_cleanup:OK
#259 tc_opts_invalid:OK
#260 tc_opts_mixed:OK
#261 tc_opts_prepend:OK
#262 tc_opts_replace:OK
#263 tc_opts_revision:OK
[...]
Summary: 44/38 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/8699efc284b75ccdc51ddf7062fa2370330dc6c0.1692029283.git.daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
The failure handling procedure destroys page pools for all queues,
including those that haven't had their page pool created yet. this patch
introduces necessary adjustments to prevent potential risks and
inconsistency with the error handling behavior.
Fixes: 0ebab78cbcbf ("net: veth: add page_pool for page recycling")
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Link: https://lore.kernel.org/r/20230812023016.10553-1-liangchen.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Michal Schmidt says:
====================
octeon_ep: fixes for error and remove paths
I have an Octeon card that's misconfigured in a way that exposes a
couple of bugs in the octeon_ep driver's error paths. It can reproduce
the issues that patches 1 & 4 are fixing. Patches 2 & 3 are a result of
reviewing the nearby code.
====================
Link: https://lore.kernel.org/r/20230810150114.107765-1-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If it fails to get the devices's MAC address, octep_probe exits while
leaving the delayed work intr_poll_task queued. When the work later
runs, it's a use after free.
Move the cancelation of intr_poll_task from octep_remove into
octep_device_cleanup. This does not change anything in the octep_remove
flow, but octep_device_cleanup is called also in the octep_probe error
path, where the cancelation is needed.
Note that the cancelation of ctrl_mbox_task has to follow
intr_poll_task's, because the ctrl_mbox_task may be queued by
intr_poll_task.
Fixes: 24d4333233b3 ("octeon_ep: poll for control messages")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Link: https://lore.kernel.org/r/20230810150114.107765-5-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
intr_poll_task may queue ctrl_mbox_task. The function
octep_poll_non_ioq_interrupts_cn93_pf does this.
When removing the driver and canceling these two works, cancel
ctrl_mbox_task last to guarantee it does not run anymore.
Fixes: 24d4333233b3 ("octeon_ep: poll for control messages")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Link: https://lore.kernel.org/r/20230810150114.107765-4-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tx_timeout_task is canceled too early when removing the driver. Nothing
prevents .ndo_tx_timeout from triggering and queuing the work again.
Better cancel it after the netdev is unregistered.
It's harmless for octep_tx_timeout_task to run in the window between the
unregistration and cancelation, because it checks netif_running.
Fixes: 862cd659a6fb ("octeon_ep: Add driver framework and device initialization")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Link: https://lore.kernel.org/r/20230810150114.107765-3-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The intention was to wait up to 500 ms for the mbox response.
The third argument to wait_event_interruptible_timeout() is supposed to
be the timeout duration. The driver mistakenly passed absolute time
instead.
Fixes: 577f0d1b1c5f ("octeon_ep: add separate mailbox command and response queues")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230810150114.107765-2-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On Zynq UltraScale+ MPSoC ubuntu platform when systemctl issues suspend,
network manager bring down the interface and goes into suspend. When it
wakes up it again enables the interface.
This leads to xilinx-psgtr "PLL lock timeout" on interface bringup, as
the power management controller power down the entire FPD (including
SERDES) if none of the FPD devices are in use and serdes is not
initialized on resume.
$ sudo rtcwake -m no -s 120 -v
$ sudo systemctl suspend <this does ifconfig eth1 down>
$ ifconfig eth1 up
xilinx-psgtr fd400000.phy: lane 0 (type 10, protocol 5): PLL lock timeout
phy phy-fd400000.phy.0: phy poweron failed --> -110
macb driver is called in this way:
1. macb_close: Stop network interface. In this function, it
reset MACB IP and disables PHY and network interface.
2. macb_suspend: It is called in kernel suspend flow. But because
network interface has been disabled(netif_running(ndev) is
false), it does nothing and returns directly;
3. System goes into suspend state. Some time later, system is
waken up by RTC wakeup device;
4. macb_resume: It does nothing because network interface has
been disabled;
5. macb_open: It is called to enable network interface again. ethernet
interface is initialized in this API but serdes which is power-off
by PMUFW during FPD-off suspend is not initialized again and so
we hit GT PLL lock issue on open.
To resolve this PLL timeout issue always do PS GTR initialization
when ethernet device is configured as non-wakeup source.
Fixes: f22bd29ba19a ("net: macb: Fix ZynqMP SGMII non-wakeup source resume failure")
Fixes: 8b73fa3ae02b ("net: macb: Added ZynqMP-specific initialization")
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://lore.kernel.org/r/1691414091-2260697-1-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a phylink_get_caps implementation for Marvell 88e6060 DSA switch.
This is a fast ethernet switch, with internal PHYs for ports 0 through
4. Port 4 also supports MII, REVMII, REVRMII and SNI. Port 5 supports
MII, REVMII, REVRMII and SNI without an internal PHY.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/E1qUkx7-003dMX-9b@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Whenever mlx5 driver is probed or reloaded, it queries some capabilities
in MAX mode via set_hca_cap() API. Afterwards, the driver queries all
capabilities in MAX mode via mlx5_query_hca_caps() API.
Since MAX caps are read only caps, querying them twice is redundant.
Hence, delete the second query.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Each device cap has two modes: MAX and CUR. The driver maintains a
cache of both modes of the capabilities. For most device caps, the MAX
cap mode is never used.
Hence, remove all driver queries of the MAX mode of the said caps as
well as their helper MACROs.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
mlx5 driver queries the device for VECTOR_CALC and SHAMPO caps, but
there isn't any user who requires them.
As well as, MLX5_MCAM_REGS_0x9080_0x90FF is queried but not used.
Thus, drop all usages and definitions of the mentioned caps above.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
sw_function_id contains sfnum, so fix the error message to name the
value properly.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Since mlx5_vhca_event_supported() is called in mlx5_sf_dev_supported(),
remove the redundant call.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
MLX5_CAP_GEN()
There is a helper called mlx5_sf_start_function_id() that
wraps up a query to get base SF function id. Use that instead of
calling MLX5_CAP_GEN() directly.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Since mlx5_sf_supported() check is done as a first thing in
mlx5_sf_max_functions(), remove the redundant check.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Instead of using device_put(), use auxiliary_device_uninit() for
auxiliary device uninit which internally just calls device_put().
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Firmware doesn't allow flow rules in FDB to do header rewrite and send
packets to both internal and uplink vports. The following syndrome
will be generated when trying to offload such kind of rules:
mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 23569): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8c8f08), err(-22)
To avoid this syndrome, add a checking before creating FTE. If a rule
with header rewrite action forwards packets to both VF and PF, an
error is returned directly.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Even if the PF driver had no error on his part of the sync reset flow,
the firmware can see wider picture as it syncs all the PFs in the flow.
So add at end of sync reset flow check with firmware by reading MFRL
register and initialization segment that the flow had no issue from
firmware point of view too.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Introduce devlink resource for exposing max possible SFs on mlx5
devices.
For example:
$ devlink resource show pci/0000:00:0b.0
pci/0000:00:0b.0:
name max_local_SFs size 5 unit entry dpipe_tables none
name max_external_SFs size 0 unit entry dpipe_tables none
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
A new check for the tx devlink health reporter is introduced for
determining when the PTP port timestamping SQ is considered unhealthy. If
there are enough CQEs considered never to be delivered, the space that can
be utilized on the SQ decreases significantly, impacting performance and
usability of the SQ. The health reporter is triggered when the number of
likely never delivered port timestamping CQEs that utilize the space of the
PTP SQ is greater than 93.75% of the total capacity of the SQ. A devlink
health reporter recover method is also provided for this specific TX error
context that restarts the PTP SQ.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Use a map structure for associating CQEs containing port timestamping
information with the appropriate skb. Track order of WQEs submitted using a
FIFO. Check if the corresponding port timestamping CQEs from the lookup
values in the FIFO are considered dropped due to time elapsed. Return the
lookup value to a freelist after consuming the skb. Reuse the freed lookup
in future WQE submission iterations.
The map structure uses an integer identifier for the key and returns an skb
corresponding to that identifier. Embed the integer identifier in the WQE
submitted to the WQ for the transmit path when the SQ is a PTP (port
timestamping) SQ. The embedded identifier can then be queried using a field
in the CQE of the corresponding port timestamping CQ. In the port
timestamping napi_poll context, the identifier is queried from the CQE
polled from CQ and used to lookup the corresponding skb from the WQE submit
path. The skb reference is removed from map and then embedded with the port
HW timestamp information from the CQE and eventually consumed.
The metadata freelist FIFO is an array containing integer identifiers that
can be pushed and popped in the FIFO. The purpose of this structure is
bookkeeping what identifier values can safely be used in a subsequent WQE
submission and should not contain identifiers that have still not been
reaped by processing a corresponding CQE completion on the port
timestamping CQ.
The ts_cqe_pending_list structure is a combination of an array and linked
list. The array is pre-populated with the nodes that will be added and
removed from the head of the linked list. Each node contains the unique
identifier value associated with the values submitted in the WQEs and
retrieved in the port timestamping CQEs. When a WQE is submitted, the node
in the array corresponding to the identifier popped from the metadata
freelist is added to the end of the CQE pending list and is marked as
"in-use". The node is removed from the linked list under two conditions.
The first condition is that the corresponding port timestamping CQE is
polled in the PTP napi_poll context. The second condition is that more than
a second has elapsed since the DMA timestamp value corresponding to the WQE
submission. When the first condition occurs, the "in-use" bit in the linked
list node is cleared, and the resources corresponding to the WQE submission
are then released. The second condition, however, indicates that the port
timestamping CQE will likely never be delivered. It's not impossible for
the device to post a CQE after an infinite amount of time though highly
improbable. In order to be resilient to this improbable case, resources
related to the corresponding WQE submission are still kept, the identifier
value is not returned to the freelist, and the "in-use" bit is cleared on
the node to indicate that it's no longer part of the linked list of "likely
to be delivered" port timestamping CQE identifiers. A count for the number
of port timestamping CQEs considered highly likely to never be delivered by
the device is maintained. This count gets decremented in the unlikely event
a port timestamping CQE considered unlikely to ever be delivered is polled
in the PTP napi_poll context.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
De-duplicate documentation by removing mellanox/mlx5/devlink.rst. Instead,
only use the generic devlink documentation directory to document mlx5
devlink parameters. Avoid providing general devlink tool usage information
in mlx5-specific documentation.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
svc_tcp_sendmsg used to factor in the xdr->page_base when sending pages,
but commit 5df5dd03a8f7 ("sunrpc: Use sendmsg(MSG_SPLICE_PAGES) rather
then sendpage") dropped that part of the handling. Fix it by setting
the bv_offset of the first bvec.
Fixes: 5df5dd03a8f7 ("sunrpc: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Jiri Pirko says:
====================
devlink: introduce selective dumps
Motivation:
For SFs, one devlink instance per SF is created. There might be
thousands of these on a single host. When a user needs to know port
handle for specific SF, he needs to dump all devlink ports on the host
which does not scale good.
Solution:
Allow user to pass devlink handle (and possibly other attributes)
alongside the dump command and dump only objects which are matching
the selection.
Use split ops to generate policies for dump callbacks acccording to
the attributes used for selection.
The userspace can use ctrl genetlink GET_POLICY command to find out if
the selective dumps are supported by kernel for particular command.
Example:
$ devlink port show
auxiliary/mlx5_core.eth.0/65535: type eth netdev eth2 flavour physical port 0 splittable false
auxiliary/mlx5_core.eth.1/131071: type eth netdev eth3 flavour physical port 1 splittable false
$ devlink port show auxiliary/mlx5_core.eth.0
auxiliary/mlx5_core.eth.0/65535: type eth netdev eth2 flavour physical port 0 splittable false
$ devlink port show auxiliary/mlx5_core.eth.1
auxiliary/mlx5_core.eth.1/131071: type eth netdev eth3 flavour physical port 1 splittable false
Extension:
patches #12 and #13 extends selection attributes by port index
for health reporter dumping.
====================
Link: https://lore.kernel.org/r/20230811155714.1736405-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allow user to pass port index for health reporter dump request.
Re-generate the related code.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-14-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce a possibility for devlink object to expose attributes it
supports for selection of dumped objects.
Use this by health reporter to indicate it supports port index based
selection of dump objects. Implement this selection mechanism in
devlink_nl_cmd_health_reporter_get_dump_one()
Example:
$ devlink health
pci/0000:08:00.0:
reporter fw
state healthy error 0 recover 0 auto_dump true
reporter fw_fatal
state healthy error 0 recover 0 grace_period 60000 auto_recover true auto_dump true
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.0/32768:
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.0/32769:
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.0/32770:
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.1:
reporter fw
state healthy error 0 recover 0 auto_dump true
reporter fw_fatal
state healthy error 0 recover 0 grace_period 60000 auto_recover true auto_dump true
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.1/98304:
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.1/98305:
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.1/98306:
reporter vnic
state healthy error 0 recover 0
$ devlink health show pci/0000:08:00.0
pci/0000:08:00.0:
reporter fw
state healthy error 0 recover 0 auto_dump true
reporter fw_fatal
state healthy error 0 recover 0 grace_period 60000 auto_recover true auto_dump true
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.0/32768:
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.0/32769:
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.0/32770:
reporter vnic
state healthy error 0 recover 0
$ devlink health show pci/0000:08:00.0/32768
pci/0000:08:00.0/32768:
reporter vnic
state healthy error 0 recover 0
The last command is possible because of this patch.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-13-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
instance attributes
Extend per-instance dump command definitions to accept instance
attributes. Allow parsing of devlink handle attributes so they could
be used for instance selection.
Re-generate the related code.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-12-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For SFs, one devlink instance per SF is created. There might be
thousands of these on a single host. When a user needs to know port
handle for specific SF, he needs to dump all devlink ports on the host
which does not scale good.
Allow user to pass devlink handle attributes alongside the dump command
and dump only objects which are under selected devlink instance.
Example:
$ devlink port show
auxiliary/mlx5_core.eth.0/65535: type eth netdev eth2 flavour physical port 0 splittable false
auxiliary/mlx5_core.eth.1/131071: type eth netdev eth3 flavour physical port 1 splittable false
$ devlink port show auxiliary/mlx5_core.eth.0
auxiliary/mlx5_core.eth.0/65535: type eth netdev eth2 flavour physical port 0 splittable false
$ devlink port show auxiliary/mlx5_core.eth.1
auxiliary/mlx5_core.eth.1/131071: type eth netdev eth3 flavour physical port 1 splittable false
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-11-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As the commands are already defined in split ops, remove them
from small ops.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-10-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove the duplicate temporary netlink callback prototype as the
generated ones are already in place.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-9-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the definitions for the commands that do per-instance dump
and re-generate the related code.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-8-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In order to easily set NLM_F_DUMP_FILTERED for partial dumps, pass the
flags as an arg of dump_one() callback. Currently, it is always
NLM_F_MULTI.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-7-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce dumpit callbacks for generated split ops. Have them
as a thin wrapper around iteration function and allow to pass dump_one()
function pointer directly without need to store in devlink_cmd structs.
Note that the function prototypes are temporary until the generated ones
will replace them in a follow-up patch.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-6-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rename netlink doit callback functions for the commands that do
implement per-instance dump to match the generated names that are going
to be introduce in the follow-up patch.
Note that the function prototypes are temporary until the generated ones
will replace them in a follow-up patch.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-5-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Define port handling helpers what don't rely on internal_flags.
Have __devlink_nl_pre_doit() to accept the flags as a function arg and
make devlink_nl_pre_doit() a wrapper helper function calling it.
Introduce new helpers devlink_nl_pre_doit_port() and
devlink_nl_pre_doit_port_optional() to be used by split ops in follow-up
patch.
Note that the function prototypes are temporary until the generated ones
will replace them in a follow-up patch.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-4-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
No need to give the rate any special treatment in netlink attributes
parsing, as unlike for ports, there is only a couple of commands
benefiting from that.
Remove DEVLINK_NL_FLAG_NEED_RATE*, make pre_doit() callback simpler
by moving the rate attributes parsing to rate_*_doit() ops.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-3-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
No need to give the linecards any special treatment in netlink attribute
parsing, as unlike for ports, there is only a couple of commands
benefiting from that.
Remove DEVLINK_NL_FLAG_NEED_LINECARD, make pre_doit() callback simpler
by moving the linecard attribute parsing to linecard_[gs]et_doit() ops.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-2-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Running the bench_rename test script, the following error occurs:
# ./benchs/run_bench_rename.sh
base : 0.819 ± 0.012M/s
kprobe : 0.538 ± 0.009M/s
kretprobe : 0.503 ± 0.004M/s
rawtp : 0.779 ± 0.020M/s
fentry : 0.726 ± 0.007M/s
fexit : 0.691 ± 0.007M/s
benchmark 'rename-fmodret' not found
The bench_rename_fmodret has been removed in commit b000def2e052
("selftests: Remove fmod_ret from test_overhead"), thus remove it
from the runners in the test script.
Fixes: b000def2e052 ("selftests: Remove fmod_ret from test_overhead")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230814030727.3010390-1-zouyipeng@huawei.com
|
|
There is no way where topts.repeat can be set to 1 when tc_test fails.
Fix the typo where the break statement slipped by one line.
Fixes: fb66223a244f ("selftests/bpf: add test for accessing ctx from syscall program type")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230814031434.3077944-1-zouyipeng@huawei.com
|
|
This code was only used in the past for the sysfs interface. But since
this was replace with netlink, it was never executed. The function pointer
was only checked to figure out whether the limit 255 (B.A.T.M.A.N. IV) or
2**32-1 (B.A.T.M.A.N. V) should be used as limit.
So instead of keeping the function pointer, just store the limits directly
in struct batadv_algo_gw_ops.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
The batadv_netlink_notify_*() functions are not used by any other source
file. Just keep them local to netlink.c to get informed by the compiler
when they are not used anymore.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
This function is no longer used since the sysfs support was removed from
batman-adv.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
The current display probe is unable to differentiate between IVB Q and
IVB D GT2 server, as they both have the same device id, but different
subvendor and subdevice. This leads to the latter being misidentified as
the former, and should just end up not having a display. However, the no
display case returns a NULL as the display device info, and promptly
oopses.
As the IVB Q case is rare, and we're anyway moving towards GMD ID,
handle the identification requiring subvendor and subdevice as a special
case first, instead of unnecessarily growing the intel_display_ids[]
array with subvendor and subdevice.
[ 5.425298] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 5.426059] #PF: supervisor read access in kernel mode
[ 5.426810] #PF: error_code(0x0000) - not-present page
[ 5.427570] PGD 0 P4D 0
[ 5.428285] Oops: 0000 [#1] PREEMPT SMP PTI
[ 5.429035] CPU: 0 PID: 137 Comm: (udev-worker) Not tainted 6.4.0-1-amd64 #1 Debian 6.4.4-1
[ 5.429759] Hardware name: HP HP Z220 SFF Workstation/HP Z220 SFF Workstation, BIOS 4.19-218-gb184e6e0a1 02/02/2023
[ 5.430485] RIP: 0010:intel_device_info_driver_create+0xf1/0x120 [i915]
[ 5.431338] Code: 48 8b 97 80 1b 00 00 89 8f c0 1b 00 00 48 89 b7 b0 1b 00 00 48 89 97 b8 1b 00 00 0f b7 fd e8 76 e8 14 00 48 89 83 50 1b 00 00 <48> 8b 08 48 89 8b c4 1b 00 00 48 8b 48 08 48 89 8b cc 1b 00 00 8b
[ 5.432920] RSP: 0018:ffffb8254044fb98 EFLAGS: 00010206
[ 5.433707] RAX: 0000000000000000 RBX: ffff923076e80000 RCX: 0000000000000000
[ 5.434494] RDX: 0000000000000260 RSI: 0000000100001000 RDI: 000000000000016a
[ 5.435277] RBP: 000000000000016a R08: ffffb8254044fb00 R09: 0000000000000000
[ 5.436055] R10: ffff922d02761de8 R11: 00657361656c6572 R12: ffffffffc0e5d140
[ 5.436867] R13: ffff922d00b720d0 R14: 0000000076e80000 R15: ffff923078c0cae8
[ 5.437646] FS: 00007febd19a18c0(0000) GS:ffff92307c000000(0000) knlGS:0000000000000000
[ 5.438434] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.439218] CR2: 0000000000000000 CR3: 000000010256e002 CR4: 00000000001706f0
[ 5.440009] Call Trace:
[ 5.440824] <TASK>
[ 5.441611] ? __die+0x23/0x70
[ 5.442394] ? page_fault_oops+0x17d/0x4c0
[ 5.443173] ? exc_page_fault+0x7f/0x180
[ 5.443949] ? asm_exc_page_fault+0x26/0x30
[ 5.444756] ? intel_device_info_driver_create+0xf1/0x120 [i915]
[ 5.445652] ? intel_device_info_driver_create+0xea/0x120 [i915]
[ 5.446545] i915_driver_probe+0x7f/0xb60 [i915]
[ 5.447431] ? drm_privacy_screen_get+0x15c/0x1a0 [drm]
[ 5.448240] local_pci_probe+0x45/0xa0
[ 5.449013] pci_device_probe+0xc7/0x240
[ 5.449748] really_probe+0x19e/0x3e0
[ 5.450464] ? __pfx___driver_attach+0x10/0x10
[ 5.451172] __driver_probe_device+0x78/0x160
[ 5.451870] driver_probe_device+0x1f/0x90
[ 5.452601] __driver_attach+0xd2/0x1c0
[ 5.453293] bus_for_each_dev+0x88/0xd0
[ 5.453989] bus_add_driver+0x116/0x220
[ 5.454672] driver_register+0x59/0x100
[ 5.455336] i915_init+0x25/0xc0 [i915]
[ 5.456104] ? __pfx_i915_init+0x10/0x10 [i915]
[ 5.456882] do_one_initcall+0x5d/0x240
[ 5.457511] do_init_module+0x60/0x250
[ 5.458126] __do_sys_finit_module+0xac/0x120
[ 5.458721] do_syscall_64+0x60/0xc0
[ 5.459314] ? syscall_exit_to_user_mode+0x1b/0x40
[ 5.459897] ? do_syscall_64+0x6c/0xc0
[ 5.460510] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 5.461082] RIP: 0033:0x7febd20b0eb9
[ 5.461648] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2f 1f 0d 00 f7 d8 64 89 01 48
[ 5.462905] RSP: 002b:00007fffabb1ba78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 5.463554] RAX: ffffffffffffffda RBX: 0000561e6304f410 RCX: 00007febd20b0eb9
[ 5.464201] RDX: 0000000000000000 RSI: 00007febd2244f0d RDI: 0000000000000015
[ 5.464869] RBP: 00007febd2244f0d R08: 0000000000000000 R09: 000000000000000a
[ 5.465512] R10: 0000000000000015 R11: 0000000000000246 R12: 0000000000020000
[ 5.466124] R13: 0000000000000000 R14: 0000561e63032b60 R15: 000000000000000a
[ 5.466700] </TASK>
[ 5.467271] Modules linked in: i915(+) drm_buddy video crc32_pclmul sr_mod hid_generic wmi crc32c_intel i2c_algo_bit sd_mod cdrom drm_display_helper cec usbhid rc_core ghash_clmulni_intel hid sha512_ssse3 ttm sha512_generic xhci_pci ehci_pci xhci_hcd ehci_hcd nvme ahci drm_kms_helper nvme_core libahci t10_pi libata psmouse aesni_intel scsi_mod crypto_simd i2c_i801 scsi_common crc64_rocksoft_generic cryptd i2c_smbus drm lpc_ich crc64_rocksoft crc_t10dif e1000e usbcore crct10dif_generic usb_common crct10dif_pclmul crc64 crct10dif_common button
[ 5.469750] CR2: 0000000000000000
[ 5.470364] ---[ end trace 0000000000000000 ]---
[ 5.470971] RIP: 0010:intel_device_info_driver_create+0xf1/0x120 [i915]
[ 5.471699] Code: 48 8b 97 80 1b 00 00 89 8f c0 1b 00 00 48 89 b7 b0 1b 00 00 48 89 97 b8 1b 00 00 0f b7 fd e8 76 e8 14 00 48 89 83 50 1b 00 00 <48> 8b 08 48 89 8b c4 1b 00 00 48 8b 48 08 48 89 8b cc 1b 00 00 8b
[ 5.473034] RSP: 0018:ffffb8254044fb98 EFLAGS: 00010206
[ 5.473698] RAX: 0000000000000000 RBX: ffff923076e80000 RCX: 0000000000000000
[ 5.474371] RDX: 0000000000000260 RSI: 0000000100001000 RDI: 000000000000016a
[ 5.475045] RBP: 000000000000016a R08: ffffb8254044fb00 R09: 0000000000000000
[ 5.475725] R10: ffff922d02761de8 R11: 00657361656c6572 R12: ffffffffc0e5d140
[ 5.476405] R13: ffff922d00b720d0 R14: 0000000076e80000 R15: ffff923078c0cae8
[ 5.477124] FS: 00007febd19a18c0(0000) GS:ffff92307c000000(0000) knlGS:0000000000000000
[ 5.477811] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.478499] CR2: 0000000000000000 CR3: 000000010256e002 CR4: 00000000001706f0
Fixes: 69d439818fe5 ("drm/i915/display: Make display responsible for probing its own IP")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8991
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Luca Coelho <luciano.coelho@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230804084600.1005818-1-jani.nikula@intel.com
(cherry picked from commit 1435188307d128671f677eb908e165666dd83652)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|