Age | Commit message (Collapse) | Author |
|
Implement removing additional RSS contexts via Netlink.
Technically it'd be possible to shoehorn the delete operation
into ethnl_request_ops-compatible handler. The code ends
up longer than open coded version, and I think we'll need
a custom way of sending notifications at some stage (if we
allow tying the context lifetime to the netlink socket, in
the future).
Link: https://patch.msgid.link/20250717234343.2328602-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Support creating contexts via Netlink. Setting flow hashing
fields on the new context is not supported at this stage,
it can be added later.
An empty indirection table is not supported. This is a carry
over from the IOCTL interface where empty indirection table
meant delete. We can repurpose empty indirection table in
Netlink but for now to avoid confusion reject it using the
policy.
Support letting user choose the ID for the new context. This was
not possible in IOCTL since the context ID field for the create
action had to be set to the ETH_RXFH_CONTEXT_ALLOC magic value.
Link: https://patch.msgid.link/20250717234343.2328602-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add initial support for RSS_SET, for now only operations on
the indirection table are supported.
Unlike the ioctl don't check if at least one parameter is
being changed. This is how other ethtool-nl ops behave,
so pick the ethtool-nl consistency vs copying ioctl behavior.
There are two special cases here:
1) resetting the table to defaults;
2) support for tables of different size.
For (1) I use an empty Netlink attribute (array of size 0).
(2) may require some background. AFAICT a lot of modern devices
allow allocating RSS tables of different sizes. mlx5 can upsize
its tables, bnxt has some "table size calculation", and Intel
folks asked about RSS table sizing in context of resource allocation
in the past. The ethtool IOCTL API has a concept of table size,
but right now the user is expected to provide a table exactly
the size the device requests. Some drivers may change the table
size at runtime (in response to queue count changes) but the
user is not in control of this. What's not great is that all
RSS contexts share the same table size. For example a device
with 128 queues enabled, 16 RSS contexts 8 queues in each will
likely have 256 entry tables for each of the 16 contexts,
while 32 would be more than enough given each context only has
8 queues. To address this the Netlink API should avoid enforcing
table size at the uAPI level, and should allow the user to express
the min table size they expect.
To fully solve (2) we will need more driver plumbing but
at the uAPI level this patch allows the user to specify
a table size smaller than what the device advertises. The device
table size must be a multiple of the user requested table size.
We then replicate the user-provided table to fill the full device
size table. This addresses the "allow the user to express the min
table size" objective, while not enforcing any fixed size.
From Netlink perspective .get_rxfh_indir_size() is now de facto
the "max" table size supported by the device.
We may choose to support table replication in ethtool, too,
when we actually plumb this thru the device APIs.
Initially I was considering moving full pattern generation
to the kernel (which queues to use, at which frequency and
what min sequence length). I don't think this complexity
would buy us much and most if not all devices have pow-2
table sizes, which simplifies the replication a lot.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ido spotted that I made a mistake in commit under Fixes,
ethnl_default_parse() may acquire a dev reference even when it returns
an error. This may have been driven by the code structure in dumps
(which unconditionally release dev before handling errors), but it's
too much of a trap. Functions should undo what they did before returning
an error, rather than expecting caller to clean up.
Rather than fixing ethnl_default_set_doit() directly make
ethnl_default_parse() clean up errors.
Reported-by: Ido Schimmel <idosch@idosch.org>
Link: https://lore.kernel.org/aGEPszpq9eojNF4Y@shredder
Fixes: 963781bdfe20 ("net: ethtool: call .parse_request for SET handlers")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250630154053.1074664-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation for RSS_SET handling in ethnl introduce Netlink
notifications for RSS. Only cover modifications, not creation
and not removal of a context, because the latter may deserve
a different notification type. We should cross that bridge
when we add the support for context add / remove via Netlink.
Link: https://patch.msgid.link/20250623231720.3124717-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Copy information parsed for SET with .req_parse to NTF handling
and therefore the GET-equivalent that it ends up executing.
This way if the SET was on a sub-object (like RSS context)
the notification will also be appropriately scoped.
Also copy the phy_index, Maxime suggests this will help PLCA
commands generate accurate notifications as well.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250623231720.3124717-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ethtool_notify() takes a const void *data argument, which presumably
was intended to pass information from the call site to the subcommand
handler. This argument currently has no users.
Expecting the data to be subcommand-specific has two complications.
Complication #1 is that its not plumbed thru any of the standardized
callbacks. It gets propagated to ethnl_default_notify() where it
remains unused. Coming from the ethnl_default_set_doit() side we pass
in NULL, because how could we have a command specific attribute in
a generic handler.
Complication #2 is that we expect the ethtool_notify() callers to
know what attribute type to pass in. Again, the data pointer is
untyped.
RSS will need to pass the context ID to the notifications.
I think it's a better design if the "subcommand" exports its own
typed interface and constructs the appropriate argument struct
(which will be req_info). Remove the unused data argument from
ethtool_notify() but retain it in a new internal helper which
subcommands can use to build a typed interface.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250623231720.3124717-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation for using req_info to carry parameters between SET
and NTF - call .parse_request during ethnl_default_set_doit().
The main question here is whether .parse_request is intended to be
GET-specific. Originally the SET handling was delegated to each subcommand
directly - ethnl_default_set_doit() and .set callbacks in ethnl_request_ops
did not exist. Looking at existing users does not shed much light, all
of the following subcommands use .parse_request but have no SET handler
(and no NTF):
net/ethtool/eeprom.c
net/ethtool/rss.c
net/ethtool/stats.c
net/ethtool/strset.c
net/ethtool/tsinfo.c
There's only one which does have a SET:
net/ethtool/pause.c
where .parse_request handling is used to select which statistics to query.
Not relevant for SET but also harmless.
Going back to RSS (which doesn't have SET today) .parse_request parses
the rss_context ID. Using the req_info struct to pass the context ID
from SET to NTF will be very useful.
Switch to ethnl_default_parse(), effectively adding the .parse_request
for SET handlers.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250623231720.3124717-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation for using req_info to carry parameters between
SET and NTF allocate a full request info struct. Since the size
depends on the subcommand we need to allocate it on the heap.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250623231720.3124717-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move away from dev_hold and use netdev_hold with a local reftracker when
performing a DUMP on each netdev.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250502085242.248645-4-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that we have an infrastructure in ethnl for perphy DUMPs, we can get
rid of the custom ->doit and ->dumpit to deal with PHY listing commands.
As most of the code was custom, this basically means re-writing how we
deal with PHY listing.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250502085242.248645-3-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ethnl commands that target a phy_device need a DUMP implementation that
will fill the reply for every PHY behind a netdev. We therefore need to
iterate over the dev->topo to list them.
When multiple PHYs are behind the same netdev, it's also useful to
perform DUMP with a filter on a given netdev, to get the capability of
every PHY.
Implement dedicated genl ->start(), ->dumpit() and ->done() operations
for PHY-targetting command, allowing filtered dumps and using a dump
context that keep track of the PHY iteration for multi-message dump.
PSE-PD and PLCA are converted to this new set of ops along the way.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250502085242.248645-2-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There's a consistent pattern where the .cleanup_data() callback is
called when .prepare_data() fails, when it should really be called to
clean after a successful .prepare_data() as per the documentation.
Rewrite the error-handling paths to make sure we don't cleanup
un-prepared data.
Fixes: c781ff12a2f3 ("ethtool: Allow network drivers to dump arbitrary EEPROM data")
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250407130511.75621-1-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Move the more esoteric helpers for netdev instance lock to
a dedicated header. This avoids growing netdevice.h to infinity
and makes rebuilding the kernel much faster (after touching
the header with the helpers).
The main netdev_lock() / netdev_unlock() functions are used
in static inlines in netdevice.h and will probably be used
most commonly, so keep them in netdevice.h.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250307183006.2312761-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ethnl_default_dump_one() operates on the device provided in its @dev
parameter, not from ctx->req_info->dev.
syzbot reported:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000197: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000cb8-0x0000000000000cbf]
RIP: 0010:netdev_need_ops_lock include/linux/netdevice.h:2792 [inline]
RIP: 0010:netdev_lock_ops include/linux/netdevice.h:2803 [inline]
RIP: 0010:ethnl_default_dump_one net/ethtool/netlink.c:557 [inline]
RIP: 0010:ethnl_default_dumpit+0x447/0xd40 net/ethtool/netlink.c:593
Call Trace:
<TASK>
genl_dumpit+0x10d/0x1b0 net/netlink/genetlink.c:1027
netlink_dump+0x64d/0xe10 net/netlink/af_netlink.c:2309
__netlink_dump_start+0x5a2/0x790 net/netlink/af_netlink.c:2424
genl_family_rcv_msg_dumpit net/netlink/genetlink.c:1076 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1192 [inline]
genl_rcv_msg+0x894/0xec0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x206/0x480 net/netlink/af_netlink.c:2534
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1339
netlink_sendmsg+0x8de/0xcb0 net/netlink/af_netlink.c:1883
sock_sendmsg_nosec net/socket.c:709 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:724
____sys_sendmsg+0x53a/0x860 net/socket.c:2564
___sys_sendmsg net/socket.c:2618 [inline]
__sys_sendmsg+0x269/0x350 net/socket.c:2650
Fixes: 2bcf4772e45a ("net: ethtool: try to protect all callback with netdev instance lock")
Reported-by: syzbot+3da2442641f0c6a705a2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/67caaf5e.050a0220.15b4b9.007a.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250307083544.1659135-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.14-rc6).
Conflicts:
net/ethtool/cabletest.c
2bcf4772e45a ("net: ethtool: try to protect all callback with netdev instance lock")
637399bf7e77 ("net: ethtool: netlink: Allow NULL nlattrs when getting a phy_device")
No Adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Protect all ethtool callbacks and PHY related state with the netdev
instance lock, for drivers which want / need to have their ops
instance-locked. Basically take the lock everywhere we take rtnl_lock.
It was tempting to take the lock in ethnl_ops_begin(), but turns
out we actually nest those calls (when generating notifications).
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Cc: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250305163732.2766420-11-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ethnl_req_get_phydev() is used to lookup a phy_device, in the case an
ethtool netlink command targets a specific phydev within a netdev's
topology.
It takes as a parameter a const struct nlattr *header that's used for
error handling :
if (!phydev) {
NL_SET_ERR_MSG_ATTR(extack, header,
"no phy matching phyindex");
return ERR_PTR(-ENODEV);
}
In the notify path after a ->set operation however, there's no request
attributes available.
The typical callsite for the above function looks like:
phydev = ethnl_req_get_phydev(req_base, tb[ETHTOOL_A_XXX_HEADER],
info->extack);
So, when tb is NULL (such as in the ethnl notify path), we have a nice
crash.
It turns out that there's only the PLCA command that is in that case, as
the other phydev-specific commands don't have a notification.
This commit fixes the crash by passing the cmd index and the nlattr
array separately, allowing NULL-checking it directly inside the helper.
Fixes: c15e065b46dc ("net: ethtool: Allow passing a phy index for some commands")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Reported-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Link: https://patch.msgid.link/20250301141114.97204-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
No conflicts and no adjacent changes.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Record the pending configuration in net_device struct.
ethtool core duplicates the current config and the specific
handlers (for now just ringparam) can modify it.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250119020518.1962249-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For ease of review of the next patch store the dev pointer
on the stack, instead of referring to req_info.dev every time.
No functional changes.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250119020518.1962249-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The following trace can be seen if a device is being unregistered while
its number of channels are being modified.
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 3 PID: 3754 at kernel/locking/mutex.c:564 __mutex_lock+0xc8a/0x1120
CPU: 3 UID: 0 PID: 3754 Comm: ethtool Not tainted 6.13.0-rc6+ #771
RIP: 0010:__mutex_lock+0xc8a/0x1120
Call Trace:
<TASK>
ethtool_check_max_channel+0x1ea/0x880
ethnl_set_channels+0x3c3/0xb10
ethnl_default_set_doit+0x306/0x650
genl_family_rcv_msg_doit+0x1e3/0x2c0
genl_rcv_msg+0x432/0x6f0
netlink_rcv_skb+0x13d/0x3b0
genl_rcv+0x28/0x40
netlink_unicast+0x42e/0x720
netlink_sendmsg+0x765/0xc20
__sys_sendto+0x3ac/0x420
__x64_sys_sendto+0xe0/0x1c0
do_syscall_64+0x95/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
This is because unregister_netdevice_many_notify might run before the
rtnl lock section of ethnl operations, eg. set_channels in the above
example. In this example the rss lock would be destroyed by the device
unregistration path before being used again, but in general running
ethnl operations while dismantle has started is not a good idea.
Fix this by denying any operation on devices being unregistered. A check
was already there in ethnl_ops_begin, but not wide enough.
Note that the same issue cannot be seen on the ioctl version
(__dev_ethtool) because the device reference is retrieved from within
the rtnl lock section there. Once dismantle started, the net device is
unlisted and no reference will be found.
Fixes: dde91ccfa25f ("ethtool: do not perform operations on net devices being unregistered")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250116092159.50890-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce support for ETHTOOL_MSG_TSCONFIG_GET/SET ethtool netlink socket
to read and configure hwtstamp configuration of a PHC provider. Note that
simultaneous hwtstamp isn't supported; configuring a new one disables the
previous setting.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Either the MAC or the PHY can provide hwtstamp, so we should be able to
read the tsinfo for any hwtstamp provider.
Enhance 'get' command to retrieve tsinfo of hwtstamp providers within a
network topology.
Add support for a specific dump command to retrieve all hwtstamp
providers within the network topology, with added functionality for
filtered dump to target a single interface.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As we have the ability to track the PHYs connected to a net_device
through the link_topology, we can expose this list to userspace. This
allows userspace to use these identifiers for phy-specific commands and
take the decision of which PHY to target by knowing the link topology.
Add PHY_GET and PHY_DUMP, which can be a filtered DUMP operation to list
devices on only one interface.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some netlink commands are target towards ethernet PHYs, to control some
of their features. As there's several such commands, add the ability to
pass a PHY index in the ethnl request, which will populate the generic
ethnl_req_info with the passed phy_index.
Add a helper that netlink command handlers need to use to grab the
targeted PHY from the req_info. This helper needs to hold rtnl_lock()
while interacting with the PHY, as it may be removed at any point.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that we track RSS contexts in the core we can easily dump
them. This is a major introspection improvement, as previously
the only way to find all contexts would be to try all ids
(of which there may be 2^32 - 1).
Don't use the XArray iterators (like xa_for_each_start()) as they
do not move the index past the end of the array once done, which
caused multiple bugs in Netlink dumps in the past.
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 31e0aa99dc02 ("ethtool: Veto some operations during firmware flashing process")
added a flag module_fw_flash_in_progress to struct net_device. As
this is ethtool related state, move it to the recently created
struct ethtool_netdev_state, accessed via the 'ethtool' member of
struct net_device.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20240703121849.652893-1-edward.cree@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the ability to flash the modules' firmware by implementing the
interface between the user space and the kernel.
Example from a succeeding implementation:
# ethtool --flash-module-firmware swp40 file test.bin
Transceiver module firmware flashing started for device swp40
Transceiver module firmware flashing in progress for device swp40
Progress: 99%
Transceiver module firmware flashing completed for device swp40
In addition, add infrastructure that allows modules to set socket-specific
private data. This ensures that when a socket is closed from user space
during the flashing process, the right socket halts sending notifications
to user space until the work item is completed.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some operations cannot be performed during the firmware flashing
process.
For example:
- Port must be down during the whole flashing process to avoid packet loss
while committing reset for example.
- Writing to EEPROM interrupts the flashing process, so operations like
ethtool dump, module reset, get and set power mode should be vetoed.
- Split port firmware flashing should be vetoed.
In order to veto those scenarios, add a flag in 'struct net_device' that
indicates when a firmware flash is taking place on the module and use it
to prevent interruptions during the process.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add progress notifications ability to user space while flashing modules'
firmware by implementing the interface between the user space and the
kernel.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
for_each_netdev_dump() can be used with RCU protection,
no need for rtnl if we are going to use dev_hold()/dev_put().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240207153514.3640952-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The default dump handler needs to clear ret before returning.
Otherwise if the last interface returns an inconsequential
error this error will propagate to user space.
This may confuse user space (ethtool CLI seems to ignore it,
but YNL doesn't). It will also terminate the dump early
for mutli-skb dump, because netlink core treats EOPNOTSUPP
as a real error.
Fixes: 728480f12442 ("ethtool: default handlers for GET requests")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231126225806.2143528-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We had a number of bugs in the past because developers forgot
to fully test dumps, which pass NULL as info to .prepare_data.
.prepare_data implementations would try to access info->extack
leading to a null-deref.
Now that dumps and notifications can access struct genl_info
we can pass it in, and remove the info null checks.
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # pause
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230814214723.2924989-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pass struct genl_info directly instead of its members.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230814214723.2924989-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since dumps carry struct genl_info now, use the attrs pointer
from genl_info and remove the one in struct genl_dumpit_info.
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230814214723.2924989-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Reap the benefits of easier iteration thanks to the xarray.
Convert just the genetlink ones, those are easier to test.
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230726185530.2247698-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
New users of dev_get_by_index() and dev_get_by_name() keep
getting added and it would be nice to steer them towards
the APIs with reference tracking.
Add variants of those calls which allocate the reference
tracker and use them in a couple of places.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ethtool currently requires a header nest (which is used to carry
the common family options) in all requests including dumps.
$ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get
lib.ynl.NlError: Netlink error: Invalid argument
nl_len = 64 (48) nl_flags = 0x300 nl_type = 2
error: -22 extack: {'msg': 'request header missing'}
$ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get \
--json '{"header":{}}'; )
[{'combined-count': 1,
'combined-max': 1,
'header': {'dev-index': 2, 'dev-name': 'enp1s0'}}]
Requiring the header nest to always be there may seem nice
from the consistency perspective, but it's not serving any
practical purpose. We shouldn't burden the user like this.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert all SET commands where new common code is applicable.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Most ethtool SET callbacks follow the same general structure.
ethnl_parse_header_dev_get()
rtnl_lock()
ethnl_ops_begin()
... do stuff ...
ethtool_notify()
ethnl_ops_complete()
rtnl_unlock()
ethnl_parse_header_dev_put()
This leads to a lot of copy / pasted code an bugs when people
mis-handle the error path.
Add a generic implementation of this pattern with a .set callback
in struct ethnl_request_ops called to "do stuff".
Also add an optional .set_validate which is called before
ethnl_ops_begin() -- a lot of implementations do basic request
capability / sanity checking at that point.
Because we want to avoid generating the notification when
no change happened - adopt a slightly hairy return values:
- 0 means nothing to do (no notification)
- 1 means done / continue
- negative error codes on error
Reuse .hdr_attr from struct ethnl_request_ops, GET and SET
use the same attr spaces in all cases.
Convert pause as an example (and to avoid unused function warnings).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MAC merge sublayer (IEEE 802.3-2018 clause 99) is one of 2
specifications (the other being Frame Preemption; IEEE 802.1Q-2018
clause 6.7.2), which work together to minimize latency caused by frame
interference at TX. The overall goal of TSN is for normal traffic and
traffic with a bounded deadline to be able to cohabitate on the same L2
network and not bother each other too much.
The standards achieve this (partly) by introducing the concept of
preemptible traffic, i.e. Ethernet frames that have a custom value for
the Start-of-Frame-Delimiter (SFD), and these frames can be fragmented
and reassembled at L2 on a link-local basis. The non-preemptible frames
are called express traffic, they are transmitted using a normal SFD, and
they can preempt preemptible frames, therefore having lower latency,
which can matter at lower (100 Mbps) link speeds, or at high MTUs (jumbo
frames around 9K). Preemption is not recursive, i.e. a P frame cannot
preempt another P frame. Preemption also does not depend upon priority,
or otherwise said, an E frame with prio 0 will still preempt a P frame
with prio 7.
In terms of implementation, the standards talk about the presence of an
express MAC (eMAC) which handles express traffic, and a preemptible MAC
(pMAC) which handles preemptible traffic, and these MACs are multiplexed
on the same MII by a MAC merge layer.
To support frame preemption, the definition of the SFD was generalized
to SMD (Start-of-mPacket-Delimiter), where an mPacket is essentially an
Ethernet frame fragment, or a complete frame. Stations unaware of an SMD
value different from the standard SFD will treat P frames as error
frames. To prevent that from happening, a negotiation process is
defined.
On RX, packets are dispatched to the eMAC or pMAC after being filtered
by their SMD. On TX, the eMAC/pMAC classification decision is taken by
the 802.1Q spec, based on packet priority (each of the 8 user priority
values may have an admin-status of preemptible or express).
The MAC Merge layer and the Frame Preemption parameters have some degree
of independence in terms of how software stacks are supposed to deal
with them. The activation of the MM layer is supposed to be controlled
by an LLDP daemon (after it has been communicated that the link partner
also supports it), after which a (hardware-based or not) verification
handshake takes place, before actually enabling the feature. So the
process is intended to be relatively plug-and-play. Whereas FP settings
are supposed to be coordinated across a network using something
approximating NETCONF.
The support contained here is exclusively for the 802.3 (MAC Merge)
portions and not for the 802.1Q (Frame Preemption) parts. This API is
sufficient for an LLDP daemon to do its job. The FP adminStatus variable
from 802.1Q is outside the scope of an LLDP daemon.
I have taken a few creative licenses and augmented the Linux kernel UAPI
compared to the standard managed objects recommended by IEEE 802.3.
These are:
- ETHTOOL_A_MM_PMAC_ENABLED: According to Figure 99-6: Receive
Processing state diagram, a MAC Merge layer is always supposed to be
able to receive P frames. However, this implies keeping the pMAC
powered on, which will consume needless power in applications where FP
will never be used. If LLDP is used, the reception of an Additional
Ethernet Capabilities TLV from the link partner is sufficient
indication that the pMAC should be enabled. So my proposal is that in
Linux, we keep the pMAC turned off by default and that user space
turns it on when needed.
- ETHTOOL_A_MM_VERIFY_ENABLED: The IEEE managed object is called
aMACMergeVerifyDisableTx. I opted for consistency (positive logic) in
the boolean netlink attributes offered, so this is also positive here.
Other than the meaning being reversed, they correspond to the same
thing.
- ETHTOOL_A_MM_MAX_VERIFY_TIME: I found it most reasonable for a LLDP
daemon to maximize the verifyTime variable (delay between SMD-V
transmissions), to maximize its chances that the LP replies. IEEE says
that the verifyTime can range between 1 and 128 ms, but the NXP ENETC
stupidly keeps this variable in a 7 bit register, so the maximum
supported value is 127 ms. I could have chosen to hardcode this in the
LLDP daemon to a lower value, but why not let the kernel expose its
supported range directly.
- ETHTOOL_A_MM_TX_MIN_FRAG_SIZE: the standard managed object is called
aMACMergeAddFragSize, and expresses the "additional" fragment size
(on top of ETH_ZLEN), whereas this expresses the absolute value of the
fragment size.
- ETHTOOL_A_MM_RX_MIN_FRAG_SIZE: there doesn't appear to exist a managed
object mandated by the standard, but user space clearly needs to know
what is the minimum supported fragment size of our local receiver,
since LLDP must advertise a value no lower than that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for configuring the PLCA Reconciliation Sublayer on
multi-drop PHYs that support IEEE802.3cg-2019 Clause 148 (e.g.,
10BASE-T1S). This patch adds the appropriate netlink interface
to ethtool.
Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add netlink based support for "ethtool -x <dev> [context x]"
command by implementing ETHTOOL_MSG_RSS_GET netlink message.
This is equivalent to functionality provided via ETHTOOL_GRSSH
in ioctl path. It sends RSS table, hash key and hash function
of an interface to user space.
This patch implements existing functionality available
in ioctl path and enables addition of new RSS context
based parameters in future.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Link: https://lore.kernel.org/r/20221202002555.241580-1-sudheer.mogilappagari@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add interface to support Power Sourcing Equipment. At current step it
provides generic way to address all variants of PSE devices as defined
in IEEE 802.3-2018 but support only objects specified for IEEE 802.3-2018 104.4
PoDL Power Sourcing Equipment (PSE).
Currently supported and mandatory objects are:
IEEE 802.3-2018 30.15.1.1.3 aPoDLPSEPowerDetectionStatus
IEEE 802.3-2018 30.15.1.1.2 aPoDLPSEAdminState
IEEE 802.3-2018 30.15.1.2.1 acPoDLPSEAdminControl
This is minimal interface needed to control PSE on each separate
ethernet port but it provides not all mandatory objects specified in
IEEE 802.3-2018.
Since "PoDL PSE" and "PSE" have similar names, but some different values
I decide to not merge them and keep separate naming schema. This should
allow as to be as close to IEEE 802.3 spec as possible and avoid name
conflicts in the future.
This implementation is connected to PHYs instead of MACs because PSE
auto classification can potentially interfere with PHY auto negotiation.
So, may be some extra PHY related initialization will be needed.
With WIP version of ethtools interaction with PSE capable link looks
as following:
$ ip l
...
5: t1l1@eth0: <BROADCAST,MULTICAST> ..
...
$ ethtool --show-pse t1l1
PSE attributs for t1l1:
PoDL PSE Admin State: disabled
PoDL PSE Power Detection Status: disabled
$ ethtool --set-pse t1l1 podl-pse-admin-control enable
$ ethtool --show-pse t1l1
PSE attributs for t1l1:
PoDL PSE Admin State: enabled
PoDL PSE Power Detection Status: delivering power
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The actual presence check for the header is in
ethnl_parse_header_dev_get() but it's a few layers in,
and already has a ton of arguments so let's just pick
the low hanging fruit and check for missing header in
the default request handler.
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We had historically not checked that genlmsghdr.reserved
is 0 on input which prevents us from using those precious
bytes in the future.
One use case would be to extend the cmd field, which is
currently just 8 bits wide and 256 is not a lot of commands
for some core families.
To make sure that new families do the right thing by default
put the onus of opting out of validation on existing families.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel)
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Netdev reference helpers have a dev_ prefix for historic
reasons. Renaming the old helpers would be too much churn
but we can rename the tracking ones which are relatively
recent and should be the default for new code.
Rename:
dev_hold_track() -> netdev_hold()
dev_put_track() -> netdev_put()
dev_replace_track() -> netdev_ref_replace()
Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As reported by Johannes, the tracker allocated in
ethnl_default_notify() is not really needed, as this
function is not expected to change a device reference count.
Fixes: e4b8954074f6 ("netlink: add net device refcount tracker to struct ethnl_req_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Link: https://lore.kernel.org/r/20220105170849.2610470-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The assignment here will be overwritten, so it should be deleted
The clang_analyzer complains as follows:
net/ethtool/netlink.c:
Value stored to 'ret' is never read
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|