Age | Commit message (Collapse) | Author |
|
We expect NAPI to be in disabled state when page pool is torn down.
But it is also legal if the NAPI is completely uninitialized.
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250206225638.1387810-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We seem to be missing a netif_running() check from the devmem
installation path. Starting a queue on a stopped device makes
no sense. We still want to be able to allocate the memory, just
to test that the device is indeed setting up the page pools
in a memory provider compatible way.
This is not a bug fix, because existing drivers check if
the interface is down as part of the ops. But new drivers
shouldn't have to do this, as long as they can correctly
alloc/free while down.
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250206225638.1387810-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Shorten the lines by storing dev->queue_mgmt_ops in a temp variable.
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250206225638.1387810-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use phy_set_max_speed() to simplify init_phy().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/b863dcf7-31e8-45a1-a284-7075da958ff0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fabio Porcedda says:
====================
Add usb support for Telit Cinterion FN990B
Add usb support for Telit Cinterion FN990B.
Also fix Telit Cinterion FN990A name.
Connection with ModemManager was tested also AT ports.
====================
Link: https://patch.msgid.link/20250205171649.618162-1-fabio.porcedda@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The correct name for FN990 is FN990A so use it in order to avoid
confusion with FN990B.
Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Link: https://patch.msgid.link/20250205171649.618162-6-fabio.porcedda@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The correct name for FN990 is FN990A so use it in order to avoid
confusion with FN990B.
Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Link: https://patch.msgid.link/20250205171649.618162-5-fabio.porcedda@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the following Telit Cinterion FN990B composition:
0x10d0: rmnet + tty (AT/NMEA) + tty (AT) + tty (AT) + tty (AT) +
tty (diag) + DPL + QDSS (Qualcomm Debug SubSystem) + adb
T: Bus=01 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 17 Spd=480 MxCh= 0
D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=10d0 Rev=05.15
S: Manufacturer=Telit Cinterion
S: Product=FN990
S: SerialNumber=43b38f19
C: #Ifs= 9 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=88(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8a(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8b(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 6 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=80 Driver=(none)
E: Ad=8c(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 7 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=70 Driver=(none)
E: Ad=8d(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 8 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=usbfs
E: Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
Cc: stable@vger.kernel.org
Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Link: https://patch.msgid.link/20250205171649.618162-3-fabio.porcedda@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Simplify rswitch_get_port_node() by using the
for_each_available_child_of_node() helper instead of manually ignoring
unavailable child nodes, and leaking a reference.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/54f544d573a64b96e01fd00d3481b10806f4d110.1738771798.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Russell King says:
====================
net: stmmac: yet more EEE updates
Continuing on with the STMMAC EEE cleanups from last cycle, this series
further cleans up the EEE code, and fixes a problem with the existing
implementation - disabling EEE doesn't immediately disable LPI
signalling until the next packet is transmitted. It likely also fixes
a potential race condition when trying to disable LPI vs the software
timer.
====================
Link: https://patch.msgid.link/Z6NqGnM2yL7Ayo-T@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As we no longer call the set_eee_mode(), reset_eee_mode() and
set_eee_lpi_entry_timer() methods, remove these and their glue in
common.h
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffe7-003ZIm-Qv@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use the new stmmac_set_lpi_mode() API to configure the parameters of
the desired LPI mode rather than the older methods.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffe2-003ZIg-Mx@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ensure that LPI_CTRL_STATUS_LPITCSE is also appropriately cleared when
disabling LPI or enabling LPI without TX clock gating.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffdx-003ZIZ-JQ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new method to control LPI mode configuration. This is architected
to have three configuration states: LPI disabled, LPI forced (active),
or LPI under hardware timer control. This reflects the three modes
which the main body of the driver wishes to deal with.
We pass in whether transmit clock gating should be used, and the
hardware timer value in microseconds to be set when using hardware
timer control.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffds-003ZIT-E8@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The bit definitions for the LPI control/status register are
identical across all MAC versions, with the exception that some
bits may not be implemented. Provide definitions for bits in this
register in common.h, convert to use them, and remove the core-
specific definitions.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffdn-003ZIN-9p@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove the unnecessary LPI disable when enabling LPI - as noted in
previous commits, there will never be two consecutive calls to
stmmac_mac_enable_tx_lpi() without an intervening
stmmac_mac_disable_tx_lpi.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffdi-003ZIH-5h@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As other code paths do, clear priv->tx_path_in_lpi_mode when disabling
LPI. This is done after the software timer has been deleted and
hardware LPI has been disabled.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffdd-003ZIB-22@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Phylink will not call the mac_disable_tx_lpi() and mac_enable_tx_lpi()
methods randomly - the first method to be called will be the enable
method, and then after, the disable method will be called once between
subsequent enable calls. Thus there is a guaranteed ordering.
Therefore, we know the previous state of priv->eee_enabled, and can
remove it from both methods.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffdX-003ZI5-UV@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since priv->eee_active is assigned with a constant value in each of
these methods, there is no need to test its value later. Remove these
unnecessary tests.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffdS-003ZHz-Qi@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The tests for priv->dma_cap.eee in stmmac_mac_{en,dis}able_tx_lpi()
is useless as these methods will only be called when using phylink
managed EEE, and that will only be enabled if the LPI capabilities
in phylink_config have been populated during initialisation. This
only occurs when priv->dma_cap.eee was true.
As priv->dma_cap.eee remains constant during the lifetime of the driver
instance, there is no need to re-check it in these methods.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffdN-003ZHt-Mq@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the appropriate parts of stmmac_init_eee() into the phylink
mac_enable_tx_lpi() and mac_disable_tx_lpi() methods.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffdI-003ZHn-Iz@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
LPIATE enables the hardware timer for entering LPI mode. To sure that
the correct mode is used, clear LPIATE when using manual/software-timed
mode to prevent the hardware using the timer.
stmmac_main.c avoids this being a problem at the moment by calling
stmmac_set_eee_lpi_timer(..., 0) before switching to software mode.
We no longer need to call stmmac_set_eee_lpi_timer(..., 0) when
disabling EEE as stmmac_reset_eee_mode() will now clear all LPI
settings.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffdD-003ZHh-Ew@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When EEE is disabled, we call stmmac_set_eee_lpi_timer(..., 0).
For dwmac4, this will result in LPIATE being cleared, but LPIEN and
LPITXA being set, causing LPI mode to be signalled (if it wasn't
before).
For others MACs, stmmac_set_eee_lpi_timer() does nothing, which means
that LPI mode will continue to be signalled despite the expectation
for it to be disabled.
In both cases, LPI mode will be terminated when the transmitter has
a packet to send, and LPIEN will be cleared by hardware.
Call stmmac_reset_eee_mode() to ensure that LPI mode is disabled when
EEE mode is requested to be disabled.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffd8-003ZHb-AX@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Delete the software timer to ensure that the timer doesn't fire while
we are modifying the LPI register state, potentially re-enabling LPI.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffd3-003ZHV-6C@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
inet_csk_delete_keepalive_timer() and inet_csk_reset_keepalive_timer()
are only used from core TCP, there is no need to export them.
Replace their prefix by tcp.
Move them to net/ipv4/tcp_timer.c and make tcp_delete_keepalive_timer()
static.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250206094605.2694118-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These two functions are not called from modules.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250206093436.2609008-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove the two unnecessary comments around vxlan_rcv() and
vxlan_err_lookup(), which indicate that the callers are from
net/ipv{4,6}/udp.c. These callers are trivial to find. Additionally, the
comment for vxlan_rcv() missed that the caller could also be from
net/ipv6/udp.c.
Suggested-by: Nikolay Aleksandrov <razor@blackwall.org>
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Ted Chen <znscnchen@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250206140002.116178-1-znscnchen@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Biju Das says:
====================
Add of_get_available_child_by_name()
There are lot of net drivers using of_get_child_by_name() followed by
of_device_is_available() to find the available child node by name for a
given parent. Provide a helper for these users to simplify the code.
v1->v2:
* Make it as a series as per [1] to cover the dependency.
* Added Rb tag from Rob for patch#1 and this patch can be merged through
net as it is the main user.
* Updated all the patches with patch suffix net-next
* Dropped _free() usage.
[1]
https://lore.kernel.org/all/CAL_JsqLo4uSGYMcLXN=0iSUMHdW8RaGCY+o8ThQHq3_eUTV9wQ@mail.gmail.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the helper of_get_available_child_by_name() to simplify
emac_dt_mdio_probe().
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the helper of_get_available_child_by_name() to simplify
owl_emac_mdio_init().
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the helper of_get_available_child_by_name() to simplify
mtk_mdio_init().
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the helper of_get_available_child_by_name() to simplify
mtk_star_mdio_init().
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the helper of_get_available_child_by_name() to simplify
sja1105_mdiobus_register().
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Simplify a5psw_probe() by using of_get_available_child_by_name().
While at it, move of_node_put(mdio) inside the if block to avoid code
duplication.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are lot of drivers using of_get_child_by_name() followed by
of_device_is_available() to find the available child node by name for a
given parent. Provide a helper for these users to simplify the code.
Suggested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice: managing MSI-X in driver
Michal Swiatkowski says:
It is another try to allow user to manage amount of MSI-X used for each
feature in ice. First was via devlink resources API, it wasn't accepted
in upstream. Also static MSI-X allocation using devlink resources isn't
really user friendly.
This try is using more dynamic way. "Dynamic" across whole kernel when
platform supports it and "dynamic" across the driver when not.
To achieve that reuse global devlink parameter pf_msix_max and
pf_msix_min. It fits how ice hardware counts MSI-X. In case of ice amount
of MSI-X reported on PCI is a whole MSI-X for the card (with MSI-X for
VFs also). Having pf_msix_max allow user to statically set how many
MSI-X he wants on PF and how many should be reserved for VFs.
pf_msix_min is used to set minimum number of MSI-X with which ice driver
should probe correctly.
Meaning of this field in case of dynamic vs static allocation:
- on system with dynamic MSI-X allocation support
* alloc pf_msix_min as static, rest will be allocated dynamically
- on system without dynamic MSI-X allocation support
* try alloc pf_msix_max as static, minimum acceptable result is
pf_msix_min
As Jesse and Piotr suggested pf_msix_max and pf_msix_min can (an
probably should) be stored in NVM. This patchset isn't implementing
that.
Dynamic (kernel or driver) way means that splitting MSI-X across the
RDMA and eth in case there is a MSI-X shortage isn't correct. Can work
when dynamic is only on driver site, but can't when dynamic is on kernel
site.
Let's remove this code and move to MSI-X allocation feature by feature.
If there is no more MSI-X for a feature, a feature is working with less
MSI-X or it is turned off.
There is a regression here. With MSI-X splitting user can run RDMA and
eth even on system with not enough MSI-X. Now only eth will work. RDMA
can be turned on by changing number of PF queues (lowering) and reprobe
RDMA driver.
Example:
72 CPU number, eth, RDMA and flow director (1 MSI-X), 1 MSI-X for OICR
on PF, and 1 more for RDMA. Card is using 1 + 72 + 1 + 72 + 1 = 147.
We set pf_msix_min = 2, pf_msix_max = 128
OICR: 1
eth: 72
flow director: 1
RDMA: 128 - 74 = 54
We can change number of queues on pf to 36 and do devlink reinit
OICR: 1
eth: 36
RDMA: 73
flow director: 1
We can also (implemented in "ice: enable_rdma devlink param") turned
RDMA off.
OICR: 1
eth: 72
RDMA: 0 (turned off)
flow director: 1
After this changes we have a static base vector for SRIOV (SIOV probably
in the feature). Last patch from this series is simplifying managing VF
MSI-X code based on static vector.
Now changing queues using ethtool is also changing MSI-X. If there is
enough MSI-X it is always one to one. When there is not enough there
will be more queues than MSI-X.
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: init flow director before RDMA
ice: simplify VF MSI-X managing
ice: enable_rdma devlink param
ice: treat dyn_allowed only as suggestion
ice, irdma: move interrupts code to irdma
ice: get rid of num_lan_msix field
ice: remove splitting MSI-X between features
ice: devlink PF MSI-X max and min parameter
ice: count combined queues using Rx/Tx count
====================
Link: https://patch.msgid.link/20250205185512.895887-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Simplify miic_parse_dt() by using the for_each_available_child_of_node()
helper instead of manually skipping unavailable child nodes.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/3e394d4cf8204bcf17b184bfda474085aa8ed0dd.1738771631.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Populate the PCS supported_interfaces bitmap with the interfaces that
this PCS supports. This makes the manual checking in miic_validate()
redundant, so remove that.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1tfhYq-003aTm-Nx@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
John Daley says:
====================
enic: Use Page Pool API for receiving packets
Use the Page Pool API for RX. The Page Pool API improves bandwidth and
CPU overhead by recycling pages instead of allocating new buffers in the
driver. Also, page pool fragment allocation for smaller MTUs is used
allow multiple packets to share pages.
RX code was moved to its own file and some refactoring was done
beforehand to make the page pool changes more trasparent and to simplify
the resulting code.
====================
Link: https://patch.msgid.link/20250205235416.25410-1-johndale@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With the move to using the Page Pool API for RX, rx copybreak was not
showing any improvement in host CPU overhead, latency or bandwidth so
the driver no longer makes use of the rx_copybreak setting. This patch
removes the ethtool tuneable hooks to set and get the rx copybreak since
they and now no-ops. Rx copybreak was the only tunable supported, so
remove the set and get tunable callbacks all together.
Co-developed-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Link: https://patch.msgid.link/20250205235416.25410-5-johndale@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Page Pool API improves bandwidth and CPU overhead by recycling pages
instead of allocating new buffers in the driver. Make use of page pool
fragment allocation for smaller MTUs so that multiple packets can share
a page. For MTUs larger than PAGE_SIZE, adjust the 'order' page
parameter so that contiguous pages can be used to receive the larger
packets.
The RQ descriptor field 'os_buf' is repurposed to hold page pointers
allocated from page_pool instead of SKBs. When packets arrive, SKBs are
allocated and the page pointers are attached instead of preallocating SKBs.
'alloc_fail' netdev statistic is incremented when page_pool_dev_alloc()
fails.
Co-developed-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Link: https://patch.msgid.link/20250205235416.25410-4-johndale@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Split up RX handler functions in preparation for moving
to a page pool based implementation.
No functional changes.
Co-developed-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Link: https://patch.msgid.link/20250205235416.25410-3-johndale@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move RX handler code into its own file in preparation for further
changes. Some formatting changes were necessary in order to satisfy
checkpatch but there were no functional changes.
Co-developed-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Link: https://patch.msgid.link/20250205235416.25410-2-johndale@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are at least two cases where napi_id may not present and the
napi_id should be elided:
1. Queues could be created, but napi_enable may not have been called
yet. In this case, there may be a NAPI but it may not have an ID and
output of a napi_id should be elided.
2. TX-only NAPIs currently do not have NAPI IDs. If a TX queue happens
to be linked with a TX-only NAPI, elide the NAPI ID from the netlink
output as a NAPI ID of 0 is not useful for users.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250205193751.297211-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
David Wei says:
====================
io_uring zero copy rx
This patchset contains net/ patches needed by a new io_uring request
implementing zero copy rx into userspace pages, eliminating a kernel
to user copy.
We configure a page pool that a driver uses to fill a hw rx queue to
hand out user pages instead of kernel pages. Any data that ends up
hitting this hw rx queue will thus be dma'd into userspace memory
directly, without needing to be bounced through kernel memory. 'Reading'
data out of a socket instead becomes a _notification_ mechanism, where
the kernel tells userspace where the data is. The overall approach is
similar to the devmem TCP proposal.
This relies on hw header/data split, flow steering and RSS to ensure
packet headers remain in kernel memory and only desired flows hit a hw
rx queue configured for zero copy. Configuring this is outside of the
scope of this patchset.
We share netdev core infra with devmem TCP. The main difference is that
io_uring is used for the uAPI and the lifetime of all objects are bound
to an io_uring instance. Data is 'read' using a new io_uring request
type. When done, data is returned via a new shared refill queue. A zero
copy page pool refills a hw rx queue from this refill queue directly. Of
course, the lifetime of these data buffers are managed by io_uring
rather than the networking stack, with different refcounting rules.
This patchset is the first step adding basic zero copy support. We will
extend this iteratively with new features e.g. dynamically allocated
zero copy areas, THP support, dmabuf support, improved copy fallback,
general optimisations and more.
In terms of netdev support, we're first targeting Broadcom bnxt. Patches
aren't included since Taehee Yoo has already sent a more comprehensive
patchset adding support in [1]. Google gve should already support this,
and Mellanox mlx5 support is WIP pending driver changes.
===========
Performance
===========
Note: Comparison with epoll + TCP_ZEROCOPY_RECEIVE isn't done yet.
Test setup:
* AMD EPYC 9454
* Broadcom BCM957508 200G
* Kernel v6.11 base [2]
* liburing fork [3]
* kperf fork [4]
* 4K MTU
* Single TCP flow
With application thread + net rx softirq pinned to _different_ cores:
+-------------------------------+
| epoll | io_uring |
|-----------|-------------------|
| 82.2 Gbps | 116.2 Gbps (+41%) |
+-------------------------------+
Pinned to _same_ core:
+-------------------------------+
| epoll | io_uring |
|-----------|-------------------|
| 62.6 Gbps | 80.9 Gbps (+29%) |
+-------------------------------+
=====
Links
=====
Broadcom bnxt support:
[1]: https://lore.kernel.org/20241003160620.1521626-8-ap420073@gmail.com
Linux kernel branch including io_uring bits:
[2]: https://github.com/isilence/linux.git zcrx/v13
liburing for testing:
[3]: https://github.com/isilence/liburing.git zcrx/next
kperf for testing:
[4]: https://git.kernel.dk/kperf.git
====================
Link: https://patch.msgid.link/20250204215622.695511-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add helpers that properly prep or remove a memory provider for an rx
queue then restart the queue.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250204215622.695511-11-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add helpers for memory providers to interact with page pools.
net_mp_niov_{set,clear}_page_pool() serve to [dis]associate a net_iov
with a page pool. If used, the memory provider is responsible to match
"set" calls with "clear" once a net_iov is not going to be used by a page
pool anymore, changing a page pool, etc.
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250204215622.695511-10-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is a good bunch of places in generic paths assuming that the only
page pool memory provider is devmem TCP. As we want to reuse the net_iov
and provider infrastructure, we need to patch it up and explicitly check
the provider type when we branch into devmem TCP code.
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250204215622.695511-9-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Devmem TCP needs a hook in unregister_netdevice_many_notify() to upkeep
the set tracking queues it's bound to, i.e. ->bound_rxqs. Instead of
devmem sticking directly out of the genetic path, add a mp function.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250204215622.695511-8-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a mandatory callback that prints information about the memory
provider to netlink.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250204215622.695511-7-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|