Age | Commit message (Collapse) | Author |
|
Allocate and initialize struct ice_adapter object only once per physical
card instead of once per port. This is not a big deal by now, but we want
to extend this struct more and more in the near future. Our plans include
PTP stuff and a devlink instance representing whole-device/physical card.
Transactions requiring to be sleep-able (like those doing user (here ice)
memory allocation) must be performed with an additional (on top of xarray)
mutex. Adding it here removes need to xa_lock() manually.
Since this commit is a reimplementation of ice_adapter_get(), a rather new
scoped_guard() wrapper for locking is used to simplify the logic.
It's worth to mention that xa_insert() use gives us both slot reservation
and checks if it is already filled, what simplifies code a tiny bit.
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Admin queue command for shutdown AQ contains a flag to indicate driver
unload. However, the flag is always set in the driver, even for resets. It
can cause the firmware to consider driver as unloaded once the PF reset is
triggered on all ports of device, which could lead to unexpected results.
Add an additional function parameter to functions that shutdown AQ,
indicating whether the driver is actually unloading.
Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Allow the driver to be compatible with different FW API versions based
on the device's MAC type. Currently, E810 is only compatible with one
FW API version. Now the driver can be compatible with different FW API
versions for both E810 and E830. For example, E810 FW API version is
1.5.0 and E830 is 1.7.0.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Check the return value from ice_vsi_rebuild() and prevent the usage of
incorrectly configured VSI.
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Eric Joyner <eric.joyner@intel.com>
Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Changing the MAC address of the VFs is currently unsupported via devlink.
Add the function handlers to set and get the HW address for the VFs.
Signed-off-by: Karthik Sundaravel <ksundara@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
ConnectX devices lack ability to count payload data byte size which is
needed for SA to return to libreswan for rekeying.
As a solution let's approximate that by decreasing headers size from
total size counted by flow steering. The calculation doesn't take into
account any other headers which can be in the packet (e.g. IP extensions).
Fixes: 5a6cddb89b51 ("net/mlx5e: Update IPsec per SA packets/bytes count")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IPsec SA statistics presents successfully decrypted and encrypted
packet and bytes, and not total handled by this SA. So update the
calculation logic to take into account failures.
Fixes: 6fb7f9408779 ("net/mlx5e: Connect mlx5 IPsec statistics with XFRM core")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the cited commit, mqprio_rl cleanup and free are mistakenly removed
in mlx5e_priv_cleanup(), and it causes the leakage of host memory and
firmware SCHEDULING_ELEMENT objects while changing eswitch mode. So,
add them back.
Fixes: 0bb7228f7096 ("net/mlx5e: Fix mqprio_rl handling on devlink reload")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, ingress acl is used for three features. It is created only
when vport metadata match and prio tag are enabled. But active-backup
lag mode also uses it. It is independent of vport metadata match and
prio tag. And vport metadata match can be disabled using the
following devlink command:
# devlink dev param set pci/0000:08:00.0 name esw_port_metadata \
value false cmode runtime
If ingress acl is not created, will hit panic when creating drop rule
for active-backup lag mode. If always create it, there will be about
5% performance degradation.
Fix it by creating ingress acl when needed. If esw_port_metadata is
true, ingress acl exists, then create drop rule using existing
ingress acl. If esw_port_metadata is false, create ingress acl and
then create drop rule.
Fixes: 1749c4c51c16 ("net/mlx5: E-switch, add drop rule support to ingress ACL")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Due a bug in the device max_num_eqs doesn't always reflect a written
value. As a result, setting max_io_eqs may not work but appear
successful. Instead write max_num_eqs_24b, which reflects correct
value.
Fixes: 93197c7c509d ("mlx5/core: Support max_io_eqs for a function")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A new capability with more bits is added. If it's set use that value as
the maximum number of EQs available.
This cap is also writable by the vhca_resource_manager to allow limiting
the number of EQs available to SFs and VFs.
Fixes: 93197c7c509d ("mlx5/core: Support max_io_eqs for a function")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement the ethtool_ops::set_module_eeprom_by_page operation to allow
ethtool to write to a transceiver module EEPROM, in a similar fashion to
the ethtool_ops::get_module_eeprom_by_page operation.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Embedding net_device into structures prohibits the usage of flexible
arrays in the net_device structure. For more details, see the discussion
at [1].
Un-embed the net_devices from struct lmac by converting them
into pointers, and allocating them dynamically. Use the leverage
alloc_netdev() to allocate the net_device object at
bgx_lmac_enable().
The free of the device occurs at bgx_lmac_disable().
Do not free_netdevice() if bgx_lmac_enable() fails after lmac->netdev
is allocated, since bgx_lmac_disable() is called if bgx_lmac_enable()
fails, and lmac->netdev will be freed there (similarly to lmac->dmacs).
Link: https://lore.kernel.org/all/20240229225910.79e224cf@kernel.org/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20240626173503.87636-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
e3f02f32a050 ("ionic: fix kernel panic due to multi-buffer handling")
d9c04209990b ("ionic: Mark error paths in the data path as unlikely")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When auxiliary_device_add() returns error and then calls
auxiliary_device_uninit(), callback function adev_release
calls kfree(madev). We shouldn't call kfree(madev) again
in the error handling path. Set 'madev' to NULL.
Fixes: a69839d4327d ("net: mana: Add support for auxiliary device")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Link: https://patch.msgid.link/20240625130314.2661257-1-make24@iscas.ac.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add Ethernet support for STM32MP25.
STM32MP25 is STM32 SOC with 2 GMACs instances.
GMAC IP version is SNPS 5.3x.
GMAC IP configure with 2 RX and 4 TX queue.
DMA HW capability register supported
RX Checksum Offload Engine supported
TX Checksum insertion supported
Wake-Up On Lan supported
TSO supported
Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
WQE (Work Queue Element) includes 3 scatter/gather entries for buffers.
The buffer can be split into 3 parts, software should set address and byte
count of each part.
A previous patch-set used page pool to allocate buffers, to simplify the
change, we first used one continuous buffer, which was allocated with
order > 0. This patch improves page pool usage to allocate the exact
number of pages which are required for packet.
As part of init, fill WQE.address[x] and WQE.byte_count* with pages which
are allocated from the pool. Fill x entries according to number of
scatter/gather entries which are required for maximum packet size. When a
packet is received, check the actual size and replace only the used pages.
Save bytes for software overhead only as part of the first entry.
This change also requires using fragmented SKB, till now all the buffer
was in the linear part. Note that 'skb->truesize' is decreased for small
packets.
For now the maximum buffer size is 3 * PAGE_SIZE which is enough, in
case that the driver will support larger MTU, we can use 'order' to
allocate more than one page per scatter/gather entry.
This change significantly reduces memory consumption of mlxsw driver. The
footprint is reduced by 26%.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/ee38898c692e7f644a7f3ea4d33aeddb4dd917d2.1719321422.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A previous patch-set used page pool for Rx buffers allocations. To
simplify the change, we first used page pool for one allocation per
packet - one continuous buffer is allocated for each packet. This can be
improved by using fragmented buffers, then memory consumption will be
significantly reduced.
WQE (Work Queue Element) includes up to 3 scatter/gather entries for
data. As preparation for fragmented buffer usage, calculate number of
scatter/gather entries which are required for packet according to
maximum MTU and store it for future use. For now use PAGE_SIZE for each
entry, which means that maximum buffer size is 3 * PAGE_SIZE. This is
enough for the maximum MTU which is supported in the driver now (10K).
Warn in an unlikely case of maximum MTU which requires more than 3 pages,
for now this warn should not happen with standard page size (>=4K) and
maximum MTU (10K).
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/98c3e3adb7e727e571ac538faf67cef262cec4fc.1719321422.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These drivers don't use the driver_data member of struct i2c_device_id,
so don't explicitly initialize this member.
This prepares putting driver_data in an anonymous union which requires
either no initialization or named designators. But it's also a nice
cleanup on its own.
While add it, also remove commas after the sentinel entries.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Reviewed-by: Petr Machata <petrm@nvidia.com> # For mlxsw
Reviewed-by: Kory Maincent <Kory.maincent@bootlin.com>
Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au> # for mctp-i2c
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240625083853.2205977-2-u.kleine-koenig@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds supports for multiple PHY hardware with phylink. The
adapters with TN40xx chips use multiple PHY hardware; AMCC QT2025, TI
TLK10232, Aqrate AQR105, and Marvell 88X3120, 88X3310, and MV88E2010.
For now, the PCI ID table of this driver enables adapters using only
QT2025 PHY. I've tested this driver and the QT2025 PHY driver (SFP+
10G SR) with Edimax EN-9320 10G adapter.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Reviewed-by: Hans-Frieder Vogt <hfdevel@gmx.net>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20240623235507.108147-8-fujita.tomonori@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds supports for mdio bus. A later path adds PHYLIB
support on the top of this.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Link: https://patch.msgid.link/20240623235507.108147-7-fujita.tomonori@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds basic Rx handling. The Rx logic uses three major data
structures; two ring buffers with NIC and one database. One ring
buffer is used to send information to NIC about memory to be stored
packets to be received. The other is used to get information from NIC
about received packets. The database is used to keep the information
about DMA mapping. After a packet arrived, the db is used to pass the
packet to the network stack.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Reviewed-by: Hans-Frieder Vogt <hfdevel@gmx.net>
Link: https://patch.msgid.link/20240623235507.108147-6-fujita.tomonori@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds device specific structures to initialize the hardware
with basic Tx handling. The original driver loads the embedded
firmware in the header file. This driver is implemented to use the
firmware APIs.
The Tx logic uses three major data structures; two ring buffers with
NIC and one database. One ring buffer is used to send information
about packets to be sent for NIC. The other is used to get information
from NIC about packet that are sent. The database is used to keep the
information about DMA mapping. After a packet is sent, the db is used
to free the resource used for the packet.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Link: https://patch.msgid.link/20240623235507.108147-5-fujita.tomonori@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This adds several defines to handle registers in Tehuti Networks
TN40xx chips for later patches.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Reviewed-by: Hans-Frieder Vogt <hfdevel@gmx.net>
Link: https://patch.msgid.link/20240623235507.108147-4-fujita.tomonori@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This just adds the scaffolding for an ethernet driver for Tehuti
Networks TN40xx chips.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240623235507.108147-3-fujita.tomonori@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement the ethtool commands that can be used to configure and query
flow-steering rules.
A large part of this change consists of translating the ethtool
representation of 'ntuples' to our internal gve_flow_rule and vice-versa
in the new created gve_flow_rule.c
Considering the possible large amount of flow rules, the driver doesn't
store all the rules locally. When the user runs 'ethtool -n <nic>' to
check the registered rules, the driver will send adminq command to
query a limited amount of rules/rule ids(that filled in a 4096 bytes dma
memory) at a time as a cache for the ethtool queries. The adminq query
commands will be repeated for several times until the ethtool has
queried all the needed rules.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240625001232.1476315-6-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add new adminq commands for the driver to configure and query flow rules
that are stored in the device. Flow steering rules are assigned with a
location that determines the relative order of the rules.
Flow rules can run up to an order of millions. In such cases, storing
a full copy of the rules in the driver to prepare for the ethtool query
is infeasible while querying them from the device is better. That needs
to be optimized too so that we don't send a lot of adminq commands. The
solution here is to store a limited number of rules/rule ids in the
driver in a cache. Use dma_pool to allocate 4k bytes which lets device
write at most 46 flow rules(4096/88) or 1024 rule ids(4096/4) at a time.
For configuring flow rules, there are 3 sub-commands:
- ADD which adds a rule at the location supplied
- DEL which deletes the rule at the location supplied
- RESET which clears all currently active rules in the device
For querying flow rules, there are also 3 sub-commands:
- QUERY_RULES corresponds to ETHTOOL_GRXCLSRULE. It fills the rules in
the allocated cache after querying the device
- QUERY_RULES_IDS corresponds to ETHTOOL_GRXCLSRLALL. It fills the
rule_ids in the allocated cache after querying the device
- QUERY_RULES_STATS corresponds to ETHTOOL_GRXCLSRLCNT. It queries the
device's current flow rule number and the supported max flow rule
limit
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240625001232.1476315-5-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new device option to signal to the driver that the device supports
flow steering. This device option also carries the maximum number of
flow steering rules that the device can store.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240625001232.1476315-4-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The adminq command is limited to 64 bytes per entry and it's 56 bytes
for the command itself at maximum. To support larger commands, we need
to dma_alloc a separate memory to put the command in that memory and
send the dma memory address instead of the actual command.
Introduce an extended adminq command to wrap the real command with the
inner opcode and the allocated dma memory address specified. Once the
device receives it, it can get the real command from the given dma
memory address. As designed with the device, all the extended commands
will use inner opcode larger than 0xFF.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240625001232.1476315-3-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We were depending on the rtnl_lock to make sure there is only one adminq
command running at a time. But some commands may take too long to hold
the rtnl_lock, such as the upcoming flow steering operations. For such
situations, it can temporarily drop the rtnl_lock, and replace it for
these operations with a new adminq lock, which can ensure the adminq
command execution to be thread-safe.
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240625001232.1476315-2-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
All EtherAVB instances on R-Car Gen3/Gen4 SoCs support the RGMII
interface. In addition, the first two EtherAVB instances on R-Car V4M
also support the MII interface, but this is not yet supported by the
driver.
Add support for MII on R-Car Gen4 by adding an R-Car Gen4-specific EMAC
initialization function that selects the MII clock instead of the RGMII
clock when the PHY interface is MII. Note that all implementations of
EtherAVB on R-Car Gen4 SoCs have the APSR register, but only MII-capable
instances are documented to have the MIISELECT bit, which has a
documented value of zero when reserved.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://patch.msgid.link/3a21d1d6680864aa85afff9260234c2b8054020a.1719234830.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move ravb_gen2_hw_info before ravb_gen3_hw_info to match
ravb_match_table[] order.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://patch.msgid.link/a76febe3737e26365a784e9193da9363f22aa550.1719234830.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If we're not in a NAPI softirq context, we need to be careful
about how we call napi_consume_skb(), specifically we need to
call it with budget==0 to signal to it that we're not in a
safe context.
This was found while running some configuration stress testing
of traffic and a change queue config loop running, and this
curious note popped out:
[ 4371.402645] BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/20545
[ 4371.402897] caller is napi_skb_cache_put+0x16/0x80
[ 4371.403120] CPU: 25 PID: 20545 Comm: ethtool Kdump: loaded Tainted: G OE 6.10.0-rc3-netnext+ #8
[ 4371.403302] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 01/23/2021
[ 4371.403460] Call Trace:
[ 4371.403613] <TASK>
[ 4371.403758] dump_stack_lvl+0x4f/0x70
[ 4371.403904] check_preemption_disabled+0xc1/0xe0
[ 4371.404051] napi_skb_cache_put+0x16/0x80
[ 4371.404199] ionic_tx_clean+0x18a/0x240 [ionic]
[ 4371.404354] ionic_tx_cq_service+0xc4/0x200 [ionic]
[ 4371.404505] ionic_tx_flush+0x15/0x70 [ionic]
[ 4371.404653] ? ionic_lif_qcq_deinit.isra.23+0x5b/0x70 [ionic]
[ 4371.404805] ionic_txrx_deinit+0x71/0x190 [ionic]
[ 4371.404956] ionic_reconfigure_queues+0x5f5/0xff0 [ionic]
[ 4371.405111] ionic_set_ringparam+0x2e8/0x3e0 [ionic]
[ 4371.405265] ethnl_set_rings+0x1f1/0x300
[ 4371.405418] ethnl_default_set_doit+0xbb/0x160
[ 4371.405571] genl_family_rcv_msg_doit+0xff/0x130
[...]
I found that ionic_tx_clean() calls napi_consume_skb() which calls
napi_skb_cache_put(), but before that last call is the note
/* Zero budget indicate non-NAPI context called us, like netpoll */
and
DEBUG_NET_WARN_ON_ONCE(!in_softirq());
Those are pretty big hints that we're doing it wrong. We can pass a
context hint down through the calls to let ionic_tx_clean() know what
we're doing so it can call napi_consume_skb() correctly.
Fixes: 386e69865311 ("ionic: Make use napi_consume_skb")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20240624175015.4520-1-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce an additional validation to ensure that the PPE index
is modified exclusively for mtk_eth ingress devices.
This primarily addresses the issue related
to WED operation with multiple PPEs.
Fixes: dee4dd10c79a ("net: ethernet: mtk_eth_soc: ppe: add support for multiple PPEs")
Signed-off-by: Elad Yifee <eladwf@gmail.com>
Link: https://lore.kernel.org/r/20240623175113.24437-1-eladwf@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Extend wake-on LAN support with an ARP packet.
Currently, if PHY supports WOL, ethtool ignores the modes supported
by MACB. This change extends the WOL modes with MACB supported modes.
Advertise wake-on LAN supported modes by default without relying on
dt node. By default, wake-on LAN will be in disabled state.
Using ethtool, users can enable/disable or choose packet types.
For wake-on LAN via ARP, ensure the IP address is assigned and
report an error otherwise.
Co-developed-by: Harini Katakam <harini.katakam@amd.com>
Signed-off-by: Harini Katakam <harini.katakam@amd.com>
Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Tested-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> # on SAMA7G5
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Enable queue disable for Versal devices.
Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When GEM is used as a wake device, it is not mandatory for the RX DMA
to be active. The RX engine in IP only needs to receive and identify
a wake packet through an interrupt. The wake packet is of no further
significance; hence, it is not required to be copied into memory.
By disabling RX DMA during suspend, we can avoid unnecessary DMA
processing of any incoming traffic.
During suspend, perform either of the below operations:
- tie-off/dummy descriptor: Disable unused queues by connecting
them to a looped descriptor chain without free slots.
- queue disable: The newer IP version allows disabling individual queues.
Co-developed-by: Harini Katakam <harini.katakam@amd.com>
Signed-off-by: Harini Katakam <harini.katakam@amd.com>
Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Tested-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> # on SAMA7G5
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Below is a summary of how the driver stores a reference to an skb during
transmit:
tx_buff[free_map[consumer_index]]->skb = new_skb;
free_map[consumer_index] = IBMVNIC_INVALID_MAP;
consumer_index ++;
Where variable data looks like this:
free_map == [4, IBMVNIC_INVALID_MAP, IBMVNIC_INVALID_MAP, 0, 3]
consumer_index^
tx_buff == [skb=null, skb=<ptr>, skb=<ptr>, skb=null, skb=null]
The driver has checks to ensure that free_map[consumer_index] pointed to
a valid index but there was no check to ensure that this index pointed
to an unused/null skb address. So, if, by some chance, our free_map and
tx_buff lists become out of sync then we were previously risking an
skb memory leak. This could then cause tcp congestion control to stop
sending packets, eventually leading to ETIMEDOUT.
Therefore, add a conditional to ensure that the skb address is null. If
not then warn the user (because this is still a bug that should be
patched) and free the old pointer to prevent memleak/tcp problems.
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Fix unintended sign extension and klockwork issues. These are not real
issue but for sanity checks.
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
TC queues needs to be correctly updated when the number of queues on
a VSI is reconfigured, so netdev's queue and TC settings will be
dynamically adjusted and could accurately represent the underlying
hardware state after changes to the VSI queue counts.
Fixes: 0754d65bd4be ("ice: Add infrastructure for mqprio support via ndo_setup_tc")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice: prepare representor for SF support
Michal Swiatkowski says:
This is a series to prepare port representor for supporting also
subfunctions. We need correct devlink locking and the possibility to
update parent VSI after port representor is created.
Refactor how devlink lock is taken to suite the subfunction use case.
VSI configuration needs to be done after port representor is created.
Port representor needs only allocated VSI. It doesn't need to be
configured before.
VSI needs to be reconfigured when update function is called.
The code for this patchset was split from (too big) patchset [1].
[1] https://lore.kernel.org/netdev/20240213072724.77275-1-michal.swiatkowski@linux.intel.com/
---
Originally from https://lore.kernel.org/netdev/20240605-next-2024-06-03-intel-next-batch-v2-0-39c23963fa78@intel.com/
Changes:
- delete ice_repr_get_by_vsi() from header
- rephrase commit message in moving devlink locking
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We support multicast addresses, so enable it by default.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are 2 types of outstanding tx skb's:
Type 1: Packets that are sitting in the drivers ind_buff that are
waiting to be batch sent to the NIC. During a device reset, these are
freed with a call to ibmvnic_tx_scrq_clean_buffer()
Type 2: Packets that have been sent to the NIC and are awaiting a TX
completion IRQ. These are free'd during a reset with a call to
clean_tx_pools()
During any reset which requires us to free the tx irq, ensure that the
Type 2 skb references are freed. Since the irq is released, it is
impossible for the NIC to inform of any completions.
Furthermore, later in the reset process is a call to init_tx_pools()
which marks every entry in the tx pool as free (ie not outstanding).
So if the driver is to make a call to init_tx_pools(), it must first
be sure that the tx pool is empty of skb references.
This issue was discovered by observing the following in the logs during
EEH testing:
TX free map points to untracked skb (tso_pool 0 idx=4)
TX free map points to untracked skb (tso_pool 0 idx=5)
TX free map points to untracked skb (tso_pool 1 idx=36)
Fixes: 65d6470d139a ("ibmvnic: clean pending indirect buffs during reset")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case of reset of VF VSI can be reallocated. To handle this case it
should be properly updated.
Reload representor as vsi->vsi_num can be different than the one stored
when representor was created.
Instead of only changing antispoof do whole VSI configuration for
eswitch.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
It is needed because subfunction port representor shouldn't configure
the source VSI during representor creation.
Move the code to separate function and call it only in case the VF port
representor is being created.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In case of subfunction lock will be taken for whole port creation and
removing. Do the same in VF case.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
It is used to get representor structure during cleaning.
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The following two shared buffer operations make use of the Shared Buffer
Status Register (SBSR):
# devlink sb occupancy snapshot pci/0000:01:00.0
# devlink sb occupancy clearmax pci/0000:01:00.0
The register has two masks of 256 bits to denote on which ingress /
egress ports the register should operate on. Spectrum-4 has more than
256 ports, so the register was extended by cited commit with a new
'port_page' field.
However, when filling the register's payload, the driver specifies the
ports as absolute numbers and not relative to the first port of the port
page, resulting in memory corruptions [1].
Fix by specifying the ports relative to the first port of the port page.
[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_sb_occ_snapshot+0xb6d/0xbc0
Read of size 1 at addr ffff8881068cb00f by task devlink/1566
[...]
Call Trace:
<TASK>
dump_stack_lvl+0xc6/0x120
print_report+0xce/0x670
kasan_report+0xd7/0x110
mlxsw_sp_sb_occ_snapshot+0xb6d/0xbc0
mlxsw_devlink_sb_occ_snapshot+0x75/0xb0
devlink_nl_sb_occ_snapshot_doit+0x1f9/0x2a0
genl_family_rcv_msg_doit+0x20c/0x300
genl_rcv_msg+0x567/0x800
netlink_rcv_skb+0x170/0x450
genl_rcv+0x2d/0x40
netlink_unicast+0x547/0x830
netlink_sendmsg+0x8d4/0xdb0
__sys_sendto+0x49b/0x510
__x64_sys_sendto+0xe5/0x1c0
do_syscall_64+0xc1/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[...]
Allocated by task 1:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x8f/0xa0
copy_verifier_state+0xbc2/0xfb0
do_check_common+0x2c51/0xc7e0
bpf_check+0x5107/0x9960
bpf_prog_load+0xf0e/0x2690
__sys_bpf+0x1a61/0x49d0
__x64_sys_bpf+0x7d/0xc0
do_syscall_64+0xc1/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 1:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
poison_slab_object+0x109/0x170
__kasan_slab_free+0x14/0x30
kfree+0xca/0x2b0
free_verifier_state+0xce/0x270
do_check_common+0x4828/0xc7e0
bpf_check+0x5107/0x9960
bpf_prog_load+0xf0e/0x2690
__sys_bpf+0x1a61/0x49d0
__x64_sys_bpf+0x7d/0xc0
do_syscall_64+0xc1/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: f8538aec88b4 ("mlxsw: Add support for more than 256 ports in SBSR register")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cited commit added support for a new reset flow ("all reset") which is
deeper than the existing reset flow ("software reset") and allows the
device's PCI firmware to be upgraded.
In the new flow the driver first tells the firmware that "all reset" is
required by issuing a new reset command (i.e., MRSR.command=6) and then
triggers the reset by having the PCI core issue a secondary bus reset
(SBR).
However, due to a race condition in the device's firmware the device is
not always able to recover from this reset, resulting in initialization
failures [1].
New firmware versions include a fix for the bug and advertise it using a
new capability bit in the Management Capabilities Mask (MCAM) register.
Avoid initialization failures by reading the new capability bit and
triggering the new reset flow only if the bit is set. If the bit is not
set, trigger a normal PCI hot reset by skipping the call to the
Management Reset and Shutdown Register (MRSR).
Normal PCI hot reset is weaker than "all reset", but it results in a
fully operational driver and allows users to flash a new firmware, if
they want to.
[1]
mlxsw_spectrum4 0000:01:00.0: not ready 1023ms after bus reset; waiting
mlxsw_spectrum4 0000:01:00.0: not ready 2047ms after bus reset; waiting
mlxsw_spectrum4 0000:01:00.0: not ready 4095ms after bus reset; waiting
mlxsw_spectrum4 0000:01:00.0: not ready 8191ms after bus reset; waiting
mlxsw_spectrum4 0000:01:00.0: not ready 16383ms after bus reset; waiting
mlxsw_spectrum4 0000:01:00.0: not ready 32767ms after bus reset; waiting
mlxsw_spectrum4 0000:01:00.0: not ready 65535ms after bus reset; giving up
mlxsw_spectrum4 0000:01:00.0: PCI function reset failed with -25
mlxsw_spectrum4 0000:01:00.0: cannot register bus device
mlxsw_spectrum4: probe of 0000:01:00.0 failed with error -25
Fixes: f257c73e5356 ("mlxsw: pci: Add support for new reset flow")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Maksym Yaremchuk <maksymy@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add initial support for Renesas Ethernet-TSN End-station device of R-Car
V4H. The Ethernet End-station can connect to an Ethernet network using a
10 Mbps, 100 Mbps, or 1 Gbps full-duplex link via MII/GMII/RMII/RGMII.
Depending on the connected PHY.
The driver supports Rx checksum and offload and hardware timestamps.
While full power management and suspend/resume is not yet supported the
driver enables runtime PM in order to enable the module clock. While
explicit clock management using clk_enable() would suffice for the
supported SoC, the module could be reused on SoCs where the module is
part of a power domain.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|