Age | Commit message (Collapse) | Author |
|
We have run into an issue that a task gets stuck in
balance_dirty_pages_ratelimited() when perform I/O stress testing.
The reason we observed is that an I_DIRTY_PAGES inode with lots
of dirty pages is in b_dirty_time list and standard background
writeback cannot writeback the inode.
After studing the relevant code, the following scenario may lead
to the issue:
task1 task2
----- -----
fuse_flush
write_inode_now //in b_dirty_time
writeback_single_inode
__writeback_single_inode
fuse_write_end
filemap_dirty_folio
__xa_set_mark:PAGECACHE_TAG_DIRTY
lock inode->i_lock
if mapping tagged PAGECACHE_TAG_DIRTY
inode->i_state |= I_DIRTY_PAGES
unlock inode->i_lock
__mark_inode_dirty:I_DIRTY_PAGES
lock inode->i_lock
-was dirty,inode stays in
-b_dirty_time
unlock inode->i_lock
if(!(inode->i_state & I_DIRTY_All))
-not true,so nothing done
This patch moves the dirty inode to b_dirty list when the inode
currently is not queued in b_io or b_more_io list at the end of
writeback_single_inode.
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
CC: stable@vger.kernel.org
Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
Signed-off-by: Jing Xia <jing.xia@unisoc.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220510023514.27399-1-jing.xia@unisoc.com
|
|
Pointer dev is being assigned a value that is never used, the assignment
and the variable are redundant and can be removed. Also replace null check
with the preferred !ptr idiom.
Cleans up clang scan warning:
net/x25/x25_proc.c:94:26: warning: Although the value stored to 'dev' is
used in the enclosing expression, the value is never actually read
from 'dev' [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20220508214500.60446-1-colin.i.king@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Wells Lu says:
====================
This is a patch series for Ethernet driver of Sunplus SP7021 SoC.
Sunplus SP7021 is an ARM Cortex A7 (4 cores) based SoC. It integrates
many peripherals (ex: UART, I2C, SPI, SDIO, eMMC, USB, SD card and
etc.) into a single chip. It is designed for industrial control
applications.
Refer to:
https://sunplus.atlassian.net/wiki/spaces/doc/overview
https://tibbo.com/store/plus1.html
====================
Link: https://lore.kernel.org/r/1652004800-3212-1-git-send-email-wellslutw@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add driver for Sunplus SP7021 SoC.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Wells Lu <wellslutw@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add bindings documentation for Sunplus SP7021 SoC.
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Wells Lu <wellslutw@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The impact of this regression is the same for resume that I saw on
thaw: the kernel hangs and nothing except SysRq rebooting can be done.
Fixes regression in commit cbe6c3a8f8f4 ("net: atlantic: invert deep
par in pm functions, preventing null derefs"), where I disabled deep
pm resets in suspend and resume, trying to make sense of the
atl_resume_common() deep parameter in the first place.
It turns out, that atlantic always has to deep reset on pm
operations. Even though I expected that and tested resume, I screwed
up by kexec-rebooting into an unpatched kernel, thus missing the
breakage.
This fixup obsoletes the deep parameter of atl_resume_common, but I
leave the cleanup for the maintainers to post to mainline.
Suspend and hibernation were successfully tested by the reporters.
Fixes: cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs")
Link: https://lore.kernel.org/regressions/9-Ehc_xXSwdXcvZqKD5aSqsqeNj5Izco4MYEwnx5cySXVEc9-x_WC4C3kAoCqNTi-H38frroUK17iobNVnkLtW36V6VWGSQEOHXhmVMm5iQ=@protonmail.com/
Reported-by: Jordan Leppert <jordanleppert@protonmail.com>
Reported-by: Holger Hoffstaette <holger@applied-asynchrony.com>
Tested-by: Jordan Leppert <jordanleppert@protonmail.com>
Tested-by: Holger Hoffstaette <holger@applied-asynchrony.com>
CC: <stable@vger.kernel.org> # 5.10+
Signed-off-by: Manuel Ullmann <labre@posteo.de>
Link: https://lore.kernel.org/r/87bkw8dfmp.fsf@posteo.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
'ptp-support-hardware-clocks-with-additional-free-running-cycle-counter'
Gerhard Engleder says:
====================
ptp: Support hardware clocks with additional free running cycle counter
ptp vclocks require a clock with free running time for the timecounter.
Currently only a physical clock forced to free running is supported.
If vclocks are used, then the physical clock cannot be synchronized
anymore. The synchronized time is not available in hardware in this
case. As a result, timed transmission with TAPRIO hardware support
is not possible anymore.
If hardware would support a free running time additionally to the
physical clock, then the physical clock does not need to be forced to
free running. Thus, the physical clocks can still be synchronized while
vclocks are in use.
The physical clock could be used to synchronize the time domain of the
TSN network and trigger TAPRIO. In parallel vclocks can be used to
synchronize other time domains.
One year ago I thought for two time domains within a TSN network also
two physical clocks are required. This would lead to new kernel
interfaces for asking for the second clock, ... . But actually for a
time triggered system like TSN there can be only one time domain that
controls the system itself. All other time domains belong to other
layers, but not to the time triggered system itself. So other time
domains can be based on a free running counter if similar mechanisms
like 2 step synchroisation are used.
Synchronisation was tested with two time domains between two directly
connected hosts. Each host run two ptp4l instances, the first used the
physical clock and the second used the virtual clock. I used my FPGA
based network controller as network device. ptp4l was used in
combination with the virtual clock support patches from Miroslav
Lichvar.
v4:
- if_index of 0 is invalid (Jonathan Lemon)
- set if_index to 0 in the SOF_TIMESTAMPING_RAW_HARDWARE block (Jonathan
Lemon)
- add helper function for netdev_get_tstamp() call (Jonathan Lemon)
- update SKBTX_ANY_TSTAMP (Paolo Abeni)
- use separate bits for new tx_flags (Richard Cochran)
v3:
- optimize ptp_convert_timestamp (Richard Cochran)
- call dev_get_by_napi_id() only if needed (Richard Cochran)
- use non-negated logical test (Richard Cochran)
- add comment for skipped output (Richard Cochran)
- add comment for SKBTX_HW_TSTAMP_USE_CYCLES masking (Richard Cochran)
v2:
- rename ptp_clock cycles to has_cycles (Richard Cochran)
- call it free running cycle counter (Richard Cochran)
- update struct skb_shared_hwtstamps kdoc (Richard Cochran)
- optimize timestamp address/cookie processing path (Richard Cochran,
Vinicius Costa Gomes)
v1:
- complete rework based on suggestions (Richard Cochran)
====================
Link: https://lore.kernel.org/r/20220506200142.3329-1-gerhard@engleder-embedded.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The TSN endpoint Ethernet MAC supports a free running counter
additionally to its clock. This free running counter can be read and
hardware timestamps are supported. As the name implies, this counter
cannot be set and its frequency cannot be adjusted.
Add free running cycle counter support based on this free running
counter to physical clock. This also requires hardware time stamps
based on that free running counter.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ptp_convert_timestamp() is called in the RX path of network messages.
The current implementation takes ~5000ns on 1.2GHz A53. This is too much
for the hot path of packet processing.
Introduce hash table for fast vclock lookup in ptp_convert_timestamp().
The execution time of ptp_convert_timestamp() is reduced to ~700ns on
1.2GHz A53.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
If a physical clock supports a free running cycle counter, then
timestamps shall be based on this time too. For TX it is known in
advance before the transmission if a timestamp based on the free running
cycle counter is needed. For RX it is impossible to know which timestamp
is needed before the packet is received and assigned to a socket.
Support late timestamp determination by a network device. Therefore, an
address/cookie is stored within the new netdev_data field of struct
skb_shared_hwtstamps. This address/cookie is provided to a new network
device function called ndo_get_tstamp(), which returns a timestamp based
on the normal/adjustable time or based on the free running cycle
counter. If function is not supported, then timestamp handling is not
changed.
This mechanism is intended for RX, but TX use is also possible.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ptp_convert_timestamp() converts only the timestamp hwtstamp, which is
a field of the argument with the type struct skb_shared_hwtstamps *. So
a pointer to the hwtstamp field of this structure is sufficient.
Rework ptp_convert_timestamp() to use an argument of type ktime_t *.
This allows to add additional timestamp manipulation stages before the
call of ptp_convert_timestamp().
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The free running cycle counter of physical clocks called cycles shall be
used for hardware timestamps to enable synchronisation.
Introduce new flag SKBTX_HW_TSTAMP_USE_CYCLES, which signals driver to
provide a TX timestamp based on cycles if cycles are supported.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ptp vclocks require a free running time for their timecounter.
Currently only a physical clock forced to free running is supported.
If vclocks are used, then the physical clock cannot be synchronized
anymore. The synchronized time is not available in hardware in this
case. As a result, timed transmission with TAPRIO hardware support
is not possible anymore.
If hardware would support a free running time additionally to the
physical clock, then the physical clock does not need to be forced to
free running. Thus, the physical clocks can still be synchronized
while vclocks are in use.
The physical clock could be used to synchronize the time domain of the
TSN network and trigger TAPRIO. In parallel vclocks can be used to
synchronize other time domains.
Introduce support for a free running cycle counter called cycles to
physical clocks. Rework ptp vclocks to use this free running cycle
counter. Default implementation is based on time of physical clock.
Thus, behavior of ptp vclocks based on physical clocks without free
running cycle counter is identical to previous behavior.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Since commit 4e30e98c4b4c ("dpaa2-mac: return -EPROBE_DEFER from dpaa2_mac_open in case the fwnode is not set")
@parent can't be NULL after the if. It's either the address
of the ->fwnode of @dpmacs or @fwnode in case of ACPI.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20220506200029.852310-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Lag state has become very complicated with many modes, flags, types and
port selections methods and future work will add additional features.
Add a debugfs to query the current lag state. A new directory named "lag"
will be created under the mlx5 debugfs directory. As the driver has
debugfs per pci function the location will be: <debugfs>/mlx5/<BDF>/lag
For example:
/sys/kernel/debug/mlx5/0000:08:00.0/lag
The following files are exposed:
- state: Returns "active" or "disabled". If "active" it means hardware
lag is active.
- members: Returns the BDFs of all the members of lag object.
- type: Returns the type of the lag currently configured. Valid only
if hardware lag is active.
* "roce" - Members are bare metal PFs.
* "switchdev" - Members are in switchdev mode.
* "multipath" - ECMP offloads.
- port_sel_mode: Returns the egress port selection method, valid
only if hardware lag is active.
* "queue_affinity" - Egress port is selected by
the QP/SQ affinity.
* "hash" - Egress port is selected by hash done on
each packet. Controlled by: xmit_hash_policy of the
bond device.
- flags: Returns flags that are specific per lag @type. Valid only if
hardware lag is active.
* "shared_fdb" - "on" or "off", if "on" single FDB is used.
- mapping: Returns the mapping which is used to select egress port.
Valid only if hardware lag is active.
If @port_sel_mode is "hash" returns the active egress ports.
The hash result will select only active ports.
if @port_sel_mode is "queue_affinity" returns the mapping
between the configured port affinity of the QP/SQ and actual
egress port. For example:
* 1:1 - Mapping means if the configured affinity is port 1
traffic will egress via port 1.
* 1:2 - Mapping means if the configured affinity is port 1
traffic will egress via port 2. This can happen
if port 1 is down or in active/backup mode and port 1
is backup.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When in hardware lag and the NIC has more than 2 ports when one port
goes down need to distribute the traffic between the remaining
active ports.
For better spread in such cases instead of using 1-to-1 mapping and only
4 slots in the hash, use many.
Each port will have many slots that point to it. When a port goes down
go over all the slots that pointed to that port and spread them between
the remaining active ports. Once the port comes back restore the default
mapping.
We will have number_of_ports * MLX5_LAG_MAX_HASH_BUCKETS slots.
Each MLX5_LAG_MAX_HASH_BUCKETS belong to a different port.
The native mapping is such that:
port 1: The first MLX5_LAG_MAX_HASH_BUCKETS slots are: [1, 1, .., 1]
which means if a packet is hased into one of this slots it will hit the
wire via port 1.
port 2: The second MLX5_LAG_MAX_HASH_BUCKETS slots are: [2, 2, .., 2]
which means if a packet is hased into one of this slots it will hit the
wire via port2.
and this mapping is the same of the rest of the ports.
On a failover, lets say port 2 goes down (port 1, 3, 4 are still up).
the new mapping for port 2 will be:
port 2: The second MLX5_LAG_MAX_HASH_BUCKETS are: [1, 3, 1, 4, .., 4]
which means the mapping was changed from the native mapping to a mapping
that consists of only the active ports.
With this if a port goes down the traffic will be split between the
active ports randomly
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Combine dmesg lag prints into a single function.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Increase the define MLX5_MAX_PORTS to 4 as the driver is ready
to support NICs with 4 ports.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Refactor the entire lag code to use ldev->ports instead of hard-coded
defines (like MLX5_MAX_PORTS) for its operations.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Downstream patches will add support for lag over 4 ports.
In that mode we will only use hash as the uplink selection method.
Using hash instead of queue affinity (before this patch) offers key
advantages like:
- Align ports selection method with the method used by the bond device
- Better packets distribution where a single queue can transmit from
multiple ports (with queue affinity a queue is bound to a single port
regardless of the packet being sent).
- In case of failover we traffic is split between multiple ports and not
a single one like in queue affinity.
Going forward it was decided that queue affinity will be deprecated
as using hash provides a better user experience which means on 4 ports
HCAs hash will always be used.
Future work will add hash support for 2 ports HCAs as well.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
E-Switch currently doesn't support more than 2 E-Switch managers
being aggregated under a single hardware lag. Have specific checks
to disallow creating lag when the code doesn't support it.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Store the number of lag ports inside the lag object. Lag object is a single
shared object managing the lag state of multiple mlx5 devices on the same
physical HCA.
Downstream patches will allow hardware lag to be created over devices with
more than 2 ports.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When search for a peer lag device we can filter based on that
device's capabilities.
Downstream patch will be less strict when filtering compatible devices
and remove the limitation where we require exact MLX5_MAX_PORTS and
change it to a range.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Use a lag specific lock instead of depending on external locks to
synchronise the lag creation/destruction.
With this, taking E-Switch mode lock is no longer needed for syncing
lag logic.
Cleanup any dead code that is left over and don't export functions that
aren't used outside the E-Switch core code.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
There is no need to expose E-Switch function for something that can be
checked with already present API inside lag code.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Devcom API is intended to be used between 2 devices only add this
implied assumption into the code and check when it's no true.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Downstream patches will add support for hardware lag with
more than 2 ports. Add a way for users to query the number of lag ports.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Currently, health recovery will reload driver to recover it from fatal
errors. During the driver's load process, it would wait for FW to set the
pre-init bit for up to 120 seconds, beyond this threshold it would abort
the load process. In some cases, such as a FW upgrade on the DPU, this
timeout period is insufficient, and the user has no way to recover the
host device.
To solve this issue, introduce a new FW pre-init timeout for health
recovery, which is set to 2 hours.
The timeout for devlink reload and probe will use the original one because
they are user triggered flows, and therefore should not have a
significantly long timeout, during which the user command would hang.
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Currently, removing a device needs to get the driver interface lock before
doing any cleanup. If the driver is waiting in a loop for FW init, there
is no way to cancel the wait, instead the device cleanup waits for the
loop to conclude and release the lock.
To allow immediate response to remove device commands, check the TEARDOWN
flag while waiting for FW init, and exit the loop if it has been set.
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Simon Horman says:
====================
nfp: support Corigine PCIE vendor ID
Historically the nfp driver has supported NFP chips with Netronome's
PCIE vendor ID. This patch extends the driver to also support NFP
chips, which at this point are assumed to be otherwise identical from
a software perspective, that have Corigine's PCIE vendor ID (0x1da8).
This patchset begins by cleaning up strings to make them:
* Vendor neutral for the NFP chip
* Relate to Corigine for the driver itself
It then adds support to the driver for the Corigine's PCIE vendor ID
====================
Link: https://lore.kernel.org/r/20220508173816.476357-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Historically the nfp driver has supported NFP chips with Netronome's
PCIE vendor ID. This patch extends the driver to also support NFP
chips, which at this point are assumed to be otherwise identical from
a software perspective, that have Corigine's PCIE vendor ID (0x1da8).
Also, Rename the macro definitions PCI_DEVICE_ID_NERTONEOME_NFPXXXX
to PCI_DEVICE_ID_NFPXXXX, as they are now used in conjunction with two
PCIE vendor IDs.
Signed-off-by: Yu Xiao <yu.xiao@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Historically the nfp driver has supported NFP chips with Netronome's
PCIE vendor ID. In preparation for extending the to also support NFP
chips that have Corigine's PCIE vendor ID (0x1da8) make printk statements
relating to the chip vendor neutral.
An alternate approach is to set the string based on the PCI vendor ID.
In our judgement this proved to cumbersome so we have taken this simpler
approach.
Update strings relating to the driver to use Corigine, who have taken
over maintenance of the driver.
Signed-off-by: Yu Xiao <yu.xiao@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here is a batman-adv bugfix:
- Don't skb_split skbuffs with frag_list, by Sven Eckelmann
* tag 'batadv-net-pullrequest-20220508' of git://git.open-mesh.org/linux-merge:
batman-adv: Don't skb_split skbuffs with frag_list
====================
Link: https://lore.kernel.org/r/20220508132110.20451-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is a race between switchdev_bridge_port_offload() and the
dsa_port_switchdev_sync_attrs() call right below it.
When switchdev_bridge_port_offload() finishes, FDB entries have been
replayed by the bridge, but are scheduled for deferred execution later.
However dsa_port_switchdev_sync_attrs -> dsa_port_can_apply_vlan_filtering()
may impose restrictions on the vlan_filtering attribute and refuse
offloading.
When this happens, the delayed FDB entries will dereference dp->bridge,
which is a NULL pointer because we have stopped the process of
offloading this bridge.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Workqueue: dsa_ordered dsa_slave_switchdev_event_work
pc : dsa_port_bridge_host_fdb_del+0x64/0x100
lr : dsa_slave_switchdev_event_work+0x130/0x1bc
Call trace:
dsa_port_bridge_host_fdb_del+0x64/0x100
dsa_slave_switchdev_event_work+0x130/0x1bc
process_one_work+0x294/0x670
worker_thread+0x80/0x460
---[ end trace 0000000000000000 ]---
Error: dsa_core: Must first remove VLAN uppers having VIDs also present in bridge.
Fix the bug by doing what we do on the normal bridge leave path as well,
which is to wait until the deferred FDB entries complete executing, then
exit.
The placement of dsa_flush_workqueue() after switchdev_bridge_port_unoffload()
guarantees that both the FDB additions and deletions on rollback are waited for.
Fixes: d7d0d423dbaa ("net: dsa: flush switchdev workqueue when leaving the bridge")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220507134550.1849834-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-05-06
This series contains updates to ice driver only.
Ivan Vecera fixes a race with aux plug/unplug by delaying setting adev
until initialization is complete and adding locking.
Anatolii ensures VF queues are completely disabled before attempting to
reconfigure them.
Michal ensures stale Tx timestamps are cleared from hardware.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: fix PTP stale Tx timestamps cleanup
ice: clear stale Tx queue settings before configuring
ice: Fix race during aux device (un)plugging
====================
Link: https://lore.kernel.org/r/20220506174129.4976-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2022-05-06
Marcin Szycik says:
This patchset adds support for systemd defined naming scheme for port
representors, as well as re-enables displaying PCI bus-info in ethtool.
bus-info information has previously been removed from ethtool for port
representors, as a workaround for a bug in lshw tool, where the tool would
sometimes display wrong descriptions for port representors/PF. Now the bug
has been fixed in lshw tool [1].
Removing the workaround can be considered a regression (user might be
running an older, unpatched version of lshw) (see [2] for discussion).
However, calling SET_NETDEV_DEV also produces the same effect as removing
the workaround, i.e. lshw is able to access PCI bus-info (this time not
via ethtool, but in some other way) and the bug can occur.
Adding SET_NETDEV_DEV is important, as it greatly improves netdev naming -
- port representors are named based on PF name. Currently port representors
are named "ethX", which might be confusing, especially when spawning VFs on
multiple PFs. Furthermore, it's currently harder to determine to which PF
does a particular port representor belong, as bus-info is not shown in
ethtool.
Consider the following three cases:
Case 1: current code - driver workaround in place, no SET_NETDEV_DEV,
lshw with or without fix. Port representors are not displayed because they
don't have bus-info (the workaround), PFs are labelled correctly:
$ sudo ./lshw -c net -businfo
Bus info Device Class Description
========================================================
pci@0000:02:00.0 ens6f0 network Ethernet Controller E810-XXV for SFP <-- PF
pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP
pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function <-- VF
pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function
...
Case 2: driver workaround in place, SET_NETDEV_DEV, no lshw fix. Port
representors have predictable names. lshw is able to get bus-info because
of SET_NETDEV_DEV and netdevs CAN be mislabelled:
$ sudo ./lshw -c net -businfo
Bus info Device Class Description
=============================================================
pci@0000:02:00.0 ens6f0npf0vf60 network Ethernet Controller E810-XXV for SFP <-- mislabeled port representor
pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP
pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function
pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function
...
pci@0000:02:00.0 ens6f0npf0vf26 network Ethernet interface
pci@0000:02:00.0 ens6f0 network Ethernet interface <-- mislabeled PF
pci@0000:02:00.0 ens6f0npf0vf81 network Ethernet interface
...
$ sudo ethtool -i ens6f0npf0vf60
driver: ice
...
bus-info:
...
Output of lshw would be the same with workaround removed; it does not
change the fact that lshw labels netdevs incorrectly, while at the same
time it prevents ethtool from displaying potentially useful data
(bus-info).
Case 3: workaround removed, SET_NETDEV_DEV, lshw fix:
$ sudo ./lshw -c net -businfo
Bus info Device Class Description
=============================================================
pci@0000:02:00.0 ens6f0npf0vf73 network Ethernet Controller E810-XXV for SFP
pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP
pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function
pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function
...
pci@0000:02:00.0 ens6f0npf0vf5 network Ethernet Controller E810-XXV for SFP
pci@0000:02:00.0 ens6f0 network Ethernet Controller E810-XXV for SFP
pci@0000:02:00.0 ens6f0npf0vf60 network Ethernet Controller E810-XXV for SFP
...
$ sudo ethtool -i ens6f0npf0vf73
driver: ice
...
bus-info: 0000:02:00.0
...
In this case poort representors have predictable names, netdevs are not
mislabelled in lshw, and bus-info is shown in ethtool.
[1] https://ezix.org/src/pkg/lshw/commit/9bf4e4c9c1
[2] https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20220321144731.3935-1-marcin.szycik@linux.intel.com
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
Revert "ice: Hide bus-info in ethtool for PRs in switchdev mode"
ice: link representors to PCI device
====================
Link: https://lore.kernel.org/r/20220506180052.5256-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This fixes the following error caused by a race condition between
phydev->adjust_link() and a MDIO transaction in the phy interrupt
handler. The issue was reproduced with the ethernet FEC driver and a
micrel KSZ9031 phy.
[ 146.195696] fec 2188000.ethernet eth0: MDIO read timeout
[ 146.201779] ------------[ cut here ]------------
[ 146.206671] WARNING: CPU: 0 PID: 571 at drivers/net/phy/phy.c:942 phy_error+0x24/0x6c
[ 146.214744] Modules linked in: bnep imx_vdoa imx_sdma evbug
[ 146.220640] CPU: 0 PID: 571 Comm: irq/128-2188000 Not tainted 5.18.0-rc3-00080-gd569e86915b7 #9
[ 146.229563] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 146.236257] unwind_backtrace from show_stack+0x10/0x14
[ 146.241640] show_stack from dump_stack_lvl+0x58/0x70
[ 146.246841] dump_stack_lvl from __warn+0xb4/0x24c
[ 146.251772] __warn from warn_slowpath_fmt+0x5c/0xd4
[ 146.256873] warn_slowpath_fmt from phy_error+0x24/0x6c
[ 146.262249] phy_error from kszphy_handle_interrupt+0x40/0x48
[ 146.268159] kszphy_handle_interrupt from irq_thread_fn+0x1c/0x78
[ 146.274417] irq_thread_fn from irq_thread+0xf0/0x1dc
[ 146.279605] irq_thread from kthread+0xe4/0x104
[ 146.284267] kthread from ret_from_fork+0x14/0x28
[ 146.289164] Exception stack(0xe6fa1fb0 to 0xe6fa1ff8)
[ 146.294448] 1fa0: 00000000 00000000 00000000 00000000
[ 146.302842] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 146.311281] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 146.318262] irq event stamp: 12325
[ 146.321780] hardirqs last enabled at (12333): [<c01984c4>] __up_console_sem+0x50/0x60
[ 146.330013] hardirqs last disabled at (12342): [<c01984b0>] __up_console_sem+0x3c/0x60
[ 146.338259] softirqs last enabled at (12324): [<c01017f0>] __do_softirq+0x2c0/0x624
[ 146.346311] softirqs last disabled at (12319): [<c01300ac>] __irq_exit_rcu+0x138/0x178
[ 146.354447] ---[ end trace 0000000000000000 ]---
With the FEC driver phydev->adjust_link() calls fec_enet_adjust_link()
calls fec_stop()/fec_restart() and both these function reset and
temporary disable the FEC disrupting any MII transaction that
could be happening at the same time.
fec_enet_adjust_link() and phy_read() can be running at the same time
when we have one additional interrupt before the phy_state_machine() is
able to terminate.
Thread 1 (phylib WQ) | Thread 2 (phy interrupt)
|
| phy_interrupt() <-- PHY IRQ
| handle_interrupt()
| phy_read()
| phy_trigger_machine()
| --> schedule phylib WQ
|
|
phy_state_machine() |
phy_check_link_status() |
phy_link_change() |
phydev->adjust_link() |
fec_enet_adjust_link() |
--> FEC reset | phy_interrupt() <-- PHY IRQ
| phy_read()
|
Fix this by acquiring the phydev lock in phy_interrupt().
Link: https://lore.kernel.org/all/20220422152612.GA510015@francesco-nb.int.toradex.com/
Fixes: c974bdbc3e77 ("net: phy: Use threaded IRQ, to allow IRQ from sleeping devices")
cc: <stable@vger.kernel.org>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220506060815.327382-1-francesco.dolcini@toradex.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The W=2 build pointed out that the code wasn't initializing all the
variables in the dim_cq_moder declarations with the struct initializers.
The net change here is zero since these structs were already static
const globals and were initialized with zeros by the compiler, but
removing compiler warnings has value in and of itself.
lib/dim/net_dim.c: At top level:
lib/dim/net_dim.c:54:9: warning: missing initializer for field ‘comps’ of ‘const struct dim_cq_moder’ [-Wmissing-field-initializers]
54 | NET_DIM_RX_EQE_PROFILES,
| ^~~~~~~~~~~~~~~~~~~~~~~
In file included from lib/dim/net_dim.c:6:
./include/linux/dim.h:45:13: note: ‘comps’ declared here
45 | u16 comps;
| ^~~~~
and repeats for the tx struct, and once you fix the comps entry then
the cq_period_mode field needs the same treatment.
Use the commonly accepted style to indicate to the compiler that we
know what we're doing, and add a comma at the end of each struct
initializer to clean up the issue, and use explicit initializers
for the fields we are initializing which makes the compiler happy.
While here and fixing these lines, clean up the code slightly with
a fix for the super long lines by removing the word "_MODERATION" from a
couple defines only used in this file.
Fixes: f8be17b81d44 ("lib/dim: Fix -Wunused-const-variable warnings")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/20220507011038.14568-1-jesse.brandeburg@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eliminate the follow smatch warning:
net/rose/rose_route.c:1136 rose_node_show() warn: inconsistent
indenting.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220507034207.18651-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The initial code used roundup() to round the starting time to
a multiple of a period. This generated an error on 32-bit
systems, so was replaced with DIV_ROUND_UP_ULL().
However, this truncates to 32-bits on a 64-bit system. Replace
with DIV64_U64_ROUND_UP() instead.
Fixes: b325af3cfab9 ("ptp: ocp: Add signal generators and update sysfs nodes")
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/r/20220506223739.1930-2-jonathan.lemon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the missing pci_disable_device() before return
from tulip_init_one() in the error handling case.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220506094250.3630615-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If ionic_map_bars() fails, pci_release_regions() need be called.
Fixes: fbfb8031533c ("ionic: Add hardware init and device commands")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220506034040.2614129-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
- thinkpad_acpi AMD suspend/resume + fan detection fixes
- two other small fixes
- one hardware-id addition
* tag 'platform-drivers-x86-v5.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/surface: aggregator: Fix initialization order when compiling as builtin module
platform/surface: gpe: Add support for Surface Pro 8
platform/x86/intel: Fix 'rmmod pmt_telemetry' panic
platform/x86: thinkpad_acpi: Correct dual fan probe
platform/x86: thinkpad_acpi: Add a s2idle resume quirk for a number of laptops
platform/x86: thinkpad_acpi: Convert btusb DMI list to quirks
|
|
Guangbin Huang says:
====================
net: hns3: updates for -next
This series includes some updates for the HNS3 ethernet driver.
Change logs:
V1 -> V2:
- Fix some sparse warnings of patch 3# and 4#.
- Add patch #6 to fix sparse warnings of incorrect type of argument.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
hclge_comm_get_rss_indir_tbl
The argument rss_ind_tbl_size is type u16 in function definition of
hclge_comm_get_rss_indir_tbl(), but it is set to type __le16 in function
declaration by mistake, so fix it.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds a new mailbox opcode to query map relation between
vf ring and vector.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch uses __le16/__32 to define mailbox data structures. Then byte
order conversion are added for mailbox messages from VF to PF.
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, hns3 mailbox processing between PF and VF missed to convert
message byte order and use data type u16 instead of __le16 for mailbox
data process. These processes may cause problems between different
architectures.
So this patch uses __le16/__le32 data type to define mailbox data
structures. To be compatible with old hns3 driver, these structures use
one-byte alignment. Then byte order conversions are added to mailbox
messages from PF to VF.
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Vector0 is used for common interrupt control events and is
irrelevant to performance. Currently, the driver sets the
default affinity of vector0 to NUMA nodes, which is unnecessary.
Therefore, the default setting is removed, and the driver does
not set the affinity for vector0.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When set tx-buf-size as 0 by ethtool, hns3_init_tx_spare_buffer()
will return directly and priv->ring->tx_spare->len is uninitialized,
then print function access priv->ring->tx_spare->len will cause
this issue.
When set tx-buf-size as 0 by ethtool, the print function will
print 0 directly and not access priv->ring->tx_spare->len.
Fixes: 2373b35c24ff ("net: hns3: add log for setting tx spare buf size")
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|