Age | Commit message (Collapse) | Author |
|
Christoph reported a divide by zero bug in mptcp_recvmsg():
divide error: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 19978 Comm: syz-executor.6 Not tainted 6.4.0-rc2-gffcc7899081b #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:__tcp_select_window+0x30e/0x420 net/ipv4/tcp_output.c:3018
Code: 11 ff 0f b7 cd c1 e9 0c b8 ff ff ff ff d3 e0 89 c1 f7 d1 01 cb 21 c3 eb 17 e8 2e 83 11 ff 31 db eb 0e e8 25 83 11 ff 89 d8 99 <f7> 7c 24 04 29 d3 65 48 8b 04 25 28 00 00 00 48 3b 44 24 10 75 60
RSP: 0018:ffffc90000a07a18 EFLAGS: 00010246
RAX: 000000000000ffd7 RBX: 000000000000ffd7 RCX: 0000000000040000
RDX: 0000000000000000 RSI: 000000000003ffff RDI: 0000000000040000
RBP: 000000000000ffd7 R08: ffffffff820cf297 R09: 0000000000000001
R10: 0000000000000000 R11: ffffffff8103d1a0 R12: 0000000000003f00
R13: 0000000000300000 R14: ffff888101cf3540 R15: 0000000000180000
FS: 00007f9af4c09640(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b33824000 CR3: 000000012f241001 CR4: 0000000000170ee0
Call Trace:
<TASK>
__tcp_cleanup_rbuf+0x138/0x1d0 net/ipv4/tcp.c:1611
mptcp_recvmsg+0xcb8/0xdd0 net/mptcp/protocol.c:2034
inet_recvmsg+0x127/0x1f0 net/ipv4/af_inet.c:861
____sys_recvmsg+0x269/0x2b0 net/socket.c:1019
___sys_recvmsg+0xe6/0x260 net/socket.c:2764
do_recvmmsg+0x1a5/0x470 net/socket.c:2858
__do_sys_recvmmsg net/socket.c:2937 [inline]
__se_sys_recvmmsg net/socket.c:2953 [inline]
__x64_sys_recvmmsg+0xa6/0x130 net/socket.c:2953
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f9af58fc6a9
Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007f9af4c08cd8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 00000000006bc050 RCX: 00007f9af58fc6a9
RDX: 0000000000000001 RSI: 0000000020000140 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000f00 R11: 0000000000000246 R12: 00000000006bc05c
R13: fffffffffffffea8 R14: 00000000006bc050 R15: 000000000001fe40
</TASK>
mptcp_recvmsg is allowed to release the msk socket lock when
blocking, and before re-acquiring it another thread could have
switched the sock to TCP_LISTEN status - with a prior
connect(AF_UNSPEC) - also clearing icsk_ack.rcv_mss.
Address the issue preventing the disconnect if some other process is
concurrently performing a blocking syscall on the same socket, alike
commit 4faeee0cf8a5 ("tcp: deny tcp_disconnect() when threads are waiting").
Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/404
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently the mptcp code has assumes that disconnect() can fail only
at mptcp_sendmsg_fastopen() time - to avoid a deadlock scenario - and
don't even bother returning an error code.
Soon mptcp_disconnect() will handle more error conditions: let's track
them explicitly.
As a bonus, explicitly annotate TCP-level disconnect as not failing:
the mptcp code never blocks for event on the subflows.
Fixes: 7d803344fdc3 ("mptcp: fix deadlock in fastopen error path")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch fixes a warning in the ena documentation
file identified by the kernel automatic tools.
The patch also adds a missing newline between
sections.
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306171804.U7E92zoE-lkp@intel.com/
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230620133544.32584-1-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If there is no net-memcg associated with the sock, don't bother
calculating its memory usage for charge.
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
Link: https://lore.kernel.org/r/20230620092712.16217-1-wuyun.abel@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the test for additional reference to chains that are explicitly
created by RTM_NEWCHAIN message.
The test result:
1..1
ok 1 c2b4 - soft lockup alarm will be not generated after delete the prio 0
filter of the chain
This is a follow up to commit c9a82bec02c3 ("net/sched: cls_api: Fix lockup on flushing explicitly created chain").
Signed-off-by: Mingshuai Ren <renmingshuai@huawei.com>
Acked-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Victor Nogueira <victor@mojatatu.com>
Link: https://lore.kernel.org/r/20230620014939.2034054-1-renmingshuai@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Micrel KS8851 can be attached to SPI or parallel bus and the
difference is expressed in compatibles. Allow common SPI properties
when this is a SPI variant and narrow the parallel memory bus properties
to the second case.
This fixes dtbs_check warning:
qcom-msm8960-cdp.dtb: ethernet@0: Unevaluated properties are not allowed ('spi-max-frequency' was unexpected)
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230619170134.65395-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
WCN3990 comes with two chains - CH0 and CH1 - where each takes VDD
regulator. It seems VDD_CH1 is optional (Linux driver does not care
about it), so document it to fix dtbs_check warnings like:
sdm850-lenovo-yoga-c630.dtb: bluetooth: 'vddch1-supply' does not match any of the regexes: 'pinctrl-[0-9]+'
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230617165716.279857-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When port-to-port forwarding for interfaces in HSR node is enabled,
disable promiscuous mode since L2 frame forward happens at the
offloaded hardware.
Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230614114710.31400-1-r-gunasekaran@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Christian Marangi says:
====================
leds: trigger: netdev: add additional modes
This is a continue of [1]. It was decided to take a more gradual
approach to implement LEDs support for switch and phy starting with
basic support and then implementing the hw control part when we have all
the prereq done.
This should be the final part for the netdev trigger.
I added net-next tag and added netdev mailing list since I was informed
that this should be merged with netdev branch.
We collect some info around and we found a good set of modes that are
common in almost all the PHY and Switch.
These modes are:
- Modes for dedicated link speed(10, 100, 1000 mbps). Additional mode
can be added later following this example.
- Modes for half and full duplex.
The original idea was to add hw control only modes.
While the concept makes sense in practice it would results in lots of
additional code and extra check to make sure we are setting correct modes.
With the suggestion from Andrew it was pointed out that using the ethtool
APIs we can actually get the current link speed and duplex and this
effectively removed the problem of having hw control only modes since we
can fallback to software.
Since these modes are supported by software, we can skip providing an
user for this in the LED driver to support hw control for these new modes
(that will come right after this is merged) and prevent this to be another
multi subsystem series.
For link speed and duplex we use ethtool APIs.
To call ethtool APIs, rtnl lock is needed but this can be skipped on
handling netdev events as the lock is already held.
[1] https://lore.kernel.org/lkml/20230216013230.22978-1-ansuelsmth@gmail.com/
====================
Link: https://lore.kernel.org/r/20230619204700.6665-1-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Expose hw_control status via sysfs for the netdev trigger to give
userspace better understanding of the current state of the trigger and
the LED.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Acked-by: Lee Jones <lee@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add additional modes for specific link duplex. Use ethtool APIs to get the
current link duplex and enable the LED accordingly. Under netdev event
handler the rtnl lock is already held and is not needed to be set to
access ethtool APIs.
This is especially useful for PHY and Switch that supports LEDs hw
control for specific link duplex.
Add additional modes:
- half_duplex: Turn on LED when link is half duplex
- full_duplex: Turn on LED when link is full duplex
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Lee Jones <lee@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add additional modes for specific link speed. Use ethtool APIs to get the
current link speed and enable the LED accordingly. Under netdev event
handler the rtnl lock is already held and is not needed to be set to
access ethtool APIs.
This is especially useful for PHY and Switch that supports LEDs hw
control for specific link speed. (example scenario a PHY that have 2 LED
connected one green and one orange where the green is turned on with
1000mbps speed and orange is turned on with 10mpbs speed)
On mode set from sysfs we check if we have enabled split link speed mode
and reject enabling generic link mode to prevent wrong and redundant
configuration.
Rework logic on the set baseline state to support these new modes to
select if we need to turn on or off the LED.
Add additional modes:
- link_10: Turn on LED when link speed is 10mbps
- link_100: Turn on LED when link speed is 100mbps
- link_1000: Turn on LED when link speed is 1000mbps
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Lee Jones <lee@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Link VF representors to parent PCI device to benefit from
systemd defined naming scheme.
Without this change the representor is visible as ethN.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230620144855.288443-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'selftests-preparations-for-out-of-order-operations-patches-in-mlxsw'
Petr Machata says:
====================
selftests: Preparations for out-of-order-operations patches in mlxsw
The mlxsw driver currently makes the assumption that the user applies
configuration in a bottom-up manner. Thus netdevices need to be added to
the bridge before IP addresses are configured on that bridge or SVI added
on top of it. Enslaving a netdevice to another netdevice that already has
uppers is in fact forbidden by mlxsw for this reason. Despite this safety,
it is rather easy to get into situations where the offloaded configuration
is just plain wrong.
Over the course of the following several patchsets, mlxsw code is going to
be adjusted to diminish the space of wrongly offloaded configurations.
Ideally the offload state will reflect the actual state, regardless of the
sequence of operation used to construct that state.
Several selftests build configurations that will not be offloadable in the
future on some systems. The reason is that what will get offloaded is the
actual configuration, not the configuration steps.
For example, when a port is added to a bridge that has an IP address, that
bridge will get a RIF, which it would not have with the current code. But
on Nvidia Spectrum-1 machines, MAC addresses of all RIFs need to have the
same prefix, which the bridge will violate. The RIF thus couldn't be
created, and the enslavement is therefore canceled, because it would lead
to an unoffloadable configuration. This breaks some selftests.
In this patchset, adjust selftests to avoid the configurations that mlxsw
would be incapable of offloading, while maintaining relevance with regards
to the feature that is being tested. There are generally two cases of
fixes:
- Disabling IPv6 autogen on bridges that do not participate in routing,
either because of the abovementioned requirement to keep the same MAC
prefix on all in-HW router interfaces, or, on 802.1ad bridges, because
in-HW router interfaces are not supported at all.
- Setting the bridge MAC address to what it will become after the first
member port is attached, so that the in-HW router interface is created
with a supported MAC address.
The patchset is then split thus:
- Patches #1-#7 adjust generic selftests
- Patches #8-#16 adjust mlxsw-specific selftests
====================
Link: https://lore.kernel.org/r/cover.1687265905.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.
At the time that the front panel port is enslaved to the bridge, the bridge
MAC address does not have the same prefix as other interfaces in the
system. On Nvidia Spectrum-1 machines all the RIFs have to have the same
38-bit MAC address prefix. Since the bridge does not obey this limitation,
the RIF cannot be created, and the enslavement attempt is vetoed on the
grounds of the configuration not being offloadable.
The bridge eventually inherits MAC address from its first member, after the
enslavement is acked. A number of (mainly VXLAN) selftests already work
around the problem by setting the MAC address to whatever it will
eventually be anyway. Do the same for this selftest.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.
At the time that the front panel port is enslaved to the bridge (this holds
for all bridges used here), the bridge MAC address does not have the same
prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all
the RIFs have to have the same 38-bit MAC address prefix. Since the bridge
does not obey this limitation, the RIF cannot be created, and the
enslavement attempt is vetoed on the grounds of the configuration not being
offloadable.
The selftest itself however checks various aspects of VXLAN offloading and
the bridges do not need to participate in routing traffic. The IP addresses
or the RIFs are irrelevant.
Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridges in this selftest, thus exempting them from mlxsw router attention.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.
At the time that the front panel port is enslaved to the bridge, the bridge
MAC address does not have the same prefix as other interfaces in the
system. On Nvidia Spectrum-1 machines all the RIFs have to have the same
38-bit MAC address prefix. Since the bridge does not obey this limitation,
the RIF cannot be created, and the enslavement attempt is vetoed on the
grounds of the configuration not being offloadable.
The selftest itself however checks vetoing of a different aspect of the
configuration and the bridge does not need to participate in routing
traffic. The IP address or the RIF are irrelevant.
Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridge in this selftest, thus exempting it from mlxsw router attention.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.
At the time that the front panel port is enslaved to the bridge (this holds
for both bridges used here), the bridge MAC address does not have the same
prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all
the RIFs have to have the same 38-bit MAC address prefix. Since the bridge
does not obey this limitation, the RIF cannot be created, and the
enslavement attempt is vetoed on the grounds of the configuration not being
offloadable.
The selftest itself however checks traffic prioritization and scheduling,
and the bridges serve for their L2 forwarding capabilities, and do not need
to participate in routing traffic. The IP addresses or the RIFs are
irrelevant.
Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridges in this selftest, thus exempting them from mlxsw router attention.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.
At the time that the front panel port is enslaved to the bridge (this holds
for both bridges used here), the bridge MAC address does not have the same
prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all
the RIFs have to have the same 38-bit MAC address prefix. Since the bridge
does not obey this limitation, the RIF cannot be created, and the
enslavement attempt is vetoed on the grounds of the configuration not being
offloadable.
The selftest itself however checks traffic prioritization and scheduling,
and the bridges serve for their L2 forwarding capabilities, and do not need
to participate in routing traffic. The IP addresses or the RIFs are
irrelevant.
Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridges in this selftest, thus exempting them from mlxsw router attention.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.
At the time that the front panel port is enslaved to the bridge, the bridge
MAC address does not have the same prefix as other interfaces in the
system. On Nvidia Spectrum-1 machines all the RIFs have to have the same
38-bit MAC address prefix. Since the bridge does not obey this limitation,
the RIF cannot be created, and the enslavement attempt is vetoed on the
grounds of the configuration not being offloadable.
The selftest itself however checks DCB DSCP-based prioritization, and the
bridge serves for its L2 forwarding capabilities, and does not need to
participate in routing traffic. The IP address or the RIF are irrelevant.
Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridge in this selftest, thus exempting it from mlxsw router attention.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.
At the time that the front panel port is enslaved to the bridge, the bridge
MAC address does not have the same prefix as other interfaces in the
system. On Nvidia Spectrum-1 machines all the RIFs have to have the same
38-bit MAC address prefix. Since the bridge does not obey this limitation,
the RIF cannot be created, and the enslavement attempt is vetoed on the
grounds of the configuration not being offloadable.
The selftest itself however checks how many mirroring sessions a machine is
capable of offloading. The IP address or the RIF are irrelevant.
Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridge in this selftest, thus exempting it from mlxsw router attention.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.
At the time that the front panel port is enslaved to the bridge (this holds
for all bridges used here), the bridge MAC address does not have the same
prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all
the RIFs have to have the same 38-bit MAC address prefix. Since the bridge
does not obey this limitation, the RIF cannot be created, and the
enslavement attempt is vetoed on the grounds of the configuration not being
offloadable.
The selftest itself however checks whether a different vetoed aspect of the
configuration provides an extack. The IP address or the RIF are irrelevant.
Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridges in this selftest, thus exempting them from mlxsw router attention.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.
The swp enslavement to the 802.1ad bridge is not allowed, because RIFs are
not allowed to be created for 802.1ad bridges, but the address indicates
one needs to be created. Thus the veto selftests fail already during the
port enslavement. Then the attempt to create a VLAN on top of the same
bridge is not vetoed, because the bridge is not related to mlxsw, and the
selftest fails.
Fix by disabling automatic IPv6 address generation for the bridges in this
selftest, thus exempting them from the mlxsw router attention.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.
At the time that the front panel port is enslaved to the bridge, the bridge
MAC address does not have the same prefix as other interfaces in the
system. On Nvidia Spectrum-1 machines all the RIFs have to have the same
38-bit MAC address prefix. Since the bridge does not obey this limitation,
the RIF cannot be created, and the enslavement attempt is vetoed on the
grounds of the configuration not being offloadable.
The bridge eventually inherits MAC address from its first member, after the
enslavement is acked. A number of (mainly VXLAN) selftests already work
around the problem by setting the MAC address to whatever it will
eventually be anyway. Do the same here.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.
At the time that the front panel port is enslaved to the bridge, the bridge
MAC address does not have the same prefix as other interfaces in the
system. On Nvidia Spectrum-1 machines all the RIFs have to have the same
38-bit MAC address prefix. Since the bridge does not obey this limitation,
the RIF cannot be created, and the enslavement attempt is vetoed on the
grounds of the configuration not being offloadable.
The bridge eventually inherits MAC address from its first member, after the
enslavement is acked. A number of (mainly VXLAN) selftests already work
around the problem by setting the MAC address to whatever it will
eventually be anyway. Do the same for several mirror_gre selftests.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.
At the time that the front panel port is enslaved to the bridge, the bridge
MAC address does not have the same prefix as other interfaces in the
system. On Nvidia Spectrum-1 machines all the RIFs have to have the same
38-bit MAC address prefix. Since the bridge does not obey this limitation,
the RIF cannot be created, and the enslavement attempt is vetoed on the
grounds of the configuration not being offloadable.
These two selftests however check mirroring traffic to a gretap netdevice.
The bridge here does not participate in routing traffic and the IP address
or the RIF are irrelevant.
Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridges in these selftests, thus exempting them from mlxsw router
attention. Since the bridges are only used for L2 forwarding, this change
should not hinder usefulness of this selftest for testing SW datapath or HW
datapaths in other devices.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.
At the time that the front panel port is enslaved to the bridge, the bridge
MAC address does not have the same prefix as other interfaces in the
system. On Nvidia Spectrum-1 machines all the RIFs have to have the same
38-bit MAC address prefix. Since the bridge does not obey this limitation,
the RIF cannot be created, and the enslavement attempt is vetoed on the
grounds of the configuration not being offloadable.
The selftest itself however checks whether skbedit changes packet priority
as appropriate. The bridge thus does not need to participate in routing
traffic and the IP address or the RIF are irrelevant.
Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridge in this selftest, thus exempting it from mlxsw router attention.
Since the bridge is only used for L2 forwarding, this change should not
hinder usefulness of this selftest for testing SW datapath or HW datapaths
in other devices.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.
At the time that the front panel port is enslaved to the bridge, the bridge
MAC address does not have the same prefix as other interfaces in the
system. On Nvidia Spectrum-1 machines all the RIFs have to have the same
38-bit MAC address prefix. Since the bridge does not obey this limitation,
the RIF cannot be created, and the enslavement attempt is vetoed on the
grounds of the configuration not being offloadable.
The selftest itself however checks operation of pedit on IPv4 and IPv6
dsfield and its parts. The bridge thus does not need to participate in
routing traffic and the IP address or the RIF are irrelevant.
Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridge in this selftest, thus exempting it from mlxsw router attention.
Since the bridge is only used for L2 forwarding, this change should not
hinder usefulness of this selftest for testing SW datapath or HW datapaths
in other devices.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.
This will cause this selftest to fail spuriously. The swp enslavement to
the 802.1ad bridge is not allowed, because RIFs are not allowed to be
created for 802.1ad bridges, but the address indicates one needs to be
created.
Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridge in this selftest, thus exempting it from mlxsw router attention.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.
This will cause this selftest to fail spuriously. The swp enslavement to
the 802.1ad bridge is not allowed, because RIFs are not allowed to be
created for 802.1ad bridges, but the address indicates one needs to be
created.
Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridge in this selftest, thus exempting it from mlxsw router attention.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-06-21
We've added 7 non-merge commits during the last 14 day(s) which contain
a total of 7 files changed, 181 insertions(+), 15 deletions(-).
The main changes are:
1) Fix a verifier id tracking issue with scalars upon spill,
from Maxim Mikityanskiy.
2) Fix NULL dereference if an exception is generated while a BPF
subprogram is running, from Krister Johansen.
3) Fix a BTF verification failure when compiling kernel with LLVM_IAS=0,
from Florent Revest.
4) Fix expected_attach_type enforcement for kprobe_multi link,
from Jiri Olsa.
5) Fix a bpf_jit_dump issue for x86_64 to pick the correct JITed image,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Force kprobe multi expected_attach_type for kprobe_multi link
bpf/btf: Accept function names that contain dots
selftests/bpf: add a test for subprogram extables
bpf: ensure main program has an extable
bpf: Fix a bpf_jit_dump issue for x86_64 with sysctl bpf_jit_enable.
selftests/bpf: Add test cases to assert proper ID tracking on spill
bpf: Fix verifier id tracking of scalars on spill
====================
Link: https://lore.kernel.org/r/20230621101116.16122-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single regression fix for a regression fix:
For a long time the tick was aligned to clock MONOTONIC so that the
tick event happened at a multiple of nanoseconds per tick starting
from clock MONOTONIC = 0.
At some point this changed as the refined jiffies clocksource which is
used during boot before the TSC or other clocksources becomes usable,
was adjusted with a boot offset, so that time 0 is closer to the point
where the kernel starts.
This broke the assumption in the tick code that when the tick setup
happens early on ktime_get() will return a multiple of nanoseconds per
tick. As a consequence applications which aligned their periodic
execution so that it does not collide with the tick were not longer
guaranteed that the tick period starts from time 0.
The fix for this regression was to realign the tick when it is
initially set up to a multiple of tick periods. That works as long as
the underlying tick device supports periodic mode, but breaks under
certain conditions when the tick device supports only one shot mode.
Depending on the offset, the alignment delta to clock MONOTONIC can
get in a range where the minimal programming delta of the underlying
clock event device is larger than the calculated delta to the next
tick. This results in a boot hang as the tick code tries to play catch
up, but as the tick never fires jiffies are not advanced so it keeps
trying for ever.
Solve this by moving the tick alignement into the NOHZ / HIGHRES
enablement code because at that point it is guaranteed that the
underlying clocksource is high resolution capable and not longer
depending on the tick.
This is far before user space starts, so at the point where
applications try to align their timers, the old behaviour of the tick
happening at a multiple of nanoseconds per tick starting from clock
MONOTONIC = 0 is restored"
* tag 'timers-urgent-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/common: Align tick period during sched_timer setup
|
|
Pull virtio fix from Michael Tsirkin:
"A last minute revert to fix a regression"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
Revert "virtio-blk: support completion batching for the IRQ path"
|
|
This reverts commit e7b813b32a42a3a6281a4fd9ae7700a0257c1d50 (and the
subsequent fix for it: 41a15855c1ee "efi: random: fix NULL-deref when
refreshing seed").
It turns otu to cause non-deterministic boot stalls on at least a HP
6730b laptop.
Reported-and-bisected-by: Sami Korkalainen <sami.korkalainen@proton.me>
Link: https://lore.kernel.org/all/GQUnKz2al3yke5mB2i1kp3SzNHjK8vi6KJEh7rnLrOQ24OrlljeCyeWveLW9pICEmB9Qc8PKdNt3w1t_g3-Uvxq1l8Wj67PpoMeWDoH8PKk=@proton.me/
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"One last fix for SPI, just a simple fix for incorrect handling of
probe deferral for DMA in the Qualcomm GENI driver"
* tag 'spi-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-geni-qcom: correctly handle -EPROBE_DEFER from dma_request_chan()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"One simple fix for v6.4, some incorrectly specified bitfield masks in
the PCA9450 driver"
* tag 'regulator-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: pca9450: Fix LDO3OUT and LDO4OUT MASK
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"One more fix for v6.4
The earlier fix to take account of the register data size when
limiting raw register writes exposed the fact that the Intel AVMM bus
was incorrectly specifying too low a limit on the maximum data
transfer, it is only capable of transmitting one register so had set a
transfer size limit that couldn't fit both the value and the the
register address into a single message"
* tag 'regmap-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: spi-avmm: Fix regmap_bus max_raw_write
|
|
Currently for each timestamp frame, the SW needs to go and read the
received timestamp over the MDIO bus. But the HW has the capability
to store the received nanoseconds part and the least significant two
bits of the seconds in the reserved field of the PTP header. In this
way we could save few MDIO transactions (actually a little more
transactions because the access to the PTP registers are indirect)
for each received frame.
Instead of reading the rest of seconds part of the timestamp of the
frame using MDIO transactions schedule PTP worker thread to read the
seconds part every 500ms and then for each of the received frames use
this information. Because if for example running with 512 frames per
second, there is no point to read 512 times the second part.
Doing all these changes will give a great CPU usage performance.
Running ptp4l with logSyncInterval of -9 will give a ~60% CPU
improvement.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rather than assign the user pointer to msghdr->msg_control, assign it
to msghdr->msg_control_user to make sparse happy. They are in a union
so the end result is the same, but let's avoid new sparse warnings and
squash this one.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306210654.mDMcyMuB-lkp@intel.com/
Fixes: cac9e4418f4c ("io_uring/net: save msghdr->msg_control for retries")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We cannot sanely handle partial retries for recvmsg if we have cmsg
attached. If we don't, then we'd just be overwriting the initial cmsg
header on retries. Alternatively we could increment and handle this
appropriately, but it doesn't seem worth the complication.
Move the MSG_WAITALL check into the non-multishot case while at it,
since MSG_WAITALL is explicitly disabled for multishot anyway.
Link: https://lore.kernel.org/io-uring/0b0d4411-c8fd-4272-770b-e030af6919a0@kernel.dk/
Cc: stable@vger.kernel.org # 5.10+
Reported-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If we have cmsg attached AND we transferred partial data at least, clear
msg_controllen on retry so we don't attempt to send that again.
Cc: stable@vger.kernel.org # 5.10+
Fixes: cac9e4418f4c ("io_uring/net: save msghdr->msg_control for retries")
Reported-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We currently allow to create perf link for program with
expected_attach_type == BPF_TRACE_KPROBE_MULTI.
This will cause crash when we call helpers like get_attach_cookie or
get_func_ip in such program, because it will call the kprobe_multi's
version (current->bpf_ctx context setup) of those helpers while it
expects perf_link's current->bpf_ctx context setup.
Making sure that we use BPF_TRACE_KPROBE_MULTI expected_attach_type
only for programs attaching through kprobe_multi link.
Fixes: ca74823c6e16 ("bpf: Add cookie support to programs attached with kprobe multi link")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230618131414.75649-1-jolsa@kernel.org
|
|
When building a kernel with LLVM=1, LLVM_IAS=0 and CONFIG_KASAN=y, LLVM
leaves DWARF tags for the "asan.module_ctor" & co symbols. In turn,
pahole creates BTF_KIND_FUNC entries for these and this makes the BTF
metadata validation fail because they contain a dot.
In a dramatic turn of event, this BTF verification failure can cause
the netfilter_bpf initialization to fail, causing netfilter_core to
free the netfilter_helper hashmap and netfilter_ftp to trigger a
use-after-free. The risk of u-a-f in netfilter will be addressed
separately but the existence of "asan.module_ctor" debug info under some
build conditions sounds like a good enough reason to accept functions
that contain dots in BTF.
Although using only LLVM=1 is the recommended way to compile clang-based
kernels, users can certainly do LLVM=1, LLVM_IAS=0 as well and we still
try to support that combination according to Nick. To clarify:
- > v5.10 kernel, LLVM=1 (LLVM_IAS=0 is not the default) is recommended,
but user can still have LLVM=1, LLVM_IAS=0 to trigger the issue
- <= 5.10 kernel, LLVM=1 (LLVM_IAS=0 is the default) is recommended in
which case GNU as will be used
Fixes: 1dc92851849c ("bpf: kernel side support for BTF Var and DataSec")
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Yonghong Song <yhs@meta.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/bpf/20230615145607.3469985-1-revest@chromium.org
|
|
This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13.
This change appears to have broken things...
We now see applications hanging during disk accesses.
e.g.
multi-port virtio-blk device running in h/w (FPGA)
Host running a simple 'fio' test.
[global]
thread=1
direct=1
ioengine=libaio
norandommap=1
group_reporting=1
bs=4K
rw=read
iodepth=128
runtime=1
numjobs=4
time_based
[job0]
filename=/dev/vda
[job1]
filename=/dev/vdb
[job2]
filename=/dev/vdc
...
[job15]
filename=/dev/vdp
i.e. 16 disks; 4 queues per disk; simple burst of 4KB reads
This is repeatedly run in a loop.
After a few, normally <10 seconds, fio hangs.
With 64 queues (16 disks), failure occurs within a few seconds; with 8 queues (2 disks) it may take ~hour before hanging.
Last message:
fio-3.19
Starting 8 threads
Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s]
I think this means at the end of the run 1 queue was left incomplete.
'diskstats' (run while fio is hung) shows no outstanding transactions.
e.g.
$ cat /proc/diskstats
...
252 0 vda 1843140071 0 14745120568 712568645 0 0 0 0 0 3117947 712568645 0 0 0 0 0 0
252 16 vdb 1816291511 0 14530332088 704905623 0 0 0 0 0 3117711 704905623 0 0 0 0 0 0
...
Other stats (in the h/w, and added to the virtio-blk driver ([a]virtio_queue_rq(), [b]virtblk_handle_req(), [c]virtblk_request_done()) all agree, and show every request had a completion, and that virtblk_request_done() never gets called.
e.g.
PF= 0 vq=0 1 2 3
[a]request_count - 839416590 813148916 105586179 84988123
[b]completion1_count - 839416590 813148916 105586179 84988123
[c]completion2_count - 0 0 0 0
PF= 1 vq=0 1 2 3
[a]request_count - 823335887 812516140 104582672 75856549
[b]completion1_count - 823335887 812516140 104582672 75856549
[c]completion2_count - 0 0 0 0
i.e. the issue is after the virtio-blk driver.
This change was introduced in kernel 6.3.0.
I am seeing this using 6.3.3.
If I run with an earlier kernel (5.15), it does not occur.
If I make a simple patch to the 6.3.3 virtio-blk driver, to skip the blk_mq_add_to_batch()call, it does not fail.
e.g.
kernel 5.15 - this is OK
virtio_blk.c,virtblk_done() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
blk_mq_complete_request(req);
}
kernel 6.3.3 - this fails
virtio_blk.c,virtblk_handle_req() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
if (!blk_mq_complete_request_remote(req)) {
if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed
}
}
}
If I do, kernel 6.3.3 - this is OK
virtio_blk.c,virtblk_handle_req() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
if (!blk_mq_complete_request_remote(req)) {
virtblk_request_done(req); //force this here...
if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed
}
}
}
Perhaps you might like to fix/test/revert this change...
Martin
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306090826.C1fZmdMe-lkp@intel.com/
Cc: Suwan Kim <suwan.kim027@gmail.com>
Tested-by: edliaw@google.com
Reported-by: "Roberts, Martin" <martin.roberts@intel.com>
Message-Id: <336455b4f630f329380a8f53ee8cad3868764d5c.1686295549.git.mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Bartosz Golaszewski says:
====================
net: stmmac: dwmac-qcom-ethqos: add support for EMAC4
Extend the dwmac-qcom-ethqos driver to support EMAC4. While at it: rework the
code somewhat. The bindings have been reviewed by DT maintainers.
This is a sub-series of [1] with only the patches targetting the net subsystem
as they can go in independently.
[1] https://lore.kernel.org/lkml/20230617001644.4e093326@kernel.org/T/
====================
Link: https://lore.kernel.org/r/20230619092402.195578-1-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
sa8775p uses EMAC version 4, add the relevant defines, rename the
has_emac3 switch to has_emac_ge_3 (has emac greater-or-equal than 3)
and add the new compatible.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the compatible for the MAC controller on sa8775p platforms. This MAC
works with a single interrupt so add minItems to the interrupts property.
The fourth clock's name is different here so change it. Enable relevant
PHY properties. Add the relevant compatibles to the binding document for
snps,dwmac as well.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On some platforms, the PCS can be integrated in the MAC so the driver
will not see any PCS link activity. Add a switch that allows the platform
drivers to let the core code know.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On sa8775p the MAC is connected to the external PHY over SGMII so add
support for it to the driver.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation for supporting SGMII, let's make the code a bit more
generic. Add a new callback for MAC configuration so that we can assign
a different variant of it in the future.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|