Age | Commit message (Collapse) | Author |
|
Rely on extack to return failure reason.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rely on extack to return failure reason.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rely on extack to return failure reason.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rely on extack to return failure reason.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Almost all validation logic is in the drivers, but they are
missing reliable way to convey failure reason to userspace
applications.
Let's use extack to return this information to users.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rely on extack to return failure reason.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Almost all validation logic is in the drivers, but they are
missing reliable way to convey failure reason to userspace
applications.
Let's use extack to return this information to users.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter.
Current release - regressions:
- sched: sch_taprio: do not schedule in taprio_reset()
Previous releases - regressions:
- core: fix UaF in netns ops registration error path
- ipv4: prevent potential spectre v1 gadgets
- ipv6: fix reachability confirmation with proxy_ndp
- netfilter: fix for the set rbtree
- eth: fec: use page_pool_put_full_page when freeing rx buffers
- eth: iavf: fix temporary deadlock and failure to set MAC address
Previous releases - always broken:
- netlink: prevent potential spectre v1 gadgets
- netfilter: fixes for SCTP connection tracking
- mctp: struct sock lifetime fixes
- eth: ravb: fix possible hang if RIS2_QFF1 happen
- eth: tg3: resolve deadlock in tg3_reset_task() during EEH
Misc:
- Mat stepped out as MPTCP co-maintainer"
* tag 'net-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
net: mdio-mux-meson-g12a: force internal PHY off on mux switch
docs: networking: Fix bridge documentation URL
tsnep: Fix TX queue stop/wake for multiple queues
net/tg3: resolve deadlock in tg3_reset_task() during EEH
net: mctp: mark socks as dead on unhash, prevent re-add
net: mctp: hold key reference when looking up a general key
net: mctp: move expiry timer delete to unhash
net: mctp: add an explicit reference from a mctp_sk_key to sock
net: ravb: Fix possible hang if RIS2_QFF1 happen
net: ravb: Fix lack of register setting after system resumed for Gen3
net/x25: Fix to not accept on connected socket
ice: move devlink port creation/deletion
sctp: fail if no bound addresses can be used for a given scope
net/sched: sch_taprio: do not schedule in taprio_reset()
Revert "Merge branch 'ethtool-mac-merge'"
netrom: Fix use-after-free of a listening socket.
netfilter: conntrack: unify established states for SCTP paths
Revert "netfilter: conntrack: add sctp DATA_SENT state"
netfilter: conntrack: fix bug in for_each_sctp_chunk
netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE
...
|
|
I'm not exactly clear on what strange workflow causes people to do it,
but clearly occasionally some files end up being committed as executable
even though they clearly aren't.
This is a reprise of commit 90fda63fa115 ("treewide: fix up files
incorrectly marked executable"), just with a different set of files (but
with the same trivial shell scripting).
So apparently we need to re-do this every five years or so, and Joe
needs to just keep reminding me to do so ;)
Reported-by: Joe Perches <joe@perches.com>
Fixes: 523375c943e5 ("drm/vmwgfx: Port vmwgfx to arm64")
Fixes: 5c439937775d ("ASoC: codecs: add support for ES8326")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
CONFIG_ETHTOOL_NETLINK=n
ethtool_aggregate_*_stats() are implemented in net/ethtool/stats.c, a
file which is compiled out when CONFIG_ETHTOOL_NETLINK=n. In order to
avoid adding Kbuild dependencies from drivers (which call these helpers)
on CONFIG_ETHTOOL_NETLINK, let's add some shim definitions which simply
make the helpers dead code.
This means the function prototypes should have been located in
include/linux/ethtool_netlink.h rather than include/linux/ethtool.h.
Fixes: 449c5459641a ("net: ethtool: add helpers for aggregate statistics")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230125110214.4127759-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Matthieu Baerts says:
====================
mptcp: add mixed v4/v6 support for the in-kernel PM
Before these patches, the in-kernel Path-Manager would not allow, for
the same MPTCP connection, having a mix of subflows in v4 and v6.
MPTCP's RFC 8684 doesn't forbid that and it is even recommended to do so
as the path in v4 and v6 are likely different. Some networks are also
v4 or v6 only, we cannot assume they all have both v4 and v6 support.
Patch 1 then removes this artificial constraint in the in-kernel PM
currently enforcing there are no mixed subflows in place, either in
address announcement or in subflow creation areas.
Patch 2 makes sure the sk_ipv6only attribute is also propagated to
subflows, just in case a new PM wouldn't respect it.
Some selftests have also been added for the in-kernel PM (patch 3).
Patches 4 to 8 are just some cleanups and small improvements in the
printed messages in the userspace PM. It is not linked to the rest but
identified when working on a related patch modifying this selftest,
already in -net:
commit 4656d72c1efa ("selftests: mptcp: userspace: validate v4-v6 subflows mix")
---
====================
Link: https://lore.kernel.org/r/20230123-upstream-net-next-pm-v4-v6-v1-0-43fac502bfbf@tessares.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
During the cleanup phase, the server pids were killed with a SIGTERM
directly, not using a SIGUSR1 first to quit safely. As a result, this
test was often ending with two error messages:
read: Connection reset by peer
While at it, use a for-loop to terminate all the PIDs the same way.
Also the different files are now removed after having killed the PIDs
using them. It makes more sense to do that in this order.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Before, only '[FAIL]' was printed in case of error during the validation
phase.
Now, in case of failure, the variable name, its value and expected one
are displayed to help understand what was wrong.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Instead of having a long list of conditions to check, it is possible to
give a list of variable names to compare with their 'e_XXX' version.
This will ease the introduction of the following commit which will print
which condition has failed (if any).
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This script is running a few tests after having setup the environment.
Printing titles helps understand what is being tested.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Like in all other functions in this file, a single point of exit is used
when extra operations are needed: unlock, decrement refcount, etc.
There is no functional change for the moment but it is better to do the
same here to make sure all cleanups are done in case of intermediate
errors.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Note that we can't guess the listener family anymore based on the client
target address: always use IPv6.
The fullmesh flag with endpoints from different families is also
validated here.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Usually, attributes are propagated to subflows as well.
Here, if subflows are created by other ways than the MPTCP path-manager,
it is important to make sure they are in v6 if it is asked by the
userspace.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently the in-kernel PM arbitrary enforces that created subflow's
family must match the main MPTCP socket while the RFC allows mixing
IPv4 and IPv6 subflows.
This patch changes the in-kernel PM logic to create subflows matching
the currently selected source (or destination) address. IPv4 sockets
can pick only IPv4 addresses (and v4 mapped in v6), while IPv6 sockets
not restricted to V6ONLY can pick either IPv4 and IPv6 addresses as
long as the source and destination matches.
A helper, previously introduced is used to ease family matching checks,
taking care of IPv4 vs IPv4-mapped-IPv6 vs IPv6 only addresses.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/269
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
There are multiple ICMP rate limiting mechanisms:
* Global limits: net.ipv4.icmp_msgs_burst/icmp_msgs_per_sec
* v4 per-host limits: net.ipv4.icmp_ratelimit/ratemask
* v6 per-host limits: net.ipv6.icmp_ratelimit/ratemask
However, when ICMP output is limited, there is no way to tell
which limit has been hit or even if the limits are responsible
for the lack of ICMP output.
Add counters for each of the cases above. As we are within
local_bh_disable(), use the __INC stats variant.
Example output:
# nstat -sz "*RateLimit*"
IcmpOutRateLimitGlobal 134 0.0
IcmpOutRateLimitHost 770 0.0
Icmp6OutRateLimitHost 84 0.0
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Suggested-by: Abhishek Rawal <rawal.abhishek92@gmail.com>
Link: https://lore.kernel.org/r/273b32241e6b7fdc5c609e6f5ebc68caf3994342.1674605770.git.jamie.bainbridge@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Steen Hegelund says:
====================
Adding Sparx5 IS0 VCAP support
This provides the Ingress Stage 0 (IS0) VCAP (Versatile Content-Aware
Processor) support for the Sparx5 platform.
The IS0 VCAP (also known in the datasheet as CLM) is a classifier VCAP that
mainly extracts frame information to metadata that follows the frame in the
Sparx5 processing flow all the way to the egress port.
The IS0 VCAP has 4 lookups and they are accessible with a TC chain id:
- chain 1000000: IS0 Lookup 0
- chain 1100000: IS0 Lookup 1
- chain 1200000: IS0 Lookup 2
- chain 1300000: IS0 Lookup 3
- chain 1400000: IS0 Lookup 4
- chain 1500000: IS0 Lookup 5
Each of these lookups have their own port keyset configuration that decides
which keys will be used for matching on which traffic type.
The IS0 VCAP has these traffic classifications:
- IPv4 frames
- IPv6 frames
- Unicast MPLS frames (ethertype = 0x8847)
- Multicast MPLS frames (ethertype = 0x8847)
- Other frame types than MPLS, IPv4 and IPv6
The IS0 VCAP has an action that allows setting the value of a PAG (Policy
Association Group) key field in the frame metadata, and this can be used
for matching in an IS2 VCAP rule.
This allow rules in the IS0 VCAP to be linked to rules in the IS2 VCAP.
The linking is exposed by using the TC "goto chain" action with an offset
from the IS2 chain ids.
As an example a "goto chain 8000001" will use a PAG value of 1 to chain to
a rule in IS2 Lookup 0.
====================
Link: https://lore.kernel.org/r/20230124104511.293938-1-steen.hegelund@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This adds support for parsing and matching on the CVLAN tags in the Sparx5
IS0 VCAP.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This allows the IS0 VCAP to have its own list of supported ethernet
protocol types matching what is supported by the VCAPs port lookup
classification.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
With more than one possible actionset in a VCAP instance, the VCAP API will
now use the actions in a VCAP rule to select the actionset that fits these
actions the best possible way.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This allows rules to be chained between VCAP instances, e.g. from IS0
Lookup 0 to IS0 Lookup 1, or from one of the IS0 Lookups to one of the IS2
Lookups.
Chaining from an IS2 Lookup to another IS2 Lookup is not supported in the
hardware.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This enables the TC command to use the Sparx5 IS0 VCAP
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This adds the actionset type id to the rule information. This is needed as
we now have more than one actionset in a VCAP instance (IS0).
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This adds the IS0 VCAP port keyset configuration for Sparx5 and also
updates the debugFS support to show the keyset configuration.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This provides the IS0 (Ingress Stage 0) or CLM VCAP model for Sparx5.
This VCAP provides classification actions for Sparx5.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Force the internal PHY off then on when switching to the internal path.
This fixes problems where the PHY ID is not properly set.
Fixes: 7090425104db ("net: phy: add amlogic g12a mdio mux support")
Suggested-by: Qi Duan <qi.duan@amlogic.com>
Co-developed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20230124101157.232234-1-jbrunet@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub Sitnicki says:
====================
Add IP_LOCAL_PORT_RANGE socket option
This patch set is a follow up to the "How to share IPv4 addresses by
partitioning the port space" talk given at LPC 2022 [1].
Please see patch #1 for the motivation & the use case description.
Patch #2 adds tests exercising the new option in various scenarios.
Documentation
-------------
Proposed update to the ip(7) man-page:
IP_LOCAL_PORT_RANGE (since Linux X.Y)
Set or get the per-socket default local port range. This
option can be used to clamp down the global local port
range, defined by the ip_local_port_range /proc interface
described below, for a given socket.
The option takes an uint32_t value with the high 16 bits
set to the upper range bound, and the low 16 bits set to
the lower range bound. Range bounds are inclusive. The
16-bit values should be in host byte order.
The lower bound has to be less than the upper bound when
both bounds are not zero. Otherwise, setting the option
fails with EINVAL.
If either bound is outside of the global local port range,
or is zero, then that bound has no effect.
To reset the setting, pass zero as both the upper and the
lower bound.
Interaction with SELinux bind() hook
------------------------------------
SELinux bind() hook - selinux_socket_bind() - performs a permission check
if the requested local port number lies outside of the netns ephemeral port
range.
The proposed socket option cannot be used change the ephemeral port range
to extend beyond the per-netns port range, as set by
net.ipv4.ip_local_port_range.
Hence, there is no interaction with SELinux, AFAICT.
RFC -> v1
RFC: https://lore.kernel.org/netdev/20220912225308.93659-1-jakub@cloudflare.com/
* Allow either the high bound or the low bound, or both, to be zero
* Add getsockopt support
* Add selftests
Links:
------
[1]: https://lpc.events/event/16/contributions/1349/
====================
Link: https://lore.kernel.org/r/20221221-sockopt-port-range-v6-0-be255cc0e51f@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Exercise IP_LOCAL_PORT_RANGE socket option in various scenarios:
1. pass invalid values to setsockopt
2. pass a range outside of the per-netns port range
3. configure a single-port range
4. exhaust a configured multi-port range
5. check interaction with late-bind (IP_BIND_ADDRESS_NO_PORT)
6. set then get the per-socket port range
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Users who want to share a single public IP address for outgoing connections
between several hosts traditionally reach for SNAT. However, SNAT requires
state keeping on the node(s) performing the NAT.
A stateless alternative exists, where a single IP address used for egress
can be shared between several hosts by partitioning the available ephemeral
port range. In such a setup:
1. Each host gets assigned a disjoint range of ephemeral ports.
2. Applications open connections from the host-assigned port range.
3. Return traffic gets routed to the host based on both, the destination IP
and the destination port.
An application which wants to open an outgoing connection (connect) from a
given port range today can choose between two solutions:
1. Manually pick the source port by bind()'ing to it before connect()'ing
the socket.
This approach has a couple of downsides:
a) Search for a free port has to be implemented in the user-space. If
the chosen 4-tuple happens to be busy, the application needs to retry
from a different local port number.
Detecting if 4-tuple is busy can be either easy (TCP) or hard
(UDP). In TCP case, the application simply has to check if connect()
returned an error (EADDRNOTAVAIL). That is assuming that the local
port sharing was enabled (REUSEADDR) by all the sockets.
# Assume desired local port range is 60_000-60_511
s = socket(AF_INET, SOCK_STREAM)
s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
s.bind(("192.0.2.1", 60_000))
s.connect(("1.1.1.1", 53))
# Fails only if 192.0.2.1:60000 -> 1.1.1.1:53 is busy
# Application must retry with another local port
In case of UDP, the network stack allows binding more than one socket
to the same 4-tuple, when local port sharing is enabled
(REUSEADDR). Hence detecting the conflict is much harder and involves
querying sock_diag and toggling the REUSEADDR flag [1].
b) For TCP, bind()-ing to a port within the ephemeral port range means
that no connecting sockets, that is those which leave it to the
network stack to find a free local port at connect() time, can use
the this port.
IOW, the bind hash bucket tb->fastreuse will be 0 or 1, and the port
will be skipped during the free port search at connect() time.
2. Isolate the app in a dedicated netns and use the use the per-netns
ip_local_port_range sysctl to adjust the ephemeral port range bounds.
The per-netns setting affects all sockets, so this approach can be used
only if:
- there is just one egress IP address, or
- the desired egress port range is the same for all egress IP addresses
used by the application.
For TCP, this approach avoids the downsides of (1). Free port search and
4-tuple conflict detection is done by the network stack:
system("sysctl -w net.ipv4.ip_local_port_range='60000 60511'")
s = socket(AF_INET, SOCK_STREAM)
s.setsockopt(SOL_IP, IP_BIND_ADDRESS_NO_PORT, 1)
s.bind(("192.0.2.1", 0))
s.connect(("1.1.1.1", 53))
# Fails if all 4-tuples 192.0.2.1:60000-60511 -> 1.1.1.1:53 are busy
For UDP this approach has limited applicability. Setting the
IP_BIND_ADDRESS_NO_PORT socket option does not result in local source
port being shared with other connected UDP sockets.
Hence relying on the network stack to find a free source port, limits the
number of outgoing UDP flows from a single IP address down to the number
of available ephemeral ports.
To put it another way, partitioning the ephemeral port range between hosts
using the existing Linux networking API is cumbersome.
To address this use case, add a new socket option at the SOL_IP level,
named IP_LOCAL_PORT_RANGE. The new option can be used to clamp down the
ephemeral port range for each socket individually.
The option can be used only to narrow down the per-netns local port
range. If the per-socket range lies outside of the per-netns range, the
latter takes precedence.
UAPI-wise, the low and high range bounds are passed to the kernel as a pair
of u16 values in host byte order packed into a u32. This avoids pointer
passing.
PORT_LO = 40_000
PORT_HI = 40_511
s = socket(AF_INET, SOCK_STREAM)
v = struct.pack("I", PORT_HI << 16 | PORT_LO)
s.setsockopt(SOL_IP, IP_LOCAL_PORT_RANGE, v)
s.bind(("127.0.0.1", 0))
s.getsockname()
# Local address between ("127.0.0.1", 40_000) and ("127.0.0.1", 40_511),
# if there is a free port. EADDRINUSE otherwise.
[1] https://github.com/cloudflare/cloudflare-blog/blob/232b432c1d57/2022-02-connectx/connectx.py#L116
Reviewed-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Current documentation URL [1] is no longer valid.
[1] https://www.linuxfoundation.org/collaborate/workgroups/networking/bridge
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230124145127.189221-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
netif_stop_queue() and netif_wake_queue() act on TX queue 0. This is ok
as long as only a single TX queue is supported. But support for multiple
TX queues was introduced with 762031375d5c and I missed to adapt stop
and wake of TX queues.
Use netif_stop_subqueue() and netif_tx_wake_queue() to act on specific
TX queue.
Fixes: 762031375d5c ("tsnep: Support multiple TX/RX queue pairs")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Link: https://lore.kernel.org/r/20230124191440.56887-1-gerhard@engleder-embedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix spelling in net/ Kconfig files.
(reported by codespell)
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: coreteam@netfilter.org
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Link: https://lore.kernel.org/r/20230124181724.18166-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
During EEH error injection testing, a deadlock was encountered in the tg3
driver when tg3_io_error_detected() was attempting to cancel outstanding
reset tasks:
crash> foreach UN bt
...
PID: 159 TASK: c0000000067c6000 CPU: 8 COMMAND: "eehd"
...
#5 [c00000000681f990] __cancel_work_timer at c00000000019fd18
#6 [c00000000681fa30] tg3_io_error_detected at c00800000295f098 [tg3]
#7 [c00000000681faf0] eeh_report_error at c00000000004e25c
...
PID: 290 TASK: c000000036e5f800 CPU: 6 COMMAND: "kworker/6:1"
...
#4 [c00000003721fbc0] rtnl_lock at c000000000c940d8
#5 [c00000003721fbe0] tg3_reset_task at c008000002969358 [tg3]
#6 [c00000003721fc60] process_one_work at c00000000019e5c4
...
PID: 296 TASK: c000000037a65800 CPU: 21 COMMAND: "kworker/21:1"
...
#4 [c000000037247bc0] rtnl_lock at c000000000c940d8
#5 [c000000037247be0] tg3_reset_task at c008000002969358 [tg3]
#6 [c000000037247c60] process_one_work at c00000000019e5c4
...
PID: 655 TASK: c000000036f49000 CPU: 16 COMMAND: "kworker/16:2"
...:1
#4 [c0000000373ebbc0] rtnl_lock at c000000000c940d8
#5 [c0000000373ebbe0] tg3_reset_task at c008000002969358 [tg3]
#6 [c0000000373ebc60] process_one_work at c00000000019e5c4
...
Code inspection shows that both tg3_io_error_detected() and
tg3_reset_task() attempt to acquire the RTNL lock at the beginning of
their code blocks. If tg3_reset_task() should happen to execute between
the times when tg3_io_error_deteced() acquires the RTNL lock and
tg3_reset_task_cancel() is called, a deadlock will occur.
Moving tg3_reset_task_cancel() call earlier within the code block, prior
to acquiring RTNL, prevents this from happening, but also exposes another
deadlock issue where tg3_reset_task() may execute AFTER
tg3_io_error_detected() has executed:
crash> foreach UN bt
PID: 159 TASK: c0000000067d2000 CPU: 9 COMMAND: "eehd"
...
#4 [c000000006867a60] rtnl_lock at c000000000c940d8
#5 [c000000006867a80] tg3_io_slot_reset at c0080000026c2ea8 [tg3]
#6 [c000000006867b00] eeh_report_reset at c00000000004de88
...
PID: 363 TASK: c000000037564000 CPU: 6 COMMAND: "kworker/6:1"
...
#3 [c000000036c1bb70] msleep at c000000000259e6c
#4 [c000000036c1bba0] napi_disable at c000000000c6b848
#5 [c000000036c1bbe0] tg3_reset_task at c0080000026d942c [tg3]
#6 [c000000036c1bc60] process_one_work at c00000000019e5c4
...
This issue can be avoided by aborting tg3_reset_task() if EEH error
recovery is already in progress.
Fixes: db84bf43ef23 ("tg3: tg3_reset_task() needs to use rtnl_lock to synchronize")
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230124185339.225806-1-drc@linux.vnet.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Kui-Feng Lee says:
====================
This patchset implements a change to bpf_setsockopt() which allows
ktls enabled sockets to be used with the SOL_TCP level. This is
necessary as when ktls is enabled, it changes the function pointer of
setsockopt of the socket, which bpf_setsockopt() checks in order to
make sure that the socket is a TCP socket. Checking sk_protocol
instead of the function pointer will ensure that bpf_setsockopt() with
the SOL_TCP level still works on sockets with ktls enabled.
The major differences form v2 are:
- Add a read() call to make sure that the FIN has arrived.
- Remove the dependency on other test's header.
The major differences from v1 are:
- Test with a IPv6 connect as well.
- Use ASSERT_OK()
v2: https://lore.kernel.org/bpf/20230124181220.2871611-1-kuifeng@meta.com/
v1: https://lore.kernel.org/bpf/20230121025716.3039933-1-kuifeng@meta.com/
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Ensures that whenever bpf_setsockopt() is called with the SOL_TCP
option on a ktls enabled socket, the call will be accepted by the
system. The provided test makes sure of this by performing an
examination when the server side socket is in the CLOSE_WAIT state. At
this stage, ktls is still enabled on the server socket and can be used
to test if bpf_setsockopt() works correctly with linux.
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230125201608.908230-3-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Resolve an issue when calling sol_tcp_sockopt() on a socket with ktls
enabled. Prior to this patch, sol_tcp_sockopt() would only allow calls
if the function pointer of setsockopt of the socket was set to
tcp_setsockopt(). However, any socket with ktls enabled would have its
function pointer set to tls_setsockopt(). To resolve this issue, the
patch adds a check of the protocol of the linux socket and allows
bpf_setsockopt() to be called if ktls is initialized on the linux
socket. This ensures that calls to sol_tcp_sockopt() will succeed on
sockets with ktls enabled.
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230125201608.908230-2-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
David Vernet says:
====================
This is part 4 of https://lore.kernel.org/bpf/20230123232228.646563-1-void@manifault.com/
Part 3: https://lore.kernel.org/all/20230125050359.339273-1-void@manifault.com/
Part 2: https://lore.kernel.org/all/20230124160802.1122124-1-void@manifault.com/
Changelog:
----------
v3 -> v4:
- Fix accidental typo in name of dummy_st_ops introduced in v2, moving
it back to dummy_st_ops from dummy_st_ops_success. Should fix s390x
testruns.
v2 -> v3:
- Don't call a KF_SLEEPABLE kfunc from the dummy_st_ops testsuite, and
remove the newly added bpf_kfunc_call_test_sleepable() test kfunc
(Martin).
- Include vmlinux.h from progs/dummy_st_ops_success.c (previously
progs/dummy_st_ops.c) rather than manually defining
struct bpf_dummy_ops_state and struct bpf_dummy_ops.
(Martin).
- Fix a typo added to prog_tests/dummy_st_ops.c in a previous version:
s/trace_dummy_st_ops_success__open/trace_dummy_st_ops__open.
v1 -> v2:
- Add support for specifying sleepable struct_ops programs with
struct_ops.s in libbpf (Alexei).
- Move failure test case into new dummy_st_ops_fail.c prog file.
- Update test_dummy_sleepable() to use struct_ops.s instead of manually
setting prog flags. Also remove open_load_skel() helper which is no
longer needed.
- Fix verifier tests to expect new sleepable prog failure message.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In a set of prior changes, we added the ability for struct_ops programs
to be sleepable. This patch enhances the dummy_st_ops selftest suite to
validate this behavior by adding a new sleepable struct_ops entry to
dummy_st_ops.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125164735.785732-5-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The .check_member field of struct bpf_struct_ops is currently passed the
member's btf_type via const struct btf_type *t, and a const struct
btf_member *member. This allows the struct_ops implementation to check
whether e.g. an ops is supported, but it would be useful to also enforce
that the struct_ops prog being loaded for that member has other
qualities, like being sleepable (or not). This patch therefore updates
the .check_member() callback to also take a const struct bpf_prog *prog
argument.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125164735.785732-4-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In a prior change, the verifier was updated to support sleepable
BPF_PROG_TYPE_STRUCT_OPS programs. A caller could set the program as
sleepable with bpf_program__set_flags(), but it would be more ergonomic
and more in-line with other sleepable program types if we supported
suffixing a struct_ops section name with .s to indicate that it's
sleepable.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125164735.785732-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
BPF struct_ops programs currently cannot be marked as sleepable. This
need not be the case -- struct_ops programs can be sleepable, and e.g.
invoke kfuncs that export the KF_SLEEPABLE flag. So as to allow future
struct_ops programs to invoke such kfuncs, this patch updates the
verifier to allow struct_ops programs to be sleepable. A follow-on patch
will add support to libbpf for specifying struct_ops.s as a sleepable
struct_ops program, and then another patch will add testcases to the
dummy_st_ops selftest suite which test sleepable struct_ops behavior.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125164735.785732-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
As stated in README.rst, in order to resolve errors with linker errors,
'LDLIBS=-static' should be used. Most problems will be solved by this
option, but in the case of urandom_read, this won't fix the problem. So
the Makefile is currently implemented to strip the 'static' option when
compiling the urandom_read. However, stripping this static option isn't
configured properly on $(LDLIBS) correctly, which is now causing errors
on static compilation.
# LDLIBS=-static ./vmtest.sh
ld.lld: error: attempted static link of dynamic object liburandom_read.so
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [Makefile:190: /linux/tools/testing/selftests/bpf/urandom_read] Error 1
make: *** Waiting for unfinished jobs....
This commit fixes this problem by configuring the strip with $(LDLIBS).
Fixes: 68084a136420 ("selftests/bpf: Fix building bpf selftests statically")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230125100440.21734-1-danieltimlee@gmail.com
|
|
HOSTCC is always wanted when building. Setting CC to HOSTCC happens
after tools/scripts/Makefile.include is included, meaning flags are
set assuming say CC is gcc, but then it can be later set to HOSTCC
which may be clang. tools/scripts/Makefile.include is needed for host
set up and common macros in objtool's Makefile. Rather than override
CC to HOSTCC, just pass CC as HOSTCC to Makefile.build, the libsubcmd
builds and the linkage step. This means the Makefiles don't see things
like CC changing and tool flag determination, and similar, work
properly.
Also, clear the passed subdir as otherwise an outer build may break by
inadvertently passing an inappropriate value.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230124064324.672022-2-irogers@google.com
|
|
Previously tools/lib/subcmd was added to the include path, switch to
installing the headers and then including from that directory. This
avoids dependencies on headers internal to tools/lib/subcmd. Add the
missing subcmd directory to the affected #include.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230124064324.672022-1-irogers@google.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull fuse ACL fix from Christian Brauner:
"The new posix acl API doesn't depend on the xattr handler
infrastructure anymore and instead only relies on the posix acl inode
operations. As a result daemons without FUSE_POSIX_ACL are unable to
use posix acls like they used to.
Fix this by copying what we did for overlayfs during the posix acl api
conversion. Make fuse implement a dedicated ->get_inode_acl() method
as does overlayfs. Fuse can then also uses this to express different
needs for vfs permission checking during lookup and acl based
retrieval via the regular system call path.
This allows fuse to continue to refuse retrieving posix acls for
daemons that don't set FUSE_POSXI_ACL for permission checking while
also allowing a fuse server to retrieve it via the usual system calls"
* tag 'fs.fuse.acl.v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
fuse: fixes after adapting to new posix acl api
|
|
Since the latest Intel hardware does both IWARP and ROCE, rename the
term IWARP in the virtchnl header to be RDMA. Do this for both upper and
lower case instances. Many of the non-virtchnl.h changes were done with
regular expression replacements using perl like:
perl -p -i -e 's/_IWARP/_RDMA/' <files>
perl -p -i -e 's/_iwarp/_rdma/' <files>
and I had to pick up a few instances manually.
The virtchnl.h header has some comments and clarity added around when to
use certain defines.
note: had to fix a checkpatch warning for a long line by wrapping one of
the lines I changed.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jakub Andrysiak <jakub.andrysiak@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|