summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-01-26selftests: mptcp: userspace: avoid read errorsMatthieu Baerts
During the cleanup phase, the server pids were killed with a SIGTERM directly, not using a SIGUSR1 first to quit safely. As a result, this test was often ending with two error messages: read: Connection reset by peer While at it, use a for-loop to terminate all the PIDs the same way. Also the different files are now removed after having killed the PIDs using them. It makes more sense to do that in this order. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26selftests: mptcp: userspace: print error details if anyMatthieu Baerts
Before, only '[FAIL]' was printed in case of error during the validation phase. Now, in case of failure, the variable name, its value and expected one are displayed to help understand what was wrong. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26selftests: mptcp: userspace: refactor assertsMatthieu Baerts
Instead of having a long list of conditions to check, it is possible to give a list of variable names to compare with their 'e_XXX' version. This will ease the introduction of the following commit which will print which condition has failed (if any). Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26selftests: mptcp: userspace: print titlesMatthieu Baerts
This script is running a few tests after having setup the environment. Printing titles helps understand what is being tested. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26mptcp: userspace pm: use a single point of exitMatthieu Baerts
Like in all other functions in this file, a single point of exit is used when extra operations are needed: unlock, decrement refcount, etc. There is no functional change for the moment but it is better to do the same here to make sure all cleanups are done in case of intermediate errors. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26selftests: mptcp: add test-cases for mixed v4/v6 subflowsPaolo Abeni
Note that we can't guess the listener family anymore based on the client target address: always use IPv6. The fullmesh flag with endpoints from different families is also validated here. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26mptcp: propagate sk_ipv6only to subflowsMatthieu Baerts
Usually, attributes are propagated to subflows as well. Here, if subflows are created by other ways than the MPTCP path-manager, it is important to make sure they are in v6 if it is asked by the userspace. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26mptcp: let the in-kernel PM use mixed IPv4 and IPv6 addressesPaolo Abeni
Currently the in-kernel PM arbitrary enforces that created subflow's family must match the main MPTCP socket while the RFC allows mixing IPv4 and IPv6 subflows. This patch changes the in-kernel PM logic to create subflows matching the currently selected source (or destination) address. IPv4 sockets can pick only IPv4 addresses (and v4 mapped in v6), while IPv6 sockets not restricted to V6ONLY can pick either IPv4 and IPv6 addresses as long as the source and destination matches. A helper, previously introduced is used to ease family matching checks, taking care of IPv4 vs IPv4-mapped-IPv6 vs IPv6 only addresses. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/269 Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26icmp: Add counters for rate limitsJamie Bainbridge
There are multiple ICMP rate limiting mechanisms: * Global limits: net.ipv4.icmp_msgs_burst/icmp_msgs_per_sec * v4 per-host limits: net.ipv4.icmp_ratelimit/ratemask * v6 per-host limits: net.ipv6.icmp_ratelimit/ratemask However, when ICMP output is limited, there is no way to tell which limit has been hit or even if the limits are responsible for the lack of ICMP output. Add counters for each of the cases above. As we are within local_bh_disable(), use the __INC stats variant. Example output: # nstat -sz "*RateLimit*" IcmpOutRateLimitGlobal 134 0.0 IcmpOutRateLimitHost 770 0.0 Icmp6OutRateLimitHost 84 0.0 Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Suggested-by: Abhishek Rawal <rawal.abhishek92@gmail.com> Link: https://lore.kernel.org/r/273b32241e6b7fdc5c609e6f5ebc68caf3994342.1674605770.git.jamie.bainbridge@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26Merge branch 'adding-sparx5-is0-vcap-support'Paolo Abeni
Steen Hegelund says: ==================== Adding Sparx5 IS0 VCAP support This provides the Ingress Stage 0 (IS0) VCAP (Versatile Content-Aware Processor) support for the Sparx5 platform. The IS0 VCAP (also known in the datasheet as CLM) is a classifier VCAP that mainly extracts frame information to metadata that follows the frame in the Sparx5 processing flow all the way to the egress port. The IS0 VCAP has 4 lookups and they are accessible with a TC chain id: - chain 1000000: IS0 Lookup 0 - chain 1100000: IS0 Lookup 1 - chain 1200000: IS0 Lookup 2 - chain 1300000: IS0 Lookup 3 - chain 1400000: IS0 Lookup 4 - chain 1500000: IS0 Lookup 5 Each of these lookups have their own port keyset configuration that decides which keys will be used for matching on which traffic type. The IS0 VCAP has these traffic classifications: - IPv4 frames - IPv6 frames - Unicast MPLS frames (ethertype = 0x8847) - Multicast MPLS frames (ethertype = 0x8847) - Other frame types than MPLS, IPv4 and IPv6 The IS0 VCAP has an action that allows setting the value of a PAG (Policy Association Group) key field in the frame metadata, and this can be used for matching in an IS2 VCAP rule. This allow rules in the IS0 VCAP to be linked to rules in the IS2 VCAP. The linking is exposed by using the TC "goto chain" action with an offset from the IS2 chain ids. As an example a "goto chain 8000001" will use a PAG value of 1 to chain to a rule in IS2 Lookup 0. ==================== Link: https://lore.kernel.org/r/20230124104511.293938-1-steen.hegelund@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26net: microchip: sparx5: Add support for IS0 VCAP CVLAN TC keysSteen Hegelund
This adds support for parsing and matching on the CVLAN tags in the Sparx5 IS0 VCAP. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26net: microchip: sparx5: Add support for IS0 VCAP ethernet protocol typesSteen Hegelund
This allows the IS0 VCAP to have its own list of supported ethernet protocol types matching what is supported by the VCAPs port lookup classification. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26net: microchip: sparx5: Add automatic selection of VCAP rule actionsetSteen Hegelund
With more than one possible actionset in a VCAP instance, the VCAP API will now use the actions in a VCAP rule to select the actionset that fits these actions the best possible way. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26net: microchip: sparx5: Add TC filter chaining support for IS0 and IS2 VCAPsSteen Hegelund
This allows rules to be chained between VCAP instances, e.g. from IS0 Lookup 0 to IS0 Lookup 1, or from one of the IS0 Lookups to one of the IS2 Lookups. Chaining from an IS2 Lookup to another IS2 Lookup is not supported in the hardware. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26net: microchip: sparx5: Add TC support for IS0 VCAPSteen Hegelund
This enables the TC command to use the Sparx5 IS0 VCAP Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26net: microchip: sparx5: Add actionset type id information to ruleSteen Hegelund
This adds the actionset type id to the rule information. This is needed as we now have more than one actionset in a VCAP instance (IS0). Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26net: microchip: sparx5: Add IS0 VCAP keyset configuration for Sparx5Steen Hegelund
This adds the IS0 VCAP port keyset configuration for Sparx5 and also updates the debugFS support to show the keyset configuration. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26net: microchip: sparx5: Add IS0 VCAP model and updated KUNIT VCAP modelSteen Hegelund
This provides the IS0 (Ingress Stage 0) or CLM VCAP model for Sparx5. This VCAP provides classification actions for Sparx5. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26ARM: dts: imx7d-smegw01: Fix USB host over-current polarityFabio Estevam
Currently, when resetting the USB modem via AT commands, the modem is no longer re-connected. This problem is caused by the incorrect description of the USB_OTG2_OC pad. It should have pull-up enabled, hysteresis enabled and the property 'over-current-active-low' should be passed. With this change, the USB modem can be successfully re-connected after a reset. Cc: stable@vger.kernel.org Fixes: 9ac0ae97e349 ("ARM: dts: imx7d-smegw01: Add support for i.MX7D SMEGW01 board") Signed-off-by: Fabio Estevam <festevam@denx.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2023-01-26arm64: dts: imx8mm-verdin: Do not power down eth-phyPhilippe Schenker
Currently if suspending using either freeze or memory state, the fec driver tries to power down the phy which leads to crash of the kernel and non-responsible kernel with the following call trace: [ 24.839889 ] Call trace: [ 24.839892 ] phy_error+0x18/0x60 [ 24.839898 ] kszphy_handle_interrupt+0x6c/0x80 [ 24.839903 ] phy_interrupt+0x20/0x2c [ 24.839909 ] irq_thread_fn+0x30/0xa0 [ 24.839919 ] irq_thread+0x178/0x2c0 [ 24.839925 ] kthread+0x154/0x160 [ 24.839932 ] ret_from_fork+0x10/0x20 Since there is currently no functionality in the phy subsystem to power down phys let's just disable the feature of powering-down the ethernet phy. Fixes: 6a57f224f734 ("arm64: dts: freescale: add initial support for verdin imx8m mini") Signed-off-by: Philippe Schenker <philippe.schenker@toradex.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2023-01-26MAINTAINERS: match freescale ARM64 DT directory in i.MX entryAhmad Fatoum
The majority of device trees in arch/arm64/boot/dts/freescale/ are built around i.MX SoCs with the rest being for Layerscape. Yet, calling get_maintainers.pl -f on this directory will not match the MAINTAINERS entry, because the directory name doesn't contain the substring "imx". Add an explicit file match for the directory and exclude the Layerscape specific files. This ensures To/Cc is not only generated from git history, but takes e.g. the R: entries into account as well. Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2023-01-25net: mdio-mux-meson-g12a: force internal PHY off on mux switchJerome Brunet
Force the internal PHY off then on when switching to the internal path. This fixes problems where the PHY ID is not properly set. Fixes: 7090425104db ("net: phy: add amlogic g12a mdio mux support") Suggested-by: Qi Duan <qi.duan@amlogic.com> Co-developed-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20230124101157.232234-1-jbrunet@baylibre.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-25Merge branch 'add-ip_local_port_range-socket-option'Jakub Kicinski
Jakub Sitnicki says: ==================== Add IP_LOCAL_PORT_RANGE socket option This patch set is a follow up to the "How to share IPv4 addresses by partitioning the port space" talk given at LPC 2022 [1]. Please see patch #1 for the motivation & the use case description. Patch #2 adds tests exercising the new option in various scenarios. Documentation ------------- Proposed update to the ip(7) man-page: IP_LOCAL_PORT_RANGE (since Linux X.Y) Set or get the per-socket default local port range. This option can be used to clamp down the global local port range, defined by the ip_local_port_range /proc interface described below, for a given socket. The option takes an uint32_t value with the high 16 bits set to the upper range bound, and the low 16 bits set to the lower range bound. Range bounds are inclusive. The 16-bit values should be in host byte order. The lower bound has to be less than the upper bound when both bounds are not zero. Otherwise, setting the option fails with EINVAL. If either bound is outside of the global local port range, or is zero, then that bound has no effect. To reset the setting, pass zero as both the upper and the lower bound. Interaction with SELinux bind() hook ------------------------------------ SELinux bind() hook - selinux_socket_bind() - performs a permission check if the requested local port number lies outside of the netns ephemeral port range. The proposed socket option cannot be used change the ephemeral port range to extend beyond the per-netns port range, as set by net.ipv4.ip_local_port_range. Hence, there is no interaction with SELinux, AFAICT. RFC -> v1 RFC: https://lore.kernel.org/netdev/20220912225308.93659-1-jakub@cloudflare.com/ * Allow either the high bound or the low bound, or both, to be zero * Add getsockopt support * Add selftests Links: ------ [1]: https://lpc.events/event/16/contributions/1349/ ==================== Link: https://lore.kernel.org/r/20221221-sockopt-port-range-v6-0-be255cc0e51f@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-25selftests/net: Cover the IP_LOCAL_PORT_RANGE socket optionJakub Sitnicki
Exercise IP_LOCAL_PORT_RANGE socket option in various scenarios: 1. pass invalid values to setsockopt 2. pass a range outside of the per-netns port range 3. configure a single-port range 4. exhaust a configured multi-port range 5. check interaction with late-bind (IP_BIND_ADDRESS_NO_PORT) 6. set then get the per-socket port range Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-25inet: Add IP_LOCAL_PORT_RANGE socket optionJakub Sitnicki
Users who want to share a single public IP address for outgoing connections between several hosts traditionally reach for SNAT. However, SNAT requires state keeping on the node(s) performing the NAT. A stateless alternative exists, where a single IP address used for egress can be shared between several hosts by partitioning the available ephemeral port range. In such a setup: 1. Each host gets assigned a disjoint range of ephemeral ports. 2. Applications open connections from the host-assigned port range. 3. Return traffic gets routed to the host based on both, the destination IP and the destination port. An application which wants to open an outgoing connection (connect) from a given port range today can choose between two solutions: 1. Manually pick the source port by bind()'ing to it before connect()'ing the socket. This approach has a couple of downsides: a) Search for a free port has to be implemented in the user-space. If the chosen 4-tuple happens to be busy, the application needs to retry from a different local port number. Detecting if 4-tuple is busy can be either easy (TCP) or hard (UDP). In TCP case, the application simply has to check if connect() returned an error (EADDRNOTAVAIL). That is assuming that the local port sharing was enabled (REUSEADDR) by all the sockets. # Assume desired local port range is 60_000-60_511 s = socket(AF_INET, SOCK_STREAM) s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1) s.bind(("192.0.2.1", 60_000)) s.connect(("1.1.1.1", 53)) # Fails only if 192.0.2.1:60000 -> 1.1.1.1:53 is busy # Application must retry with another local port In case of UDP, the network stack allows binding more than one socket to the same 4-tuple, when local port sharing is enabled (REUSEADDR). Hence detecting the conflict is much harder and involves querying sock_diag and toggling the REUSEADDR flag [1]. b) For TCP, bind()-ing to a port within the ephemeral port range means that no connecting sockets, that is those which leave it to the network stack to find a free local port at connect() time, can use the this port. IOW, the bind hash bucket tb->fastreuse will be 0 or 1, and the port will be skipped during the free port search at connect() time. 2. Isolate the app in a dedicated netns and use the use the per-netns ip_local_port_range sysctl to adjust the ephemeral port range bounds. The per-netns setting affects all sockets, so this approach can be used only if: - there is just one egress IP address, or - the desired egress port range is the same for all egress IP addresses used by the application. For TCP, this approach avoids the downsides of (1). Free port search and 4-tuple conflict detection is done by the network stack: system("sysctl -w net.ipv4.ip_local_port_range='60000 60511'") s = socket(AF_INET, SOCK_STREAM) s.setsockopt(SOL_IP, IP_BIND_ADDRESS_NO_PORT, 1) s.bind(("192.0.2.1", 0)) s.connect(("1.1.1.1", 53)) # Fails if all 4-tuples 192.0.2.1:60000-60511 -> 1.1.1.1:53 are busy For UDP this approach has limited applicability. Setting the IP_BIND_ADDRESS_NO_PORT socket option does not result in local source port being shared with other connected UDP sockets. Hence relying on the network stack to find a free source port, limits the number of outgoing UDP flows from a single IP address down to the number of available ephemeral ports. To put it another way, partitioning the ephemeral port range between hosts using the existing Linux networking API is cumbersome. To address this use case, add a new socket option at the SOL_IP level, named IP_LOCAL_PORT_RANGE. The new option can be used to clamp down the ephemeral port range for each socket individually. The option can be used only to narrow down the per-netns local port range. If the per-socket range lies outside of the per-netns range, the latter takes precedence. UAPI-wise, the low and high range bounds are passed to the kernel as a pair of u16 values in host byte order packed into a u32. This avoids pointer passing. PORT_LO = 40_000 PORT_HI = 40_511 s = socket(AF_INET, SOCK_STREAM) v = struct.pack("I", PORT_HI << 16 | PORT_LO) s.setsockopt(SOL_IP, IP_LOCAL_PORT_RANGE, v) s.bind(("127.0.0.1", 0)) s.getsockname() # Local address between ("127.0.0.1", 40_000) and ("127.0.0.1", 40_511), # if there is a free port. EADDRINUSE otherwise. [1] https://github.com/cloudflare/cloudflare-blog/blob/232b432c1d57/2022-02-connectx/connectx.py#L116 Reviewed-by: Marek Majkowski <marek@cloudflare.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-25docs: networking: Fix bridge documentation URLIvan Vecera
Current documentation URL [1] is no longer valid. [1] https://www.linuxfoundation.org/collaborate/workgroups/networking/bridge Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20230124145127.189221-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-25tsnep: Fix TX queue stop/wake for multiple queuesGerhard Engleder
netif_stop_queue() and netif_wake_queue() act on TX queue 0. This is ok as long as only a single TX queue is supported. But support for multiple TX queues was introduced with 762031375d5c and I missed to adapt stop and wake of TX queues. Use netif_stop_subqueue() and netif_tx_wake_queue() to act on specific TX queue. Fixes: 762031375d5c ("tsnep: Support multiple TX/RX queue pairs") Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Link: https://lore.kernel.org/r/20230124191440.56887-1-gerhard@engleder-embedded.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-25net: Kconfig: fix spellosRandy Dunlap
Fix spelling in net/ Kconfig files. (reported by codespell) Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Cc: coreteam@netfilter.org Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Link: https://lore.kernel.org/r/20230124181724.18166-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-25net/tg3: resolve deadlock in tg3_reset_task() during EEHDavid Christensen
During EEH error injection testing, a deadlock was encountered in the tg3 driver when tg3_io_error_detected() was attempting to cancel outstanding reset tasks: crash> foreach UN bt ... PID: 159 TASK: c0000000067c6000 CPU: 8 COMMAND: "eehd" ... #5 [c00000000681f990] __cancel_work_timer at c00000000019fd18 #6 [c00000000681fa30] tg3_io_error_detected at c00800000295f098 [tg3] #7 [c00000000681faf0] eeh_report_error at c00000000004e25c ... PID: 290 TASK: c000000036e5f800 CPU: 6 COMMAND: "kworker/6:1" ... #4 [c00000003721fbc0] rtnl_lock at c000000000c940d8 #5 [c00000003721fbe0] tg3_reset_task at c008000002969358 [tg3] #6 [c00000003721fc60] process_one_work at c00000000019e5c4 ... PID: 296 TASK: c000000037a65800 CPU: 21 COMMAND: "kworker/21:1" ... #4 [c000000037247bc0] rtnl_lock at c000000000c940d8 #5 [c000000037247be0] tg3_reset_task at c008000002969358 [tg3] #6 [c000000037247c60] process_one_work at c00000000019e5c4 ... PID: 655 TASK: c000000036f49000 CPU: 16 COMMAND: "kworker/16:2" ...:1 #4 [c0000000373ebbc0] rtnl_lock at c000000000c940d8 #5 [c0000000373ebbe0] tg3_reset_task at c008000002969358 [tg3] #6 [c0000000373ebc60] process_one_work at c00000000019e5c4 ... Code inspection shows that both tg3_io_error_detected() and tg3_reset_task() attempt to acquire the RTNL lock at the beginning of their code blocks. If tg3_reset_task() should happen to execute between the times when tg3_io_error_deteced() acquires the RTNL lock and tg3_reset_task_cancel() is called, a deadlock will occur. Moving tg3_reset_task_cancel() call earlier within the code block, prior to acquiring RTNL, prevents this from happening, but also exposes another deadlock issue where tg3_reset_task() may execute AFTER tg3_io_error_detected() has executed: crash> foreach UN bt PID: 159 TASK: c0000000067d2000 CPU: 9 COMMAND: "eehd" ... #4 [c000000006867a60] rtnl_lock at c000000000c940d8 #5 [c000000006867a80] tg3_io_slot_reset at c0080000026c2ea8 [tg3] #6 [c000000006867b00] eeh_report_reset at c00000000004de88 ... PID: 363 TASK: c000000037564000 CPU: 6 COMMAND: "kworker/6:1" ... #3 [c000000036c1bb70] msleep at c000000000259e6c #4 [c000000036c1bba0] napi_disable at c000000000c6b848 #5 [c000000036c1bbe0] tg3_reset_task at c0080000026d942c [tg3] #6 [c000000036c1bc60] process_one_work at c00000000019e5c4 ... This issue can be avoided by aborting tg3_reset_task() if EEH error recovery is already in progress. Fixes: db84bf43ef23 ("tg3: tg3_reset_task() needs to use rtnl_lock to synchronize") Signed-off-by: David Christensen <drc@linux.vnet.ibm.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20230124185339.225806-1-drc@linux.vnet.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-26arm64: dts: imx8mm: Fix pad control for UART1_DTE_RXPierluigi Passaro
According section     8.2.5.313 Select Input Register (IOMUXC_UART1_RXD_SELECT_INPUT) of      i.MX 8M Mini Applications Processor Reference Manual, Rev. 3, 11/2020 the required setting for this specific pin configuration is "1" Signed-off-by: Pierluigi Passaro <pierluigi.p@variscite.com> Reviewed-by: Fabio Estevam <festevam@gmail.com> Fixes: c1c9d41319c3 ("dt-bindings: imx: Add pinctrl binding doc for imx8mm") Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2023-01-25ksmbd: downgrade ndr version error message to debugNamjae Jeon
When user switch samba to ksmbd, The following message flood is coming when accessing files. Samba seems to changs dos attribute version to v5. This patch downgrade ndr version error message to debug. $ dmesg ... [68971.766914] ksmbd: v5 version is not supported [68971.779808] ksmbd: v5 version is not supported [68971.871544] ksmbd: v5 version is not supported [68971.910135] ksmbd: v5 version is not supported ... Cc: stable@vger.kernel.org Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-25ksmbd: limit pdu length size according to connection statusNamjae Jeon
Stream protocol length will never be larger than 16KB until session setup. After session setup, the size of requests will not be larger than 16KB + SMB2 MAX WRITE size. This patch limits these invalidly oversized requests and closes the connection immediately. Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18259 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-25cxl/pmem: Fix nvdimm unregistration when cxl_pmem driver is absentDan Williams
The cxl_pmem.ko module houses the driver for both cxl_nvdimm_bridge objects and cxl_nvdimm objects. When the core creates a cxl_nvdimm it arranges for it to be autoremoved when the bridge goes down. However, if the bridge never initialized because the cxl_pmem.ko module never loaded, it sets up a the following crash scenario: BUG: kernel NULL pointer dereference, address: 0000000000000478 [..] RIP: 0010:cxl_nvdimm_probe+0x99/0x140 [cxl_pmem] [..] Call Trace: <TASK> cxl_bus_probe+0x17/0x50 [cxl_core] really_probe+0xde/0x380 __driver_probe_device+0x78/0x170 driver_probe_device+0x1f/0x90 __driver_attach+0xd2/0x1c0 bus_for_each_dev+0x79/0xc0 bus_add_driver+0x1b1/0x200 driver_register+0x89/0xe0 cxl_pmem_init+0x50/0xff0 [cxl_pmem] It turns out the recent rework to simplify nvdimm probing obviated the need to unregister cxl_nvdimm objects at cxl_nvdimm_bridge ->remove() time. Leave the cxl_nvdimm device registered until the hosting cxl_memdev departs. The alternative is that the cxl_memdev needs to be reattached whenever the cxl_nvdimm_bridge attach state cycles, which is awkward and unnecessary. The only requirement is to make sure that when the cxl_nvdimm_bridge goes away any dependent cxl_nvdimm objects are shutdown. Handle that in unregister_nvdimm_bus(). With these registration entanglements removed there is no longer a need to pre-load the cxl_pmem module in cxl_acpi. Fixes: cb9cfff82f6a ("cxl/acpi: Simplify cxl_nvdimm_bridge probing") Reported-by: Gregory Price <gregory.price@memverge.com> Debugged-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167426077263.3955046.9695309346988027311.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-01-25Merge branch 'Enable bpf_setsockopt() on ktls enabled sockets.'Martin KaFai Lau
Kui-Feng Lee says: ==================== This patchset implements a change to bpf_setsockopt() which allows ktls enabled sockets to be used with the SOL_TCP level. This is necessary as when ktls is enabled, it changes the function pointer of setsockopt of the socket, which bpf_setsockopt() checks in order to make sure that the socket is a TCP socket. Checking sk_protocol instead of the function pointer will ensure that bpf_setsockopt() with the SOL_TCP level still works on sockets with ktls enabled. The major differences form v2 are: - Add a read() call to make sure that the FIN has arrived. - Remove the dependency on other test's header. The major differences from v1 are: - Test with a IPv6 connect as well. - Use ASSERT_OK() v2: https://lore.kernel.org/bpf/20230124181220.2871611-1-kuifeng@meta.com/ v1: https://lore.kernel.org/bpf/20230121025716.3039933-1-kuifeng@meta.com/ ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-25selftests/bpf: Calls bpf_setsockopt() on a ktls enabled socket.Kui-Feng Lee
Ensures that whenever bpf_setsockopt() is called with the SOL_TCP option on a ktls enabled socket, the call will be accepted by the system. The provided test makes sure of this by performing an examination when the server side socket is in the CLOSE_WAIT state. At this stage, ktls is still enabled on the server socket and can be used to test if bpf_setsockopt() works correctly with linux. Signed-off-by: Kui-Feng Lee <kuifeng@meta.com> Link: https://lore.kernel.org/r/20230125201608.908230-3-kuifeng@meta.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-25bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().Kui-Feng Lee
Resolve an issue when calling sol_tcp_sockopt() on a socket with ktls enabled. Prior to this patch, sol_tcp_sockopt() would only allow calls if the function pointer of setsockopt of the socket was set to tcp_setsockopt(). However, any socket with ktls enabled would have its function pointer set to tls_setsockopt(). To resolve this issue, the patch adds a check of the protocol of the linux socket and allows bpf_setsockopt() to be called if ktls is initialized on the linux socket. This ensures that calls to sol_tcp_sockopt() will succeed on sockets with ktls enabled. Signed-off-by: Kui-Feng Lee <kuifeng@meta.com> Link: https://lore.kernel.org/r/20230125201608.908230-2-kuifeng@meta.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-25bcache: Silence memcpy() run-time false positive warningsKees Cook
struct bkey has internal padding in a union, but it isn't always named the same (e.g. key ## _pad, key_p, etc). This makes it extremely hard for the compiler to reason about the available size of copies done against such keys. Use unsafe_memcpy() for now, to silence the many run-time false positive warnings: memcpy: detected field-spanning write (size 264) of single field "&i->j" at drivers/md/bcache/journal.c:152 (size 240) memcpy: detected field-spanning write (size 24) of single field "&b->key" at drivers/md/bcache/btree.c:939 (size 16) memcpy: detected field-spanning write (size 24) of single field "&temp.key" at drivers/md/bcache/extents.c:428 (size 16) Reported-by: Alexandre Pereira <alexpereira@disroot.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216785 Acked-by: Coly Li <colyli@suse.de> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230106060229.never.047-kees@kernel.org
2023-01-25gcc-plugins: Reorganize gimple includes for GCC 13Kees Cook
The gimple-iterator.h header must be included before gimple-fold.h starting with GCC 13. Reorganize gimple headers to work for all GCC versions. Reported-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/all/20230113173033.4380-1-palmer@rivosinc.com/ Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2023-01-25kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TESTKees Cook
Since the long memcpy tests may stall a system for tens of seconds in virtualized architecture environments, split those tests off under CONFIG_MEMCPY_SLOW_KUNIT_TEST so they can be separately disabled. Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/lkml/20221226195206.GA2626419@roeck-us.net Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-and-tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: David Gow <davidgow@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2023-01-25drm/amd/display: Fix timing not changning when freesync video is enabledAurabindo Pillai
[Why&How] Switching between certain modes that are freesync video modes and those are not freesync video modes result in timing not changing as seen by the monitor due to incorrect timing being driven. The issue is fixed by ensuring that when a non freesync video mode is set, we reset the freesync status on the crtc. Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Alan Liu <HaoPing.Liu@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-25drm/display/dp_mst: Correct the kref of port.Wayne Lin
[why & how] We still need to refer to port while removing payload at commit_tail. we should keep the kref till then to release. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2171 Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Fixes: 4d07b0bc4034 ("drm/display/dp_mst: Move all payload info into the atomic state") Cc: stable@vger.kernel.org # 6.1 Acked-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Tested-by: Didier Raboud <odyx@debian.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-25drm/amdgpu/display/mst: update mst_mgr relevant variable when long HPDWayne Lin
[Why & How] Now the vc_start_slot is controlled at drm side. When we service a long HPD, we still need to run dm_helpers_dp_mst_write_payload_allocation_table() to update drm mst_mgr's relevant variable. Otherwise, on the next plug-in, payload will get assigned with a wrong start slot. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2171 Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Fixes: 4d07b0bc4034 ("drm/display/dp_mst: Move all payload info into the atomic state") Cc: stable@vger.kernel.org # 6.1 Acked-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Tested-by: Didier Raboud <odyx@debian.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-25drm/amdgpu/display/mst: limit payload to be updated one by oneWayne Lin
[Why] amdgpu expects to update payload table for one stream one time by calling dm_helpers_dp_mst_write_payload_allocation_table(). Currently, it get modified to try to update HW payload table at once by referring mst_state. [How] This is just a quick workaround. Should find way to remove the temporary struct dc_dp_mst_stream_allocation_table later if set struct link_mst_stream_allocatio directly is possible. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2171 Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Fixes: 4d07b0bc4034 ("drm/display/dp_mst: Move all payload info into the atomic state") Cc: stable@vger.kernel.org # 6.1 Acked-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Tested-by: Didier Raboud <odyx@debian.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-25drm/amdgpu/display/mst: Fix mst_state->pbn_div and slot count assignmentsLyude Paul
Looks like I made a pretty big mistake here without noticing: it seems when I moved the assignments of mst_state->pbn_div I completely missed the fact that the reason for us calling drm_dp_mst_update_slots() earlier was to account for the fact that we need to call this function using info from the root MST connector, instead of just trying to do this from each MST encoder's atomic check function. Otherwise, we end up filling out all of DC's link information with zeroes. So, let's restore that and hopefully fix this DSC regression. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2171 Signed-off-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Fixes: 4d07b0bc4034 ("drm/display/dp_mst: Move all payload info into the atomic state") Cc: stable@vger.kernel.org # 6.1 Reviewed-by: Harry Wentland <harry.wentland@amd.com> Tested-by: Didier Raboud <odyx@debian.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-25drm/amdgpu: declare firmware for new MES 11.0.4Li Ma
To support new mes ip block Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-25drm/amdgpu: enable imu firmware for GC 11.0.4Li Ma
The GC 11.0.4 needs load IMU to power up GFX before loads GFX firmware. Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-25drm/amd/pm: add missing AllowIHInterrupt message mapping for SMU13.0.0Evan Quan
Add SMU13.0.0 AllowIHInterrupt message mapping. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-01-25drm/amdgpu: remove unconditional trap enable on add gfx11 queuesJonathan Kim
Rebase of driver has incorrect unconditional trap enablement for GFX11 when adding mes queues. Reported-by: Graham Sider <graham.sider@amd.com> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Graham Sider <graham.sider@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-01-25Merge branch 'Enable struct_ops programs to be sleepable'Alexei Starovoitov
David Vernet says: ==================== This is part 4 of https://lore.kernel.org/bpf/20230123232228.646563-1-void@manifault.com/ Part 3: https://lore.kernel.org/all/20230125050359.339273-1-void@manifault.com/ Part 2: https://lore.kernel.org/all/20230124160802.1122124-1-void@manifault.com/ Changelog: ---------- v3 -> v4: - Fix accidental typo in name of dummy_st_ops introduced in v2, moving it back to dummy_st_ops from dummy_st_ops_success. Should fix s390x testruns. v2 -> v3: - Don't call a KF_SLEEPABLE kfunc from the dummy_st_ops testsuite, and remove the newly added bpf_kfunc_call_test_sleepable() test kfunc (Martin). - Include vmlinux.h from progs/dummy_st_ops_success.c (previously progs/dummy_st_ops.c) rather than manually defining struct bpf_dummy_ops_state and struct bpf_dummy_ops. (Martin). - Fix a typo added to prog_tests/dummy_st_ops.c in a previous version: s/trace_dummy_st_ops_success__open/trace_dummy_st_ops__open. v1 -> v2: - Add support for specifying sleepable struct_ops programs with struct_ops.s in libbpf (Alexei). - Move failure test case into new dummy_st_ops_fail.c prog file. - Update test_dummy_sleepable() to use struct_ops.s instead of manually setting prog flags. Also remove open_load_skel() helper which is no longer needed. - Fix verifier tests to expect new sleepable prog failure message. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25bpf/selftests: Verify struct_ops prog sleepable behaviorDavid Vernet
In a set of prior changes, we added the ability for struct_ops programs to be sleepable. This patch enhances the dummy_st_ops selftest suite to validate this behavior by adding a new sleepable struct_ops entry to dummy_st_ops. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230125164735.785732-5-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>