linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2024-07-01	Merge tag 'nf-next-24-06-28' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next into main Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: Patch #1 to #11 to shrink memory consumption for transaction objects: struct nft_trans_chain { /* size: 120 (-32), cachelines: 2, members: 10 / struct nft_trans_elem { / size: 72 (-40), cachelines: 2, members: 4 / struct nft_trans_flowtable { / size: 80 (-48), cachelines: 2, members: 5 / struct nft_trans_obj { / size: 72 (-40), cachelines: 2, members: 4 / struct nft_trans_rule { / size: 80 (-32), cachelines: 2, members: 6 / struct nft_trans_set { / size: 96 (-24), cachelines: 2, members: 8 / struct nft_trans_table { / size: 56 (-40), cachelines: 1, members: 2 / struct nft_trans_elem can now be allocated from kmalloc-96 instead of kmalloc-128 slab. Series from Florian Westphal. For the record, I have mangled patch #1 to add nft_trans_container_() and use if for every transaction object. I have also added BUILD_BUG_ON to ensure struct nft_trans always comes at the beginning of the container transaction object. And few minor cleanups, any new bugs are of my own. Patch #12 simplify check for SCTP GSO in IPVS, from Ismael Luceno. Patch #13 nf_conncount key length remains in the u32 bound, from Yunjian Wang. Patch #14 removes unnecessary check for CTA_TIMEOUT_L3PROTO when setting default conntrack timeouts via nfnetlink_cttimeout API, from Lin Ma. Patch #15 updates NFT_SECMARK_CTX_MAXLEN to 4096, SELinux could use larger secctx names than the existing 256 bytes length. Patch #16 adds a selftest to exercise nfnetlink_queue listeners leaving nfnetlink_queue, from Florian Westphal. Patch #17 increases hitcount from 255 to 65535 in xt_recent, from Phil Sutter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01	tcp_metrics: add netlink protocol spec in YAML	Jakub Kicinski
	Add a protocol spec for tcp_metrics, so that it's accessible via YNL. Useful at the very least for testing fixes. In this episode of "10,000 ways to complicate netlink" the metric nest has defines which are off by 1. iproute2 does: struct rtattr *m[TCP_METRIC_MAX + 1 + 1]; parse_rtattr_nested(m, TCP_METRIC_MAX + 1, a); for (i = 0; i < TCP_METRIC_MAX + 1; i++) { // ... attr = m[i + 1]; This is too weird to support in YNL, add a new set of defines with _correct_ values to the official kernel header. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: drv-net: rss_ctx: convert to defer()	Jakub Kicinski
	Use just added defer(). Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20240627185502.3069139-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28	selftests: drv-net: add ability to schedule cleanup with defer()	Jakub Kicinski
	This implements what I was describing in [1]. When writing a test author can schedule cleanup / undo actions right after the creation completes, eg: cmd("touch /tmp/file") defer(cmd, "rm /tmp/file") defer() takes the function name as first argument, and the rest are arguments for that function. defer()red functions are called in inverse order after test exits. It's also possible to capture them and execute earlier (in which case they get automatically de-queued). undo = defer(cmd, "rm /tmp/file") # ... some unsafe code ... undo.exec() As a nice safety all exceptions from defer()ed calls are captured, printed, and ignored (they do make the test fail, however). This addresses the common problem of exceptions in cleanup paths often being unhandled, leading to potential leaks. There is a global action queue, flushed by ksft_run(). We could support function level defers too, I guess, but there's no immediate need.. Link: https://lore.kernel.org/all/877cedb2ki.fsf@nvidia.com/ # [1] Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20240627185502.3069139-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28	selftests: net: ksft: avoid continue when handling results	Jakub Kicinski
	Exception handlers print the result and use continue to skip the non-exception result printing. This makes inserting common post-test code hard. Refactor to avoid the continues and have only one ktap_result() call. Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20240627185502.3069139-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28	selftests/net: Add test coverage for UDP GSO software fallback	Jakub Sitnicki
	Extend the existing test to exercise UDP GSO egress through devices with various offload capabilities, including lack of checksum offload, which is the default case for TUN/TAP devices. Test against a dummy device because it is simpler to set up then TUN/TAP. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240626-linux-udpgso-v2-2-422dfcbd6b48@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28	selftests: netfilter: nft_queue.sh: add test for disappearing listener	Florian Westphal
	If userspace program exits while the queue its subscribed to has packets those need to be discarded. commit dc21c6cc3d69 ("netfilter: nfnetlink_queue: acquire rcu_read_lock() in instance_destroy_rcu()") fixed a (harmless) rcu splat that could be triggered in this case. Add a test case to cover this. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-28	selftests: mlxsw: mirror_gre: Obey TESTS	Petr Machata
	This test is unusual in that overriding TESTS does not change the tests to be run. Split the individual tests into several functions and invoke them through tests_run() as appropriate. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: libs: Drop unused functions	Petr Machata
	Nothing calls these. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: libs: Drop slow_path_trap_install()/_uninstall()	Petr Machata
	These functions are not used anymore. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: mirror_gre_lag_lacp: Drop unnecessary code	Petr Machata
	The selftest does not use functions from mirror_gre_lib, ditch the import. It does not use arping either, so drop the require_command as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: mlxsw: mirror_gre: Simplify	Petr Machata
	After the previous patch, the function test_span_failable() is always called with should_fail=1. Drop the argument and streamline the code. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: mirror: Drop dual SW/HW testing	Petr Machata
	The mirroring tests are currently run in a skip_hw and optionally a skip_sw mode. The former tests the SW datapath, the latter the HW datapath, if available. In order to be able to test SW datapath on HW loopbacks, traps are installed on ingress to get traffic from the HW datapath to the SW one. This adds an unnecessary complexity when it would be much simpler to just use a veth-based topology to test the SW datapath. Thus drop all the code that supports this dual testing. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: mirror: mirror_test(): Allow exact count of packets	Petr Machata
	The mirroring selftests work by sending ICMP traffic between two hosts. Along the way, this traffic is mirrored to a gretap netdevice, and counter taps are then installed strategically along the path of the mirrored traffic to verify the mirroring took place. The problem with this is that besides mirroring the primary traffic, any other service traffic is mirrored as well. At the same time, because the tests need to work in HW-offloaded scenarios, the ability of the device to do arbitrary packet inspection should not be taken for granted. Most tests therefore simply use matchall, one uses flower to match on IP address. As a result, the selftests are noisy, because besides the primary ICMP traffic, any amount of other service traffic is mirrored as well. mirror_test() accommodated this noisiness by giving the counters an allowance of several packets. But in the previous patch, where possible, counter taps were changed to match only on an exact ICMP message. At least in those cases, we can demand an exact number of packets to match. Where the tap is installed on a connective netdevice, the exact matching is not practical (though with u32, anything is possible). In those places, there should still be some leeway -- and probably bigger than before, because experience shows that these tests are very noisy. To that end, change mirror_test() so that it can be either called with an exact number to expect, or with an expression. Where leeway is needed, adjust callers to pass a ">= 10" instead of mere 10. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: mirror: do_test_span_dir_ips(): Install accurate taps	Petr Machata
	The mirroring selftests work by sending ICMP traffic between two hosts. Along the way, this traffic is mirrored to a gretap netdevice, and counter taps are then installed strategically along the path of the mirrored traffic to verify the mirroring took place. The problem with this is that besides mirroring the primary traffic, any other service traffic is mirrored as well. At the same time, because the tests need to work in HW-offloaded scenarios, the ability of the device to do arbitrary packet inspection should not be taken for granted. Most tests therefore simply use matchall, one uses flower to match on IP address. As a result, the selftests are noisy, because besides the primary ICMP traffic, any amount of other service traffic is mirrored as well. However, often the counter tap is installed at the remote end of the gretap tunnel. Since this is a SW-datapath scenario anyway, we can make the filter arbitrarily accurate. Thus in this patch, add parameters forward_type and backward_type to several mirroring test helpers, as some other helpers already have. Then change do_test_span_dir_ips() to instead of installing one generic tap and using it for test in both directions, install the tap for each direction separately, matching on the ICMP type given by these parameters. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: mirror_gre_lag_lacp: Check counters at tunnel	Petr Machata
	The test works by sending packets through a tunnel, whence they are forwarded to a LAG. One of the LAG children is removed from the LAG prior to the exercise, and the test then counts how many packets pass through the other one. The issue with this is that it counts all packets, not just the encapsulated ones. So instead add a second gretap endpoint to receive the sent packets, and check reception counters there. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: lib: tc_rule_stats_get(): Move default to argument definition	Petr Machata
	The argument $dir has a fallback value of "ingress". Move the fallback from the usage site to the argument definition block to make the fact clearer. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: mirror: Drop direction argument from several functions	Petr Machata
	The argument is not used by these functions except to propagate it for ultimately no purpose. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: libs: Expand "$@" where possible	Petr Machata
	In some functions, argument-forwarding through "$@" without listing the individual arguments explicitly is fundamental to the operation of a function. E.g. xfail_on_veth() should be able to run various tests in the fail-to-xfail regime, and usage of "$@" is appropriate as an abstraction mechanism. For functions such as simple_if_init(), $@ is a handy way to pass an array. In other functions, it's merely a mechanism to save some typing, which however ends up obscuring the real arguments and makes life hard for those that end up reading the code. This patch adds some of the implicit function arguments and correspondingly expands $@'s. In several cases this will come in handy as following patches adjust the parameter lists. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	ethtool: Add an interface for flashing transceiver modules' firmware	Danielle Ratson
	CMIS compliant modules such as QSFP-DD might be running a firmware that can be updated in a vendor-neutral way by exchanging messages between the host and the module as described in section 7.3.1 of revision 5.2 of the CMIS standard. Add a pair of new ethtool messages that allow: * User space to trigger firmware update of transceiver modules * The kernel to notify user space about the progress of the process The user interface is designed to be asynchronous in order to avoid RTNL being held for too long and to allow several modules to be updated simultaneously. The interface is designed with CMIS compliant modules in mind, but kept generic enough to accommodate future use cases, if these arise. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-27	selftests: net: add config for openvswitch	Aaron Conole
	The pmtu testing will require that the OVS module is installed, so do that. Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> Signed-off-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20240625172245.233874-8-aconole@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-27	selftests: net: Use the provided dpctl rather than the vswitchd for tests.	Aaron Conole
	The current pmtu test infrastucture requires an installed copy of the ovs-vswitchd userspace. This means that any automated or constrained environments may not have the requisite tools to run the tests. However, the pmtu tests don't require any special classifier processing. Indeed they are only using the vswitchd in the most basic mode - as a NORMAL switch. However, the ovs-dpctl kernel utility can now program all the needed basic flows to allow traffic to traverse the tunnels and provide support for at least testing some basic pmtu scenarios. More complicated flow pipelines can be added to the internal ovs test infrastructure, but that is work for the future. For now, enable the most common cases - wide mega flows with no other prerequisites. Enhance the pmtu testing to try testing using the internal utility, first. As a fallback, if the internal utility isn't running, then try with the ovs-vswitchd userspace tools. Additionally, make sure that when the pyroute2 package is not available the ovs-dpctl utility will error out to properly signal an error has occurred and skip using the internal utility. Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240625172245.233874-7-aconole@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-27	selftests: openvswitch: Support implicit ipv6 arguments.	Aaron Conole
	The current iteration of IPv6 support requires explicit fields to be set in addition to not properly support the actual IPv6 addresses properly. With this change, make it so that the ipv6() bare option is usable to create wildcarded flows to match broad swaths of ipv6 traffic. Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> Signed-off-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20240625172245.233874-6-aconole@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-27	selftests: openvswitch: Add support for tunnel() key.	Aaron Conole
	This will be used when setting details about the tunnel to use as transport. There is a difference between the ODP format between tunnel(): the 'key' flag is not actually a flag field, so we don't support it in the same way that the vswitchd userspace supports displaying it. Signed-off-by: Aaron Conole <aconole@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240625172245.233874-5-aconole@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-27	selftests: openvswitch: Add set() and set_masked() support.	Aaron Conole
	These will be used in upcoming commits to set specific attributes for interacting with tunnels. Since set() will use the key parsing routine, we also make sure to prepend it with an open paren, for the action parsing to properly understand it. Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> Signed-off-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20240625172245.233874-4-aconole@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-27	selftests: openvswitch: Refactor actions parsing.	Aaron Conole
	Until recently, the ovs-dpctl utility was used with a limited actions set and didn't need to have support for multiple similar actions. However, when adding support for tunnels, it will be important to support multiple set() actions in a single flow. When printing these actions, the existing code will be unable to print all of the sets - it will only print the first. Refactor this code to be easier to read and support multiple actions of the same type in an action list. Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> Signed-off-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20240625172245.233874-3-aconole@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-27	selftests: openvswitch: Support explicit tunnel port creation.	Aaron Conole
	The OVS module can operate in conjunction with various types of tunnel ports. These are created as either explicit tunnel vport types, OR by creating a tunnel interface which acts as an anchor for the lightweight tunnel support. This patch adds the ability to add tunnel ports to an OVS datapath for testing various scenarios with tunnel ports. With this addition, the vswitch "plumbing" will at least be able to push packets around using the tunnel vports. Future patches will add support for setting required tunnel metadata for lwts in the datapath. The end goal will be to push packets via these tunnels, and will be used in an upcoming commit for testing the path MTU. Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> Signed-off-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20240625172245.233874-2-aconole@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-27	tools: ynl: use display hints for formatting of scalar attrs	Jakub Kicinski
	Use display hints for formatting scalar attrs. This is specifically useful for formatting IPv4 addresses carried typically as u32. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20240626201234.2572964-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-27	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: e3f02f32a050 ("ionic: fix kernel panic due to multi-buffer handling") d9c04209990b ("ionic: Mark error paths in the data path as unlikely") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-27	Merge tag 'net-6.10-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from can, bpf and netfilter. There are a bunch of regressions addressed here, but hopefully nothing spectacular. We are still waiting the driver fix from Intel, mentioned by Jakub in the previous networking pull. Current release - regressions: - core: add softirq safety to netdev_rename_lock - tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO - batman-adv: fix RCU race at module unload time Previous releases - regressions: - openvswitch: get related ct labels from its master if it is not confirmed - eth: bonding: fix incorrect software timestamping report - eth: mlxsw: fix memory corruptions on spectrum-4 systems - eth: ionic: use dev_consume_skb_any outside of napi Previous releases - always broken: - netfilter: fully validate NFT_DATA_VALUE on store to data registers - unix: several fixes for OoB data - tcp: fix race for duplicate reqsk on identical SYN - bpf: - fix may_goto with negative offset - fix the corner case with may_goto and jump to the 1st insn - fix overrunning reservations in ringbuf - can: - j1939: recover socket queue on CAN bus error during BAM transmission - mcp251xfd: fix infinite loop when xmit fails - dsa: microchip: monitor potential faults in half-duplex mode - eth: vxlan: pull inner IP header in vxlan_xmit_one() - eth: ionic: fix kernel panic due to multi-buffer handling Misc: - selftest: unix tests refactor and a lot of new cases added" * tag 'net-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits) net: mana: Fix possible double free in error handling path selftest: af_unix: Check SIOCATMARK after every send()/recv() in msg_oob.c. af_unix: Fix wrong ioctl(SIOCATMARK) when consumed OOB skb is at the head. selftest: af_unix: Check EPOLLPRI after every send()/recv() in msg_oob.c selftest: af_unix: Check SIGURG after every send() in msg_oob.c selftest: af_unix: Add SO_OOBINLINE test cases in msg_oob.c af_unix: Don't stop recv() at consumed ex-OOB skb. selftest: af_unix: Add non-TCP-compliant test cases in msg_oob.c. af_unix: Don't stop recv(MSG_DONTWAIT) if consumed OOB skb is at the head. af_unix: Stop recv(MSG_PEEK) at consumed OOB skb. selftest: af_unix: Add msg_oob.c. selftest: af_unix: Remove test_unix_oob.c. tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset() netfilter: nf_tables: fully validate NFT_DATA_VALUE on store to data registers net: usb: qmi_wwan: add Telit FN912 compositions tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO ionic: use dev_consume_skb_any outside of napi net: dsa: microchip: fix wrong register write when masking interrupt Fix race for duplicate reqsk on identical SYN ibmvnic: Add tx check to prevent skb leak ...
2024-06-27	selftest: af_unix: Check SIOCATMARK after every send()/recv() in msg_oob.c.	Kuniyuki Iwashima
	To catch regression, let's check ioctl(SIOCATMARK) after every send() and recv() calls. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-27	af_unix: Fix wrong ioctl(SIOCATMARK) when consumed OOB skb is at the head.	Kuniyuki Iwashima
	Even if OOB data is recv()ed, ioctl(SIOCATMARK) must return 1 when the OOB skb is at the head of the receive queue and no new OOB data is queued. Without fix: # RUN msg_oob.no_peek.oob ... # msg_oob.c:305:oob:Expected answ[0] (0) == oob_head (1) # oob: Test terminated by assertion # FAIL msg_oob.no_peek.oob not ok 2 msg_oob.no_peek.oob With fix: # RUN msg_oob.no_peek.oob ... # OK msg_oob.no_peek.oob ok 2 msg_oob.no_peek.oob Fixes: 314001f0bf92 ("af_unix: Add OOB support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-27	selftest: af_unix: Check EPOLLPRI after every send()/recv() in msg_oob.c	Kuniyuki Iwashima
	When OOB data is in recvq, we can detect it with epoll by checking EPOLLPRI. This patch add checks for EPOLLPRI after every send() and recv() in all test cases. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-27	selftest: af_unix: Check SIGURG after every send() in msg_oob.c	Kuniyuki Iwashima
	When data is sent with MSG_OOB, SIGURG is sent to a process if the receiver socket has set its owner to the process by ioctl(FIOSETOWN) or fcntl(F_SETOWN). This patch adds SIGURG check after every send(MSG_OOB) call. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-27	selftest: af_unix: Add SO_OOBINLINE test cases in msg_oob.c	Kuniyuki Iwashima
	When SO_OOBINLINE is enabled on a socket, MSG_OOB can be recv()ed without MSG_OOB flag, and ioctl(SIOCATMARK) will behaves differently. This patch adds some test cases for SO_OOBINLINE. Note the new test cases found two bugs in TCP. 1) After reading OOB data with non-inline mode, we can re-read the data by setting SO_OOBINLINE. # RUN msg_oob.no_peek.inline_oob_ahead_break ... # msg_oob.c:146:inline_oob_ahead_break:AF_UNIX :world # msg_oob.c:147:inline_oob_ahead_break:TCP :oworld # OK msg_oob.no_peek.inline_oob_ahead_break ok 14 msg_oob.no_peek.inline_oob_ahead_break 2) The head OOB data is dropped if SO_OOBINLINE is disabled if a new OOB data is queued. # RUN msg_oob.no_peek.inline_ex_oob_drop ... # msg_oob.c:171:inline_ex_oob_drop:AF_UNIX :x # msg_oob.c:172:inline_ex_oob_drop:TCP :y # msg_oob.c:146:inline_ex_oob_drop:AF_UNIX :y # msg_oob.c:147:inline_ex_oob_drop:TCP :Resource temporarily unavailable # OK msg_oob.no_peek.inline_ex_oob_drop ok 17 msg_oob.no_peek.inline_ex_oob_drop Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-27	af_unix: Don't stop recv() at consumed ex-OOB skb.	Kuniyuki Iwashima
	Currently, recv() is stopped at a consumed OOB skb even if a new OOB skb is queued and we can ignore the old OOB skb. >>> from socket import * >>> c1, c2 = socket(AF_UNIX, SOCK_STREAM) >>> c1.send(b'hellowor', MSG_OOB) 8 >>> c2.recv(1, MSG_OOB) # consume OOB data stays at middle of recvq. b'r' >>> c1.send(b'ld', MSG_OOB) 2 >>> c2.recv(10) # recv() stops at the old consumed OOB b'hellowo' # should be 'hellowol' manage_oob() should not stop recv() at the old consumed OOB skb if there is a new OOB data queued. Note that TCP behaviour is apparently wrong in this test case because we can recv() the same OOB data twice. Without fix: # RUN msg_oob.no_peek.ex_oob_ahead_break ... # msg_oob.c:138:ex_oob_ahead_break:AF_UNIX :hellowo # msg_oob.c:139:ex_oob_ahead_break:Expected:hellowol # msg_oob.c:141:ex_oob_ahead_break:Expected ret[0] (7) == expected_len (8) # ex_oob_ahead_break: Test terminated by assertion # FAIL msg_oob.no_peek.ex_oob_ahead_break not ok 11 msg_oob.no_peek.ex_oob_ahead_break With fix: # RUN msg_oob.no_peek.ex_oob_ahead_break ... # msg_oob.c:146:ex_oob_ahead_break:AF_UNIX :hellowol # msg_oob.c:147:ex_oob_ahead_break:TCP :helloworl # OK msg_oob.no_peek.ex_oob_ahead_break ok 11 msg_oob.no_peek.ex_oob_ahead_break Fixes: 314001f0bf92 ("af_unix: Add OOB support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-27	selftest: af_unix: Add non-TCP-compliant test cases in msg_oob.c.	Kuniyuki Iwashima
	While testing, I found some weird behaviour on the TCP side as well. For example, TCP drops the preceding OOB data when queueing a new OOB data if the old OOB data is at the head of recvq. # RUN msg_oob.no_peek.ex_oob_drop ... # msg_oob.c:146:ex_oob_drop:AF_UNIX :x # msg_oob.c:147:ex_oob_drop:TCP :Resource temporarily unavailable # msg_oob.c:146:ex_oob_drop:AF_UNIX :y # msg_oob.c:147:ex_oob_drop:TCP :Invalid argument # OK msg_oob.no_peek.ex_oob_drop ok 9 msg_oob.no_peek.ex_oob_drop # RUN msg_oob.no_peek.ex_oob_drop_2 ... # msg_oob.c:146:ex_oob_drop_2:AF_UNIX :x # msg_oob.c:147:ex_oob_drop_2:TCP :Resource temporarily unavailable # OK msg_oob.no_peek.ex_oob_drop_2 ok 10 msg_oob.no_peek.ex_oob_drop_2 This patch allows AF_UNIX's MSG_OOB implementation to produce different results from TCP when operations are guarded with tcp_incompliant{}. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-27	af_unix: Don't stop recv(MSG_DONTWAIT) if consumed OOB skb is at the head.	Kuniyuki Iwashima
	Let's say a socket send()s "hello" with MSG_OOB and "world" without flags, >>> from socket import * >>> c1, c2 = socketpair(AF_UNIX) >>> c1.send(b'hello', MSG_OOB) 5 >>> c1.send(b'world') 5 and its peer recv()s "hell" and "o". >>> c2.recv(10) b'hell' >>> c2.recv(1, MSG_OOB) b'o' Now the consumed OOB skb stays at the head of recvq to return a correct value for ioctl(SIOCATMARK), which is broken now and fixed by a later patch. Then, if peer issues recv() with MSG_DONTWAIT, manage_oob() returns NULL, so recv() ends up with -EAGAIN. >>> c2.setblocking(False) # This causes -EAGAIN even with available data >>> c2.recv(5) Traceback (most recent call last): File "<stdin>", line 1, in <module> BlockingIOError: [Errno 11] Resource temporarily unavailable However, next recv() will return the following available data, "world". >>> c2.recv(5) b'world' When the consumed OOB skb is at the head of the queue, we need to fetch the next skb to fix the weird behaviour. Note that the issue does not happen without MSG_DONTWAIT because we can retry after manage_oob(). This patch also adds a test case that covers the issue. Without fix: # RUN msg_oob.no_peek.ex_oob_break ... # msg_oob.c:134:ex_oob_break:AF_UNIX :Resource temporarily unavailable # msg_oob.c:135:ex_oob_break:Expected:ld # msg_oob.c:137:ex_oob_break:Expected ret[0] (-1) == expected_len (2) # ex_oob_break: Test terminated by assertion # FAIL msg_oob.no_peek.ex_oob_break not ok 8 msg_oob.no_peek.ex_oob_break With fix: # RUN msg_oob.no_peek.ex_oob_break ... # OK msg_oob.no_peek.ex_oob_break ok 8 msg_oob.no_peek.ex_oob_break Fixes: 314001f0bf92 ("af_unix: Add OOB support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-27	af_unix: Stop recv(MSG_PEEK) at consumed OOB skb.	Kuniyuki Iwashima
	After consuming OOB data, recv() reading the preceding data must break at the OOB skb regardless of MSG_PEEK. Currently, MSG_PEEK does not stop recv() for AF_UNIX, and the behaviour is not compliant with TCP. >>> from socket import * >>> c1, c2 = socketpair(AF_UNIX) >>> c1.send(b'hello', MSG_OOB) 5 >>> c1.send(b'world') 5 >>> c2.recv(1, MSG_OOB) b'o' >>> c2.recv(9, MSG_PEEK) # This should return b'hell' b'hellworld' # even with enough buffer. Let's fix it by returning NULL for consumed skb and unlinking it only if MSG_PEEK is not specified. This patch also adds test cases that add recv(MSG_PEEK) before each recv(). Without fix: # RUN msg_oob.peek.oob_ahead_break ... # msg_oob.c:134:oob_ahead_break:AF_UNIX :hellworld # msg_oob.c:135:oob_ahead_break:Expected:hell # msg_oob.c:137:oob_ahead_break:Expected ret[0] (9) == expected_len (4) # oob_ahead_break: Test terminated by assertion # FAIL msg_oob.peek.oob_ahead_break not ok 13 msg_oob.peek.oob_ahead_break With fix: # RUN msg_oob.peek.oob_ahead_break ... # OK msg_oob.peek.oob_ahead_break ok 13 msg_oob.peek.oob_ahead_break Fixes: 314001f0bf92 ("af_unix: Add OOB support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-27	selftest: af_unix: Add msg_oob.c.	Kuniyuki Iwashima
	AF_UNIX's MSG_OOB functionality lacked thorough testing, and we found some bizarre behaviour. The new selftest validates every MSG_OOB operation against TCP as a reference implementation. This patch adds only a few tests with basic send() and recv() that do not fail. The following patches will add more test cases for SO_OOBINLINE, SIGURG, EPOLLPRI, and SIOCATMARK. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-27	selftest: af_unix: Remove test_unix_oob.c.	Kuniyuki Iwashima
	test_unix_oob.c does not fully cover AF_UNIX's MSG_OOB functionality, thus there are discrepancies between TCP behaviour. Also, the test uses fork() to create message producer, and it's not easy to understand and add more test cases. Let's remove test_unix_oob.c and rewrite a new test. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-26	selftests: drv-net: rss_ctx: add tests for RSS configuration and contexts	Jakub Kicinski
	Add tests focusing on indirection table configuration and creating extra RSS contexts in drivers which support it. $ export NETIF=eth0 REMOTE_... $ ./drivers/net/hw/rss_ctx.py KTAP version 1 1..8 ok 1 rss_ctx.test_rss_key_indir ok 2 rss_ctx.test_rss_context ok 3 rss_ctx.test_rss_context4 # Increasing queue count 44 -> 66 # Failed to create context 32, trying to test what we got ok 4 rss_ctx.test_rss_context32 # SKIP Tested only 31 contexts, wanted 32 ok 5 rss_ctx.test_rss_context_overlap ok 6 rss_ctx.test_rss_context_overlap2 # .. sprays traffic like a headless chicken .. not ok 7 rss_ctx.test_rss_context_out_of_order ok 8 rss_ctx.test_rss_context4_create_with_cfg # Totals: pass:6 fail:1 xfail:0 xpass:0 skip:1 error:0 Note that rss_ctx.test_rss_context_out_of_order fails with the device I tested with, but it seems to be a device / driver bug. Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240626012456.2326192-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-26	selftests: drv-net: add ability to wait for at least N packets to load gen	Jakub Kicinski
	Teach the load generator how to wait for at least given number of packets to be received. This will be useful for filtering where we'll want to send a non-trivial number of packets and make sure they landed in right queues. Reviewed-by: Breno Leitao <leitao@debian.org> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240626012456.2326192-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-26	selftests: drv-net: add helper to wait for HW stats to sync	Jakub Kicinski
	Some devices DMA stats to the host periodically. Add a helper which can wait for that to happen, based on frequency reported by the driver in ethtool. Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240626012456.2326192-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-26	selftests: drv-net: try to check if port is in use	Jakub Kicinski
	We use random ports for communication. As Willem predicted this leads to occasional failures. Try to check if port is already in use by opening a socket and binding to that port. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240626012456.2326192-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25	selftests: net: remove unneeded IP_GRE config	Yujie Liu
	It seems that there is no definition for config IP_GRE, and it is not a dependency of other configs, so remove it. linux$ find -name Kconfig \| xargs grep "IP_GRE" <-- nothing There is a IPV6_GRE config defined in net/ipv6/Kconfig. It only depends on NET_IPGRE_DEMUX but not IP_GRE. Signed-off-by: Yujie Liu <yujie.liu@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20240624055539.2092322-1-yujie.liu@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24	selftests/mm:fix test_prctl_fork_exec return failure	aigourensheng
	After calling fork() in test_prctl_fork_exec(), the global variable ksm_full_scans_fd is initialized to 0 in the child process upon entering the main function of ./ksm_functional_tests. In the function call chain test_child_ksm() -> __mmap_and_merge_range -> ksm_merge-> ksm_get_full_scans, start_scans = ksm_get_full_scans() will return an error. Therefore, the value of ksm_full_scans_fd needs to be initialized before calling test_child_ksm in the child process. Link: https://lkml.kernel.org/r/20240617052934.5834-1-shechenglong001@gmail.com Signed-off-by: aigourensheng <shechenglong001@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-24	Merge tag 'for-netdev' of ↵	Jakub Kicinski
	ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-06-24 We've added 12 non-merge commits during the last 10 day(s) which contain a total of 10 files changed, 412 insertions(+), 16 deletions(-). The main changes are: 1) Fix a BPF verifier issue validating may_goto with a negative offset, from Alexei Starovoitov. 2) Fix a BPF verifier validation bug with may_goto combined with jump to the first instruction, also from Alexei Starovoitov. 3) Fix a bug with overrunning reservations in BPF ring buffer, from Daniel Borkmann. 4) Fix a bug in BPF verifier due to missing proper var_off setting related to movsx instruction, from Yonghong Song. 5) Silence unnecessary syzkaller-triggered warning in __xdp_reg_mem_model(), from Daniil Dulov. * tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: xdp: Remove WARN() from __xdp_reg_mem_model() selftests/bpf: Add tests for may_goto with negative offset. bpf: Fix may_goto with negative offset. selftests/bpf: Add more ring buffer test coverage bpf: Fix overrunning reservations in ringbuf selftests/bpf: Tests with may_goto and jumps to the 1st insn bpf: Fix the corner case with may_goto and jump to the 1st insn. bpf: Update BPF LSM maintainer list bpf: Fix remap of arena. selftests/bpf: Add a few tests to cover bpf: Add missed var_off setting in coerce_subreg_to_size_sx() bpf: Add missed var_off setting in set_sext32_default_val() ==================== Link: https://patch.msgid.link/20240624124330.8401-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24	selftests/bpf: Add tests for may_goto with negative offset.	Alexei Starovoitov
	Add few tests with may_goto and negative offset. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240619235355.85031-2-alexei.starovoitov@gmail.com
2024-06-24	selftests/bpf: Add more ring buffer test coverage	Daniel Borkmann
	Add test coverage for reservations beyond the ring buffer size in order to validate that bpf_ringbuf_reserve() rejects the request with NULL, all other ring buffer tests keep passing as well: # ./vmtest.sh -- ./test_progs -t ringbuf [...] ./test_progs -t ringbuf [ 1.165434] bpf_testmod: loading out-of-tree module taints kernel. [ 1.165825] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel [ 1.284001] tsc: Refined TSC clocksource calibration: 3407.982 MHz [ 1.286871] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc34e357, max_idle_ns: 440795379773 ns [ 1.289555] clocksource: Switched to clocksource tsc #274/1 ringbuf/ringbuf:OK #274/2 ringbuf/ringbuf_n:OK #274/3 ringbuf/ringbuf_map_key:OK #274/4 ringbuf/ringbuf_write:OK #274 ringbuf:OK #275 ringbuf_multi:OK [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> [ Test fixups for getting BPF CI back to work ] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240621140828.18238-2-daniel@iogearbox.net