Age | Commit message | Author
2021-11-01 | skmsg: Lose offset info in sk_psock_skb_ingress | Liu Jian
If sockmap enables strparser, the offset info is lost in sk_psock_skb_ingress(). If the length determined by the parse_msg function is not skb->len, the skb will be converted to sk_msg multiple times, and the userspace app will get the data multiple times. Fix this by getting the offset and length from strp_msg. And as Cong suggested, add one bit in skb->_sk_redir to distinguish whether strparser is enabled or disabled. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211029141216.211899-1-liujian56@huawei.com
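The offset/length idea in the message above can be illustrated with a small userspace model. This is only a sketch, not the kernel code: `strp_msg_model` mirrors the `full_len` and `offset` fields of the kernel's `struct strp_msg`, while `copy_one_msg()` and the flat receive buffer are hypothetical stand-ins for the skb-to-sk_msg conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model of the two fields carried by the kernel's struct strp_msg. */
struct strp_msg_model {
    int full_len; /* length of the parsed message */
    int offset;   /* where the message starts inside the buffer */
};

/*
 * Hypothetical helper: copy exactly one parsed message out of a buffer that
 * may hold several. Returns the number of bytes copied, or 0 on bad input.
 * Ignoring offset/full_len and always copying the whole buffer is the kind
 * of bug that hands the same data to userspace more than once.
 */
static size_t copy_one_msg(char *dst, size_t dst_len,
                           const char *buf, size_t buf_len,
                           const struct strp_msg_model *m)
{
    if (m->offset < 0 || m->full_len < 0)
        return 0;
    if ((size_t)m->offset + (size_t)m->full_len > buf_len)
        return 0;
    if ((size_t)m->full_len > dst_len)
        return 0;

    memcpy(dst, buf + m->offset, (size_t)m->full_len);
    return (size_t)m->full_len;
}
```

With a 7-byte buffer holding a 4-byte and a 3-byte message, each call returns only its own message instead of the whole buffer.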
2021-11-01 | selftests/bpf: Fix strobemeta selftest regression | Andrii Nakryiko
After the most recent nightly Clang update, strobemeta selftests started failing with the following error (relevant portion of assembly included):

  1624: (85) call bpf_probe_read_user_str#114
  1625: (bf) r1 = r0
  1626: (18) r2 = 0xfffffffe
  1628: (5f) r1 &= r2
  1629: (55) if r1 != 0x0 goto pc+7
  1630: (07) r9 += 104
  1631: (6b) *(u16 *)(r9 +0) = r0
  1632: (67) r0 <<= 32
  1633: (77) r0 >>= 32
  1634: (79) r1 = *(u64 *)(r10 -456)
  1635: (0f) r1 += r0
  1636: (7b) *(u64 *)(r10 -456) = r1
  1637: (79) r1 = *(u64 *)(r10 -368)
  1638: (c5) if r1 s< 0x1 goto pc+778
  1639: (bf) r6 = r8
  1640: (0f) r6 += r7
  1641: (b4) w1 = 0
  1642: (6b) *(u16 *)(r6 +108) = r1
  1643: (79) r3 = *(u64 *)(r10 -352)
  1644: (79) r9 = *(u64 *)(r10 -456)
  1645: (bf) r1 = r9
  1646: (b4) w2 = 1
  1647: (85) call bpf_probe_read_user_str#114
  R1 unbounded memory access, make sure to bounds check any such access

In the above code r0 and r1 are implicitly related. Clang knows that, but the verifier isn't able to infer this relationship. Yonghong Song narrowed down this "regression" in code generation to a recent Clang optimization change ([0]), which for the BPF target generates a code pattern that the BPF verifier can't handle, so it loses track of register boundaries.

This patch works around the issue by adding a BPF assembly-based helper that helps to prove to the verifier that the upper bound of the register is a given constant, by controlling the exact shape of the generated BPF instruction sequence. This fixes the immediate issue for the strobemeta selftest.

[0] https://github.com/llvm/llvm-project/commit/acabad9ff6bf13e00305d9d8621ee8eafc1f8b08

Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211029182907.166910-1-andrii@kernel.org
2021-11-01 | Merge tag 'locks-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux | Linus Torvalds
Pull file locking updates from Jeff Layton: "Most of this is just follow-on cleanup work of documentation and comments from the mandatory locking removal in v5.15. The only real functional change is that LOCK_MAND flock() support is also being removed, as it has basically been non-functional since the v2.5 days"

* tag 'locks-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  fs: remove leftover comments from mandatory locking removal
  locks: remove changelog comments
  docs: fs: locks.rst: update comment about mandatory file locking
  Documentation: remove reference to now removed mandatory-locking doc
  locks: remove LOCK_MAND flock lock support
2021-11-01 | bpf: Disallow unprivileged bpf by default | Pawan Gupta
Disabling unprivileged BPF would help prevent unprivileged users from creating certain conditions required for potential speculative execution side-channel attacks on unmitigated affected hardware. A deep dive on such attacks and current mitigations is available here [0]. Sync with what many distros are currently applying already, and disable unprivileged BPF by default. An admin can enable this at runtime, if necessary, as described in 08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by default"). [0] "BPF and Spectre: Mitigating transient execution attacks", Daniel Borkmann, eBPF Summit '21 https://ebpf.io/summit-2021-slides/eBPF_Summit_2021-Keynote-Daniel_Borkmann-BPF_and_Spectre.pdf Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/bpf/0ace9ce3f97656d5f62d11093ad7ee81190c3c25.1635535215.git.pawan.kumar.gupta@linux.intel.com
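As a rough illustration of the runtime knob mentioned above, the sketch below reads and classifies the `kernel.unprivileged_bpf_disabled` sysctl. The value meanings (0, 1, 2) follow the kernel sysctl documentation as I understand it; the enum and helper names are made up for this example and are not kernel or libbpf API.

```c
#include <assert.h>
#include <stdio.h>

/* procfs path of the kernel.unprivileged_bpf_disabled sysctl. */
#define UNPRIV_BPF_SYSCTL "/proc/sys/kernel/unprivileged_bpf_disabled"

/* Hypothetical names for the three documented states. */
enum unpriv_bpf_state {
    UNPRIV_BPF_ENABLED,     /* 0: unprivileged bpf() allowed */
    UNPRIV_BPF_LOCKED_OFF,  /* 1: disabled, cannot be changed until reboot */
    UNPRIV_BPF_DEFAULT_OFF, /* 2: disabled, admin may still write 0 or 1 */
    UNPRIV_BPF_UNKNOWN
};

static enum unpriv_bpf_state unpriv_bpf_classify(int value)
{
    switch (value) {
    case 0:  return UNPRIV_BPF_ENABLED;
    case 1:  return UNPRIV_BPF_LOCKED_OFF;
    case 2:  return UNPRIV_BPF_DEFAULT_OFF;
    default: return UNPRIV_BPF_UNKNOWN;
    }
}

/* Read the integer stored at 'path'; returns -1 if it cannot be read. */
static int unpriv_bpf_read(const char *path)
{
    FILE *f = fopen(path, "r");
    int value = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

A caller would typically pass `UNPRIV_BPF_SYSCTL` to `unpriv_bpf_read()` and feed the result to `unpriv_bpf_classify()`; with this commit's default, a freshly booted kernel is expected to report the "default off" state.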
2021-11-01 | Merge tag 'tpmdd-next-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd | Linus Torvalds
Pull tpm updates from Jarkko Sakkinen: "Only bug fixes"

* tag 'tpmdd-next-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm_tis_spi: Add missing SPI ID
  tpm: fix Atmel TPM crash caused by too frequent queries
  tpm: Check for integer overflow in tpm2_map_response_body()
  tpm: tis: Kconfig: Add helper dependency on COMPILE_TEST
2021-11-01 | Merge tag 'folio-5.16' of git://git.infradead.org/users/willy/pagecache | Linus Torvalds
Pull memory folios from Matthew Wilcox: "Add memory folios, a new type to represent either order-0 pages or the head page of a compound page. This should be enough infrastructure to support filesystems converting from pages to folios. The point of all this churn is to allow filesystems and the page cache to manage memory in larger chunks than PAGE_SIZE. The original plan was to use compound pages like THP does, but I ran into problems with some functions expecting only a head page while others expect the precise page containing a particular byte. The folio type allows a function to declare that it's expecting only a head page. Almost incidentally, this allows us to remove various calls to VM_BUG_ON(PageTail(page)) and compound_head(). This converts just parts of the core MM and the page cache. For 5.17, we intend to convert various filesystems (XFS and AFS are ready; other filesystems may make it) and also convert more of the MM and page cache to folios. For 5.18, multi-page folios should be ready. The multi-page folios offer some improvement to some workloads. The 80% win is real, but appears to be an artificial benchmark (postgres startup, which isn't a serious workload). Real workloads (eg building the kernel, running postgres in a steady state, etc) seem to benefit between 0-10%. I haven't heard of any performance losses as a result of this series. Nobody has done any serious performance tuning; I imagine that tweaking the readahead algorithm could provide some more interesting wins. There are also other places where we could choose to create large folios and currently do not, such as writes that are larger than PAGE_SIZE. I'd like to thank all my reviewers who've offered review/ack tags: Christoph Hellwig, David Howells, Jan Kara, Jeff Layton, Johannes Weiner, Kirill A. Shutemov, Michal Hocko, Mike Rapoport, Vlastimil Babka, William Kucharski, Yu Zhao and Zi Yan. 
I'd also like to thank those who gave feedback I incorporated but haven't offered up review tags for this part of the series: Nick Piggin, Mel Gorman, Ming Lei, Darrick Wong, Ted Ts'o, John Hubbard, Hugh Dickins, and probably a few others who I forget" * tag 'folio-5.16' of git://git.infradead.org/users/willy/pagecache: (90 commits) mm/writeback: Add folio_write_one mm/filemap: Add FGP_STABLE mm/filemap: Add filemap_get_folio mm/filemap: Convert mapping_get_entry to return a folio mm/filemap: Add filemap_add_folio() mm/filemap: Add filemap_alloc_folio mm/page_alloc: Add folio allocation functions mm/lru: Add folio_add_lru() mm/lru: Convert __pagevec_lru_add_fn to take a folio mm: Add folio_evictable() mm/workingset: Convert workingset_refault() to take a folio mm/filemap: Add readahead_folio() mm/filemap: Add folio_mkwrite_check_truncate() mm/filemap: Add i_blocks_per_folio() mm/writeback: Add folio_redirty_for_writepage() mm/writeback: Add folio_account_redirty() mm/writeback: Add folio_clear_dirty_for_io() mm/writeback: Add folio_cancel_dirty() mm/writeback: Add folio_account_cleaned() mm/writeback: Add filemap_dirty_folio() ...
2021-11-01 | Merge branch 'SMC-tracepoints' | David S. Miller
Tony Lu says: ==================== Tracepoints for SMC This patch set introduces tracepoints for SMC, including the basic tracepoint code. The tracepoints would help us to track SMC's behaviors with automated tools or other BPF tools, with zero overhead if not enabled. Compared with kprobe and other dynamic tools, the tracepoints are considered a stable API, with less tracing overhead and an easy-to-use API. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | net/smc: Introduce tracepoint for smcr link down | Tony Lu
The SMC-R link down event is important for finding link issues, so we should track it, especially in single-NIC mode, where it means the upper-layer connection would be shut down. The tracepoint lets us find the direct link-down reason in time: not only the increased counter, but also the location of the code that triggered the event. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | net/smc: Introduce tracepoints for tx and rx msg | Tony Lu
This introduces two tracepoints for smc tx and rx msg to help us diagnose issues in the data path. These two tracepoints don't cover the CORK or MSG_MORE path in tx, just the top half of the data path. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | net/smc: Introduce tracepoint for fallback | Tony Lu
This introduces a tracepoint for smc fallback to TCP, so that we can track which connection falls back and why, and map the clcsocks' pointer with /proc/net/tcp to find more details about TCP connections. Compared with kprobe or other dynamic tracing, tracepoints are stable and easy to use. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | Merge branch 'amt-driver' | David S. Miller
Taehee Yoo says:

====================
amt: add initial driver for Automatic Multicast Tunneling (AMT)

This is an implementation of AMT (Automatic Multicast Tunneling), RFC 7450.
https://datatracker.ietf.org/doc/html/rfc7450

This implementation supports IGMPv2, IGMPv3, MLDv1, MLDv2, and IPv4 underlay.

Summary of RFC 7450

The purpose of this protocol is to provide multicast tunneling. The main use case is to deliver multicast traffic from a multicast-enabled network to sites that lack multicast connectivity to the source network. There are two roles in the AMT protocol, Gateway and Relay. The main purpose of Gateway mode is to forward multicast listening information (IGMP, MLD) to the source. The main purpose of Relay mode is to forward multicast data to listeners. This multicast traffic (IGMP, MLD, multicast data packets) is tunneled.

Listeners are located behind the Gateway endpoint, but the gateway itself can be a listener too. Senders are located behind the Relay endpoint.

    ___________       _________       _______       ________
   |           |     |         |     |       |     |        |
   | Listeners <-----> Gateway <-----> Relay <-----> Source |
   |___________|     |_________|     |_______|     |________|
      IGMP/MLD---------(encap)----------->
         <-------------(decap)--------(encap)------Multicast Data

Usage of AMT interface

1. Create gateway interface
   ip link add amtg type amt mode gateway local 10.0.0.1 discovery 10.0.0.2 \
   dev gw1_rt gateway_port 2268 relay_port 2268

2. Create Relay interface
   ip link add amtr type amt mode relay local 10.0.0.2 dev relay_rt \
   relay_port 2268 max_tunnels 4

v1 -> v2:
 - Eliminate sparse warnings.
 - Use bool type instead of __be16 for identifying v4/v6 protocol.

v2 -> v3:
 - Fix compile warning due to unused variable.
 - Add missing spinlock comment.
 - Update help message of amt in Kconfig.

v3 -> v4:
 - Split patch.
 - Use CHECKSUM_NONE instead of CHECKSUM_UNNECESSARY.
 - Fix compile error.

v4 -> v5:
 - Remove unnecessary rcu_read_lock().
 - Remove unnecessary amt_change_mtu().
 - Change netlink error message.
 - Add validation for IFLA_AMT_LOCAL_IP and IFLA_AMT_DISCOVERY_IP.
 - Add comments in amt.h.
 - Add missing dev_put() in error path of amt_newlink().
 - Fix typo.
 - Add BUILD_BUG_ON() in amt_smb_cb().
 - Use macro instead of magic values.
 - Use kzalloc() instead of kmalloc().
 - Add selftest script.

v5 -> v6:
 - Reset remote_ip in amt_dev_stop().

v6 -> v7:
 - Fix compile error.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | selftests: add amt interface selftest script | Taehee Yoo
This is a selftest script for the amt interface. The script includes a basic forwarding scenario and a torture scenario. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | amt: add mld report message handler | Taehee Yoo
In the previous patch, igmp report handler was added. That handler can be used for mld too. So, it uses that common code to parse mld report message. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | amt: add multicast(IGMP) report message handler | Taehee Yoo
The amt 'Relay' interface manages multicast groups (igmp/mld) and sources. To manage them, it needs a function to parse igmp/mld report messages. So, this adds the logic for parsing igmp report messages and saves them in their own data structures. struct amt_group_node means one group (igmp/mld). struct amt_source_node means one source. The same source can't exist in the same group. The same group can exist in the same tunnel because it manages the host address too. The group information is used when forwarding multicast data. If there are no groups in the specific tunnel, Relay doesn't forward it. Although Relay manages sources, it doesn't support the source filtering feature, because sources are managed only in order to manage groups more correctly. In the next patch, the MLD part will be added. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | amt: add data plane of amt interface | Taehee Yoo
Before forwarding multicast traffic, the amt interface establishes a tunnel between gateway and relay. To establish it, AMT defines several message types, and the message flow looks like the below.

                   Gateway                  Relay
                   -------                  -----
                      :        Request        :
                  [1] |           N           |
                      |---------------------->|
                      |   Membership Query    | [2]
                      |   N,MAC,gADDR,gPORT   |
                      |<======================|
                  [3] |   Membership Update   |
                      |  ({G:INCLUDE({S})})   |
                      |======================>|
                      |                       |
 ---------------------:-----------------------:---------------------
|                     |                       |                     |
|                     |    *Multicast Data    |  *IP Packet(S,G)    |
|                     |      gADDR,gPORT      |<-----------------() |
|    *IP Packet(S,G)  |<======================|                     |
| ()<-----------------|                       |                     |
|                     |                       |                     |
 ---------------------:-----------------------:---------------------
                      ~                       ~
                      ~        Request        ~
                  [4] |          N'           |
                      |---------------------->|
                      |   Membership Query    | [5]
                      | N',MAC',gADDR',gPORT' |
                      |<======================|
                  [6] |                       |
                      |       Teardown        |
                      |   N,MAC,gADDR,gPORT   |
                      |---------------------->|
                      |                       | [7]
                      |   Membership Update   |
                      |  ({G:INCLUDE({S})})   |
                      |======================>|
                      |                       |
 ---------------------:-----------------------:---------------------
|                     |                       |                     |
|                     |    *Multicast Data    |  *IP Packet(S,G)    |
|                     |     gADDR',gPORT'     |<-----------------() |
|    *IP Packet (S,G) |<======================|                     |
| ()<-----------------|                       |                     |
|                     |                       |                     |
 ---------------------:-----------------------:---------------------
                      |                       |
                      :                       :

1. Discovery
 - Sent by Gateway to Relay
 - To find Relay unique ip address
2. Advertisement
 - Sent by Relay to Gateway
 - Contains the unique IP address
3. Request
 - Sent by Gateway to Relay
 - Solicit to receive 'Query' message.
4. Query
 - Sent by Relay to Gateway
 - Contains General Query message.
5. Update
 - Sent by Gateway to Relay
 - Contains report message.
6. Multicast Data
 - Sent by Relay to Gateway
 - encapsulated multicast traffic.
7. Teardown
 - Not supported at this time.

Except for the Teardown message, it supports all messages. In the next patch, IGMP/MLD logic will be added.

Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | amt: add control plane of amt interface | Taehee Yoo
It adds definitions and control plane code for AMT. This is very similar to udp tunneling interfaces such as gtp, vxlan, etc. In the next patch, data plane code will be added. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | Merge branch 'netdevsim-device-and-bus' | David S. Miller
Jakub Kicinski says: ==================== netdevsim: improve separation between device and bus VF config falls strangely in between device and bus responsibilities today. Because of this bus.c sticks fingers directly into struct nsim_dev and we look at nsim_bus_dev in many more places than necessary. Make bus.c contain pure interface code, and move the particulars of the logic (which touch on eswitch, devlink reloads etc) to dev.c. Rename the functions at the boundary of the interface to make the separation clearer. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | netdevsim: rename 'driver' entry points | Jakub Kicinski
Rename functions serving as driver entry points from nsim_dev_... to nsim_drv_... this makes the API boundary between bus and dev clearer. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | netdevsim: move max vf config to dev | Jakub Kicinski
max_vfs is a strange little beast because the file hangs off of nsim's debugfs, but it configures a field in the bus device. Move it to dev.c, let's look at it as if the device driver was imposing VF limit based on FW info (like pci_sriov_set_totalvfs()). Again, when moving refactor the function not to hold the vfs lock pointlessly while parsing the input. Wrap the access from the read side in READ_ONCE() to appease concurrency checkers. Do not check if return value from snprintf() is negative... Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | netdevsim: move details of vf config to dev | Jakub Kicinski
Since "eswitch" configuration was added bus.c contains a lot of device details which really belong to dev.c. Restructure the code while moving it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | netdevsim: move vfconfig to nsim_dev | Jakub Kicinski
When netdevsim got split into the faux bus vfconfig ended up in the bus device (think pci_dev) which is strange because it contains very networky not to say netdevy information. Move it to nsim_dev, which is the driver "priv" structure for the device. To make sure we don't race with probe/remove take the device lock (much like PCI). While at it remove the NULL-checking of vfconfigs. It appears to be pointless. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | netdevsim: take rtnl_lock when assigning num_vfs | Jakub Kicinski
Legacy VF NDOs look at num_vfs and then based on that index into vfconfig. If we don't rtnl_lock() num_vfs may get set to 0 and vfconfig freed/replaced while the NDO is running. We don't need to protect replacing vfconfig since it's only done when num_vfs is 0. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | Merge branch 'devlink-locking' | David S. Miller
Jakub Kicinski says: ==================== improve ethtool/rtnl vs devlink locking During ethtool netlink development we decided to move some of the commands to devlink. Since we don't want drivers to implement both devlink and ethtool versions of the commands, the ethtool ioctl falls back to calling devlink. Unfortunately devlink locks must be taken before rtnl_lock. This results in a questionable dev_hold() / rtnl_unlock() / devlink / rtnl_lock() / dev_put() pattern. This method "works", but its working depends on the drivers in question not doing much in ethtool_ops->begin / complete, and on the netdev not having needs_free_netdev set. Since commit 437ebfd90a25 ("devlink: Count struct devlink consumers") we can hold a reference on a devlink instance and prevent it from going away (sort of like netdev with dev_hold()). We can use this to create a more natural reference nesting where we get a ref on the devlink instance and make the devlink call entirely outside of the rtnl_lock section. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | ethtool: don't drop the rtnl_lock half way thru the ioctl | Jakub Kicinski
devlink compat code needs to drop rtnl_lock to take devlink->lock to ensure correct lock ordering. This is problematic because we're not strictly guaranteed that the netdev will not disappear after we re-lock. It may open a possibility of nested ->begin / ->complete calls. Instead of calling into devlink under rtnl_lock take a ref on the devlink instance and make the call after we've dropped rtnl_lock. We (continue to) assume that netdevs have an implicit reference on the devlink returned from ndo_get_devlink_port. Note that ndo_get_devlink_port will now get called under rtnl_lock. That should be fine since none of the drivers seem to be taking serious locks inside ndo_get_devlink_port. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | devlink: expose get/put functions | Jakub Kicinski
Allow those who hold implicit reference on a devlink instance to try to take a full ref on it. This will be used from netdev code which has an implicit ref because of driver call ordering. Note that after recent changes devlink_unregister() may happen before netdev unregister, but devlink_free() should still happen after, so we are safe to try, but we can't just refcount_inc() and assume it's not zero. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
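The point about not being able to "just refcount_inc() and assume it's not zero" can be sketched in userspace with C11 atomics. This is an illustrative model only: `struct obj`, `obj_try_get()` and `obj_put()` are hypothetical names, not the devlink functions this commit exposes, and the kernel uses its own refcount_t primitives rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an object with a reference count. */
struct obj {
    atomic_uint refcnt;
};

/*
 * Take a reference only if the count is still non-zero. A plain increment
 * would "resurrect" an object whose last reference is already gone.
 */
static bool obj_try_get(struct obj *o)
{
    unsigned int old = atomic_load(&o->refcnt);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when the caller released the last one. */
static bool obj_put(struct obj *o)
{
    unsigned int prev = atomic_fetch_sub(&o->refcnt, 1);

    assert(prev > 0); /* a put without a matching get is a bug */
    return prev == 1;
}
```

A caller holding only an implicit reference would call `obj_try_get()` first and skip the operation entirely when it returns false.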
2021-11-01 | ethtool: handle info/flash data copying outside rtnl_lock | Jakub Kicinski
We need to increase the lifetime of the data for .get_info and .flash_update beyond their handlers inside rtnl_lock. Allocate a union on the heap and use it instead. Note that we now copy the ethcmd before we lookup dev, hopefully there is no crazy user space depending on error codes. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | ethtool: push the rtnl_lock into dev_ethtool() | Jakub Kicinski
Don't take the lock in net/core/dev_ioctl.c, we'll have things to do outside rtnl_lock soon. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | Merge branch 'mana-misc' | David S. Miller
Dexuan Cui says: ==================== net: mana: some misc patches Patch 1 is a small fix. Patch 2 reports OS info to the PF driver. Before the patch, the req fields were all zeros. Patch 3 fixes and cleans up the error handling of HWC creation failure. Patch 4 adds the callbacks for hibernation/kexec. It's based on patch 3. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | net: mana: Support hibernation and kexec | Dexuan Cui
Implement the suspend/resume/shutdown callbacks for hibernation/kexec. Add mana_gd_setup() and mana_gd_cleanup() for some common code, and use them in the mana_gd_* callbacks. Reuse mana_probe/remove() for the hibernation path. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | net: mana: Improve the HWC error handling | Dexuan Cui
Currently when the HWC creation fails, the error handling is flawed, e.g. if mana_hwc_create_channel() -> mana_hwc_establish_channel() fails, the resources acquired in mana_hwc_init_queues() are not released. Enhance mana_hwc_destroy_channel() to do the proper cleanup work and call it accordingly. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
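The general pattern described above, a destroy routine that tolerates a partially built object so every failure path can call it, might look like the following userspace sketch. All names here (`struct chan`, `chan_create()`, `chan_destroy()`, the `fail_establish` flag) are invented for illustration and do not correspond to the mana driver's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical channel with two separately acquired resources. */
struct chan {
    void *queues; /* stands in for what an init_queues step acquires */
    void *link;   /* stands in for what an establish step acquires */
};

/* Safe on NULL and on partially initialized channels: free(NULL) is a no-op. */
static void chan_destroy(struct chan *c)
{
    if (!c)
        return;
    free(c->link);
    free(c->queues);
    free(c);
}

/* 'fail_establish' simulates a failure after the queues were set up. */
static struct chan *chan_create(int fail_establish)
{
    struct chan *c = calloc(1, sizeof(*c));

    if (!c)
        return NULL;

    c->queues = malloc(64);
    if (!c->queues)
        goto err;

    if (fail_establish)
        goto err;

    c->link = malloc(16);
    if (!c->link)
        goto err;

    return c;

err:
    /* One cleanup path releases whatever was acquired so far. */
    chan_destroy(c);
    return NULL;
}
```

Routing every error through the same destroy function means a late failure cannot leak the resources acquired by an earlier step.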
2021-11-01 | net: mana: Report OS info to the PF driver | Dexuan Cui
The PF driver might use the OS info for statistical purposes. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | net: mana: Fix the netdev_err()'s vPort argument in mana_init_port() | Dexuan Cui
Use the correct port index rather than 0. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | Merge branch 'mptcp-selftests' | David S. Miller
Mat Martineau says: ==================== mptcp: Some selftest improvements Here are a couple of selftest changes for MPTCP. Patch 1 fixes a mistake where the wrong protocol (TCP vs MPTCP) could be requested on the listening socket in some link failure tests. Patch 2 refactors the simultaneous flow tests to improve timing accuracy and give more consistent results. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | selftests: mptcp: more stable simult_flows tests | Paolo Abeni
Currently the simult_flows.sh self-tests are not very stable, especially when running on slow VMs. The tests measure runtime for transfers on multiple subflows and check that the time is near the theoretical maximum. The current test infra introduces a bit of jitter in test runtime, due to multiple explicit delays. Additionally the runtime is measured by the shell script wrapper. On a slow VM, the script overhead is measurable and subject to relevant jitter. One solution to make the test more stable would be adding more slack to the expected time; that could possibly hide real regressions. Instead move the measurement inside the command doing the transfer, and drop most unneeded sleeps. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
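Moving the measurement inside the command doing the transfer could be sketched as below. This is a minimal illustration assuming a POSIX system with `clock_gettime()`; `timed_transfer_ms()` and the `fake_transfer()` callback are hypothetical and are not taken from the mptcp selftests.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Convert the difference between two timespec values to milliseconds. */
static long long elapsed_ms(const struct timespec *start,
                            const struct timespec *end)
{
    return (end->tv_sec - start->tv_sec) * 1000LL +
           (end->tv_nsec - start->tv_nsec) / 1000000LL;
}

/* Hypothetical transfer: just counts how often it was invoked. */
static void fake_transfer(void *arg)
{
    int *calls = arg;

    (*calls)++;
}

/*
 * Time only the transfer itself with a monotonic clock, so process startup
 * and shell wrapper overhead are excluded from the measured runtime.
 */
static long long timed_transfer_ms(void (*transfer)(void *), void *arg)
{
    struct timespec start, end;
    int rc;

    rc = clock_gettime(CLOCK_MONOTONIC, &start);
    assert(rc == 0);
    transfer(arg);
    rc = clock_gettime(CLOCK_MONOTONIC, &end);
    assert(rc == 0);
    (void)rc;

    return elapsed_ms(&start, &end);
}
```

The test script can then compare the reported duration against the expected maximum without adding slack for its own overhead.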
2021-11-01 | selftests: mptcp: fix proto type in link_failure tests | Geliang Tang
In listener_ns, we should pass srv_proto argument to mptcp_connect command, not cl_proto. Fixes: 7d1e6f1639044 ("selftests: mptcp: add testcase for active-back") Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | ibmvnic: delay complete() | Sukadev Bhattiprolu
If we get CRQ_INIT, we set errno to -EIO and first call complete() to notify the waiter. Then we try to schedule a FAILOVER reset. If this occurs while adapter is in PROBING state, ibmvnic_reset() changes the error code to EAGAIN and returns without scheduling the FAILOVER. The purpose of setting error code to EAGAIN is to ask the waiter to retry. But due to the earlier complete() call, the waiter may already have seen the -EIO response and decided not to retry. This can cause intermittent failures when bringing up ibmvnic adapters during boot, especially in kexec/kdump kernels. Defer the complete() call until after scheduling the reset. Also streamline the error code to EAGAIN. Don't see why we need EIO sometimes. All 3 callers of ibmvnic_reset_init() can handle EAGAIN. Fixes: 17c8705838a5 ("ibmvnic: Return error code if init interrupted by transport event") Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | ibmvnic: Process crqs after enabling interrupts | Sukadev Bhattiprolu
Soon after registering a CRQ it is possible that we get a failover or maybe a CRQ_INIT from the VIOS while interrupts were disabled. Look for any such CRQs after enabling interrupts. Otherwise we can intermittently fail to bring up ibmvnic adapters during boot, especially in kexec/kdump kernels. Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol") Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | ibmvnic: don't stop queue in xmit | Sukadev Bhattiprolu
If adapter's resetting bit is on, discard the packet but don't stop the transmit queue - instead leave that to the reset code. With this change, it is possible that we may get several calls to ibmvnic_xmit() that simply discard packets and return. But if we stop the queue here, we might end up doing so just after __ibmvnic_open() started the queues (during a hard/soft reset) and before the ->resetting bit was cleared. If that happens, there will be no one to restart queue and transmissions will be blocked indefinitely. This can cause a TIMEOUT reset and with auto priority failover enabled, an unnecessary FAILOVER reset to less favored backing device and then a FAILOVER back to the most favored backing device. If we hit the window repeatedly, we can get stuck in a loop of TIMEOUT, FAILOVER, FAILOVER resets leaving the adapter unusable for extended periods of time. Fixes: 7f5b030830fe ("ibmvnic: Free skb's in cases of failure in transmit") Reported-by: Abdul Haleem <abdhalee@in.ibm.com> Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | Merge branch 'SO_MARK-routing' | David S. Miller
Jakub Kicinski says: ==================== udp6: allow SO_MARK ctrl msg to affect routing Looks like SO_MARK from cmsg does not affect routing policy. This seems accidental. I opted for net because of the discrepancy between IPv4 and IPv6, but it never worked and doesn't cause crashes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | selftests: udp: test for passing SO_MARK as cmsg | Jakub Kicinski
Before fix:
 |  Case IPv6 rejection returned 0, expected 1
 |FAIL - 1/4 cases failed

With the fix:
 | OK

Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | udp6: allow SO_MARK ctrl msg to affect routing | Jakub Kicinski
Commit c6af0c227a22 ("ip: support SO_MARK cmsg") added propagation of SO_MARK from cmsg to skb->mark. For IPv4 and raw sockets the mark also affects route lookup, but in case of IPv6 the flow info is initialized before cmsg is parsed. Fixes: c6af0c227a22 ("ip: support SO_MARK cmsg") Reported-and-tested-by: Xintong Hu <huxintong@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 | nfp: flower: Allow ipv6gretap interface for offloading | Yu Xiao
The tunnel_type check only allows for "netif_is_gretap", but for OVS the port is actually "netif_is_ip6gretap" when setting up GRE for ipv6, which means the offloading request was rejected before. Therefore, add "netif_is_ip6gretap" to allow the ipv6gretap interface for offloading. Signed-off-by: Yu Xiao <yu.xiao@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01net: dsa: populate supported_interfaces memberMarek Behún
Add a new DSA switch operation, phylink_get_interfaces, which should fill in which PHY_INTERFACE_MODE_* are supported by a given port. Use this before phylink_create() to fill phylink's supported_interfaces member, allowing phylink to determine which PHY_INTERFACE_MODEs are supported. Signed-off-by: Marek Behún <kabel@kernel.org> [tweaked patch and description to add more complete support -- rmk] Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queueDavid S. Miller
Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2021-10-29 This series contains updates to ice and iavf drivers and virtchnl header file. Brett removes vlan_promisc argument from a function call for ice driver. In the virtchnl header file he removes an unused, reserved define and converts raw value defines to instead use the BIT macro. Marcin adds syncing of MAC addresses when creating switchdev VFs to remove error messages on link up and stops showing buffer information for port representors to remove duplicated entries being displayed for ice driver. Karen introduces a helper to go from pci_dev to iavf_adapter in the iavf driver. Przemyslaw fixes an issue where iavf was attempting to free IRQs before calling disable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
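As a small illustration of what converting raw value defines to the BIT macro means, the sketch below models the kernel's BIT(n) as a Python function and checks that the two spellings give the same values. The flag names are hypothetical and are not taken from the virtchnl header.

```python
def BIT(n: int) -> int:
    """Python model of a BIT(n) macro: a value with only bit n set."""
    return 1 << n


# Hypothetical flags written as raw hexadecimal values...
RAW_FLAGS = {
    "FLAG_A": 0x00000001,
    "FLAG_B": 0x00000002,
    "FLAG_C": 0x00000004,
    "FLAG_D": 0x00010000,
}

# ...and the same flags written with BIT(), which states the bit index directly.
BIT_FLAGS = {
    "FLAG_A": BIT(0),
    "FLAG_B": BIT(1),
    "FLAG_C": BIT(2),
    "FLAG_D": BIT(16),
}

# Both spellings denote identical values; only readability changes.
assert RAW_FLAGS == BIT_FLAGS
```

The values are unchanged by such a conversion. The benefit is that the bit position is visible at a glance and an accidental multi-bit constant is harder to write.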
2021-11-01Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-nextDavid S. Miller
Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2021-10-30 Just two minor changes this time: 1) Remove some superfluous header files from xfrm4_tunnel.c From Mianhan Liu. 2) Simplify some error checks in xfrm_input(). From luo penghao. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Use array_size() in ebtables, from Gustavo A. R. Silva. 2) Attach IPS_ASSURED to internal UDP stream state, reported by Maciej Zenczykowski. 3) Add NFT_META_IFTYPE to match on the interface type either from ingress or egress. 4) Generalize pktinfo->tprot_set to flags field. 5) Allow to match on inner headers / payload data. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01Merge tag 'mlx5-updates-2021-10-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxDavid S. Miller
Saeed Mahameed says: ==================== mlx5-updates-2021-10-29 1) Minor trivial refactoring and improvements 2) Check for unsupported parameters fields in SW steering 3) Support TC offload for OVS internal port, from Ariel, see below. Ariel Levkovich says: ===================== Support HW offload of TC rules involving OVS internal port device type as the filter device or the destination device. The support is for flows which explicitly use the internal port as source or destination device as well as indirect offload for flows performing tunnel set or unset via a tunnel device and the internal port is the tunnel overlay device. Since flows with internal port as source port are added as egress rules while redirecting to internal port is done as an ingress redirect, the series introduces the necessary changes in mlx5_core driver to support the new types of flows and actions. ===================== ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01parisc: Fix set_fixmap() on PA1.x CPUsHelge Deller
Fix a kernel crash which happens on PA1.x CPUs while initializing the FTRACE/KPROBE breakpoints. The PTE table entries for the fixmap area were not created correctly. Signed-off-by: Helge Deller <deller@gmx.de> Fixes: ccfbc68d41c2 ("parisc: add set_fixmap()/clear_fixmap()") Cc: stable@vger.kernel.org # v5.2+
2021-11-01parisc: Use swap() to swap values in setup_bootmem()Yihao Han
Signed-off-by: Yihao Han <hanyihao@vivo.com> Signed-off-by: Helge Deller <deller@gmx.de>
2021-11-01netfilter: nft_payload: support for inner header matching / manglingPablo Neira Ayuso
Allow matching and mangling of inner headers / payload data after the transport header. There is a new field in the pktinfo structure that stores the inner header offset, which is calculated only when requested. Only TCP and UDP are supported at this stage. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
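The idea of an inner header offset that is computed on demand can be sketched as follows. This is not the kernel implementation: `inner_offset` and `PktInfo` are hypothetical names, and the packet is modelled as a plain byte string with a known transport header offset. The inner offset is the first byte after the UDP header (always 8 bytes) or after the TCP header (whose length comes from its data offset field).

```python
from typing import Optional

IPPROTO_TCP = 6
IPPROTO_UDP = 17
UDP_HEADER_LEN = 8
TCP_MIN_HEADER_LEN = 20


def inner_offset(packet: bytes, thoff: int, l4proto: int) -> int:
    """Return the offset of the first byte after the transport header."""
    if l4proto == IPPROTO_UDP:
        if len(packet) < thoff + UDP_HEADER_LEN:
            raise ValueError("truncated UDP header")
        return thoff + UDP_HEADER_LEN
    if l4proto == IPPROTO_TCP:
        if len(packet) < thoff + TCP_MIN_HEADER_LEN:
            raise ValueError("truncated TCP header")
        # The upper 4 bits of byte 12 give the header length in 32-bit words.
        hdr_len = (packet[thoff + 12] >> 4) * 4
        if hdr_len < TCP_MIN_HEADER_LEN or len(packet) < thoff + hdr_len:
            raise ValueError("invalid TCP data offset")
        return thoff + hdr_len
    raise ValueError("only TCP and UDP are handled in this sketch")


class PktInfo:
    """Hypothetical per-packet metadata that caches the inner offset lazily."""

    def __init__(self, packet: bytes, thoff: int, l4proto: int):
        self.packet = packet
        self.thoff = thoff
        self.l4proto = l4proto
        self._inneroff: Optional[int] = None  # not computed until requested

    def inneroff(self) -> int:
        if self._inneroff is None:
            self._inneroff = inner_offset(self.packet, self.thoff, self.l4proto)
        return self._inneroff
```

Caching the result in the per-packet structure means that several expressions matching on inner data in the same rule evaluation would pay for the header parse only once, while packets that never hit such an expression pay nothing.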