summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-05-04net/mlx5e: Fix the calling of update_buffer_lossy() APIMark Zhang
The arguments of update_buffer_lossy() is in a wrong order. Fix it. Fixes: 88b3d5c90e96 ("net/mlx5e: Fix port buffers cell size value") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-04net/mlx5e: Don't match double-vlan packets if cvlan is not setVlad Buslov
Currently, match VLAN rule also matches packets that have multiple VLAN headers. This behavior is similar to buggy flower classifier behavior that has recently been fixed. Fix the issue by matching on outer_second_cvlan_tag with value 0 which will cause the HW to verify the packet doesn't contain second vlan header. Fixes: 699e96ddf47f ("net/mlx5e: Support offloading tc double vlan headers match") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-04net/mlx5: Fix slab-out-of-bounds while reading resource dump menuAya Levin
Resource dump menu may span over more than a single page, support it. Otherwise, menu read may result in a memory access violation: reading outside of the allocated page. Note that page format of the first menu page contains menu headers while the proceeding menu pages contain only records. The KASAN logs are as follows: BUG: KASAN: slab-out-of-bounds in strcmp+0x9b/0xb0 Read of size 1 at addr ffff88812b2e1fd0 by task systemd-udevd/496 CPU: 5 PID: 496 Comm: systemd-udevd Tainted: G B 5.16.0_for_upstream_debug_2022_01_10_23_12 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x57/0x7d print_address_description.constprop.0+0x1f/0x140 ? strcmp+0x9b/0xb0 ? strcmp+0x9b/0xb0 kasan_report.cold+0x83/0xdf ? strcmp+0x9b/0xb0 strcmp+0x9b/0xb0 mlx5_rsc_dump_init+0x4ab/0x780 [mlx5_core] ? mlx5_rsc_dump_destroy+0x80/0x80 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x286/0x400 ? raw_spin_unlock_irqrestore+0x47/0x50 ? aomic_notifier_chain_register+0x32/0x40 mlx5_load+0x104/0x2e0 [mlx5_core] mlx5_init_one+0x41b/0x610 [mlx5_core] .... The buggy address belongs to the object at ffff88812b2e0000 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 4048 bytes to the right of 4096-byte region [ffff88812b2e0000, ffff88812b2e1000) The buggy address belongs to the page: page:000000009d69807a refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88812b2e6000 pfn:0x12b2e0 head:000000009d69807a order:3 compound_mapcount:0 compound_pincount:0 flags: 0x8000000000010200(slab|head|zone=2) raw: 8000000000010200 0000000000000000 dead000000000001 ffff888100043040 raw: ffff88812b2e6000 0000000080040000 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88812b2e1e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88812b2e1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88812b2e1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff88812b2e2000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88812b2e2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fixes: 12206b17235a ("net/mlx5: Add support for resource dump") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-04net/mlx5e: Fix wrong source vport matching on tunnel ruleAriel Levkovich
When OVS internal port is the vtep device, the first decap rule is matching on the internal port's vport metadata value and then changes the metadata to be the uplink's value. Therefore, following rules on the tunnel, in chain > 0, should avoid matching on internal port metadata and use the uplink vport metadata instead. Select the uplink's metadata value for the source vport match in case the rule is in chain greater than zero, even if the tunnel route device is internal port. Fixes: 166f431ec6be ("net/mlx5e: Add indirect tc offload of ovs internal port") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Allow future addition of IPsec object modifiersLeon Romanovsky
Currently, all released FW versions support only two IPsec object modifiers, and modify_field_select get and set same value with proper bits. However, it is not future compatible, as new FW can have more modifiers and "default" will cause to overwrite not-changed fields. Fix it by setting explicitly fields that need to be overwritten. Fixes: 7ed92f97a1ad ("net/mlx5e: IPsec: Add Connect-X IPsec ESN update offload support") Signed-off-by: Huy Nguyen <huyn@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Don't perform lookup after already known sec_pathLeon Romanovsky
There is no need to perform extra lookup in order to get already known sec_path that was set a couple of lines above. Simply reuse it. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Cleanup XFRM attributes structLeon Romanovsky
Remove everything that is not used or from mlx5_accel_esp_xfrm_attrs, together with change type of spi to store proper type from the beginning. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Remove not-supported ICV lengthLeon Romanovsky
mlx5 doesn't allow to configure any AEAD ICV length other than 128, so remove the logic that configures other unsupported values. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Simplify IPsec capabilities logicLeon Romanovsky
Reduce number of hard-coded IPsec capabilities by making sure that mlx5_ipsec_device_caps() sets only supported bits. As part of this change, remove _ACCEL_ notations from the capabilities names as they represent IPsec-capable device, so it is aligned with MLX5_CAP_IPSEC() macro. And prepare the code to IPsec full offload mode. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Don't advertise IPsec netdev support for non-IPsec deviceLeon Romanovsky
Device that lacks proper IPsec capabilities won't pass mlx5e_ipsec_init() later, so no need to advertise HW netdev offload support for something that isn't going to work anyway. Fixes: 8ad893e516a7 ("net/mlx5e: Remove dependency in IPsec initialization flows") Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Make sure that no dangling IPsec FS pointers existLeon Romanovsky
The IPsec FS code was implemented with anti-pattern there failures in create functions left the system with dangling pointers that were cleaned in global routines. The less error prone approach is to make sure that failed function cleans everything internally. As part of this change, we remove the batch of one liners and rewrite get/put functions to remove ambiguity. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Clean IPsec FS add/delete rulesLeon Romanovsky
Reuse existing struct to pass parameters instead of open code them. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Simplify HW context interfaces by using SA entryLeon Romanovsky
SA context logic used multiple structures to store same data over and over. By simplifying the SA context interfaces, we can remove extra structs. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Remove indirections from esp functionsLeon Romanovsky
This change cleanups the mlx5 esp interface. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Merge various control path IPsec headers into one fileLeon Romanovsky
The mlx5 IPsec code has logical separation between code that operates with XFRM objects (ipsec.c), HW objects (ipsec_offload.c), flow steering logic (ipsec_fs.c) and data path (ipsec_rxtx.c). Such separation makes sense for C-files, but isn't needed at all for H-files as they are included in batch anyway. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Remove useless validity checkLeon Romanovsky
All callers build xfrm attributes with help of mlx5e_ipsec_build_accel_xfrm_attrs() function that ensure validity of attributes. There is no need to recheck them again. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Store IPsec ESN update work in XFRM stateLeon Romanovsky
mlx5 IPsec code updated ESN through workqueue with allocation calls in the data path, which can be saved easily if the work is created during XFRM state initialization routine. The locking used later in the work didn't protect from anything because change of HW context is possible during XFRM state add or delete only, which can cancel work and make sure that it is not running. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Reduce useless indirection in IPsec FS add/delete flowsLeon Romanovsky
There is no need in one-liners wrappers to call internal functions. Let's remove them. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Don't hide fallback to software IPsec in FS codeLeon Romanovsky
The XFRM code performs fallback to software IPsec if .xdo_dev_state_add() returns -EOPNOTSUPP. This is what mlx5 did very deep in its stack trace, despite have all the knowledge that IPsec is not going to work in very early stage. This is achieved by making sure that priv->ipsec pointer is valid for fully working and supported hardware crypto IPsec engine. In case, the hardware IPsec is not supported, the XFRM code will set NULL to xso->dev and it will prevent from calls to various .xdo_dev_state_*() callbacks. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Check IPsec TX flow steering namespace in advanceLeon Romanovsky
Ensure that flow steering is usable as early as possible, to understand if crypto IPsec is supported or not. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Simplify IPsec flow steering init/cleanup functionsLeon Romanovsky
Remove multiple function wrappers to make sure that IPsec FS initialization and cleanup functions present in one place to help with code readability. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03Merge branch 'bnxt_en-bug-fixes'Jakub Kicinski
Michael Chan says: ==================== bnxt_en: Bug fixes This patch series includes 3 fixes: - Fix an occasional VF open failure. - Fix a PTP spinlock usage before initialization - Fix unnecesary RX packet drops under high TX traffic load. ==================== Link: https://lore.kernel.org/r/1651540392-2260-1-git-send-email-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03bnxt_en: Fix unnecessary dropping of RX packetsMichael Chan
In bnxt_poll_p5(), we first check cpr->has_more_work. If it is true, we are in NAPI polling mode and we will call __bnxt_poll_cqs() to continue polling. It is possible to exhanust the budget again when __bnxt_poll_cqs() returns. We then enter the main while loop to check for new entries in the NQ. If we had previously exhausted the NAPI budget, we may call __bnxt_poll_work() to process an RX entry with zero budget. This will cause packets to be dropped unnecessarily, thinking that we are in the netpoll path. Fix it by breaking out of the while loop if we need to process an RX NQ entry with no budget left. We will then exit NAPI and stay in polling mode. Fixes: 389a877a3b20 ("bnxt_en: Process the NQ under NAPI continuous polling.") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03bnxt_en: Initiallize bp->ptp_lock first before using itMichael Chan
bnxt_ptp_init() calls bnxt_ptp_init_rtc() which will acquire the ptp_lock spinlock. The spinlock is not initialized until later. Move the bnxt_ptp_init_rtc() call after the spinlock is initialized. Fixes: 24ac1ecd5240 ("bnxt_en: Add driver support to use Real Time Counter for PTP") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03bnxt_en: Fix possible bnxt_open() failure caused by wrong RFS flagSomnath Kotur
bnxt_open() can fail in this code path, especially on a VF when it fails to reserve default rings: bnxt_open() __bnxt_open_nic() bnxt_clear_int_mode() bnxt_init_dflt_ring_mode() RX rings would be set to 0 when we hit this error path. It is possible for a subsequent bnxt_open() call to potentially succeed with a code path like this: bnxt_open() bnxt_hwrm_if_change() bnxt_fw_init_one() bnxt_fw_init_one_p3() bnxt_set_dflt_rfs() bnxt_rfs_capable() bnxt_hwrm_reserve_rings() On older chips, RFS is capable if we can reserve the number of vnics that is equal to RX rings + 1. But since RX rings is still set to 0 in this code path, we may mistakenly think that RFS is supported for 0 RX rings. Later, when the default RX rings are reserved and we try to enable RFS, it would fail and cause bnxt_open() to fail unnecessarily. We fix this in 2 places. bnxt_rfs_capable() will always return false if RX rings is not yet set. bnxt_init_dflt_ring_mode() will call bnxt_set_dflt_rfs() which will always clear the RFS flags if RFS is not supported. Fixes: 20d7d1c5c9b1 ("bnxt_en: reliably allocate IRQ table on reset to avoid crash") Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03Merge tag 'wireless-next-2022-05-03' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v5.19 First set of patches for v5.19 and this is a big one. We have two new drivers, a change in mac80211 STA API affecting most drivers and ath11k getting support for WCN6750. And as usual lots of fixes and cleanups all over. Major changes: new drivers - wfx: silicon labs devices - plfxlc: pureLiFi X, XL, XC devices mac80211 - host based BSS color collision detection - prepare sta handling for IEEE 802.11be Multi-Link Operation (MLO) support rtw88 - support TP-Link T2E devices rtw89 - support firmware crash simulation - preparation for 8852ce hardware support ath11k - Wake-on-WLAN support for QCA6390 and WCN6855 - device recovery (firmware restart) support for QCA6390 and WCN6855 - support setting Specific Absorption Rate (SAR) for WCN6855 - read country code from SMBIOS for WCN6855/QCA6390 - support for WCN6750 wcn36xx - support for transmit rate reporting to user space * tag 'wireless-next-2022-05-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (228 commits) rtw89: 8852c: rfk: add DPK rtw89: 8852c: rfk: add IQK rtw89: 8852c: rfk: add RX DCK rtw89: 8852c: rfk: add RCK rtw89: 8852c: rfk: add TSSI rtw89: 8852c: rfk: add LCK rtw89: 8852c: rfk: add DACK rtw89: 8852c: rfk: add RFK tables plfxlc: fix le16_to_cpu warning for beacon_interval rtw88: remove a copy of the NAPI_POLL_WEIGHT define carl9170: tx: fix an incorrect use of list iterator wil6210: use NAPI_POLL_WEIGHT for napi budget ath10k: remove a copy of the NAPI_POLL_WEIGHT define ath11k: Add support for WCN6750 device ath11k: Datapath changes to support WCN6750 ath11k: HAL changes to support WCN6750 ath11k: Add QMI changes for WCN6750 ath11k: Fetch device information via QMI for WCN6750 ath11k: Add register access logic for WCN6750 ath11k: Add HW params for WCN6750 ... ==================== Link: https://lore.kernel.org/r/20220503153622.C1671C385A4@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03netdev: reshuffle netif_napi_add() APIs to allow dropping weightJakub Kicinski
Most drivers should not have to worry about selecting the right weight for their NAPI instances and pass NAPI_POLL_WEIGHT. It'd be best if we didn't require the argument at all and selected the default internally. This change prepares the ground for such reshuffling, allowing for a smooth transition. The following API should remain after the next release cycle: netif_napi_add() netif_napi_add_weight() netif_napi_add_tx() netif_napi_add_tx_weight() Where the _weight() variants take an explicit weight argument. I opted for a _weight() suffix rather than a __ prefix, because we use __ in places to mean that caller needs to also issue a synchronize_net() call. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20220502232703.396351-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03selftests: forwarding: add basic QoS classification test for Ocelot switchesVladimir Oltean
Test basic (port-default, VLAN PCP and IP DSCP) QoS classification for Ocelot switches. Advanced QoS classification using tc filters is covered by tc_flower_chains.sh in the same directory. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220502155424.4098917-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03smsc911x: allow using IRQ0Sergey Shtylyov
The AlphaProject AP-SH4A-3A/AP-SH4AD-0A SH boards use IRQ0 for their SMSC LAN911x Ethernet chip, so the networking on them must have been broken by commit 965b2aa78fbc ("net/smsc911x: fix irq resource allocation failure") which filtered out 0 as well as the negative error codes -- it was kinda correct at the time, as platform_get_irq() could return 0 on of_irq_get() failure and on the actual 0 in an IRQ resource. This issue was fixed by me (back in 2016!), so we should be able to fix this driver to allow IRQ0 usage again... When merging this to the stable kernels, make sure you also merge commit e330b9a6bb35 ("platform: don't return 0 from platform_get_irq[_byname]() on error") -- that's my fix to platform_get_irq() for the DT platforms... Fixes: 965b2aa78fbc ("net/smsc911x: fix irq resource allocation failure") Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/656036e4-6387-38df-b8a7-6ba683b16e63@omp.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03Merge branch 'mptcp-userspace-path-manager-prerequisites'Jakub Kicinski
Mat Martineau says: ==================== mptcp: Userspace path manager prerequisites This series builds upon the path manager mode selection changes merged in 4994d4fa99ba ("Merge branch 'mptcp-path-manager-mode-selection'") to further modify the path manager code in preparation for adding the new netlink commands to announce/remove advertised addresses and create/destroy subflows of an MPTCP connection. The third and final patch series for the userspace path manager will implement those commands as discussed in https://lore.kernel.org/netdev/23ff3b49-2563-1874-fa35-3af55d3088e7@linux.intel.com/#r Patches 1, 5, and 7 remove some internal constraints on path managers (in general) without changing in-kernel PM behavior. Patch 2 adds a self test to validate MPTCP address advertisement ack behavior. Patches 3, 4, and 6 add new attributes to existing MPTCP netlink events and track internal state for populating those attributes. ==================== Link: https://lore.kernel.org/r/20220502205237.129297-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03mptcp: allow ADD_ADDR reissuance by userspace PMsKishen Maloor
This change allows userspace PM implementations to reissue ADD_ADDR announcements (if necessary) based on their chosen policy. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03mptcp: expose server_side attribute in MPTCP netlink eventsKishen Maloor
This change records the 'server_side' attribute of MPTCP_EVENT_CREATED and MPTCP_EVENT_ESTABLISHED events to inform their recipient about the Client/Server role of the running MPTCP application. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/246 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03mptcp: establish subflows from either end of connectionKishen Maloor
This change updates internal logic to permit subflows to be established from either the client or server ends of MPTCP connections. This symmetry and added flexibility may be harnessed by PM implementations running on either end in creating new subflows. The essence of this change lies in not relying on the "server_side" flag (which continues to be available if needed). Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03mptcp: reflect remote port (not 0) in ANNOUNCED eventsKishen Maloor
Per RFC 8684, if no port is specified in an ADD_ADDR message, MPTCP SHOULD attempt to connect to the specified address on the same port as the port that is already in use by the subflow on which the ADD_ADDR signal was sent. To facilitate that, this change reflects the specific remote port in use by that subflow in MPTCP_EVENT_ANNOUNCED events. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03mptcp: store remote id from MP_JOIN SYN/ACK in local ctxKishen Maloor
This change reads the addr id assigned to the remote endpoint of a subflow from the MP_JOIN SYN/ACK message and stores it in the related subflow context. The remote id was not being captured prior to this change, and will now provide a consistent view of remote endpoints and their ids as seen through netlink events. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03selftests: mptcp: ADD_ADDR echo test with missing userspace daemonMat Martineau
Check userspace PM behavior to ensure ADD_ADDR echoes are only sent when there is an active userspace daemon. If the daemon is restarting or hasn't loaded yet, the missing echo will cause the peer to retransmit the ADD_ADDR - and hopefully the daemon will be ready to receive it at that later time. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03mptcp: bypass in-kernel PM restrictions for non-kernel PMsKishen Maloor
Current limits on the # of addresses/subflows must apply only to in-kernel PM managed sockets. Thus this change removes such restrictions on connections overseen by non-kernel (e.g. userspace) PMs. This change also ensures that the kernel does not record stats inside struct mptcp_pm_data updated along kernel code paths when exercised via non-kernel PMs. Additionally, address announcements are acknolwedged and subflow requests are honored only when it's deemed that a userspace path manager is active at the time. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03net: sfp: Add tx-fault workaround for Huawei MA5671A SFP ONTMatthew Hagan
As noted elsewhere, various GPON SFP modules exhibit non-standard TX-fault behaviour. In the tested case, the Huawei MA5671A, when used in combination with a Marvell mv88e6085 switch, was found to persistently assert TX-fault, resulting in the module being disabled. This patch adds a quirk to ignore the SFP_F_TX_FAULT state, allowing the module to function. Change from v1: removal of erroneous return statment (Andrew Lunn) Signed-off-by: Matthew Hagan <mnhagan88@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220502223315.1973376-1-mnhagan88@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03Merge tag 'seccomp-v5.18-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp selftest fix from Kees Cook: - Avoid using stdin for read syscall testing (Jann Horn) * tag 'seccomp-v5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: selftests/seccomp: Don't call read() on TTY from background pgrp
2022-05-03Merge tag 'hwmon-for-v5.18-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Work around a hardware problem in the delta-ahe50dc-fan driver - Explicitly disable PEC in PMBus core if not enabled - Fix negative temperature values in f71882fg driver - Fix warning on removal of adt7470 driver - Fix CROSSHAIR VI HERO name in asus_wmi_sensors driver - Fix build warning seen in xdpe12284 driver if CONFIG_SENSORS_XDPE122_REGULATOR is disabled - Fix type of 'ti,n-factor' in ti,tmp421 driver bindings * tag 'hwmon-for-v5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (pmbus) delta-ahe50dc-fan: work around hardware quirk hwmon: (pmbus) disable PEC if not enabled hwmon: (f71882fg) Fix negative temperature dt-bindings: hwmon: ti,tmp421: Fix type for 'ti,n-factor' hwmon: (adt7470) Fix warning on module removal hwmon: (asus_wmi_sensors) Fix CROSSHAIR VI HERO name hwmon: (xdpe12284) Fix build warning seen if CONFIG_SENSORS_XDPE122_REGULATOR is disabled
2022-05-03net: rds: acquire refcount on TCP socketsTetsuo Handa
syzbot is reporting use-after-free read in tcp_retransmit_timer() [1], for TCP socket used by RDS is accessing sock_net() without acquiring a refcount on net namespace. Since TCP's retransmission can happen after a process which created net namespace terminated, we need to explicitly acquire a refcount. Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1] Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com> Fixes: 26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel sockets.") Fixes: 8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com> Link: https://lore.kernel.org/r/a5fb1fc4-2284-3359-f6a0-e4e390239d7b@I-love.SAKURA.ne.jp Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-03selftests/net: so_txtime: usage(): fix documentation of default clockMarc Kleine-Budde
The program uses CLOCK_TAI as default clock since it was added to the Linux repo. In commit: | 040806343bb4 ("selftests/net: so_txtime multi-host support") a help text stating the wrong default clock was added. This patch fixes the help text. Fixes: 040806343bb4 ("selftests/net: so_txtime multi-host support") Cc: Carlos Llamas <cmllamas@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20220502094638.1921702-3-mkl@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-03selftests/net: so_txtime: fix parsing of start time stamp on 32 bit systemsMarc Kleine-Budde
This patch fixes the parsing of the cmd line supplied start time on 32 bit systems. A "long" on 32 bit systems is only 32 bit wide and cannot hold a timestamp in nano second resolution. Fixes: 040806343bb4 ("selftests/net: so_txtime multi-host support") Cc: Carlos Llamas <cmllamas@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20220502094638.1921702-2-mkl@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-03Merge tag 'mlx5-updates-2022-05-02' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2022-05-02 1) Trivial Misc updates to mlx5 driver 2) From Mark Bloch: Flow steering, general steering refactoring/cleaning An issue with flow steering deletion flow (when creating a rule without dests) turned out to be easy to fix but during the fix some issue with the flow steering creation/deletion flows have been found. The following patch series tries to fix long standing issues with flow steering code and hopefully preventing silly future bugs. A) Fix an issue where a proper dest type wasn't assigned. B) Refactor and fix dests enums values, refactor deletion function and do proper bookkeeping of dests. C) Change mlx5_del_flow_rules() to delete rules when there are no no more rules attached associated with an FTE. D) Don't call hard coded deletion function but use the node's defined one. E) Add a WARN_ON() to catch future bugs when an FTE with dests is deleted. * tag 'mlx5-updates-2022-05-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: fs, an FTE should have no dests when deleted net/mlx5: fs, call the deletion function of the node net/mlx5: fs, delete the FTE when there are no rules attached to it net/mlx5: fs, do proper bookkeeping for forward destinations net/mlx5: fs, add unused destination type net/mlx5: fs, jump to exit point and don't fall through net/mlx5: fs, refactor software deletion rule net/mlx5: fs, split software and IFC flow destination definitions net/mlx5e: TC, set proper dest type net/mlx5e: Remove unused mlx5e_dcbnl_build_rep_netdev function net/mlx5e: Drop error CQE handling from the XSK RX handler net/mlx5: Print initializing field in case of timeout net/mlx5: Delete redundant default assignment of runtime devlink params net/mlx5: Remove useless kfree net/mlx5: use kvfree() for kvzalloc() in mlx5_ct_fs_smfs_matcher_create ==================== Link: https://lore.kernel.org/r/ Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-03Merge branch 'mlxsw-remove-size-limitations-on-egress-descriptor-buffer'Paolo Abeni
Ido Schimmel says: ==================== mlxsw: Remove size limitations on egress descriptor buffer Petr says: Spectrum machines have two resources related to keeping packets in an internal buffer: bytes (allocated in cell-sized units) for packet payload, and descriptors, for keeping headers. Currently, mlxsw only configures the bytes part of the resource management. Spectrum switches permit a full parallel configuration for the descriptor resources, including port-pool and port-TC-pool quotas. By default, these are all configured to use pool 14, with an infinite quota. The ingress pool 14 is then infinite in size. However, egress pool 14 has finite size by default. The size is chip dependent, but always much lower than what the chip actually permits. As a result, we can easily construct workloads that exhaust the configured descriptor limit. Going forward, mlxsw will have to fix this issue properly by maintaining descriptor buffer sizes, TC bindings, and quotas that match the architecture recommendation. Short term, fix the issue by configuring the egress descriptor pool to be infinite in size as well. This will maintain the same configuration philosophy, but will unlock all chip resources to be usable. In this patchset, patch #1 first adds the "desc" field into the pool configuration register. Then in patch #2, the new field is used to configure both ingress and egress pool 14 as infinite. In patches #3 and #4, add a selftest that verifies that a large burst can be absorbed by the shared buffer. This test specifically exercises a scenario where descriptor buffer is the limiting factor and the test fails without the above patches. ==================== Link: https://lore.kernel.org/r/20220502084926.365268-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-03selftests: mlxsw: Add a test for soaking up a burst of trafficPetr Machata
Add a test that sends 1Gbps of traffic through the switch, into which it then injects a burst of traffic and tests that there are no drops. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-03selftests: forwarding: lib: Add start_traffic_pktsize() helpersPetr Machata
Add two helpers, start_traffic_pktsize() and start_tcp_traffic_pktsize(), that allow explicit overriding of packet size. Change start_traffic() and start_tcp_traffic() to dispatch through these helpers with the default packet size. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-03mlxsw: Configure descriptor buffersPetr Machata
Spectrum machines have two resources related to keeping packets in an internal buffer: bytes (allocated in cell-sized units) for packet payload, and descriptors, for keeping metadata. Currently, mlxsw only configures the bytes part of the resource management. Spectrum switches permit a full parallel configuration for the descriptor resources, including port-pool and port-TC-pool quotas. By default, these are all configured to use pool 14, with an infinite quota. The ingress pool 14 is then infinite in size. However, egress pool 14 has finite size by default. The size is chip dependent, but always much lower than what the chip actually permits. As a result, we can easily construct workloads that exhaust the configured descriptor limit. Fix the issue by configuring the egress descriptor pool to be infinite in size as well. This will maintain the configuration philosophy of the default configuration, but will unlock all chip resources to be usable. In the code, include both the configuration of ingress and ingress, mostly for clarity. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-03mlxsw: reg: Add "desc" field to SBPRPetr Machata
SBPR, or Shared Buffer Pools Register, configures and retrieves the shared buffer pools and configuration. The desc field determines whether the configuration relates to the byte pool or the descriptor pool. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-03selftests: mirror_gre_bridge_1q: Avoid changing PVID while interface is ↵Ido Schimmel
operational In emulated environments, the bridge ports enslaved to br1 get a carrier before changing br1's PVID. This means that by the time the PVID is changed, br1 is already operational and configured with an IPv6 link-local address. When the test is run with netdevs registered by mlxsw, changing the PVID is vetoed, as changing the VID associated with an existing L3 interface is forbidden. This restriction is similar to the 8021q driver's restriction of changing the VID of an existing interface. Fix this by taking br1 down and bringing it back up when it is fully configured. With this fix, the test reliably passes on top of both the SW and HW data paths (emulated or not). Fixes: 239e754af854 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/20220502084507.364774-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>