summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-11-22net: sched: allow act_ct to be built without NF_NATXin Long
In commit f11fe1dae1c4 ("net/sched: Make NET_ACT_CT depends on NF_NAT"), it fixed the build failure when NF_NAT is m and NET_ACT_CT is y by adding depends on NF_NAT for NET_ACT_CT. However, it would also cause NET_ACT_CT cannot be built without NF_NAT, which is not expected. This patch fixes it by changing to use "(!NF_NAT || NF_NAT)" as the depend. Fixes: f11fe1dae1c4 ("net/sched: Make NET_ACT_CT depends on NF_NAT") Signed-off-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/b6386f28d1ba34721795fb776a91cbdabb203447.1668807183.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-22net: sparx5: fix error handling in sparx5_port_open()Liu Jian
If phylink_of_phy_connect() fails, the port should be disabled. If sparx5_serdes_set()/phy_power_on() fails, the port should be disabled and the phylink should be stopped and disconnected. Fixes: 946e7fd5053a ("net: sparx5: add port module support") Fixes: f3cad2611a77 ("net: sparx5: add hostmode with phylink support") Signed-off-by: Liu Jian <liujian56@huawei.com> Tested-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Reviewed-by: Steen Hegelund <steen.hegelund@microchip.com> Link: https://lore.kernel.org/r/20221117125918.203997-1-liujian56@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-22sfc: fix potential memleak in __ef100_hard_start_xmit()Zhang Changzhong
The __ef100_hard_start_xmit() returns NETDEV_TX_OK without freeing skb in error handling case, add dev_kfree_skb_any() to fix it. Fixes: 51b35a454efd ("sfc: skeleton EF100 PF driver") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/1668671409-10909-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-22net: wwan: iosm: use ACPI_FREE() but not kfree() in ipc_pcie_read_bios_cfg()Wang ShaoBo
acpi_evaluate_dsm() should be coupled with ACPI_FREE() to free the ACPI memory, because we need to track the allocation of acpi_object when ACPI_DBG_TRACK_ALLOCATIONS enabled, so use ACPI_FREE() instead of kfree(). Fixes: d38a648d2d6c ("net: wwan: iosm: fix memory leak in ipc_pcie_read_bios_cfg") Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Link: https://lore.kernel.org/r/20221118062447.2324881-1-bobo.shaobowang@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-22xfrm: Fix ignored return value in xfrm6_init()Chen Zhongjin
When IPv6 module initializing in xfrm6_init(), register_pernet_subsys() is possible to fail but its return value is ignored. If IPv6 initialization fails later and xfrm6_fini() is called, removing uninitialized list in xfrm6_net_ops will cause null-ptr-deref: KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 1 PID: 330 Comm: insmod RIP: 0010:unregister_pernet_operations+0xc9/0x450 Call Trace: <TASK> unregister_pernet_subsys+0x31/0x3e xfrm6_fini+0x16/0x30 [ipv6] ip6_route_init+0xcd/0x128 [ipv6] inet6_init+0x29c/0x602 [ipv6] ... Fix it by catching the error return value of register_pernet_subsys(). Fixes: 8d068875caca ("xfrm: make gc_thresh configurable in all namespaces") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-11-22xfrm: Fix oops in __xfrm_state_delete()Thomas Jarosch
Kernel 5.14 added a new "byseq" index to speed up xfrm_state lookups by sequence number in commit fe9f1d8779cb ("xfrm: add state hashtable keyed by seq") While the patch was thorough, the function pfkey_send_new_mapping() in net/af_key.c also modifies x->km.seq and never added the current xfrm_state to the "byseq" index. This leads to the following kernel Ooops: BUG: kernel NULL pointer dereference, address: 0000000000000000 .. RIP: 0010:__xfrm_state_delete+0xc9/0x1c0 .. Call Trace: <TASK> xfrm_state_delete+0x1e/0x40 xfrm_del_sa+0xb0/0x110 [xfrm_user] xfrm_user_rcv_msg+0x12d/0x270 [xfrm_user] ? remove_entity_load_avg+0x8a/0xa0 ? copy_to_user_state_extra+0x580/0x580 [xfrm_user] netlink_rcv_skb+0x51/0x100 xfrm_netlink_rcv+0x30/0x50 [xfrm_user] netlink_unicast+0x1a6/0x270 netlink_sendmsg+0x22a/0x480 __sys_sendto+0x1a6/0x1c0 ? __audit_syscall_entry+0xd8/0x130 ? __audit_syscall_exit+0x249/0x2b0 __x64_sys_sendto+0x23/0x30 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x61/0xcb Exact location of the crash in __xfrm_state_delete(): if (x->km.seq) hlist_del_rcu(&x->byseq); The hlist_node "byseq" was never populated. The bug only triggers if a new NAT traversal mapping (changed IP or port) is detected in esp_input_done2() / esp6_input_done2(), which in turn indirectly calls pfkey_send_new_mapping() *if* the kernel is compiled with CONFIG_NET_KEY and "af_key" is active. The PF_KEYv2 message SADB_X_NAT_T_NEW_MAPPING is not part of RFC 2367. Various implementations have been examined how they handle the "sadb_msg_seq" header field: - racoon (Android): does not process SADB_X_NAT_T_NEW_MAPPING - strongswan: does not care about sadb_msg_seq - openswan: does not care about sadb_msg_seq There is no standard how PF_KEYv2 sadb_msg_seq should be populated for SADB_X_NAT_T_NEW_MAPPING and it's not used in popular implementations either. Herbert Xu suggested we should just use the current km.seq value as is. This fixes the root cause of the oops since we no longer modify km.seq itself. The update of "km.seq" looks like a copy'n'paste error from pfkey_send_acquire(). SADB_ACQUIRE must indeed assign a unique km.seq number according to RFC 2367. It has been verified that code paths involving pfkey_send_acquire() don't cause the same Oops. PF_KEYv2 SADB_X_NAT_T_NEW_MAPPING support was originally added here: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git commit cbc3488685b20e7b2a98ad387a1a816aada569d8 Author: Derek Atkins <derek@ihtfp.com> AuthorDate: Wed Apr 2 13:21:02 2003 -0800 [IPSEC]: Implement UDP Encapsulation framework. In particular, implement ESPinUDP encapsulation for IPsec Nat Traversal. A note on triggering the bug: I was not able to trigger it using VMs. There is one VPN using a high latency link on our production VPN server that triggered it like once a day though. Link: https://github.com/strongswan/strongswan/issues/992 Link: https://lore.kernel.org/netdev/00959f33ee52c4b3b0084d42c430418e502db554.1652340703.git.antony.antony@secunet.com/T/ Link: https://lore.kernel.org/netdev/20221027142455.3975224-1-chenzhihao@meizu.com/T/ Fixes: fe9f1d8779cb ("xfrm: add state hashtable keyed by seq") Reported-by: Roth Mark <rothm@mail.com> Reported-by: Zhihao Chen <chenzhihao@meizu.com> Tested-by: Roth Mark <rothm@mail.com> Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Acked-by: Antony Antony <antony.antony@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-11-21ice: fix handling of burst Tx timestampsJacob Keller
Commit 1229b33973c7 ("ice: Add low latency Tx timestamp read") refactored PTP timestamping logic to use a threaded IRQ instead of a separate kthread. This implementation introduced ice_misc_intr_thread_fn and redefined the ice_ptp_process_ts function interface to return a value of whether or not the timestamp processing was complete. ice_misc_intr_thread_fn would take the return value from ice_ptp_process_ts and convert it into either IRQ_HANDLED if there were no more timestamps to be processed, or IRQ_WAKE_THREAD if the thread should continue processing. This is not correct, as the kernel does not re-schedule threaded IRQ functions automatically. IRQ_WAKE_THREAD can only be used by the main IRQ function. This results in the ice_ptp_process_ts function (and in turn the ice_ptp_tx_tstamp function) from only being called exactly once per interrupt. If an application sends a burst of Tx timestamps without waiting for a response, the interrupt will trigger for the first timestamp. However, later timestamps may not have arrived yet. This can result in dropped or discarded timestamps. Worse, on E822 hardware this results in the interrupt logic getting stuck such that no future interrupts will be triggered. The result is complete loss of Tx timestamp functionality. Fix this by modifying the ice_misc_intr_thread_fn to perform its own polling of the ice_ptp_process_ts function. We sleep for a few microseconds between attempts to avoid wasting significant CPU time. The value was chosen to allow time for the Tx timestamps to complete without wasting so much time that we overrun application wait budgets in the worst case. The ice_ptp_process_ts function also currently returns false in the event that the Tx tracker is not initialized. This would result in the threaded IRQ handler never exiting if it gets started while the tracker is not initialized. Fix the function to appropriately return true when the tracker is not initialized. Note that this will not reproduce with default ptp4l behavior, as the program always synchronously waits for a timestamp response before sending another timestamp request. Reported-by: Siddaraju DH <siddaraju.dh@intel.com> Fixes: 1229b33973c7 ("ice: Add low latency Tx timestamp read") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20221118222729.1565317-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-21tipc: check skb_linearize() return value in tipc_disc_rcv()YueHaibing
If skb_linearize() fails in tipc_disc_rcv(), we need to free the skb instead of handle it. Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Link: https://lore.kernel.org/r/20221119072832.7896-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-21Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-11-18 (iavf) Ivan Vecera resolves issues related to reset by adding back call to netif_tx_stop_all_queues() and adding calls to dev_close() to ensure device is properly closed during reset. Stefan Assmann removes waiting for setting of MAC address as this breaks ARP. Slawomir adds setting of __IAVF_IN_REMOVE_TASK bit to prevent deadlock between remove and shutdown. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: iavf: Fix race condition between iavf_shutdown and iavf_remove iavf: remove INITIAL_MAC_SET to allow gARP to work properly iavf: Do not restart Tx queues after reset task failure iavf: Fix a crash during reset task ==================== Link: https://lore.kernel.org/r/20221118222439.1565245-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-21Merge branch 'tipc-fix-two-race-issues-in-tipc_conn_alloc'Jakub Kicinski
Xin Long says: ==================== tipc: fix two race issues in tipc_conn_alloc The race exists beteen tipc_topsrv_accept() and tipc_conn_close(), one is allocating the con while the other is freeing it and there is no proper lock protecting it. Therefore, a null-pointer-defer and a use-after-free may be triggered, see details on each patch. ==================== Link: https://lore.kernel.org/r/cover.1668807842.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-21tipc: add an extra conn_get in tipc_conn_allocXin Long
One extra conn_get() is needed in tipc_conn_alloc(), as after tipc_conn_alloc() is called, tipc_conn_close() may free this con before deferencing it in tipc_topsrv_accept(): tipc_conn_alloc(); newsk = newsock->sk; <---- tipc_conn_close(); write_lock_bh(&sk->sk_callback_lock); newsk->sk_data_ready = tipc_conn_data_ready; Then an uaf issue can be triggered: BUG: KASAN: use-after-free in tipc_topsrv_accept+0x1e7/0x370 [tipc] Call Trace: <TASK> dump_stack_lvl+0x33/0x46 print_report+0x178/0x4b0 kasan_report+0x8c/0x100 kasan_check_range+0x179/0x1e0 tipc_topsrv_accept+0x1e7/0x370 [tipc] process_one_work+0x6a3/0x1030 worker_thread+0x8a/0xdf0 This patch fixes it by holding it in tipc_conn_alloc(), then after all accessing in tipc_topsrv_accept() releasing it. Note when does this in tipc_topsrv_kern_subscr(), as tipc_conn_rcv_sub() returns 0 or -1 only, we don't need to check for "> 0". Fixes: c5fa7b3cf3cb ("tipc: introduce new TIPC server infrastructure") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-21tipc: set con sock in tipc_conn_allocXin Long
A crash was reported by Wei Chen: BUG: kernel NULL pointer dereference, address: 0000000000000018 RIP: 0010:tipc_conn_close+0x12/0x100 Call Trace: tipc_topsrv_exit_net+0x139/0x320 ops_exit_list.isra.9+0x49/0x80 cleanup_net+0x31a/0x540 process_one_work+0x3fa/0x9f0 worker_thread+0x42/0x5c0 It was caused by !con->sock in tipc_conn_close(). In tipc_topsrv_accept(), con is allocated in conn_idr then its sock is set: con = tipc_conn_alloc(); ... <----[1] con->sock = newsock; If tipc_conn_close() is called in anytime of [1], the null-pointer-def is triggered by con->sock->sk due to con->sock is not yet set. This patch fixes it by moving the con->sock setting to tipc_conn_alloc() under s->idr_lock. So that con->sock can never be NULL when getting the con from s->conn_idr. It will be also safer to move con->server and flag CF_CONNECTED setting under s->idr_lock, as they should all be set before tipc_conn_alloc() is called. Fixes: c5fa7b3cf3cb ("tipc: introduce new TIPC server infrastructure") Reported-by: Wei Chen <harperchen1110@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-21net: phy: at803x: fix error return code in at803x_probe()Wei Yongjun
Fix to return a negative error code from the ccr read error handling case instead of 0, as done elsewhere in this function. Fixes: 3265f4218878 ("net: phy: at803x: add fiber support") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20221118103635.254256-1-weiyongjun@huaweicloud.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-21net/mlx5e: Fix possible race condition in macsec extended packet number ↵Emeel Hakim
update routine Currenty extended packet number (EPN) update routine is accessing macsec object without holding the general macsec lock hence facing a possible race condition when an EPN update occurs while updating or deleting the SA. Fix by holding the general macsec lock before accessing the object. Fixes: 4411a6c0abd3 ("net/mlx5e: Support MACsec offload extended packet number (EPN)") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21net/mlx5e: Fix MACsec update SecYEmeel Hakim
Currently updating SecY destroys and re-creates RX SA objects, the re-created RX SA objects are not identical to the destroyed objects and it disagree on the encryption enabled property which holds the value false after recreation, this value is not supported with offload which leads to no traffic after an update. Fix by recreating an identical objects. Fixes: 5a39816a75e5 ("net/mlx5e: Add MACsec offload SecY support") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21net/mlx5e: Fix MACsec SA initialization routineEmeel Hakim
Currently as part of MACsec SA initialization routine extended packet number (EPN) object attribute is always being set without checking if EPN is actually enabled, the above could lead to a NULL dereference. Fix by adding such a check. Fixes: 4411a6c0abd3 ("net/mlx5e: Support MACsec offload extended packet number (EPN)") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21net/mlx5e: Remove leftovers from old XSK queues enumerationTariq Toukan
Before the cited commit, for N channels, a dedicated set of N queues was created to support XSK, in indices [N, 2N-1], doubling the number of queues. In addition, changing the number of channels was prohibited, as it would shift the indices. Remove these two leftovers, as we moved XSK to a new queueing scheme, starting from index 0. Fixes: 3db4c85cde7a ("net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21net/mlx5e: Offload rule only when all encaps are validChris Mi
The cited commit adds a for loop to support multiple encapsulations. But it only checks if the last encap is valid. Fix it by setting slow path flag when one of the encap is invalid. Fixes: f493f15534ec ("net/mlx5e: Move flow attr reformat action bit to per dest flags") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21net/mlx5e: Fix missing alignment in size of MTT/KLM entriesTariq Toukan
In the cited patch, an alignment required by the HW spec was mistakenly dropped. Bring it back to fix error completions like the below: mlx5_core 0000:00:08.0 eth2: Error cqe on cqn 0x40b, ci 0x0, qn 0x104f, opcode 0xd, syndrome 0x2, vendor syndrome 0x68 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 86 00 68 02 25 00 10 4f 00 00 bb d2 WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0x0, len: 192 00000000: 00 00 00 25 00 10 4f 0c 00 00 00 00 00 18 2e 00 00000010: 90 00 00 00 00 02 00 00 00 00 00 00 20 00 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000080: 08 00 00 00 48 6a 00 02 08 00 00 00 0e 10 00 02 00000090: 08 00 00 00 0c db 00 02 08 00 00 00 0e 82 00 02 000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Fixes: 9f123f740428 ("net/mlx5e: Improve MTT/KSM alignment") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21net/mlx5: Fix sync reset event handler error flowMoshe Shemesh
When sync reset now event handling fails on mlx5_pci_link_toggle() then no reset was done. However, since mlx5_cmd_fast_teardown_hca() was already done, the firmware function is closed and the driver is left without firmware functionality. Fix it by setting device error state and reopen the firmware resources. Reopening is done by the thread that was called for devlink reload fw_activate as it already holds the devlink lock. Fixes: 5ec697446f46 ("net/mlx5: Add support for devlink reload action fw activate") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21net/mlx5: E-Switch, Set correctly vport destinationRoi Dayan
The cited commit moved from using reformat_id integer to packet_reformat pointer which introduced the possibility to null pointer dereference. When setting packet reformat flag and pkt_reformat pointer must exists so checking MLX5_ESW_DEST_ENCAP is not enough, we need to make sure the pkt_reformat is valid and check for MLX5_ESW_DEST_ENCAP_VALID. If the dest encap valid flag does not exists then pkt_reformat can be either invalid address or null. Also, to make sure we don't try to access invalid pkt_reformat set it to null when invalidated and invalidate it before calling add flow code as its logically more correct and to be safe. Fixes: 2b688ea5efde ("net/mlx5: Add flow steering actions to fs_cmd shim layer") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21net/mlx5: Lag, avoid lockdep warningsEli Cohen
ldev->lock is used to serialize lag change operations. Since multiport eswtich functionality was added, we now change the mode dynamically. However, acquiring ldev->lock is not allowed as it could possibly lead to a deadlock as reported by the lockdep mechanism. [ 836.154963] WARNING: possible circular locking dependency detected [ 836.155850] 5.19.0-rc5_net_56b7df2 #1 Not tainted [ 836.156549] ------------------------------------------------------ [ 836.157418] handler1/12198 is trying to acquire lock: [ 836.158178] ffff888187d52b58 (&ldev->lock){+.+.}-{3:3}, at: mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core] [ 836.159575] [ 836.159575] but task is already holding lock: [ 836.160474] ffff8881d4de2930 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200 [ 836.161669] which lock already depends on the new lock. [ 836.162905] [ 836.162905] the existing dependency chain (in reverse order) is: [ 836.164008] -> #3 (&block->cb_lock){++++}-{3:3}: [ 836.164946] down_write+0x25/0x60 [ 836.165548] tcf_block_get_ext+0x1c6/0x5d0 [ 836.166253] ingress_init+0x74/0xa0 [sch_ingress] [ 836.167028] qdisc_create.constprop.0+0x130/0x5e0 [ 836.167805] tc_modify_qdisc+0x481/0x9f0 [ 836.168490] rtnetlink_rcv_msg+0x16e/0x5a0 [ 836.169189] netlink_rcv_skb+0x4e/0xf0 [ 836.169861] netlink_unicast+0x190/0x250 [ 836.170543] netlink_sendmsg+0x243/0x4b0 [ 836.171226] sock_sendmsg+0x33/0x40 [ 836.171860] ____sys_sendmsg+0x1d1/0x1f0 [ 836.172535] ___sys_sendmsg+0xab/0xf0 [ 836.173183] __sys_sendmsg+0x51/0x90 [ 836.173836] do_syscall_64+0x3d/0x90 [ 836.174471] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 836.175282] [ 836.175282] -> #2 (rtnl_mutex){+.+.}-{3:3}: [ 836.176190] __mutex_lock+0x6b/0xf80 [ 836.176830] register_netdevice_notifier+0x21/0x120 [ 836.177631] rtnetlink_init+0x2d/0x1e9 [ 836.178289] netlink_proto_init+0x163/0x179 [ 836.178994] do_one_initcall+0x63/0x300 [ 836.179672] kernel_init_freeable+0x2cb/0x31b [ 836.180403] kernel_init+0x17/0x140 [ 836.181035] ret_from_fork+0x1f/0x30 [ 836.181687] -> #1 (pernet_ops_rwsem){+.+.}-{3:3}: [ 836.182628] down_write+0x25/0x60 [ 836.183235] unregister_netdevice_notifier+0x1c/0xb0 [ 836.184029] mlx5_ib_roce_cleanup+0x94/0x120 [mlx5_ib] [ 836.184855] __mlx5_ib_remove+0x35/0x60 [mlx5_ib] [ 836.185637] mlx5_eswitch_unregister_vport_reps+0x22f/0x440 [mlx5_core] [ 836.186698] auxiliary_bus_remove+0x18/0x30 [ 836.187409] device_release_driver_internal+0x1f6/0x270 [ 836.188253] bus_remove_device+0xef/0x160 [ 836.188939] device_del+0x18b/0x3f0 [ 836.189562] mlx5_rescan_drivers_locked+0xd6/0x2d0 [mlx5_core] [ 836.190516] mlx5_lag_remove_devices+0x69/0xe0 [mlx5_core] [ 836.191414] mlx5_do_bond_work+0x441/0x620 [mlx5_core] [ 836.192278] process_one_work+0x25c/0x590 [ 836.192963] worker_thread+0x4f/0x3d0 [ 836.193609] kthread+0xcb/0xf0 [ 836.194189] ret_from_fork+0x1f/0x30 [ 836.194826] -> #0 (&ldev->lock){+.+.}-{3:3}: [ 836.195734] __lock_acquire+0x15b8/0x2a10 [ 836.196426] lock_acquire+0xce/0x2d0 [ 836.197057] __mutex_lock+0x6b/0xf80 [ 836.197708] mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core] [ 836.198575] tc_act_parse_mirred+0x25b/0x800 [mlx5_core] [ 836.199467] parse_tc_actions+0x168/0x5a0 [mlx5_core] [ 836.200340] __mlx5e_add_fdb_flow+0x263/0x480 [mlx5_core] [ 836.201241] mlx5e_configure_flower+0x8a0/0x1820 [mlx5_core] [ 836.202187] tc_setup_cb_add+0xd7/0x200 [ 836.202856] fl_hw_replace_filter+0x14c/0x1f0 [cls_flower] [ 836.203739] fl_change+0xbbe/0x1730 [cls_flower] [ 836.204501] tc_new_tfilter+0x407/0xd90 [ 836.205168] rtnetlink_rcv_msg+0x406/0x5a0 [ 836.205877] netlink_rcv_skb+0x4e/0xf0 [ 836.206535] netlink_unicast+0x190/0x250 [ 836.207217] netlink_sendmsg+0x243/0x4b0 [ 836.207915] sock_sendmsg+0x33/0x40 [ 836.208538] ____sys_sendmsg+0x1d1/0x1f0 [ 836.209219] ___sys_sendmsg+0xab/0xf0 [ 836.209878] __sys_sendmsg+0x51/0x90 [ 836.210510] do_syscall_64+0x3d/0x90 [ 836.211137] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 836.211954] other info that might help us debug this: [ 836.213174] Chain exists of: [ 836.213174] &ldev->lock --> rtnl_mutex --> &block->cb_lock 836.214650] Possible unsafe locking scenario: [ 836.214650] [ 836.215574] CPU0 CPU1 [ 836.216255] ---- ---- [ 836.216943] lock(&block->cb_lock); [ 836.217518] lock(rtnl_mutex); [ 836.218348] lock(&block->cb_lock); [ 836.219212] lock(&ldev->lock); [ 836.219758] [ 836.219758] *** DEADLOCK *** [ 836.219758] [ 836.220747] 2 locks held by handler1/12198: [ 836.221390] #0: ffff8881d4de2930 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200 [ 836.222646] #1: ffff88810c9a92c0 (&esw->mode_lock){++++}-{3:3}, at: mlx5_esw_hold+0x39/0x50 [mlx5_core] [ 836.224063] stack backtrace: [ 836.224799] CPU: 6 PID: 12198 Comm: handler1 Not tainted 5.19.0-rc5_net_56b7df2 #1 [ 836.225923] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 836.227476] Call Trace: [ 836.227929] <TASK> [ 836.228332] dump_stack_lvl+0x57/0x7d [ 836.228924] check_noncircular+0x104/0x120 [ 836.229562] __lock_acquire+0x15b8/0x2a10 [ 836.230201] lock_acquire+0xce/0x2d0 [ 836.230776] ? mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core] [ 836.231614] ? find_held_lock+0x2b/0x80 [ 836.232221] __mutex_lock+0x6b/0xf80 [ 836.232799] ? mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core] [ 836.233636] ? mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core] [ 836.234451] ? xa_load+0xc3/0x190 [ 836.234995] mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core] [ 836.235803] tc_act_parse_mirred+0x25b/0x800 [mlx5_core] [ 836.236636] ? tc_act_can_offload_mirred+0x135/0x210 [mlx5_core] [ 836.237550] parse_tc_actions+0x168/0x5a0 [mlx5_core] [ 836.238364] __mlx5e_add_fdb_flow+0x263/0x480 [mlx5_core] [ 836.239202] mlx5e_configure_flower+0x8a0/0x1820 [mlx5_core] [ 836.240076] ? lock_acquire+0xce/0x2d0 [ 836.240668] ? tc_setup_cb_add+0x5b/0x200 [ 836.241294] tc_setup_cb_add+0xd7/0x200 [ 836.241917] fl_hw_replace_filter+0x14c/0x1f0 [cls_flower] [ 836.242709] fl_change+0xbbe/0x1730 [cls_flower] [ 836.243408] tc_new_tfilter+0x407/0xd90 [ 836.244043] ? tc_del_tfilter+0x880/0x880 [ 836.244672] rtnetlink_rcv_msg+0x406/0x5a0 [ 836.245310] ? netlink_deliver_tap+0x7a/0x4b0 [ 836.245991] ? if_nlmsg_stats_size+0x2b0/0x2b0 [ 836.246675] netlink_rcv_skb+0x4e/0xf0 [ 836.258046] netlink_unicast+0x190/0x250 [ 836.258669] netlink_sendmsg+0x243/0x4b0 [ 836.259288] sock_sendmsg+0x33/0x40 [ 836.259857] ____sys_sendmsg+0x1d1/0x1f0 [ 836.260473] ___sys_sendmsg+0xab/0xf0 [ 836.261064] ? lock_acquire+0xce/0x2d0 [ 836.261669] ? find_held_lock+0x2b/0x80 [ 836.262272] ? __fget_files+0xb9/0x190 [ 836.262871] ? __fget_files+0xd3/0x190 [ 836.263462] __sys_sendmsg+0x51/0x90 [ 836.264064] do_syscall_64+0x3d/0x90 [ 836.264652] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 836.265425] RIP: 0033:0x7fdbe5e2677d [ 836.266012] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ba ee ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 ee ee ff ff 48 [ 836.268485] RSP: 002b:00007fdbe48a75a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [ 836.269598] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fdbe5e2677d [ 836.270576] RDX: 0000000000000000 RSI: 00007fdbe48a7640 RDI: 000000000000003c [ 836.271565] RBP: 00007fdbe48a8368 R08: 0000000000000000 R09: 0000000000000000 [ 836.272546] R10: 00007fdbe48a84b0 R11: 0000000000000293 R12: 0000557bd17dc860 [ 836.273527] R13: 0000000000000000 R14: 0000557bd17dc860 R15: 00007fdbe48a7640 [ 836.274521] </TASK> To avoid using mode holding ldev->lock in the configure flow, we queue a work to the lag workqueue and cease wait on a completion object. In addition, we remove the lock from mlx5_lag_do_mirred() since it is not really protecting anything. It should be noted that an actual deadlock has not been observed. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21net/mlx5: Fix handling of entry refcount when command is not issued to FWMoshe Shemesh
In case command interface is down, or the command is not allowed, driver did not increment the entry refcount, but might have decrement as part of forced completion handling. Fix that by always increment and decrement the refcount to make it symmetric for all flows. Fixes: 50b2412b7e78 ("net/mlx5: Avoid possible free of command entry while timeout comp handler") Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reported-by: Jack Wang <jinpu.wang@ionos.com> Tested-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21net/mlx5: cmdif, Print info on any firmware cmd failure to tracepointMoshe Shemesh
While moving to new CMD API (quiet API), some pre-existing flows may call the new API function that in case of error, returns the error instead of printing it as previously done. For such flows we bring back the print but to tracepoint this time for sys admins to have the ability to check for errors especially for commands using the new quiet API. Tracepoint output example: devlink-1333 [001] ..... 822.746922: mlx5_cmd: ACCESS_REG(0x805) op_mod(0x0) failed, status bad resource(0x5), syndrome (0xb06e1f), err(-22) Fixes: f23519e542e5 ("net/mlx5: cmdif, Add new api for command execution") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21net/mlx5: SF: Fix probing active SFs during driver probe phaseShay Drory
When SF devices and SF port representors are located on different functions, unloading and reloading of SF parent driver doesn't recreate the existing SF present in the device. Fix it by querying SFs and probe active SFs during driver probe phase. Fixes: 90d010b8634b ("net/mlx5: SF, Add auxiliary device support") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21net/mlx5: Fix FW tracer timestamp calculationMoshe Shemesh
Fix a bug in calculation of FW tracer timestamp. Decreasing one in the calculation should effect only bits 52_7 and not effect bits 6_0 of the timestamp, otherwise bits 6_0 are always set in this calculation. Fixes: 70dd6fdb8987 ("net/mlx5: FW tracer, parse traces and kernel tracing support") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Feras Daoud <ferasda@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21net/mlx5: Do not query pci info while pci disabledRoy Novich
The driver should not interact with PCI while PCI is disabled. Trying to do so may result in being unable to get vital signs during PCI reset, driver gets timed out and fails to recover. Fixes: fad1783a6d66 ("net/mlx5: Print more info on pci error handlers") Signed-off-by: Roy Novich <royno@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21netfilter: ipset: regression in ip_set_hash_ip.cVishwanath Pai
This patch introduced a regression: commit 48596a8ddc46 ("netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses") The variable e.ip is passed to adtfn() function which finally adds the ip address to the set. The patch above refactored the for loop and moved e.ip = htonl(ip) to the end of the for loop. What this means is that if the value of "ip" changes between the first assignement of e.ip and the forloop, then e.ip is pointing to a different ip address than "ip". Test case: $ ipset create jdtest_tmp hash:ip family inet hashsize 2048 maxelem 100000 $ ipset add jdtest_tmp 10.0.1.1/31 ipset v6.21.1: Element cannot be added to the set: it's already added The value of ip gets updated inside the "else if (tb[IPSET_ATTR_CIDR])" block but e.ip is still pointing to the old value. Fixes: 48596a8ddc46 ("netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses") Reviewed-by: Joshua Hunt <johunt@akamai.com> Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-21octeontx2-af: cn10k: mcs: Fix copy and paste bug in mcs_bbe_intr_handler()Dan Carpenter
This code accidentally uses the RX macro twice instead of the RX and TX. Fixes: 6c635f78c474 ("octeontx2-af: cn10k: mcs: Handle MCS block interrupts") Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-21ipv4/fib: Replace zero-length array with DECLARE_FLEX_ARRAY() helperKees Cook
Zero-length arrays are deprecated[1] and are being replaced with flexible array members in support of the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy(), correctly instrument array indexing with UBSAN_BOUNDS, and to globally enable -fstrict-flex-arrays=3. Replace zero-length array with flexible-array member in struct key_vector. This results in no differences in binary output. [1] https://github.com/KSPP/linux/issues/78 Cc: Jakub Kicinski <kuba@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David Ahern <dsahern@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-21selftests/net: Find nettest in current directoryDaniel Díaz
The `nettest` binary, built from `selftests/net/nettest.c`, was expected to be found in the path during test execution of `fcnal-test.sh` and `pmtu.sh`, leading to tests getting skipped when the binary is not installed in the system, as can be seen in these logs found in the wild [1]: # TEST: vti4: PMTU exceptions [SKIP] [ 350.600250] IPv6: ADDRCONF(NETDEV_CHANGE): veth_b: link becomes ready [ 350.607421] IPv6: ADDRCONF(NETDEV_CHANGE): veth_a: link becomes ready # 'nettest' command not found; skipping tests # xfrm6udp not supported # TEST: vti6: PMTU exceptions (ESP-in-UDP) [SKIP] [ 351.605102] IPv6: ADDRCONF(NETDEV_CHANGE): veth_b: link becomes ready [ 351.612243] IPv6: ADDRCONF(NETDEV_CHANGE): veth_a: link becomes ready # 'nettest' command not found; skipping tests # xfrm4udp not supported The `unicast_extensions.sh` tests also rely on `nettest`, but it runs fine there because it looks for the binary in the current working directory [2]: The same mechanism that works for the Unicast extensions tests is here copied over to the PMTU and functional tests. [1] https://lkft.validation.linaro.org/scheduler/job/5839508#L6221 [2] https://lkft.validation.linaro.org/scheduler/job/5839508#L7958 Signed-off-by: Daniel Díaz <daniel.diaz@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains late Netfilter fixes for net: 1) Use READ_ONCE()/WRITE_ONCE() to update ct->mark, from Daniel Xu. Not reported by syzbot, but I presume KASAN would trigger post a splat on this. This is a rather old issue, predating git history. 2) Do not set up extensions for set element with end interval flag set on. This leads to bogusly skipping this elements as expired when listing the set/map to userspace as well as increasing memory consumpton when stateful expressions are used. This issue has been present since 4.18, when timeout support for rbtree set was added. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-21net: microchip: sparx5: Fix return value in sparx5_tc_setup_qdisc_ets()Lu Wei
Function sparx5_tc_setup_qdisc_ets() always returns negative value because it return -EOPNOTSUPP in the end. This patch returns the rersult of sparx5_tc_ets_add() and sparx5_tc_ets_del() directly. Fixes: 211225428d65 ("net: microchip: sparx5: add support for offloading ets qdisc") Signed-off-by: Lu Wei <luwei32@huawei.com> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-21Merge branch 'nfc-leaks'David S. Miller
Shang XiaoJing says: ==================== nfc: Fix potential memory leak of skb There are still somewhere maybe leak the skb, fix the memleaks by adding fail path. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-21nfc: s3fwrn5: Fix potential memory leak in s3fwrn5_nci_send()Shang XiaoJing
s3fwrn5_nci_send() won't free the skb when it failed for the check before s3fwrn5_write(). As the result, the skb will memleak. Free the skb when the check failed. Fixes: c04c674fadeb ("nfc: s3fwrn5: Add driver for Samsung S3FWRN5 NFC Chip") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Suggested-by: Pavel Machek <pavel@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-21nfc: nxp-nci: Fix potential memory leak in nxp_nci_send()Shang XiaoJing
nxp_nci_send() won't free the skb when it failed for the check before write(). As the result, the skb will memleak. Free the skb when the check failed. Fixes: dece45855a8b ("NFC: nxp-nci: Add support for NXP NCI chips") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Suggested-by: Pavel Machek <pavel@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-21nfc: nfcmrvl: Fix potential memory leak in nfcmrvl_i2c_nci_send()Shang XiaoJing
nfcmrvl_i2c_nci_send() will be called by nfcmrvl_nci_send(), and skb should be freed in nfcmrvl_i2c_nci_send(). However, nfcmrvl_nci_send() won't free the skb when it failed for the test_bit(). Free the skb when test_bit() failed. Fixes: b5b3e23e4cac ("NFC: nfcmrvl: add i2c driver") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Suggested-by: Pavel Machek <pavel@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18macsec: Fix invalid error code setYueHaibing
'ret' is defined twice in macsec_changelink(), when it is set in macsec_is_offloaded case, it will be invalid before return. Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Reviewed-by: Antoine Tenart <atenart@kernel.org> Link: https://lore.kernel.org/r/20221118011249.48112-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-18bonding: fix ICMPv6 header handling when receiving IPv6 messagesHangbin Liu
Currently, we get icmp6hdr via function icmp6_hdr(), which needs the skb transport header to be set first. But there is no rule to ask driver set transport header before netif_receive_skb() and bond_handle_frame(). So we will not able to get correct icmp6hdr on some drivers. Fix this by using skb_header_pointer to get the IPv6 and ICMPV6 headers. Reported-by: Liang Li <liali@redhat.com> Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20221118034353.1736727-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-18Merge branch 'nfp-fixes-for-v6-1'Jakub Kicinski
Simon Horman says: ==================== nfp: fixes for v6.1 - Ensure that information displayed by "devlink port show" reflects the number of lanes available to be split. - Avoid NULL dereference in ethtool test code. ==================== Link: https://lore.kernel.org/r/20221117153744.688595-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-18nfp: add port from netdev validation for EEPROM accessJaco Coetzee
Setting of the port flag `NFP_PORT_CHANGED`, introduced to ensure the correct reading of EEPROM data, causes a fatal kernel NULL pointer dereference in cases where the target netdev type cannot be determined. Add validation of port struct pointer before attempting to set the `NFP_PORT_CHANGED` flag. Return that operation is not supported if the netdev type cannot be determined. Fixes: 4ae97cae07e1 ("nfp: ethtool: fix the display error of `ethtool -m DEVNAME`") Signed-off-by: Jaco Coetzee <jaco.coetzee@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-18nfp: fill splittable of devlink_port_attrs correctlyDiana Wang
The error is reflected in that it shows wrong splittable status of port when executing "devlink port show". The reason which leads the error is that the assigned operation of splittable is just a simple negation operation of split and it does not consider port lanes quantity. A splittable port should have several lanes that can be split(lanes quantity > 1). If without the judgement, it will show wrong message for some firmware, such as 2x25G, 2x10G. Fixes: a0f49b548652 ("devlink: Add a new devlink port split ability attribute and pass to netlink") Signed-off-by: Diana Wang <na.wang@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-18net: pch_gbe: fix pci device refcount leak while module exitingYang Yingliang
As comment of pci_get_domain_bus_and_slot() says, it returns a pci device with refcount increment, when finish using it, the caller must decrement the reference count by calling pci_dev_put(). In pch_gbe_probe(), pci_get_domain_bus_and_slot() is called, so in error path in probe() and remove() function, pci_dev_put() should be called to avoid refcount leak. Compile tested only. Fixes: 1a0bdadb4e36 ("net/pch_gbe: supports eg20t ptp clock") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20221117135148.301014-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-18octeontx2-af: debugsfs: fix pci device refcount leakYang Yingliang
As comment of pci_get_domain_bus_and_slot() says, it returns a pci device with refcount increment, when finish using it, the caller must decrement the reference count by calling pci_dev_put(). So before returning from rvu_dbg_rvu_pf_cgx_map_display() or cgx_print_dmac_flt(), pci_dev_put() is called to avoid refcount leak. Fixes: dbc52debf95f ("octeontx2-af: Debugfs support for DMAC filters") Fixes: e2fb37303865 ("octeontx2-af: Display CGX, NIX and PF map in debugfs.") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20221117124658.162409-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-18net/qla3xxx: fix potential memleak in ql3xxx_send()Zhang Changzhong
The ql3xxx_send() returns NETDEV_TX_OK without freeing skb in error handling case, add dev_kfree_skb_any() to fix it. Fixes: bd36b0ac5d06 ("qla3xxx: Add support for Qlogic 4032 chip.") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1668675039-21138-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-18net: mvpp2: fix possible invalid pointer dereferenceHui Tang
It will cause invalid pointer dereference to priv->cm3_base behind, if PTR_ERR(priv->cm3_base) in mvpp2_get_sram(). Fixes: e54ad1e01c00 ("net: mvpp2: add CM3 SRAM memory map") Signed-off-by: Hui Tang <tanghui20@huawei.com> Link: https://lore.kernel.org/r/20221117084032.101144-1-tanghui20@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-18net/mlx4: Check retval of mlx4_bitmap_initPeter Kosyh
If mlx4_bitmap_init fails, mlx4_bitmap_alloc_range will dereference the NULL pointer (bitmap->table). Make sure, that mlx4_bitmap_alloc_range called in no error case. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: d57febe1a478 ("net/mlx4: Add A0 hybrid steering") Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Peter Kosyh <pkosyh@yandex.ru> Link: https://lore.kernel.org/r/20221117152806.278072-1-pkosyh@yandex.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-18net: ethernet: mtk_eth_soc: fix error handling in mtk_open()Liu Jian
If mtk_start_dma() fails, invoke phylink_disconnect_phy() to perform cleanup. phylink_disconnect_phy() contains the put_device action. If phylink_disconnect_phy is not performed, the Kref of netdev will leak. Fixes: b8fc9f30821e ("net: ethernet: mediatek: Add basic PHYLINK support") Signed-off-by: Liu Jian <liujian56@huawei.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20221117111356.161547-1-liujian56@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-18iavf: Fix race condition between iavf_shutdown and iavf_removeSlawomir Laba
Fix a deadlock introduced by commit 974578017fc1 ("iavf: Add waiting so the port is initialized in remove") due to race condition between iavf_shutdown and iavf_remove, where iavf_remove stucks forever in while loop since iavf_shutdown already set __IAVF_REMOVE adapter state. Fix this by checking if the __IAVF_IN_REMOVE_TASK has already been set and return if so. Fixes: 974578017fc1 ("iavf: Add waiting so the port is initialized in remove") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-11-18iavf: remove INITIAL_MAC_SET to allow gARP to work properlyStefan Assmann
IAVF_FLAG_INITIAL_MAC_SET prevents waiting on iavf_is_mac_set_handled() the first time the MAC is set. This breaks gratuitous ARP because the MAC address has not been updated yet when the gARP packet is sent out. Current behaviour: $ echo 1 > /sys/class/net/ens4f0/device/sriov_numvfs iavf 0000:88:02.0: MAC address: ee:04:19:14:ec:ea $ ip addr add 192.168.1.1/24 dev ens4f0v0 $ ip link set dev ens4f0v0 up $ echo 1 > /proc/sys/net/ipv4/conf/ens4f0v0/arp_notify $ ip link set ens4f0v0 addr 00:11:22:33:44:55 07:23:41.676611 ee:04:19:14:ec:ea > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.1 tell 192.168.1.1, length 28 With IAVF_FLAG_INITIAL_MAC_SET removed: $ echo 1 > /sys/class/net/ens4f0/device/sriov_numvfs iavf 0000:88:02.0: MAC address: 3e:8a:16:a2:37:6d $ ip addr add 192.168.1.1/24 dev ens4f0v0 $ ip link set dev ens4f0v0 up $ echo 1 > /proc/sys/net/ipv4/conf/ens4f0v0/arp_notify $ ip link set ens4f0v0 addr 00:11:22:33:44:55 07:28:01.836608 00:11:22:33:44:55 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.1 tell 192.168.1.1, length 28 Fixes: 35a2443d0910 ("iavf: Add waiting for response from PF in set mac") Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>