summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-03-15net/mlx5e: Expose SQ SW state as part of SQ health diagnosticsAdham Faris
Add SQ SW state textual representation to devlink health diagnostics for tx reporter. SQ SW state can be retrieved by issuing the devlink command below: $ devlink health diagnose auxiliary/mlx5_core.eth.0/65535 reporter tx Output ======================================================================= Common Config: SQ: stride size: 64 size: 1024 ts_format: FRC CQ: stride size: 64 size: 1024 SQs: channel ix: 0 tc: 0 txq ix: 0 sqn: 4170 HW state: 1 stopped: false cc: 0 pc: 0 SW State: enabled: 1 mpwqe: 1 recovering: 0 ipsec: 0 am: 1 vlan_need_l2_inline: 1 pending_xsk_tx: 0 pending_tls_rx_resync: 0 xdp_multibuf: 0 CQ: cqn: 1031 HW status: 0 ci: 0 size: 1024 EQ: eqn: 7 irqn: 32 vecidx: 0 ci: 2 size: 2048 Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-9-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5e: Stringify RQ SW state in RQ devlink health diagnosticsAdham Faris
One of the parameters that is retrieved/printed as a response to devlink health diagnostics for rx reporter is the RQ SW state. It's printed as a bitmap decimal number. Printing it as bitmap is problematic and non informative. In addition User can't count on SW state without accessing the kernel sources (mlx5e rq state enum in en.h). This patch prints RQ SW state in a textual representation, as a key: value pairs, where disabled rq states will appear as '0' and enabled ones will appear as '1'. See below the generated output for rx health diagnostics devlink command: $ devlink health diagnose auxiliary/mlx5_core.eth.0/65535 reporter rx Before: ======================================================================= Common config: RQ: type: 2 stride size: 2048 size: 8 ts_format: FRC CQ: stride size: 64 size: 1024 RQs: channel ix: 0 rqn: 4172 HW state: 1 SW state: 37 WQE counter: 7 posted WQEs: 7 cc: 7 CQ: cqn: 1033 HW status: 0 ci: 0 size: 1024 EQ: eqn: 7 irqn: 32 vecidx: 0 ci: 2 size: 2048 ICOSQ: sqn: 4169 HW state: 1 cc: 74 pc: 74 WQE size: 128 CQ: cqn: 1030 cc: 1 size: 128 channel ix: 1 ... . . After: ======================================================================= Common config: RQ: type: 2 stride size: 2048 size: 8 ts_format: FRC CQ: stride size: 64 size: 1024 RQs: channel ix: 0 rqn: 4172 HW state: 1 WQE counter: 7 posted WQEs: 7 cc: 7 SW State: enabled: 1 recovering: 0 am: 1 no_csum_complete: 0 csum_full: 0 mini_cqe_hw_stridx: 1 shampo: 0 mini_cqe_enhanced: 0 CQ: cqn: 1033 HW status: 0 ci: 0 size: 1024 EQ: eqn: 7 irqn: 32 vecidx: 0 ci: 2 size: 2048 ICOSQ: sqn: 4169 HW state: 1 cc: 74 pc: 74 WQE size: 128 CQ: cqn: 1030 cc: 1 size: 128 channel: ix: 1 ... . . Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-8-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5e: Rename RQ/SQ adaptive moderation state flagAdham Faris
Dynamic interrupt moderation RQ and SQ feature represented by MLX5E_RQ_STATE_AM and MLX5E_SQ_STATE_AM enums respectively, is not consistent with the feature naming in the driver, and with the formal feature and library names. Hence, change MLX5E_RQ_STATE_AM and MLX5E_SQ_STATE_AM enum type names in core/en.h to MLX5E_RQ_STATE_DIM and MLX5E_SQ_STATE_DIM respectively. Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-7-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5e: Utilize the entire fifoRahul Rameshbabu
Previous check was comparing against the fifo mask. The mask is size of the fifo (power of two) minus one, so a less than or equal comparator should be used for checking if the fifo has room for the SKB. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-6-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5: Implement thermal zoneSandipan Patra
Implement thermal zone support for mlx5 based HW. The NIC uses temperature sensor provided by ASIC to report current temperature to thermal core. Signed-off-by: Sandipan Patra <spatra@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-5-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5: Add comment to mlx5_devlink_params_register()Jiri Pirko
Add comment to mlx5_devlink_params_register() functions so it is clear that only driver init params should be registered here. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-4-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5: Stop waiting for PCI up if teardown was triggeredMoshe Shemesh
If driver teardown is called while PCI is turned off, there is a race between health recovery and teardown. If health recovery already started it will wait 60 sec trying to see if PCI gets back and it can recover, but actually there is no need to wait anymore once teardown was called. Use the MLX5_BREAK_FW_WAIT flag which is set on driver teardown to break waiting for PCI up. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-3-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5: remove redundant clear_bitMoshe Shemesh
When shutdown or remove callbacks are called the driver sets the flag MLX5_BREAK_FW_WAIT, to stop waiting for FW as teardown was called. There is no need to clear the bit as once shutdown or remove were called as there is no way back, the driver is going down. Furthermore, if not cleared the flag can be used also in other loops where we may wait while teardown was already called. Use test_bit() instead of test_and_clear_bit() as there is no need to clear the flag. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net: phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol()Vladimir Oltean
Since the blamed commit, phy_ethtool_get_wol() and phy_ethtool_set_wol() acquire phydev->lock, but the mscc phy driver implementations, vsc85xx_wol_get() and vsc85xx_wol_set(), acquire the same lock as well, resulting in a deadlock. $ ip link set swp3 down ============================================ WARNING: possible recursive locking detected mscc_felix 0000:00:00.5 swp3: Link is Down -------------------------------------------- ip/375 is trying to acquire lock: ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: vsc85xx_wol_get+0x2c/0xf4 but task is already holding lock: ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&dev->lock); lock(&dev->lock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by ip/375: #0: ffffd43b2a955788 (rtnl_mutex){+.+.}-{4:4}, at: rtnetlink_rcv_msg+0x144/0x58c #1: ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c Call trace: __mutex_lock+0x98/0x454 mutex_lock_nested+0x2c/0x38 vsc85xx_wol_get+0x2c/0xf4 phy_ethtool_get_wol+0x50/0x6c phy_suspend+0x84/0xcc phy_state_machine+0x1b8/0x27c phy_stop+0x70/0x154 phylink_stop+0x34/0xc0 dsa_port_disable_rt+0x2c/0xa4 dsa_slave_close+0x38/0xec __dev_close_many+0xc8/0x16c __dev_change_flags+0xdc/0x218 dev_change_flags+0x24/0x6c do_setlink+0x234/0xea4 __rtnl_newlink+0x46c/0x878 rtnl_newlink+0x50/0x7c rtnetlink_rcv_msg+0x16c/0x58c Removing the mutex_lock(&phydev->lock) calls from the driver restores the functionality. Fixes: 2f987d486610 ("net: phy: Add locks to ethtool functions") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230314153025.2372970-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net: phy: micrel: drop superfluous use of temp variableWolfram Sang
'temp' was used before commit c0c99d0cd107 ("net: phy: micrel: remove the use of .ack_interrupt()") refactored the code. Now, we can simplify it a little. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20230314124928.44948-1-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net: phy: update obsolete comment about PHY_STARTINGWolfram Sang
Commit 899a3cbbf77a ("net: phy: remove states PHY_STARTING and PHY_PENDING") missed to update a comment in phy_probe. Remove superfluous "Description:" prefix while we are here. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/20230314124856.44878-1-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: refactor mailbox overflow detection Jake Keller says: The primary motivation of this series is to cleanup and refactor the mailbox overflow detection logic such that it will work with Scalable IOV. In addition a few other minor cleanups are done while I was working on the code in the area. First, the mailbox overflow functions in ice_vf_mbx.c are refactored to store the data per-VF as an embedded structure in struct ice_vf, rather than stored separately as a fixed-size array which only works with Single Root IOV. This reduces the overall memory footprint when only a handful of VFs are used. The overflow detection functions are also cleaned up to reduce the need for multiple separate calls to determine when to report a VF as potentially malicious. Finally, the ice_is_malicious_vf function is cleaned up and moved into ice_virtchnl.c since it is not Single Root IOV specific, and thus does not belong in ice_sriov.c I could probably have done this in fewer patches, but I split pieces out to hopefully aid in reviewing the overall sequence of changes. This does cause some additional thrash as it results in intermediate versions of the refactor, but I think its worth it for making each step easier to understand. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: call ice_is_malicious_vf() from ice_vc_process_vf_msg() ice: move ice_is_malicious_vf() to ice_virtchnl.c ice: print message if ice_mbx_vf_state_handler returns an error ice: pass mbxdata to ice_is_malicious_vf() ice: remove unnecessary &array[0] and just use array ice: always report VF overflowing mailbox even without PF VSI ice: declare ice_vc_process_vf_msg in ice_virtchnl.h ice: initialize mailbox snapshot earlier in PF init ice: merge ice_mbx_report_malvf with ice_mbx_vf_state_handler ice: remove ice_mbx_deinit_snapshot ice: move VF overflow message count into struct ice_mbx_vf_info ice: track malicious VFs in new ice_mbx_vf_info structure ice: convert ice_mbx_clear_malvf to void and use WARN ice: re-order ice_mbx_reset_snapshot function ==================== Link: https://lore.kernel.org/r/20230313182123.483057-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15ptp: ines: drop of_match_ptr for ID tableKrzysztof Kozlowski
The driver can match only via the DT table so the table should be always used and the of_match_ptr does not have any sense (this also allows ACPI matching via PRP0001, even though it might not be relevant here). This also fixes !CONFIG_OF error: drivers/ptp/ptp_ines.c:783:34: error: ‘ines_ptp_ctrl_of_match’ defined but not used [-Werror=unused-const-variable=] Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://lore.kernel.org/r/20230312132637.352755-1-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15veth: Fix use after free in XDP_REDIRECTShawn Bohrer
Commit 718a18a0c8a6 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb") introduced a bug where it tried to use pskb_expand_head() if the headroom was less than XDP_PACKET_HEADROOM. This however uses kmalloc to expand the head, which will later allow consume_skb() to free the skb while is it still in use by AF_XDP. Previously if the headroom was less than XDP_PACKET_HEADROOM we continued on to allocate a new skb from pages so this restores that behavior. BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0 Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640 CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G O 6.1.4-cloudflare-kasan-2023.1.2 #1 Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 print_report+0x170/0x473 ? __xsk_rcv+0x18d/0x2c0 kasan_report+0xad/0x130 ? __xsk_rcv+0x18d/0x2c0 kasan_check_range+0x149/0x1a0 memcpy+0x20/0x60 __xsk_rcv+0x18d/0x2c0 __xsk_map_redirect+0x1f3/0x490 ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] xdp_do_redirect+0x5ca/0xd60 veth_xdp_rcv_skb+0x935/0x1ba0 [veth] ? __netif_receive_skb_list_core+0x671/0x920 ? veth_xdp+0x670/0x670 [veth] veth_xdp_rcv+0x304/0xa20 [veth] ? do_xdp_generic+0x150/0x150 ? veth_xdp_rcv_one+0xde0/0xde0 [veth] ? _raw_spin_lock_bh+0xe0/0xe0 ? newidle_balance+0x887/0xe30 ? __perf_event_task_sched_in+0xdb/0x800 veth_poll+0x139/0x571 [veth] ? veth_xdp_rcv+0xa20/0xa20 [veth] ? _raw_spin_unlock+0x39/0x70 ? finish_task_switch.isra.0+0x17e/0x7d0 ? __switch_to+0x5cf/0x1070 ? __schedule+0x95b/0x2640 ? io_schedule_timeout+0x160/0x160 __napi_poll+0xa1/0x440 napi_threaded_poll+0x3d1/0x460 ? __napi_poll+0x440/0x440 ? __kthread_parkme+0xc6/0x1f0 ? __napi_poll+0x440/0x440 kthread+0x2a2/0x340 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Freed by task 148640: kasan_save_stack+0x23/0x50 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 ____kasan_slab_free+0x169/0x1d0 slab_free_freelist_hook+0xd2/0x190 __kmem_cache_free+0x1a1/0x2f0 skb_release_data+0x449/0x600 consume_skb+0x9f/0x1c0 veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] veth_xdp_rcv+0x304/0xa20 [veth] veth_poll+0x139/0x571 [veth] __napi_poll+0xa1/0x440 napi_threaded_poll+0x3d1/0x460 kthread+0x2a2/0x340 ret_from_fork+0x22/0x30 The buggy address belongs to the object at ffff888976250000 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 340 bytes inside of 2048-byte region [ffff888976250000, ffff888976250800) The buggy address belongs to the physical page: page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250 head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0 flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00 raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 718a18a0c8a6 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb") Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@kernel.org> Link: https://lore.kernel.org/r/20230314153351.2201328-1-sbohrer@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15io_uring: rsrc: Optimize return value variable 'ret'Li zeming
The initialization assignment of the variable ret is changed to 0, only in 'goto fail;' Use the ret variable as the function return value. Signed-off-by: Li zeming <zeming@nfschina.com> Link: https://lore.kernel.org/r/20230317182538.3027-1-zeming@nfschina.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-15net/mlx5e: TC, Remove error message log printOz Shlomo
The cited commit attempts to update the hw stats when dumping tc actions. However, the driver may be called to update the stats of a police action that may not be in hardware. In such cases the driver will fail to lookup the police action object and will output an error message both to extack and dmesg. The dmesg error is confusing as it may not indicate an actual error. Remove the dmesg error. Fixes: 2b68d659a704 ("net/mlx5e: TC, support per action stats") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: TC, fix cloned flow attributeOz Shlomo
Currently the cloned flow attr resets the original tc action cookies count. Fix that by resetting the cloned flow attribute. Fixes: cca7eac13856 ("net/mlx5e: TC, store tc action cookies per attr") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: TC, fix missing error codeOz Shlomo
Missing error code when mlx5e_tc_act_stats_create fails Fixes: d13674b1d14c ("net/mlx5e: TC, map tc action cookie to a hw counter") Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/sched: TC, fix raw counter initializationOz Shlomo
Freed counters may be reused by fs core. As such, raw counters may not be initialized to zero. Cache the counter values when the action stats object is initialized to have a proper base value for calculating the difference from the previous query. Fixes: 2b68d659a704 ("net/mlx5e: TC, support per action stats") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: Lower maximum allowed MTU in XSK to match XDP prerequisitesAdham Faris
XSK redirecting XDP programs require linearity, hence applies restrictions on the MTU. For PAGE_SIZE=4K, MTU shouldn't exceed 3498. Features that contradict with XDP such HW-LRO and HW-GRO are enforced by the driver in advance, during XSK params validation, except for MTU, which was not enforced before this patch. This has been spotted during test scenario described below: Attaching xdpsock program (PAGE_SIZE=4K), with MTU < 3498, detaching XDP program, changing the MTU to arbitrary value in the range [3499, 3754], attaching XDP program again, which ended up with failure since MTU is > 3498. This commit lowers the XSK MTU limitation to be aligned with XDP MTU limitation, since XSK socket is meaningless without XDP program. Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5: Set BREAK_FW_WAIT flag first when removing driverShay Drory
Currently, BREAK_FW_WAIT flag is set after syncing with fw_reset. However, fw_reset can call mlx5_load_one() which is waiting for fw init bit and BREAK_FW_WAIT flag is intended to stop. e.g.: the driver might wait on a loop it should exit. Fix it by setting the flag before syncing with fw_reset. Fixes: 8324a02c342a ("net/mlx5: Add exit route when waiting for FW") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: kTLS, Fix missing error unwind on unsupported cipher typeGal Pressman
Do proper error unwinding when adding an unsupported TX/RX cipher type. Move the switch case prior to key creation so there's less to unwind, and change the goto label name to describe the action performed instead of what failed. Fixes: 4960c414db35 ("net/mlx5e: Support 256 bit keys with kTLS device offload") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: Fix cleanup null-ptr deref on encap lockPaul Blakey
During module is unloaded while a peer tc flow is still offloaded, first the peer uplink rep profile is changed to a nic profile, and so neigh encap lock is destroyed. Next during unload, the VF reps netdevs are unregistered which causes the original non-peer tc flow to be deleted, which deletes the peer flow. The peer flow deletion detaches the encap entry and try to take the already destroyed encap lock, causing the below trace. Fix this by clearing peer flows during tc eswitch cleanup (mlx5e_tc_esw_cleanup()). Relevant trace: [ 4316.837128] BUG: kernel NULL pointer dereference, address: 00000000000001d8 [ 4316.842239] RIP: 0010:__mutex_lock+0xb5/0xc40 [ 4316.851897] Call Trace: [ 4316.852481] <TASK> [ 4316.857214] mlx5e_rep_neigh_entry_release+0x93/0x790 [mlx5_core] [ 4316.858258] mlx5e_rep_encap_entry_detach+0xa7/0xf0 [mlx5_core] [ 4316.859134] mlx5e_encap_dealloc+0xa3/0xf0 [mlx5_core] [ 4316.859867] clean_encap_dests.part.0+0x5c/0xe0 [mlx5_core] [ 4316.860605] mlx5e_tc_del_fdb_flow+0x32a/0x810 [mlx5_core] [ 4316.862609] __mlx5e_tc_del_fdb_peer_flow+0x1a2/0x250 [mlx5_core] [ 4316.863394] mlx5e_tc_del_flow+0x(/0x630 [mlx5_core] [ 4316.864090] mlx5e_flow_put+0x5f/0x100 [mlx5_core] [ 4316.864771] mlx5e_delete_flower+0x4de/0xa40 [mlx5_core] [ 4316.865486] tc_setup_cb_reoffload+0x20/0x80 [ 4316.865905] fl_reoffload+0x47c/0x510 [cls_flower] [ 4316.869181] tcf_block_playback_offloads+0x91/0x1d0 [ 4316.869649] tcf_block_unbind+0xe7/0x1b0 [ 4316.870049] tcf_block_offload_cmd.isra.0+0x1ee/0x270 [ 4316.879266] tcf_block_offload_unbind+0x61/0xa0 [ 4316.879711] __tcf_block_put+0xa4/0x310 Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows") Fixes: 1418ddd96afd ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5: E-switch, Fix missing set of split_count when forward to ovs ↵Maor Dickman
internal port Rules with mirror actions are split to two FTEs when the actions after the mirror action contains pedit, vlan push/pop or ct. Forward to ovs internal port adds implicit header rewrite (pedit) but missing trigger to do split. Fix by setting split_count when forwarding to ovs internal port which will trigger split in mirror rules. Fixes: 27484f7170ed ("net/mlx5e: Offload tc rules that redirect to ovs internal port") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5: E-switch, Fix wrong usage of source port rewrite in split rulesMaor Dickman
In few cases, rules with mirror use case are split to two FTEs, one which do the mirror action and forward to second FTE which do the rest of the rule actions and the second redirect action. In case of mirror rules which do split and forward to ovs internal port or VF stack devices, source port rewrite should be used in the second FTE but it is wrongly also set in the first FTE which break the offload. Fix this issue by removing the wrong check if source port rewrite is needed to be used on the first FTE of the split and instead return EOPNOTSUPP which will block offload of rules which mirror to ovs internal port or VF stack devices which isn't supported. Fixes: 10742efc20a4 ("net/mlx5e: VF tunnel TX traffic offloading") Fixes: a508728a4c8b ("net/mlx5e: VF tunnel RX traffic offloading") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5: Disable eswitch before waiting for VF pagesDaniel Jurgens
The offending commit changed the ordering of moving to legacy mode and waiting for the VF pages. Moving to legacy mode is important in bluefield, because it sends the host driver into error state, and frees its pages. Without this transition we end up waiting 2 minutes for pages that aren't coming before carrying on with the unload process. Fixes: f019679ea5f2 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5: Fix setting ec_function bit in MANAGE_PAGESParav Pandit
When ECPF is a page supplier, reclaim pages missed to honor the ec_function bit provided by the firmware. It always used the ec_function to true during driver unload flow for ECPF. This is incorrect. Honor the ec_function bit provided by device during page allocation request event. Fixes: d6945242f45d ("net/mlx5: Hold pages RB tree per VF") Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: Don't cache tunnel offloads capabilityParav Pandit
When mlx5e attaches again after device health recovery, the device capabilities might have changed by the eswitch manager. For example in one flow when ECPF changes the eswitch mode between legacy and switchdev, it updates the flow table tunnel capability. The cached value is only used in one place, so just check the capability there instead. Fixes: 5bef709d76a2 ("net/mlx5: Enable host PF HCA after eswitch is initialized") Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: Fix macsec ASO context alignmentEmeel Hakim
Currently mlx5e_macsec_umr struct does not satisfy hardware memory alignment requirement. Hence the result of querying advanced steering operation (ASO) is not copied to the memory region as expected. Fix by satisfying hardware memory alignment requirement and move context to be first field in struct for better readability. Fixes: 1f53da676439 ("net/mlx5e: Create advanced steering operation (ASO) object for MACsec") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15drm/amdgpu: Don't resume IOMMU after incomplete initFelix Kuehling
Check kfd->init_complete in kgd2kfd_iommu_resume, consistent with other kgd2kfd calls. This should fix IOMMU errors on resume from suspend when KFD IOMMU initialization failed. Reported-by: Matt Fagnani <matt.fagnani@bell.net> Link: https://lore.kernel.org/r/4a3b225c-2ffd-e758-4de1-447375e34cad@bell.net/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=217170 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2454 Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Linux regression tracking (Thorsten Leemhuis) <regressions@leemhuis.info> Cc: stable@vger.kernel.org Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Matt Fagnani <matt.fagnani@bell.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amdkfd: Fixed kfd_process cleanup on module exit.David Belanger
Handle case when module is unloaded (kfd_exit) before a process space (mm_struct) is released. v2: Fixed potential race conditions by removing all kfd_process from the process table first, then working on releasing the resources. v3: Fixed loop element access / synchronization. Fixed extra empty lines. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amd/display: disconnect MPCC only on OTG changeAyush Gupta
[Why] Framedrops are observed while playing Vp9 and Av1 10 bit video on 8k resolution using VSR while playback controls are disappeared/appeared [How] Now ODM 2 to 1 is disabled for 5k or greater resolutions on VSR. Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Ayush Gupta <ayugupta@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amd/display: Fix DP MST sinks removal issueCruise Hung
[Why] In USB4 DP tunneling, it's possible to have this scenario that the path becomes unavailable and CM tears down the path a little bit late. So, in this case, the HPD is high but fails to read any DPCD register. That causes the link connection type to be set to sst. And not all sinks are removed behind the MST branch. [How] Restore the link connection type if it fails to read DPCD register. Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Cruise Hung <Cruise.Hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amd/display: Do not set DRR on pipe CommitWesley Chalmers
[WHY] Writing to DRR registers such as OTG_V_TOTAL_MIN on the same frame as a pipe commit can cause underflow. Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amd/display: Remove OTG DIV register write for Virtual signals.Saaem Rizvi
[WHY] Hot plugging and then hot unplugging leads to k1 and k2 values to change, as signal is detected as a virtual signal on hot unplug. Writing these values to OTG_PIXEL_RATE_DIV register might cause primary display to blank (known hw bug). [HOW] No longer write k1 and k2 values to register if signal is virtual, we have safe guards in place in the case that k1 and k2 is unassigned so that an unknown value is not written to the register either. Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Samson Tam <Samson.Tam@amd.com> Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Saaem Rizvi <SyedSaaem.Rizvi@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15Merge tag 'linux-kselftest-fixes-6.3-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "A fix to amd-pstate test Makefile and a fix to LLVM build for x86 in kselftest common lib.mk" * tag 'linux-kselftest-fixes-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: fix LLVM build for i386 and x86_64 selftests: amd-pstate: fix TEST_FILES
2023-03-15Merge branch 'md-fixes' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.3 Pull MD fixes from Song: "This set contains two fixes for old issues (by Neil) and one fix for 6.3 (by Xiao)." * 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: select BLOCK_LEGACY_AUTOLOAD md: avoid signed overflow in slot_store() md: Free resources in __md_stop
2023-03-15md: select BLOCK_LEGACY_AUTOLOADNeilBrown
When BLOCK_LEGACY_AUTOLOAD is not enable, mdadm is not able to activate new arrays unless "CREATE names=yes" appears in mdadm.conf As this is a regression we need to always enable BLOCK_LEGACY_AUTOLOAD for when MD is selected - at least until mdadm is updated and the updates widely available. Cc: stable@vger.kernel.org # v5.18+ Fixes: fbdee71bb5d8 ("block: deprecate autoloading based on dev_t") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Song Liu <song@kernel.org>
2023-03-15block: count 'ios' and 'sectors' when io is done for bio-based deviceYu Kuai
While using iostat for raid, I observed very strange 'await' occasionally, and turns out it's due to that 'ios' and 'sectors' is counted in bdev_start_io_acct(), while 'nsecs' is counted in bdev_end_io_acct(). I'm not sure why they are ccounted like that but I think this behaviour is obviously wrong because user will get wrong disk stats. Fix the problem by counting 'ios' and 'sectors' when io is done, like what rq-based device does. Fixes: 394ffa503bc4 ("blk: introduce generic io stat accounting help function") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230223091226.1135678-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-15block: sunvdc: add check for mdesc_grab() returning NULLLiang He
In vdc_port_probe(), we should check the return value of mdesc_grab() as it may return NULL, which can cause potential NPD bug. Fixes: 43fdf27470b2 ("[SPARC64]: Abstract out mdesc accesses for better MD update handling.") Signed-off-by: Liang He <windhl@126.com> Link: https://lore.kernel.org/r/20230315062032.1741692-1-windhl@126.com [axboe: style cleanup] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-15nvmet: avoid potential UAF in nvmet_req_complete()Damien Le Moal
An nvme target ->queue_response() operation implementation may free the request passed as argument. Such implementation potentially could result in a use after free of the request pointer when percpu_ref_put() is called in nvmet_req_complete(). Avoid such problem by using a local variable to save the sq pointer before calling __nvmet_req_complete(), thus avoiding dereferencing the req pointer after that function call. Fixes: a07b4970f464 ("nvmet: add a generic NVMe target") Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-15nvme-trace: show more opcode namesMinwoo Im
We have more commands to show in the trace. Sync up. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-15nvme-tcp: add nvme-tcp pdu size build protectionSagi Grimberg
Make sure that we don't somehow mess up the wire structures in the spec. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kkch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-15nvme-tcp: fix opcode reporting in the timeout handlerSagi Grimberg
For non in-capsule writes we reuse the request pdu space for a h2cdata pdu in order to avoid over allocating space (either preallocate or dynamically upon receving an r2t pdu). However if the request times out the core expects to find the opcode in the start of the request, which we override. In order to prevent that, without sacrificing additional 24 bytes per request, we just use the tail of the command pdu space instead (last 24 bytes from the 72 bytes command pdu). That should make the command opcode always available, and we get away from allocating more space. If in the future we would need the last 24 bytes of the nvme command available we would need to allocate a dedicated space for it in the request, but until then we can avoid doing so. Reported-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kkch@nvidia.com> Tested-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-15nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620Philipp Geulen
Added a quirk to fix Lexar NM620 1TB SSD reporting duplicate NGUIDs. Signed-off-by: Philipp Geulen <p.geulen@js-elektronik.de> Reviewed-by: Chaitanya Kulkarni <kkch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-15nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000Elmer Miroslav Mosher Golovin
Added a quirk to fix the Netac NV3000 SSD reporting duplicate NGUIDs. Cc: <stable@vger.kernel.org> Signed-off-by: Elmer Miroslav Mosher Golovin <miroslav@mishamosher.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-15nvme-pci: fixing memory leak in probe teardown pathIrvin Cote
In case the nvme_probe teardown path is triggered the ctrl ref count does not reach 0 thus creating a memory leak upon failure of nvme_probe. Signed-off-by: Irvin Cote <irvincoteg@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-15nvme: fix handling single range discard requestMing Lei
When investigating one customer report on warning in nvme_setup_discard, we observed the controller(nvme/tcp) actually exposes queue_max_discard_segments(req->q) == 1. Obviously the current code can't handle this situation, since contiguity merge like normal RW request is taken. Fix the issue by building range from request sector/nr_sectors directly. Fixes: b35ba01ea697 ("nvme: support ranged discard requests") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-15MAINTAINERS: repair malformed T: entries in NVM EXPRESS DRIVERSLukas Bulwahn
The T: entries shall be composed of a SCM tree type (git, hg, quilt, stgit or topgit) and location. Add the SCM tree type to the T: entry, and reorder the file entries in alphabetical order. Fixes: b508fc354f6d ("nvme: update maintainers information") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-15io_uring/sqpoll: Do not set PF_NO_SETAFFINITY on sqpoll threadsMichal Koutný
Users may specify a CPU where the sqpoll thread would run. This may conflict with cpuset operations because of strict PF_NO_SETAFFINITY requirement. That flag is unnecessary for polling "kernel" threads, see the reasoning in commit 01e68ce08a30 ("io_uring/io-wq: stop setting PF_NO_SETAFFINITY on io-wq workers"). Drop the flag on poll threads too. Fixes: 01e68ce08a30 ("io_uring/io-wq: stop setting PF_NO_SETAFFINITY on io-wq workers") Link: https://lore.kernel.org/all/20230314162559.pnyxdllzgw7jozgx@blackpad/ Signed-off-by: Michal Koutný <mkoutny@suse.com> Link: https://lore.kernel.org/r/20230314183332.25834-1-mkoutny@suse.com Signed-off-by: Jens Axboe <axboe@kernel.dk>