summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-03-16net: phy: mxl-gpy: enhance delay time required by loopback disable functionXu Liang
GPY2xx devices need 3 seconds to fully switch out of loopback mode before it can safely re-enter loopback mode. Implement timeout mechanism to guarantee 3 seconds waited before re-enter loopback mode. Signed-off-by: Xu Liang <lxu@maxlinear.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16Merge branch 'net-virtio-vsock'David S. Miller
2023-03-16test/vsock: copy to user failure testArseniy Krasnov
This adds SOCK_STREAM and SOCK_SEQPACKET tests for invalid buffer case. It tries to read data to NULL buffer (data already presents in socket's queue), then uses valid buffer. For SOCK_STREAM second read must return data, because skbuff is not dropped, but for SOCK_SEQPACKET skbuff will be dropped by kernel, and 'recv()' will return EAGAIN. Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16virtio/vsock: don't drop skbuff on copy failureArseniy Krasnov
This returns behaviour of SOCK_STREAM read as before skbuff usage. When copying to user fails current skbuff won't be dropped, but returned to sockets's queue. Technically instead of 'skb_dequeue()', 'skb_peek()' is called and when skbuff becomes empty, it is removed from queue by '__skb_unlink()'. Fixes: 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff") Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16virtio/vsock: remove redundant 'skb_pull()' callArseniy Krasnov
Since we now no longer use 'skb->len' to update credit, there is no sense to update skbuff state, because it is used only once after dequeue to copy data and then will be released. Fixes: 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff") Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16virtio/vsock: don't use skbuff state to account creditArseniy Krasnov
'skb->len' can vary when we partially read the data, this complicates the calculation of credit to be updated in 'virtio_transport_inc_rx_pkt()/ virtio_transport_dec_rx_pkt()'. Also in 'virtio_transport_dec_rx_pkt()' we were miscalculating the credit since 'skb->len' was redundant. For these reasons, let's replace the use of skbuff state to calculate new 'rx_bytes'/'fwd_cnt' values with explicit value as input argument. This makes code more simple, because it is not needed to change skbuff state before each call to update 'rx_bytes'/'fwd_cnt'. Fixes: 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff") Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16block: remove obsolete config BLOCK_COMPATLukas Bulwahn
Before commit bdc1ddad3e5f ("compat_ioctl: block: move blkdev_compat_ioctl() into ioctl.c"), the config BLOCK_COMPAT was used to include compat_ioctl.c into the kernel build. With this commit, the code is moved into ioctl.c and included with the config COMPAT. So, since then, the config BLOCK_COMPAT has no effect and any further purpose. Remove this obsolete config BLOCK_COMPAT. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20230316111630.4897-1-lukas.bulwahn@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-16io_uring/rsrc: fix folio accountingPavel Begunkov
| BUG: Bad page state in process kworker/u8:0 pfn:5c001 | page:00000000bfda61c8 refcount:0 mapcount:0 mapping:0000000000000000 index:0x20001 pfn:0x5c001 | head:0000000011409842 order:9 entire_mapcount:0 nr_pages_mapped:0 pincount:1 | anon flags: 0x3fffc00000b0004(uptodate|head|mappedtodisk|swapbacked|node=0|zone=0|lastcpupid=0xffff) | raw: 03fffc0000000000 fffffc0000700001 ffffffff00700903 0000000100000000 | raw: 0000000000000200 0000000000000000 00000000ffffffff 0000000000000000 | head: 03fffc00000b0004 dead000000000100 dead000000000122 ffff00000a809dc1 | head: 0000000000020000 0000000000000000 00000000ffffffff 0000000000000000 | page dumped because: nonzero pincount | CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 6.3.0-rc2-00001-gc6811bf0cd87 #1 | Hardware name: linux,dummy-virt (DT) | Workqueue: events_unbound io_ring_exit_work | Call trace: | dump_backtrace+0x13c/0x208 | show_stack+0x34/0x58 | dump_stack_lvl+0x150/0x1a8 | dump_stack+0x20/0x30 | bad_page+0xec/0x238 | free_tail_pages_check+0x280/0x350 | free_pcp_prepare+0x60c/0x830 | free_unref_page+0x50/0x498 | free_compound_page+0xcc/0x100 | free_transhuge_page+0x1f0/0x2b8 | destroy_large_folio+0x80/0xc8 | __folio_put+0xc4/0xf8 | gup_put_folio+0xd0/0x250 | unpin_user_page+0xcc/0x128 | io_buffer_unmap+0xec/0x2c0 | __io_sqe_buffers_unregister+0xa4/0x1e0 | io_ring_exit_work+0x68c/0x1188 | process_one_work+0x91c/0x1a58 | worker_thread+0x48c/0xe30 | kthread+0x278/0x2f0 | ret_from_fork+0x10/0x20 Mark reports an issue with the recent patches coalescing compound pages while registering them in io_uring. The reason is that we try to drop excessive references with folio_put_refs(), but pages were acquired with pin_user_pages(), which has extra accounting and so should be put down with matching unpin_user_pages() or at least gup_put_folio(). As a fix unpin_user_pages() all but first page instead, and let's figure out a better API after. Fixes: 57bebf807e2abcf8 ("io_uring/rsrc: optimise registered huge pages") Reported-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Tested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/10efd5507d6d1f05ea0f3c601830e08767e189bd.1678980230.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-16io_uring/msg_ring: let target know allocated indexPavel Begunkov
msg_ring requests transferring files support auto index selection via IORING_FILE_INDEX_ALLOC, however they don't return the selected index to the target ring and there is no other good way for the userspace to know where is the receieved file. Return the index for allocated slots and 0 otherwise, which is consistent with other fixed file installing requests. Cc: stable@vger.kernel.org # v6.0+ Fixes: e6130eba8a848 ("io_uring: add support for passing fixed file descriptors") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://github.com/axboe/liburing/issues/809 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-16Merge tag 'nvme-6.3-2022-03-16' of git://git.infradead.org/nvme into block-6.3Jens Axboe
Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.3 - avoid potential UAF in nvmet_req_complete (Damien Le Moal) - more quirks (Elmer Miroslav Mosher Golovin, Philipp Geulen) - fix a memory leak in the nvme-pci probe teardown path (Irvin Cote) - repair the MAINTAINERS entry (Lukas Bulwahn) - fix handling single range discard request (Ming Lei) - show more opcode names in trace events (Minwoo Im) - fix nvme-tcp timeout reporting (Sagi Grimberg)" * tag 'nvme-6.3-2022-03-16' of git://git.infradead.org/nvme: nvmet: avoid potential UAF in nvmet_req_complete() nvme-trace: show more opcode names nvme-tcp: add nvme-tcp pdu size build protection nvme-tcp: fix opcode reporting in the timeout handler nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620 nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000 nvme-pci: fixing memory leak in probe teardown path nvme: fix handling single range discard request MAINTAINERS: repair malformed T: entries in NVM EXPRESS DRIVERS
2023-03-16xen: remove unnecessary (void*) conversionsYu Zhe
Pointer variables of void * type do not require type cast. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230316083954.4223-1-yuzhe@nfschina.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-03-15net: phy: micrel: Fix spelling mistake "minimim" -> "minimum"Colin Ian King
There is a spelling mistake in a pr_warn_ratelimited message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20230314082315.26532-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15Merge branch 'nfp-flower-add-support-for-multi-zone-conntrack'Jakub Kicinski
Louis Peens says: ==================== nfp: flower: add support for multi-zone conntrack This series add changes to support offload of connection tracking across multiple zones. Previously the driver only supported offloading of a single goto_chain, spanning a single zone. This was implemented by merging a pre_ct rule, post_ct rule and the nft rule. This series provides updates to let the original post_ct rule act as the new pre_ct rule for a next set of merges if it contains another goto and conntrack action. In pseudo-tc rule format this adds support for: ingress chain 0 proto ip flower action ct zone 1 pipe action goto 1 ingress chain 1 proto ip flower ct_state +tr+new ct_zone 1 action ct_clear pipe action ct zone 2 pipe action goto 2 ingress chain 1 proto ip flower ct_state +tr+est ct_zone 1 action ct_clear pipe action ct zone 2 pipe action goto 2 ingress chain 2 proto ip flower ct_state +tr+new ct_zone 2 action mirred egress redirect dev ... ingress chain 2 proto ip flower ct_state +tr+est ct_zone 2 action mirred egress redirect dev ... This can continue for up to a maximum of 4 zone recirculations. The first few patches are some smaller preparation patches while the last one introduces the functionality. ==================== Link: https://lore.kernel.org/r/20230314063610.10544-1-louis.peens@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15nfp: flower: offload tc flows of multiple conntrack zonesWentao Jia
If goto_chain action present in the post ct flow rule, merge flow rules in this ct-zone, create a new pre_ct entry as the pre ct flow rule of next ct-zone, but do not offload merged flow rules to firmware. Repeat the process in the next ct-zone until no goto_chain action present in the post ct flow rule in a certain ct-zone, merged all the flow rules. Offload to firmware finally. Signed-off-by: Wentao Jia <wentao.jia@corigine.com> Acked-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15nfp: flower: prepare for parameterisation of number of offload rulesWentao Jia
The fixed number of offload flow rule is only supported scenario of one ct zone, in the scenario of multiple ct zones, dynamic number and more number of offload flow rules are required. In order to support scenario of multiple ct zones, parameter num_rules is added for to offload flow rules Signed-off-by: Wentao Jia <wentao.jia@corigine.com> Acked-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15nfp: flower: add goto_chain_index for ct entryWentao Jia
The chain_index has different means in pre ct entry and post ct entry. In pre ct entry, it means chain index, but in post ct entry, it means goto chain index, it is confused. chain_index and goto_chain_index may be present in one flow rule, It cannot be distinguished by one field chain_index, both chain_index and goto_chain_index are required in the follow-up patch to support multiple ct zones Another field goto_chain_index is added to record the goto chain index. If no goto action in post ct entry, goto_chain_index is 0. Signed-off-by: Wentao Jia <wentao.jia@corigine.com> Acked-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15nfp: flower: refactor function "is_post_ct_flow"Wentao Jia
'ct_clear' action only or no ct action is supported for 'post_ct_flow'. But in scenario of multiple ct zones, one non 'ct_clear' ct action or more ct actions, including 'ct_clear action', may be present in one flow rule. If ct state match key is 'ct_established', the flow rule is still expected to be classified as 'post_ct_flow'. Check ct status first in function "is_post_ct_flow" to achieve this. Signed-off-by: Wentao Jia <wentao.jia@corigine.com> Acked-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15nfp: flower: refactor function "is_pre_ct_flow"Wentao Jia
In the scenario of multiple ct zones, ct state key match and ct action is present in one flow rule, the flow rule is classified to post_ct_flow in design. There is no ct state key match for pre ct flow, the judging condition is added to function "is_pre_ct_flow". Chain_index is another field for judging which flows are pre ct flow If chain_index not 0, the flow is not pre ct flow. Signed-off-by: Wentao Jia <wentao.jia@corigine.com> Acked-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15nfp: flower: add get_flow_act_ct() for ct actionWentao Jia
CT action is a special case different from other actions, CT clear action is not required when get ct action, but this case is not considered. If CT clear action in the flow rule, skip the CT clear action when get ct action, return the first ct action that is not a CT clear action Signed-off-by: Wentao Jia <wentao.jia@corigine.com> Acked-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15Merge mlx5 updates 2023-03-13Jakub Kicinski
Saeed Mahameed says: ==================== mlx5-updates-2023-03-13 1) Trivial cleanup patches 2) By Sandipan Patra: Implement thermal zone to report NIC temperature 3) Adham Faris, Improves devlink health diagnostics for netdev objects 4) From Maor, Enable TC offload for egress and engress MACVLAN over bond 5) From Gal, add devlink hairpin queues parameters to replace debugfs as was discussed in [1]: [1] https://lore.kernel.org/all/20230111194608.7f15b9a1@kernel.org/ ==================== Link: https://lore.kernel.org/all/20230314054234.267365-1-saeed@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5e: Enable TC offload for egress MACVLAN over bondMaor Dickman
Support offloading of TC rules that mirror/redirect egress traffic to a MACVLAN device, which is attached to bond device which master mlx5 devices. Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-16-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5e: Enable TC offload for ingress MACVLAN over bondMaor Dickman
Support offloading of TC rules that filter ingress traffic from a MACVLAN device, which is attached to bond device. Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-15-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5e: TC, Extract indr setup block checks to functionMaor Dickman
In preparation for next patch which will add new check if device block can be setup, extract all existing checks to function to make it more readable and maintainable. Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-14-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5e: Add more information to hairpin table dumpGal Pressman
Print the number of hairpin queues and size as part of the hairpin table dump. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-13-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5e: Add devlink hairpin queues parametersGal Pressman
We refer to a TC NIC rule that involves forwarding as "hairpin". Hairpin queues are mlx5 hardware specific implementation for hardware forwarding of such packets. Per the discussion in [1], move the hairpin queues control (number and size) from debugfs to devlink. Expose two devlink params: - hairpin_num_queues: control the number of hairpin queues - hairpin_queue_size: control the size (in packets) of the hairpin queues [1] https://lore.kernel.org/all/20230111194608.7f15b9a1@kernel.org/ Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-12-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5: Move needed PTYS functions to core layerGal Pressman
Downstream patches require devlink params to access the PTYS register, move the needed functions from mlx5e to the core layer. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-11-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5e: Add XSK RQ state flag for RQ devlink health diagnosticsAdham Faris
Currently RQ health diagnostics doesn't inform the user whether an RQ is an XSK RQ or not. Address this, by adding XSK state flag to RQ SW state enum in core/en.h. XSK will be '1' if current RQ is an XSK RQ, and it will be '0' if it's not. In this example below, it can be seen that XSK field value is '1' since xdpsock program have been attached to channel 0 before issuing the devlink query command: $ devlink health diagnose auxiliary/mlx5_core.eth.0/65535 reporter rx Output: ======================================================================= Common config: RQ: type: 2 stride size: 4096 size: 16 ts_format: FRC CQ: stride size: 64 size: 1024 RQs: channel ix: 0 rqn: 4236 HW state: 1 WQE counter: 15 posted WQEs: 15 cc: 15 SW State: enabled: 1 recovering: 0 am: 1 no_csum_complete: 1 csum_full: 0 mini_cqe_hw_stridx: 1 shampo: 0 mini_cqe_enhanced: 0 xsk: 1 CQ: cqn: 1085 HW status: 0 ci: 0 size: 1024 EQ: eqn: 7 irqn: 32 vecidx: 0 ci: 5 size: 2048 ICOSQ: sqn: 4229 HW state: 1 cc: 158 pc: 158 WQE size: 2048 CQ: cqn: 1080 cc: 1 size: 2048 Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-10-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5e: Expose SQ SW state as part of SQ health diagnosticsAdham Faris
Add SQ SW state textual representation to devlink health diagnostics for tx reporter. SQ SW state can be retrieved by issuing the devlink command below: $ devlink health diagnose auxiliary/mlx5_core.eth.0/65535 reporter tx Output ======================================================================= Common Config: SQ: stride size: 64 size: 1024 ts_format: FRC CQ: stride size: 64 size: 1024 SQs: channel ix: 0 tc: 0 txq ix: 0 sqn: 4170 HW state: 1 stopped: false cc: 0 pc: 0 SW State: enabled: 1 mpwqe: 1 recovering: 0 ipsec: 0 am: 1 vlan_need_l2_inline: 1 pending_xsk_tx: 0 pending_tls_rx_resync: 0 xdp_multibuf: 0 CQ: cqn: 1031 HW status: 0 ci: 0 size: 1024 EQ: eqn: 7 irqn: 32 vecidx: 0 ci: 2 size: 2048 Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-9-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5e: Stringify RQ SW state in RQ devlink health diagnosticsAdham Faris
One of the parameters that is retrieved/printed as a response to devlink health diagnostics for rx reporter is the RQ SW state. It's printed as a bitmap decimal number. Printing it as bitmap is problematic and non informative. In addition User can't count on SW state without accessing the kernel sources (mlx5e rq state enum in en.h). This patch prints RQ SW state in a textual representation, as a key: value pairs, where disabled rq states will appear as '0' and enabled ones will appear as '1'. See below the generated output for rx health diagnostics devlink command: $ devlink health diagnose auxiliary/mlx5_core.eth.0/65535 reporter rx Before: ======================================================================= Common config: RQ: type: 2 stride size: 2048 size: 8 ts_format: FRC CQ: stride size: 64 size: 1024 RQs: channel ix: 0 rqn: 4172 HW state: 1 SW state: 37 WQE counter: 7 posted WQEs: 7 cc: 7 CQ: cqn: 1033 HW status: 0 ci: 0 size: 1024 EQ: eqn: 7 irqn: 32 vecidx: 0 ci: 2 size: 2048 ICOSQ: sqn: 4169 HW state: 1 cc: 74 pc: 74 WQE size: 128 CQ: cqn: 1030 cc: 1 size: 128 channel ix: 1 ... . . After: ======================================================================= Common config: RQ: type: 2 stride size: 2048 size: 8 ts_format: FRC CQ: stride size: 64 size: 1024 RQs: channel ix: 0 rqn: 4172 HW state: 1 WQE counter: 7 posted WQEs: 7 cc: 7 SW State: enabled: 1 recovering: 0 am: 1 no_csum_complete: 0 csum_full: 0 mini_cqe_hw_stridx: 1 shampo: 0 mini_cqe_enhanced: 0 CQ: cqn: 1033 HW status: 0 ci: 0 size: 1024 EQ: eqn: 7 irqn: 32 vecidx: 0 ci: 2 size: 2048 ICOSQ: sqn: 4169 HW state: 1 cc: 74 pc: 74 WQE size: 128 CQ: cqn: 1030 cc: 1 size: 128 channel: ix: 1 ... . . Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-8-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5e: Rename RQ/SQ adaptive moderation state flagAdham Faris
Dynamic interrupt moderation RQ and SQ feature represented by MLX5E_RQ_STATE_AM and MLX5E_SQ_STATE_AM enums respectively, is not consistent with the feature naming in the driver, and with the formal feature and library names. Hence, change MLX5E_RQ_STATE_AM and MLX5E_SQ_STATE_AM enum type names in core/en.h to MLX5E_RQ_STATE_DIM and MLX5E_SQ_STATE_DIM respectively. Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-7-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5e: Utilize the entire fifoRahul Rameshbabu
Previous check was comparing against the fifo mask. The mask is size of the fifo (power of two) minus one, so a less than or equal comparator should be used for checking if the fifo has room for the SKB. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-6-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5: Implement thermal zoneSandipan Patra
Implement thermal zone support for mlx5 based HW. The NIC uses temperature sensor provided by ASIC to report current temperature to thermal core. Signed-off-by: Sandipan Patra <spatra@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-5-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5: Add comment to mlx5_devlink_params_register()Jiri Pirko
Add comment to mlx5_devlink_params_register() functions so it is clear that only driver init params should be registered here. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-4-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5: Stop waiting for PCI up if teardown was triggeredMoshe Shemesh
If driver teardown is called while PCI is turned off, there is a race between health recovery and teardown. If health recovery already started it will wait 60 sec trying to see if PCI gets back and it can recover, but actually there is no need to wait anymore once teardown was called. Use the MLX5_BREAK_FW_WAIT flag which is set on driver teardown to break waiting for PCI up. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-3-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net/mlx5: remove redundant clear_bitMoshe Shemesh
When shutdown or remove callbacks are called the driver sets the flag MLX5_BREAK_FW_WAIT, to stop waiting for FW as teardown was called. There is no need to clear the bit as once shutdown or remove were called as there is no way back, the driver is going down. Furthermore, if not cleared the flag can be used also in other loops where we may wait while teardown was already called. Use test_bit() instead of test_and_clear_bit() as there is no need to clear the flag. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net: phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol()Vladimir Oltean
Since the blamed commit, phy_ethtool_get_wol() and phy_ethtool_set_wol() acquire phydev->lock, but the mscc phy driver implementations, vsc85xx_wol_get() and vsc85xx_wol_set(), acquire the same lock as well, resulting in a deadlock. $ ip link set swp3 down ============================================ WARNING: possible recursive locking detected mscc_felix 0000:00:00.5 swp3: Link is Down -------------------------------------------- ip/375 is trying to acquire lock: ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: vsc85xx_wol_get+0x2c/0xf4 but task is already holding lock: ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&dev->lock); lock(&dev->lock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by ip/375: #0: ffffd43b2a955788 (rtnl_mutex){+.+.}-{4:4}, at: rtnetlink_rcv_msg+0x144/0x58c #1: ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c Call trace: __mutex_lock+0x98/0x454 mutex_lock_nested+0x2c/0x38 vsc85xx_wol_get+0x2c/0xf4 phy_ethtool_get_wol+0x50/0x6c phy_suspend+0x84/0xcc phy_state_machine+0x1b8/0x27c phy_stop+0x70/0x154 phylink_stop+0x34/0xc0 dsa_port_disable_rt+0x2c/0xa4 dsa_slave_close+0x38/0xec __dev_close_many+0xc8/0x16c __dev_change_flags+0xdc/0x218 dev_change_flags+0x24/0x6c do_setlink+0x234/0xea4 __rtnl_newlink+0x46c/0x878 rtnl_newlink+0x50/0x7c rtnetlink_rcv_msg+0x16c/0x58c Removing the mutex_lock(&phydev->lock) calls from the driver restores the functionality. Fixes: 2f987d486610 ("net: phy: Add locks to ethtool functions") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230314153025.2372970-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net: phy: micrel: drop superfluous use of temp variableWolfram Sang
'temp' was used before commit c0c99d0cd107 ("net: phy: micrel: remove the use of .ack_interrupt()") refactored the code. Now, we can simplify it a little. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20230314124928.44948-1-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15net: phy: update obsolete comment about PHY_STARTINGWolfram Sang
Commit 899a3cbbf77a ("net: phy: remove states PHY_STARTING and PHY_PENDING") missed to update a comment in phy_probe. Remove superfluous "Description:" prefix while we are here. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/20230314124856.44878-1-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: refactor mailbox overflow detection Jake Keller says: The primary motivation of this series is to cleanup and refactor the mailbox overflow detection logic such that it will work with Scalable IOV. In addition a few other minor cleanups are done while I was working on the code in the area. First, the mailbox overflow functions in ice_vf_mbx.c are refactored to store the data per-VF as an embedded structure in struct ice_vf, rather than stored separately as a fixed-size array which only works with Single Root IOV. This reduces the overall memory footprint when only a handful of VFs are used. The overflow detection functions are also cleaned up to reduce the need for multiple separate calls to determine when to report a VF as potentially malicious. Finally, the ice_is_malicious_vf function is cleaned up and moved into ice_virtchnl.c since it is not Single Root IOV specific, and thus does not belong in ice_sriov.c I could probably have done this in fewer patches, but I split pieces out to hopefully aid in reviewing the overall sequence of changes. This does cause some additional thrash as it results in intermediate versions of the refactor, but I think its worth it for making each step easier to understand. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: call ice_is_malicious_vf() from ice_vc_process_vf_msg() ice: move ice_is_malicious_vf() to ice_virtchnl.c ice: print message if ice_mbx_vf_state_handler returns an error ice: pass mbxdata to ice_is_malicious_vf() ice: remove unnecessary &array[0] and just use array ice: always report VF overflowing mailbox even without PF VSI ice: declare ice_vc_process_vf_msg in ice_virtchnl.h ice: initialize mailbox snapshot earlier in PF init ice: merge ice_mbx_report_malvf with ice_mbx_vf_state_handler ice: remove ice_mbx_deinit_snapshot ice: move VF overflow message count into struct ice_mbx_vf_info ice: track malicious VFs in new ice_mbx_vf_info structure ice: convert ice_mbx_clear_malvf to void and use WARN ice: re-order ice_mbx_reset_snapshot function ==================== Link: https://lore.kernel.org/r/20230313182123.483057-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15ptp: ines: drop of_match_ptr for ID tableKrzysztof Kozlowski
The driver can match only via the DT table so the table should be always used and the of_match_ptr does not have any sense (this also allows ACPI matching via PRP0001, even though it might not be relevant here). This also fixes !CONFIG_OF error: drivers/ptp/ptp_ines.c:783:34: error: ‘ines_ptp_ctrl_of_match’ defined but not used [-Werror=unused-const-variable=] Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://lore.kernel.org/r/20230312132637.352755-1-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15veth: Fix use after free in XDP_REDIRECTShawn Bohrer
Commit 718a18a0c8a6 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb") introduced a bug where it tried to use pskb_expand_head() if the headroom was less than XDP_PACKET_HEADROOM. This however uses kmalloc to expand the head, which will later allow consume_skb() to free the skb while is it still in use by AF_XDP. Previously if the headroom was less than XDP_PACKET_HEADROOM we continued on to allocate a new skb from pages so this restores that behavior. BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0 Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640 CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G O 6.1.4-cloudflare-kasan-2023.1.2 #1 Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 print_report+0x170/0x473 ? __xsk_rcv+0x18d/0x2c0 kasan_report+0xad/0x130 ? __xsk_rcv+0x18d/0x2c0 kasan_check_range+0x149/0x1a0 memcpy+0x20/0x60 __xsk_rcv+0x18d/0x2c0 __xsk_map_redirect+0x1f3/0x490 ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] xdp_do_redirect+0x5ca/0xd60 veth_xdp_rcv_skb+0x935/0x1ba0 [veth] ? __netif_receive_skb_list_core+0x671/0x920 ? veth_xdp+0x670/0x670 [veth] veth_xdp_rcv+0x304/0xa20 [veth] ? do_xdp_generic+0x150/0x150 ? veth_xdp_rcv_one+0xde0/0xde0 [veth] ? _raw_spin_lock_bh+0xe0/0xe0 ? newidle_balance+0x887/0xe30 ? __perf_event_task_sched_in+0xdb/0x800 veth_poll+0x139/0x571 [veth] ? veth_xdp_rcv+0xa20/0xa20 [veth] ? _raw_spin_unlock+0x39/0x70 ? finish_task_switch.isra.0+0x17e/0x7d0 ? __switch_to+0x5cf/0x1070 ? __schedule+0x95b/0x2640 ? io_schedule_timeout+0x160/0x160 __napi_poll+0xa1/0x440 napi_threaded_poll+0x3d1/0x460 ? __napi_poll+0x440/0x440 ? __kthread_parkme+0xc6/0x1f0 ? __napi_poll+0x440/0x440 kthread+0x2a2/0x340 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Freed by task 148640: kasan_save_stack+0x23/0x50 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 ____kasan_slab_free+0x169/0x1d0 slab_free_freelist_hook+0xd2/0x190 __kmem_cache_free+0x1a1/0x2f0 skb_release_data+0x449/0x600 consume_skb+0x9f/0x1c0 veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] veth_xdp_rcv+0x304/0xa20 [veth] veth_poll+0x139/0x571 [veth] __napi_poll+0xa1/0x440 napi_threaded_poll+0x3d1/0x460 kthread+0x2a2/0x340 ret_from_fork+0x22/0x30 The buggy address belongs to the object at ffff888976250000 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 340 bytes inside of 2048-byte region [ffff888976250000, ffff888976250800) The buggy address belongs to the physical page: page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250 head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0 flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00 raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 718a18a0c8a6 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb") Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@kernel.org> Link: https://lore.kernel.org/r/20230314153351.2201328-1-sbohrer@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15io_uring: rsrc: Optimize return value variable 'ret'Li zeming
The initialization assignment of the variable ret is changed to 0, only in 'goto fail;' Use the ret variable as the function return value. Signed-off-by: Li zeming <zeming@nfschina.com> Link: https://lore.kernel.org/r/20230317182538.3027-1-zeming@nfschina.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-15net/mlx5e: TC, Remove error message log printOz Shlomo
The cited commit attempts to update the hw stats when dumping tc actions. However, the driver may be called to update the stats of a police action that may not be in hardware. In such cases the driver will fail to lookup the police action object and will output an error message both to extack and dmesg. The dmesg error is confusing as it may not indicate an actual error. Remove the dmesg error. Fixes: 2b68d659a704 ("net/mlx5e: TC, support per action stats") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: TC, fix cloned flow attributeOz Shlomo
Currently the cloned flow attr resets the original tc action cookies count. Fix that by resetting the cloned flow attribute. Fixes: cca7eac13856 ("net/mlx5e: TC, store tc action cookies per attr") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: TC, fix missing error codeOz Shlomo
Missing error code when mlx5e_tc_act_stats_create fails Fixes: d13674b1d14c ("net/mlx5e: TC, map tc action cookie to a hw counter") Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/sched: TC, fix raw counter initializationOz Shlomo
Freed counters may be reused by fs core. As such, raw counters may not be initialized to zero. Cache the counter values when the action stats object is initialized to have a proper base value for calculating the difference from the previous query. Fixes: 2b68d659a704 ("net/mlx5e: TC, support per action stats") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: Lower maximum allowed MTU in XSK to match XDP prerequisitesAdham Faris
XSK redirecting XDP programs require linearity, hence applies restrictions on the MTU. For PAGE_SIZE=4K, MTU shouldn't exceed 3498. Features that contradict with XDP such HW-LRO and HW-GRO are enforced by the driver in advance, during XSK params validation, except for MTU, which was not enforced before this patch. This has been spotted during test scenario described below: Attaching xdpsock program (PAGE_SIZE=4K), with MTU < 3498, detaching XDP program, changing the MTU to arbitrary value in the range [3499, 3754], attaching XDP program again, which ended up with failure since MTU is > 3498. This commit lowers the XSK MTU limitation to be aligned with XDP MTU limitation, since XSK socket is meaningless without XDP program. Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5: Set BREAK_FW_WAIT flag first when removing driverShay Drory
Currently, BREAK_FW_WAIT flag is set after syncing with fw_reset. However, fw_reset can call mlx5_load_one() which is waiting for fw init bit and BREAK_FW_WAIT flag is intended to stop. e.g.: the driver might wait on a loop it should exit. Fix it by setting the flag before syncing with fw_reset. Fixes: 8324a02c342a ("net/mlx5: Add exit route when waiting for FW") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: kTLS, Fix missing error unwind on unsupported cipher typeGal Pressman
Do proper error unwinding when adding an unsupported TX/RX cipher type. Move the switch case prior to key creation so there's less to unwind, and change the goto label name to describe the action performed instead of what failed. Fixes: 4960c414db35 ("net/mlx5e: Support 256 bit keys with kTLS device offload") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: Fix cleanup null-ptr deref on encap lockPaul Blakey
During module is unloaded while a peer tc flow is still offloaded, first the peer uplink rep profile is changed to a nic profile, and so neigh encap lock is destroyed. Next during unload, the VF reps netdevs are unregistered which causes the original non-peer tc flow to be deleted, which deletes the peer flow. The peer flow deletion detaches the encap entry and try to take the already destroyed encap lock, causing the below trace. Fix this by clearing peer flows during tc eswitch cleanup (mlx5e_tc_esw_cleanup()). Relevant trace: [ 4316.837128] BUG: kernel NULL pointer dereference, address: 00000000000001d8 [ 4316.842239] RIP: 0010:__mutex_lock+0xb5/0xc40 [ 4316.851897] Call Trace: [ 4316.852481] <TASK> [ 4316.857214] mlx5e_rep_neigh_entry_release+0x93/0x790 [mlx5_core] [ 4316.858258] mlx5e_rep_encap_entry_detach+0xa7/0xf0 [mlx5_core] [ 4316.859134] mlx5e_encap_dealloc+0xa3/0xf0 [mlx5_core] [ 4316.859867] clean_encap_dests.part.0+0x5c/0xe0 [mlx5_core] [ 4316.860605] mlx5e_tc_del_fdb_flow+0x32a/0x810 [mlx5_core] [ 4316.862609] __mlx5e_tc_del_fdb_peer_flow+0x1a2/0x250 [mlx5_core] [ 4316.863394] mlx5e_tc_del_flow+0x(/0x630 [mlx5_core] [ 4316.864090] mlx5e_flow_put+0x5f/0x100 [mlx5_core] [ 4316.864771] mlx5e_delete_flower+0x4de/0xa40 [mlx5_core] [ 4316.865486] tc_setup_cb_reoffload+0x20/0x80 [ 4316.865905] fl_reoffload+0x47c/0x510 [cls_flower] [ 4316.869181] tcf_block_playback_offloads+0x91/0x1d0 [ 4316.869649] tcf_block_unbind+0xe7/0x1b0 [ 4316.870049] tcf_block_offload_cmd.isra.0+0x1ee/0x270 [ 4316.879266] tcf_block_offload_unbind+0x61/0xa0 [ 4316.879711] __tcf_block_put+0xa4/0x310 Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows") Fixes: 1418ddd96afd ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>