summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-05-19net/mlx5e: E-Switch: move debug print of adding mac to correct placeRoi Dayan
Move the debug print inside the if clause that actually does the change. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5e: E-Switch, Check device is PF when stopping esw offloadsRoi Dayan
Checking sriov is done on the pci device so it can return true on other devices like SF but nothing should be done in this case. Add a check that the device is PF. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5: Remove redundant vport_group_manager cap checkRoi Dayan
It's enough to check for esw_manager cap for get the esw flow table caps. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5e: E-Switch, Use metadata for vport matching in send-to-vport rulesRoi Dayan
Like other rules use metadata matching if supported instead of source_port. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5e: E-Switch, Allow get vport api if esw existsRoi Dayan
We could have an esw manager device which is not a vport group manager. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5e: E-Switch, Update when to set other vport contextRoi Dayan
Other vport context should be set if vport number is not 0. In case of ECPF, vport 0 represents the host PF representor so also need to set other vport context. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5e: Remove redundant __func__ arg from fs_err() callsRoi Dayan
fs_err() already logs the function name. remote the arg so the function name will not be logged twice. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5e: E-Switch, Remove flow_source check for metadata matchingRoi Dayan
There is no reason to check for flow_source cap to allow metadata matching. When flow_source match is being used the flow_source cap is being checked. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5: E-Switch, Remove redundant checkRoi Dayan
The call to mlx5_eswitch_enable() also does the same check and if E-Switch not supported it returns 0 without any change. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5: Remove redundant esw multiport validate functionRoi Dayan
The function didn't validate the value and doesn't require value validation as it will always be valid true or false values. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19Merge branch 'bpf: Show target_{obj,btf}_id for tracing link'Alexei Starovoitov
Yafang Shao says: ==================== The target_btf_id can help us understand which kernel function is linked by a tracing prog. The target_btf_id and target_obj_id have already been exposed to userspace, so we just need to show them. For some other link types like perf_event and kprobe_multi, it is not easy to find which functions are attached either. We may support ->fill_link_info for them in the future. v1->v2: - Skip showing them in the plain output for the old kernels. (Quentin) - Coding improvement. (Andrii) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-05-19bpftool: Show target_{obj,btf}_id in tracing link infoYafang Shao
The target_btf_id can help us understand which kernel function is linked by a tracing prog. The target_btf_id and target_obj_id have already been exposed to userspace, so we just need to show them. The result as follows, $ tools/bpf/bpftool/bpftool link show 2: tracing prog 13 prog_type tracing attach_type trace_fentry target_obj_id 1 target_btf_id 13964 pids trace(10673) $ tools/bpf/bpftool/bpftool link show -j [{"id":2,"type":"tracing","prog_id":13,"prog_type":"tracing","attach_type":"trace_fentry","target_obj_id":1,"target_btf_id":13964,"pids":[{"pid":10673,"comm":"trace"}]}] Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230517103126.68372-3-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-05-19bpf: Show target_{obj,btf}_id in tracing link fdinfoYafang Shao
The target_btf_id can help us understand which kernel function is linked by a tracing prog. The target_btf_id and target_obj_id have already been exposed to userspace, so we just need to show them. The result as follows, $ cat /proc/10673/fdinfo/10 pos: 0 flags: 02000000 mnt_id: 15 ino: 2094 link_type: tracing link_id: 2 prog_tag: a04f5eef06a7f555 prog_id: 13 attach_type: 24 target_obj_id: 1 target_btf_id: 13964 Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230517103126.68372-2-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-05-19bpf: Fix mask generation for 32-bit narrow loads of 64-bit fieldsWill Deacon
A narrow load from a 64-bit context field results in a 64-bit load followed potentially by a 64-bit right-shift and then a bitwise AND operation to extract the relevant data. In the case of a 32-bit access, an immediate mask of 0xffffffff is used to construct a 64-bit BPP_AND operation which then sign-extends the mask value and effectively acts as a glorified no-op. For example: 0: 61 10 00 00 00 00 00 00 r0 = *(u32 *)(r1 + 0) results in the following code generation for a 64-bit field: ldr x7, [x7] // 64-bit load mov x10, #0xffffffffffffffff and x7, x7, x10 Fix the mask generation so that narrow loads always perform a 32-bit AND operation: ldr x7, [x7] // 64-bit load mov w10, #0xffffffff and w7, w7, w10 Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Krzesimir Nowak <krzesimir@kinvolk.io> Cc: Andrey Ignatov <rdna@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields") Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230518102528.1341-1-will@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-05-19ice: use src VSI instead of src MAC in slow-pathMichal Swiatkowski
The use of a source MAC to direct packets from the VF to the corresponding port representor is only ok if there is only one MAC on a VF. To support this functionality when the number of MACs on a VF is greater, it is necessary to match a source VSI instead of a source MAC. Let's use the new switch API that allows matching on metadata. If MAC isn't used in match criteria there is no need to handle adding rule after virtchnl command. Instead add new rule while port representor is being configured. Remove rule_added field, checking for sp_rule can be used instead. Remove also checking for switchdev running in deleting rule as it can be called from unroll context when running flag isn't set. Checking for sp_rule covers both context (with and without running flag). Rules are added in eswitch configuration flow, so there is no need to have replay function. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-05-19ice: allow matching on meta dataMichal Swiatkowski
Add meta data matching criteria in the same place as protocol matching criteria. There is no need to add meta data as special words after parsing all lookups. Trade meta data in the same why as other lookups. The one difference between meta data lookups and protocol lookups is that meta data doesn't impact how the packets looks like. Because of that ignore it when filling testing packet. Match on tunnel type meta data always if tunnel type is different than TNL_LAST. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-05-19ice: specify field names in ice_prot_ext initMichal Swiatkowski
Anonymous initializers are now discouraged. Define ICE_PROTCOL_ENTRY macro to rewrite anonymous initializers to named one. No functional changes here. Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-05-19ice: remove redundant Rx field from rule infoMichal Swiatkowski
Information about the direction is currently stored in sw_act.flag. There is no need to duplicate it in another field. Setting direction flag doesn't mean that there is a match criteria for direction in rule. It is only a information for HW from where switch id should be collected (VSI or port). In current implementation of advance rule handling, without matching for direction meta data, we can always set one the same flag and everything will work the same. Ability to match on direction meta data will be added in follow up patches. Recipe 0, 3 and 9 loaded from package has direction match criteria, but they are handled in other function. Move ice_adv_rule_info fields to avoid holes. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-05-19ice: define meta data to match in switchMichal Swiatkowski
Add description for each meta data. Redefine tunnel mask to match only tunneled MAC and tunneled VLAN. It shouldn't try to match other flags (previously it was 0xff, it is redundant). VLAN mask was 0xd000, change it to 0xf000. 4 last bits are flags depending on the same field in packets (VLAN tag). Because of that, It isn't harmful to match also on ITAG. Group all MDID and MDID offsets into enums to keep things organized. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-05-19perf bench syscall: Fix __NR_execve undeclared build errorTiezhu Yang
The __NR_execve definition for i386 was deleted by mistake in the commit ece7f7c0507c ("perf bench syscall: Add fork syscall benchmark"), add it to fix the build error on i386. Fixes: ece7f7c0507cc147 ("perf bench syscall: Add fork syscall benchmark") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: loongson-kernel@lists.loongnix.cn Closes: https://lore.kernel.org/all/CA+G9fYvgBR1iB0CorM8OC4AM_w_tFzyQKHc+rF6qPzJL=TbfDQ@mail.gmail.com/ Link: https://lore.kernel.org/r/1684480657-2375-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-19Merge branch 'pm-tools'Rafael J. Wysocki
Merge cpupower utility fixes for 6.4-rc3: - Read TSC on each CPU right before reading MPERF so as to reduce the potential time difference between the TSC and MPERF accesses and improve the C0 percentage calculation (Wyes Karny). - Fix a possible file handle leak and clean up the code in sysfs_get_enabled() (Hao Zeng). * pm-tools: cpupower: Make TSC read per CPU for Mperf monitor cpupower:Fix resource leaks in sysfs_get_enabled()
2023-05-19Merge tag 'linux-cpupower-6.4-rc3' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux Pull cpupower utility fixes for 6.4-rc3 from Shuah Khan: "This cpupower fixes update for Linux 67.4-rc3 consists of: - a resource leak fix - fix drift in C0 percentage calculation due to System-wide TSC read. To lower this drift read TSC per CPU and also just after mperf read. This technique improves C0 percentage calculation in Mperf monitor" * tag 'linux-cpupower-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux: cpupower: Make TSC read per CPU for Mperf monitor cpupower:Fix resource leaks in sysfs_get_enabled()
2023-05-19fbdev: omapfb: panel-tpo-td043mtea1: fix error code in probe()Dan Carpenter
This was using the wrong variable, "r", instead of "ddata->vcc_reg", so it returned success instead of a negative error code. Fixes: 0d3dbeb8142a ("video: fbdev: omapfb: panel-tpo-td043mtea1: Make use of the helper function dev_err_probe()") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Helge Deller <deller@gmx.de>
2023-05-19perf test attr: Fix python SafeConfigParser() deprecation warningIan Rogers
Address the warning: ``` tests/attr.py:155: DeprecationWarning: The SafeConfigParser class has been renamed to ConfigParser in Python 3.2. This alias will be removed in Python 3.12. Use ConfigParser directly instead. parser = configparser.SafeConfigParser() ``` by removing the word 'Safe'. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20230517225707.2682235-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-19perf test attr: Update no event/metric expectationsIan Rogers
Previously hard coded events/metrics were used, update for the use of the TopdownL1 json metric group. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Fixes: 94b1a603fca78388 ("perf stat: Add TopdownL1 metric as a default if present") Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20230517225707.2682235-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-19net: arc: Make arc_emac_remove() return voidUwe Kleine-König
The function returns zero unconditionally. Change it to return void instead which simplifies its callers as error handing becomes unnecessary. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-19driver core: class: properly reference count class_dev_iter()Greg Kroah-Hartman
When class_dev_iter is initialized, the reference count for the subsys private structure is incremented, but never decremented, causing a memory leak over time. To resolve this, save off a pointer to the internal structure into the class_dev_iter structure and then when the iterator is finished, drop the reference count. Reported-and-tested-by: syzbot+e7afd76ad060fa0d2605@syzkaller.appspotmail.com Fixes: 7b884b7f24b4 ("driver core: class.c: convert to only use class_to_subsys") Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> Cc: Alan Stern <stern@rowland.harvard.edu> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> Link: https://lore.kernel.org/r/2023051610-stove-condense-9a77@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-19net: fec: add dma_wmb to ensure correct descriptor valuesShenwei Wang
Two dma_wmb() are added in the XDP TX path to ensure proper ordering of descriptor and buffer updates: 1. A dma_wmb() is added after updating the last BD to make sure the updates to rest of the descriptor are visible before transferring ownership to FEC. 2. A dma_wmb() is also added after updating the bdp to ensure these updates are visible before updating txq->bd.cur. 3. Start the xmit of the frame immediately right after configuring the tx descriptor. Fixes: 6d6b39f180b8 ("net: fec: add initial XDP support") Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-19MAINTAINERS: add myself as maintainer for enetcVladimir Oltean
I would like to be copied on new patches submitted on this driver. I am relatively familiar with the code, having practically maintained it for a while. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-19octeontx2-pf: Fix TSOv6 offloadSunil Goutham
HW adds segment size to the payload length in the IPv6 header. Fix payload length to just TCP header length instead of 'TCP header size + IPv6 header size'. Fixes: 86d7476078b8 ("octeontx2-pf: TCP segmentation offload support") Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-19sfc: fix devlink info error handlingAlejandro Lucero
Avoid early devlink info return if errors arise with MCDI commands executed for getting the required info from the device. The rationale is some commands can fail but later ones could still give useful data. Moreover, some nvram partitions could not be present which needs to be handled as a non error. The specific errors are reported through system messages and if any error appears, it will be reported generically through extack. Fixes 14743ddd2495 ("sfc: add devlink info support for ef100") Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-19net/smc: Reset connection when trying to use SMCRv2 fails.Wen Gu
We found a crash when using SMCRv2 with 2 Mellanox ConnectX-4. It can be reproduced by: - smc_run nginx - smc_run wrk -t 32 -c 500 -d 30 http://<ip>:<port> BUG: kernel NULL pointer dereference, address: 0000000000000014 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8000000108713067 P4D 8000000108713067 PUD 151127067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 4 PID: 2441 Comm: kworker/4:249 Kdump: loaded Tainted: G W E 6.4.0-rc1+ #42 Workqueue: smc_hs_wq smc_listen_work [smc] RIP: 0010:smc_clc_send_confirm_accept+0x284/0x580 [smc] RSP: 0018:ffffb8294b2d7c78 EFLAGS: 00010a06 RAX: ffff8f1873238880 RBX: ffffb8294b2d7dc8 RCX: 0000000000000000 RDX: 00000000000000b4 RSI: 0000000000000001 RDI: 0000000000b40c00 RBP: ffffb8294b2d7db8 R08: ffff8f1815c5860c R09: 0000000000000000 R10: 0000000000000400 R11: 0000000000000000 R12: ffff8f1846f56180 R13: ffff8f1815c5860c R14: 0000000000000001 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8f1aefd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000014 CR3: 00000001027a0001 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? mlx5_ib_map_mr_sg+0xa1/0xd0 [mlx5_ib] ? smcr_buf_map_link+0x24b/0x290 [smc] ? __smc_buf_create+0x4ee/0x9b0 [smc] smc_clc_send_accept+0x4c/0xb0 [smc] smc_listen_work+0x346/0x650 [smc] ? __schedule+0x279/0x820 process_one_work+0x1e5/0x3f0 worker_thread+0x4d/0x2f0 ? __pfx_worker_thread+0x10/0x10 kthread+0xe5/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> During the CLC handshake, server sequentially tries available SMCRv2 and SMCRv1 devices in smc_listen_work(). If an SMCRv2 device is found. SMCv2 based link group and link will be assigned to the connection. Then assumed that some buffer assignment errors happen later in the CLC handshake, such as RMB registration failure, server will give up SMCRv2 and try SMCRv1 device instead. But the resources assigned to the connection won't be reset. When server tries SMCRv1 device, the connection creation process will be executed again. Since conn->lnk has been assigned when trying SMCRv2, it will not be set to the correct SMCRv1 link in smcr_lgr_conn_assign_link(). So in such situation, conn->lgr points to correct SMCRv1 link group but conn->lnk points to the SMCRv2 link mistakenly. Then in smc_clc_send_confirm_accept(), conn->rmb_desc->mr[link->link_idx] will be accessed. Since the link->link_idx is not correct, the related MR may not have been initialized, so crash happens. | Try SMCRv2 device first | |-> conn->lgr: assign existed SMCRv2 link group; | |-> conn->link: assign existed SMCRv2 link (link_idx may be 1 in SMC_LGR_SYMMETRIC); | |-> sndbuf & RMB creation fails, quit; | | Try SMCRv1 device then | |-> conn->lgr: create SMCRv1 link group and assign; | |-> conn->link: keep SMCRv2 link mistakenly; | |-> sndbuf & RMB creation succeed, only RMB->mr[link_idx = 0] | initialized. | | Then smc_clc_send_confirm_accept() accesses | conn->rmb_desc->mr[conn->link->link_idx, which is 1], then crash. v This patch tries to fix this by cleaning conn->lnk before assigning link. In addition, it is better to reset the connection and clean the resources assigned if trying SMCRv2 failed in buffer creation or registration. Fixes: e49300a6bf62 ("net/smc: add listen processing for SMC-Rv2") Link: https://lore.kernel.org/r/20220523055056.2078994-1-liuyacan@corp.netease.com/ Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-19selftests: fib_tests: mute cleanup error messagePo-Hsu Lin
In the end of the test, there will be an error message induced by the `ip netns del ns1` command in cleanup() Tests passed: 201 Tests failed: 0 Cannot remove namespace file "/run/netns/ns1": No such file or directory This can even be reproduced with just `./fib_tests.sh -h` as we're calling cleanup() on exit. Redirect the error message to /dev/null to mute it. V2: Update commit message and fixes tag. V3: resubmit due to missing netdev ML in V2 Fixes: b60417a9f2b8 ("selftest: fib_tests: Always cleanup before exit") Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-19net/mlx5e: do as little as possible in napi poll when budget is 0Jakub Kicinski
NAPI gets called with budget of 0 from netpoll, which has interrupts disabled. We should try to free some space on Tx rings and nothing else. Specifically do not try to handle XDP TX or try to refill Rx buffers - we can't use the page pool from IRQ context. Don't check if IRQs moved, either, that makes no sense in netpoll. Netpoll calls _all_ the rings from whatever CPU it happens to be invoked on. In general do as little as possible, the work quickly adds up when there's tens of rings to poll. The immediate stack trace I was seeing is: __do_softirq+0xd1/0x2c0 __local_bh_enable_ip+0xc7/0x120 </IRQ> <TASK> page_pool_put_defragged_page+0x267/0x320 mlx5e_free_xdpsq_desc+0x99/0xd0 mlx5e_poll_xdpsq_cq+0x138/0x3b0 mlx5e_napi_poll+0xc3/0x8b0 netpoll_poll_dev+0xce/0x150 AFAIU page pool takes a BH lock, releases it and since BH is now enabled tries to run softirqs. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Fixes: 60bbf7eeef10 ("mlx5: use page_pool for xdp_return_frame call") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-19Merge branch 'tls-fixes'David S. Miller
Jakub Kicinski says: ==================== tls: rx: strp: fix inline crypto offload The local strparser version I added to TLS does not preserve decryption status, which breaks inline crypto (NIC offload). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-19tls: rx: strp: don't use GFP_KERNEL in softirq contextJakub Kicinski
When receive buffer is small, or the TCP rx queue looks too complicated to bother using it directly - we allocate a new skb and copy data into it. We already use sk->sk_allocation... but nothing actually sets it to GFP_ATOMIC on the ->sk_data_ready() path. Users of HW offload are far more likely to experience problems due to scheduling while atomic. "Copy mode" is very rarely triggered with SW crypto. Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Tested-by: Shai Amiram <samiram@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-19tls: rx: strp: preserve decryption status of skbs when neededJakub Kicinski
When receive buffer is small we try to copy out the data from TCP into a skb maintained by TLS to prevent connection from stalling. Unfortunately if a single record is made up of a mix of decrypted and non-decrypted skbs combining them into a single skb leads to loss of decryption status, resulting in decryption errors or data corruption. Similarly when trying to use TCP receive queue directly we need to make sure that all the skbs within the record have the same status. If we don't the mixed status will be detected correctly but we'll CoW the anchor, again collapsing it into a single paged skb without decrypted status preserved. So the "fixup" code will not know which parts of skb to re-encrypt. Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Tested-by: Shai Amiram <samiram@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-19tls: rx: strp: factor out copying skb dataJakub Kicinski
We'll need to copy input skbs individually in the next patch. Factor that code out (without assuming we're copying a full record). Tested-by: Shai Amiram <samiram@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-19tls: rx: strp: fix determining record length in copy modeJakub Kicinski
We call tls_rx_msg_size(skb) before doing skb->len += chunk. So the tls_rx_msg_size() code will see old skb->len, most likely leading to an over-read. Worst case we will over read an entire record, next iteration will try to trim the skb but may end up turning frag len negative or discarding the subsequent record (since we already told TCP we've read it during previous read but now we'll trim it out of the skb). Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Tested-by: Shai Amiram <samiram@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-19tls: rx: strp: force mixed decrypted records into copy modeJakub Kicinski
If a record is partially decrypted we'll have to CoW it, anyway, so go into copy mode and allocate a writable skb right away. This will make subsequent fix simpler because we won't have to teach tls_strp_msg_make_copy() how to copy skbs while preserving decrypt status. Tested-by: Shai Amiram <samiram@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-19tls: rx: strp: set the skb->len of detached / CoW'ed skbsJakub Kicinski
alloc_skb_with_frags() fills in page frag sizes but does not set skb->len and skb->data_len. Set those correctly otherwise device offload will most likely generate an empty skb and hit the BUG() at the end of __skb_nsg(). Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Tested-by: Shai Amiram <samiram@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-19tls: rx: device: fix checking decryption statusJakub Kicinski
skb->len covers the entire skb, including the frag_list. In fact we're guaranteed that rxm->full_len <= skb->len, so since the change under Fixes we were not checking decrypt status of any skb but the first. Note that the skb_pagelen() added here may feel a bit costly, but it's removed by subsequent fixes, anyway. Reported-by: Tariq Toukan <tariqt@nvidia.com> Fixes: 86b259f6f888 ("tls: rx: device: bound the frag walk") Tested-by: Shai Amiram <samiram@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-18Merge branch 'mptcp-refactor-inet_accept-and-mib-updates'Jakub Kicinski
Mat Martineau says: ==================== mptcp: Refactor inet_accept() and MIB updates Patches 1 and 2 refactor inet_accept() to provide a new __inet_accept() that's usable with locked sockets, and then make use of that helper to simplify mptcp_stream_accept(). Patches 3 and 4 add some new MIBS related to MPTCP address advertisement and update related selftest scripts. Patch 5 modifies the selftests to ensure MIBS are only printed once when a test case fails. ==================== Link: https://lore.kernel.org/r/20230516-send-net-next-20220516-v1-0-e91822b7b6e0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-18selftests: mptcp: centralize stats dumpingPaolo Abeni
If a test case fails, the mptcp_join.sh script can dump the netns MIBs multiple times, leading to confusing output. Let's dump such info only once per test-case, when needed. This additionally allow removing some code duplication. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-18selftests: mptcp: add explicit check for new mibsPaolo Abeni
Instead of duplicating the all existing TX check with the TX side, add the new ones on selected test cases. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-18mptcp: introduces more address related mibsPaolo Abeni
Currently we don't track explicitly a few events related to address management suboption handling; this patch adds new mibs for ADD_ADDR and RM_ADDR options tx and for missed tx events due to internal storage exhaustion. The self-tests must be updated to properly handle different mibs with the same/shared prefix. Additionally removes a couple of warning tracking the loss event. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/378 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-18mptcp: refactor mptcp_stream_accept()Paolo Abeni
Rewrite the mptcp socket accept op, leveraging the new __inet_accept() helper. This way we can avoid acquiring the new socket lock twice and we can avoid a couple of indirect calls. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-18inet: factor out locked section of inet_accept() in a new helperPaolo Abeni
No functional changes intended. The new helper will be used by the MPTCP protocol in the next patch to avoid duplicating a few LoC. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-18Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-05-17 (ice, MAINTAINERS) This series contains updates to ice driver and MAINTAINERS file. Paul refactors PHY to link mode reporting and updates some PHY types to report more accurate link modes for ice. Dave removes mutual exclusion policy between LAG and SR-IOV in ice driver. Jesse updates link for Intel Wired LAN in the MAINTAINERS file. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: MAINTAINERS: update Intel Ethernet links ice: Remove LAG+SRIOV mutual exclusion ice: update PHY type to ethtool link mode mapping ice: refactor PHY type to ethtool link mode ice: update ICE_PHY_TYPE_HIGH_MAX_INDEX ==================== Link: https://lore.kernel.org/r/20230517165530.3179965-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-18net: cdc_ncm: Deal with too low values of dwNtbOutMaxSizeTudor Ambarus
Currently in cdc_ncm_check_tx_max(), if dwNtbOutMaxSize is lower than the calculated "min" value, but greater than zero, the logic sets tx_max to dwNtbOutMaxSize. This is then used to allocate a new SKB in cdc_ncm_fill_tx_frame() where all the data is handled. For small values of dwNtbOutMaxSize the memory allocated during alloc_skb(dwNtbOutMaxSize, GFP_ATOMIC) will have the same size, due to how size is aligned at alloc time: size = SKB_DATA_ALIGN(size); size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); Thus we hit the same bug that we tried to squash with commit 2be6d4d16a084 ("net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero") Low values of dwNtbOutMaxSize do not cause an issue presently because at alloc_skb() time more memory (512b) is allocated than required for the SKB headers alone (320b), leaving some space (512b - 320b = 192b) for CDC data (172b). However, if more elements (for example 3 x u64 = [24b]) were added to one of the SKB header structs, say 'struct skb_shared_info', increasing its original size (320b [320b aligned]) to something larger (344b [384b aligned]), then suddenly the CDC data (172b) no longer fits in the spare SKB data area (512b - 384b = 128b). Consequently the SKB bounds checking semantics fails and panics: skbuff: skb_over_panic: text:ffffffff831f755b len:184 put:172 head:ffff88811f1c6c00 data:ffff88811f1c6c00 tail:0xb8 end:0x80 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:113! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 57 Comm: kworker/0:2 Not tainted 5.15.106-syzkaller-00249-g19c0ed55a470 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023 Workqueue: mld mld_ifc_work RIP: 0010:skb_panic net/core/skbuff.c:113 [inline] RIP: 0010:skb_over_panic+0x14c/0x150 net/core/skbuff.c:118 [snip] Call Trace: <TASK> skb_put+0x151/0x210 net/core/skbuff.c:2047 skb_put_zero include/linux/skbuff.h:2422 [inline] cdc_ncm_ndp16 drivers/net/usb/cdc_ncm.c:1131 [inline] cdc_ncm_fill_tx_frame+0x11ab/0x3da0 drivers/net/usb/cdc_ncm.c:1308 cdc_ncm_tx_fixup+0xa3/0x100 Deal with too low values of dwNtbOutMaxSize, clamp it in the range [USB_CDC_NCM_NTB_MIN_OUT_SIZE, CDC_NCM_NTB_MAX_SIZE_TX]. We ensure enough data space is allocated to handle CDC data by making sure dwNtbOutMaxSize is not smaller than USB_CDC_NCM_NTB_MIN_OUT_SIZE. Fixes: 289507d3364f ("net: cdc_ncm: use sysfs for rx/tx aggregation tuning") Cc: stable@vger.kernel.org Reported-by: syzbot+9f575a1f15fc0c01ed69@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=b982f1059506db48409d Link: https://lore.kernel.org/all/20211202143437.1411410-1-lee.jones@linaro.org/ Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230517133808.1873695-2-tudor.ambarus@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>