summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-07-27sfc: disable softirqs for ptp TXAlejandro Lucero
Sending a PTP packet can imply to use the normal TX driver datapath but invoked from the driver's ptp worker. The kernel generic TX code disables softirqs and preemption before calling specific driver TX code, but the ptp worker does not. Although current ptp driver functionality does not require it, there are several reasons for doing so: 1) The invoked code is always executed with softirqs disabled for non PTP packets. 2) Better if a ptp packet transmission is not interrupted by softirq handling which could lead to high latencies. 3) netdev_xmit_more used by the TX code requires preemption to be disabled. Indeed a solution for dealing with kernel preemption state based on static kernel configuration is not possible since the introduction of dynamic preemption level configuration at boot time using the static calls functionality. Fixes: f79c957a0b537 ("drivers: net: sfc: use netdev_xmit_more helper") Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com> Link: https://lore.kernel.org/r/20220726064504.49613-1-alejandro.lucero-palau@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27ptp: ocp: Select CRC16 in the Kconfig.Jonathan Lemon
The crc16() function is used to check the firmware validity, but the library was not explicitly selected. Fixes: 3c3673bde50c ("ptp: ocp: Add firmware header checks") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Vadim Fedorenko <vadfed@fb.com> Link: https://lore.kernel.org/r/20220726220604.1339972-1-jonathan.lemon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27clk: sunxi-ng: Fix H6 RTC clock definitionJernej Skrabec
While RTC clock was added in H616 ccu_common list, it was not in H6 list. That caused invalid pointer dereference like this: Unable to handle kernel NULL pointer dereference at virtual address 000000000000020c Mem abort info: ESR = 0x96000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=000000004d574000 [000000000000020c] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT SMP CPU: 3 PID: 339 Comm: cat Tainted: G B 5.18.0-rc1+ #1352 Hardware name: Tanix TX6 (DT) pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : ccu_gate_is_enabled+0x48/0x74 lr : ccu_gate_is_enabled+0x40/0x74 sp : ffff80000c0b76d0 x29: ffff80000c0b76d0 x28: 00000000016e3600 x27: 0000000000000000 x26: 0000000000000000 x25: 0000000000000002 x24: ffff00000952fe08 x23: ffff800009611400 x22: ffff00000952fe79 x21: 0000000000000000 x20: 0000000000000001 x19: ffff80000aad6f08 x18: 0000000000000000 x17: 2d2d2d2d2d2d2d2d x16: 2d2d2d2d2d2d2d2d x15: 2d2d2d2d2d2d2d2d x14: 0000000000000000 x13: 00000000f2f2f2f2 x12: ffff700001816e89 x11: 1ffff00001816e88 x10: ffff700001816e88 x9 : dfff800000000000 x8 : ffff80000c0b7447 x7 : 0000000000000001 x6 : ffff700001816e88 x5 : ffff80000c0b7440 x4 : 0000000000000001 x3 : ffff800008935c50 x2 : dfff800000000000 x1 : 0000000000000000 x0 : 000000000000020c Call trace: ccu_gate_is_enabled+0x48/0x74 clk_core_is_enabled+0x7c/0x1c0 clk_summary_show_subtree+0x1dc/0x334 clk_summary_show_subtree+0x250/0x334 clk_summary_show_subtree+0x250/0x334 clk_summary_show_subtree+0x250/0x334 clk_summary_show_subtree+0x250/0x334 clk_summary_show+0x90/0xdc seq_read_iter+0x248/0x6d4 seq_read+0x17c/0x1fc full_proxy_read+0x90/0xf0 vfs_read+0xdc/0x28c ksys_read+0xc8/0x174 __arm64_sys_read+0x44/0x5c invoke_syscall+0x60/0x190 el0_svc_common.constprop.0+0x7c/0x160 do_el0_svc+0x38/0xa0 el0_svc+0x68/0x160 el0t_64_sync_handler+0x10c/0x140 el0t_64_sync+0x18c/0x190 Code: d1006260 97e5c981 785e8260 8b0002a0 (b9400000) ---[ end trace 0000000000000000 ]--- Fix that by adding rtc clock to H6 ccu_common list too. Fixes: 38d321b61bda ("clk: sunxi-ng: h6-r: Add RTC gate clock") Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/20220719183725.2605141-1-jernej.skrabec@gmail.com Reviewed-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-07-27tcp: md5: fix IPv4-mapped supportEric Dumazet
After the blamed commit, IPv4 SYN packets handled by a dual stack IPv6 socket are dropped, even if perfectly valid. $ nstat | grep MD5 TcpExtTCPMD5Failure 5 0.0 For a dual stack listener, an incoming IPv4 SYN packet would call tcp_inbound_md5_hash() with @family == AF_INET, while tp->af_specific is pointing to tcp_sock_ipv6_specific. Only later when an IPv4-mapped child is created, tp->af_specific is changed to tcp_sock_ipv6_mapped_specific. Fixes: 7bbb765b7349 ("net/tcp: Merge TCP-MD5 inbound callbacks") Reported-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Dmitry Safonov <dima@arista.com> Tested-by: Leonard Crestez <cdleonard@gmail.com> Link: https://lore.kernel.org/r/20220726115743.2759832-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27ARM: 9216/1: Fix MAX_DMA_ADDRESS overflowFlorian Fainelli
Commit 26f09e9b3a06 ("mm/memblock: add memblock memory allocation apis") added a check to determine whether arm_dma_zone_size is exceeding the amount of kernel virtual address space available between the upper 4GB virtual address limit and PAGE_OFFSET in order to provide a suitable definition of MAX_DMA_ADDRESS that should fit within the 32-bit virtual address space. The quantity used for comparison was off by a missing trailing 0, leading to MAX_DMA_ADDRESS to be overflowing a 32-bit quantity. This was caught thanks to CONFIG_DEBUG_VIRTUAL on the bcm2711 platform where we define a dma_zone_size of 1GB and we have a PAGE_OFFSET value of 0xc000_0000 (CONFIG_VMSPLIT_3G) leading to MAX_DMA_ADDRESS being 0x1_0000_0000 which overflows the unsigned long type used throughout __pa() and then __virt_addr_valid(). Because the virtual address passed to __virt_addr_valid() would now be 0, the function would loudly warn and flood the kernel log, thus making the platform unable to boot properly. Fixes: 26f09e9b3a06 ("mm/memblock: add memblock memory allocation apis") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-07-27Merge tag 'asm-generic-fixes-5.19-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic fixes from Arnd Bergmann: "Two more bug fixes for asm-generic, one addressing an incorrect Kconfig symbol reference and another one fixing a build failure for the perf tool on mips and possibly others" * tag 'asm-generic-fixes-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic: remove a broken and needless ifdef conditional tools: Fixed MIPS builds due to struct flock re-definition
2022-07-27Merge tag 'soc-fixes-5.19-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "One last set of changes for the soc tree: - fix clock frequency on lan966x - fix incorrect GPIO numbers on some pxa machines - update Baolin's email address" * tag 'soc-fixes-5.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: ARM: pxa2xx: Fix GPIO descriptor tables mailmap: update Baolin Wang's email ARM: dts: lan966x: fix sys_clk frequency
2022-07-27Revert "x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV"Borislav Petkov
This reverts commit 007faec014cb5d26983c1f86fd08c6539b41392e. Now that hyperv does its own protocol negotiation: 49d6a3c062a1 ("x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM") revert this exposure of the sev_es_ghcb_hv_call() helper. Cc: Wei Liu <wei.liu@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by:Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/r/20220614014553.1915929-1-ltykernel@gmail.com
2022-07-27x86/configs: Update configs in x86_debug.configLukas Bulwahn
Commit 4675ff05de2d ("kmemcheck: rip it out") removed kmemcheck and its corresponding build config KMEMCHECK. Commit 0f620cefd775 ("objtool: Rename "VMLINUX_VALIDATION" -> "NOINSTR_VALIDATION"") renamed the debug config option. Adjust x86_debug.config to those changes in debug configs. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220722121815.27535-1-lukas.bulwahn@gmail.com
2022-07-27Merge tag 'nvme-5.19-2022-07-27' of git://git.infradead.org/nvme into block-5.19Jens Axboe
Pull NVMe fix from Christoph: "nvme fix for Linux 5.19 - yet another duplicate ID quirk (Tobias Gruetzmacher)" * tag 'nvme-5.19-2022-07-27' of git://git.infradead.org/nvme: nvme-pci: Crucial P2 has bogus namespace ids
2022-07-27perf bpf: Remove undefined behavior from bpf_perf_object__next()Ian Rogers
bpf_perf_object__next() folded the last element in the list test with the empty list test. However, this meant that offsets were computed against null and that a struct list_head was compared against a 'struct bpf_perf_object'. Working around this with clang's undefined behavior sanitizer required -fno-sanitize=null and -fno-sanitize=object-size. Remove the undefined behavior by using the regular Linux list APIs and handling the starting case separately from the end testing case. Looking at uses like bpf_perf_object__for_each(), as the constant NULL or non-NULL argument can be constant propagated, the code is no less efficient. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Christy Lee <christylee@fb.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20220726220921.2567761-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-27perf symbol: Skip symbols if SHF_ALLOC flag is not setLeo Yan
Some symbols are observed with the 'st_value' field zeroed. E.g. libc.so.6 in Ubuntu contains a symbol '__evoke_link_warning_getwd' which resides in the '.gnu.warning.getwd' section. Unlike normal sections, such kind of sections are used for linker warning when a file calls deprecated functions, but they are not part of memory images, the symbols in these sections should be dropped. This patch checks the section attribute SHF_ALLOC bit, if the bit is not set, it skips symbols to avoid spurious ones. Suggested-by: Fangrui Song <maskray@google.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chang Rui <changruinj@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220724060013.171050-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-27perf symbol: Correct address for bss symbolsLeo Yan
When using 'perf mem' and 'perf c2c', an issue is observed that tool reports the wrong offset for global data symbols. This is a common issue on both x86 and Arm64 platforms. Let's see an example, for a test program, below is the disassembly for its .bss section which is dumped with objdump: ... Disassembly of section .bss: 0000000000004040 <completed.0>: ... 0000000000004080 <buf1>: ... 00000000000040c0 <buf2>: ... 0000000000004100 <thread>: ... First we used 'perf mem record' to run the test program and then used 'perf --debug verbose=4 mem report' to observe what's the symbol info for 'buf1' and 'buf2' structures. # ./perf mem record -e ldlat-loads,ldlat-stores -- false_sharing.exe 8 # ./perf --debug verbose=4 mem report ... dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 sh_addr: 0x4040 sh_offset: 0x3028 symbol__new: buf2 0x30a8-0x30e8 ... dso__load_sym_internal: adjusting symbol: st_value: 0x4080 sh_addr: 0x4040 sh_offset: 0x3028 symbol__new: buf1 0x3068-0x30a8 ... The perf tool relies on libelf to parse symbols, in executable and shared object files, 'st_value' holds a virtual address; 'sh_addr' is the address at which section's first byte should reside in memory, and 'sh_offset' is the byte offset from the beginning of the file to the first byte in the section. The perf tool uses below formula to convert a symbol's memory address to a file address: file_address = st_value - sh_addr + sh_offset ^ ` Memory address We can see the final adjusted address ranges for buf1 and buf2 are [0x30a8-0x30e8) and [0x3068-0x30a8) respectively, apparently this is incorrect, in the code, the structure for 'buf1' and 'buf2' specifies compiler attribute with 64-byte alignment. The problem happens for 'sh_offset', libelf returns it as 0x3028 which is not 64-byte aligned, combining with disassembly, it's likely libelf doesn't respect the alignment for .bss section, therefore, it doesn't return the aligned value for 'sh_offset'. Suggested by Fangrui Song, ELF file contains program header which contains PT_LOAD segments, the fields p_vaddr and p_offset in PT_LOAD segments contain the execution info. A better choice for converting memory address to file address is using the formula: file_address = st_value - p_vaddr + p_offset This patch introduces elf_read_program_header() which returns the program header based on the passed 'st_value', then it uses the formula above to calculate the symbol file address; and the debugging log is updated respectively. After applying the change: # ./perf --debug verbose=4 mem report ... dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 p_vaddr: 0x3d28 p_offset: 0x2d28 symbol__new: buf2 0x30c0-0x3100 ... dso__load_sym_internal: adjusting symbol: st_value: 0x4080 p_vaddr: 0x3d28 p_offset: 0x2d28 symbol__new: buf1 0x3080-0x30c0 ... Fixes: f17e04afaff84b5c ("perf report: Fix ELF symbol parsing") Reported-by: Chang Rui <changruinj@gmail.com> Suggested-by: Fangrui Song <maskray@google.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220724060013.171050-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-27perf scripts python: Let script to be python2 compliantLeo Yan
The mainline kernel can be used for relative old distros, e.g. RHEL 7. The distro doesn't upgrade from python2 to python3, this causes the building error that the python script is not python2 compliant. To fix the building failure, this patch changes from the python f-string format to traditional string format. Fixes: 12fdd6c009da0d02 ("perf scripts python: Support Arm CoreSight trace data disassembly") Reported-by: Akemi Yagi <toracat@elrepo.org> Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: ElRepo <contact@elrepo.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220725104220.1106663-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-27tools headers cpufeatures: Sync with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes from: 28a99e95f55c6185 ("x86/amd: Use IBPB for firmware calls") This only causes these perf files to be rebuilt: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org Link: https://lore.kernel.org/lkml/Yt6oWce9UDAmBAtX@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-27Merge remote-tracking branch 'torvalds/master' into perf/coreArnaldo Carvalho de Melo
To get upstream fixes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-27virtio-net: fix the race between refill work and closeJason Wang
We try using cancel_delayed_work_sync() to prevent the work from enabling NAPI. This is insufficient since we don't disable the source of the refill work scheduling. This means an NAPI poll callback after cancel_delayed_work_sync() can schedule the refill work then can re-enable the NAPI that leads to use-after-free [1]. Since the work can enable NAPI, we can't simply disable NAPI before calling cancel_delayed_work_sync(). So fix this by introducing a dedicated boolean to control whether or not the work could be scheduled from NAPI. [1] ================================================================== BUG: KASAN: use-after-free in refill_work+0x43/0xd4 Read of size 2 at addr ffff88810562c92e by task kworker/2:1/42 CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 5.19.0-rc1+ #480 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: events refill_work Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0xbb/0x6ac ? _printk+0xad/0xde ? refill_work+0x43/0xd4 kasan_report+0xa8/0x130 ? refill_work+0x43/0xd4 refill_work+0x43/0xd4 process_one_work+0x43d/0x780 worker_thread+0x2a0/0x6f0 ? process_one_work+0x780/0x780 kthread+0x167/0x1a0 ? kthread_exit+0x50/0x50 ret_from_fork+0x22/0x30 </TASK> ... Fixes: b2baed69e605c ("virtio_net: set/cancel work on ndo_open/ndo_stop") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27EDAC/ghes: Set the DIMM label unconditionallyToshi Kani
The commit cb51a371d08e ("EDAC/ghes: Setup DIMM label from DMI and use it in error reports") enforced that both the bank and device strings passed to dimm_setup_label() are not NULL. However, there are BIOSes, for example on a HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/15/2019 which don't populate both strings: Handle 0x0020, DMI type 17, 84 bytes Memory Device Array Handle: 0x0013 Error Information Handle: Not Provided Total Width: 72 bits Data Width: 64 bits Size: 32 GB Form Factor: DIMM Set: None Locator: PROC 1 DIMM 1 <===== device Bank Locator: Not Specified <===== bank This results in a buffer overflow because ghes_edac_register() calls strlen() on an uninitialized label, which had non-zero values left over from krealloc_array(): detected buffer overflow in __fortify_strlen ------------[ cut here ]------------ kernel BUG at lib/string_helpers.c:983! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 1 Comm: swapper/0 Tainted: G I 5.18.6-200.fc36.x86_64 #1 Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/15/2019 RIP: 0010:fortify_panic ... Call Trace: <TASK> ghes_edac_register.cold ghes_probe platform_probe really_probe __driver_probe_device driver_probe_device __driver_attach ? __device_attach_driver bus_for_each_dev bus_add_driver driver_register acpi_ghes_init acpi_init ? acpi_sleep_proc_init do_one_initcall The label contains garbage because the commit in Fixes reallocs the DIMMs array while scanning the system but doesn't clear the newly allocated memory. Change dimm_setup_label() to always initialize the label to fix the issue. Set it to the empty string in case BIOS does not provide both bank and device so that ghes_edac_register() can keep the default label given by edac_mc_alloc_dimms(). [ bp: Rewrite commit message. ] Fixes: b9cae27728d1f ("EDAC/ghes: Scan the system once on driver init") Co-developed-by: Robert Richter <rric@kernel.org> Signed-off-by: Robert Richter <rric@kernel.org> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Robert Elliott <elliott@hpe.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220719220124.760359-1-toshi.kani@hpe.com
2022-07-26mptcp: Do not return EINPROGRESS when subflow creation succeedsMat Martineau
New subflows are created within the kernel using O_NONBLOCK, so EINPROGRESS is the expected return value from kernel_connect(). __mptcp_subflow_connect() has the correct logic to consider EINPROGRESS to be a successful case, but it has also used that error code as its return value. Before v5.19 this was benign: all the callers ignored the return value. Starting in v5.19 there is a MPTCP_PM_CMD_SUBFLOW_CREATE generic netlink command that does use the return value, so the EINPROGRESS gets propagated to userspace. Make __mptcp_subflow_connect() always return 0 on success instead. Fixes: ec3edaa7ca6c ("mptcp: Add handling of outgoing MP_JOIN requests") Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Link: https://lore.kernel.org/r/20220725205231.87529-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski
Florian Westphal says: ==================== netfilter updates for net Three late fixes for netfilter: 1) If nf_queue user requests packet truncation below size of l3 header, we corrupt the skb, then crash. Reject such requests. 2) add cond_resched() calls when doing cycle detection in the nf_tables graph. This avoids softlockup warning with certain rulesets. 3) Reject rulesets that use nftables 'queue' expression in family/chain combinations other than those that are supported. Currently the ruleset will load, but when userspace attempts to reinject you get WARN splat + packet drops. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_queue: only allow supported familes and hooks netfilter: nf_tables: add rescheduling points during loop detection walks netfilter: nf_queue: do not allow packet truncation below transport header offset ==================== Link: https://lore.kernel.org/r/20220726192056.13497-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26Merge tag 'for-net-2022-07-26' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix early wakeup after suspend - Fix double free on error - Fix use-after-free on l2cap_chan_put * tag 'for-net-2022-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put Bluetooth: Always set event mask on suspend Bluetooth: mgmt: Fix double free on error path ==================== Link: https://lore.kernel.org/r/20220726221328.423714-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26Merge tag 'mm-hotfixes-stable-2022-07-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Thirteen hotfixes. Eight are cc:stable and the remainder are for post-5.18 issues or are too minor to warrant backporting" * tag 'mm-hotfixes-stable-2022-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mailmap: update Gao Xiang's email addresses userfaultfd: provide properly masked address for huge-pages Revert "ocfs2: mount shared volume without ha stack" hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte fs: sendfile handles O_NONBLOCK of out_fd ntfs: fix use-after-free in ntfs_ucsncmp() secretmem: fix unhandled fault in truncate mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range() mm: fix missing wake-up event for FSDAX pages mm: fix page leak with multiple threads mapping the same page mailmap: update Seth Forshee's email address tmpfs: fix the issue that the mount and remount results are inconsistent. mm: kfence: apply kmemleak_ignore_phys on early allocated pool
2022-07-26scsi: ufs: core: Fix a race condition related to device managementBart Van Assche
If a device management command completion happens after wait_for_completion_timeout() times out and before ufshcd_clear_cmds() is called, then the completion code may crash on the complete() call in __ufshcd_transfer_req_compl(). Fix the following crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 Call trace: complete+0x64/0x178 __ufshcd_transfer_req_compl+0x30c/0x9c0 ufshcd_poll+0xf0/0x208 ufshcd_sl_intr+0xb8/0xf0 ufshcd_intr+0x168/0x2f4 __handle_irq_event_percpu+0xa0/0x30c handle_irq_event+0x84/0x178 handle_fasteoi_irq+0x150/0x2e8 __handle_domain_irq+0x114/0x1e4 gic_handle_irq.31846+0x58/0x300 el1_irq+0xe4/0x1c0 efi_header_end+0x110/0x680 __irq_exit_rcu+0x108/0x124 __handle_domain_irq+0x118/0x1e4 gic_handle_irq.31846+0x58/0x300 el1_irq+0xe4/0x1c0 cpuidle_enter_state+0x3ac/0x8c4 do_idle+0x2fc/0x55c cpu_startup_entry+0x84/0x90 kernel_init+0x0/0x310 start_kernel+0x0/0x608 start_kernel+0x4ec/0x608 Link: https://lore.kernel.org/r/20220720170228.1598842-1-bvanassche@acm.org Fixes: 5a0b0cb9bee7 ("[SCSI] ufs: Add support for sending NOP OUT UPIU") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Avri Altman <avri.altman@wdc.com> Cc: Bean Huo <beanhuo@micron.com> Cc: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-26scsi: core: Fix warning in scsi_alloc_sgtables()Jason Yan
As explained in SG_IO howto[1]: "If iovec_count is non-zero then 'dxfer_len' should be equal to the sum of iov_len lengths. If not, the minimum of the two is the transfer length." When iovec_count is non-zero and dxfer_len is zero, the sg_io() just genarated a null bio, and finally caused a warning below. To fix it, skip generating a bio for this request if dxfer_len is zero. [1] https://tldp.org/HOWTO/SCSI-Generic-HOWTO/x198.html WARNING: CPU: 2 PID: 3643 at drivers/scsi/scsi_lib.c:1032 scsi_alloc_sgtables+0xc7d/0xf70 drivers/scsi/scsi_lib.c:1032 Modules linked in: CPU: 2 PID: 3643 Comm: syz-executor397 Not tainted 5.17.0-rc3-syzkaller-00316-gb81b1829e7e3 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-204/01/2014 RIP: 0010:scsi_alloc_sgtables+0xc7d/0xf70 drivers/scsi/scsi_lib.c:1032 Code: e7 fc 31 ff 44 89 f6 e8 c1 4e e7 fc 45 85 f6 0f 84 1a f5 ff ff e8 93 4c e7 fc 83 c5 01 0f b7 ed e9 0f f5 ff ff e8 83 4c e7 fc <0f> 0b 41 bc 0a 00 00 00 e9 2b fb ff ff 41 bc 09 00 00 00 e9 20 fb RSP: 0018:ffffc90000d07558 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88801bfc96a0 RCX: 0000000000000000 RDX: ffff88801c876000 RSI: ffffffff849060bd RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff849055b9 R11: 0000000000000000 R12: ffff888012b8c000 R13: ffff88801bfc9580 R14: 0000000000000000 R15: ffff88801432c000 FS: 00007effdec8e700(0000) GS:ffff88802cc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007effdec6d718 CR3: 00000000206d6000 CR4: 0000000000150ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> scsi_setup_scsi_cmnd drivers/scsi/scsi_lib.c:1219 [inline] scsi_prepare_cmd drivers/scsi/scsi_lib.c:1614 [inline] scsi_queue_rq+0x283e/0x3630 drivers/scsi/scsi_lib.c:1730 blk_mq_dispatch_rq_list+0x6ea/0x22e0 block/blk-mq.c:1851 __blk_mq_sched_dispatch_requests+0x20b/0x410 block/blk-mq-sched.c:299 blk_mq_sched_dispatch_requests+0xfb/0x180 block/blk-mq-sched.c:332 __blk_mq_run_hw_queue+0xf9/0x350 block/blk-mq.c:1968 __blk_mq_delay_run_hw_queue+0x5b6/0x6c0 block/blk-mq.c:2045 blk_mq_run_hw_queue+0x30f/0x480 block/blk-mq.c:2096 blk_mq_sched_insert_request+0x340/0x440 block/blk-mq-sched.c:451 blk_execute_rq+0xcc/0x340 block/blk-mq.c:1231 sg_io+0x67c/0x1210 drivers/scsi/scsi_ioctl.c:485 scsi_ioctl_sg_io drivers/scsi/scsi_ioctl.c:866 [inline] scsi_ioctl+0xa66/0x1560 drivers/scsi/scsi_ioctl.c:921 sd_ioctl+0x199/0x2a0 drivers/scsi/sd.c:1576 blkdev_ioctl+0x37a/0x800 block/ioctl.c:588 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:874 [inline] __se_sys_ioctl fs/ioctl.c:860 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7effdecdc5d9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 81 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007effdec8e2f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007effded664c0 RCX: 00007effdecdc5d9 RDX: 0000000020002300 RSI: 0000000000002285 RDI: 0000000000000004 RBP: 00007effded34034 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 R13: 00007effded34054 R14: 2f30656c69662f2e R15: 00007effded664c8 Link: https://lore.kernel.org/r/20220720025120.3226770-1-yanaijie@huawei.com Fixes: 25636e282fe9 ("block: fix SG_IO vector request data length handling") Reported-by: syzbot+d44b35ecfb807e5af0b5@syzkaller.appspotmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-26scsi: ufs: host: Hold reference returned by of_parse_phandle()Liang He
In ufshcd_populate_vreg(), we should hold the reference returned by of_parse_phandle() and then use it to call of_node_put() for refcount balance. Link: https://lore.kernel.org/r/20220719071529.1081166-1-windhl@126.com Fixes: aa4976130934 ("ufs: Add regulator enable support") Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Liang He <windhl@126.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-26scsi: mpt3sas: Stop fw fault watchdog work item during system shutdownDavid Jeffery
During system shutdown or reboot, mpt3sas will reset the firmware back to ready state. However, the driver leaves running a watchdog work item intended to keep the firmware in operational state. This causes a second, unneeded reset on shutdown and moves the firmware back to operational instead of in ready state as intended. And if the mpt3sas_fwfault_debug module parameter is set, this extra reset also panics the system. mpt3sas's scsih_shutdown needs to stop the watchdog before resetting the firmware back to ready state. Link: https://lore.kernel.org/r/20220722142448.6289-1-djeffery@redhat.com Fixes: fae21608c31c ("scsi: mpt3sas: Transition IOC to Ready state during shutdown") Tested-by: Laurence Oberman <loberman@redhat.com> Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-26mailmap: update Gao Xiang's email addressesGao Xiang
I've been in Alibaba Cloud for more than one year, mainly to address cloud-native challenges (such as high-performance container images) for open source communities. Update my email addresses on behalf of my current employer (Alibaba Cloud) to support all my (team) work in this area. Also add an outdated @redhat.com address of me. Link: https://lkml.kernel.org/r/20220719154246.62970-1-xiang@kernel.org Signed-off-by: Gao Xiang <xiang@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-26userfaultfd: provide properly masked address for huge-pagesNadav Amit
Commit 824ddc601adc ("userfaultfd: provide unmasked address on page-fault") was introduced to fix an old bug, in which the offset in the address of a page-fault was masked. Concerns were raised - although were never backed by actual code - that some userspace code might break because the bug has been around for quite a while. To address these concerns a new flag was introduced, and only when this flag is set by the user, userfaultfd provides the exact address of the page-fault. The commit however had a bug, and if the flag is unset, the offset was always masked based on a base-page granularity. Yet, for huge-pages, the behavior prior to the commit was that the address is masked to the huge-page granulrity. While there are no reports on real breakage, fix this issue. If the flag is unset, use the address with the masking that was done before. Link: https://lkml.kernel.org/r/20220711165906.2682-1-namit@vmware.com Fixes: 824ddc601adc ("userfaultfd: provide unmasked address on page-fault") Signed-off-by: Nadav Amit <namit@vmware.com> Reported-by: James Houghton <jthoughton@google.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: James Houghton <jthoughton@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-26Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_putLuiz Augusto von Dentz
This fixes the following trace which is caused by hci_rx_work starting up *after* the final channel reference has been put() during sock_close() but *before* the references to the channel have been destroyed, so instead the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to prevent referencing a channel that is about to be destroyed. refcount_t: increment on 0; use-after-free. BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0 Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705 CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28 Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT) Workqueue: hci0 hci_rx_work Call trace: dump_backtrace+0x0/0x378 show_stack+0x20/0x2c dump_stack+0x124/0x148 print_address_description+0x80/0x2e8 __kasan_report+0x168/0x188 kasan_report+0x10/0x18 __asan_load4+0x84/0x8c refcount_dec_and_test+0x20/0xd0 l2cap_chan_put+0x48/0x12c l2cap_recv_frame+0x4770/0x6550 l2cap_recv_acldata+0x44c/0x7a4 hci_acldata_packet+0x100/0x188 hci_rx_work+0x178/0x23c process_one_work+0x35c/0x95c worker_thread+0x4cc/0x960 kthread+0x1a8/0x1c4 ret_from_fork+0x10/0x18 Cc: stable@kernel.org Reported-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26Bluetooth: Always set event mask on suspendAbhishek Pandit-Subedi
When suspending, always set the event mask once disconnects are successful. Otherwise, if wakeup is disallowed, the event mask is not set before suspend continues and can result in an early wakeup. Fixes: 182ee45da083 ("Bluetooth: hci_sync: Rework hci_suspend_notifier") Cc: stable@vger.kernel.org Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26Bluetooth: mgmt: Fix double free on error pathDan Carpenter
Don't call mgmt_pending_remove() twice (double free). Fixes: 6b88eff43704 ("Bluetooth: hci_sync: Refactor remove Adv Monitor") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()Tetsuo Handa
lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1], for commit f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped") guards clear_bit() using fq.lock even before fq_init() from ieee80211_txq_setup_flows() initializes this spinlock. According to discussion [2], Toke was not happy with expanding usage of fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we can instead use synchronize_rcu() for flushing ieee80211_wake_txqs(). Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1] Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2] Reported-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped") Tested-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com> Acked-by: Toke Høiland-Jørgensen <toke@kernel.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/9cc9b81d-75a3-3925-b612-9d0ad3cab82b@I-love.SAKURA.ne.jp [ pick up commit 3598cb6e1862 ("wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()") from -next] Link: https://lore.kernel.org/all/87o7xcq6qt.fsf@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26ice: do not setup vlan for loopback VSIMaciej Fijalkowski
Currently loopback test is failiing due to the error returned from ice_vsi_vlan_setup(). Skip calling it when preparing loopback VSI. Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)Maciej Fijalkowski
Tx side sets EOP and RS bits on descriptors to indicate that a particular descriptor is the last one and needs to generate an irq when it was sent. These bits should not be checked on completion path regardless whether it's the Tx or the Rx. DD bit serves this purpose and it indicates that a particular descriptor is either for Rx or was successfully Txed. EOF is also set as loopback test does not xmit fragmented frames. Look at (DD | EOF) bits setting in ice_lbtest_receive_frames() instead of EOP and RS pair. Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26ice: Fix VSIs unable to share unicast MACAnirudh Venkataramanan
The driver currently does not allow two VSIs in the same PF domain to have the same unicast MAC address. This is incorrect in the sense that a policy decision is being made in the driver when it must be left to the user. This approach was causing issues when rebooting the system with VFs spawned not being able to change their MAC addresses. Such errors were present in dmesg: [ 7921.068237] ice 0000:b6:00.2 ens2f2: Unicast MAC 6a:0d:e4:70:ca:d1 already exists on this PF. Preventing setting VF 7 unicast MAC address to 6a:0d:e4:70:ca:d1 Fix that by removing this restriction. Doing this also allows us to remove some additional code that's checking if a unicast MAC filter already exists. Fixes: 47ebc7b02485 ("ice: Check if unicast MAC exists before setting VF MAC") Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26ice: Fix tunnel checksum offload with fragmented trafficPrzemyslaw Patynowski
Fix checksum offload on VXLAN tunnels. In case, when mpls protocol is not used, set l4 header to transport header of skb. This fixes case, when user tries to offload checksums of VXLAN tunneled traffic. Steps for reproduction (requires link partner with tunnels): ip l s enp130s0f0 up ip a f enp130s0f0 ip a a 10.10.110.2/24 dev enp130s0f0 ip l s enp130s0f0 mtu 1600 ip link add vxlan12_sut type vxlan id 12 group 238.168.100.100 dev enp130s0f0 dstport 4789 ip l s vxlan12_sut up ip a a 20.10.110.2/24 dev vxlan12_sut iperf3 -c 20.10.110.1 #should connect Offload params: td_offset, cd_tunnel_params were corrupted, due to l4 header pointing wrong address. NIC would then drop those packets internally, due to incorrect TX descriptor data, which increased GLV_TEPC register. Fixes: 69e66c04c672 ("ice: Add mpls+tso support") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26ice: Fix max VLANs available for VFPrzemyslaw Patynowski
Legacy VLAN implementation allows for untrusted VF to have 8 VLAN filters, not counting VLAN 0 filters. Current VLAN_V2 implementation lowers available filters for VF, by counting in VLAN 0 filter for both TPIDs. Fix this by counting only non zero VLAN filters. Without this patch, untrusted VF would not be able to access 8 VLAN filters. Fixes: cc71de8fa133 ("ice: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26perf test: Avoid sysfs state affecting fake eventsIan Rogers
Icelake has a slots event, on my Skylakex I have CPU events in sysfs of topdown-slots-issued and topdown-total-slots. Legacy event parsing would try to use '-' to separate parts of an event and so perf_pmu__parse_init sets 'slots' to be a PMU_EVENT_SYMBOL_SUFFIX2. As such parsing the slots event for a fake PMU fails as a PMU_EVENT_SYMBOL_SUFFIX2 isn't made into the PE_PMU_EVENT_FAKE token. Resolve this issue by test initializing the PMU parsing state before every parse. This must be done every parse as the state is removes after each parse_events. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: http://lore.kernel.org/lkml/20220725223633.2301737-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-26perf vendor events intel: Update event list for haswellxZhengjun Xing
Update JSON core/uncore events for haswellx to perf. Based on HSX JSON list v24: https://download.01.org/perfmon/HSX Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220614145019.2177071-2-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-26perf vendor events intel: Update event list for broadwellxZhengjun Xing
Update JSON core/uncore events for broadwellx to perf. Based on BDX JSON list v19: https://download.01.org/perfmon/BDX Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220614145019.2177071-1-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-26perf vendor events intel: Update event list for SnowridgexZhengjun Xing
More uncore events are added in the converter tool: https://github.com/intel/event-converter-for-linux-perf Keep both alias and the original name for the events, in case someone already used the alias in their script. Generate the perf events based on Snowridgex(SNR) event list v1.20: https://download.01.org/perfmon/SNR/ Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220609094222.2030167-2-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-26perf vendor events intel: Rename tremontx to snowridgexZhengjun Xing
Tremontx was an old name for Snowridgex, so rename Tremontx to Snowridgex. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220609094222.2030167-1-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-26perf vendor events intel: Update event list for SapphirerapidsZhengjun Xing
Update JSON event list for Sapphirerapids to perf. Based on JSON list v1.02: https://download.01.org/perfmon/SPR/ Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220607092749.1976878-2-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-26perf vendor events intel: Update event list for AlderlakeZhengjun Xing
Update JSON event list for Alderlake to perf. It is a hybrid event list for both Atom and Core. Based on JSON list v1.11: https://download.01.org/perfmon/ADL/ Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220607092749.1976878-1-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-26perf inject: Fix spelling mistake "theads" -> "threads"Colin Ian King
There is a spelling mistake in a pr_err message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-janitors@vger.kernel.org Link: https://lore.kernel.org/r/20220721124528.20997-1-colin.i.king@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-26perf kwork: Add workqueue trace BPF supportYang Jihong
Implements workqueue trace bpf function. Test cases: # perf kwork -k workqueue lat -b Starting trace, Hit <Ctrl+C> to stop and report ^C Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end | -------------------------------------------------------------------------------------------------------------------------------- (w)addrconf_verify_work | 0002 | 5.856 ms | 1 | 5.856 ms | 111994.634313 s | 111994.640169 s | (w)vmstat_update | 0001 | 1.247 ms | 1 | 1.247 ms | 111996.462651 s | 111996.463899 s | (w)neigh_periodic_work | 0001 | 1.183 ms | 1 | 1.183 ms | 111996.462789 s | 111996.463973 s | (w)neigh_managed_work | 0001 | 0.989 ms | 2 | 1.635 ms | 111996.462820 s | 111996.464455 s | (w)wb_workfn | 0000 | 0.667 ms | 1 | 0.667 ms | 111996.384273 s | 111996.384940 s | (w)bpf_prog_free_deferred | 0001 | 0.495 ms | 1 | 0.495 ms | 111986.314201 s | 111986.314696 s | (w)mix_interrupt_randomness | 0002 | 0.421 ms | 6 | 0.749 ms | 111995.927750 s | 111995.928499 s | (w)vmstat_shepherd | 0000 | 0.374 ms | 2 | 0.385 ms | 111991.265242 s | 111991.265627 s | (w)e1000_watchdog | 0002 | 0.356 ms | 5 | 0.390 ms | 111994.528380 s | 111994.528770 s | (w)vmstat_update | 0000 | 0.231 ms | 2 | 0.365 ms | 111996.384407 s | 111996.384772 s | (w)flush_to_ldisc | 0006 | 0.165 ms | 1 | 0.165 ms | 111995.930606 s | 111995.930771 s | (w)flush_to_ldisc | 0000 | 0.094 ms | 2 | 0.095 ms | 111996.460453 s | 111996.460548 s | -------------------------------------------------------------------------------------------------------------------------------- # perf kwork -k workqueue rep -b Starting trace, Hit <Ctrl+C> to stop and report ^C Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- (w)e1000_watchdog | 0002 | 0.627 ms | 2 | 0.324 ms | 112002.720665 s | 112002.720989 s | (w)flush_to_ldisc | 0007 | 0.598 ms | 2 | 0.534 ms | 112000.875226 s | 112000.875761 s | (w)wq_barrier_func | 0007 | 0.492 ms | 1 | 0.492 ms | 112000.876981 s | 112000.877473 s | (w)flush_to_ldisc | 0007 | 0.281 ms | 1 | 0.281 ms | 112005.826882 s | 112005.827163 s | (w)mix_interrupt_randomness | 0002 | 0.229 ms | 3 | 0.102 ms | 112005.825671 s | 112005.825774 s | (w)vmstat_shepherd | 0000 | 0.202 ms | 1 | 0.202 ms | 112001.504511 s | 112001.504713 s | (w)bpf_prog_free_deferred | 0001 | 0.181 ms | 1 | 0.181 ms | 112000.883251 s | 112000.883432 s | (w)wb_workfn | 0007 | 0.130 ms | 1 | 0.130 ms | 112001.505195 s | 112001.505325 s | (w)vmstat_update | 0000 | 0.053 ms | 1 | 0.053 ms | 112001.504763 s | 112001.504815 s | -------------------------------------------------------------------------------------------------------------------------------- Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-18-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-26perf kwork: Add softirq trace BPF supportYang Jihong
Implements softirq trace bpf function. Test cases: Trace softirq latency without filter: # perf kwork -k softirq lat -b Starting trace, Hit <Ctrl+C> to stop and report ^C Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end | -------------------------------------------------------------------------------------------------------------------------------- (s)RCU:9 | 0005 | 0.281 ms | 3 | 0.338 ms | 111295.752222 s | 111295.752560 s | (s)RCU:9 | 0002 | 0.262 ms | 24 | 1.400 ms | 111301.335986 s | 111301.337386 s | (s)SCHED:7 | 0005 | 0.177 ms | 14 | 0.212 ms | 111295.752270 s | 111295.752481 s | (s)RCU:9 | 0007 | 0.161 ms | 47 | 2.022 ms | 111295.402159 s | 111295.404181 s | (s)NET_RX:3 | 0003 | 0.149 ms | 12 | 1.261 ms | 111301.192964 s | 111301.194225 s | (s)TIMER:1 | 0001 | 0.105 ms | 9 | 0.198 ms | 111301.180191 s | 111301.180389 s | ... <SNIP> ... (s)NET_RX:3 | 0002 | 0.098 ms | 6 | 0.124 ms | 111295.403760 s | 111295.403884 s | (s)SCHED:7 | 0001 | 0.093 ms | 19 | 0.242 ms | 111301.180256 s | 111301.180498 s | (s)SCHED:7 | 0007 | 0.078 ms | 15 | 0.188 ms | 111300.064226 s | 111300.064415 s | (s)SCHED:7 | 0004 | 0.077 ms | 11 | 0.213 ms | 111301.361759 s | 111301.361973 s | (s)SCHED:7 | 0000 | 0.063 ms | 33 | 0.805 ms | 111295.401811 s | 111295.402616 s | (s)SCHED:7 | 0003 | 0.063 ms | 14 | 0.085 ms | 111301.192255 s | 111301.192340 s | -------------------------------------------------------------------------------------------------------------------------------- Trace softirq latency with cpu filter: # perf kwork -k softirq lat -b -C 1 Starting trace, Hit <Ctrl+C> to stop and report ^C Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end | -------------------------------------------------------------------------------------------------------------------------------- (s)RCU:9 | 0001 | 0.178 ms | 5 | 0.572 ms | 111435.534135 s | 111435.534707 s | -------------------------------------------------------------------------------------------------------------------------------- Trace softirq latency with name filter: # perf kwork -k softirq lat -b -n SCHED Starting trace, Hit <Ctrl+C> to stop and report ^C Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end | -------------------------------------------------------------------------------------------------------------------------------- (s)SCHED:7 | 0001 | 0.295 ms | 15 | 2.183 ms | 111452.534950 s | 111452.537133 s | (s)SCHED:7 | 0002 | 0.215 ms | 10 | 0.315 ms | 111460.000238 s | 111460.000553 s | (s)SCHED:7 | 0005 | 0.190 ms | 29 | 0.338 ms | 111457.032538 s | 111457.032876 s | (s)SCHED:7 | 0003 | 0.097 ms | 10 | 0.319 ms | 111452.434351 s | 111452.434670 s | (s)SCHED:7 | 0006 | 0.089 ms | 1 | 0.089 ms | 111450.737450 s | 111450.737539 s | (s)SCHED:7 | 0007 | 0.085 ms | 17 | 0.169 ms | 111452.471333 s | 111452.471502 s | (s)SCHED:7 | 0004 | 0.071 ms | 15 | 0.221 ms | 111452.535252 s | 111452.535473 s | (s)SCHED:7 | 0000 | 0.044 ms | 32 | 0.130 ms | 111460.001982 s | 111460.002112 s | -------------------------------------------------------------------------------------------------------------------------------- Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-17-yangjihong1@huawei.com [ Add {} for multiline if blocks ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-26perf kwork: Add IRQ trace BPF supportYang Jihong
Implements irq trace bpf function. Test cases: Trace irq without filter: # perf kwork -k irq rep -b Starting trace, Hit <Ctrl+C> to stop and report ^C Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- virtio0-requests:25 | 0000 | 31.026 ms | 285 | 1.493 ms | 110326.049963 s | 110326.051456 s | eth0:10 | 0002 | 7.875 ms | 96 | 1.429 ms | 110313.916835 s | 110313.918264 s | ata_piix:14 | 0002 | 2.510 ms | 28 | 0.396 ms | 110331.367987 s | 110331.368383 s | -------------------------------------------------------------------------------------------------------------------------------- Trace irq with cpu filter: # perf kwork -k irq rep -b -C 0 Starting trace, Hit <Ctrl+C> to stop and report ^C Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- virtio0-requests:25 | 0000 | 34.288 ms | 282 | 2.061 ms | 110358.078968 s | 110358.081029 s | -------------------------------------------------------------------------------------------------------------------------------- Trace irq with name filter: # perf kwork -k irq rep -b -n eth0 Starting trace, Hit <Ctrl+C> to stop and report ^C Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- eth0:10 | 0002 | 2.184 ms | 21 | 0.572 ms | 110386.541699 s | 110386.542271 s | -------------------------------------------------------------------------------------------------------------------------------- Trace irq with summary: # perf kwork -k irq rep -b -S Starting trace, Hit <Ctrl+C> to stop and report ^C Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- virtio0-requests:25 | 0000 | 42.923 ms | 285 | 1.181 ms | 110418.128867 s | 110418.130049 s | eth0:10 | 0002 | 2.085 ms | 20 | 0.668 ms | 110416.002935 s | 110416.003603 s | ata_piix:14 | 0002 | 0.970 ms | 4 | 0.656 ms | 110424.034482 s | 110424.035138 s | -------------------------------------------------------------------------------------------------------------------------------- Total count : 309 Total runtime (msec) : 45.977 (0.003% load average) Total time span (msec) : 17017.655 -------------------------------------------------------------------------------------------------------------------------------- Committer testing: # perf kwork -k irq rep -b Starting trace, Hit <Ctrl+C> to stop and report ^C Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- nvme0q20:145 | 0019 | 0.570 ms | 28 | 0.064 ms | 26966.635102 s | 26966.635167 s | amdgpu:162 | 0002 | 0.568 ms | 29 | 0.068 ms | 26966.644346 s | 26966.644414 s | nvme0q4:129 | 0003 | 0.565 ms | 31 | 0.037 ms | 26966.614830 s | 26966.614866 s | nvme0q16:141 | 0015 | 0.205 ms | 66 | 0.012 ms | 26967.145161 s | 26967.145174 s | nvme0q29:154 | 0028 | 0.154 ms | 44 | 0.014 ms | 26967.078970 s | 26967.078984 s | nvme0q10:135 | 0009 | 0.134 ms | 43 | 0.011 ms | 26967.132093 s | 26967.132104 s | nvme0q2:127 | 0001 | 0.132 ms | 26 | 0.011 ms | 26966.883584 s | 26966.883595 s | nvme0q25:150 | 0024 | 0.127 ms | 32 | 0.014 ms | 26966.631419 s | 26966.631433 s | nvme0q14:139 | 0013 | 0.110 ms | 21 | 0.017 ms | 26966.760843 s | 26966.760861 s | nvme0q30:155 | 0029 | 0.102 ms | 30 | 0.022 ms | 26966.677171 s | 26966.677193 s | nvme0q13:138 | 0012 | 0.088 ms | 20 | 0.015 ms | 26966.738733 s | 26966.738748 s | nvme0q6:131 | 0005 | 0.087 ms | 13 | 0.020 ms | 26966.648445 s | 26966.648465 s | nvme0q28:153 | 0027 | 0.066 ms | 12 | 0.015 ms | 26966.771431 s | 26966.771447 s | nvme0q26:151 | 0025 | 0.060 ms | 13 | 0.012 ms | 26966.704266 s | 26966.704278 s | nvme0q21:146 | 0020 | 0.054 ms | 20 | 0.011 ms | 26967.322082 s | 26967.322094 s | nvme0q1:126 | 0000 | 0.046 ms | 11 | 0.013 ms | 26966.859754 s | 26966.859767 s | nvme0q17:142 | 0016 | 0.046 ms | 10 | 0.011 ms | 26967.114513 s | 26967.114524 s | xhci_hcd:74 | 0015 | 0.041 ms | 3 | 0.016 ms | 26967.086004 s | 26967.086020 s | nvme0q8:133 | 0007 | 0.039 ms | 12 | 0.008 ms | 26966.712056 s | 26966.712063 s | nvme0q32:157 | 0031 | 0.036 ms | 10 | 0.014 ms | 26966.627054 s | 26966.627068 s | nvme0q9:134 | 0008 | 0.036 ms | 11 | 0.011 ms | 26967.258452 s | 26967.258462 s | nvme0q7:132 | 0006 | 0.024 ms | 3 | 0.014 ms | 26966.767404 s | 26966.767418 s | nvme0q11:136 | 0010 | 0.023 ms | 5 | 0.006 ms | 26966.935455 s | 26966.935461 s | nvme0q31:156 | 0030 | 0.018 ms | 5 | 0.006 ms | 26966.627517 s | 26966.627524 s | nvme0q12:137 | 0011 | 0.015 ms | 2 | 0.014 ms | 26966.799588 s | 26966.799602 s | enp5s0-rx-0:164 | 0006 | 0.009 ms | 2 | 0.005 ms | 26966.742024 s | 26966.742028 s | enp5s0-rx-1:165 | 0007 | 0.006 ms | 2 | 0.004 ms | 26966.939486 s | 26966.939490 s | enp5s0-tx-0:166 | 0008 | 0.005 ms | 1 | 0.005 ms | 26966.939484 s | 26966.939489 s | enp5s0-tx-1:167 | 0009 | 0.005 ms | 1 | 0.005 ms | 26966.939484 s | 26966.939489 s | -------------------------------------------------------------------------------------------------------------------------------- #t Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-16-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-26perf kwork: Implement BPF traceYang Jihong
'perf record' generates perf.data, which generates extra interrupts for hard disk, amount of data to be collected increases with time. Using eBPF trace can process the data in kernel, which solves the preceding two problems. Add -b/--use-bpf option for latency and report to support tracing kwork events using eBPF: 1. Create bpf prog and attach to tracepoints, 2. Start tracing after command is entered, 3. After user hit "ctrl+c", stop tracing and report, 4. Support CPU and name filtering. This commit implements the framework code and does not add specific event support. Test cases: # perf kwork rep -h Usage: perf kwork report [<options>] -b, --use-bpf Use BPF to measure kwork runtime -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): runtime, max, count -S, --with-summary Show summary with statistics --time <str> Time span for analysis (start,stop) # perf kwork lat -h Usage: perf kwork latency [<options>] -b, --use-bpf Use BPF to measure kwork latency -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): avg, max, count --time <str> Time span for analysis (start,stop) # perf kwork lat -b Unsupported bpf trace class irq # perf kwork rep -b Unsupported bpf trace class irq Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-15-yangjihong1@huawei.com [ Simplify work_findnew() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-26perf kwork: Implement perf kwork timehistYang Jihong
Implements framework of perf kwork timehist, to provide an analysis of kernel work events. Test cases: # perf kwork tim Runtime start Runtime end Cpu Kwork name Runtime Delaytime (TYPE)NAME:NUM (msec) (msec) ----------------- ----------------- ------ ------------------------------ ---------- ---------- 91576.060290 91576.060344 [0000] (s)RCU:9 0.055 0.111 91576.061470 91576.061547 [0000] (s)SCHED:7 0.077 0.073 91576.062604 91576.062697 [0001] (s)RCU:9 0.094 0.409 91576.064443 91576.064517 [0002] (s)RCU:9 0.074 0.114 91576.065144 91576.065211 [0000] (s)SCHED:7 0.067 0.058 91576.066564 91576.066609 [0003] (s)RCU:9 0.045 0.110 91576.068495 91576.068559 [0000] (s)SCHED:7 0.064 0.059 91576.068900 91576.068996 [0004] (s)RCU:9 0.096 0.726 91576.069364 91576.069420 [0002] (s)RCU:9 0.056 0.082 91576.069649 91576.069701 [0004] (s)RCU:9 0.052 0.111 91576.070147 91576.070206 [0000] (s)SCHED:7 0.060 0.057 91576.073147 91576.073202 [0000] (s)SCHED:7 0.054 0.060 <SNIP> # perf kwork tim --max-stack 2 -g Runtime start Runtime end Cpu Kwork name Runtime Delaytime (TYPE)NAME:NUM (msec) (msec) ----------------- ----------------- ------ ------------------------------ ---------- ---------- 91576.060290 91576.060344 [0000] (s)RCU:9 0.055 0.111 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.061470 91576.061547 [0000] (s)SCHED:7 0.077 0.073 irq_exit_rcu <- sysvec_call_function_single 91576.062604 91576.062697 [0001] (s)RCU:9 0.094 0.409 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.064443 91576.064517 [0002] (s)RCU:9 0.074 0.114 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.065144 91576.065211 [0000] (s)SCHED:7 0.067 0.058 irq_exit_rcu <- sysvec_call_function_single 91576.066564 91576.066609 [0003] (s)RCU:9 0.045 0.110 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.068495 91576.068559 [0000] (s)SCHED:7 0.064 0.059 irq_exit_rcu <- sysvec_call_function_single 91576.068900 91576.068996 [0004] (s)RCU:9 0.096 0.726 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.069364 91576.069420 [0002] (s)RCU:9 0.056 0.082 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.069649 91576.069701 [0004] (s)RCU:9 0.052 0.111 irq_exit_rcu <- sysvec_apic_timer_interrupt <SNIP> Committer testing: # perf kwork -k workqueue timehist | head -40 Runtime start Runtime end Cpu Kwork name Runtime Delaytime (TYPE)NAME:NUM (msec) (msec) ----------------- ----------------- ------ ------------------------------ ---------- ---------- 26520.211825 26520.211832 [0019] (w)free_work 0.007 0.004 26520.212929 26520.212934 [0020] (w)free_work 0.005 0.004 26520.213226 26520.213228 [0014] (w)kfree_rcu_work 0.002 0.004 26520.214057 26520.214061 [0021] (w)free_work 0.004 0.004 26520.221239 26520.221241 [0007] (w)kfree_rcu_work 0.002 0.009 26520.223232 26520.223238 [0013] (w)psi_avgs_work 0.005 0.006 26520.230057 26520.230060 [0020] (w)free_work 0.003 0.003 26520.270428 26520.270434 [0015] (w)free_work 0.006 0.004 26520.270546 26520.270550 [0014] (w)free_work 0.004 0.003 26520.281626 26520.281629 [0015] (w)free_work 0.003 0.002 26520.287225 26520.287230 [0012] (w)psi_avgs_work 0.005 0.008 26520.287231 26520.287235 [0001] (w)psi_avgs_work 0.004 0.011 26520.287236 26520.287239 [0001] (w)psi_avgs_work 0.003 0.012 26520.329488 26520.329492 [0024] (w)free_work 0.004 0.004 26520.330600 26520.330605 [0007] (w)free_work 0.005 0.004 26520.334218 26520.334218 [0007] (w)kfree_rcu_monitor 0.001 0.002 26520.335220 26520.335221 [0005] (w)kfree_rcu_monitor 0.001 0.004 26520.343980 26520.343985 [0007] (w)free_work 0.005 0.002 26520.345093 26520.345097 [0006] (w)free_work 0.004 0.003 26520.351233 26520.351238 [0027] (w)psi_avgs_work 0.005 0.008 26520.353228 26520.353229 [0007] (w)kfree_rcu_work 0.001 0.002 26520.353229 26520.353231 [0005] (w)kfree_rcu_work 0.001 0.006 26520.382381 26520.382383 [0006] (w)free_work 0.003 0.002 26520.386547 26520.386548 [0006] (w)free_work 0.002 0.001 26520.391243 26520.391245 [0015] (w)console_callback 0.002 0.016 26520.415369 26520.415621 [0027] (w)btrfs_work_helper 0.252 26520.415351 26520.416174 [0002] (w)btrfs_work_helper 0.823 0.037 26520.415343 26520.416304 [0031] (w)btrfs_work_helper 0.961 26520.415335 26520.417078 [0001] (w)btrfs_work_helper 1.743 26520.415250 26520.417564 [0002] (w)wb_workfn 2.314 26520.424777 26520.424787 [0002] (w)btrfs_work_helper 0.010 26520.424788 26520.424798 [0002] (w)btrfs_work_helper 0.010 26520.424790 26520.424805 [0001] (w)btrfs_work_helper 0.016 0.016 26520.424801 26520.424807 [0002] (w)btrfs_work_helper 0.006 26520.424809 26520.424831 [0002] (w)btrfs_work_helper 0.022 0.030 26520.424824 26520.424835 [0027] (w)btrfs_work_helper 0.011 26520.424809 26520.424867 [0001] (w)btrfs_work_helper 0.059 0.032 # Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-14-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>