summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-12-09libperf cpumap: Grow array of read CPUs in smaller incrementsIan Rogers
Instead of growing the array by 2048, grow by the larger of the current range or 16. As ranges are typical for things like the online CPUs this will mean a single allocation happens. While uncore CPU maps will grow 16 at a time which is a value that is generous except say on large servers. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kyle Meyer <kyle.meyer@hpe.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206044035.1062032-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09libperf cpumap: Remove perf_cpu_map__read()Ian Rogers
Function is no longer used and duplicates the parsing logic from perf_cpu_map__new(). Remove to allow simplification. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kyle Meyer <kyle.meyer@hpe.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206044035.1062032-8-irogers@google.com [ Applied manually to cope with "libperf cpumap: Refactor perf_cpu_map__merge()" ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09libperf cpumap: Remove use of perf_cpu_map__read()Ian Rogers
Remove use of a FILE and switch to reading a string that is then passed to perf_cpu_map__new(). Being able to remove perf_cpu_map__read() avoids duplicated parsing logic. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kyle Meyer <kyle.meyer@hpe.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206044035.1062032-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09perf pmu: Remove use of perf_cpu_map__read()Ian Rogers
Remove use of a FILE and switch to reading a string that is then passed to perf_cpu_map__new(). Being able to remove perf_cpu_map__read() avoids duplicated parsing logic. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kyle Meyer <kyle.meyer@hpe.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206044035.1062032-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09libperf cpumap: Be tolerant of newline at the end of a cpumaskIan Rogers
File cpumasks often have a newline that shouldn't trigger the invalid parsing case in perf_cpu_map__new(). Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kyle Meyer <kyle.meyer@hpe.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206044035.1062032-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09libperf cpumap: Hide/reduce scope of MAX_NR_CPUSIan Rogers
Avoid redefinition of MAX_NR_CPUS as a global constant, the original definition is tools/perf/perf.h. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kyle Meyer <kyle.meyer@hpe.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206044035.1062032-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09perf cpumap: Reduce transitive dependencies on libperf MAX_NR_CPUSIan Rogers
libperf exposes MAX_NR_CPUS via tools/lib/perf/include/internal/cpumap.h which is internal. The preferred dependency should be the definition in tools/perf/perf.h. Add the includes of perf.h so that MAX_NR_CPUS can be hidden in libperf. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kyle Meyer <kyle.meyer@hpe.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206044035.1062032-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09perf: Increase MAX_NR_CPUS to 4096Kyle Meyer
Systems have surpassed 2048 CPUs. Increase MAX_NR_CPUS to 4096. Bitmaps declared with MAX_NR_CPUS bits will increase from 256B to 512B, cpus_runtime will increase from 81960B to 163880B, and max_entries will increase from 8192B to 16384B. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206044035.1062032-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09perf arm-spe: Add support for SPE Data Source packet on AmpereOneIlkka Koskinen
Decode SPE Data Source packets on AmpereOne. The field is IMPDEF. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Graham Woodward <graham.woodward@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241108202946.16835-3-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09perf arm-spe: Prepare for adding data source packet implementations for ↵Ilkka Koskinen
other cores Split Data Source Packet handling to prepare adding support for other implementations. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Graham Woodward <graham.woodward@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241108202946.16835-2-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09perf cpumap: Add checking for reference counterLeo Yan
For the CPU map merging test, add an extra check for the reference counter before releasing the last CPU map. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241107125308.41226-4-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09perf cpumap: Add more tests for CPU map mergingLeo Yan
Add additional tests for CPU map merging to cover more cases. These tests include different types of arguments, such as when one CPU map is a subset of another, as well as cases with or without overlap between the two maps. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241107125308.41226-3-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09libperf cpumap: Refactor perf_cpu_map__merge()Leo Yan
The perf_cpu_map__merge() function has two arguments, 'orig' and 'other'. The function definition might cause confusion as it could give the impression that the CPU maps in the two arguments are copied into a new allocated structure, which is then returned as the result. The purpose of the function is to merge the CPU map 'other' into the CPU map 'orig'. This commit changes the 'orig' argument to a pointer to pointer, so the new result will be updated into 'orig'. The return value is changed to an int type, as an error number or 0 for success. Update callers and tests for the new function definition. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241107125308.41226-2-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09perf config: Fix trival typo 'an' -> 'can'Arnaldo Carvalho de Melo
Just a trivial typo, should be 'can', did a spell check on the rest of the file just in case, nothing more stood out. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09perf script python: Improve physical mem type resolutionIan Rogers
Previously system RAM and persistent memory were hard code matched, change so that the label of the memory region is just read from /proc/iomem. This avoids frequent N/A samples. Change the /proc/iomem reading, event processing and output so that nested entries appear and their counts count toward their parent. As labels may be repeated, include the memory ranges in the output to make it clear why, for example, "System RAM" appears twice. Before: Event: mem_inst_retired.all_loads:P Memory type count percentage ---------------------------------------- ---------- ---------- System RAM 9460 96.5% N/A 998 3.5% After: Event: mem_inst_retired.all_loads:P Memory type count percentage ---------------------------------------- ---------- ---------- 100000000-105f7fffff : System RAM 36741 96.5 841400000-8416599ff : Kernel data 89 0.2 840800000-8412a6fff : Kernel rodata 60 0.2 841ebe000-8423fffff : Kernel bss 34 0.1 0-fff : Reserved 1345 3.5 100000-89dd9fff : System RAM 2 0.0 Before: Event: mem_inst_retired.any:P Memory type count percentage ---------------------------------------- ----------- ----------- System RAM 9460 90.5% N/A 998 9.5% After: Event: mem_inst_retired.any:P Memory type count percentage ---------------------------------------- ---------- ---------- 100000000-105f7fffff : System RAM 9460 90.5 841400000-8416599ff : Kernel data 45 0.4 840800000-8412a6fff : Kernel rodata 19 0.2 841ebe000-8423fffff : Kernel bss 12 0.1 0-fff : Reserved 998 9.5 The code has been updated to python 3 with type hints and resolving issues reported by mypy and pylint. Tabs are swapped to spaces as preferred in PEP8, because most lines of code were modified (of this small file) and this makes pylint significantly less noisy. Committer testing: root@number:/tmp# grep -m1 "model name" /proc/cpuinfo model name : Intel(R) Core(TM) i7-14700K root@number:/tmp# root@number:/tmp# perf script mem-phys-addr -a find / /bin /lib /lib64 /sbin Warning: 744 out of order events recorded. Event: cpu_core/mem_inst_retired.all_loads/P Memory type count percentage ---------------------------------------- ---------- ---------- 100000000-8bfbfffff : System RAM 364561 76.5 621400000-6223a6fff : Kernel rodata 10474 2.2 622400000-62283d4bf : Kernel data 4828 1.0 623304000-6237fffff : Kernel bss 1063 0.2 620000000-6213fffff : Kernel code 98 0.0 0-fff : Reserved 111480 23.4 100000-2b0ca017 : System RAM 337 0.1 2fbad000-30d92fff : System RAM 44 0.0 2c79d000-2fbabfff : System RAM 30 0.0 30d94000-316d5fff : System RAM 16 0.0 2b131a58-2c71dfff : System RAM 7 0.0 root@number:/tmp# Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241119180130.19160-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09perf disasm: Return a proper error when not determining the file typeArnaldo Carvalho de Melo
Before: ⬢ [acme@toolbox a]$ perf annotate --stdio2 -i acme-perf-injected.data 'java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int)' Error: Couldn't annotate java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int): Internal error: Invalid -1 error code ⬢ [acme@toolbox a]$ After: ⬢ [acme@toolbox a]$ perf annotate --stdio2 -i acme-perf-injected.data 'java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int)' Error: Couldn't annotate java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int): Couldn't determine the file /tmp/perf-3308868.map type. ⬢ [acme@toolbox a]$ Reported-by: Francesco Nigro <fnigro@redhat.com> Reported-by: Ilan Green <igreen@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Link: https://lore.kernel.org/lkml/Z092D9-r_iOgwIWM@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09tools features: Don't check for libunwind devel files by defaultArnaldo Carvalho de Melo
Since 13e17c9ff49119aa ("perf build: Make libunwind opt-in rather than opt-out"), so we shouldn't by default be testing for its availability at build time in tools/build/features/test-all.c. That test was designed to test the features we expect to be the most common ones in most builds, so if we test build just that file, then we assume the features there are present and will not test one by one. Removing it from test-all.c gets rid of the first impediment for test-all.c to build successfully: $ cat /tmp/build/perf-tools-next/feature/test-all.make.output In file included from test-all.c:62: test-libunwind.c:2:10: fatal error: libunwind.h: No such file or directory 2 | #include <libunwind.h> | ^~~~~~~~~~~~~ compilation terminated. $ We then get to: $ cat /tmp/build/perf-tools-next/feature/test-all.make.output /usr/bin/ld: cannot find -lunwind-x86_64: No such file or directory /usr/bin/ld: cannot find -lunwind: No such file or directory collect2: error: ld returned 1 exit status $ So make all the logic related to setting CFLAGS, LDFLAGS, etc for libunwind to be conditional on NO_LIBWUNWIND=1, which is now the default, now we get a faster build: $ cat /tmp/build/perf-tools-next/feature/test-all.make.output $ ldd /tmp/build/perf-tools-next/feature/test-all.bin linux-vdso.so.1 (0x00007fef04cde000) libdw.so.1 => /lib64/libdw.so.1 (0x00007fef04a49000) libpython3.12.so.1.0 => /lib64/libpython3.12.so.1.0 (0x00007fef04478000) libm.so.6 => /lib64/libm.so.6 (0x00007fef04394000) libtraceevent.so.1 => /lib64/libtraceevent.so.1 (0x00007fef0436c000) libtracefs.so.1 => /lib64/libtracefs.so.1 (0x00007fef04345000) libcrypto.so.3 => /lib64/libcrypto.so.3 (0x00007fef03e95000) libz.so.1 => /lib64/libz.so.1 (0x00007fef03e72000) libelf.so.1 => /lib64/libelf.so.1 (0x00007fef03e56000) libnuma.so.1 => /lib64/libnuma.so.1 (0x00007fef03e48000) libslang.so.2 => /lib64/libslang.so.2 (0x00007fef03b65000) libperl.so.5.38 => /lib64/libperl.so.5.38 (0x00007fef037c6000) libc.so.6 => /lib64/libc.so.6 (0x00007fef035d5000) liblzma.so.5 => /lib64/liblzma.so.5 (0x00007fef035a0000) libzstd.so.1 => /lib64/libzstd.so.1 (0x00007fef034e1000) libbz2.so.1 => /lib64/libbz2.so.1 (0x00007fef034cd000) /lib64/ld-linux-x86-64.so.2 (0x00007fef04ce0000) libcrypt.so.2 => /lib64/libcrypt.so.2 (0x00007fef03495000) $ Fixes: 13e17c9ff49119aa ("perf build: Make libunwind opt-in rather than opt-out") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Z09zTztD8X8qIWCX@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09Merge tag 'locking_urgent_for_v6.13_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Borislav Petkov: - Remove if_not_guard() as it is generating incorrect code - Fix the initialization of the fake lockdep_map for the first locked ww_mutex * tag 'locking_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: headers/cleanup.h: Remove the if_not_guard() facility locking/ww_mutex: Fix ww_mutex dummy lockdep map selftest warnings
2024-12-09Merge tag 'perf_urgent_for_v6.13_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fixes from Borislav Petkov: - Make sure the PEBS buffer is drained before reconfiguring the hardware - Add Arrow Lake U support * tag 'perf_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/ds: Unconditionally drain PEBS DS when changing PEBS_DATA_CFG perf/x86/intel: Add Arrow Lake U support
2024-12-09Merge tag 'sched_urgent_for_v6.13_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Remove wrong enqueueing of a task for a later wakeup when a task blocks on a RT mutex - Do not setup a new deadline entity on a boosted task as that has happened already - Update preempt= kernel command line param - Prevent needless softirqd wakeups in the idle task's context - Detect the case where the idle load balancer CPU becomes busy and avoid unnecessary load balancing invocation - Remove an unnecessary load balancing need_resched() call in nohz_csd_func() - Allow for raising of SCHED_SOFTIRQ softirq type on RT but retain the warning to catch any other cases - Remove a wrong warning when a cpuset update makes the task affinity no longer a subset of the cpuset * tag 'sched_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking: rtmutex: Fix wake_q logic in task_blocks_on_rt_mutex sched/deadline: Fix warning in migrate_enable for boosted tasks sched/core: Update kernel boot parameters for LAZY preempt. sched/core: Prevent wakeup of ksoftirqd during idle load balance sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy sched/core: Remove the unnecessary need_resched() check in nohz_csd_func() softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel sched: fix warning in sched_setaffinity sched/deadline: Fix replenish_dl_new_period dl_server condition
2024-12-09x86: Fix build regression with CONFIG_KEXEC_JUMP enabledDamien Le Moal
Build 6.13-rc12 for x86_64 with gcc 14.2.1 fails with the error: ld: vmlinux.o: in function `virtual_mapped': linux/arch/x86/kernel/relocate_kernel_64.S:249:(.text+0x5915b): undefined reference to `saved_context_gdt_desc' when CONFIG_KEXEC_JUMP is enabled. This was introduced by commit 07fa619f2a40 ("x86/kexec: Restore GDT on return from ::preserve_context kexec") which introduced a use of saved_context_gdt_desc without a declaration for it. Fix that by including asm/asm-offsets.h where saved_context_gdt_desc is defined (indirectly in include/generated/asm-offsets.h which asm/asm-offsets.h includes). Fixes: 07fa619f2a40 ("x86/kexec: Restore GDT on return from ::preserve_context kexec") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Closes: https://lore.kernel.org/oe-kbuild-all/202411270006.ZyyzpYf8-lkp@intel.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-09futex: fix user access on powerpcLinus Torvalds
The powerpc user access code is special, and unlike other architectures distinguishes between user access for reading and writing. And commit 43a43faf5376 ("futex: improve user space accesses") messed that up. It went undetected elsewhere, but caused ppc32 to fail early during boot, because the user access had been started with user_read_access_begin(), but then finished off with just a plain "user_access_end()". Note that the address-masking user access helpers don't even have that read-vs-write distinction, so if powerpc ever wants to do address masking tricks, we'll have to do some extra work for it. [ Make sure to also do it for the EFAULT case, as pointed out by Christophe Leroy ] Reported-by: Andreas Schwab <schwab@linux-m68k.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/all/87bjxl6b0i.fsf@igel.home/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-09Merge branch 'net-sparx5-lan969x-fixes'David S. Miller
Daniel Machon says: ==================== net: sparx5: misc fixes for sparx5 and lan969x This series fixes various issues in the Sparx5 and lan969x drivers. Most of the fixes are for new issues introduced by the recent series adding lan969x switch support in the Sparx5 driver. Most notable is patch 1/5 that moves the lan969x dir into the sparx5 dir, in order to address a cyclic dependency issue reported by depmod, when installing modules. Details are in the commit descriptions. To: Andrew Lunn <andrew+netdev@lunn.ch> To: David S. Miller <davem@davemloft.net> To: Eric Dumazet <edumazet@google.com> To: Jakub Kicinski <kuba@kernel.org> To: Paolo Abeni <pabeni@redhat.com> To: Lars Povlsen <lars.povlsen@microchip.com> To: Steen Hegelund <Steen.Hegelund@microchip.com> To: UNGLinuxDriver@microchip.com To: Richard Cochran <richardcochran@gmail.com> To: Bjarni Jonasson <bjarni.jonasson@microchip.com> To: jensemil.schulzostergaard@microchip.com To: horatiu.vultur@microchip.com To: arnd@arndb.de To: jacob.e.keller@intel.com To: Parthiban.Veerasooran@microchip.com Cc: Calvin Owens <calvin@wbinvd.org> Cc: Muhammad Usama Anjum <Usama.Anjum@collabora.com> Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org ==================== Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09net: sparx5: fix the maximum frame length registerDaniel Machon
On port initialization, we configure the maximum frame length accepted by the receive module associated with the port. This value is currently written to the MAX_LEN field of the DEV10G_MAC_ENA_CFG register, when in fact, it should be written to the DEV10G_MAC_MAXLEN_CFG register. Fix this. Fixes: 946e7fd5053a ("net: sparx5: add port module support") Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09net: sparx5: fix default value of monitor portsDaniel Machon
When doing port mirroring, the physical port to send the frame to, is written to the FRMC_PORT_VAL field of the QFWD_FRAME_COPY_CFG register. This field is 7 bits wide on sparx5 and 6 bits wide on lan969x, and has a default value of 65 and 30, respectively (the number of front ports). On mirror deletion, we set the default value of the monitor port to 65 for this field, in case no more ports exists for the mirror. Needless to say, this will not fit the 6 bits on lan969x. Fix this by correctly using the n_ports constant instead. Fixes: 3f9e46347a46 ("net: sparx5: use SPX5_CONST for constants which already have a symbol") Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09net: sparx5: fix FDMA performance issueDaniel Machon
The FDMA handler is responsible for scheduling a NAPI poll, which will eventually fetch RX packets from the FDMA queue. Currently, the FDMA handler is run in a threaded context. For some reason, this kills performance. Admittedly, I did not do a thorough investigation to see exactly what causes the issue, however, I noticed that in the other driver utilizing the same FDMA engine, we run the FDMA handler in hard IRQ context. Fix this performance issue, by running the FDMA handler in hard IRQ context, not deferring any work to a thread. Prior to this change, the RX UDP performance was: Interval Transfer Bitrate Jitter 0.00-10.20 sec 44.6 MBytes 36.7 Mbits/sec 0.027 ms After this change, the rx UDP performance is: Interval Transfer Bitrate Jitter 0.00-9.12 sec 1.01 GBytes 953 Mbits/sec 0.020 ms Fixes: 10615907e9b5 ("net: sparx5: switchdev: adding frame DMA functionality") Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09net: lan969x: fix the use of spin_lock in PTP handlerDaniel Machon
We are mixing the use of spin_lock() and spin_lock_irqsave() functions in the PTP handler of lan969x. Fix this by correctly using the _irqsave variants. Fixes: 24fe83541755 ("net: lan969x: add PTP handler function") Signed-off-by: Daniel Machon <daniel.machon@microchip.com> [1]: https://lore.kernel.org/netdev/20241024-sparx5-lan969x-switch-driver-2-v2-10-a0b5fae88a0f@microchip.com/ Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09net: lan969x: fix cyclic dependency reported by depmodDaniel Machon
Depmod reports a cyclic dependency between modules sparx5-switch.ko and lan969x-switch.ko: depmod: ERROR: Cycle detected: lan969x_switch -> sparx5_switch -> lan969x_switch depmod: ERROR: Found 2 modules in dependency cycles! make[2]: *** [scripts/Makefile.modinst:132: depmod] Error 1 make: *** [Makefile:224: __sub-make] Error 2 This makes sense, as they both require symbols from each other. Fix this by compiling lan969x support into the sparx5-switch.ko module. In order to do this, in a sensible way, we move the lan969x/ dir into the sparx5/ dir and do some code cleanup of code that is no longer required. After this patch, depmod will no longer complain, as lan969x support is compiled into the sparx5-swicth.ko module, and can no longer be compiled as a standalone module. Fixes: 98a01119608d ("net: sparx5: add compatible string for lan969x") Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-08Linux 6.13-rc2v6.13-rc2Linus Torvalds
2024-12-08Merge tag 'kbuild-fixes-v6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix a section mismatch warning in modpost - Fix Debian package build error with the O= option * tag 'kbuild-fixes-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: deb-pkg: fix build error with O= modpost: Add .irqentry.text to OTHER_SECTIONS
2024-12-08Merge tag 'irq_urgent_for_v6.13_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Fix a /proc/interrupts formatting regression - Have the BCM2836 interrupt controller enter power management states properly - Other fixlets * tag 'irq_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/stm32mp-exti: CONFIG_STM32MP_EXTI should not default to y when compile-testing genirq/proc: Add missing space separator back irqchip/bcm2836: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND irqchip/gic-v3: Fix irq_complete_ack() comment
2024-12-08Merge tag 'timers_urgent_for_v6.13_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Borislav Petkov: - Handle the case where clocksources with small counter width can, in conjunction with overly long idle sleeps, falsely trigger the negative motion detection of clocksources * tag 'timers_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: Make negative motion detection more robust
2024-12-08Merge tag 'x86_urgent_for_v6.13_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Have the Automatic IBRS setting check on AMD does not falsely fire in the guest when it has been set already on the host - Make sure cacheinfo structures memory is allocated to address a boot NULL ptr dereference on Intel Meteor Lake which has different numbers of subleafs in its CPUID(4) leaf - Take care of the GDT restoring on the kexec path too, as expected by the kernel - Make sure SMP is not disabled when IO-APIC is disabled on the kernel cmdline - Add a PGD flag _PAGE_NOPTISHADOW to instruct machinery not to propagate changes to the kernelmode page tables, to the user portion, in PTI - Mark Intel Lunar Lake as affected by an issue where MONITOR wakeups can get lost and thus user-visible delays happen - Make sure PKRU is properly restored with XRSTOR on AMD after a PRKU write of 0 (WRPKRU) which will mark PKRU in its init state and thus lose the actual buffer * tag 'x86_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/CPU/AMD: WARN when setting EFER.AUTOIBRS if and only if the WRMSR fails x86/cacheinfo: Delete global num_cache_leaves cacheinfo: Allocate memory during CPU hotplug if not done from the primary CPU x86/kexec: Restore GDT on return from ::preserve_context kexec x86/cpu/topology: Remove limit of CPUs due to disabled IO/APIC x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables x86/cpu: Add Lunar Lake to list of CPUs with a broken MONITOR implementation x86/pkeys: Ensure updated PKRU value is XRSTOR'd x86/pkeys: Change caller of update_pkru_in_sigframe()
2024-12-08Merge tag 'mm-hotfixes-stable-2024-12-07-22-39' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "24 hotfixes. 17 are cc:stable. 15 are MM and 9 are non-MM. The usual bunch of singletons - please see the relevant changelogs for details" * tag 'mm-hotfixes-stable-2024-12-07-22-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (24 commits) iio: magnetometer: yas530: use signed integer type for clamp limits sched/numa: fix memory leak due to the overwritten vma->numab_state mm/damon: fix order of arguments in damos_before_apply tracepoint lib: stackinit: hide never-taken branch from compiler mm/filemap: don't call folio_test_locked() without a reference in next_uptodate_folio() scatterlist: fix incorrect func name in kernel-doc mm: correct typo in MMAP_STATE() macro mm: respect mmap hint address when aligning for THP mm: memcg: declare do_memsw_account inline mm/codetag: swap tags when migrate pages ocfs2: update seq_file index in ocfs2_dlm_seq_next stackdepot: fix stack_depot_save_flags() in NMI context mm: open-code page_folio() in dump_page() mm: open-code PageTail in folio_flags() and const_folio_flags() mm: fix vrealloc()'s KASAN poisoning logic Revert "readahead: properly shorten readahead when falling back to do_page_cache_ra()" selftests/damon: add _damon_sysfs.py to TEST_FILES selftest: hugetlb_dio: fix test naming ocfs2: free inode when ocfs2_get_init_inode() fails nilfs2: fix potential out-of-bounds memory access in nilfs_find_entry() ...
2024-12-08tracing/eprobe: Fix to release eprobe when failed to add dyn_eventMasami Hiramatsu (Google)
Fix eprobe event to unregister event call and release eprobe when it fails to add dynamic event correctly. Link: https://lore.kernel.org/all/173289886698.73724.1959899350183686006.stgit@devnote2/ Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-12-08kbuild: deb-pkg: fix build error with O=Masahiro Yamada
Since commit 13b25489b6f8 ("kbuild: change working directory to external module directory with M="), the Debian package build fails if a relative path is specified with the O= option. $ make O=build bindeb-pkg [ snip ] dpkg-deb: building package 'linux-image-6.13.0-rc1' in '../linux-image-6.13.0-rc1_6.13.0-rc1-6_amd64.deb'. Rebuilding host programs with x86_64-linux-gnu-gcc... make[6]: Entering directory '/home/masahiro/linux/build' /home/masahiro/linux/Makefile:190: *** specified kernel directory "build" does not exist. Stop. This occurs because the sub_make_done flag is cleared, even though the working directory is already in the output directory. Passing KBUILD_OUTPUT=. resolves the issue. Fixes: 13b25489b6f8 ("kbuild: change working directory to external module directory with M=") Reported-by: Charlie Jenkins <charlie@rivosinc.com> Closes: https://lore.kernel.org/all/Z1DnP-GJcfseyrM3@ghost/ Tested-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-12-08modpost: Add .irqentry.text to OTHER_SECTIONSThomas Gleixner
The compiler can fully inline the actual handler function of an interrupt entry into the .irqentry.text entry point. If such a function contains an access which has an exception table entry, modpost complains about a section mismatch: WARNING: vmlinux.o(__ex_table+0x447c): Section mismatch in reference ... The relocation at __ex_table+0x447c references section ".irqentry.text" which is not in the list of authorized sections. Add .irqentry.text to OTHER_SECTIONS to cure the issue. Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org # needed for linux-5.4-y Link: https://lore.kernel.org/all/20241128111844.GE10431@google.com/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-12-07rtnetlink: fix error code in rtnl_newlink()Dan Carpenter
If rtnl_get_peer_net() fails, then propagate the error code. Don't return success. Fixes: 48327566769a ("rtnetlink: fix double call of rtnl_link_get_net_ifla()") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/a2d20cd4-387a-4475-887c-bb7d0e88e25a@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07Merge branch 'ocelot-ptp-fixes'Jakub Kicinski
Vladimir Oltean says: ==================== Ocelot PTP fixes This is another attempt at "net: mscc: ocelot: be resilient to loss of PTP packets during transmission". https://lore.kernel.org/netdev/20241203164755.16115-1-vladimir.oltean@nxp.com/ The central change is in patch 4/5. It recovers a port from a very reproducible condition where previously, it would become unable to take any hardware TX timestamp. Then we have patches 1/5 and 5/5 which are extra bug fixes. Patches 2/5 and 3/5 are logical changes split out of the monolithic patch previously submitted as v1, so that patch 4/5 is hopefully easier to follow. ==================== Link: https://patch.msgid.link/20241205145519.1236778-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07net: mscc: ocelot: perform error cleanup in ocelot_hwstamp_set()Vladimir Oltean
An unsupported RX filter will leave the port with TX timestamping still applied as per the new request, rather than the old setting. When parsing the tx_type, don't apply it just yet, but delay that until after we've parsed the rx_filter as well (and potentially returned -ERANGE for that). Similarly, copy_to_user() may fail, which is a rare occurrence, but should still be treated by unwinding what was done. Fixes: 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241205145519.1236778-6-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07net: mscc: ocelot: be resilient to loss of PTP packets during transmissionVladimir Oltean
The Felix DSA driver presents unique challenges that make the simplistic ocelot PTP TX timestamping procedure unreliable: any transmitted packet may be lost in hardware before it ever leaves our local system. This may happen because there is congestion on the DSA conduit, the switch CPU port or even user port (Qdiscs like taprio may delay packets indefinitely by design). The technical problem is that the kernel, i.e. ocelot_port_add_txtstamp_skb(), runs out of timestamp IDs eventually, because it never detects that packets are lost, and keeps the IDs of the lost packets on hold indefinitely. The manifestation of the issue once the entire timestamp ID range becomes busy looks like this in dmesg: mscc_felix 0000:00:00.5: port 0 delivering skb without TX timestamp mscc_felix 0000:00:00.5: port 1 delivering skb without TX timestamp At the surface level, we need a timeout timer so that the kernel knows a timestamp ID is available again. But there is a deeper problem with the implementation, which is the monotonically increasing ocelot_port->ts_id. In the presence of packet loss, it will be impossible to detect that and reuse one of the holes created in the range of free timestamp IDs. What we actually need is a bitmap of 63 timestamp IDs tracking which one is available. That is able to use up holes caused by packet loss, but also gives us a unique opportunity to not implement an actual timer_list for the timeout timer (very complicated in terms of locking). We could only declare a timestamp ID stale on demand (lazily), aka when there's no other timestamp ID available. There are pros and cons to this approach: the implementation is much more simple than per-packet timers would be, but most of the stale packets would be quasi-leaked - not really leaked, but blocked in driver memory, since this algorithm sees no reason to free them. An improved technique would be to check for stale timestamp IDs every time we allocate a new one. Assuming a constant flux of PTP packets, this avoids stale packets being blocked in memory, but of course, packets lost at the end of the flux are still blocked until the flux resumes (nobody left to kick them out). Since implementing per-packet timers is way too complicated, this should be good enough. Testing procedure: Persistently block traffic class 5 and try to run PTP on it: $ tc qdisc replace dev swp3 parent root taprio num_tc 8 \ map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time 0 sched-entry S 0xdf 100000 flags 0x2 [ 126.948141] mscc_felix 0000:00:00.5: port 3 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS $ ptp4l -i swp3 -2 -P -m --socket_priority 5 --fault_reset_interval ASAP --logSyncInterval -3 ptp4l[70.351]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[70.354]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[70.358]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE [ 70.394583] mscc_felix 0000:00:00.5: port 3 timestamp id 0 ptp4l[70.406]: timed out while polling for tx timestamp ptp4l[70.406]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[70.406]: port 1 (swp3): send peer delay response failed ptp4l[70.407]: port 1 (swp3): clearing fault immediately ptp4l[70.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1 [ 71.394858] mscc_felix 0000:00:00.5: port 3 timestamp id 1 ptp4l[71.400]: timed out while polling for tx timestamp ptp4l[71.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[71.401]: port 1 (swp3): send peer delay response failed ptp4l[71.401]: port 1 (swp3): clearing fault immediately [ 72.393616] mscc_felix 0000:00:00.5: port 3 timestamp id 2 ptp4l[72.401]: timed out while polling for tx timestamp ptp4l[72.402]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[72.402]: port 1 (swp3): send peer delay response failed ptp4l[72.402]: port 1 (swp3): clearing fault immediately ptp4l[72.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1 [ 73.395291] mscc_felix 0000:00:00.5: port 3 timestamp id 3 ptp4l[73.400]: timed out while polling for tx timestamp ptp4l[73.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[73.400]: port 1 (swp3): send peer delay response failed ptp4l[73.400]: port 1 (swp3): clearing fault immediately [ 74.394282] mscc_felix 0000:00:00.5: port 3 timestamp id 4 ptp4l[74.400]: timed out while polling for tx timestamp ptp4l[74.401]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[74.401]: port 1 (swp3): send peer delay response failed ptp4l[74.401]: port 1 (swp3): clearing fault immediately ptp4l[74.953]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1 [ 75.396830] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost [ 75.405760] mscc_felix 0000:00:00.5: port 3 timestamp id 0 ptp4l[75.410]: timed out while polling for tx timestamp ptp4l[75.411]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[75.411]: port 1 (swp3): send peer delay response failed ptp4l[75.411]: port 1 (swp3): clearing fault immediately (...) Remove the blocking condition and see that the port recovers: $ same tc command as above, but use "sched-entry S 0xff" instead $ same ptp4l command as above ptp4l[99.489]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[99.490]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[99.492]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE [ 100.403768] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost [ 100.412545] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 1 which seems lost [ 100.421283] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 2 which seems lost [ 100.430015] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 3 which seems lost [ 100.438744] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 4 which seems lost [ 100.447470] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 100.505919] mscc_felix 0000:00:00.5: port 3 timestamp id 0 ptp4l[100.963]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1 [ 101.405077] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 101.507953] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 102.405405] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 102.509391] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 103.406003] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 103.510011] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 104.405601] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 104.510624] mscc_felix 0000:00:00.5: port 3 timestamp id 0 ptp4l[104.965]: selected best master clock d858d7.fffe.00ca6d ptp4l[104.966]: port 1 (swp3): assuming the grand master role ptp4l[104.967]: port 1 (swp3): LISTENING to GRAND_MASTER on RS_GRAND_MASTER [ 105.106201] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.232420] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.359001] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.405500] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.485356] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.511220] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.610938] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.737237] mscc_felix 0000:00:00.5: port 3 timestamp id 0 (...) Notice that in this new usage pattern, a non-congested port should basically use timestamp ID 0 all the time, progressing to higher numbers only if there are unacknowledged timestamps in flight. Compare this to the old usage, where the timestamp ID used to monotonically increase modulo OCELOT_MAX_PTP_ID. In terms of implementation, this simplifies the bookkeeping of the ocelot_port :: ts_id and ptp_skbs_in_flight. Since we need to traverse the list of two-step timestampable skbs for each new packet anyway, the information can already be computed and does not need to be stored. Also, ocelot_port->tx_skbs is always accessed under the switch-wide ocelot->ts_id_lock IRQ-unsafe spinlock, so we don't need the skb queue's lock and can use the unlocked primitives safely. This problem was actually detected using the tc-taprio offload, and is causing trouble in TSN scenarios, which Felix (NXP LS1028A / VSC9959) supports but Ocelot (VSC7514) does not. Thus, I've selected the commit to blame as the one adding initial timestamping support for the Felix switch. Fixes: c0bcf537667c ("net: dsa: ocelot: add hardware timestamping support for Felix") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241205145519.1236778-5-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07net: mscc: ocelot: ocelot->ts_id_lock and ocelot_port->tx_skbs.lock are IRQ-safeVladimir Oltean
ocelot_get_txtstamp() is a threaded IRQ handler, requested explicitly as such by both ocelot_ptp_rdy_irq_handler() and vsc9959_irq_handler(). As such, it runs with IRQs enabled, and not in hardirq context. Thus, ocelot_port_add_txtstamp_skb() has no reason to turn off IRQs, it cannot be preempted by ocelot_get_txtstamp(). For the same reason, dev_kfree_skb_any_reason() will always evaluate as kfree_skb_reason() in this calling context, so just simplify the dev_kfree_skb_any() call to kfree_skb(). Also, ocelot_port_txtstamp_request() runs from NET_TX softirq context, not with hardirqs enabled. Thus, ocelot_get_txtstamp() which shares the ocelot_port->tx_skbs.lock lock with it, has no reason to disable hardirqs. This is part of a larger rework of the TX timestamping procedure. A logical subportion of the rework has been split into a separate change. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241205145519.1236778-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07net: mscc: ocelot: improve handling of TX timestamp for unknown skbVladimir Oltean
This condition, theoretically impossible to trigger, is not really handled well. By "continuing", we are skipping the write to SYS_PTP_NXT which advances the timestamp FIFO to the next entry. So we are reading the same FIFO entry all over again, printing stack traces and eventually killing the kernel. No real problem has been observed here. This is part of a larger rework of the timestamp IRQ procedure, with this logical change split out into a patch of its own. We will need to "goto next_ts" for other conditions as well. Fixes: 9fde506e0c53 ("net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241205145519.1236778-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07net: mscc: ocelot: fix memory leak on ocelot_port_add_txtstamp_skb()Vladimir Oltean
If ocelot_port_add_txtstamp_skb() fails, for example due to a full PTP timestamp FIFO, we must undo the skb_clone_sk() call with kfree_skb(). Otherwise, the reference to the skb clone is lost. Fixes: 52849bcf0029 ("net: mscc: ocelot: avoid overflowing the PTP timestamp FIFO") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241205145519.1236778-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07ip: Return drop reason if in_dev is NULL in ip_route_input_rcu().Kuniyuki Iwashima
syzkaller reported a warning in __sk_skb_reason_drop(). Commit 61b95c70f344 ("net: ip: make ip_route_input_rcu() return drop reasons") missed a path where -EINVAL is returned. Then, the cited commit started to trigger the warning with the invalid error. Let's fix it by returning SKB_DROP_REASON_NOT_SPECIFIED. [0]: WARNING: CPU: 0 PID: 10 at net/core/skbuff.c:1216 __sk_skb_reason_drop net/core/skbuff.c:1216 [inline] WARNING: CPU: 0 PID: 10 at net/core/skbuff.c:1216 sk_skb_reason_drop+0x97/0x1b0 net/core/skbuff.c:1241 Modules linked in: CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Not tainted 6.12.0-10686-gbb18265c3aba #10 1c308307628619808b5a4a0495c4aab5637b0551 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 Workqueue: wg-crypt-wg2 wg_packet_decrypt_worker RIP: 0010:__sk_skb_reason_drop net/core/skbuff.c:1216 [inline] RIP: 0010:sk_skb_reason_drop+0x97/0x1b0 net/core/skbuff.c:1241 Code: 5d 41 5c 41 5d 41 5e e9 e7 9e 95 fd e8 e2 9e 95 fd 31 ff 44 89 e6 e8 58 a1 95 fd 45 85 e4 0f 85 a2 00 00 00 e8 ca 9e 95 fd 90 <0f> 0b 90 e8 c1 9e 95 fd 44 89 e6 bf 01 00 00 00 e8 34 a1 95 fd 41 RSP: 0018:ffa0000000007650 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 000000000000ffff RCX: ffffffff83bc3592 RDX: ff110001002a0000 RSI: ffffffff83bc34d6 RDI: 0000000000000007 RBP: ff11000109ee85f0 R08: 0000000000000001 R09: ffe21c00213dd0da R10: 000000000000ffff R11: 0000000000000000 R12: 00000000ffffffea R13: 0000000000000000 R14: ff11000109ee86d4 R15: ff11000109ee8648 FS: 0000000000000000(0000) GS:ff1100011a000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020177000 CR3: 0000000108a3d006 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000600 PKRU: 55555554 Call Trace: <IRQ> kfree_skb_reason include/linux/skbuff.h:1263 [inline] ip_rcv_finish_core.constprop.0+0x896/0x2320 net/ipv4/ip_input.c:424 ip_list_rcv_finish.constprop.0+0x1b2/0x710 net/ipv4/ip_input.c:610 ip_sublist_rcv net/ipv4/ip_input.c:636 [inline] ip_list_rcv+0x34a/0x460 net/ipv4/ip_input.c:670 __netif_receive_skb_list_ptype net/core/dev.c:5715 [inline] __netif_receive_skb_list_core+0x536/0x900 net/core/dev.c:5762 __netif_receive_skb_list net/core/dev.c:5814 [inline] netif_receive_skb_list_internal+0x77c/0xdc0 net/core/dev.c:5905 gro_normal_list include/net/gro.h:515 [inline] gro_normal_list include/net/gro.h:511 [inline] napi_complete_done+0x219/0x8c0 net/core/dev.c:6256 wg_packet_rx_poll+0xbff/0x1e40 drivers/net/wireguard/receive.c:488 __napi_poll.constprop.0+0xb3/0x530 net/core/dev.c:6877 napi_poll net/core/dev.c:6946 [inline] net_rx_action+0x9eb/0xe30 net/core/dev.c:7068 handle_softirqs+0x1ac/0x740 kernel/softirq.c:554 do_softirq kernel/softirq.c:455 [inline] do_softirq+0x48/0x80 kernel/softirq.c:442 </IRQ> <TASK> __local_bh_enable_ip+0xed/0x110 kernel/softirq.c:382 spin_unlock_bh include/linux/spinlock.h:396 [inline] ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline] wg_packet_decrypt_worker+0x3ba/0x580 drivers/net/wireguard/receive.c:499 process_one_work+0x940/0x1a70 kernel/workqueue.c:3229 process_scheduled_works kernel/workqueue.c:3310 [inline] worker_thread+0x639/0xe30 kernel/workqueue.c:3391 kthread+0x283/0x350 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:244 </TASK> Fixes: 82d9983ebeb8 ("net: ip: make ip_route_input_noref() return drop reasons") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20241206020715.80207-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07net: stmmac: fix TSO DMA API usage causing oopsRussell King (Oracle)
Commit 66600fac7a98 ("net: stmmac: TSO: Fix unbalanced DMA map/unmap for non-paged SKB data") moved the assignment of tx_skbuff_dma[]'s members to be later in stmmac_tso_xmit(). The buf (dma cookie) and len stored in this structure are passed to dma_unmap_single() by stmmac_tx_clean(). The DMA API requires that the dma cookie passed to dma_unmap_single() is the same as the value returned from dma_map_single(). However, by moving the assignment later, this is not the case when priv->dma_cap.addr64 > 32 as "des" is offset by proto_hdr_len. This causes problems such as: dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed and with DMA_API_DEBUG enabled: DMA-API: dwc-eth-dwmac 2490000.ethernet: device driver tries to +free DMA memory it has not allocated [device address=0x000000ffffcf65c0] [size=66 bytes] Fix this by maintaining "des" as the original DMA cookie, and use tso_des to pass the offset DMA cookie to stmmac_tso_allocator(). Full details of the crashes can be found at: https://lore.kernel.org/all/d8112193-0386-4e14-b516-37c2d838171a@nvidia.com/ https://lore.kernel.org/all/klkzp5yn5kq5efgtrow6wbvnc46bcqfxs65nz3qy77ujr5turc@bwwhelz2l4dw/ Reported-by: Jon Hunter <jonathanh@nvidia.com> Reported-by: Thierry Reding <thierry.reding@gmail.com> Fixes: 66600fac7a98 ("net: stmmac: TSO: Fix unbalanced DMA map/unmap for non-paged SKB data") Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Furong Xu <0x1207@gmail.com> Link: https://patch.msgid.link/E1tJXcx-006N4Z-PC@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07Merge tag '6.13-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - DFS fix (for race with tree disconnect and dfs cache worker) - Four fixes for SMB3.1.1 posix extensions: - improve special file support e.g. to Samba, retrieving the file type earlier - reduce roundtrips (e.g. on ls -l, in some cases) * tag '6.13-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix potential race in cifs_put_tcon() smb3.1.1: fix posix mounts to older servers fs/smb/client: cifs_prime_dcache() for SMB3 POSIX reparse points fs/smb/client: Implement new SMB3 POSIX type fs/smb/client: avoid querying SMB2_OP_QUERY_WSL_EA for SMB3 POSIX
2024-12-07Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Large number of small fixes, all in drivers" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits) scsi: scsi_debug: Fix hrtimer support for ndelay scsi: storvsc: Do not flag MAINTENANCE_IN return of SRB_STATUS_DATA_OVERRUN as an error scsi: ufs: core: Add missing post notify for power mode change scsi: sg: Fix slab-use-after-free read in sg_release() scsi: ufs: core: sysfs: Prevent div by zero scsi: qla2xxx: Update version to 10.02.09.400-k scsi: qla2xxx: Supported speed displayed incorrectly for VPorts scsi: qla2xxx: Fix NVMe and NPIV connect issue scsi: qla2xxx: Remove check req_sg_cnt should be equal to rsp_sg_cnt scsi: qla2xxx: Fix use after free on unload scsi: qla2xxx: Fix abort in bsg timeout scsi: mpi3mr: Update driver version to 8.12.0.3.50 scsi: mpi3mr: Handling of fault code for insufficient power scsi: mpi3mr: Start controller indexing from 0 scsi: mpi3mr: Fix corrupt config pages PHY state is switched in sysfs scsi: mpi3mr: Synchronize access to ioctl data buffer scsi: mpt3sas: Update driver version to 51.100.00.00 scsi: mpt3sas: Diag-Reset when Doorbell-In-Use bit is set during driver load time scsi: ufs: pltfrm: Dellocate HBA during ufshcd_pltfrm_remove() scsi: ufs: pltfrm: Drop PM runtime reference count after ufshcd_remove() ...
2024-12-07Merge tag 'block-6.13-20241207' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request via Keith: - Target fix using incorrect zero buffer (Nilay) - Device specifc deallocate quirk fixes (Christoph, Keith) - Fabrics fix for handling max command target bugs (Maurizio) - Cocci fix usage for kzalloc (Yu-Chen) - DMA size fix for host memory buffer feature (Christoph) - Fabrics queue cleanup fixes (Chunguang) - CPU hotplug ordering fixes - Add missing MODULE_DESCRIPTION for rnull - bcache error value fix - virtio-blk queue freeze fix * tag 'block-6.13-20241207' of git://git.kernel.dk/linux: blk-mq: move cpuhp callback registering out of q->sysfs_lock blk-mq: register cpuhp callback after hctx is added to xarray table virtio-blk: don't keep queue frozen during system suspend nvme-tcp: simplify nvme_tcp_teardown_io_queues() nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues() nvme-rdma: unquiesce admin_q before destroy it nvme-tcp: fix the memleak while create new ctrl failed nvme-pci: don't use dma_alloc_noncontiguous with 0 merge boundary nvmet: replace kmalloc + memset with kzalloc for data allocation nvme-fabrics: handle zero MAXCMD without closing the connection bcache: revert replacing IS_ERR_OR_NULL with IS_ERR again nvme-pci: remove two deallocate zeroes quirks block: rnull: add missing MODULE_DESCRIPTION nvme: don't apply NVME_QUIRK_DEALLOCATE_ZEROES when DSM is not supported nvmet: use kzalloc instead of ZERO_PAGE in nvme_execute_identify_ns_nvm()
2024-12-07Merge tag 'io_uring-6.13-20241207' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fix from Jens Axboe: "A single fix for a parameter type which affects 32-bit" * tag 'io_uring-6.13-20241207' of git://git.kernel.dk/linux: io_uring: Change res2 parameter type in io_uring_cmd_done