linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2024-05-02	KVM: selftests: Require KVM_CAP_USER_MEMORY2 for tests that create memslots	Sean Christopherson
	Explicitly require KVM_CAP_USER_MEMORY2 for selftests that create memslots, i.e. skip selftests that need memslots instead of letting them fail on KVM_SET_USER_MEMORY_REGION2. While it's ok to take a dependency on new kernel features, selftests should skip gracefully instead of failing hard when run on older kernels. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/69ae0694-8ca3-402c-b864-99b500b24f5d@moroto.mountain Suggested-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20240430162133.337541-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-05-02	KVM: selftests: Allow skipping the KVM_RUN sanity check in rseq_test	Zide Chen
	The rseq test's migration worker delays 1-10 us, assuming that one KVM_RUN iteration only takes a few microseconds. But if the CPU low power wakeup latency is large enough, for example, hundreds or even thousands of microseconds for deep C-state exit latencies on x86 server CPUs, it may happen that the target CPU is unable to wakeup and run the vCPU before the migration worker starts to migrate the vCPU thread to the _next_ CPU. If the system workload is light, most CPUs could be at a certain low power state, which may result in less successful migrations and fail the migration/KVM_RUN ratio sanity check. But this is not supposed to be deemed a test failure. Add a command line option to skip the sanity check, along with a comment and a verbose assert message to try to help the user resolve the potential source of failures without having to resort to disabling the check. Co-developed-by: Dongsheng Zhang <dongsheng.x.zhang@intel.com> Signed-off-by: Dongsheng Zhang <dongsheng.x.zhang@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Link: https://lore.kernel.org/r/20240502213936.27619-1-zide.chen@intel.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-05-02	selftests/bpf: Add kernel socket operation tests	Jordan Rife
	This patch creates two sets of sock_ops that call out to the SYSCALL hooks in the sock_addr_kern BPF program and uses them to construct test cases for the range of supported operations (kernel_connect(), kernel_bind(), kernel_sendms(), sock_sendmsg(), kernel_getsockname(), kenel_getpeername()). This ensures that these interact with BPF sockaddr hooks as intended. Beyond this it also ensures that these operations do not modify their address parameter, providing regression coverage for the issues addressed by this set of patches: - commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect") - commit 86a7e0b69bd5("net: prevent rewrite of msg_name in sock_sendmsg()") - commit c889a99a21bf("net: prevent address rewrite in kernel_bind()") - commit 01b2885d9415("net: Save and restore msg_namelen in sock_sendmsg") Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240429214529.2644801-7-jrife@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02	selftests/bpf: Make sock configurable for each test case	Jordan Rife
	In order to reuse the same test code for both socket system calls (e.g. connect(), bind(), etc.) and kernel socket functions (e.g. kernel_connect(), kernel_bind(), etc.), this patch introduces the "ops" field to sock_addr_test. This field allows each test cases to configure the set of functions used in the test case to create, manipulate, and tear down a socket. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240429214529.2644801-6-jrife@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02	selftests/bpf: Move IPv4 and IPv6 sockaddr test cases	Jordan Rife
	This patch lays the groundwork for testing IPv4 and IPv6 sockaddr hooks and their interaction with both socket syscalls and kernel functions (e.g. kernel_connect, kernel_bind, etc.). It moves some of the test cases from the old-style bpf/test_sock_addr.c self test into the sock_addr prog_test in a step towards fully retiring bpf/test_sock_addr.c. We will expand the test dimensions in the sock_addr prog_test in a later patch series in order to migrate the remaining test cases. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240429214529.2644801-5-jrife@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02	perf maps: Remove check_invariants() from maps__lock()	Namhyung Kim
	I found that the debug build was a slowed down a lot by the maps lock code since it checks the invariants whenever it gets the pointer to the lock. This means it checks twice the invariants before and after the access. Instead, let's move the checking code within the lock area but after any modification and remove it from the read paths. This would remove (more than) half of the maps lock overhead. The time for perf report with a huge data file (200k+ of MMAP2 events). Non-debug Before After --------- -------- -------- 2m 43s 6m 45s 4m 21s Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240429225738.1491791-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02	selftests/bpf: Implement BPF programs for kernel socket operations	Jordan Rife
	This patch lays out a set of SYSCALL programs that can be used to invoke the socket operation kfuncs in bpf_testmod, allowing a test program to manipulate kernel socket operations from userspace. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240429214529.2644801-4-jrife@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02	selftests/bpf: Implement socket kfuncs for bpf_testmod	Jordan Rife
	This patch adds a set of kfuncs to bpf_testmod that can be used to manipulate a socket from kernel space. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240429214529.2644801-3-jrife@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02	selftests/bpf: Fix bind program for big endian systems	Jordan Rife
	Without this fix, the bind4 and bind6 programs will reject bind attempts on big endian systems. This patch ensures that CI tests pass for the s390x architecture. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240429214529.2644801-2-jrife@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR. Conflicts: include/linux/filter.h kernel/bpf/core.c 66e13b615a0c ("bpf: verifier: prevent userspace memory access") d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-02	bpf: Missing trailing slash in tools/testing/selftests/bpf/Makefile	Jose E. Marchesi
	tools/lib/bpf/Makefile assumes that the patch in OUTPUT is a directory and that it includes a trailing slash. This seems to be a common expectation for OUTPUT among all the Makefiles. In the rule for runqslower in tools/testing/selftests/bpf/Makefile the variable BPFTOOL_OUTPUT is set to a directory name that lacks a trailing slash. This results in a malformed BPF_HELPER_DEFS being defined in lib/bpf/Makefile. This problem becomes evident when a file like tools/lib/bpf/bpf_tracing.h gets updated. This patch fixes the problem by adding the missing slash in the value for BPFTOOL_OUTPUT in the $(OUTPUT)/runqslower rule. Regtested by running selftests in bpf-next master and building samples/bpf programs. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240502140831.23915-1-jose.marchesi@oracle.com
2024-05-02	libbpf: Fix error message in attach_kprobe_multi	Jiri Olsa
	We just failed to retrieve pattern, so we need to print spec instead. Fixes: ddc6b04989eb ("libbpf: Add bpf_program__attach_kprobe_multi_opts function") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240502075541.1425761-2-jolsa@kernel.org
2024-05-02	libbpf: Fix error message in attach_kprobe_session	Jiri Olsa
	We just failed to retrieve pattern, so we need to print spec instead. Fixes: 2ca178f02b2f ("libbpf: Add support for kprobe session attach") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240502075541.1425761-1-jolsa@kernel.org
2024-05-02	Merge tag 'net-6.9-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf. Relatively calm week, likely due to public holiday in most places. No known outstanding regressions. Current release - regressions: - rxrpc: fix wrong alignmask in __page_frag_alloc_align() - eth: e1000e: change usleep_range to udelay in PHY mdic access Previous releases - regressions: - gro: fix udp bad offset in socket lookup - bpf: fix incorrect runtime stat for arm64 - tipc: fix UAF in error path - netfs: fix a potential infinite loop in extract_user_to_sg() - eth: ice: ensure the copied buf is NUL terminated - eth: qeth: fix kernel panic after setting hsuid Previous releases - always broken: - bpf: - verifier: prevent userspace memory access - xdp: use flags field to disambiguate broadcast redirect - bridge: fix multicast-to-unicast with fraglist GSO - mptcp: ensure snd_nxt is properly initialized on connect - nsh: fix outer header access in nsh_gso_segment(). - eth: bcmgenet: fix racing registers access - eth: vxlan: fix stats counters. Misc: - a bunch of MAINTAINERS file updates" * tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) MAINTAINERS: mark MYRICOM MYRI-10G as Orphan MAINTAINERS: remove Ariel Elior net: gro: add flush check in udp_gro_receive_segment net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb ipv4: Fix uninit-value access in __ip_make_skb() s390/qeth: Fix kernel panic after setting hsuid vxlan: Pull inner IP header in vxlan_rcv(). tipc: fix a possible memleak in tipc_buf_append tipc: fix UAF in error path rxrpc: Clients must accept conn from any address net: core: reject skb_copy(_expand) for fraglist GSO skbs net: bridge: fix multicast-to-unicast with fraglist GSO mptcp: ensure snd_nxt is properly initialized on connect e1000e: change usleep_range to udelay in PHY mdic access net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341 cxgb4: Properly lock TX queue for the selftest. rxrpc: Fix using alignmask being zero for __page_frag_alloc_align() vxlan: Add missing VNI filter counter update in arp_reduce(). vxlan: Fix racy device stats updates. net: qede: use return from qede_parse_actions() ...
2024-05-02	perf cs-etm: Improve version detection and error reporting	James Clark
	When the config validation functions are warning about ETMv3, they do it based on "not ETMv4". If the drivers aren't all loaded or the hardware doesn't support Coresight it will appear as "not ETMv4" and then Perf will print the error message "... not supported in ETMv3 ..." which is wrong and confusing. cs_etm_is_etmv4() is also misnamed because it also returns true for ETE because ETE has a superset of the ETMv4 metadata files. Although this was always done in the correct order so it wasn't a bug. Improve all this by making a single get version function which also handles not present as a separate case. Change the ETMv3 error message to only print when ETMv3 is detected, and add a new error message for the not present case. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Leo Yan <leo.yan@linux.dev> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240501135753.508022-4-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02	perf cs-etm: Remove repeated fetches of the ETM PMU	James Clark
	Most functions already have cs_etm_pmu, so it's a bit neater to pass it through rather than itr only to convert it again. Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240501135753.508022-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02	perf cs-etm: Use struct perf_cpu as much as possible	James Clark
	The perf_cpu struct makes some iterators simpler and avoids some mistakes with interchanging CPU IDs with indexes etc. At the moment in this file the conversion to an integer is done somewhere in the middle of the call tree. Change it to delay the conversion to an int until the leaf functions. Some of the usage patterns are duplicated, so instead of changing them all, make cs_etm_get_ro() more reusable and use that everywhere. cs_etm_get_ro() didn't return an error before, but return one now so that it can also be used where an error is needed. Continue to ignore the error where it was already ignored. Use cs_etm_pmu_path_exists() instead of cs_etm_get_ro() in cs_etm_is_etmv4() because cs_etm_get_ro() prints a warning, but path exists is sufficient for this use case. Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240501135753.508022-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02	perf annotate-data: Check kind of stack variables	Namhyung Kim
	I sometimes see ("unknown type") in the result and it was because it didn't check the type of stack variables properly during the instruction tracking. The stack can carry constant values (without type info) and if the target instruction is accessing the stack location, it resulted in the "unknown type". Maybe we could pick one of integer types for the constant, but it doesn't really mean anything useful. Let's just drop the stack slot if it doesn't have a valid type info. Here's an example how it got the unknown type. Note that 0xffffff48 = -0xb8. ----------------------------------------------------------- find data type for 0xffffff48(reg6) at ... CU for ... frame base: cfa=0 fbreg=6 scope: [2/2] (die:11cb97f) bb: [37 - 3a] var [37] reg15 type='int' size=0x4 (die:0x1180633) bb: [40 - 4b] mov [40] imm=0x1 -> reg13 var [45] reg8 type='sigset_t' size=0x8 (die:0x11a39ee) mov [45] imm=0x1 -> reg2 <--- here reg2 has a constant bb: [215 - 237] mov [218] reg2 -> -0xb8(stack) constant <--- and save it to the stack mov [225] reg13 -> -0xc4(stack) constant call [22f] find_task_by_vgpid call [22f] return -> reg0 type='struct task_struct' size=0x8 (die:0x11881e8) bb: [5c8 - 5cf] bb: [2fb - 302] mov [2fb] -0xc4(stack) -> reg13 constant bb: [13b - 14d] mov [143] 0xd50(reg3) -> reg5 type='struct task_struct*' size=0x8 (die:0xa31f3c) bb: [153 - 153] chk [153] reg6 offset=0xffffff48 ok=0 kind=0 fbreg <--- access here found by insn track: 0xffffff48(reg6) type-offset=0 type='G<EF>^K<F6><AF>U' size=0 (die:0xffffffffffffffff) Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240502060011.1838090-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02	perf annotate-data: Handle multi regs in find_data_type_block()	Namhyung Kim
	The instruction tracking should be the same for the both registers. Just do it once and compare the result with multi regs as with the previous patches. Then we don't need to call find_data_type_block() separately for each reg. Let's remove the 'reg' argument from the relevant functions. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240502060011.1838090-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02	perf annotate-data: Check memory access with two registers	Namhyung Kim
	The following instruction pattern is used to access a global variable. mov $0x231c0, %rax movsql %edi, %rcx mov -0x7dc94ae0(,%rcx,8), %rcx cmpl $0x0, 0xa60(%rcx,%rax,1) <<<--- here The first instruction set the address of the per-cpu variable (here, it is 'runqueues' of type 'struct rq'). The second instruction seems like a cpu number of the per-cpu base. The third instruction get the base offset of per-cpu area for that cpu. The last instruction compares the value of the per-cpu variable at the offset of 0xa60. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240502060011.1838090-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02	perf annotate-data: Handle direct global variable access	Namhyung Kim
	Like per-cpu base offset array, sometimes it accesses the global variable directly using the offset. Allow this type of instructions as long as it finds a global variable for the address. movslq %edi, %rcx mov -0x7dc94ae0(,%rcx,8), %rcx <<<--- here As %rcx has a valid type (i.e. array index) from the first instruction, it will be checked by the first case in check_matching_type(). But as it's not a pointer type, the match will fail. But in this case, it should check if it accesses the kernel global array variable. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240502060011.1838090-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02	perf annotate-data: Collect global variables in advance	Namhyung Kim
	Currently it looks up global variables from the current CU using address and name. But it sometimes fails to find a variable as the variable can come from a different CU - but it's still strange it failed to find a declaration for some reason. Anyway, it can collect all global variables from all CU once and then lookup them later on. This slightly improves the success rate of my test data set. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240502060011.1838090-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02	perf dwarf-aux: Add die_collect_global_vars()	Namhyung Kim
	This function is to search all global variables in the CU. We want to have the list of global variables at once and match them later. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240502060011.1838090-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02	Merge branch 'x86/cpu' into perf/core, to pick up dependent commits	Ingo Molnar
	We are going to fix perf-events fallout of changes in tip:x86/cpu, so merge in that branch first. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-05-02	x86/insn: Add support for APX EVEX instructions to the opcode map	Adrian Hunter
	To support APX functionality, the EVEX prefix is used to: - promote legacy instructions - promote VEX instructions - add new instructions Promoted VEX instructions require no extra annotation because the opcodes do not change and the permissive nature of the instruction decoder already allows them to have an EVEX prefix. Promoted legacy instructions and new instructions are placed in map 4 which has not been used before. Create a new table for map 4 and add APX instructions. Annotate SCALABLE instructions with "(es)" - refer to patch "x86/insn: Add support for APX EVEX to the instruction decoder logic". SCALABLE instructions must be represented in both no-prefix (NP) and 66 prefix forms. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240502105853.5338-9-adrian.hunter@intel.com
2024-05-02	x86/insn: Add support for APX EVEX to the instruction decoder logic	Adrian Hunter
	Intel Advanced Performance Extensions (APX) extends the EVEX prefix to support: - extended general purpose registers (EGPRs) i.e. r16 to r31 - Push-Pop Acceleration (PPX) hints - new data destination (NDD) register - suppress status flags writes (NF) of common instructions - new instructions Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture Specification for details. The extended EVEX prefix does not need amended instruction decoder logic, except in one area. Some instructions are defined as SCALABLE which means the EVEX.W bit and EVEX.pp bits are used to determine operand size. Specifically, if an instruction is SCALABLE and EVEX.W is zero, then EVEX.pp value 0 (representing no prefix NP) means default operand size, whereas EVEX.pp value 1 (representing 66 prefix) means operand size override i.e. 16 bits Add an attribute (INAT_EVEX_SCALABLE) to identify such instructions, and amend the logic appropriately. Amend the awk script that generates the attribute tables from the opcode map, to recognise "(es)" as attribute INAT_EVEX_SCALABLE. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240502105853.5338-8-adrian.hunter@intel.com
2024-05-02	x86/insn: x86/insn: Add support for REX2 prefix to the instruction decoder ↵	Adrian Hunter
	opcode map Support for REX2 has been added to the instruction decoder logic and the awk script that generates the attribute tables from the opcode map. Add REX2 prefix byte (0xD5) to the opcode map. Add annotation (!REX2) for map 0/1 opcodes that are reserved under REX2. Add JMPABS to the opcode map and add annotation (REX2) to identify that it has a mandatory REX2 prefix. A separate opcode attribute table is not needed at this time because JMPABS has the same attribute encoding as the MOV instruction that it shares an opcode with i.e. INAT_MOFFSET. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240502105853.5338-7-adrian.hunter@intel.com
2024-05-02	x86/insn: Add support for REX2 prefix to the instruction decoder logic	Adrian Hunter
	Intel Advanced Performance Extensions (APX) uses a new 2-byte prefix named REX2 to select extended general purpose registers (EGPRs) i.e. r16 to r31. The REX2 prefix is effectively an extended version of the REX prefix. REX2 and EVEX are also used with PUSH/POP instructions to provide a Push-Pop Acceleration (PPX) hint. With PPX hints, a CPU will attempt to fast-forward register data between matching PUSH and POP instructions. REX2 is valid only with opcodes in maps 0 and 1. Similar extension for other maps is provided by the EVEX prefix, covered in a separate patch. Some opcodes in maps 0 and 1 are reserved under REX2. One of these is used for a new 64-bit absolute direct jump instruction JMPABS. Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture Specification for details. Define a code value for the REX2 prefix (INAT_PFX_REX2), and add attribute flags for opcodes reserved under REX2 (INAT_NO_REX2) and to identify opcodes (only JMPABS) that require a mandatory REX2 prefix (INAT_REX2_VARIANT). Amend logic to read the REX2 prefix and get the opcode attribute for the map number (0 or 1) encoded in the REX2 prefix. Amend the awk script that generates the attribute tables from the opcode map, to recognise "REX2" as attribute INAT_PFX_REX2, and "(!REX2)" as attribute INAT_NO_REX2, and "(REX2)" as attribute INAT_REX2_VARIANT. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240502105853.5338-6-adrian.hunter@intel.com
2024-05-02	x86/insn: Add misc new Intel instructions	Adrian Hunter
	The x86 instruction decoder is used not only for decoding kernel instructions. It is also used by perf uprobes (user space probes) and by perf tools Intel Processor Trace decoding. Consequently, it needs to support instructions executed by user space also. Add instructions documented in Intel Architecture Instruction Set Extensions and Future Features Programming Reference March 2024 319433-052, that have not been added yet: AADD AAND AOR AXOR CMPccXADD PBNDKB RDMSRLIST URDMSR UWRMSR VBCSTNEBF162PS VBCSTNESH2PS VCVTNEEBF162PS VCVTNEEPH2PS VCVTNEOBF162PS VCVTNEOPH2PS VCVTNEPS2BF16 VPDPB[SU,UU,SS]D[,S] VPDPW[SU,US,UU]D[,S] VPMADD52HUQ VPMADD52LUQ VSHA512MSG1 VSHA512MSG2 VSHA512RNDS2 VSM3MSG1 VSM3MSG2 VSM3RNDS2 VSM4KEY4 VSM4RNDS4 WRMSRLIST TCMMIMFP16PS TCMMRLFP16PS TDPFP16PS PREFETCHIT1 PREFETCHIT0 Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240502105853.5338-5-adrian.hunter@intel.com
2024-05-02	x86/insn: Add VEX versions of VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS	Adrian Hunter
	The x86 instruction decoder is used not only for decoding kernel instructions. It is also used by perf uprobes (user space probes) and by perf tools Intel Processor Trace decoding. Consequently, it needs to support instructions executed by user space also. Intel Architecture Instruction Set Extensions and Future Features manual number 319433-044 of May 2021, documented VEX versions of instructions VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS, but the opcode map has them listed as EVEX only. Remove EVEX-only (ev) annotation from instructions VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS, which allows them to be decoded with either a VEX or EVEX prefix. Fixes: 0153d98f2dd6 ("x86/insn: Add misc instructions to x86 instruction decoder") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240502105853.5338-4-adrian.hunter@intel.com
2024-05-02	x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map	Adrian Hunter
	The x86 instruction decoder is used not only for decoding kernel instructions. It is also used by perf uprobes (user space probes) and by perf tools Intel Processor Trace decoding. Consequently, it needs to support instructions executed by user space also. Opcode 0x68 PUSH instruction is currently defined as 64-bit operand size only i.e. (d64). That was based on Intel SDM Opcode Map. However that is contradicted by the Instruction Set Reference section for PUSH in the same manual. Remove 64-bit operand size only annotation from opcode 0x68 PUSH instruction. Example: $ cat pushw.s .global _start .text _start: pushw $0x1234 mov $0x1,%eax # system call number (sys_exit) int $0x80 $ as -o pushw.o pushw.s $ ld -s -o pushw pushw.o $ objdump -d pushw \| tail -4 0000000000401000 <.text>: 401000: 66 68 34 12 pushw $0x1234 401004: b8 01 00 00 00 mov $0x1,%eax 401009: cd 80 int $0x80 $ perf record -e intel_pt//u ./pushw [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.014 MB perf.data ] Before: $ perf script --insn-trace=disasm Warning: 1 instruction trace errors pushw 10349 [000] 10586.869237014: 401000 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) pushw $0x1234 pushw 10349 [000] 10586.869237014: 401006 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %al, (%rax) pushw 10349 [000] 10586.869237014: 401008 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %cl, %ch pushw 10349 [000] 10586.869237014: 40100a [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb $0x2e, (%rax) instruction trace error type 1 time 10586.869237224 cpu 0 pid 10349 tid 10349 ip 0x40100d code 6: Trace doesn't match instruction After: $ perf script --insn-trace=disasm pushw 10349 [000] 10586.869237014: 401000 [unknown] (./pushw) pushw $0x1234 pushw 10349 [000] 10586.869237014: 401004 [unknown] (./pushw) movl $1, %eax Fixes: eb13296cfaf6 ("x86: Instruction decoder API") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240502105853.5338-3-adrian.hunter@intel.com
2024-05-02	x86/insn: Add Key Locker instructions to the opcode map	Chang S. Bae
	The x86 instruction decoder needs to know these new instructions that are going to be used in the crypto library as well as the x86 core code. Add the following: LOADIWKEY: Load a CPU-internal wrapping key. ENCODEKEY128: Wrap a 128-bit AES key to a key handle. ENCODEKEY256: Wrap a 256-bit AES key to a key handle. AESENC128KL: Encrypt a 128-bit block of data using a 128-bit AES key indicated by a key handle. AESENC256KL: Encrypt a 128-bit block of data using a 256-bit AES key indicated by a key handle. AESDEC128KL: Decrypt a 128-bit block of data using a 128-bit AES key indicated by a key handle. AESDEC256KL: Decrypt a 128-bit block of data using a 256-bit AES key indicated by a key handle. AESENCWIDE128KL: Encrypt 8 128-bit blocks of data using a 128-bit AES key indicated by a key handle. AESENCWIDE256KL: Encrypt 8 128-bit blocks of data using a 256-bit AES key indicated by a key handle. AESDECWIDE128KL: Decrypt 8 128-bit blocks of data using a 128-bit AES key indicated by a key handle. AESDECWIDE256KL: Decrypt 8 128-bit blocks of data using a 256-bit AES key indicated by a key handle. The detail can be found in Intel Software Developer Manual. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20240502105853.5338-2-adrian.hunter@intel.com
2024-05-02	Merge tag 'v6.9-rc6' into perf/core, to pick up fixes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-05-01	selftests: netfilter: nft_concat_range.sh: reduce debug kernel run time	Florian Westphal
	Even a 1h timeout isn't enough for nft_concat_range.sh to complete on debug kernels. Reduce test complexity and only match on single entry if KSFT_MACHINE_SLOW is set. To spot 'slow' tests, print the subtest duration (in seconds) in addition to the status. Add new nft_concat_range_perf.sh script, not executed via kselftest, to run the performance (pps match rate) tests. Those need about 25m to complete which seems too much to run this via 'make run_tests'. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20240430145810.23447-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-01	libbpf: better fix for handling nulled-out struct_ops program	Andrii Nakryiko
	Previous attempt to fix the handling of nulled-out (from skeleton) struct_ops program is working well only if struct_ops program is defined as non-autoloaded by default (i.e., has SEC("?struct_ops") annotation, with question mark). Unfortunately, that fix is incomplete due to how bpf_object_adjust_struct_ops_autoload() is marking referenced or non-referenced struct_ops program as autoloaded (or not). Because bpf_object_adjust_struct_ops_autoload() is run after bpf_map__init_kern_struct_ops() step, which sets program slot to NULL, such programs won't be considered "referenced", and so its autoload property won't be changed. This all sounds convoluted and it is, but the desire is to have as natural behavior (as far as struct_ops usage is concerned) as possible. This fix is redoing the original fix but makes it work for autoloaded-by-default struct_ops programs as well. We achieve this by forcing prog->autoload to false if prog was declaratively set for some struct_ops map, but then nulled-out from skeleton (programmatically). This achieves desired effect of not autoloading it. If such program is still referenced somewhere else (different struct_ops map or different callback field), it will get its autoload property adjusted by bpf_object_adjust_struct_ops_autoload() later. We also fix selftest, which accidentally used SEC("?struct_ops") annotation. It was meant to use autoload-by-default program from the very beginning. Fixes: f973fccd43d3 ("libbpf: handle nulled-out program in struct_ops correctly") Cc: Kui-Feng Lee <thinker.li@gmail.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240501041706.3712608-1-andrii@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-01	selftests/bpf: add tests for the "module: Function" syntax	Viktor Malik
	The previous patch added support for the "module:function" syntax for tracing programs. This adds tests for explicitly specifying the module name via the SEC macro and via the bpf_program__set_attach_target call. Signed-off-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/8a076168ed847f7c8a6c25715737b1fea84e38be.1714469650.git.vmalik@redhat.com
2024-05-01	libbpf: support "module: Function" syntax for tracing programs	Viktor Malik
	In some situations, it is useful to explicitly specify a kernel module to search for a tracing program target (e.g. when a function of the same name exists in multiple modules or in vmlinux). This patch enables that by allowing the "module:function" syntax for the find_kernel_btf_id function. Thanks to this, the syntax can be used both from a SEC macro (i.e. `SEC(fentry/module:function)`) and via the bpf_program__set_attach_target API call. Signed-off-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/9085a8cb9a552de98e554deb22ff7e977d025440.1714469650.git.vmalik@redhat.com
2024-05-01	selftests/ftrace: add fprobe test cases for VFS type "%pd" and "%pD"	Ye Bin
	This patch adds fprobe test cases for new print format type "%pd/%pD".The test cases test the following items: 1. Test "%pd" type for dput(); 2. Test "%pD" type for vfs_read(); This test case require enable CONFIG_HAVE_FUNCTION_ARG_ACCESS_API configuration. Link: https://lore.kernel.org/all/20240322064308.284457-6-yebin10@huawei.com/ Signed-off-by: Ye Bin <yebin10@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-05-01	selftests/ftrace: add kprobe test cases for VFS type "%pd" and "%pD"	Ye Bin
	This patch adds test cases for new print format type "%pd/%pD".The test cases test the following items: 1. Test README if add "%pd/%pD" type; 2. Test "%pd" type for dput(); 3. Test "%pD" type for vfs_read(); This test case require enable CONFIG_HAVE_FUNCTION_ARG_ACCESS_API configuration. Link: https://lore.kernel.org/all/20240322064308.284457-5-yebin10@huawei.com/ Signed-off-by: Ye Bin <yebin10@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-05-01	tools/power turbostat: Enable non-privileged users to read sysfs counters	Patryk Wlazlyn
	A group of counters called "sysfs" displays software C-state request counts and resulting perceived C-state residency. They are not built-in counters that turbostat knows about ahead of time, rather they are discovered in sysfs when turbostat starts. Thus, they are added dynamically, using the same interface as user-added MSR counters. When turbostat enters "no-msr" mode, such as when running as a non-privileged user, it clears all added counters. Updating that to clear only actual MSR added counters allows regular users to see the sysfs counters. [lenb: commit message] Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-05-01	tools/power turbostat: Replace _Static_assert with BUILD_BUG_ON	Patryk Wlazlyn
	So it compiles on GCC older than 9.0. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-05-01	tools/power turbostat: Add ARL-H support	Zhang Rui
	Add turbostat support for ARL-H, which behaves the same as ARL. [lenb: also add ARL-U] Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-05-01	tools/power turbostat: Enhance ARL/LNL support	Zhang Rui
	ARL/LNL don't have PC8, other than that, it behaves the same as CNL. Copy cnl_features for ARL/LNL, except that PC8 support is removed. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-04-30	selftests/bpf: Drop start_server_proto helper	Geliang Tang
	Protocol can be set by __start_server() helper directly now, this makes the heler start_server_proto() useless. This patch drops it, and implenments start_server() using make_sockaddr() and __start_server(). Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/55d8a04e0bb8240a5fda2da3e9bdffe6fc8547b2.1714014697.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-30	selftests/bpf: Make start_mptcp_server static	Geliang Tang
	start_mptcp_server() shouldn't be a public helper, it only be used in MPTCP tests. This patch moves it into prog_tests/mptcp.c, and implenments it using make_sockaddr() and start_server_addr() instead of using start_server_proto(). Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/50ec7049e280c60a2924937940851f8fee2b73b8.1714014697.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-30	selftests/bpf: Add opts argument for __start_server	Geliang Tang
	This patch adds network_helper_opts parameter for __start_server() instead of "int protocol" and "int timeout_ms". This not only reduces the number of parameters, but also makes it more flexible. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/127d2f0929980b41f757dcfebe1b667e6bfb43f1.1714014697.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-30	Merge tag 'kvmarm-fixes-6.9-2' of ↵	Paolo Bonzini
	git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.9, part #2 - Fix + test for a NULL dereference resulting from unsanitised user input in the vgic-v2 device attribute accessors
2024-04-30	cxl/test: Enhance event testing	Ira Weiny
	An issue was found in the processing of event logs when the output buffer length was not reset.[1] This bug was not caught with cxl-test for 2 reasons. First, the test harness mbox_send command [mock_get_event()] does not set the output size based on the amount of data returned like the hardware command does. Second, the simplistic event log testing always returned the same number of elements per-get command. Enhance the simulation of the event log mailbox to better match the bug found with real hardware to cover potential regressions. NOTE: These changes will cause cxl-events.sh in ndctl to fail without the fix from Kwangjin. However, no changes to the user space test was required. Therefore ndctl itself will be compatible with old or new kernels once both patches land in the new kernel. [1] Link: https://lore.kernel.org/all/20240401091057.1044-1-kwangjin.ko@sk.com/ Cc: Kwangjin Ko <kwangjin.ko@sk.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240401-enhance-event-test-v1-1-6669a524ed38@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30	selftests/bpf: Add sockopt case to verify prog_type	Stanislav Fomichev
	Make sure only sockopt programs can be attached to the setsockopt and getsockopt hooks. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240426231621.2716876-4-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-30	selftests/bpf: Extend sockopt tests to use BPF_LINK_CREATE	Stanislav Fomichev
	Run all existing test cases with the attachment created via BPF_LINK_CREATE. Next commit will add extra test cases to verify link_create attach_type enforcement. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240426231621.2716876-3-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>