summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2024-11-01kselftest/arm64: Use ksft_perror() to log MTE failuresMark Brown
The logging in the allocation helpers variously uses ksft_print_msg() with very intermittent logging of errno and perror() (which won't produce KTAP conformant output) when logging the result of API calls that set errno. Standardise on using the ksft_perror() helper in these cases so that more information is available should the tests fail. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20241029-arm64-mte-test-logging-v1-1-a128e732e36e@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-01tracing/selftests: Add tracefs mount options testKalesh Singh
Add a selftest to check that the tracefs gid mount option is applied correctly. ./ftracetest test.d/00basic/mount_options.tc Use the new readme string "[gid=<gid>] as a requirement and also update test_ownership.tc requirements to use this. Cc: Eric Sandeen <sandeen@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Ali Zahraee <ahzahraee@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/20241030171928.4168869-4-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-31selftests/net: Fix ./ns-XXXXXX not cleanupLi Zhijian
``` readonly STATS="$(mktemp -p /tmp ns-XXXXXX)" readonly BASE=`basename $STATS` ``` It could be a mistake to write to $BASE rather than $STATS, where $STATS is used to save the NSTAT_HISTORY and it will be cleaned up before exit. Although since we've been creating the wrong file this whole time and everything worked, it's fine to remove these 2 lines completely Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://patch.msgid.link/20241030005943.400225-1-lizhijian@fujitsu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31selftests: netdevsim: add fib_notifications to MakefileJakub Kicinski
Commit 19d36d2971e6 ("selftests: netdevsim: Add fib_notifications test") added the test but didn't include it in the Makefile. Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20241029192603.509295-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31netlink: add NLA_POLICY_MAX_LEN macroAntonio Quartulli
Similarly to NLA_POLICY_MIN_LEN, NLA_POLICY_MAX_LEN defines a policy with a maximum length value. The netlink generator for YAML specs has been extended accordingly. Signed-off-by: Antonio Quartulli <antonio@openvpn.net> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20241029-b4-ovpn-v11-1-de4698c73a25@openvpn.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.12-rc6). Conflicts: drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c cbe84e9ad5e2 ("wifi: iwlwifi: mvm: really send iwl_txpower_constraints_cmd") 188a1bf89432 ("wifi: mac80211: re-order assigning channel in activate links") https://lore.kernel.org/all/20241028123621.7bbb131b@canb.auug.org.au/ net/mac80211/cfg.c c4382d5ca1af ("wifi: mac80211: update the right link for tx power") 8dd0498983ee ("wifi: mac80211: Fix setting txpower with emulate_chanctx") drivers/net/ethernet/intel/ice/ice_ptp_hw.h 6e58c3310622 ("ice: fix crash on probe for DPLL enabled E810 LOM") e4291b64e118 ("ice: Align E810T GPIO to other products") ebb2693f8fbd ("ice: Read SDP section from NVM for pin definitions") ac532f4f4251 ("ice: Cleanup unused declarations") https://lore.kernel.org/all/20241030120524.1ee1af18@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull bpf fixes from Daniel Borkmann: - Fix BPF verifier to force a checkpoint when the program's jump history becomes too long (Eduard Zingerman) - Add several fixes to the BPF bits iterator addressing issues like memory leaks and overflow problems (Hou Tao) - Fix an out-of-bounds write in trie_get_next_key (Byeonguk Jeong) - Fix BPF test infra's LIVE_FRAME frame update after a page has been recycled (Toke Høiland-Jørgensen) - Fix BPF verifier and undo the 40-bytes extra stack space for bpf_fastcall patterns due to various bugs (Eduard Zingerman) - Fix a BPF sockmap race condition which could trigger a NULL pointer dereference in sock_map_link_update_prog (Cong Wang) - Fix tcp_bpf_recvmsg_parser to retrieve seq_copied from tcp_sk under the socket lock (Jiayuan Chen) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled selftests/bpf: Add three test cases for bits_iter bpf: Use __u64 to save the bits in bits iterator bpf: Check the validity of nr_words in bpf_iter_bits_new() bpf: Add bpf_mem_alloc_check_size() helper bpf: Free dynamically allocated bits in bpf_iter_bits_destroy() bpf: disallow 40-bytes extra stack for bpf_fastcall patterns selftests/bpf: Add test for trie_get_next_key() bpf: Fix out-of-bounds write in trie_get_next_key() selftests/bpf: Test with a very short loop bpf: Force checkpoint when jmp history is too long bpf: fix filed access without lock sock_map: fix a NULL pointer dereference in sock_map_link_update_prog()
2024-10-31Merge tag 'net-6.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from WiFi, bluetooth and netfilter. No known new regressions outstanding. Current release - regressions: - wifi: mt76: do not increase mcu skb refcount if retry is not supported Current release - new code bugs: - wifi: - rtw88: fix the RX aggregation in USB 3 mode - mac80211: fix memory corruption bug in struct ieee80211_chanctx Previous releases - regressions: - sched: - stop qdisc_tree_reduce_backlog on TC_H_ROOT - sch_api: fix xa_insert() error path in tcf_block_get_ext() - wifi: - revert "wifi: iwlwifi: remove retry loops in start" - cfg80211: clear wdev->cqm_config pointer on free - netfilter: fix potential crash in nf_send_reset6() - ip_tunnel: fix suspicious RCU usage warning in ip_tunnel_find() - bluetooth: fix null-ptr-deref in hci_read_supported_codecs - eth: mlxsw: add missing verification before pushing Tx header - eth: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue Previous releases - always broken: - wifi: mac80211: do not pass a stopped vif to the driver in .get_txpower - netfilter: sanitize offset and length before calling skb_checksum() - core: - fix crash when config small gso_max_size/gso_ipv4_max_size - skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension - mptcp: protect sched with rcu_read_lock - eth: ice: fix crash on probe for DPLL enabled E810 LOM - eth: macsec: fix use-after-free while sending the offloading packet - eth: stmmac: fix unbalanced DMA map/unmap for non-paged SKB data - eth: hns3: fix kernel crash when 1588 is sent on HIP08 devices - eth: mtk_wed: fix path of MT7988 WO firmware" * tag 'net-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits) net: hns3: fix kernel crash when 1588 is sent on HIP08 devices net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue net: hns3: initialize reset_timer before hclgevf_misc_irq_init() net: hns3: don't auto enable misc vector net: hns3: Resolved the issue that the debugfs query result is inconsistent. net: hns3: fix missing features due to dev->features configuration too early net: hns3: fixed reset failure issues caused by the incorrect reset type net: hns3: add sync command to sync io-pgtable net: hns3: default enable tx bounce buffer when smmu enabled netfilter: nft_payload: sanitize offset and length before calling skb_checksum() net: ethernet: mtk_wed: fix path of MT7988 WO firmware selftests: forwarding: Add IPv6 GRE remote change tests mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 address mlxsw: pci: Sync Rx buffers for device mlxsw: pci: Sync Rx buffers for CPU mlxsw: spectrum_ptp: Add missing verification before pushing Tx header net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecs netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6() netfilter: Fix use-after-free in get_info() ...
2024-10-31Merge tag 'nf-24-10-31' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter fixes for net: 1) Remove unused parameters in conntrack_dump_flush.c used by selftests, from Liu Jing. 2) Fix possible UaF when removing xtables module via getsockopt() interface, from Dong Chenchen. 3) Fix potential crash in nf_send_reset6() reported by syzkaller. From Eric Dumazet 4) Validate offset and length before calling skb_checksum() in nft_payload, otherwise hitting BUG() is possible. netfilter pull request 24-10-31 * tag 'nf-24-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_payload: sanitize offset and length before calling skb_checksum() netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6() netfilter: Fix use-after-free in get_info() selftests: netfilter: remove unused parameter ==================== Link: https://patch.msgid.link/ Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-30tools/mm: -Werror fixes in page-types/slabinfoWladislav Wiebe
Commit e6d2c436ff693 ("tools/mm: allow users to provide additional cflags/ldflags") passes now CFLAGS to Makefile. With this, build systems with default -Werror enabled found: slabinfo.c:1300:25: error: ignoring return value of 'chdir' declared with attribute 'warn_unused_result' [-Werror=unused-result]                          chdir("..");                          ^~~~~~~~~~~ page-types.c:397:35: error: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'uint64_t' {aka 'long long unsigned int'} [-Werror=format=]                          printf("%lu\t", mapcnt0);                                  ~~^     ~~~~~~~ .. Fix page-types by using PRIu64 for uint64_t prints and check in slabinfo for return code on chdir(".."). Link: https://lkml.kernel.org/r/c1ceb507-94bc-461c-934d-c19b77edd825@gmail.com Fixes: e6d2c436ff69 ("tools/mm: allow users to provide additional cflags/ldflags") Signed-off-by: Wladislav Wiebe <wladislav.kw@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Herton R. Krzesinski <herton@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-30selftests: forwarding: Add IPv6 GRE remote change testsIdo Schimmel
Test that after changing the remote address of an ip6gre net device traffic is forwarded as expected. Test with both flat and hierarchical topologies and with and without an input / output keys. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/02b05246d2cdada0cf2fccffc0faa8a424d0f51b.1729866134.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30tests: hsr: Increase timeout to 50 secondsYunshui Jiang
The HSR test, hsr_ping.sh, actually needs 7 min to run. Around 375s to be exact, and even more on a debug kernel or kernel with other network security limits. The timeout setting for the kselftest is currently 45 seconds, which is way too short to integrate hsr tests to run_kselftest infrastructure. However, timeout of hundreds of seconds is quite a long time, especially in a CI/CD environment. It seems that we need accelerate the test and balance with timeout setting. The most time-consuming func is do_ping_long, where ping command sends 10 packages to the given address. The default interval between two ping packages is 1s according to the ping Mannual. There isn't any operation between pings thus we could pass -i 0.1 to ping to make it 10 times faster. While even with this short interval, the test still need about 46.4 seconds to finish because of the two HSR interfaces, each of which is tested by calling do_ping func 12 times and do_ping_long func 19 times and sleep for 3s. So, an explicit setting is also needed to slightly increase the timeout. And to leave us some slack, use 50 as default timeout. Signed-off-by: Yunshui Jiang <jiangyunshui@kylinos.cn> Link: https://patch.msgid.link/20241028082757.2945232-1-jiangyunshui@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30Merge tag 'perf-tools-fixes-for-v6.12-2-2024-10-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Arnaldo Carvalho de Melo: - Update more header copies with the kernel sources, including const.h, msr-index.h, arm64's cputype.h, kvm's, bits.h and unaligned.h - The return from 'write' isn't a pid, fix cut'n'paste error in 'perf trace' - Fix up the python binding build on architectures without HAVE_KVM_STAT_SUPPORT - Add some more bounds checks to augmented_raw_syscalls.bpf.c (used to collect syscall pointer arguments in 'perf trace') to make the resulting bytecode to pass the kernel BPF verifier, allowing us to go back accepting clang 12.0.1 as the minimum version required for compiling BPF sources - Add __NR_capget for x86 to fix a regression on running perf + intel PT (hw tracing) as non-root setting up the capabilities as described in https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html - Fix missing syscalltbl in non-explicitly listed architectures, noticed on ARM 32-bit, that still needs a .tbl generator for the syscall id<->name tables, should be added for v6.13 - Handle 'perf test' failure when handling broken DWARF for ASM files * tag 'perf-tools-fixes-for-v6.12-2-2024-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf cap: Add __NR_capget to arch/x86 unistd tools headers: Update the linux/unaligned.h copy with the kernel sources tools headers arm64: Sync arm64's cputype.h with the kernel sources tools headers: Synchronize {uapi/}linux/bits.h with the kernel sources tools arch x86: Sync the msr-index.h copy with the kernel sources perf python: Fix up the build on architectures without HAVE_KVM_STAT_SUPPORT perf test: Handle perftool-testsuite_probe failure due to broken DWARF tools headers UAPI: Sync kvm headers with the kernel sources perf trace: Fix non-listed archs in the syscalltbl routines perf build: Change the clang check back to 12.0.1 perf trace augmented_raw_syscalls: Add more checks to pass the verifier perf trace augmented_raw_syscalls: Add extra array index bounds checking to satisfy some BPF verifiers perf trace: The return from 'write' isn't a pid tools headers UAPI: Sync linux/const.h with the kernel headers
2024-10-30selftests/bpf: Add three test cases for bits_iterHou Tao
Add more test cases for bits iterator: (1) huge word test Verify the multiplication overflow of nr_bits in bits_iter. Without the overflow check, when nr_words is 67108865, nr_bits becomes 64, causing bpf_probe_read_kernel_common() to corrupt the stack. (2) max word test Verify correct handling of maximum nr_words value (511). (3) bad word test Verify early termination of bits iteration when bits iterator initialization fails. Also rename bits_nomem to bits_too_big to better reflect its purpose. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241030100516.3633640-6-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-30selftests/bpf: Add a selftest for bpf_csum_diff()Puranjay Mohan
Add a selftest for the bpf_csum_diff() helper. This selftests runs the helper in all three configurations(push, pull, and diff) and verifies its output. The correct results have been computed by hand and by the helper's older implementation. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20241026125339.26459-5-puranjay@kernel.org
2024-10-30selftests/bpf: Don't mask result of bpf_csum_diff() in test_verifierPuranjay Mohan
The bpf_csum_diff() helper has been fixed to return a 16-bit value for all archs, so now we don't need to mask the result. This commit is basically reverting the below: commit 6185266c5a85 ("selftests/bpf: Mask bpf_csum_diff() return value to 16 bits in test_verifier") Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20241026125339.26459-4-puranjay@kernel.org
2024-10-30selftests: netfilter: remove unused parameterLiu Jing
err is never used, remove it. Signed-off-by: Liu Jing <liujing@cmss.chinamobile.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-30selftests: add file SLAB_TYPESAFE_BY_RCU recycling stressorChristian Brauner
Add a simple file stressor that lives directly in-tree. This will create a bunch of processes that each open 500 file descriptors and then use close_range() to close them all. Concurrently, other processes read /proc/<pid>/fd/ which rougly does f = fget_task_next(p, &fd); if (!f) break; data.mode = f->f_mode; fput(f); Which means that it'll try to get a reference to a file in another task's file descriptor table. Under heavy file load it is increasingly likely that the other task will manage to close @file and @file will be recycled due to SLAB_TYPEAFE_BY_RCU concurrently. This will trigger various warnings in the file reference counting code. Link: https://lore.kernel.org/r/20241021-vergab-streuen-924df15dceb9@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-29bpf: disallow 40-bytes extra stack for bpf_fastcall patternsEduard Zingerman
Hou Tao reported an issue with bpf_fastcall patterns allowing extra stack space above MAX_BPF_STACK limit. This extra stack allowance is not integrated properly with the following verifier parts: - backtracking logic still assumes that stack can't exceed MAX_BPF_STACK; - bpf_verifier_env->scratched_stack_slots assumes only 64 slots are available. Here is an example of an issue with precision tracking (note stack slot -8 tracked as precise instead of -520): 0: (b7) r1 = 42 ; R1_w=42 1: (b7) r2 = 42 ; R2_w=42 2: (7b) *(u64 *)(r10 -512) = r1 ; R1_w=42 R10=fp0 fp-512_w=42 3: (7b) *(u64 *)(r10 -520) = r2 ; R2_w=42 R10=fp0 fp-520_w=42 4: (85) call bpf_get_smp_processor_id#8 ; R0_w=scalar(...) 5: (79) r2 = *(u64 *)(r10 -520) ; R2_w=42 R10=fp0 fp-520_w=42 6: (79) r1 = *(u64 *)(r10 -512) ; R1_w=42 R10=fp0 fp-512_w=42 7: (bf) r3 = r10 ; R3_w=fp0 R10=fp0 8: (0f) r3 += r2 mark_precise: frame0: last_idx 8 first_idx 0 subseq_idx -1 mark_precise: frame0: regs=r2 stack= before 7: (bf) r3 = r10 mark_precise: frame0: regs=r2 stack= before 6: (79) r1 = *(u64 *)(r10 -512) mark_precise: frame0: regs=r2 stack= before 5: (79) r2 = *(u64 *)(r10 -520) mark_precise: frame0: regs= stack=-8 before 4: (85) call bpf_get_smp_processor_id#8 mark_precise: frame0: regs= stack=-8 before 3: (7b) *(u64 *)(r10 -520) = r2 mark_precise: frame0: regs=r2 stack= before 2: (7b) *(u64 *)(r10 -512) = r1 mark_precise: frame0: regs=r2 stack= before 1: (b7) r2 = 42 9: R2_w=42 R3_w=fp42 9: (95) exit This patch disables the additional allowance for the moment. Also, two test cases are removed: - bpf_fastcall_max_stack_ok: it fails w/o additional stack allowance; - bpf_fastcall_max_stack_fail: this test is no longer necessary, stack size follows regular rules, pattern invalidation is checked by other test cases. Reported-by: Hou Tao <houtao@huaweicloud.com> Closes: https://lore.kernel.org/bpf/20241023022752.172005-1-houtao@huaweicloud.com/ Fixes: 5b5f51bff1b6 ("bpf: no_caller_saved_registers attribute for helper calls") Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241029193911.1575719-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-29Merge tag 'sched_ext-for-6.12-rc5-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - Instances of scx_ops_bypass() could race each other leading to misbehavior. Fix by protecting the operation with a spinlock. - selftest and userspace header fixes * tag 'sched_ext-for-6.12-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Fix enq_last_no_enq_fails selftest sched_ext: Make cast_mask() inline scx: Fix raciness in scx_ops_bypass() scx: Fix exit selftest to use custom DSQ sched_ext: Fix function pointer type mismatches in BPF selftests selftests/sched_ext: add order-only dependency of runner.o on BPFOBJ
2024-10-29selftests/bpf: drop unnecessary bpf_iter.h type duplicationAndrii Nakryiko
Drop bpf_iter.h header which uses vmlinux.h but re-defines a bunch of iterator structures and some of BPF constants for use in BPF iterator selftests. None of that is necessary when fresh vmlinux.h header is generated for vmlinux image that matches latest selftests. So drop ugly hacks and have a nice plain vmlinux.h usage everywhere. We could do the same with all the kfunc __ksym redefinitions, but that has dependency on very fresh pahole, so I'm not addressing that here. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20241029203919.1948941-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-29libbpf: start v1.6 development cycleAndrii Nakryiko
With libbpf v1.5.0 release out, start v1.6 dev cycle. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20241029184045.581537-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-29selftests/bpf: Add test for trie_get_next_key()Byeonguk Jeong
Add a test for out-of-bounds write in trie_get_next_key() when a full path from root to leaf exists and bpf_map_get_next_key() is called with the leaf node. It may crashes the kernel on failure, so please run in a VM. Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/Zxx4ep78tsbeWPVM@localhost.localdomain Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-29selftests/bpf: remove xdp_synproxy IP_DF checkVincent Li
In real world production websites, the IP_DF flag is not always set for each packet from these websites. the IP_DF flag check breaks Internet connection to these websites for home based firewall like BPFire when XDP synproxy program is attached to firewall Internet facing side interface. see [0] [0] https://github.com/vincentmli/BPFire/issues/59 Signed-off-by: Vincent Li <vincent.mc.li@gmail.com> Link: https://lore.kernel.org/r/20241025031952.1351150-1-vincent.mc.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-10-29selftests: tc-testing: Fix typo errorKaran Sanghavi
Correct the typo errors in json files - "diffferent" is corrected to "different". - "muliple" and "miltiple" is corrected to "multiple". Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Karan Sanghavi <karansanghvi98@gmail.com> Link: https://patch.msgid.link/20241022-multiple_spell_error-v2-1-7e5036506fe5@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29selftests/bpf: Test with a very short loopEduard Zingerman
The test added is a simplified reproducer from syzbot report [1]. If verifier does not insert checkpoint somewhere inside the loop, verification of the program would take a very long time. This would happen because mark_chain_precision() for register r7 would constantly trace jump history of the loop back, processing many iterations for each mark_chain_precision() call. [1] https://lore.kernel.org/bpf/670429f6.050a0220.49194.0517.GAE@google.com/ Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20241029172641.1042523-2-eddyz87@gmail.com
2024-10-29bpf: handle implicit declaration of function gettid in bpf_iter.cJason Xing
As we can see from the title, when I compiled the selftests/bpf, I saw the error: implicit declaration of function ‘gettid’ ; did you mean ‘getgid’? [-Werror=implicit-function-declaration] skel->bss->tid = gettid(); ^~~~~~ getgid Directly call the syscall solves this issue. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Tested-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/r/20241029074627.80289-1-kerneljasonxing@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-10-29selftests: netfilter: nft_flowtable.sh: make first pass deterministicFlorian Westphal
The CI occasionaly encounters a failing test run. Example: # PASS: ipsec tunnel mode for ns1/ns2 # re-run with random mtus: -o 10966 -l 19499 -r 31322 # PASS: flow offloaded for ns1/ns2 [..] # FAIL: ipsec tunnel ... counter 1157059 exceeds expected value 878489 This script will re-exec itself, on the second run, random MTUs are chosen for the involved links. This is done so we can cover different combinations (large mtu on client, small on server, link has lowest mtu, etc). Furthermore, file size is random, even for the first run. Rework this script and always use the same file size on initial run so that at least the first round can be expected to have reproducible behavior. Second round will use random mtu/filesize. Raise the failure limit to that of the file size, this should avoid all errneous test errors. Currently, first fin will remove the offload, so if one peer is already closing remaining data is handled by classic path, which result in larger-than-expected counter and a test failure. Given packet path also counts tcp/ip headers, in case offload is completely broken this test will still fail (as expected). The test counter limit could be made more strict again in the future once flowtable can keep a connection in offloaded state until FINs in both directions were seen. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241022152324.13554-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29selftests: Add a test mangling with uc_sigmaskDev Jain
The test is motivated by the following observation: Raise a signal, jump to signal handler. The ucontext_t structure dumped by kernel to userspace has a uc_sigmask field having the mask of blocked signals. If you run a fresh minimalistic program doing this, this field is empty, even if you block some signals while registering the handler with sigaction(). Here is what the man-pages have to say: sigaction(2): "sa_mask specifies a mask of signals which should be blocked (i.e., added to the signal mask of the thread in which the signal handler is invoked) during execution of the signal handler. In addition, the signal which triggered the handler will be blocked, unless the SA_NODEFER flag is used." signal(7): Under "Execution of signal handlers", (1.3) implies: "The thread's current signal mask is accessible via the ucontext_t object that is pointed to by the third argument of the signal handler." But, (1.4) states: "Any signals specified in act->sa_mask when registering the handler with sigprocmask(2) are added to the thread's signal mask. The signal being delivered is also added to the signal mask, unless SA_NODEFER was specified when registering the handler. These signals are thus blocked while the handler executes." There clearly is no distinction being made in the man pages between "Thread's signal mask" and ucontext_t; this logically should imply that a signal blocked by populating struct sigaction should be visible in ucontext_t. Here is what the kernel code does (for Aarch64): do_signal() -> handle_signal() -> sigmask_to_save(), which returns &current->blocked, is passed to setup_rt_frame() -> setup_sigframe() -> __copy_to_user(). Hence, &current->blocked is copied to ucontext_t exposed to userspace. Returning back to handle_signal(), signal_setup_done() -> signal_delivered() -> sigorsets() and set_current_blocked() are responsible for using information from struct ksignal ksig, which was populated through the sigaction() system call in kernel/signal.c: copy_from_user(&new_sa.sa, act, sizeof(new_sa.sa)), to update &current->blocked; hence, the set of blocked signals for the current thread is updated AFTER the kernel dumps ucontext_t to userspace. Assuming that the above is indeed the intended behaviour, because it semantically makes sense, since the signals blocked using sigaction() remain blocked only till the execution of the handler, and not in the context present before jumping to the handler (but nothing can be confirmed from the man-pages), this patch introduces a test for mangling with uc_sigmask. The test asserts the relation between blocked signal, delivered signal, and ucontext. The ucontext is mangled with, by adding a signal mask to it; on return from the handler, the thread must block the corresponding signal. In the test description, I have also described signal delivery and blockage, for ease of understanding what the test does. Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-10-29selftests: Rename sigaltstack to generic signalDev Jain
Rename sigaltstack to generic signal directory, to allow adding more signal tests in the future. Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-10-28selftests/mm: fix deadlock for fork after pthread_create with atomic_boolEdward Liaw
Some additional synchronization is needed on Android ARM64; we see a deadlock with pthread_create when the parent thread races forward before the child has a chance to start doing work. Link: https://lkml.kernel.org/r/20241018171734.2315053-4-edliaw@google.com Fixes: cff294582798 ("selftests/mm: extend and rename uffd pagemap test") Signed-off-by: Edward Liaw <edliaw@google.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-28Revert "selftests/mm: replace atomic_bool with pthread_barrier_t"Edward Liaw
This reverts commit e61ef21e27e8deed8c474e9f47f4aa7bc37e138c. uffd_poll_thread may be called by other tests that do not initialize the pthread_barrier, so this approach is not correct. This will revert to using atomic_bool instead. Link: https://lkml.kernel.org/r/20241018171734.2315053-3-edliaw@google.com Fixes: e61ef21e27e8 ("selftests/mm: replace atomic_bool with pthread_barrier_t") Signed-off-by: Edward Liaw <edliaw@google.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-28Revert "selftests/mm: fix deadlock for fork after pthread_create on ARM"Edward Liaw
Patch series "selftests/mm: revert pthread_barrier change" On Android arm, pthread_create followed by a fork caused a deadlock in the case where the fork required work to be completed by the created thread. The previous patches incorrectly assumed that the parent would always initialize the pthread_barrier for the child thread. This reverts the change and replaces the fix for wp-fork-with-event with the original use of atomic_bool. This patch (of 3): This reverts commit e142cc87ac4ec618f2ccf5f68aedcd6e28a59d9d. fork_event_consumer may be called by other tests that do not initialize the pthread_barrier, so this approach is not correct. The subsequent patch will revert to using atomic_bool instead. Link: https://lkml.kernel.org/r/20241018171734.2315053-1-edliaw@google.com Link: https://lkml.kernel.org/r/20241018171734.2315053-2-edliaw@google.com Fixes: e142cc87ac4e ("fix deadlock for fork after pthread_create on ARM") Signed-off-by: Edward Liaw <edliaw@google.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-28tools: testing: add expand-only mode VMA testLorenzo Stoakes
Add a test to assert that VMG_FLAG_JUST_EXPAND functions as expected - that is, when the VMA iterator is positioned at the previous VMA and no VMAs proceed it, we observe an expansion with all state as expected. Explicitly place a prior VMA that would otherwise fail this test if the mode were not enabled (as it would traverse to the previous-previous VMA). Link: https://lkml.kernel.org/r/d2f88330254a6448092412bf7dfe077a579ab0dc.1729174352.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: kernel test robot <oliver.sang@intel.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-28selftests/watchdog-test: Fix system accidentally reset after watchdog-testLi Zhijian
When running watchdog-test with 'make run_tests', the watchdog-test will be terminated by a timeout signal(SIGTERM) due to the test timemout. And then, a system reboot would happen due to watchdog not stop. see the dmesg as below: ``` [ 1367.185172] watchdog: watchdog0: watchdog did not stop! ``` Fix it by registering more signals(including SIGTERM) in watchdog-test, where its signal handler will stop the watchdog. After that # timeout 1 ./watchdog-test Watchdog Ticking Away! . Stopping watchdog ticks... Link: https://lore.kernel.org/all/20241029031324.482800-1-lizhijian@fujitsu.com/ Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-10-29usbip: tools: Fix detach_port() invalid port error pathZongmin Zhou
The detach_port() doesn't return error when detach is attempted on an invalid port. Fixes: 40ecdeb1a187 ("usbip: usbip_detach: fix to check for invalid ports") Cc: stable@vger.kernel.org Reviewed-by: Hongren Zheng <i@zenithal.me> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Zongmin Zhou <zhouzongmin@kylinos.cn> Link: https://lore.kernel.org/r/20241024022700.1236660-1-min_halo@163.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-28selftests/intel_pstate: check if cpupower is installedAlessandro Zanni
Running "make kselftest TARGETS=intel_pstate" results in the following errors: - ./run.sh: line 89: cpupower: command not found - ./run.sh: line 91: cpupower: command not found if the cpupower is not installed. Since the test depends on cpupower, this patch stops the test if the cpupower is not installed. Link: https://lore.kernel.org/all/cc01753c8dab0f33669a5a0fc162544078055bd1.1730141362.git.alessandro.zanni87@gmail.com/ Signed-off-by: Alessandro Zanni <alessandro.zanni87@gmail.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-10-28selftests/intel_pstate: fix operand expected errorAlessandro Zanni
Running "make kselftest TARGETS=intel_pstate" results in the following errors: - ./run.sh: line 90: / 1000: syntax error: operand expected (error token is "/ 1000") - ./run.sh: line 92: / 1000: syntax error: operand expected (error token is "/ 1000") This fix allows to have cross-platform compatibility when using arithmetic expression with command substitutions. Link: https://lore.kernel.org/r/f37df23888cd5ea6b3976f19d3e25796129dd090.1730141362.git.alessandro.zanni87@gmail.com Signed-off-by: Alessandro Zanni <alessandro.zanni87@gmail.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-10-28selftests/mount_setattr: fix idmap_mount_tree_invalid failed to runzhouyuhang
Test case idmap_mount_tree_invalid failed to run on the newer kernel with the following output: # RUN mount_setattr_idmapped.idmap_mount_tree_invalid ... # mount_setattr_test.c:1428:idmap_mount_tree_invalid:Expected sys_mount_setattr(open_tree_fd, "", AT_EMPTY_PATH, &attr, sizeof(attr)) (0) ! = 0 (0) # idmap_mount_tree_invalid: Test terminated by assertion This is because tmpfs is mounted at "/mnt/A", and tmpfs already contains the flag FS_ALLOW_IDMAP after the commit 7a80e5b8c6fa ("shmem: support idmapped mounts for tmpfs"). So calling sys_mount_setattr here returns 0 instead of -EINVAL as expected. Ramfs does not support idmap mounts, so we can use it here to test invalid mounts, which allows the test case to pass with the following output: # Starting 1 tests from 1 test cases. # RUN mount_setattr_idmapped.idmap_mount_tree_invalid ... # OK mount_setattr_idmapped.idmap_mount_tree_invalid ok 1 mount_setattr_idmapped.idmap_mount_tree_invalid # PASSED: 1 / 1 tests passed. Link: https://lore.kernel.org/all/20241028084132.3212598-1-zhouyuhang1010@163.com/ Signed-off-by: zhouyuhang <zhouyuhang@kylinos.cn> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-10-28selftest/tcp-ao: Add filter testsLeo Stone
Add tests that check if getsockopt(TCP_AO_GET_KEYS) returns the right keys when using different filters. Sample output: > # ok 114 filter keys: by sndid, rcvid, address > # ok 115 filter keys: by is_current > # ok 116 filter keys: by is_rnext > # ok 117 filter keys: by sndid, rcvid > # ok 118 filter keys: correct nkeys when in.nkeys < matches Acked-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Leo Stone <leocstone@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241021174652.6949-1-leocstone@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28selftests: mptcp: list sysctl dataMatthieu Baerts (NGI0)
Listing all the values linked to the MPTCP sysctl knobs was not exercised in MPTCP test suite. Let's do that to avoid any regressions, but also to have a kernel with a debug kconfig verifying more assumptions. For the moment, we are not interested by the output, only to avoid crashes and warnings. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241021-net-mptcp-sched-lock-v1-3-637759cf061c@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28iommufd: Selftest coverage for IOMMU_IOAS_MAP_FILESteve Sistare
Add test cases to exercise IOMMU_IOAS_MAP_FILE. Link: https://patch.msgid.link/r/1729861919-234514-10-git-send-email-steven.sistare@oracle.com Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-28perf cap: Add __NR_capget to arch/x86 unistdIan Rogers
As there are duplicated kernel headers in tools/include libc can pick up the wrong definitions. This was causing the wrong system call for capget in perf. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: e25ebda78e230283 ("perf cap: Tidy up and improve capability testing") Closes: https://lore.kernel.org/lkml/cc7d6bdf-1aeb-4179-9029-4baf50b59342@intel.com/ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241026055448.312247-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-28tools headers: Update the linux/unaligned.h copy with the kernel sourcesArnaldo Carvalho de Melo
To pick up the changes in: 7f053812dab3946c ("random: vDSO: minimize and simplify header includes") That required adding a copy of include/vdso/unaligned.h and its checking in tools/perf/check-headers.h. Addressing this perf tools build warning: Warning: Kernel ABI header differences: diff -u tools/include/linux/unaligned.h include/linux/unaligned.h Please see tools/include/uapi/README for further details. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Ian Rogers <irogers@google.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Zx-uHvAbPAESofEN@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-28tools headers arm64: Sync arm64's cputype.h with the kernel sourcesArnaldo Carvalho de Melo
To get the changes in: 924725707d80bc25 ("arm64: cputype: Add Neoverse-N3 definitions") That makes this perf source code to be rebuilt: CC /tmp/build/perf-tools/util/arm-spe.o The changes in the above patch add MIDR_NEOVERSE_N3, that probably need changes in arm-spe.c, so probably we need to add it to that array? Or maybe we need to leave this for later when this is all tested on those machines? static const struct midr_range neoverse_spe[] = { MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1), MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2), MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1), {}, }; Mark Rutland recommended about arm-spe.c in a previous update to this file: "I would not touch this for now -- someone would have to go audit the TRMs to check that those other cores have the same encoding, and I think it'd be better to do that as a follow-up." That addresses this perf build warning: Warning: Kernel ABI header differences: diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Zx-dffKdGsgkhG96@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-28tools headers: Synchronize {uapi/}linux/bits.h with the kernel sourcesArnaldo Carvalho de Melo
To pick up the changes in this cset: 947697c6f0f75f98 ("uapi: Define GENMASK_U128") This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/bits.h include/uapi/linux/bits.h diff -u tools/include/linux/bits.h include/linux/bits.h Please see tools/include/uapi/README for further details. Acked-by: Yury Norov <yury.norov@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Zx-ZVH7bHqtFn8Dv@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-25sched_ext: Fix enq_last_no_enq_fails selftestTejun Heo
cc9877fb7677 ("sched_ext: Improve error reporting during loading") changed how load failures are reported so that more error context can be communicated. This breaks the enq_last_no_enq_fails test as attach no longer fails. The scheduler is guaranteed to be ejected on attach completion with full error information. Update enq_last_no_enq_fails so that it checks that the scheduler is ejected using ops.exit(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Vishal Chourasia <vishalc@linux.ibm.com> Link: http://lkml.kernel.org/r/Zxknp7RAVNjmdJSc@linux.ibm.com Fixes: cc9877fb7677 ("sched_ext: Improve error reporting during loading")
2024-10-25sched_ext: Make cast_mask() inlineTejun Heo
cast_mask() doesn't do any actual work and is defined in a header file. Force it to be inline. When it is not inlined and the function is not used, it can cause verificaiton failures like the following: # tools/testing/selftests/sched_ext/runner -t minimal ===== START ===== TEST: minimal DESCRIPTION: Verify we can load a fully minimal scheduler OUTPUT: libbpf: prog 'cast_mask': missing BPF prog type, check ELF section name '.text' libbpf: prog 'cast_mask': failed to load: -22 libbpf: failed to load object 'minimal' libbpf: failed to load BPF skeleton 'minimal': -22 ERR: minimal.c:20 Failed to open and load skel not ok 1 minimal # ===== END ===== Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: a748db0c8c6a ("tools/sched_ext: Receive misc updates from SCX repo")
2024-10-25cxl/test: Improve init-order fidelity relative to real-world systemsDan Williams
The investigation of an initialization failure [1] highlighted that cxl_test does not reflect the init-order of real world systems. The expected order is root/bus first then async probing of the memory devices. Fix up cxl_test to reflect that order. While it did not reproduce the initial bug report (since that is dependent on built-in vs modular builds), it did reveal a separate latent bug in the subsystem's decoder shutdown flow. Fix for that sent separately. Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/172964784521.81806.15791069994065969243.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-10-25cxl/port: Fix use-after-free, permit out-of-order decoder shutdownDan Williams
In support of investigating an initialization failure report [1], cxl_test was updated to register mock memory-devices after the mock root-port/bus device had been registered. That led to cxl_test crashing with a use-after-free bug with the following signature: cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1 cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1 cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1 [..] cxld_unregister: cxl decoder14.0: cxl_region_decode_reset: cxl_region region3: mock_decoder_reset: cxl_port port3: decoder3.0 reset 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1 cxl_endpoint_decoder_release: cxl decoder14.0: [..] cxld_unregister: cxl decoder7.0: 3) cxl_region_decode_reset: cxl_region region3: Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI [..] RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] [..] Call Trace: <TASK> cxl_region_decode_reset+0x69/0x190 [cxl_core] cxl_region_detach+0xe8/0x210 [cxl_core] cxl_decoder_kill_region+0x27/0x40 [cxl_core] cxld_unregister+0x5d/0x60 [cxl_core] At 1) a region has been established with 2 endpoint decoders (7.0 and 14.0). Those endpoints share a common switch-decoder in the topology (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits the "out of order reset case" in the switch decoder. The effect though is that region3 cleanup is aborted leaving it in-tact and referencing decoder14.0. At 3) the second attempt to teardown region3 trips over the stale decoder14.0 object which has long since been deleted. The fix here is to recognize that the CXL specification places no mandate on in-order shutdown of switch-decoders, the driver enforces in-order allocation, and hardware enforces in-order commit. So, rather than fail and leave objects dangling, always remove them. In support of making cxl_region_decode_reset() always succeed, cxl_region_invalidate_memregion() failures are turned into warnings. Crashing the kernel is ok there since system integrity is at risk if caches cannot be managed around physical address mutation events like CXL region destruction. A new device_for_each_child_reverse_from() is added to cleanup port->commit_end after all dependent decoders have been disabled. In other words if decoders are allocated 0->1->2 and disabled 1->2->0 then port->commit_end only decrements from 2 after 2 has been disabled, and it decrements all the way to zero since 1 was disabled previously. Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Cc: stable@vger.kernel.org Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>