summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2025-02-25tools/memory-model: Switch to softcoded herd7 tagsJonas Oberhauser
A new version of herd7 provides a -lkmmv2 switch which overrides the old herd7 behavior of simply ignoring any softcoded tags in the .def and .bell files. We port LKMM to this version of herd7 by providing the switch in linux-kernel.cfg and reporting an error if the LKMM is used without this switch. To preserve the semantics of LKMM, we also softcode the Noreturn tag on atomic RMW which do not return a value and define atomic_add_unless with an Mb tag in linux-kernel.def. We update the herd-representation.txt accordingly and clarify some of the resulting combinations. Co-developed-by: Hernan Ponce de Leon <hernan.poncedeleon@huaweicloud.com> Signed-off-by: Hernan Ponce de Leon <hernan.poncedeleon@huaweicloud.com> Signed-off-by: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Tested-by: Boqun Feng <boqun.feng@gmail.com> Tested-by: Akira Yokosawa <akiyks@gmail.com> # herdtools7.7.58
2025-02-25objtool: Add bch2_trans_unlocked_or_in_restart_error() to bcachefs noreturnsYouling Tang
Fix the following objtool warning during build time: fs/bcachefs/btree_cache.o: warning: objtool: btree_node_lock.constprop.0() falls through to next function bch2_recalc_btree_reserve() fs/bcachefs/btree_update.o: warning: objtool: bch2_trans_update_get_key_cache() falls through to next function need_whiteout_for_snapshot() bch2_trans_unlocked_or_in_restart_error() is an Obviously Correct (tm) panic() wrapper, add it to the list of known noreturns. Fixes: b318882022a8 ("bcachefs: bch2_trans_verify_not_unlocked_or_in_restart()") Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20250218064230.219997-1-youling.tang@linux.dev Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-02-25objtool: Fix C jump table annotations for ClangArd Biesheuvel
A C jump table (such as the one used by the BPF interpreter) is a const global array of absolute code addresses, and this means that the actual values in the table may not be known until the kernel is booted (e.g., when using KASLR or when the kernel VA space is sized dynamically). When using PIE codegen, the compiler will default to placing such const global objects in .data.rel.ro (which is annotated as writable), rather than .rodata (which is annotated as read-only). As C jump tables are explicitly emitted into .rodata, this used to result in warnings for LoongArch builds (which uses PIE codegen for the entire kernel) like Warning: setting incorrect section attributes for .rodata..c_jump_table due to the fact that the explicitly specified .rodata section inherited the read-write annotation that the compiler uses for such objects when using PIE codegen. This warning was suppressed by explicitly adding the read-only annotation to the __attribute__((section(""))) string, by commit c5b1184decc8 ("compiler.h: specify correct attribute for .rodata..c_jump_table") Unfortunately, this hack does not work on Clang's integrated assembler, which happily interprets the appended section type and permission specifiers as part of the section name, which therefore no longer matches the hard-coded pattern '.rodata..c_jump_table' that objtool expects, causing it to emit a warning kernel/bpf/core.o: warning: objtool: ___bpf_prog_run+0x20: sibling call from callable instruction with modified stack frame Work around this, by emitting C jump tables into .data.rel.ro instead, which is treated as .rodata by the linker script for all builds, not just PIE based ones. Fixes: c5b1184decc8 ("compiler.h: specify correct attribute for .rodata..c_jump_table") Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn> # on LoongArch Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250221135704.431269-6-ardb+git@google.com Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-02-25selftests/x86/lam: Fix minor memory in do_uring()liuye
Exception branch returns without freeing 'fi'. Signed-off-by: liuye <liuye@kylinos.cn> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250114082650.113105-1-liuye@kylinos.cn
2025-02-24Merge tag 'riscv-for-linus-6.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A fix for cacheinfo DT probing to avoid reading non-boolean properties as booleans. - A fix for cpufeature to use bitmap_equal() instead of memcmp(), so unused bits are ignored. - Fixes for cmpxchg and futex cmpxchg that properly encode the sign extension requirements on inline asm, which results in spurious successes. This manifests in at least inode_set_ctime_current, but is likely just a disaster waiting to happen. - A fix for the rseq selftests, which was using an invalid constraint. - A pair of fixes for signal frame size handling: - We were reserving space for an extra empty extension context header on systems with extended signal context, thus resulting in unnecessarily large allocations. - We weren't properly checking for available extensions before calculating the signal stack size, which resulted in undersized stack allocations on some systems (at least those with T-Head custom vectors). Also, we've added Alex as a reviewer. He's been helping out a ton lately, thanks! * tag 'riscv-for-linus-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: MAINTAINERS: Add myself as a riscv reviewer riscv: signal: fix signal_minsigstksz riscv: signal: fix signal frame size rseq/selftests: Fix riscv rseq_offset_deref_addv inline asm riscv/futex: sign extend compare value in atomic cmpxchg riscv/atomic: Do proper sign extension also for unsigned in arch_cmpxchg riscv: cpufeature: use bitmap_equal() instead of memcmp() riscv: cacheinfo: Use of_property_present() for non-boolean properties
2025-02-24selftests: drv-net: test XDP, HDS auto and the ioctl pathJakub Kicinski
Test XDP and HDS interaction. While at it add a test for using the IOCTL, as that turned out to be the real culprit. Testing bnxt: # NETIF=eth0 ./ksft-net-drv/drivers/net/hds.py KTAP version 1 1..12 ok 1 hds.get_hds ok 2 hds.get_hds_thresh ok 3 hds.set_hds_disable # SKIP disabling of HDS not supported by the device ok 4 hds.set_hds_enable ok 5 hds.set_hds_thresh_zero ok 6 hds.set_hds_thresh_max ok 7 hds.set_hds_thresh_gt ok 8 hds.set_xdp ok 9 hds.enabled_set_xdp ok 10 hds.ioctl ok 11 hds.ioctl_set_xdp ok 12 hds.ioctl_enabled_set_xdp # Totals: pass:11 fail:0 xfail:0 xpass:0 skip:1 error:0 and netdevsim: # ./ksft-net-drv/drivers/net/hds.py KTAP version 1 1..12 ok 1 hds.get_hds ok 2 hds.get_hds_thresh ok 3 hds.set_hds_disable ok 4 hds.set_hds_enable ok 5 hds.set_hds_thresh_zero ok 6 hds.set_hds_thresh_max ok 7 hds.set_hds_thresh_gt ok 8 hds.set_xdp ok 9 hds.enabled_set_xdp ok 10 hds.ioctl ok 11 hds.ioctl_set_xdp ok 12 hds.ioctl_enabled_set_xdp # Totals: pass:12 fail:0 xfail:0 xpass:0 skip:0 error:0 Netdevsim needs a sane default for tx/rx ring size. ethtool 6.11 is needed for the --disable-netlink option. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Tested-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250221025141.1132944-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24ASoC: dapm-graph: set fill colour of turned on nodesNicolas Frattaroli
Some tools like KGraphViewer interpret the "ON" nodes not having an explicitly set fill colour as them being entirely black, which obscures the text on them and looks funny. In fact, I thought they were off for the longest time. Comparing to the output of the `dot` tool, I assume they are supposed to be white. Instead of speclawyering over who's in the wrong and must immediately atone for their wickedness at the altar of RFC2119, just be explicit about it, set the fillcolor to white, and nobody gets confused. Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Link: https://patch.msgid.link/20250221-dapm-graph-node-colour-v1-1-514ed0aa7069@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-24sched_ext: idle: Introduce scx_bpf_nr_node_ids()Andrea Righi
Similarly to scx_bpf_nr_cpu_ids(), introduce a new kfunc scx_bpf_nr_node_ids() to expose the maximum number of NUMA nodes in the system. BPF schedulers can use this information together with the new node-aware kfuncs, for example to create per-node DSQs, validate node IDs, etc. Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-22Merge tag 'ftrace-v6.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: "Function graph accounting fixes: - Fix the manage ops hashes The function graph registers a "manager ops" and "sub-ops" to ftrace. The manager ops does not have any callback but calls the sub-ops callbacks. The manage ops hashes (what is used to tell ftrace what functions to attach to) is built on the sub-ops it manages. There was an error in the way it built the hash. An empty hash means to attach to all functions. When the manager ops had one sub-ops it properly copied its hash. But when the manager ops had more than one sub-ops, it went into a loop to make a set of all functions it needed to add to the hash. If any of the subops hashes was empty, that would mean to attach to all functions. The error was that the first iteration of the loop passed in an empty hash to start with in order to add the other hashes. That starting hash was mistaken as to attach to all functions. This made the manage ops attach to all functions whenever it had two or more sub-ops, even if each sub-op was attached to only a single function. - Do not add duplicate entries to the manager ops hash If two or more subops hashes trace the same function, an entry for that function will be added to the manager ops for each subops. This causes waste and extra overhead. Fprobe accounting fixes: - Remove last function from fprobe hash Fprobes has a ftrace hash to manage which functions an fprobe is attached to. It also has a counter of how many fprobes are attached. When the last fprobe is removed, it unregisters the fprobe from ftrace but does not remove the functions the last fprobe was attached to from the hash. This leaves the old functions attached. When a new fprobe is added, the fprobe infrastructure attaches to not only the functions of the new fprobe, but also to the functions of the last fprobe. - Fix accounting of the fprobe counter When a fprobe is added, it updates a counter. If the counter goes from zero to one, it attaches its ops to ftrace. When an fprobe is removed, the counter is decremented. If the counter goes from 1 to zero, it removes the fprobes ops from ftrace. There was an issue where if two fprobes trace the same function, the addition of each fprobe would increment the counter. But when removing the first of the fprobes, it would notice that another fprobe is still attached to one of its functions no it does not remove the functions from the ftrace ops. But it also did not decrement the counter, so when the last fprobe is removed, the counter is still one. This leaves the fprobes callback still registered with ftrace and it being called by the functions defined by the fprobes ops hash. Worse yet, because all the functions from the fprobe ops hash have been removed, that tells ftrace that it wants to trace all functions. Thus, this puts the state of the system where every function is calling the fprobe callback handler (which does nothing as there are no registered fprobes), but this causes a good 13% slow down of the entire system. Other updates: - Add a selftest to test the above issues to prevent regressions. - Fix preempt count accounting in function tracing Better recursion protection was added to function tracing which added another layer of preempt disable. As the preempt_count gets traced in the event, it needs to subtract the amount of preempt disabling the tracer does to record what the preempt_count was when the trace was triggered. - Fix memory leak in output of set_event A variable is passed by the seq_file functions in the location that is set by the return of the next() function. The start() function allocates it and the stop() function frees it. But when the last item is found, the next() returns NULL which leaks the data that was allocated in start(). The m->private is used for something else, so have next() free the data when it returns NULL, as stop() will then just receive NULL in that case" * tag 'ftrace-v6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix memory leak when reading set_event file ftrace: Correct preemption accounting for function tracing. selftests/ftrace: Update fprobe test to check enabled_functions file fprobe: Fix accounting of when to unregister from function graph fprobe: Always unregister fgraph function from ops ftrace: Do not add duplicate entries in subops manager ops ftrace: Fix accounting of adding subops to a manager ops
2025-02-22selftests: remove reference to prime_numbers.shTamir Duberstein
Remove a leftover shell script reference from commit 313b38a6ecb4 ("lib/prime_numbers: convert self-test to KUnit"). Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202502171110.708d965a-lkp@intel.com Fixes: 313b38a6ecb4 ("lib/prime_numbers: convert self-test to KUnit") Signed-off-by: Tamir Duberstein <tamird@gmail.com> Link: https://lore.kernel.org/r/20250217-fix-prime-numbers-v1-1-eb0ca7235e60@gmail.com Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-22selftests/rseq: Add rseq syscall errors testMichael Jeanson
This test adds coverage of expected errors during rseq registration and unregistration, it disables glibc integration and will thus always exercise the rseq syscall explictly. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20250121213402.1754762-1-mjeanson@efficios.com
2025-02-22selftests/lam: Test get_user() LAM pointer handlingMaciej Wieczor-Retman
Recent change in how get_user() handles pointers: https://lore.kernel.org/all/20241024013214.129639-1-torvalds@linux-foundation.org/ has a specific case for LAM. It assigns a different bitmask that's later used to check whether a pointer comes from userland in get_user(). Add test case to LAM that utilizes a ioctl (FIOASYNC) syscall which uses get_user() in its implementation. Execute the syscall with differently tagged pointers to verify that valid user pointers are passing through and invalid kernel/non-canonical pointers are not. Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/1624d9d1b9502517053a056652d50dc5d26884ac.1737990375.git.maciej.wieczor-retman@intel.com
2025-02-22selftests/lam: Skip test if LAM is disabledMaciej Wieczor-Retman
Until LASS is merged into the kernel: https://lore.kernel.org/all/20241028160917.1380714-1-alexander.shishkin@linux.intel.com/ LAM is left disabled in the config file. Running the LAM selftest with disabled LAM only results in unhelpful output. Use one of LAM syscalls() to determine whether the kernel was compiled with LAM support (CONFIG_ADDRESS_MASKING) or not. Skip running the tests in the latter case. Merge CPUID checking function with the one mentioned above to achieve a single function that shows LAM's availability from both CPU and the kernel. Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/251d0f45f6a768030115e8d04bc85458910cb0dc.1737990375.git.maciej.wieczor-retman@intel.com
2025-02-22selftests/lam: Move cpu_has_la57() to use cpuinfo flagMaciej Wieczor-Retman
In current form cpu_has_la57() reports platform's support for LA57 through reading the output of cpuid. A much more useful information is whether 5-level paging is actually enabled on the running system. Check whether 5-level paging is enabled by trying to map a page in the high linear address space. Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/8b1ca51b13e6d94b5a42b6930d81b692cbb0bcbb.1737990375.git.maciej.wieczor-retman@intel.com
2025-02-21selftests/ftrace: Update fprobe test to check enabled_functions fileSteven Rostedt
A few bugs were found in the fprobe accounting logic along with it using the function graph infrastructure. Update the fprobe selftest to catch those bugs in case they or something similar shows up in the future. The test now checks the enabled_functions file which shows all the functions attached to ftrace or fgraph. When enabling a fprobe, make sure that its corresponding function is also added to that file. Also add two more fprobes to enable to make sure that the fprobe logic works properly with multiple probes. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/20250220202055.733001756@goodmis.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Tested-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-21Merge branch 'perf/urgent' into perf/core, to pick up fixes before merging ↵Ingo Molnar
new patches Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-20Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull BPF fixes from Daniel Borkmann: - Fix a soft-lockup in BPF arena_map_free on 64k page size kernels (Alan Maguire) - Fix a missing allocation failure check in BPF verifier's acquire_lock_state (Kumar Kartikeya Dwivedi) - Fix a NULL-pointer dereference in trace_kfree_skb by adding kfree_skb to the raw_tp_null_args set (Kuniyuki Iwashima) - Fix a deadlock when freeing BPF cgroup storage (Abel Wu) - Fix a syzbot-reported deadlock when holding BPF map's freeze_mutex (Andrii Nakryiko) - Fix a use-after-free issue in bpf_test_init when eth_skb_pkt_type is accessing skb data not containing an Ethernet header (Shigeru Yoshida) - Fix skipping non-existing keys in generic_map_lookup_batch (Yan Zhai) - Several BPF sockmap fixes to address incorrect TCP copied_seq calculations, which prevented correct data reads from recv(2) in user space (Jiayuan Chen) - Two fixes for BPF map lookup nullness elision (Daniel Xu) - Fix a NULL-pointer dereference from vmlinux BTF lookup in bpf_sk_storage_tracing_allowed (Jared Kangas) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests: bpf: test batch lookup on array of maps with holes bpf: skip non exist keys in generic_map_lookup_batch bpf: Handle allocation failure in acquire_lock_state bpf: verifier: Disambiguate get_constant_map_key() errors bpf: selftests: Test constant key extraction on irrelevant maps bpf: verifier: Do not extract constant map keys for irrelevant maps bpf: Fix softlockup in arena_map_free on 64k page kernel net: Add rx_skb of kfree_skb to raw_tp_null_args[]. bpf: Fix deadlock when freeing cgroup storage selftests/bpf: Add strparser test for bpf selftests/bpf: Fix invalid flag of recv() bpf: Disable non stream socket for strparser bpf: Fix wrong copied_seq calculation strparser: Add read_sock callback bpf: avoid holding freeze_mutex during mmap operation bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic selftests/bpf: Adjust data size to have ETH_HLEN bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type() bpf: Remove unnecessary BTF lookups in bpf_sk_storage_tracing_allowed
2025-02-20tools/nolibc: add support for 32-bit s390Thomas Weißschuh
32-bit s390 is very close to the existing 64-bit implementation. Some special handling is necessary as there is neither LLVM nor QEMU support. Also the kernel itself can not build natively for 32-bit s390, so instead the test program is executed with a 64-bit kernel. Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20250206-nolibc-s390-v2-2-991ad97e3d58@weissschuh.net Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-02-20selftests/nolibc: rename s390 to s390xThomas Weißschuh
Support for 32-bit s390 is about to be added. As "s39032" would look horrible, use the another naming scheme. 32-bit s390 is "s390" and 64-bit s390 is "s390x", similar to how it is handled in various toolchain components. Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20250206-nolibc-s390-v2-1-991ad97e3d58@weissschuh.net Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-02-20selftests/nolibc: only run constructor tests on nolibcThomas Weißschuh
The nolibc testsuite can be run against other libcs to test for interoperability. Some aspects of the constructor execution are not standardized and musl does not provide all tested feature, for one it does not provide arguments to the constructors, anymore? Skip the constructor tests on non-nolibc configurations. Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20250212-nolibc-test-constructor-v1-1-c963875b3da4@weissschuh.net Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-02-20Merge tag 'net-6.14-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Smaller than usual with no fixes from any subtree. Current release - regressions: - core: fix race of rtnl_net_lock(dev_net(dev)) Previous releases - regressions: - core: remove the single page frag cache for good - flow_dissector: fix handling of mixed port and port-range keys - sched: cls_api: fix error handling causing NULL dereference - tcp: - adjust rcvq_space after updating scaling ratio - drop secpath at the same time as we currently drop dst - eth: gtp: suppress list corruption splat in gtp_net_exit_batch_rtnl(). Previous releases - always broken: - vsock: - fix variables initialization during resuming - for connectible sockets allow only connected - eth: - geneve: fix use-after-free in geneve_find_dev() - ibmvnic: don't reference skb after sending to VIOS" * tag 'net-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits) Revert "net: skb: introduce and use a single page frag cache" net: allow small head cache usage with large MAX_SKB_FRAGS values nfp: bpf: Add check for nfp_app_ctrl_msg_alloc() tcp: drop secpath at the same time as we currently drop dst net: axienet: Set mac_managed_pm arp: switch to dev_getbyhwaddr() in arp_req_set_public() net: Add non-RCU dev_getbyhwaddr() helper sctp: Fix undefined behavior in left shift operation selftests/bpf: Add a specific dst port matching flow_dissector: Fix port range key handling in BPF conversion selftests/net/forwarding: Add a test case for tc-flower of mixed port and port-range flow_dissector: Fix handling of mixed port and port-range keys geneve: Suppress list corruption splat in geneve_destroy_tunnels(). gtp: Suppress list corruption splat in gtp_net_exit_batch_rtnl(). dev: Use rtnl_net_dev_lock() in unregister_netdev(). net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net(). net: Add net_passive_inc() and net_passive_dec(). net: pse-pd: pd692x0: Fix power limit retrieval MAINTAINERS: trim the GVE entry gve: set xdp redirect target only when it is available ...
2025-02-20tools/memory-model: Define effect of Mb tags on RMWs in tools/...Jonas Oberhauser
Herd7 transforms successful RMW with Mb tags by inserting smp_mb() fences around them. We emulate this by considering imaginary po-edges before the RMW read and before the RMW write, and extending the smp_mb() ordering rule, which currently only applies to real po edges that would be found around a really inserted smp_mb(), also to cases of the only imagined po edges. Reported-by: Viktor Vafeiadis <viktor@mpi-sws.org> Suggested-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Tested-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-20tools/memory-model: Define applicable tags on operation in tools/...Jonas Oberhauser
Herd7 transforms reads, writes, and read-modify-writes by eliminating 'acquire tags from writes, 'release tags from reads, and 'acquire, 'release, and 'mb tags from failed read-modify-writes. We emulate this behavior by redefining Acquire, Release, and Mb sets in linux-kernel.bell to explicitly exclude those combinations. Herd7 furthermore adds 'noreturn tag to certain reads. Currently herd7 does not allow specifying the 'noreturn tag manually, but such manual declaration (e.g., through a syntax __atomic_op{noreturn}) would add invalid 'noreturn tags to writes; in preparation, we already also exclude this combination. Signed-off-by: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Tested-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-20tools/memory-model: Legitimize current use of tags in LKMM macrosJonas Oberhauser
The current macros in linux-kernel.def reference instructions such as __xchg{mb} or __cmpxchg{acquire}, which are invalid combinations of tags and instructions according to the declarations in linux-kernel.bell. This works with current herd7 because herd7 removes these tags anyways and does not actually enforce validity of combinations at all. If a future herd7 version no longer applies these hardcoded transformations, then all currently invalid combinations will actually appear on some instruction. We therefore adjust the declarations to make the resulting combinations valid, by adding the 'mb tag to the set of Accesses and allowing all Accesses to appear on all read, write, and RMW instructions. Signed-off-by: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Tested-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-20tools/memory-model: Add atomic_andnot() with its variantsPuranjay Mohan
Pull-855[1] added the support of atomic_andnot() to the herd tool. Use this to add the implementation in the LKMM. All of the ordering variants are also added. Here is a small litmus-test that uses this operation: C andnot { atomic_t u = ATOMIC_INIT(7); } P0(atomic_t *u) { r0 = atomic_fetch_andnot(3, u); r1 = READ_ONCE(*u); } exists (0:r0=7 /\ 0:r1=4) Test andnot Allowed States 1 0:r0=7; 0:r1=4; Ok Witnesses Positive: 1 Negative: 0 Condition exists (0:r0=7 /\ 0:r1=4) Observation andnot Always 1 0 Time andnot 0.00 Hash=78f011a0b5a0c65fa1cf106fcd62c845 [1] https://github.com/herd/herdtools7/pull/855 Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Andrea Parri <parri.andrea@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Will Deacon <will@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Jade Alglave <j.alglave@ucl.ac.uk> Cc: Luc Maranget <luc.maranget@inria.fr> Cc: Akira Yokosawa <akiyks@gmail.com> Cc: Daniel Lustig <dlustig@nvidia.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: <linux-arch@vger.kernel.org>
2025-02-20tools/memory-model: Add atomic_and()/or()/xor() and add_negativePuranjay Mohan
Pull-849[1] added the support of '&', '|', and '^' to the herd7 tool's atomics operations. Use these in linux-kernel.def to implement atomic_and()/or()/xor() with all their ordering variants. atomic_add_negative() is already available so add its acquire, release, and relaxed ordering variants. [1] https://github.com/herd/herdtools7/pull/849 Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Andrea Parri <parri.andrea@gmail.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Will Deacon <will@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Jade Alglave <j.alglave@ucl.ac.uk> Cc: Luc Maranget <luc.maranget@inria.fr> Cc: Akira Yokosawa <akiyks@gmail.com> Cc: Daniel Lustig <dlustig@nvidia.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: <linux-arch@vger.kernel.org>
2025-02-20selftests/nsfs: add ioctl validation testsChristian Brauner
Add simple tests to validate that non-nsfs ioctls are rejected. Link: https://lore.kernel.org/r/20250219-work-nsfs-v1-2-21128d73c5e8@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-19selftests/bpf: Add a specific dst port matchingCong Wang
After this patch: #102/1 flow_dissector_classification/ipv4:OK #102/2 flow_dissector_classification/ipv4_continue_dissect:OK #102/3 flow_dissector_classification/ipip:OK #102/4 flow_dissector_classification/gre:OK #102/5 flow_dissector_classification/port_range:OK #102/6 flow_dissector_classification/ipv6:OK #102 flow_dissector_classification:OK Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20250218043210.732959-5-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19selftests/net/forwarding: Add a test case for tc-flower of mixed port and ↵Cong Wang
port-range After this patch: # ./tc_flower_port_range.sh TEST: Port range matching - IPv4 UDP [ OK ] TEST: Port range matching - IPv4 TCP [ OK ] TEST: Port range matching - IPv6 UDP [ OK ] TEST: Port range matching - IPv6 TCP [ OK ] TEST: Port range matching - IPv4 UDP Drop [ OK ] Cc: Qiang Zhang <dtzq01@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250218043210.732959-3-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19selftests/ovl: add third selftest for "override_creds"Christian Brauner
Add a simple test to verify that the new "override_creds" option works. Link: https://lore.kernel.org/r/20250219-work-overlayfs-v3-4-46af55e4ceda@kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-19selftests/ovl: add second selftest for "override_creds"Christian Brauner
Add a simple test to verify that the new "override_creds" option works. Link: https://lore.kernel.org/r/20250219-work-overlayfs-v3-3-46af55e4ceda@kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-19selftests/filesystems: add utils.{c,h}Christian Brauner
Add a new set of helpers that will be used in follow-up patches. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-19selftests/ovl: add first selftest for "override_creds"Christian Brauner
Add a simple test to verify that the new "override_creds" option works. Link: https://lore.kernel.org/r/20250219-work-overlayfs-v3-2-46af55e4ceda@kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-18selftests: bpf: test batch lookup on array of maps with holesYan Zhai
Iterating through array of maps may encounter non existing keys. The batch operation should not fail on when this happens. Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/9007237b9606dc2ee44465a4447fe46e13f3bea6.1739171594.git.yan@cloudflare.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-18tools: Remove redundant quiet setupCharlie Jenkins
Q is exported from Makefile.include so it is not necessary to manually set it. Reviewed-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Quentin Monnet <qmo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Benjamin Tissoires <bentiss@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Lukasz Luba <lukasz.luba@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Mykola Lysenko <mykolal@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20250213-quiet_tools-v3-2-07de4482a581@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-02-18sched_ext: idle: Introduce node-aware idle cpu kfunc helpersAndrea Righi
Introduce a new kfunc to retrieve the node associated to a CPU: int scx_bpf_cpu_node(s32 cpu) Add the following kfuncs to provide BPF schedulers direct access to per-node idle cpumasks information: const struct cpumask *scx_bpf_get_idle_cpumask_node(int node) const struct cpumask *scx_bpf_get_idle_smtmask_node(int node) s32 scx_bpf_pick_idle_cpu_node(const cpumask_t *cpus_allowed, int node, u64 flags) s32 scx_bpf_pick_any_cpu_node(const cpumask_t *cpus_allowed, int node, u64 flags) Moreover, trigger an scx error when any of the non-node aware idle CPU kfuncs are used when SCX_OPS_BUILTIN_IDLE_PER_NODE is enabled. Cc: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-18tools: Unify top-level quiet infrastructureCharlie Jenkins
Commit f2868b1a66d4f40f ("perf tools: Expose quiet/verbose variables in Makefile.perf") moved the quiet infrastructure out of tools/build/Makefile.build and into the top-level Makefile.perf file so that the quiet infrastructure could be used throughout perf and not just in Makefile.build. Extract out the quiet infrastructure into Makefile.include so that it can be leveraged outside of perf. Fixes: f2868b1a66d4f40f ("perf tools: Expose quiet/verbose variables in Makefile.perf") Reviewed-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Benjamin Tissoires <bentiss@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Lukasz Luba <lukasz.luba@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Mykola Lysenko <mykolal@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <qmo@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20250213-quiet_tools-v3-1-07de4482a581@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-02-18selftest/bpf: Add vsock test for sockmap rejecting unconnectedMichal Luczaj
Verify that for a connectible AF_VSOCK socket, merely having a transport assigned is insufficient; socket must be connected for the sockmap to accept. This does not test datagram vsocks. Even though it hardly matters. VMCI is the only transport that features VSOCK_TRANSPORT_F_DGRAM, but it has an unimplemented vsock_transport::readskb() callback, making it unsupported by BPF/sockmap. Signed-off-by: Michal Luczaj <mhal@rbox.co> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18selftest/bpf: Adapt vsock_delete_on_close to sockmap rejecting unconnectedMichal Luczaj
Commit 515745445e92 ("selftest/bpf: Add test for vsock removal from sockmap on close()") added test that checked if proto::close() callback was invoked on AF_VSOCK socket release. I.e. it verified that a close()d vsock does indeed get removed from the sockmap. It was done simply by creating a socket pair and attempting to replace a close()d one with its peer. Since, due to a recent change, sockmap does not allow updating index with a non-established connectible vsock, redo it with a freshly established one. Signed-off-by: Michal Luczaj <mhal@rbox.co> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18Merge tag 'v6.14-rc3' into x86/core, to pick up fixesIngo Molnar
Pick up upstream x86 fixes before applying new patches. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-17selftests/mm: fix check for running THP testsMark Brown
When testing if we should try to compact memory or drop caches before we run the THP or HugeTLB tests we use | as an or operator. This doesn't work since run_vmtests.sh is written in shell where this is used to pipe the output of the first argument into the second. Instead use the shell's -o operator. Link: https://lkml.kernel.org/r/20250212-kselftest-mm-no-hugepages-v1-1-44702f538522@kernel.org Fixes: b433ffa8dbac ("selftests: mm: perform some system cleanup before using hugepages") Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Nico Pache <npache@redhat.com> Cc: Mariano Pache <npache@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17getdelays: fix error format charactersWang Yaxin
getdelays had a compilation issue because the format string was not updated when the "delay min" was added. For example, after adding the "delay min" in printf, there were 7 strings but only 6 "%s" format specifiers. Similarly, after adding the 't->cpu_delay_total', there were 7 variables but only 6 format characters specifiers, causing compilation issues as follows. This commit fixes these issues to ensure that getdelays compiles correctly. root@xx:~/linux-next/tools/accounting$ make getdelays.c:199:9: warning: format `%llu' expects argument of type `long long unsigned int', but argument 8 has type `char *' [-Wformat=] 199 | printf("\n\nCPU %15s%15s%15s%15s%15s%15s\n" | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ..... 216 | "delay total", "delay average", "delay max", "delay min", | ~~~~~~~~~~~ | | | char * getdelays.c:200:21: note: format string is defined here 200 | " %15llu%15llu%15llu%15llu%15.3fms%13.6fms\n" | ~~~~~^ | | | long long unsigned int | %15s getdelays.c:199:9: warning: format `%f' expects argument of type `double', but argument 12 has type `long long unsigned int' [-Wformat=] 199 | printf("\n\nCPU %15s%15s%15s%15s%15s%15s\n" | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ..... 220 | (unsigned long long)t->cpu_delay_total, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | | | long long unsigned int ..... Link: https://lkml.kernel.org/r/20250208144400544RduNRhwIpT3m2JyRBqskZ@zte.com.cn Fixes: f65c64f311ee ("delayacct: add delay min to record delay peak") Reviewed-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn> Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Fan Yu <fan.yu9@zte.com.cn> Cc: Peilin He <he.peilin@zte.com.cn> Cc: Qiang Tu <tu.qiang35@zte.com.cn> Cc: wangyong <wang.yong12@zte.com.cn> Cc: ye xingchen <ye.xingchen@zte.com.cn> Cc: Yunkai Zhang <zhang.yunkai@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17tools/mm: fix build warnings with musl-libcFlorian Fainelli
musl-libc warns about the following: /home/florian/dev/buildroot/output/arm64/rpi4-b/host/aarch64-buildroot-linux-musl/sysroot/usr/include/sys/errno.h:1:2: attention: #warning redirecting incorrect #include <sys/errno.h> to <errno.h> [-Wcpp] 1 | #warning redirecting incorrect #include <sys/errno.h> to <errno.h> | ^~~~~~~ /home/florian/dev/buildroot/output/arm64/rpi4-b/host/aarch64-buildroot-linux-musl/sysroot/usr/include/sys/fcntl.h:1:2: attention: #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> [-Wcpp] 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> | ^~~~~~~ include errno.h and fcntl.h directly. Link: https://lkml.kernel.org/r/20250210200518.1137295-1-florian.fainelli@broadcom.com Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernelRavi Bangoria
Sync load latency related bit fields into the tool's header copy Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250205060547.1337-4-ravi.bangoria@amd.com
2025-02-16Merge tag 'objtool_urgent_for_v6.14_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Borislav Petkov: - Move a warning about a lld.ld breakage into the verbose setting as said breakage has been fixed in the meantime - Teach objtool to ignore dangling jump table entries added by Clang * tag 'objtool_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Move dodgy linker warn to verbose objtool: Ignore dangling jump table entries
2025-02-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Large set of fixes for vector handling, especially in the interactions between host and guest state. This fixes a number of bugs affecting actual deployments, and greatly simplifies the FP/SIMD/SVE handling. Thanks to Mark Rutland for dealing with this thankless task. - Fix an ugly race between vcpu and vgic creation/init, resulting in unexpected behaviours - Fix use of kernel VAs at EL2 when emulating timers with nVHE - Small set of pKVM improvements and cleanups x86: - Fix broken SNP support with KVM module built-in, ensuring the PSP module is initialized before KVM even when the module infrastructure cannot be used to order initcalls - Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being emulated by KVM to fix a NULL pointer dereference - Enter guest mode (L2) from KVM's perspective before initializing the vCPU's nested NPT MMU so that the MMU is properly tagged for L2, not L1 - Load the guest's DR6 outside of the innermost .vcpu_run() loop, as the guest's value may be stale if a VM-Exit is handled in the fastpath" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) x86/sev: Fix broken SNP support with KVM module built-in KVM: SVM: Ensure PSP module is initialized if KVM module is built-in crypto: ccp: Add external API interface for PSP module initialization KVM: arm64: vgic: Hoist SGI/PPI alloc from vgic_init() to kvm_create_vgic() KVM: arm64: timer: Drop warning on failed interrupt signalling KVM: arm64: Fix alignment of kvm_hyp_memcache allocations KVM: arm64: Convert timer offset VA when accessed in HYP code KVM: arm64: Simplify warning in kvm_arch_vcpu_load_fp() KVM: arm64: Eagerly switch ZCR_EL{1,2} KVM: arm64: Mark some header functions as inline KVM: arm64: Refactor exit handlers KVM: arm64: Refactor CPTR trap deactivation KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN KVM: arm64: Remove host FPSIMD saving for non-protected KVM KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state KVM: x86: Load DR6 with guest value only before entering .vcpu_run() loop KVM: nSVM: Enter guest mode before initializing nested NPT MMU KVM: selftests: Add CPUID tests for Hyper-V features that need in-kernel APIC KVM: selftests: Manage CPUID array in Hyper-V CPUID test's core helper ...
2025-02-16sched_ext: idle: Per-node idle cpumasksAndrea Righi
Using a single global idle mask can lead to inefficiencies and a lot of stress on the cache coherency protocol on large systems with multiple NUMA nodes, since all the CPUs can create a really intense read/write activity on the single global cpumask. Therefore, split the global cpumask into multiple per-NUMA node cpumasks to improve scalability and performance on large systems. The concept is that each cpumask will track only the idle CPUs within its corresponding NUMA node, treating CPUs in other NUMA nodes as busy. In this way concurrent access to the idle cpumask will be restricted within each NUMA node. The split of multiple per-node idle cpumasks can be controlled using the SCX_OPS_BUILTIN_IDLE_PER_NODE flag. By default SCX_OPS_BUILTIN_IDLE_PER_NODE is not enabled and a global host-wide idle cpumask is used, maintaining the previous behavior. NOTE: if a scheduler explicitly enables the per-node idle cpumasks (via SCX_OPS_BUILTIN_IDLE_PER_NODE), scx_bpf_get_idle_cpu/smtmask() will trigger an scx error, since there are no system-wide cpumasks. = Test = Hardware: - System: DGX B200 - CPUs: 224 SMT threads (112 physical cores) - Processor: INTEL(R) XEON(R) PLATINUM 8570 - 2 NUMA nodes Scheduler: - scx_simple [1] (so that we can focus at the built-in idle selection policy and not at the scheduling policy itself) Test: - Run a parallel kernel build `make -j $(nproc)` and measure the average elapsed time over 10 runs: avg time | stdev ---------+------ before: 52.431s | 2.895 after: 50.342s | 2.895 = Conclusion = Splitting the global cpumask into multiple per-NUMA cpumasks helped to achieve a speedup of approximately +4% with this particular architecture and test case. The same test on a DGX-1 (40 physical cores, Intel Xeon E5-2698 v4 @ 2.20GHz, 2 NUMA nodes) shows a speedup of around 1.5-3%. On smaller systems, I haven't noticed any measurable regressions or improvements with the same test (parallel kernel build) and scheduler (scx_simple). Moreover, with a modified scx_bpfland that uses the new NUMA-aware APIs I observed an additional +2-2.5% performance improvement with the same test. [1] https://github.com/sched-ext/scx/blob/main/scheds/c/scx_simple.bpf.c Cc: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-16sched_ext: idle: Introduce SCX_OPS_BUILTIN_IDLE_PER_NODEAndrea Righi
Add the new scheduler flag SCX_OPS_BUILTIN_IDLE_PER_NODE, which allows BPF schedulers to select between using a global flat idle cpumask or multiple per-node cpumasks. This only introduces the flag and the mechanism to enable/disable this feature without affecting any scheduling behavior. Cc: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-15Merge tag 'rust-fixes-6.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull rust fixes from Miguel Ojeda: - Fix objtool warning due to future Rust 1.85.0 (to be released in a few days) - Clean future Rust 1.86.0 (to be released 2025-04-03) Clippy warning * tag 'rust-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: rust: rbtree: fix overindented list item objtool/rust: add one more `noreturn` Rust function
2025-02-14rseq/selftests: Fix riscv rseq_offset_deref_addv inline asmStafford Horne
When working on OpenRISC support for restartable sequences I noticed and fixed these two issues with the riscv support bits. 1 The 'inc' argument to RSEQ_ASM_OP_R_DEREF_ADDV was being implicitly passed to the macro. Fix this by adding 'inc' to the list of macro arguments. 2 The inline asm input constraints for 'inc' and 'off' use "er", The riscv gcc port does not have an "e" constraint, this looks to be copied from the x86 port. Fix this by just using an "r" constraint. I have compile tested this only for riscv. However, the same fixes I use in the OpenRISC rseq selftests and everything passes with no issues. Fixes: 171586a6ab66 ("selftests/rseq: riscv: Template memory ordering and percpu access mode") Signed-off-by: Stafford Horne <shorne@gmail.com> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Shuah Khan <skhan@linuxfoundation.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250114170721.3613280-1-shorne@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>