Age | Commit message (Collapse) | Author |
|
When allocating and freeing a VM's cached binary stats info, check for a
NULL descriptor, not a '0' file descriptor, as '0' is a legal FD. E.g. in
the unlikely scenario the kernel installs the stats FD at entry '0',
selftests would reallocate on the next __vm_get_stat() and/or fail to free
the stats in kvm_vm_free().
Fixes: 83f6e109f562 ("KVM: selftests: Cache binary stats metadata for duration of test")
Link: https://lore.kernel.org/r/20250111005049.1247555-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Now that dirty_log_test doesn't require running multiple iterations to
verify dirty pages, and actually runs the requested number of iterations,
drop the requirement that the test run at least "3" (which was really "2"
at the time the test was written) iterations.
Link: https://lore.kernel.org/r/20250111003004.1235645-21-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Actually run all requested iterations, instead of iterations-1 (the count
starts at '1' due to the need to avoid '0' as an in-memory value for a
dirty page).
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-20-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Set the per-iteration variables at the start of each iteration instead of
setting them before the loop, and at the end of each iteration. To ensure
the vCPU doesn't race ahead before the first iteration, simply have the
vCPU worker want for sem_vcpu_cont, which conveniently avoids the need to
special case posting sem_vcpu_cont from the loop.
Link: https://lore.kernel.org/r/20250111003004.1235645-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Now that each iteration collects all dirty entries and ensures the guest
*completes* at least one write, tighten the exemptions for the last dirty
page of the previous iteration. Specifically, the only legal value (other
than the current iteration) is N-1.
Unlike the last page for the current iteration, the in-progress write from
the previous iteration is guaranteed to have completed, otherwise the test
would have hung.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Ensure the vCPU fully completes at least one write in each dirty_log_test
iteration, as failure to dirty any pages complicates verification and
forces the test to be overly conservative about possible values. E.g.
verification needs to allow the last dirty page from a previous iteration
to have *any* value, because the vCPU could get stuck for multiple
iterations, which is unlikely but can happen in heavily overloaded and/or
nested virtualization setups.
Somewhat arbitrarily set the minimum to 0x100/256; high enough to be
interesting, but not so high as to lead to pointlessly long runtimes.
Opportunistically report the number of writes per iteration for debug
purposes, and so that a human can sanity check the test. Due to each
write targeting a random page, the number of dirty pages will likely be
lower than the number of total writes, but it shouldn't be absurdly lower
(which would suggest the pRNG is broken)
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-17-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add a sanity check that a completely garbage value wasn't written to
the last dirty page in the ring, e.g. that it doesn't contain the *next*
iteration's value.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Collect all dirty entries during each iteration of dirty_log_test by
doing a final collection after the vCPU has been stopped. To deal with
KVM's destructive approach to getting the dirty bitmaps, use a second
bitmap for the post-stop collection.
Collecting all entries that were dirtied during an iteration simplifies
the verification logic *and* improves test coverage.
- If a page is written during iteration X, but not seen as dirty until
X+1, the test can get a false pass if the page is also written during
X+1.
- If a dirty page used a stale value from a previous iteration, the test
would grant a false pass.
- If a missed dirty log occurs in the last iteration, the test would fail
to detect the issue.
E.g. modifying mark_page_dirty_in_slot() to dirty an unwritten gfn:
if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
unsigned long rel_gfn = gfn - memslot->base_gfn;
u32 slot = (memslot->as_id << 16) | memslot->id;
if (!vcpu->extra_dirty &&
gfn_to_memslot(kvm, gfn + 1) == memslot) {
vcpu->extra_dirty = true;
mark_page_dirty_in_slot(kvm, memslot, gfn + 1);
}
if (kvm->dirty_ring_size && vcpu)
kvm_dirty_ring_push(vcpu, slot, rel_gfn);
else if (memslot->dirty_bitmap)
set_bit_le(rel_gfn, memslot->dirty_bitmap);
}
isn't detected with the current approach, even with an interval of 1ms
(when running nested in a VM; bare metal would be even *less* likely to
detect the bug due to the vCPU being able to dirty more memory). Whereas
collecting all dirty entries consistently detects failures with an
interval of 700ms or more (the longer interval means a higher probability
of an actual write to the prematurely-dirtied page).
Link: https://lore.kernel.org/r/20250111003004.1235645-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Print out the last dirty pages from the current and previous iteration on
verification failures. In many cases, bugs (especially test bugs) occur
on the edges, i.e. on or near the last pages, and being able to correlate
failures with the last pages can aid in debug.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When verifying pages in dirty_log_test, immediately continue on all "pass"
scenarios to make the logic consistent in how it handles pass vs. fail.
No functional change intended.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When running dirty_log_test using the dirty ring, post to sem_vcpu_stop
only when the main thread has explicitly requested that the vCPU stop.
Synchronizing the vCPU and main thread whenever the dirty ring happens to
be full is unnecessary, as KVM's ABI is to actively prevent the vCPU from
running until the ring is no longer full. I.e. attempting to run the vCPU
will simply result in KVM_EXIT_DIRTY_RING_FULL without ever entering the
guest. And if KVM doesn't exit, e.g. let's the vCPU dirty more pages,
then that's a KVM bug worth finding.
Posting to sem_vcpu_stop on ring full also makes it difficult to get the
test logic right, e.g. it's easy to let the vCPU keep running when it
shouldn't, as a ring full can essentially happen at any given time.
Opportunistically rework the handling of dirty_ring_vcpu_ring_full to
leave it set for the remainder of the iteration in order to simplify the
surrounding logic.
Link: https://lore.kernel.org/r/20250111003004.1235645-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
In the dirty_log_test guest code, exit to userspace only when the vCPU is
explicitly told to stop. Periodically exiting just to check if a flag has
been set is unnecessary, weirdly complex, and wastes time handling exits
that could be used to dirty memory.
Opportunistically convert 'i' to a uint64_t to guard against the unlikely
scenario that guest_num_pages exceeds the storage of an int.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Now that the vCPU doesn't dirty every page on the first iteration for
architectures that support the dirty ring, honor vcpu_stop in the dirty
ring's vCPU worker, i.e. stop when the main thread says "stop". This will
allow plumbing vcpu_stop into the guest so that the vCPU doesn't need to
periodically exit to userspace just to see if it should stop.
Add a comment explaining that marking all pages as dirty is problematic
for the dirty ring, as it results in the guest getting stuck on "ring
full". This could be addressed by adding a GUEST_SYNC() in that initial
loop, but it's not clear how that would interact with s390's behavior.
Link: https://lore.kernel.org/r/20250111003004.1235645-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
s390 specific workaround causes the dirty-log mode of the test to dirty
all guest memory on the first iteration, which is very slow when the test
is run in a nested VM.
Limit this workaround to s390x.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Continue collecting entries from the dirty ring for the entire time the
vCPU is running. Collecting exactly once all but guarantees the vCPU will
encounter a "ring full" event and stop. While testing ring full is
interesting, stopping and doing nothing is not, especially for larger
intervals as the test effectively does nothing for a much longer time.
To balance continuous collection with letting the guest make forward
progress, chunk the interval waiting into 1ms loops (which also makes
the math dead simple).
To maintain coverage for "ring full", collect entries on subsequent
iterations if and only if the ring has been filled at least once. I.e.
let the ring fill up (if the interval allows), but after that contiuously
empty it so that the vCPU can keep running.
Opportunistically drop unnecessary zero-initialization of "count".
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Cache the page's value during verification in a local variable, re-reading
from the pointer is ugly and error prone, e.g. allows for bugs like
checking the pointer itself instead of the value.
No functional change intended.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Track and print the number of dirty and clear pages for each iteration.
This provides parity between all log modes, and will allow collecting the
dirty ring multiple times per iteration without spamming the console.
Opportunistically drop the "Dirtied N pages" print, which is redundant
and wrong. For the dirty ring testcase, the vCPU isn't guaranteed to
complete a loop. And when the vCPU does complete a loot, there are no
guarantees that it has *dirtied* that many pages; because the writes are
to random address, the vCPU may have written the same page over and over,
i.e. only dirtied one page.
While the number of writes performed by the vCPU is also interesting,
e.g. the pr_info() could be tweaked to use different verbiage, pages_count
doesn't correctly track the number of writes either (because loops aren't
guaranteed to a complete). Delete the print for now, as a future patch
will precisely track the number of writes, at which point the verification
phase can report the number of writes performed by each iteration.
Link: https://lore.kernel.org/r/20250111003004.1235645-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Drop an srandom() initialization that was leftover from the conversion to
use selftests' guest_random_xxx() APIs.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Drop the signal/kick from dirty_log_test's dirty ring handling, as kicking
the vCPU adds marginal value, at the cost of adding significant complexity
to the test.
Asynchronously interrupting the vCPU isn't novel; unless the kernel is
fully tickless, the vCPU will be interrupted by IRQs for any decently
large interval.
And exiting to userspace mode in the middle of a sequence isn't novel
either, as the vCPU will do so every time the ring becomes full.
Link: https://lore.kernel.org/r/20250111003004.1235645-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Sync the new iteration to the guest prior to restarting the vCPU, otherwise
it's possible for the vCPU to dirty memory for the next iteration using the
current iteration's value.
Note, because the guest can be interrupted between the vCPU's load of the
iteration and its write to memory, it's still possible for the guest to
store the previous iteration to memory as the previous iteration may be
cached in a CPU register (which the test accounts for).
Note #2, the test's current approach of collecting dirty entries *before*
stopping the vCPU also results dirty memory having the previous iteration.
E.g. if page is dirtied in the previous iteration, but not the current
iteration, the verification phase will observe the previous iteration's
value in memory. That wart will be remedied in the near future, at which
point synchronizing the iteration before restarting the vCPU will guarantee
the only way for verification to observe stale iterations is due to the
CPU register caching case, or due to a dirty entry being collected before
the store retires.
Link: https://lore.kernel.org/r/20250111003004.1235645-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
If dirty_log_test is run nested, it is possible for entries in the emulated
PML log to appear before the actual memory write is committed to the RAM,
due to the way KVM retries memory writes as a response to a MMU fault.
In addition to that in some very rare cases retry can happen more than
once, which will lead to the test failure because once the write is
finally committed it may have a very outdated iteration value.
Detect and avoid this case.
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Use KVM_ASM_SAFE_FEP, not simply KVM_ASM_SAFE, for kvm_asm_safe_fep(), as
the non-FEP version doesn't force emulation (stating the obvious). Note,
there are currently no users of kvm_asm_safe_fep().
Fixes: ab3b6a7de8df ("KVM: selftests: Add a forced emulation variation of KVM_ASM_SAFE()")
Link: https://lore.kernel.org/r/20250130163135.270770-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Correctly clean the BSS to the PoC before allowing EL2 to access it
on nVHE/hVHE/protected configurations
- Propagate ownership of debug registers in protected mode after the
rework that landed in 6.14-rc1
- Stop pretending that we can run the protected mode without a GICv3
being present on the host
- Fix a use-after-free situation that can occur if a vcpu fails to
initialise the NV shadow S2 MMU contexts
- Always evaluate the need to arm a background timer for fully
emulated guest timers
- Fix the emulation of EL1 timers in the absence of FEAT_ECV
- Correctly handle the EL2 virtual timer, specially when HCR_EL2.E2H==0
s390:
- move some of the guest page table (gmap) logic into KVM itself,
inching towards the final goal of completely removing gmap from the
non-kvm memory management code.
As an initial set of cleanups, move some code from mm/gmap into kvm
and start using __kvm_faultin_pfn() to fault-in pages as needed;
but especially stop abusing page->index and page->lru to aid in the
pgdesc conversion.
x86:
- Add missing check in the fix to defer starting the huge page
recovery vhost_task
- SRSO_USER_KERNEL_NO does not need SYNTHESIZED_F"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (31 commits)
KVM: x86/mmu: Ensure NX huge page recovery thread is alive before waking
KVM: remove kvm_arch_post_init_vm
KVM: selftests: Fix spelling mistake "initally" -> "initially"
kvm: x86: SRSO_USER_KERNEL_NO is not synthesized
KVM: arm64: timer: Don't adjust the EL2 virtual timer offset
KVM: arm64: timer: Correctly handle EL1 timer emulation when !FEAT_ECV
KVM: arm64: timer: Always evaluate the need for a soft timer
KVM: arm64: Fix nested S2 MMU structures reallocation
KVM: arm64: Fail protected mode init if no vgic hardware is present
KVM: arm64: Flush/sync debug state in protected mode
KVM: s390: selftests: Streamline uc_skey test to issue iske after sske
KVM: s390: remove the last user of page->index
KVM: s390: move PGSTE softbits
KVM: s390: remove useless page->index usage
KVM: s390: move gmap_shadow_pgt_lookup() into kvm
KVM: s390: stop using lists to keep track of used dat tables
KVM: s390: stop using page->index for non-shadow gmaps
KVM: s390: move some gmap shadowing functions away from mm/gmap.c
KVM: s390: get rid of gmap_translate()
KVM: s390: get rid of gmap_fault()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fix from Kees Cook:
"This is really a work-around for x86_64 having grown a syscall to
implement uretprobe, which has caused problems since v6.11.
This may change in the future, but for now, this fixes the unintended
seccomp filtering when uretprobe switched away from traps, and does so
with something that should be easy to backport.
- Allow uretprobe on x86_64 to avoid behavioral complications (Eyal
Birger)"
* tag 'seccomp-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests/seccomp: validate uretprobe syscall passes through seccomp
seccomp: passthrough uretprobe systemcall without filtering
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix fsnotify FMODE_NONOTIFY* handling.
This also disables fsnotify on all pseudo files by default apart from
very select exceptions. This carries a regression risk so we need to
watch out and adapt accordingly. However, it is overall a significant
improvement over the current status quo where every rando file can
get fsnotify enabled.
- Cleanup and simplify lockref_init() after recent lockref changes.
- Fix vboxfs build with gcc-15.
- Add an assert into inode_set_cached_link() to catch corrupt links.
- Allow users to also use an empty string check to detect whether a
given mount option string was empty or not.
- Fix how security options were appended to statmount()'s ->mnt_opt
field.
- Fix statmount() selftests to always check the returned mask.
- Fix uninitialized value in vfs_statx_path().
- Fix pidfs_ioctl() sanity checks to guard against ioctl() overloading
and preserve extensibility.
* tag 'vfs-6.14-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
vfs: sanity check the length passed to inode_set_cached_link()
pidfs: improve ioctl handling
fsnotify: disable pre-content and permission events by default
selftests: always check mask returned by statmount(2)
fsnotify: disable notification by default for all pseudo files
fs: fix adding security options to statmount.mnt_opt
fsnotify: use accessor to set FMODE_NONOTIFY_*
lockref: remove count argument of lockref_init
gfs2: switch to lockref_init(..., 1)
gfs2: use lockref_init for gl_lockref
statmount: let unset strings be empty
vboxsf: fix building with GCC 15
fs/stat.c: avoid harmless garbage value problem in vfs_statx_path()
|
|
STATMOUNT_MNT_OPTS can actually be missing if there are no options. This
is a change of behavior since 75ead69a7173 ("fs: don't let statmount return
empty strings").
The other checks shouldn't actually trigger, but add them for correctness
and for easier debugging if the test fails.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://lore.kernel.org/r/20250129160641.35485-1-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The uretprobe syscall is implemented as a performance enhancement on
x86_64 by having the kernel inject a call to it on function exit; User
programs cannot call this system call explicitly.
As such, this syscall is considered a kernel implementation detail and
should not be filtered by seccomp.
Enhance the seccomp bpf test suite to check that uretprobes can be
attached to processes without the killing the process regardless of
seccomp policy.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Link: https://lore.kernel.org/r/20250202162921.335813-3-eyal.birger@gmail.com
[kees: Skip archs without __NR_uretprobe]
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Interestingly the recent kmemleak improvements allowed our CI to catch
a couple of percpu leaks addressed here.
We (mostly Jakub, to be accurate) are working to increase review
coverage over the net code-base tweaking the MAINTAINER entries.
Current release - regressions:
- core: harmonize tstats and dstats
- ipv6: fix dst refleaks in rpl, seg6 and ioam6 lwtunnels
- eth: tun: revert fix group permission check
- eth: stmmac: revert "specify hardware capability value when FIFO
size isn't specified"
Previous releases - regressions:
- udp: gso: do not drop small packets when PMTU reduces
- rxrpc: fix race in call state changing vs recvmsg()
- eth: ice: fix Rx data path for heavy 9k MTU traffic
- eth: vmxnet3: fix tx queue race condition with XDP
Previous releases - always broken:
- sched: pfifo_tail_enqueue: drop new packet when sch->limit == 0
- ethtool: ntuple: fix rss + ring_cookie check
- rxrpc: fix the rxrpc_connection attend queue handling
Misc:
- recognize Kuniyuki Iwashima as a maintainer"
* tag 'net-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
Revert "net: stmmac: Specify hardware capability value when FIFO size isn't specified"
MAINTAINERS: add a sample ethtool section entry
MAINTAINERS: add entry for ethtool
rxrpc: Fix race in call state changing vs recvmsg()
rxrpc: Fix call state set to not include the SERVER_SECURING state
net: sched: Fix truncation of offloaded action statistics
tun: revert fix group permission check
selftests/tc-testing: Add a test case for qdisc_tree_reduce_backlog()
netem: Update sch->q.qlen before qdisc_tree_reduce_backlog()
selftests/tc-testing: Add a test case for pfifo_head_drop qdisc when limit==0
pfifo_tail_enqueue: Drop new packet when sch->limit == 0
selftests: mptcp: connect: -f: no reconnect
net: rose: lock the socket in rose_bind()
net: atlantic: fix warning during hot unplug
rxrpc: Fix the rxrpc_connection attend queue handling
net: harmonize tstats and dstats
selftests: drv-net: rss_ctx: don't fail reconfigure test if queue offset not supported
selftests: drv-net: rss_ctx: add missing cleanup in queue reconfigure
ethtool: ntuple: fix rss + ring_cookie check
ethtool: rss: fix hiding unsupported fields in dumps
...
|
|
Integrate the test case provided by Mingi Cho into TDC.
All test results:
1..4
ok 1 ca5e - Check class delete notification for ffff:
ok 2 e4b7 - Check class delete notification for root ffff:
ok 3 33a9 - Check ingress is not searchable on backlog update
ok 4 a4b9 - Test class qlen notification
Cc: Mingi Cho <mincho@theori.io>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Link: https://patch.msgid.link/20250204005841.223511-5-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When limit == 0, pfifo_tail_enqueue() must drop new packet and
increase dropped packets count of the qdisc.
All test results:
1..16
ok 1 a519 - Add bfifo qdisc with system default parameters on egress
ok 2 585c - Add pfifo qdisc with system default parameters on egress
ok 3 a86e - Add bfifo qdisc with system default parameters on egress with handle of maximum value
ok 4 9ac8 - Add bfifo qdisc on egress with queue size of 3000 bytes
ok 5 f4e6 - Add pfifo qdisc on egress with queue size of 3000 packets
ok 6 b1b1 - Add bfifo qdisc with system default parameters on egress with invalid handle exceeding maximum value
ok 7 8d5e - Add bfifo qdisc on egress with unsupported argument
ok 8 7787 - Add pfifo qdisc on egress with unsupported argument
ok 9 c4b6 - Replace bfifo qdisc on egress with new queue size
ok 10 3df6 - Replace pfifo qdisc on egress with new queue size
ok 11 7a67 - Add bfifo qdisc on egress with queue size in invalid format
ok 12 1298 - Add duplicate bfifo qdisc on egress
ok 13 45a0 - Delete nonexistent bfifo qdisc
ok 14 972b - Add prio qdisc on egress with invalid format for handles
ok 15 4d39 - Delete bfifo qdisc twice
ok 16 d774 - Check pfifo_head_drop qdisc enqueue behaviour when limit == 0
Signed-off-by: Quang Le <quanglex97@gmail.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Link: https://patch.msgid.link/20250204005841.223511-3-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The '-f' parameter is there to force the kernel to emit MPTCP FASTCLOSE
by closing the connection with unread bytes in the receive queue.
The xdisconnect() helper was used to stop the connection, but it does
more than that: it will shut it down, then wait before reconnecting to
the same address. This causes the mptcp_join's "fastclose test" to fail
all the time.
This failure is due to a recent change, with commit 218cc166321f
("selftests: mptcp: avoid spurious errors on disconnect"), but that went
unnoticed because the test is currently ignored. The recent modification
only shown an existing issue: xdisconnect() doesn't need to be used
here, only the shutdown() part is needed.
Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250204-net-mptcp-sft-conn-f-v1-1-6b470c72fffa@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching fix from Petr Mladek:
- Fix livepatching selftests for util-linux-2.40.x
* tag 'livepatching-for-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
selftests: livepatch: handle PRINTK_CALLER in check_result()
|
|
There is a spelling mistake in a literal string and in the function
test_get_inital_dirty. Fix them.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Message-ID: <20250204105647.367743-1-colin.i.king@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
- some selftest fixes
- move some kvm-related functions from mm into kvm
- remove all usage of page->index and page->lru from kvm
- fixes and cleanups for vsie
|
|
supported
Vast majority of drivers does not support queue offset.
Simply return if the rss context + queue ntuple fails.
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250201013040.725123-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit under Fixes adds ntuple rules but never deletes them.
Fixes: 29a4bc1fe961 ("selftest: extend test_rss_context_queue_reconfigure for action addition")
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250201013040.725123-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 4094871db1d6 ("udp: only do GSO if # of segs > 1") avoided GSO
for small packets. But the kernel currently dismisses GSO requests only
after checking MTU/PMTU on gso_size. This means any packets, regardless
of their payload sizes, could be dropped when PMTU becomes smaller than
requested gso_size. We encountered this issue in production and it
caused a reliability problem that new QUIC connection cannot be
established before PMTU cache expired, while non GSO sockets still
worked fine at the same time.
Ideally, do not check any GSO related constraints when payload size is
smaller than requested gso_size, and return EMSGSIZE instead of EINVAL
on MTU/PMTU check failure to be more specific on the error cause.
Fixes: 4094871db1d6 ("udp: only do GSO if # of segs > 1")
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
- Fix regression that affinitized forked child in one-shot mode.
- Harden one-shot mode against hotplug online/offline
- Enable RAPL SysWatt column by default
- Add initial PTL, CWF platform support
- Harden initial PMT code in response to early use
- Enable first built-in PMT counter: CWF c1e residency
- Refuse to run on unsupported platforms without --force, to encourage
updating to a version that supports the system, and to avoid
no-so-useful measurement results
* tag 'turbostat-2025.02.02' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (25 commits)
tools/power turbostat: version 2025.02.02
tools/power turbostat: Add CPU%c1e BIC for CWF
tools/power turbostat: Harden one-shot mode against cpu offline
tools/power turbostat: Fix forked child affinity regression
tools/power turbostat: Add tcore clock PMT type
tools/power turbostat: version 2025.01.14
tools/power turbostat: Allow adding PMT counters directly by sysfs path
tools/power turbostat: Allow mapping multiple PMT files with the same GUID
tools/power turbostat: Add PMT directory iterator helper
tools/power turbostat: Extend PMT identification with a sequence number
tools/power turbostat: Return default value for unmapped PMT domains
tools/power turbostat: Check for non-zero value when MSR probing
tools/power turbostat: Enhance turbostat self-performance visibility
tools/power turbostat: Add fixed RAPL PSYS divisor for SPR
tools/power turbostat: Fix PMT mmaped file size rounding
tools/power turbostat: Remove SysWatt from DISABLED_BY_DEFAULT
tools/power turbostat: Add an NMI column
tools/power turbostat: add Busy% to "show idle"
tools/power turbostat: Introduce --force parameter
tools/power turbostat: Improve --help output
...
|
|
Summary of Changes since 2024.11.30:
Fix regression in 2023.11.07 that affinitized forked child
in one-shot mode.
Harden one-shot mode against hotplug online/offline
Enable RAPL SysWatt column by default.
Add initial PTL, CWF platform support.
Harden initial PMT code in response to early use.
Enable first built-in PMT counter: CWF c1e residency
Refuse to run on unsupported platforms without --force,
to encourage updating to a version that supports the system,
and to avoid no-so-useful measurement results.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull AT_EXECVE_CHECK selftest fix from Kees Cook:
"Fixes the AT_EXECVE_CHECK selftests which didn't run on old versions
of glibc"
* tag 'AT_EXECVE_CHECK-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests: Handle old glibc without execveat(2)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- The PH1520 pinctrl and dwmac drivers are enabeled in defconfig
- A redundant AQRL barrier has been removed from the futex cmpxchg
implementation
- Support for the T-Head vector extensions, which includes exposing
these extensions to userspace on systems that implement them
- Some more page table information is now printed on die() and systems
that cause PA overflows
* tag 'riscv-for-linus-6.14-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: add a warning when physical memory address overflows
riscv/mm/fault: add show_pte() before die()
riscv: Add ghostwrite vulnerability
selftests: riscv: Support xtheadvector in vector tests
selftests: riscv: Fix vector tests
riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
riscv: hwprobe: Add thead vendor extension probing
riscv: vector: Support xtheadvector save/restore
riscv: Add xtheadvector instruction definitions
riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT
RISC-V: define the elements of the VCSR vector CSR
riscv: vector: Use vlenb from DT for thead
riscv: Add thead and xtheadvector as a vendor extension
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
dt-bindings: cpus: add a thead vlen register length property
dt-bindings: riscv: Add xtheadvector ISA extension description
RISC-V: Mark riscv_v_init() as __init
riscv: defconfig: drop RT_GROUP_SCHED=y
riscv/futex: Optimize atomic cmpxchg
riscv: defconfig: enable pinctrl and dwmac support for TH1520
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
- The biggest changes are the TLB flushing scalability optimizations,
to update the mm_cpumask lazily and related changes.
This feature has both a track record and a continued risk of
performance regressions, so it was already delayed by a cycle - but
it's all 100% perfect now™ (Rik van Riel)
- Also miscellaneous fixes and cleanups. (Gautam Somani, Kirill
Shutemov, Sebastian Andrzej Siewior)
* tag 'x86-mm-2025-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Remove unnecessary include of <linux/extable.h>
x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state()
x86/mm/selftests: Fix typo in lam.c
x86/mm/tlb: Only trim the mm_cpumask once a second
x86/mm/tlb: Also remove local CPU from mm_cpumask if stale
x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU
x86/mm/tlb: Update mm_cpumask lazily
|
|
In some rare situations a non default storage key is already set on the
memory used by the test. Within normal VMs the key is reset / zapped
when the memory is added to the VM. This is not the case for ucontrol
VMs. With the initial iske check removed this test case can work in all
situations. The function of the iske instruction is still validated by
the remaining code.
Fixes: 0185fbc6a2d3 ("KVM: s390: selftests: Add uc_skey VM test case")
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Link: https://lore.kernel.org/r/20250128131803.1047388-1-schlameuss@linux.ibm.com
Message-ID: <20250128131803.1047388-1-schlameuss@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
With the latest patch, attempting to create a memslot from userspace
will result in an EEXIST error for UCONTROL VMs, instead of EINVAL,
since the new memslot will collide with the internal memslot. There is
no simple way to bring back the previous behaviour.
This is not a problem, but the test needs to be fixed accordingly.
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Link: https://lore.kernel.org/r/20250123144627.312456-5-imbrenda@linux.ibm.com
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20250123144627.312456-5-imbrenda@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Namhyung Kim:
"An early round of random fixes in perf tools for this cycle.
perf trace:
- Fix loading of BPF program on certain clang versions
- Fix out-of-bound access in syscalls with 6 arguments
- Skip syscall enum test if landlock syscall is not available
perf annotate:
- Fix segfaults due to invalid access in disasm arrays
perf stat:
- Fix error handling in topology parsing"
* tag 'perf-tools-fixes-for-v6.14-2025-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf cpumap: Fix die and cluster IDs
perf test: Skip syscall enum test if no landlock syscall
perf trace: Fix runtime error of index out of bounds
perf annotate: Use an array for the disassembler preference
perf trace: Fix BPF loading failure (-E2BIG)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from IPSec, netfilter and Bluetooth.
Nothing really stands out, but as usual there's a slight concentration
of fixes for issues added in the last two weeks before the merge
window, and driver bugs from 6.13 which tend to get discovered upon
wider distribution.
Current release - regressions:
- net: revert RTNL changes in unregister_netdevice_many_notify()
- Bluetooth: fix possible infinite recursion of btusb_reset
- eth: adjust locking in some old drivers which protect their state
with spinlocks to avoid sleeping in atomic; core protects netdev
state with a mutex now
Previous releases - regressions:
- eth:
- mlx5e: make sure we pass node ID, not CPU ID to kvzalloc_node()
- bgmac: reduce max frame size to support just 1500 bytes; the
jumbo frame support would previously cause OOB writes, but now
fails outright
- mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted, avoid
false detection of MPTCP blackholing
Previous releases - always broken:
- mptcp: handle fastopen disconnect correctly
- xfrm:
- make sure skb->sk is a full sock before accessing its fields
- fix taking a lock with preempt disabled for RT kernels
- usb: ipheth: improve safety of packet metadata parsing; prevent
potential OOB accesses
- eth: renesas: fix missing rtnl lock in suspend/resume path"
* tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
MAINTAINERS: add Neal to TCP maintainers
net: revert RTNL changes in unregister_netdevice_many_notify()
net: hsr: fix fill_frame_info() regression vs VLAN packets
doc: mptcp: sysctl: blackhole_timeout is per-netns
mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted
netfilter: nf_tables: reject mismatching sum of field_len with set key length
net: sh_eth: Fix missing rtnl lock in suspend/resume path
net: ravb: Fix missing rtnl lock in suspend/resume path
selftests/net: Add test for loading devbound XDP program in generic mode
net: xdp: Disallow attaching device-bound programs in generic mode
tcp: correct handling of extreme memory squeeze
bgmac: reduce max frame size to support just MTU 1500
vsock/test: Add test for connect() retries
vsock/test: Add test for UAF due to socket unbinding
vsock/test: Introduce vsock_connect_fd()
vsock/test: Introduce vsock_bind()
vsock: Allow retrying on connect() failure
vsock: Keep the binding until socket destruction
Bluetooth: L2CAP: accept zero as a special value for MTU auto-selection
Bluetooth: btnxpuart: Fix glitches seen in dual A2DP streaming
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- update gpio-sim selftests to not fail now that we no longer allow
rmdir() on configfs entries of active devices
- remove leftover code from gpio-mxc
* tag 'gpio-fixes-for-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
selftests: gpio: gpio-sim: Fix missing chip disablements
gpio: mxc: remove dead code after switch to DT-only
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs d_revalidate updates from Al Viro:
"Provide stable parent and name to ->d_revalidate() instances
Most of the filesystem methods where we care about dentry name and
parent have their stability guaranteed by the callers;
->d_revalidate() is the major exception.
It's easy enough for callers to supply stable values for expected name
and expected parent of the dentry being validated. That kills quite a
bit of boilerplate in ->d_revalidate() instances, along with a bunch
of races where they used to access ->d_name without sufficient
precautions"
* tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
9p: fix ->rename_sem exclusion
orangefs_d_revalidate(): use stable parent inode and name passed by caller
ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
nfs: fix ->d_revalidate() UAF on ->d_name accesses
nfs{,4}_lookup_validate(): use stable parent inode passed by caller
gfs2_drevalidate(): use stable parent inode and name passed by caller
fuse_dentry_revalidate(): use stable parent inode and name passed by caller
vfat_revalidate{,_ci}(): use stable parent inode passed by caller
exfat_d_revalidate(): use stable parent inode passed by caller
fscrypt_d_revalidate(): use stable parent inode passed by caller
ceph_d_revalidate(): propagate stable name down into request encoding
ceph_d_revalidate(): use stable parent inode passed by caller
afs_d_revalidate(): use stable name and parent inode passed by caller
Pass parent directory inode and expected name to ->d_revalidate()
generic_ci_d_compare(): use shortname_storage
ext4 fast_commit: make use of name_snapshot primitives
dissolve external_name.u into separate members
make take_dentry_name_snapshot() lockless
dcache: back inline names with a struct-wrapped array of unsigned long
make sure that DNAME_INLINE_LEN is a multiple of word size
|
|
Add a test to bpf_offload.py for loading a devbound XDP program in
generic mode, checking that it fails correctly.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250127131344.238147-2-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Deliberately fail a connect() attempt; expect error. Then verify that
subsequent attempt (using the same socket) can still succeed, rather than
fail outright.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-6-1cf57065b770@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|