|
Pull more dma-mapping updates from Christoph Hellwig:
- fix a regression in dma remap handling vs AMD memory encryption (me)
- finally kill off the legacy PCI DMA API (Christophe JAILLET)
* tag 'dma-mapping-5.18-1' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: move pgprot_decrypted out of dma_pgprot
PCI/doc: cleanup references to the legacy PCI DMA API
PCI: Remove the deprecated "pci-dma-compat.h" API
|
|
Pull ARM fixes from Russell King:
- avoid unnecessary rebuilds for library objects
- fix return value of __setup handlers
- fix invalid input check for "crashkernel=" kernel option
- silence KASAN warnings in unwind_frame
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9191/1: arm/stacktrace, kasan: Silence KASAN warnings in unwind_frame()
ARM: 9190/1: kdump: add invalid input check for 'crashkernel=0'
ARM: 9187/1: JIVE: fix return value of __setup handler
ARM: 9189/1: decompressor: fix unneeded rebuilds of library objects
|
|
This reverts commit 7dabfa2bc4803eed83d6f22bd6f045495f40636b. There are
multiple reports that this breaks boot on various systems. The common
theme is that orphan clks are having rates set on them when that isn't
expected. Let's revert it out for now so that -rc1 boots.
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reported-by: Tony Lindgren <tony@atomide.com>
Reported-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://lore.kernel.org/r/366a0232-bb4a-c357-6aa8-636e398e05eb@samsung.com
Cc: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lore.kernel.org/r/20220403022818.39572-1-sboyd@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull more perf tools updates from Arnaldo Carvalho de Melo:
- Avoid SEGV if core.cpus isn't set in 'perf stat'.
- Stop depending on .git files for building PERF-VERSION-FILE, used in
'perf --version', fixing some perf tools build scenarios.
- Convert tracepoint.py example to python3.
- Update UAPI header copies from the kernel sources: socket,
mman-common, msr-index, KVM, i915 and cpufeatures.
- Update copy of libbpf's hashmap.c.
- Directly return instead of using local ret variable in
evlist__create_syswide_maps(), found by coccinelle.
* tag 'perf-tools-for-v5.18-2022-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf python: Convert tracepoint.py example to python3
perf evlist: Directly return instead of using local ret variable
perf cpumap: More cpu map reuse by merge.
perf cpumap: Add is_subset function
perf evlist: Rename cpus to user_requested_cpus
perf tools: Stop depending on .git files for building PERF-VERSION-FILE
tools headers cpufeatures: Sync with the kernel sources
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
tools headers UAPI: Sync linux/kvm.h with the kernel sources
tools kvm headers arm64: Update KVM headers from the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools headers UAPI: Sync asm-generic/mman-common.h with the kernel
perf beauty: Update copy of linux/socket.h with the kernel sources
perf tools: Update copy of libbpf's hashmap.c
perf stat: Avoid SEGV if core.cpus isn't set
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix empty $(PYTHON) expansion.
- Fix UML, which got broken by the attempt to suppress Clang warnings.
- Fix warning message in modpost.
* tag 'kbuild-fixes-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
modpost: restore the warning message for missing symbol versions
Revert "um: clang: Strip out -mno-global-merge from USER_CFLAGS"
kbuild: Remove '-mno-global-merge'
kbuild: fix empty ${PYTHON} in scripts/link-vmlinux.sh
kconfig: remove stale comment about removed kconfig_print_symbol()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- build fix for gpio
- fix crc32 build problems
- check for failed memory allocations
* tag 'mips_5.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: crypto: Fix CRC32 code
MIPS: rb532: move GPIOD definition into C-files
MIPS: lantiq: check the return value of kzalloc()
mips: sgi-ip22: add a check for the return of kzalloc()
|
|
Pull kvm fixes from Paolo Bonzini:
- Only do MSR filtering for MSRs accessed by rdmsr/wrmsr
- Documentation improvements
- Prevent module exit until all VMs are freed
- PMU Virtualization fixes
- Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences
- Other miscellaneous bugfixes
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
KVM: x86: fix sending PV IPI
KVM: x86/mmu: do compare-and-exchange of gPTE via the user address
KVM: x86: Remove redundant vm_entry_controls_clearbit() call
KVM: x86: cleanup enter_rmode()
KVM: x86: SVM: fix tsc scaling when the host doesn't support it
kvm: x86: SVM: remove unused defines
KVM: x86: SVM: move tsc ratio definitions to svm.h
KVM: x86: SVM: fix avic spec based definitions again
KVM: MIPS: remove reference to trap&emulate virtualization
KVM: x86: document limitations of MSR filtering
KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr
KVM: x86/emulator: Emulate RDPID only if it is enabled in guest
KVM: x86/pmu: Fix and isolate TSX-specific performance event logic
KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set
KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
KVM: x86: Trace all APICv inhibit changes and capture overall status
KVM: x86: Add wrappers for setting/clearing APICv inhibits
KVM: x86: Make APICv inhibit reasons an enum and cleanup naming
KVM: X86: Handle implicit supervisor access with SMAP
KVM: X86: Rename variable smap to not_smap in permission_fault()
...
|
|
This log message was accidentally chopped off.
I was wondering why this happened, but checking the ML log, Mark
precisely followed my suggestion [1].
I just used "..." because I was too lazy to type the sentence fully.
Sorry for the confusion.
[1]: https://lore.kernel.org/all/CAK7LNAR6bXXk9-ZzZYpTqzFqdYbQsZHmiWspu27rtsFxvfRuVA@mail.gmail.com/
Fixes: 4a6795933a89 ("kbuild: modpost: Explicitly warn about unprototyped symbols")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
|
|
Pull block driver fix from Jens Axboe:
"Got two reports on nbd spewing warnings on load now, which is a
regression from a commit that went into your tree yesterday.
Revert the problematic change for now"
* tag 'for-5.18/drivers-2022-04-02' of git://git.kernel.dk/linux-block:
Revert "nbd: fix possible overflow on 'first_minor' in nbd_dev_add()"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fix from Bjorn Helgaas:
- Fix Hyper-V "defined but not used" build issue added during merge
window (YueHaibing)
* tag 'pci-v5.18-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: hv: Remove unused hv_set_msi_entry_from_desc()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform updates from Benson Leung:
"cros_ec_typec:
- Check for EC device - Fix a crash when using the cros_ec_typec
driver on older hardware not capable of typec commands
- Make try power role optional
- Mux configuration reorganization series from Prashant
cros_ec_debugfs:
- Fix use after free. Thanks Tzung-bi
sensorhub:
- cros_ec_sensorhub fixup - Split trace include file
misc:
- Add new mailing list for chrome-platform development:
chrome-platform@lists.linux.dev
Now with patchwork!"
* tag 'tag-chrome-platform-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_debugfs: detach log reader wq from devm
platform: chrome: Split trace include file
platform/chrome: cros_ec_typec: Update mux flags during partner removal
platform/chrome: cros_ec_typec: Configure muxes at start of port update
platform/chrome: cros_ec_typec: Get mux state inside configure_mux
platform/chrome: cros_ec_typec: Move mux flag checks
platform/chrome: cros_ec_typec: Check for EC device
platform/chrome: cros_ec_typec: Make try power role optional
MAINTAINERS: platform-chrome: Add new chrome-platform@lists.linux.dev list
|
|
This reverts commit 6d35d04a9e18990040e87d2bbf72689252669d54.
Both Gabriel and Borislav report that this commit causes a regression
with nbd:
sysfs: cannot create duplicate filename '/dev/block/43:0'
Revert it before 5.18-rc1 and we'll investigate this separately in
due time.
Link: https://lore.kernel.org/all/YkiJTnFOt9bTv6A2@zn.tnic/
Reported-by: Gabriel L. Somlo <somlo@cmu.edu>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit 7ea1a0124b6d ("watch_queue: Free the alloc bitmap when the
watch_queue is torn down") took care of the bitmap, but not the page
array.
BUG: memory leak
unreferenced object 0xffff88810d9bc140 (size 32):
comm "syz-executor335", pid 3603, jiffies 4294946994 (age 12.840s)
hex dump (first 32 bytes):
40 a7 40 04 00 ea ff ff 00 00 00 00 00 00 00 00 @.@.............
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
kmalloc_array include/linux/slab.h:621 [inline]
kcalloc include/linux/slab.h:652 [inline]
watch_queue_set_size+0x12f/0x2e0 kernel/watch_queue.c:251
pipe_ioctl+0x82/0x140 fs/pipe.c:632
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:874 [inline]
__se_sys_ioctl fs/ioctl.c:860 [inline]
__x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
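A sketch of the missing cleanup, assuming the page pointers live in
wqueue->notes next to the notes_bitmap freed by the earlier fix:

  static void free_watch_queue_pages(struct watch_queue *wqueue)
  {
          int i;

          /* Free each page backing the notification buffer... */
          for (i = 0; i < wqueue->nr_pages; i++)
                  __free_page(wqueue->notes[i]);
          /* ...then the page-pointer array allocated with kcalloc()... */
          kfree(wqueue->notes);
          /* ...alongside the bitmap handled by commit 7ea1a0124b6d. */
          bitmap_free(wqueue->notes_bitmap);
  }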
Reported-by: syzbot+25ea042ae28f3888727a@syzkaller.appspotmail.com
Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20220322004654.618274-1-eric.dumazet@gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
After being merged, user_events became more visible to a wider audience
that has concerns with the current API.
It is too late to fix this for this release, but instead of a full
revert, just mark it as BROKEN (which prevents it from being selected in
make config). Then we can work on finding a better API. If that fails,
it will need to be completely reverted.
To not have the code silently bitrot, still allow building it with
COMPILE_TEST.
And to prevent the uapi header from being installed, then later changed,
and then having an old distro user space see the old version, move the
header file out of the uapi directory.
Guard the include with CONFIG_COMPILE_TEST so it points to the current
location; when the BROKEN tag is taken off, the build will look in the
uapi directory and fail to compile. This is a good reminder to move the
header back.
Link: https://lore.kernel.org/all/20220330155835.5e1f6669@gandalf.local.home
Link: https://lkml.kernel.org/r/20220330201755.29319-1-mathieu.desnoyers@efficios.com
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
While the user_events API is under development and has been marked BROKEN
so that the API does not become fixed, move the header file out of the
uapi directory. This prevents it from being installed and later changed,
and then having an old distro user space update to a new kernel where
applications see user_events as available, but the old header is still in
place, and they get compiled incorrectly.
Also, guard the include with CONFIG_COMPILE_TEST so it points to the
current location; when the BROKEN tag is taken off, the build will look
in the uapi directory and fail to compile. This is a good reminder to
move the header back.
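A sketch of the guarded include described above (paths illustrative):

  #ifdef CONFIG_COMPILE_TEST
  #include <linux/user_events.h>          /* temporary, non-uapi location */
  #else
  #include <uapi/linux/user_events.h>     /* fails until the header moves back */
  #endif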
Link: https://lore.kernel.org/all/20220330155835.5e1f6669@gandalf.local.home
Link: https://lkml.kernel.org/r/20220330201755.29319-1-mathieu.desnoyers@efficios.com
Link: https://lkml.kernel.org/r/20220401143903.188384f3@gandalf.local.home
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
ftrace_graph_is_dead() is used on hot paths; it just reads a variable
in memory, so it is not worth paying the cost of a function call.
For instance, at the entry of prepare_ftrace_return(), inlining it avoids
saving prepare_ftrace_return()'s parameters to the stack and restoring
them after calling ftrace_graph_is_dead().
While at it, use a static branch: it is even faster and is well suited
here, since the returned value will almost never change.
Inline ftrace_graph_is_dead() and replace the 'kill_ftrace_graph' bool
with a static branch.
The performance improvement is noticeable.
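A sketch of the change (names follow the existing 'kill_ftrace_graph'
flag; the real patch may differ in detail):

  /* In a header, so callers inline the test: */
  DECLARE_STATIC_KEY_FALSE(kill_ftrace_graph);

  static inline bool ftrace_graph_is_dead(void)
  {
          /* Almost always false; patched in/out as a static branch. */
          return static_branch_unlikely(&kill_ftrace_graph);
  }

  /* In kernel/trace/fgraph.c, replacing 'kill_ftrace_graph = true;': */
  DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);

  void ftrace_graph_stop(void)
  {
          static_branch_enable(&kill_ftrace_graph);
  }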
Link: https://lkml.kernel.org/r/e0411a6a0ed3eafff0ad2bc9cd4b0e202b4617df.1648623570.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
After being merged, user_events became more visible to a wider audience
that has concerns with the current API. It is too late to fix this for
this release, but instead of a full revert, just mark it as BROKEN (which
prevents it from being selected in make config). Then we can work on
finding a better API. If that fails, it will need to be completely reverted.
Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/
Link: https://lkml.kernel.org/r/20220330155835.5e1f6669@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Remove eBPF interfaces within user_events to ensure they are fully
reviewed.
Link: https://lore.kernel.org/all/20220329165718.GA10381@kbox/
Link: https://lkml.kernel.org/r/20220329173051.10087-1-beaub@linux.microsoft.com
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Make sure the event_mutex is properly held during the dyn_event_add()
call; this is required when adding dynamic events.
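A minimal sketch of the locking requirement ('user->devent' here is
illustrative, not the literal diff):

  mutex_lock(&event_mutex);
  ret = dyn_event_add(&user->devent);     /* must run under event_mutex */
  mutex_unlock(&event_mutex);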
Link: https://lkml.kernel.org/r/20220328223225.1992-1-beaub@linux.microsoft.com
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
kzalloc() is a memory allocation function which can return NULL when the
allocation fails. It is safer to add a NULL pointer check.
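A minimal sketch of the added check, following the allocation pattern in
fs/proc/bootconfig.c:

  key = kzalloc(XBC_KEYLEN_MAX, GFP_KERNEL);
  if (!key)
          return -ENOMEM;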
Link: https://lkml.kernel.org/r/20220329104004.2376879-1-lv.ruyi@zte.com.cn
Cc: stable@vger.kernel.org
Fixes: c1a3c36017d4 ("proc: bootconfig: Add /proc/bootconfig to show boot config list")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When looking for the implementation of the different stages of the
TRACE_EVENT() macro creation, it does not help that all the helper macro
redefinitions live in files labeled "stageX_defines.h". Rename them to
state what each stage is for. For instance, when looking for the defines
that are used to create the event fields, seeing "stage4_event_fields.h"
gives the developer a good idea that the defines are in that file.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
is required
Before Commit c3e5e415bc1e6 ("KVM: X86: Change kvm_sync_page()
to return true when remote flush is needed"), the return value
of kvm_sync_page() indicates whether the page is synced, and
kvm_mmu_get_page() would rebuild page when the sync fails.
But now, kvm_sync_page() returns false when the page is
synced and no tlb flushing is required, which leads to
rebuild page in kvm_mmu_get_page(). So return the return
value of mmu->sync_page() directly and check it in
kvm_mmu_get_page(). If the sync fails, the page will be
zapped and the invalid_list is not empty, so set flush as
true is accepted in mmu_sync_children().
Cc: stable@vger.kernel.org
Fixes: c3e5e415bc1e6 ("KVM: X86: Change kvm_sync_page() to return true when remote flush is needed")
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Acked-by: Lai Jiangshan <jiangshanlai@gmail.com>
Message-Id: <0dabeeb789f57b0d793f85d073893063e692032d.1647336064.git.houwenlong.hwl@antgroup.com>
[mmu_sync_children should not flush if the page is zapped. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
kvm_load_{guest|host}_xsave_state handles xsave on vm entry and exit,
part of which is managing memory protection key state. The latest
arch.pkru is updated with a rdpkru, and if that doesn't match the base
host_pkru (which happens about 70% of the time), we issue a __write_pkru.
To improve performance, implement the following optimizations (a sketch
of the resulting guest-load check follows the list):
1. Reorder if conditions prior to wrpkru in both
kvm_load_{guest|host}_xsave_state.
Flip the ordering of the || condition so that XFEATURE_MASK_PKRU is
checked first, which when instrumented in our environment appeared
to be always true and less overall work than kvm_read_cr4_bits.
For kvm_load_guest_xsave_state, hoist arch.pkru != host_pkru ahead
one position. When instrumented, I saw this be true roughly ~70% of
the time vs the other conditions which were almost always true.
With this change, we will avoid 3rd condition check ~30% of the time.
2. Wrap PKU sections with CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS,
as if the user compiles out this feature, we should not have
these branches at all.
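A sketch of the resulting guest-load check after both changes (based on
the existing kvm_load_guest_xsave_state() logic; the host-side path is
symmetric):

  #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
          if (static_cpu_has(X86_FEATURE_PKU) &&
              /* hoisted: skips the rest ~30% of the time */
              vcpu->arch.pkru != vcpu->arch.host_pkru &&
              /* flipped: the cheap xcr0 test before kvm_read_cr4_bits() */
              ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
               kvm_read_cr4_bits(vcpu, X86_CR4_PKE)))
                  write_pkru(vcpu->arch.pkru);
  #endif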
Signed-off-by: Jon Kohler <jon@nutanix.com>
Message-Id: <20220324004439.6709-1-jon@nutanix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Inhibit the AVIC of the vCPU that is running nested for the duration of the
nested run, so that all interrupts arriving from both its vCPU siblings
and from KVM are delivered using normal IPIs and cause that vCPU to vmexit.
Note that unlike normal AVIC inhibition, there is no need to
update the AVIC mmio memslot, because the nested guest uses its
own set of paging tables.
That also means that AVIC doesn't need to be inhibited VM wide.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-7-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add optional callback .vcpu_get_apicv_inhibit_reasons returning
extra inhibit reasons that prevent APICv from working on this vCPU.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In case L1 enables vGIF for L2, the L2 cannot affect L1's GIF, regardless
of STGI/CLGI intercepts, and since VM entry enables GIF, this means
that L1's GIF is always 1 while L2 is running.
Thus in this case leave L1's vGIF in vmcb01, while letting L2
control the vGIF thus implementing nested vGIF.
Also allow KVM to toggle L1's GIF during nested entry/exit
by always using vmcb01.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Expose the pause filtering and threshold in the guest CPUID
and support PAUSE filtering when possible:
- If the L0 doesn't intercept PAUSE (cpu_pm=on), then allow L1 to
have full control over PAUSE filtering.
- if the L1 doesn't intercept PAUSE, use host values and update
the adaptive count/threshold even when running nested.
- Otherwise always exit to L1; it is not really possible to merge
the fields correctly. It is expected that in this case, userspace
will not enable this feature in the guest CPUID, to avoid having the
guest update both fields pointlessly.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This was tested with kvm-unit-test that was developed
for this purpose.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-3-mlevitsk@redhat.com>
[Copy all of DEBUGCTL except for reserved bits. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When L2 is running without LBR virtualization, we should ensure
that L1's LBR msrs continue to update as usual.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM always uses vGIF when allowed, thus there is no need to query the
current vmcb for it.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-9-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This makes the code a bit shorter and cleaner.
No functional change intended.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Clarify that this function is not used to initialize any part of
the vmcb02. No functional change intended.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If the guest kernel is configured with nopvspin, or
CONFIG_PARAVIRT_SPINLOCKS is disabled, or the guest finds it has dedicated
pCPUs from the realtime hint feature, pvspinlock is disabled, and the vCPU
preemption check is disabled too.
However, KVM may still be emulating HLT for the vCPU in those cases.
Checking whether a vCPU is preempted or not can still boost performance in
IPI-heavy scenarios such as the unixbench file copy and pipe-based context
switching tests. Here the vCPU is running with a dedicated pCPU, so the
guest kernel has nopvspin but KVM is emulating HLT for the vCPU:
Testcase                                  Base      with patch
System Benchmarks Index Values            INDEX     INDEX
Dhrystone 2 using register variables      3278.4    3277.7
Double-Precision Whetstone                 822.8     825.8
Execl Throughput                          1296.5     941.1
File Copy 1024 bufsize 2000 maxblocks     2124.2    2142.7
File Copy 256 bufsize 500 maxblocks       1335.9    1353.6
File Copy 4096 bufsize 8000 maxblocks     4256.3    4760.3
Pipe Throughput                           1050.1    1054.0
Pipe-based Context Switching               243.3     352.0
Process Creation                           820.1     814.4
Shell Scripts (1 concurrent)              2169.0    2086.0
Shell Scripts (8 concurrent)              7710.3    7576.3
System Call Overhead                       672.4     673.9
                                          ========  ========
System Benchmarks Index Score             1467.2    1483.0
Move the setting of pv_ops.lock.vcpu_is_preempted to kvm_guest_init, so
that it does not depend on pvspinlock.
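A rough sketch of the intended hook-up (the real code in
arch/x86/kernel/kvm.c carries additional config guards):

  /* In kvm_guest_init(), rather than in the pv-spinlock setup path: */
  if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME))
          pv_ops.lock.vcpu_is_preempted =
                  PV_CALLEE_SAVE(__kvm_vcpu_is_preempted);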
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Message-Id: <1646815610-43315-1-git-send-email-lirongqing@baidu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20220225145304.36166-4-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Don't snapshot tsc_khz into max_tsc_khz during KVM initialization if the
host TSC is constant, in which case the actual TSC frequency will never
change; capturing the "max" TSC during initialization is then
unnecessary, and KVM can simply use tsc_khz during VM creation.
On CPUs with constant TSC, but not a hardware-specified TSC frequency,
snapshotting max_tsc_khz and using that to set a VM's default TSC
frequency can lead to KVM thinking it needs to manually scale the guest's
TSC if refining the TSC completes after KVM snapshots tsc_khz. The
actual frequency never changes, only the kernel's calculation of what
that frequency is changes. On systems without hardware TSC scaling, this
either puts KVM into "always catchup" mode (extremely inefficient), or
prevents creating VMs altogether.
Ideally, KVM would not be able to race with TSC refinement, or would have
a hook into tsc_refine_calibration_work() to get an alert when refinement
is complete. Avoiding the race altogether isn't practical as refinement
takes a relative eternity; it's deliberately put on a work queue outside
of the normal boot sequence to avoid unnecessarily delaying boot.
Adding a hook is doable, but somewhat gross due to KVM's ability to be
built as a module. And if the TSC is constant, which is likely the case
for every VMX/SVM-capable CPU produced in the last decade, the race can
be hit if and only if userspace is able to create a VM before TSC
refinement completes; refinement is slow, but not that slow.
For now, punt on a proper fix, as not taking a snapshot can help some
use cases and is arguably correct irrespective of the race with
refinement.
[ dwmw2: Rebase on top of KVM-wide default_tsc_khz to ensure that all
vCPUs get the same frequency even if we hit the race. ]
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Anton Romanov <romanton@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20220225145304.36166-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This sets the default TSC frequency for subsequently created vCPUs.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20220225145304.36166-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The [clang-analyzer-deadcode.DeadStores] helper reports
that the value stored to 'irq' is never read.
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220301120217.38092-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
available
Currently KVM sets up the posted-interrupt VMCS fields based only on the
per-vCPU APICv activation status at vCPU creation time. However, this
status can be toggled dynamically under some circumstances, so enabling
posted interrupts later may be problematic if the VMCS was not made
ready for it.
To fix this, always settle the VMCS settings for posted interrupts as
long as APICv is available and the LAPIC is emulated in the kernel.
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220315145836.9910-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add test cases for timers in the past, and reading the status of a timer
which has already fired.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20220309143835.253911-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Test a combination of event channel send, poll and timer operations.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20220303154127.202856-18-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add support for SCHEDOP_poll hypercall.
This implementation is optimized for polling for a single channel, which
is what Linux does. Polling for multiple channels is not especially
efficient (and has not been tested).
The PV spinlock slow path uses this hypercall, and explicitly crashes if
it's not supported.
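For reference, the guest-side usage being accelerated looks roughly like
the xen_poll_irq() path (single port, zero timeout meaning no timeout):

  static void poll_one_port(evtchn_port_t port)
  {
          struct sched_poll poll = {
                  .nr_ports = 1,
                  .timeout  = 0,          /* 0 == wait without timeout */
          };

          set_xen_guest_handle(poll.ports, &port);
          /* Returns once the port is pending, immediately if it already is. */
          HYPERVISOR_sched_op(SCHEDOP_poll, &poll);
  }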
[ dwmw2: Rework to use kvm_vcpu_halt(), not supported for 32-bit guests ]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-17-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
At the end of the patch series adding this batch of event channel
acceleration features, finally add the feature bit which advertises
them and document it all.
For SCHEDOP_poll we need to wake a polling vCPU when a given port
is triggered, even when it's masked — and we want to implement that
in the kernel, for efficiency. So we want the kernel to know that it
has sole ownership of event channel delivery. Thus, we allow
userspace to make the 'promise' by setting the corresponding feature
bit in its KVM_XEN_HVM_CONFIG call. As we implement SCHEDOP_poll
bypass later, we will do so only if that promise has been made by
userspace.
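A sketch of how userspace makes that promise (vm_fd is the VM file
descriptor; the other kvm_xen_hvm_config fields are omitted here):

  struct kvm_xen_hvm_config cfg = {
          .flags = KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL |
                   KVM_XEN_HVM_CONFIG_EVTCHN_SEND,
  };

  if (ioctl(vm_fd, KVM_XEN_HVM_CONFIG, &cfg) < 0)
          err(1, "KVM_XEN_HVM_CONFIG");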
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-16-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Windows uses a per-vCPU vector, and it's delivered via the local APIC
basically like an MSI (with associated EOI) unlike the traditional
guest-wide vector which is just magically asserted by Xen (and in the
KVM case by kvm_xen_has_interrupt() / kvm_cpu_get_extint()).
Now that the kernel is able to raise event channel events for itself,
being able to do so for Windows guests is also going to be useful.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-15-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Turns out this is a fast path for PV guests because they use it to
trigger the event channel upcall. So letting it bounce all the way up
to userspace is not great.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-14-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If the guest has offloaded the timer virq, handle the following
hypercalls for programming the timer:
VCPUOP_set_singleshot_timer
VCPUOP_stop_singleshot_timer
set_timer_op(timestamp_ns)
The event channel corresponding to the timer virq is then used to inject
events once timer deadlines are met. For now we back the PV timer with
hrtimer.
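For reference, the guest side arms the one-shot timer roughly as the Xen
clockevent code does (deadline in absolute nanoseconds):

  static void arm_xen_oneshot_timer(int cpu, u64 deadline_abs_ns)
  {
          struct vcpu_set_singleshot_timer single = {
                  .timeout_abs_ns = deadline_abs_ns,
                  .flags = 0,
          };

          /* Per-vCPU form; the timer virq fires once the deadline is met. */
          HYPERVISOR_vcpu_op(VCPUOP_set_singleshot_timer, cpu, &single);

          /* The older global form, also handled: */
          /* HYPERVISOR_set_timer_op(deadline_abs_ns); */
  }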
[ dwmw2: Add save/restore, 32-bit compat mode, immediate delivery,
don't check timer in kvm_vcpu_has_event() ]
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-13-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In order to intercept hypercalls such as VCPUOP_set_singleshot_timer, we
need to be aware of the Xen CPU numbering.
This looks a lot like the Hyper-V handling of vpidx, for obvious reasons.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-12-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Cooperative Linux guests may, after an IPI-many, yield the vCPU if any of
the IPI'd vCPUs were preempted (i.e. their runstate is 'runnable').
Support SCHEDOP_yield for handling this yield.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-11-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Userspace registers a sending @port to either deliver to an @eventfd
or directly back to a local event channel port.
After binding events, the guest or host may wish to bind those
events to a particular vCPU. This is usually done for unbound
and interdomain events. Update requests are handled via the
KVM_XEN_EVTCHN_UPDATE flag.
Unregistered ports are handled by the emulator.
Co-developed-by: Ankur Arora <ankur.a.arora@oracle.com>
Co-developed-By: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-10-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This adds a KVM_XEN_HVM_EVTCHN_SEND ioctl which allows direct injection
of events given an explicit { vcpu, port, priority } in precisely the
same form that those fields are given in the IRQ routing table.
Userspace is currently able to inject 2-level events purely by setting
the bits in the shared_info and vcpu_info, but FIFO event channels are
harder to deal with; we will need the kernel to take sole ownership of
delivery when we support those.
A patch advertising this feature with a new bit in the KVM_CAP_XEN_HVM
ioctl will be added in a subsequent patch.
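A sketch of the userspace call, reusing the routing-table struct as
described (port and vcpu values are illustrative):

  struct kvm_irq_routing_xen_evtchn evt = {
          .port     = 3,          /* illustrative event channel port */
          .vcpu     = 0,
          .priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL,
  };

  if (ioctl(vm_fd, KVM_XEN_HVM_EVTCHN_SEND, &evt) < 0)
          err(1, "KVM_XEN_HVM_EVTCHN_SEND");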
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-9-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Clean it up to return -errno on error consistently, while still being
compatible with the return conventions for kvm_arch_set_irq_inatomic()
and the kvm_set_irq() callback.
We use -ENOTCONN to indicate when the port is masked. No existing users
care, except that it's negative.
Also allow it to optimise the vCPU lookup. Unless we abuse the lapic
map, there is no quick lookup from APIC ID to a vCPU; the logic in
kvm_get_vcpu_by_id() will just iterate over all vCPUs till it finds
the one it wants. So do that just once and stash the result in the
struct kvm_xen_evtchn for next time.
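A sketch of the stash-and-reuse pattern (the kvm_xen_evtchn fields shown
are illustrative):

  struct kvm_vcpu *vcpu = NULL;

  if (xe->vcpu_idx >= 0)
          vcpu = kvm_get_vcpu(kvm, xe->vcpu_idx);

  if (!vcpu || vcpu->vcpu_id != xe->vcpu_id) {
          /* Slow path: linear scan by vCPU ID, then cache the index. */
          vcpu = kvm_get_vcpu_by_id(kvm, xe->vcpu_id);
          if (!vcpu)
                  return -EINVAL;
          WRITE_ONCE(xe->vcpu_idx, vcpu->vcpu_idx);
  }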
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-8-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|