summaryrefslogtreecommitdiff
path: root/arch/loongarch/kvm
AgeCommit message (Collapse)Author
13 daysLoongArch: KVM: Add tracepoints for CPUCFG and CSR emulation exitsYulong Han
This patch adds tracepoints to track KVM exits caused by CPUCFG and CSR emulation. Note that IOCSR emulation tracing is already covered by the generic trace_kvm_iocsr(). Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Yulong Han <wheatfox17@icloud.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 daysLoongArch: KVM: Add stat information with kernel irqchipBibo Mao
Move stat information about kernel irqchip from VM to vCPU, since all vm exiting events should be vCPU relative. And also add entry with structure kvm_vcpu_stats_desc[], so that it can display with directory /sys/kernel/debug/kvm. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 daysLoongArch: KVM: Replace eiointc_enable_irq() with eiointc_update_irq()Bibo Mao
Function eiointc_enable_irq() checks mask value with char type, and call eiointc_update_irq() eventually. Function eiointc_update_irq() will update one single irq status directly. Here it can check mask value with unsigned long type and call function eiointc_update_irq(), that is simple and direct. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 daysLoongArch: KVM: Use generic function loongarch_eiointc_write()Bibo Mao
With all eiointc iocsr register write operation with 1/2/4/8 bytes size, generic function loongarch_eiointc_write() is used here. And function loongarch_eiointc_writeb(), loongarch_eiointc_writew(), loongarch_eiointc_writel() are removed. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 daysLoongArch: KVM: Use generic function loongarch_eiointc_read()Bibo Mao
Generic read function loongarch_eiointc_read() is used for 1/2/4/8 bytes read access. It reads 8 bytes from emulated software state and shift right from address offset. Also the similar with kvm_complete_iocsr_read(), destination register of IOCSRRD.{B/H/W} is sign extension from byte/half word/word. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 daysLoongArch: KVM: Use standard bitops API with eiointcBibo Mao
Standard bitops APIs such test_bit() is used here, rather than manually calculating the offset and mask. Also use non-atomic API __set_bit() and __clear_bit() rather than set_bit() and clear_bit(), since the global spinlock is held already. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 daysLoongArch: KVM: Remove never called default case statementBibo Mao
IOCSR instruction supports 1/2/4/8 bytes access, len must be 1/2/4/8 bytes from iocsr exit emulation function kvm_emu_iocsr(), remove the default case in switch case statements. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 daysLoongArch: KVM: Remove unused parameter lenBibo Mao
Parameter len is unused in some functions with eiointc emulation driver, remove it here. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 daysLoongArch: KVM: Remove unnecessary local variableBibo Mao
Local variable device1 can be replaced with existing variable device, it makes code concise. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 daysLoongArch: KVM: Simplify kvm_deliver_intr()Yury Norov (NVIDIA)
The function opencodes for_each_set_bit() macro, which makes it bulky. Using the proper API makes all the housekeeping code going away. Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 daysLoongArch: KVM: Rework kvm_send_pv_ipi()Yury Norov (NVIDIA)
The function in fact traverses a "bitmap" stored in GPR regs A1 and A2, but does it in a non-obvious way by creating a single-word bitmap twice. This patch switches the function to create a single 2-word bitmap, and also employs for_each_set_bit() macro, as it helps to drop most of the housekeeping code. While there, convert the function to return void to not confuse readers with unchecked result. Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-06-27LoongArch: KVM: Disable updating of "num_cpu" and "feature"Bibo Mao
Property "num_cpu" and "feature" are read-only once eiointc is created, which are set with KVM_DEV_LOONGARCH_EXTIOI_GRP_CTRL attr group before device creation. Attr group KVM_DEV_LOONGARCH_EXTIOI_GRP_SW_STATUS is to update register and software state for migration and reset usage, property "num_cpu" and "feature" can not be update again if it is created already. Here discard write operation with property "num_cpu" and "feature" in attr group KVM_DEV_LOONGARCH_EXTIOI_GRP_CTRL. Cc: stable@vger.kernel.org Fixes: 1ad7efa552fd ("LoongArch: KVM: Add EIOINTC user mode read and write functions") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-06-27LoongArch: KVM: Check validity of "num_cpu" from user spaceBibo Mao
The maximum supported cpu number is EIOINTC_ROUTE_MAX_VCPUS about irqchip EIOINTC, here add validation about cpu number to avoid array pointer overflow. Cc: stable@vger.kernel.org Fixes: 1ad7efa552fd ("LoongArch: KVM: Add EIOINTC user mode read and write functions") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-06-27LoongArch: KVM: Check interrupt route from physical CPUBibo Mao
With EIOINTC interrupt controller, physical CPU ID is set for irq route. However the function kvm_get_vcpu() is used to get destination vCPU when delivering irq. With API kvm_get_vcpu(), the logical CPU ID is used. With API kvm_get_vcpu_by_cpuid(), vCPU ID can be searched from physical CPU ID. Cc: stable@vger.kernel.org Fixes: 3956a52bc05b ("LoongArch: KVM: Add EIOINTC read and write functions") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-06-27LoongArch: KVM: Fix interrupt route update with EIOINTCBibo Mao
With function eiointc_update_sw_coremap(), there is forced assignment like val = *(u64 *)pvalue. Parameter pvalue may be pointer to char type or others, there is problem with forced assignment with u64 type. Here the detailed value is passed rather address pointer. Cc: stable@vger.kernel.org Fixes: 3956a52bc05b ("LoongArch: KVM: Add EIOINTC read and write functions") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-06-27LoongArch: KVM: Add address alignment check for IOCSR emulationBibo Mao
IOCSR instruction supports 1/2/4/8 bytes access, the address should be naturally aligned with its access size. Here address alignment check is added in the EIOINTC kernel emulation. Cc: stable@vger.kernel.org Fixes: 3956a52bc05b ("LoongArch: KVM: Add EIOINTC read and write functions") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-06-26LoongArch: KVM: Avoid overflow with array indexBibo Mao
The variable index is modified and reused as array index when modify register EIOINTC_ENABLE. There will be array index overflow problem. Cc: stable@vger.kernel.org Fixes: 3956a52bc05b ("LoongArch: KVM: Add EIOINTC read and write functions") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-05-20LoongArch: KVM: Do not flush tlb if HW PTW supportedBibo Mao
With HW PTW supported, invalid TLB is not added when page fault happens. But for EXCCODE_TLBM exception, stale TLB may exist because of the last read access. Thus TLB flush operation is necessary for the EXCCODE_TLBM exception, but not necessary for other tyeps of page fault exceptions. With SW PTW supported, invalid TLB is added in the TLB refill exception. TLB flush operation is necessary for all types of page fault exceptions. Here remove unnecessary TLB flush opereation with HW PTW supported. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-05-20LoongArch: KVM: Add ecode parameter for exception handlersBibo Mao
For some KVM exception types, they share the same exception handler. To show the difference, ecode (exception code) is added as a new parameter in exception handlers. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-04-26Merge tag 'loongarch-fixes-6.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Add a missing Kconfig option, fix some bugs in exception handlers, memory management and KVM" * tag 'loongarch-fixes-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: KVM: Fix PMU pass-through issue if VM exits to host finally LoongArch: KVM: Fully clear some CSRs when VM reboot LoongArch: KVM: Fix multiple typos of KVM code LoongArch: Return NULL from huge_pte_offset() for invalid PMD LoongArch: Remove a bogus reference to ZONE_DMA LoongArch: Handle fp, lsx, lasx and lbt assembly symbols LoongArch: Make do_xyz() exception handlers more robust LoongArch: Make regs_irqs_disabled() more clear LoongArch: Select ARCH_USE_MEMTEST
2025-04-26LoongArch: KVM: Fix PMU pass-through issue if VM exits to host finallyBibo Mao
In function kvm_pre_enter_guest(), it prepares to enter guest and check whether there are pending signals or events. And it will not enter guest if there are, PMU pass-through preparation for guest should be cancelled and host should own PMU hardware. Cc: stable@vger.kernel.org Fixes: f4e40ea9f78f ("LoongArch: KVM: Add PMU support for guest") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-04-26LoongArch: KVM: Fully clear some CSRs when VM rebootBibo Mao
Some registers such as LOONGARCH_CSR_ESTAT and LOONGARCH_CSR_GINTC are partly cleared with function _kvm_setcsr(). This comes from the hardware specification, some bits are read only in VM mode, and however they can be written in host mode. So they are partly cleared in VM mode, and can be fully cleared in host mode. These read only bits show pending interrupt or exception status. When VM reset, the read-only bits should be cleared, otherwise vCPU will receive unknown interrupts in boot stage. Here registers LOONGARCH_CSR_ESTAT/LOONGARCH_CSR_GINTC are fully cleared in ioctl KVM_REG_LOONGARCH_VCPU_RESET vCPU reset path. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-04-26LoongArch: KVM: Fix multiple typos of KVM codeYulong Han
Fix multiple typos inside arch/loongarch/kvm. Cc: stable@vger.kernel.org Reviewed-by: Yuli Wang <wangyuli@uniontech.com> Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Yulong Han <wheatfox17@icloud.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-04-23Fix mis-uses of 'cc-option' for warning disablementLinus Torvalds
This was triggered by one of my mis-uses causing odd build warnings on sparc in linux-next, but while figuring out why the "obviously correct" use of cc-option caused such odd breakage, I found eight other cases of the same thing in the tree. The root cause is that 'cc-option' doesn't work for checking negative warning options (ie things like '-Wno-stringop-overflow') because gcc will silently accept options it doesn't recognize, and so 'cc-option' ends up thinking they are perfectly fine. And it all works, until you have a situation where _another_ warning is emitted. At that point the compiler will go "Hmm, maybe the user intended to disable this warning but used that wrong option that I didn't recognize", and generate a warning for the unrecognized negative option. Which explains why we have several cases of this in the tree: the 'cc-option' test really doesn't work for this situation, but most of the time it simply doesn't matter that ity doesn't work. The reason my recently added case caused problems on sparc was pointed out by Thomas Weißschuh: the sparc build had a previous explicit warning that then triggered the new one. I think the best fix for this would be to make 'cc-option' a bit smarter about this sitation, possibly by adding an intentional warning to the test case that then triggers the unrecognized option warning reliably. But the short-term fix is to replace 'cc-option' with an existing helper designed for this exact case: 'cc-disable-warning', which picks the negative warning but uses the positive form for testing the compiler support. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/all/20250422204718.0b4e3f81@canb.auug.org.au/ Explained-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Nested virtualization support for VGICv3, giving the nested hypervisor control of the VGIC hardware when running an L2 VM - Removal of 'late' nested virtualization feature register masking, making the supported feature set directly visible to userspace - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers - Paravirtual interface for discovering the set of CPU implementations where a VM may run, addressing a longstanding issue of guest CPU errata awareness in big-little systems and cross-implementation VM migration - Userspace control of the registers responsible for identifying a particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1), allowing VMs to be migrated cross-implementation - pKVM updates, including support for tracking stage-2 page table allocations in the protected hypervisor in the 'SecPageTable' stat - Fixes to vPMU, ensuring that userspace updates to the vPMU after KVM_RUN are reflected into the backing perf events LoongArch: - Remove unnecessary header include path - Assume constant PGD during VM context switch - Add perf events support for guest VM RISC-V: - Disable the kernel perf counter during configure - KVM selftests improvements for PMU - Fix warning at the time of KVM module removal x86: - Add support for aging of SPTEs without holding mmu_lock. Not taking mmu_lock allows multiple aging actions to run in parallel, and more importantly avoids stalling vCPUs. This includes an implementation of per-rmap-entry locking; aging the gfn is done with only a per-rmap single-bin spinlock taken, whereas locking an rmap for write requires taking both the per-rmap spinlock and the mmu_lock. Note that this decreases slightly the accuracy of accessed-page information, because changes to the SPTE outside aging might not use atomic operations even if they could race against a clear of the Accessed bit. This is deliberate because KVM and mm/ tolerate false positives/negatives for accessed information, and testing has shown that reducing the latency of aging is far more beneficial to overall system performance than providing "perfect" young/old information. - Defer runtime CPUID updates until KVM emulates a CPUID instruction, to coalesce updates when multiple pieces of vCPU state are changing, e.g. as part of a nested transition - Fix a variety of nested emulation bugs, and add VMX support for synthesizing nested VM-Exit on interception (instead of injecting #UD into L2) - Drop "support" for async page faults for protected guests that do not set SEND_ALWAYS (i.e. that only want async page faults at CPL3) - Bring a bit of sanity to x86's VM teardown code, which has accumulated a lot of cruft over the years. Particularly, destroy vCPUs before the MMU, despite the latter being a VM-wide operation - Add common secure TSC infrastructure for use within SNP and in the future TDX - Block KVM_CAP_SYNC_REGS if guest state is protected. It does not make sense to use the capability if the relevant registers are not available for reading or writing - Don't take kvm->lock when iterating over vCPUs in the suspend notifier to fix a largely theoretical deadlock - Use the vCPU's actual Xen PV clock information when starting the Xen timer, as the cached state in arch.hv_clock can be stale/bogus - Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across different PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as KVM's suspend notifier only accounts for kvmclock, and there's no evidence that the flag is actually supported by Xen guests - Clean up the per-vCPU "cache" of its reference pvclock, and instead only track the vCPU's TSC scaling (multipler+shift) metadata (which is moderately expensive to compute, and rarely changes for modern setups) - Don't write to the Xen hypercall page on MSR writes that are initiated by the host (userspace or KVM) to fix a class of bugs where KVM can write to guest memory at unexpected times, e.g. during vCPU creation if userspace has set the Xen hypercall MSR index to collide with an MSR that KVM emulates - Restrict the Xen hypercall MSR index to the unofficial synthetic range to reduce the set of possible collisions with MSRs that are emulated by KVM (collisions can still happen as KVM emulates Hyper-V MSRs, which also reside in the synthetic range) - Clean up and optimize KVM's handling of Xen MSR writes and xen_hvm_config - Update Xen TSC leaves during CPUID emulation instead of modifying the CPUID entries when updating PV clocks; there is no guarantee PV clocks will be updated between TSC frequency changes and CPUID emulation, and guest reads of the TSC leaves should be rare, i.e. are not a hot path x86 (Intel): - Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and thus modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1 - Pass XFD_ERR as the payload when injecting #NM, as a preparatory step for upcoming FRED virtualization support - Decouple the EPT entry RWX protection bit macros from the EPT Violation bits, both as a general cleanup and in anticipation of adding support for emulating Mode-Based Execution Control (MBEC) - Reject KVM_RUN if userspace manages to gain control and stuff invalid guest state while KVM is in the middle of emulating nested VM-Enter - Add a macro to handle KVM's sanity checks on entry/exit VMCS control pairs in anticipation of adding sanity checks for secondary exit controls (the primary field is out of bits) x86 (AMD): - Ensure the PSP driver is initialized when both the PSP and KVM modules are built-in (the initcall framework doesn't handle dependencies) - Use long-term pins when registering encrypted memory regions, so that the pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and don't lead to excessive fragmentation - Add macros and helpers for setting GHCB return/error codes - Add support for Idle HLT interception, which elides interception if the vCPU has a pending, unmasked virtual IRQ when HLT is executed - Fix a bug in INVPCID emulation where KVM fails to check for a non-canonical address - Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is invalid, e.g. because the vCPU was "destroyed" via SNP's AP Creation hypercall - Reject SNP AP Creation if the requested SEV features for the vCPU don't match the VM's configured set of features Selftests: - Fix again the Intel PMU counters test; add a data load and do CLFLUSH{OPT} on the data instead of executing code. The theory is that modern Intel CPUs have learned new code prefetching tricks that bypass the PMU counters - Fix a flaw in the Intel PMU counters test where it asserts that an event is counting correctly without actually knowing what the event counts on the underlying hardware - Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and improve its coverage by collecting all dirty entries on each iteration - Fix a few minor bugs related to handling of stats FDs - Add infrastructure to make vCPU and VM stats FDs available to tests by default (open the FDs during VM/vCPU creation) - Relax an assertion on the number of HLT exits in the xAPIC IPI test when running on a CPU that supports AMD's Idle HLT (which elides interception of HLT if a virtual IRQ is pending and unmasked)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (216 commits) RISC-V: KVM: Optimize comments in kvm_riscv_vcpu_isa_disable_allowed RISC-V: KVM: Teardown riscv specific bits after kvm_exit LoongArch: KVM: Register perf callbacks for guest LoongArch: KVM: Implement arch-specific functions for guest perf LoongArch: KVM: Add stub for kvm_arch_vcpu_preempted_in_kernel() LoongArch: KVM: Remove PGD saving during VM context switch LoongArch: KVM: Remove unnecessary header include path KVM: arm64: Tear down vGIC on failed vCPU creation KVM: arm64: PMU: Reload when resetting KVM: arm64: PMU: Reload when user modifies registers KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs KVM: arm64: PMU: Assume PMU presence in pmu-emul.c KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu KVM: arm64: Factor out pKVM hyp vcpu creation to separate function KVM: arm64: Initialize HCRX_EL2 traps in pKVM KVM: arm64: Factor out setting HCRX_EL2 traps into separate function KVM: x86: block KVM_CAP_SYNC_REGS if guest state is protected KVM: x86: Add infrastructure for secure TSC KVM: x86: Push down setting vcpu.arch.user_set_tsc ...
2025-03-25Merge tag 'timers-cleanups-2025-03-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A treewide hrtimer timer cleanup hrtimers are initialized with hrtimer_init() and a subsequent store to the callback pointer. This turned out to be suboptimal for the upcoming Rust integration and is obviously a silly implementation to begin with. This cleanup replaces the hrtimer_init(T); T->function = cb; sequence with hrtimer_setup(T, cb); The conversion was done with Coccinelle and a few manual fixups. Once the conversion has completely landed in mainline, hrtimer_init() will be removed and the hrtimer::function becomes a private member" * tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits) wifi: rt2x00: Switch to use hrtimer_update_function() io_uring: Use helper function hrtimer_update_function() serial: xilinx_uartps: Use helper function hrtimer_update_function() ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup() RDMA: Switch to use hrtimer_setup() virtio: mem: Switch to use hrtimer_setup() drm/vmwgfx: Switch to use hrtimer_setup() drm/xe/oa: Switch to use hrtimer_setup() drm/vkms: Switch to use hrtimer_setup() drm/msm: Switch to use hrtimer_setup() drm/i915/request: Switch to use hrtimer_setup() drm/i915/uncore: Switch to use hrtimer_setup() drm/i915/pmu: Switch to use hrtimer_setup() drm/i915/perf: Switch to use hrtimer_setup() drm/i915/gvt: Switch to use hrtimer_setup() drm/i915/huc: Switch to use hrtimer_setup() drm/amdgpu: Switch to use hrtimer_setup() stm class: heartbeat: Switch to use hrtimer_setup() i2c: Switch to use hrtimer_setup() iio: Switch to use hrtimer_setup() ...
2025-03-18LoongArch: KVM: Register perf callbacks for guestBibo Mao
Add selection for GUEST_PERF_EVENTS if KVM is enabled, also add perf callback register when KVM module is loading. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-18LoongArch: KVM: Implement arch-specific functions for guest perfBibo Mao
Three architecture specific functions are added for the guest perf feature, they are kvm_arch_vcpu_in_kernel(), kvm_arch_vcpu_get_ip() and kvm_arch_pmi_in_guest(). Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-18LoongArch: KVM: Add stub for kvm_arch_vcpu_preempted_in_kernel()Bibo Mao
Pause-Loop Exiting is not supported by LoongArch hardware, nor is pv spinlock feature. So function kvm_vcpu_on_spin() is not used. Function kvm_arch_vcpu_preempted_in_kernel() is defined as a stub function here since it is only called by unused function kvm_vcpu_on_spin(). Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-18LoongArch: KVM: Remove PGD saving during VM context switchBibo Mao
PGD table for primary mmu keeps unchanged once VM is created, it is not necessary to save PGD table pointer during VM context switch. And it can be acquired when VM is created. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-18LoongArch: KVM: Remove unnecessary header include pathMasahiro Yamada
arch/loongarch/kvm/ includes local headers with the double-quote form (#include "..."). Also, TRACE_INCLUDE_PATH in arch/loongarch/kvm/trace.h is relative to include/trace/. Hence, the local header search path is unneeded. Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08LoongArch: KVM: Fix GPA size issue about VMBibo Mao
Physical address space is 48 bit on Loongson-3A5000 physical machine, however it is 47 bit for VM on Loongson-3A5000 system. Size of physical address space of VM is the same with the size of virtual user space (a half) of physical machine. Variable cpu_vabits represents user address space, kernel address space is not included (user space and kernel space are both a half of total). Here cpu_vabits, rather than cpu_vabits - 1, is to represent the size of guest physical address space. Also there is strict checking about page fault GPA address, inject error if it is larger than maximum GPA address of VM. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08LoongArch: KVM: Reload guest CSR registers after sleepBibo Mao
On host, the HW guest CSR registers are lost after suspend and resume operation. Since last_vcpu of boot CPU still records latest vCPU pointer so that the guest CSR register skips to reload when boot CPU resumes and vCPU is scheduled. Here last_vcpu is cleared so that guest CSR registers will reload from scheduled vCPU context after suspend and resume. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08LoongArch: KVM: Add interrupt checking for AVECBibo Mao
There is a newly added macro INT_AVEC with CSR ESTAT register, which is bit 14 used for LoongArch AVEC support. AVEC interrupt status bit 14 is supported with macro CSR_ESTAT_IS, so here replace the hard-coded value 0x1fff with macro CSR_ESTAT_IS so that the AVEC interrupt status is also supported by KVM. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-02-18LoongArch: KVM: Switch to use hrtimer_setup()Nam Cao
hrtimer_setup() takes the callback function pointer as argument and initializes the timer completely. Replace hrtimer_init() and the open coded initialization of hrtimer::function with the new setup mechanism. Patch was created by using Coccinelle. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/a5b1b53813a15a73afdfff6fbb4c9064fa582be1.1738746821.git.namcao@linutronix.de
2025-02-13LoongArch: KVM: Set host with kernel mode when switch to VM modeBibo Mao
PRMD register is only meaningful on the beginning stage of exception entry, and it is overwritten with nested irq or exception. When CPU runs in VM mode, interrupt need be enabled on host. And the mode for host had better be kernel mode rather than random or user mode. When VM is running, the running mode with top command comes from CRMD register, and running mode should be kernel mode since kernel function is executing with perf command. It needs be consistent with both top and perf command. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-02-13LoongArch: KVM: Remove duplicated cache attribute settingBibo Mao
Cache attribute comes from GPA->HPA secondary mmu page table and is configured when kvm is enabled. It is the same for all VMs, so remove duplicated cache attribute setting on vCPU context switch. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-02-13LoongArch: KVM: Fix typo issue about GCFG feature detectionBibo Mao
This is typo issue and misusage about GCFG feature macro. The code is wrong, only that it does not cause obvious problem since GCFG is set again on vCPU context switch. Fixes: 0d0df3c99d4f ("LoongArch: KVM: Implement kvm hardware enable, disable interface") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-13LoongArch: KVM: Add hypercall service support for usermode VMMBibo Mao
Some VMMs provides special hypercall service in usermode, KVM should not handle the usermode hypercall service, thus pass it to usermode, let the usermode VMM handle it. Here a new code KVM_HCALL_CODE_USER_SERVICE is added for the user-mode hypercall service, KVM lets all six registers visible to usermode VMM. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-13LoongArch: KVM: Clear LLBCTL if secondary mmu mapping is changedBibo Mao
LLBCTL is a separated guest CSR register from host, host exception ERET instruction will clear the host LLBCTL CSR register, and guest exception will clear the guest LLBCTL CSR register. VCPU0 atomic64_fetch_add_unless VCPU1 atomic64_fetch_add_unless ll.d %[p], %[c] beq %[p], %[u], 1f Here secondary mmu mapping is changed, host hpa page is replaced with a new page. And VCPU1 will execute atomic instruction on the new page. ll.d %[p], %[c] beq %[p], %[u], 1f add.d %[rc], %[p], %[a] sc.d %[rc], %[c] add.d %[rc], %[p], %[a] sc.d %[rc], %[c] LLBCTL is set on VCPU0 and it represents the memory is not modified by other VCPUs, sc.d will modify the memory directly. So clear WCLLB of the guest LLBCTL register when mapping is the changed. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-12-03LoongArch: KVM: Protect kvm_io_bus_{read,write}() with SRCUHuacai Chen
When we enable lockdep we get such a warning: ============================= WARNING: suspicious RCU usage 6.12.0-rc7+ #1891 Tainted: G W ----------------------------- arch/loongarch/kvm/../../../virt/kvm/kvm_main.c:5945 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by qemu-system-loo/948: #0: 90000001184a00a8 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0xf4/0xe20 [kvm] stack backtrace: CPU: 2 UID: 0 PID: 948 Comm: qemu-system-loo Tainted: G W 6.12.0-rc7+ #1891 Tainted: [W]=WARN Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.0-prebeta9 10/21/2022 Stack : 0000000000000089 9000000005a0db9c 90000000071519c8 900000012c578000 900000012c57b940 0000000000000000 900000012c57b948 9000000007e53788 900000000815bcc8 900000000815bcc0 900000012c57b7b0 0000000000000001 0000000000000001 4b031894b9d6b725 0000000005dec000 9000000100427b00 00000000000003d2 0000000000000001 000000000000002d 0000000000000003 0000000000000030 00000000000003b4 0000000005dec000 0000000000000000 900000000806d000 9000000007e53788 00000000000000b4 0000000000000004 0000000000000004 0000000000000000 0000000000000000 9000000107baf600 9000000008916000 9000000007e53788 9000000005924778 000000001fe001e5 00000000000000b0 0000000000000007 0000000000000000 0000000000071c1d ... Call Trace: [<9000000005924778>] show_stack+0x38/0x180 [<90000000071519c4>] dump_stack_lvl+0x94/0xe4 [<90000000059eb754>] lockdep_rcu_suspicious+0x194/0x240 [<ffff80000221f47c>] kvm_io_bus_read+0x19c/0x1e0 [kvm] [<ffff800002225118>] kvm_emu_mmio_read+0xd8/0x440 [kvm] [<ffff8000022254bc>] kvm_handle_read_fault+0x3c/0xe0 [kvm] [<ffff80000222b3c8>] kvm_handle_exit+0x228/0x480 [kvm] Fix it by protecting kvm_io_bus_{read,write}() with SRCU. Cc: stable@vger.kernel.org Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-12-02LoongArch: KVM: Protect kvm_check_requests() with SRCUHuacai Chen
When we enable lockdep we get such a warning: ============================= WARNING: suspicious RCU usage 6.12.0-rc7+ #1891 Tainted: G W ----------------------------- include/linux/kvm_host.h:1043 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by qemu-system-loo/948: #0: 90000001184a00a8 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0xf4/0xe20 [kvm] stack backtrace: CPU: 0 UID: 0 PID: 948 Comm: qemu-system-loo Tainted: G W 6.12.0-rc7+ #1891 Tainted: [W]=WARN Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.0-prebeta9 10/21/2022 Stack : 0000000000000089 9000000005a0db9c 90000000071519c8 900000012c578000 900000012c57b920 0000000000000000 900000012c57b928 9000000007e53788 900000000815bcc8 900000000815bcc0 900000012c57b790 0000000000000001 0000000000000001 4b031894b9d6b725 0000000004dec000 90000001003299c0 0000000000000414 0000000000000001 000000000000002d 0000000000000003 0000000000000030 00000000000003b4 0000000004dec000 90000001184a0000 900000000806d000 9000000007e53788 00000000000000b4 0000000000000004 0000000000000004 0000000000000000 0000000000000000 9000000107baf600 9000000008916000 9000000007e53788 9000000005924778 0000000010000044 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d ... Call Trace: [<9000000005924778>] show_stack+0x38/0x180 [<90000000071519c4>] dump_stack_lvl+0x94/0xe4 [<90000000059eb754>] lockdep_rcu_suspicious+0x194/0x240 [<ffff8000022143bc>] kvm_gfn_to_hva_cache_init+0xfc/0x120 [kvm] [<ffff80000222ade4>] kvm_pre_enter_guest+0x3a4/0x520 [kvm] [<ffff80000222b3dc>] kvm_handle_exit+0x23c/0x480 [kvm] Fix it by protecting kvm_check_requests() with SRCU. Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-14Merge tag 'loongarch-kvm-6.13' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.13 1. Add iocsr and mmio bus simulation in kernel. 2. Add in-kernel interrupt controller emulation. 3. Add virt extension support for eiointc irqchip.
2024-11-13LoongArch: KVM: Add irqfd supportXianglai Li
Enable the KVM_IRQ_ROUTING/KVM_IRQCHIP/KVM_MSI configuration items, add the KVM_CAP_IRQCHIP capability, and implement the query interface of the in-kernel irqchip. Signed-off-by: Xianglai Li <lixianglai@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13LoongArch: KVM: Add PCHPIC user mode read and write functionsXianglai Li
Implement the communication interface between the user mode programs and the kernel in PCHPIC interrupt control simulation, which is used to obtain or send the simulation data of the interrupt controller in the user mode process, and is also used in VM migration or VM saving and restoration. Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn> Signed-off-by: Xianglai Li <lixianglai@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13LoongArch: KVM: Add PCHPIC read and write functionsXianglai Li
Add implementation of IPI interrupt controller's address space read and write function simulation. Implement interrupt injection interface under loongarch. Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn> Signed-off-by: Xianglai Li <lixianglai@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13LoongArch: KVM: Add PCHPIC device supportXianglai Li
Add device model for PCHPIC interrupt controller, implemente basic create & destroy interface, and register device model to kvm device table. Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn> Signed-off-by: Xianglai Li <lixianglai@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13LoongArch: KVM: Add EIOINTC user mode read and write functionsXianglai Li
Implement the communication interface between the user mode programs and the kernel in EIOINTC interrupt controller simulation, which is used to obtain or send the simulation data of the interrupt controller in the user mode process, and is also used in VM migration or VM saving and restoration. Signed-off-by: Xianglai Li <lixianglai@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13LoongArch: KVM: Add EIOINTC read and write functionsXianglai Li
Add implementation of EIOINTC interrupt controller's address space read and write function simulation. Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn> Signed-off-by: Xianglai Li <lixianglai@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13LoongArch: KVM: Add EIOINTC device supportXianglai Li
Add device model for EIOINTC interrupt controller, implement basic create & destroy interfaces, and register device model to kvm device table. Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn> Signed-off-by: Xianglai Li <lixianglai@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>