summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-12-08KVM: Force PPC to define its own rcuwait objectSean Christopherson
Do not define/reference kvm_vcpu.wait if __KVM_HAVE_ARCH_WQP is true, and instead force the architecture (PPC) to define its own rcuwait object. Allowing common KVM to directly access vcpu->wait without a guard makes it all too easy to introduce potential bugs, e.g. kvm_vcpu_block(), kvm_vcpu_on_spin(), and async_pf_execute() all operate on vcpu->wait, not the result of kvm_arch_vcpu_get_wait(), and so may do the wrong thing for PPC. Due to PPC's shenanigans with respect to callbacks and waits (it switches to the virtual core's wait object at KVM_RUN!?!?), it's not clear whether or not this fixes any bugs. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: s390: Ensure kvm_arch_no_poll() is read once when blocking vCPUSean Christopherson
Wrap s390's halt_poll_max_steal with READ_ONCE and snapshot the result of kvm_arch_no_poll() in kvm_vcpu_block() to avoid a mostly-theoretical, largely benign bug on s390 where the result of kvm_arch_no_poll() could change due to userspace modifying halt_poll_max_steal while the vCPU is blocking. The bug is largely benign as it will either cause KVM to skip updating halt-polling times (no_poll toggles false=>true) or to update halt-polling times with a slightly flawed block_ns. Note, READ_ONCE is unnecessary in the current code, add it in case the arch hook is ever inlined, and to provide a hint that userspace can change the param at will. Fixes: 8b905d28ee17 ("KVM: s390: provide kvm_arch_no_poll function") Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: SVM: Ensure target pCPU is read once when signalling AVIC doorbellSean Christopherson
Ensure vcpu->cpu is read once when signalling the AVIC doorbell. If the compiler rereads the field and the vCPU is migrated between the check and writing the doorbell, KVM would signal the wrong physical CPU. Functionally, signalling the wrong CPU in this case is not an issue as task migration means the vCPU has exited and will pick up any pending interrupts on the next VMRUN. Add the READ_ONCE() purely to clean up the code. Opportunistically add a comment explaining the task migration behavior, and rename cpuid=>cpu to avoid conflating the CPU number with KVM's more common usage of CPUID. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: VMX: Don't unblock vCPU w/ Posted IRQ if IRQs are disabled in guestPaolo Bonzini
Don't configure the wakeup handler when a vCPU is blocking with IRQs disabled, in which case any IRQ, posted or otherwise, should not be recognized and thus should not wake the vCPU. Fixes: bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU is blocked") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: x86: change TLB flush indicator to boolVihas Mak
change 0 to false and 1 to true to fix following cocci warnings: arch/x86/kvm/mmu/mmu.c:1485:9-10: WARNING: return of 0/1 in function 'kvm_set_pte_rmapp' with return type bool arch/x86/kvm/mmu/mmu.c:1636:10-11: WARNING: return of 0/1 in function 'kvm_test_age_rmapp' with return type bool Signed-off-by: Vihas Mak <makvihas@gmail.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Message-Id: <20211114164312.GA28736@makvihas> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Avoid atomic operations when kicking the running vCPUPaolo Bonzini
If we do have the vcpu mutex, as is the case if kvm_running_vcpu is set to the target vcpu of the kick, changes to vcpu->mode do not need atomic operations; cmpxchg is only needed _outside_ the mutex to ensure that the IN_GUEST_MODE->EXITING_GUEST_MODE change does not race with the vcpu thread going OUTSIDE_GUEST_MODE. Use this to optimize the case of a vCPU sending an interrupt to itself. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: x86/MMU: Simplify flow of vmx_get_mt_maskBen Gardon
Remove the gotos from vmx_get_mt_mask. It's easier to build the whole memory type at once, than it is to combine separate cacheability and ipat fields. No functional change intended. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20211115234603.2908381-12-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: x86/mmu: Propagate memslot const qualifierBen Gardon
In preparation for implementing in-place hugepage promotion, various functions will need to be called from zap_collapsible_spte_range, which has the const qualifier on its memslot argument. Propagate the const qualifier to the various functions which will be needed. This just serves to simplify the following patch. No functional change intended. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20211115234603.2908381-11-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: x86/mmu: Remove need for a vcpu from mmu_try_to_unsync_pagesBen Gardon
The vCPU argument to mmu_try_to_unsync_pages is now only used to get a pointer to the associated struct kvm, so pass in the kvm pointer from the beginning to remove the need for a vCPU when calling the function. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20211115234603.2908381-7-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: x86/mmu: Remove need for a vcpu from kvm_slot_page_track_is_activeBen Gardon
kvm_slot_page_track_is_active only uses its vCPU argument to get a pointer to the assoicated struct kvm, so just pass in the struct KVM to remove the need for a vCPU pointer. No functional change intended. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20211115234603.2908381-6-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: x86/mmu: Use shadow page role to detect PML-unfriendly pages for L2Sean Christopherson
Rework make_spte() to query the shadow page's role, specifically whether or not it's a guest_mode page, a.k.a. a page for L2, when determining if the SPTE is compatible with PML. This eliminates a dependency on @vcpu, with a future goal of being able to create SPTEs without a specific vCPU. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: nSVM: introduce struct vmcb_ctrl_area_cachedEmanuele Giuseppe Esposito
This structure will replace vmcb_control_area in svm_nested_state, providing only the fields that are actually used by the nested state. This avoids having and copying around uninitialized fields. The cost of this, however, is that all functions (in this case vmcb_is_intercept) expect the old structure, so they need to be duplicated. In addition, in svm_get_nested_state() user space expects a vmcb_control_area struct, so we need to copy back all fields in a temporary structure before copying it to userspace. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211103140527.752797-7-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: nSVM: split out __nested_vmcb_check_controlsPaolo Bonzini
Remove the struct vmcb_control_area parameter from nested_vmcb_check_controls, for consistency with the functions that operate on the save area. This way, VMRUN uses the version without underscores for both areas, while KVM_SET_NESTED_STATE uses the version with underscores. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: nSVM: use svm->nested.save to load vmcb12 registers and avoid TOC/TOU racesEmanuele Giuseppe Esposito
Use the already checked svm->nested.save cached fields (EFER, CR0, CR4, ...) instead of vmcb12's in nested_vmcb02_prepare_save(). This prevents from creating TOC/TOU races, since the guest could modify the vmcb12 fields. This also avoids the need of force-setting EFER_SVME in nested_vmcb02_prepare_save. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211103140527.752797-6-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: nSVM: use vmcb_save_area_cached in nested_vmcb_valid_sregs()Emanuele Giuseppe Esposito
Now that struct vmcb_save_area_cached contains the required vmcb fields values (done in nested_load_save_from_vmcb12()), check them to see if they are correct in nested_vmcb_valid_sregs(). While at it, rename nested_vmcb_valid_sregs in nested_vmcb_check_save. __nested_vmcb_check_save takes the additional @save parameter, so it is helpful when we want to check a non-svm save state, like in svm_set_nested_state. The reason for that is that save is the L1 state, not L2, so we check it without moving it to svm->nested.save. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20211103140527.752797-5-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: nSVM: rename nested_load_control_from_vmcb12 in ↵Emanuele Giuseppe Esposito
nested_copy_vmcb_control_to_cache Following the same naming convention of the previous patch, rename nested_load_control_from_vmcb12. In addition, inline copy_vmcb_control_area as it is only called by this function. __nested_copy_vmcb_control_to_cache() works with vmcb_control_area parameters and it will be useful in next patches, when we use local variables instead of svm cached state. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20211103140527.752797-4-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: nSVM: introduce svm->nested.save to cache save area before checksEmanuele Giuseppe Esposito
This is useful in the next patch, to keep a saved copy of vmcb12 registers and pass it around more easily. Instead of blindly copying everything, we just copy EFER, CR0, CR3, CR4, DR6 and DR7 which are needed by the VMRUN checks. If more fields will need to be checked, it will be quite obvious to see that they must be added in struct vmcb_save_area_cached and in nested_copy_vmcb_save_to_cache(). __nested_copy_vmcb_save_to_cache() takes a vmcb_save_area_cached parameter, which is useful in order to save the state to a local variable. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20211103140527.752797-3-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: nSVM: move nested_vmcb_check_cr3_cr4 logic in nested_vmcb_valid_sregsEmanuele Giuseppe Esposito
Inline nested_vmcb_check_cr3_cr4 as it is not called by anyone else. Doing so simplifies next patches. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211103140527.752797-2-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Dynamically allocate "new" memslots from the get-goSean Christopherson
Allocate the "new" memslot for !DELETE memslot updates straight away instead of filling an intermediate on-stack object and forcing kvm_set_memslot() to juggle the allocation and do weird things like reuse the old memslot object in MOVE. In the MOVE case, this results in an "extra" memslot allocation due to allocating both the "new" slot and the "invalid" slot, but that's a temporary and not-huge allocation, and MOVE is a relatively rare memslot operation. Regarding MOVE, drop the open-coded management of the gfn tree with a call to kvm_replace_memslot(), which already handles the case where new->base_gfn != old->base_gfn. This is made possible by virtue of not having to copy the "new" memslot data after erasing the old memslot from the gfn tree. Using kvm_replace_memslot(), and more specifically not reusing the old memslot, means the MOVE case now does hva tree and hash list updates, but that's a small price to pay for simplifying the code and making MOVE align with all the other flavors of updates. The "extra" updates are firmly in the noise from a performance perspective, e.g. the "move (in)active area" selfttests show a (very, very) slight improvement. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <f0d8c72727aa825cf682bd4e3da4b3fa68215dd4.1638817641.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Wait 'til the bitter end to initialize the "new" memslotSean Christopherson
Initialize the "new" memslot in the !DELETE path only after the various sanity checks have passed. This will allow a future commit to allocate @new dynamically without having to copy a memslot, and without having to deal with freeing @new in error paths and in the "nothing to change" path that's hiding in the sanity checks. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <a084d0531ca3a826a7f861eb2b08b5d1c06ef265.1638817641.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Optimize overlapping memslots checkMaciej S. Szmigiero
Do a quick lookup for possibly overlapping gfns when creating or moving a memslot instead of performing a linear scan of the whole memslot set. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> [sean: tweaked params to avoid churn in future cleanup] Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <a4795e5c2f624754e9c0aab023ebda1966feb3e1.1638817641.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Optimize gfn lookup in kvm_zap_gfn_range()Maciej S. Szmigiero
Introduce a memslots gfn upper bound operation and use it to optimize kvm_zap_gfn_range(). This way this handler can do a quick lookup for intersecting gfns and won't have to do a linear scan of the whole memslot set. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <ef242146a87a335ee93b441dcf01665cb847c902.1638817641.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Call kvm_arch_flush_shadow_memslot() on the old slot in ↵Maciej S. Szmigiero
kvm_invalidate_memslot() kvm_invalidate_memslot() calls kvm_arch_flush_shadow_memslot() on the active, but KVM_MEMSLOT_INVALID slot. Do it on the inactive (but valid) old slot instead since arch code really should not get passed such invalid slot. Note that this means that the "arch" field of the slot provided to kvm_arch_flush_shadow_memslot() may have stale data since this function is called with slots_arch_lock released. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <813595ecc193d6ae39a87709899d4251523b05f8.1638817641.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Keep memslots in tree-based structures instead of array-based onesMaciej S. Szmigiero
The current memslot code uses a (reverse gfn-ordered) memslot array for keeping track of them. Because the memslot array that is currently in use cannot be modified every memslot management operation (create, delete, move, change flags) has to make a copy of the whole array so it has a scratch copy to work on. Strictly speaking, however, it is only necessary to make copy of the memslot that is being modified, copying all the memslots currently present is just a limitation of the array-based memslot implementation. Two memslot sets, however, are still needed so the VM continues to run on the currently active set while the requested operation is being performed on the second, currently inactive one. In order to have two memslot sets, but only one copy of actual memslots it is necessary to split out the memslot data from the memslot sets. The memslots themselves should be also kept independent of each other so they can be individually added or deleted. These two memslot sets should normally point to the same set of memslots. They can, however, be desynchronized when performing a memslot management operation by replacing the memslot to be modified by its copy. After the operation is complete, both memslot sets once again point to the same, common set of memslot data. This commit implements the aforementioned idea. For tracking of gfns an ordinary rbtree is used since memslots cannot overlap in the guest address space and so this data structure is sufficient for ensuring that lookups are done quickly. The "last used slot" mini-caches (both per-slot set one and per-vCPU one), that keep track of the last found-by-gfn memslot, are still present in the new code. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <17c0cf3663b760a0d3753d4ac08c0753e941b811.1638817641.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: s390: Introduce kvm_s390_get_gfn_end()Maciej S. Szmigiero
And use it where s390 code would just access the memslot with the highest gfn directly. No functional change intended. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-Id: <42496041d6af1c23b1cbba2636b344ca8d5fc3af.1638817641.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Use interval tree to do fast hva lookup in memslotsMaciej S. Szmigiero
The current memslots implementation only allows quick binary search by gfn, quick lookup by hva is not possible - the implementation has to do a linear scan of the whole memslots array, even though the operation being performed might apply just to a single memslot. This significantly hurts performance of per-hva operations with higher memslot counts. Since hva ranges can overlap between memslots an interval tree is needed for tracking them. [sean: handle interval tree updates in kvm_replace_memslot()] Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <d66b9974becaa9839be9c4e1a5de97b177b4ac20.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Resolve memslot ID via a hash table instead of via a static arrayMaciej S. Szmigiero
Memslot ID to the corresponding memslot mappings are currently kept as indices in static id_to_index array. The size of this array depends on the maximum allowed memslot count (regardless of the number of memslots actually in use). This has become especially problematic recently, when memslot count cap was removed, so the maximum count is now full 32k memslots - the maximum allowed by the current KVM API. Keeping these IDs in a hash table (instead of an array) avoids this problem. Resolving a memslot ID to the actual memslot (instead of its index) will also enable transitioning away from an array-based implementation of the whole memslots structure in a later commit. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <117fb2c04320e6cd6cf34f205a72eadb0aa8d5f9.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Move WARN on invalid memslot index to update_memslots()Maciej S. Szmigiero
Since kvm_memslot_move_forward() can theoretically return a negative memslot index even when kvm_memslot_move_backward() returned a positive one (and so did not WARN) let's just move the warning to the common code. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <eeed890ccb951e7b0dce15bc170eb2661d5b02da.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Integrate gfn_to_memslot_approx() into search_memslots()Maciej S. Szmigiero
s390 arch has gfn_to_memslot_approx() which is almost identical to search_memslots(), differing only in that in case the gfn falls in a hole one of the memslots bordering the hole is returned. Add this lookup mode as an option to search_memslots() so we don't have two almost identical functions for looking up a memslot by its gfn. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> [sean: tweaked helper names to keep gfn_to_memslot_approx() in s390] Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <171cd89b52c718dbe180ecd909b4437a64a7e2ec.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: x86: Use nr_memslot_pages to avoid traversing the memslots arrayMaciej S. Szmigiero
There is no point in recalculating from scratch the total number of pages in all memslots each time a memslot is created or deleted. Use KVM's cached nr_memslot_pages to compute the default max number of MMU pages. Note that even with nr_memslot_pages capped at ULONG_MAX we can't safely multiply it by KVM_PERMILLE_MMU_PAGES (20) since this operation can possibly overflow an unsigned long variable. Write this "* 20 / 1000" operation as "/ 50" instead to avoid such overflow. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> [sean: use common KVM field and rework changelog accordingly] Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <d14c5a24535269606675437d5602b7dac4ad8c0e.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: x86: Don't call kvm_mmu_change_mmu_pages() if the count hasn't changedMaciej S. Szmigiero
There is no point in calling kvm_mmu_change_mmu_pages() for memslot operations that don't change the total page count, so do it just for KVM_MR_CREATE and KVM_MR_DELETE. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <9e56b7616a11f5654e4ab486b3237366b7ba9f2a.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Don't make a full copy of the old memslot in __kvm_set_memory_region()Sean Christopherson
Stop making a full copy of the old memslot in __kvm_set_memory_region() now that metadata updates are handled by kvm_set_memslot(), i.e. now that the old memslot's dirty bitmap doesn't need to be referenced after the memslot and its pointer is modified/invalidated by kvm_set_memslot(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <5dce0946b41bba8c83f6e3424c6955c56bcc9f86.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: s390: Skip gfn/size sanity checks on memslot DELETE or FLAGS_ONLYSean Christopherson
Sanity check the hva, gfn, and size of a userspace memory region only if any of those properties can change, i.e. skip the checks for DELETE and FLAGS_ONLY. KVM doesn't allow moving the hva or changing the size, a gfn change shows up as a MOVE even if flags are being modified, and the checks are pointless for the DELETE case as userspace_addr and gfn_base are zeroed by common KVM. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <05430738437ac2c9c7371ac4e11f4a533e1677da.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: x86: Don't assume old/new memslots are non-NULL at memslot commitSean Christopherson
Play nice with a NULL @old or @new when handling memslot updates so that common KVM can pass NULL for one or the other in CREATE and DELETE cases instead of having to synthesize a dummy memslot. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <2eb7788adbdc2bc9a9c5f86844dd8ee5c8428732.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Use prepare/commit hooks to handle generic memslot metadata updatesSean Christopherson
Handle the generic memslot metadata, a.k.a. dirty bitmap, updates at the same time that arch handles it's own metadata updates, i.e. at memslot prepare and commit. This will simplify converting @new to a dynamically allocated object, and more closely aligns common KVM with architecture code. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <2ddd5446e3706fe3c1e52e3df279f04c458be830.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Stop passing kvm_userspace_memory_region to arch memslot hooksSean Christopherson
Drop the @mem param from kvm_arch_{prepare,commit}_memory_region() now that its use has been removed in all architectures. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <aa5ed3e62c27e881d0d8bc0acbc1572bc336dc19.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: RISC-V: Use "new" memslot instead of userspace memory regionSean Christopherson
Get the slot ID, hva, etc... from the "new" memslot instead of the userspace memory region when preparing/committing a memory region. This will allow a future commit to drop @mem from the prepare/commit hooks once all architectures convert to using "new". Opportunistically wait to get the various "new" values until after filtering out the DELETE case in anticipation of a future commit passing NULL for @new when deleting a memslot. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <543608ab88a1190e73a958efffafc98d2652c067.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: x86: Use "new" memslot instead of userspace memory regionSean Christopherson
Get the number of pages directly from the new memslot instead of computing the same from the userspace memory region when allocating memslot metadata. This will allow a future patch to drop @mem. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <ef44892eb615f5c28e682bbe06af96aff9ce2a9f.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: s390: Use "new" memslot instead of userspace memory regionSean Christopherson
Get the gfn, size, and hva from the new memslot instead of the userspace memory region when preparing/committing memory region changes. This will allow a future commit to drop the @mem param. Note, this has a subtle functional change as KVM would previously reject DELETE if userspace provided a garbage userspace_addr or guest_phys_addr, whereas KVM zeros those fields in the "new" memslot when deleting an existing memslot. Arguably the old behavior is more correct, but there's zero benefit into requiring userspace to provide sane values for hva and gfn. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <917ed131c06a4c7b35dd7fb7ed7955be899ad8cc.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: PPC: Avoid referencing userspace memory region in memslot updatesSean Christopherson
For PPC HV, get the number of pages directly from the new memslot instead of computing the same from the userspace memory region, and explicitly check for !DELETE instead of inferring the same when toggling mmio_update. The motivation for these changes is to avoid referencing the @mem param so that it can be dropped in a future commit. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <1e97fb5198be25f98ef82e63a8d770c682264cc9.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: MIPS: Drop pr_debug from memslot commit to avoid using "mem"Sean Christopherson
Remove an old (circa 2012) kvm_debug from kvm_arch_commit_memory_region() to print basic information when committing a memslot change. The primary motivation for removing the kvm_debug is to avoid using @mem, the user memory region, so that said param can be removed. Alternatively, the debug message could be converted to use @new, but that would require synthesizing select state to play nice with the DELETED case, which will pass NULL for @new in the future. And there's no argument to be had for dumping generic information in an arch callback, i.e. if there's a good reason for the debug message, then it belongs in common KVM code where all architectures can benefit. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <446929a668f6e1346751571b71db41e94e976cdf.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: arm64: Use "new" memslot instead of userspace memory regionSean Christopherson
Get the slot ID, hva, etc... from the "new" memslot instead of the userspace memory region when preparing/committing a memory region. This will allow a future commit to drop @mem from the prepare/commit hooks once all architectures convert to using "new". Opportunistically wait to get the hva begin+end until after filtering out the DELETE case in anticipation of a future commit passing NULL for @new when deleting a memslot. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <c019d00c2531520c52e0b52dfda1be5aa898103c.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Let/force architectures to deal with arch specific memslot dataSean Christopherson
Pass the "old" slot to kvm_arch_prepare_memory_region() and force arch code to handle propagating arch specific data from "new" to "old" when necessary. This is a baby step towards dynamically allocating "new" from the get go, and is a (very) minor performance boost on x86 due to not unnecessarily copying arch data. For PPC HV, copy the rmap in the !CREATE and !DELETE paths, i.e. for MOVE and FLAGS_ONLY. This is functionally a nop as the previous behavior would overwrite the pointer for CREATE, and eventually discard/ignore it for DELETE. For x86, copy the arch data only for FLAGS_ONLY changes. Unlike PPC HV, x86 needs to reallocate arch data in the MOVE case as the size of x86's allocations depend on the alignment of the memslot's gfn. Opportunistically tweak kvm_arch_prepare_memory_region()'s param order to match the "commit" prototype. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> [mss: add missing RISCV kvm_arch_prepare_memory_region() change] Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <67dea5f11bbcfd71e3da5986f11e87f5dd4013f9.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Use "new" memslot's address space ID instead of dedicated paramSean Christopherson
Now that the address space ID is stored in every slot, including fake slots used for deletion, use the slot's as_id instead of passing in the redundant information as a param to kvm_set_memslot(). This will greatly simplify future memslot work by avoiding passing a large number of variables around purely to honor @as_id. Drop a comment in the DELETE path about new->as_id being provided purely for debug, as that's now a lie. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <03189577be214ab8530a4b3a3ee3ed1c2f9e5815.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Resync only arch fields when slots_arch_lock gets reacquiredMaciej S. Szmigiero
There is no need to copy the whole memslot data after releasing slots_arch_lock for a moment to install temporary memslots copy in kvm_set_memslot() since this lock only protects the arch field of each memslot. Just resync this particular field after reacquiring slots_arch_lock. Note, this also eliminates the need to manually clear the INVALID flag when restoring memslots; the "setting" of the INVALID flag was an unwanted side effect of copying the entire memslots. Since kvm_copy_memslots() has just one caller remaining now open-code it instead. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> [sean: tweak shortlog, note INVALID flag in changelog, revert comment] Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <b63035d114707792e9042f074478337f770dff6a.1638817638.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Open code kvm_delete_memslot() into its only callerSean Christopherson
Fold kvm_delete_memslot() into __kvm_set_memory_region() to free up the "kvm_delete_memslot()" name for use in a future helper. The delete logic isn't so complex/long that it truly needs a helper, and it will be simplified a wee bit further in upcoming commits. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <2887631c31a82947faa488ab72f55f8c68b7c194.1638817638.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Require total number of memslot pages to fit in an unsigned longSean Christopherson
Explicitly disallow creating more memslot pages than can fit in an unsigned long, KVM doesn't correctly handle a total number of memslot pages that doesn't fit in an unsigned long and remedying that would be a waste of time. For a 64-bit kernel, this is a nop as memslots are not allowed to overlap in the gfn address space. With a 32-bit kernel, userspace can at most address 3gb of virtual memory, whereas wrapping the total number of pages would require 4tb+ of guest physical memory. Even with x86's second address space for SMM, userspace would need to alias all of guest memory more than one _thousand_ times. And on older x86 hardware with MAXPHYADDR < 43, the guest couldn't actually access any of those aliases even if userspace lied about guest.MAXPHYADDR. On 390 and arm64, this is a nop as they don't support 32-bit hosts. On x86, practically speaking this is simply acknowledging reality as the existing kvm_mmu_calculate_default_mmu_pages() assumes the total number of pages fits in an "unsigned long". On PPC, this is likely a nop as every flavor of PPC KVM assumes gfns (and gpas!) fit in unsigned long. arch/powerpc/kvm/book3s_32_mmu_host.c goes a step further and fails the build if CONFIG_PTE_64BIT=y, which presumably means that it does't support 64-bit physical addresses. On MIPS, this is also likely a nop as the core MMU helpers assume gpas fit in unsigned long, e.g. see kvm_mips_##name##_pte. And finally, RISC-V is a "don't care" as it doesn't exist in any release, i.e. there is no established ABI to break. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <1c2c91baf8e78acccd4dad38da591002e61c013c.1638817638.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Convert kvm_for_each_vcpu() to using xa_for_each_range()Marc Zyngier
Now that the vcpu array is backed by an xarray, use the optimised iterator that matches the underlying data structure. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-8-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s indexMarc Zyngier
Everywhere we use kvm_for_each_vpcu(), we use an int as the vcpu index. Unfortunately, we're about to move rework the iterator, which requires this to be upgrade to an unsigned long. Let's bite the bullet and repaint all of it in one go. Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-7-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Convert the kvm->vcpus array to a xarrayMarc Zyngier
At least on arm64 and x86, the vcpus array is pretty huge (up to 1024 entries on x86) and is mostly empty in the majority of the cases (running 1k vcpu VMs is not that common). This mean that we end-up with a 4kB block of unused memory in the middle of the kvm structure. Instead of wasting away this memory, let's use an xarray instead, which gives us almost the same flexibility as a normal array, but with a reduced memory usage with smaller VMs. Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-6-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>