summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-07-03arm: KVM: Simplify HYP initMarc Zyngier
Just like for arm64, we can now make the HYP setup a lot simpler, and we can now initialise it in one go (instead of the two phases we currently have). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03arm/arm64: KVM: Kill free_boot_hyp_pgdMarc Zyngier
There is no way to free the boot PGD, because it doesn't exist anymore as a standalone entity. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03arm/arm64: KVM: Drop boot_pgdMarc Zyngier
Since we now only have one set of page tables, the concept of boot_pgd is useless and can be removed. We still keep it as an element of the "extended idmap" thing. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03arm64: KVM: Simplify HYP init/teardownMarc Zyngier
Now that we only have the "merged page tables" case to deal with, there is a bunch of things we can simplify in the HYP code (both at init and teardown time). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03arm/arm64: KVM: Always have merged page tablesMarc Zyngier
We're in a position where we can now always have "merged" page tables, where both the runtime mapping and the idmap coexist. This results in some code being removed, but there is more to come. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03arm64: KVM: Runtime detection of lower HYP offsetMarc Zyngier
Add the code that enables the switch to the lower HYP VA range. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03arm/arm64: KVM: Export __hyp_text_start/end symbolsMarc Zyngier
Declare the __hyp_text_start/end symbols in asm/virt.h so that they can be reused without having to declare them locally. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03arm64: KVM: Refactor kern_hyp_va to deal with multiple offsetsMarc Zyngier
As we move towards a selectable HYP VA range, it is obvious that we don't want to test a variable to find out if we need to use the bottom VA range, the top VA range, or use the address as is (for VHE). Instead, we can expand our current helper to generate the right mask or nop with code patching. We default to using the top VA space, with alternatives to switch to the bottom one or to nop out the instructions. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03arm64: KVM: Define HYP offset masksMarc Zyngier
Define the two possible HYP VA regions in terms of VA_BITS, and keep HYP_PAGE_OFFSET_MASK as a temporary compatibility definition. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03arm64: Add ARM64_HYP_OFFSET_LOW capabilityMarc Zyngier
As we need to indicate to the rest of the kernel which region of the HYP VA space is safe to use, add a capability that will indicate that KVM should use the [VA_BITS-2:0] range. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03arm64: KVM: Kill HYP_PAGE_OFFSETMarc Zyngier
HYP_PAGE_OFFSET is not massively useful. And the way we use it in KERN_HYP_VA is inconsistent with the equivalent operation in EL2, where we use a mask instead. Let's replace the uses of HYP_PAGE_OFFSET with HYP_PAGE_OFFSET_MASK, and get rid of the pointless macro. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03arm/arm64: KVM: Remove hyp_kern_va helperMarc Zyngier
hyp_kern_va is now completely unused, so let's remove it entirely. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03arm64: KVM: Always reference __hyp_panic_string via its kernel VAMarc Zyngier
__hyp_panic_string is passed via the HYP panic code to the panic function, and is being "upgraded" to a kernel address, as it is referenced by the HYP code (in a PC-relative way). This is a bit silly, and we'd be better off obtaining the kernel address and not mess with it at all. This patch implements this with a tiny bit of asm glue, by forcing the string pointer to be read from the literal pool. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03arm64: KVM: Merged page tables documentationMarc Zyngier
Since dealing with VA ranges tends to hurt my brain badly, let's start with a bit of documentation that will hopefully help understanding what comes next... Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03KVM: arm/arm64: The GIC is dead, long live the GICMarc Zyngier
I don't think any single piece of the KVM/ARM code ever generated as much hatred as the GIC emulation. It was written by someone who had zero experience in modeling hardware (me), was riddled with design flaws, should have been scrapped and rewritten from scratch long before having a remote chance of reaching mainline, and yet we supported it for a good three years. No need to mention the names of those who suffered, the git log is singing their praises. Thankfully, we now have a much more maintainable implementation, and we can safely put the grumpy old GIC to rest. Fellow hackers, please raise your glass in memory of the GIC: The GIC is dead, long live the GIC! Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-29arm/arm64: KVM: Make default HYP mappings non-excutableMarc Zyngier
Structures that can be generally written to don't have any requirement to be executable (quite the opposite). This includes the kvm and vcpu structures, as well as the stacks. Let's change the default to incorporate the XN flag. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-29arm/arm64: KVM: Map the HYP text as read-onlyMarc Zyngier
There should be no reason for mapping the HYP text read/write. As such, let's have a new set of flags (PAGE_HYP_EXEC) that allows execution, but makes the page as read-only, and update the two call sites that deal with mapping code. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-29arm/arm64: KVM: Enforce HYP read-only mapping of the kernel's rodata sectionMarc Zyngier
In order to be able to use C code in HYP, we're now mapping the kernel's rodata in HYP. It works absolutely fine, except that we're mapping it RWX, which is not what it should be. Add a new HYP_PAGE_RO protection, and pass it as the protection flags when mapping the rodata section. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-29arm64: Add PTE_HYP_XN page table flagMarc Zyngier
EL2 page tables can be configured to deny code from being executed, which is done by setting bit 54 in the page descriptor. It is the same bit as PTE_UXN, but the "USER" reference felt odd in the hypervisor code. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-29arm/arm64: KVM: Add a protection parameter to create_hyp_mappingsMarc Zyngier
Currently, create_hyp_mappings applies a "one size fits all" page protection (PAGE_HYP). As we're heading towards separate protections for different sections, let's make this protection a parameter, and let the callers pass their prefered protection (PAGE_HYP for everyone for the time being). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-21Merge tag 'kvm-s390-next-4.8-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: vSIE (nested virtualization) feature for 4.8 (kvm/next) With an updated QEMU this allows to create nested KVM guests (KVM under KVM) on s390. s390 memory management changes from Martin Schwidefsky or acked by Martin. One common code memory management change (pageref) acked by Andrew Morton. The feature has to be enabled with the nested medule parameter.
2016-06-21KVM: s390: vsie: add module parameter "nested"David Hildenbrand
Let's be careful first and allow nested virtualization only if enabled by the system administrator. In addition, user space still has to explicitly enable it via SCLP features for it to work. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: add indication for future featuresDavid Hildenbrand
We have certain SIE features that we cannot support for now. Let's add these features, so user space can directly prepare to enable them, so we don't have to update yet another component. In addition, add a comment block, telling why it is for now not possible to forward/enable these features. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: correctly set and handle guest TODDavid Hildenbrand
Guest 2 sets up the epoch of guest 3 from his point of view. Therefore, we have to add the guest 2 epoch to the guest 3 epoch. We also have to take care of guest 2 epoch changes on STP syncs. This will work just fine by also updating the guest 3 epoch when a vsie_block has been set for a VCPU. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: speed up VCPU external callsDavid Hildenbrand
Whenever a SIGP external call is injected via the SIGP external call interpretation facility, the VCPU is not kicked. When a VCPU is currently in the VSIE, the external call might not be processed immediately. Therefore we have to provoke partial execution exceptions, which leads to a kick of the VCPU and therefore also kick out of VSIE. This is done by simulating the WAIT state. This bit has no other side effects. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: don't use CPUSTAT_WAIT to detect if a VCPU is idleDavid Hildenbrand
As we want to make use of CPUSTAT_WAIT also when a VCPU is not idle but to force interception of external calls, let's check in the bitmap instead. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: speed up VCPU irq delivery when handling vsieDavid Hildenbrand
Whenever we want to wake up a VCPU (e.g. when injecting an IRQ), we have to kick it out of vsie, so the request will be handled faster. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: try to refault after a reported fault to g2David Hildenbrand
We can avoid one unneeded SIE entry after we reported a fault to g2. Theoretically, g2 resolves the fault and we can create the shadow mapping directly, instead of failing again when entering the SIE. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: support IBS interpretationDavid Hildenbrand
We can easily enable ibs for guest 2, so he can use it for guest 3. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: support conditional-external-interceptionDavid Hildenbrand
We can easily enable cei for guest 2, so he can use it for guest 3. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: support intervention-bypassDavid Hildenbrand
We can easily enable intervention bypass for guest 2, so it can use it for guest 3. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: support guest-storage-limit-suppressionDavid Hildenbrand
We can easily forward guest-storage-limit-suppression if available. One thing to care about is keeping the prefix properly mapped when gsls in toggled on/off or the mso changes in between. Therefore we better remap the prefix on any mso changes just like we already do with the prefix. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: support guest-PER-enhancementDavid Hildenbrand
We can easily forward the guest-PER-enhancement facility to guest 2 if available. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: support shared IPTE-interlock facilityDavid Hildenbrand
As we forward the whole SCA provided by guest 2, we can directly forward SIIF if available. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: support 64-bit-SCAODavid Hildenbrand
Let's provide the 64-bit-SCAO facility to guest 2, so he can set up a SCA for guest 3 that has a 64 bit address. Please note that we already require the 64 bit SCAO for our vsie implementation, in order to forward the SCA directly (by pinning the page). Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: support run-time-instrumentationDavid Hildenbrand
As soon as guest 2 is allowed to use run-time-instrumentation (indicated via via STFLE), it can also enable it for guest 3. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: support vectory facility (SIMD)David Hildenbrand
As soon as guest 2 is allowed to use the vector facility (indicated via STFLE), it can also enable it for guest 3. We have to take care of the sattellite block that might be used when not relying on lazy vector copying (not the case for KVM). Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: support transactional executionDavid Hildenbrand
As soon as guest 2 is allowed to use transactional execution (indicated via STFLE), he can also enable it for guest 3. Active transactional execution requires also the second prefix page to be mapped. If that page cannot be mapped, a validity icpt has to be presented to the guest. We have to take care of tx being toggled on/off, otherwise we might get wrong prefix validity icpt. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: support aes dea wrapping keysDavid Hildenbrand
As soon as message-security-assist extension 3 is enabled for guest 2, we have to allow key wrapping for guest 3. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: support STFLE interpretationDavid Hildenbrand
Issuing STFLE is extremely rare. Instead of copying 2k on every VSIE call, let's do this lazily, when a guest 3 tries to execute STFLE. We can setup the block and retry. Unfortunately, we can't directly forward that facility list, as we only have a 31 bit address for the facility list designation. So let's use a DMA allocation for our vsie_page instead for now. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: support host-protection-interruptionDavid Hildenbrand
Introduced with ESOP, therefore available for the guest if it is allowed to use ESOP. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: support edat1 / edat2David Hildenbrand
If guest 2 is allowed to use edat 1 / edat 2, it can also set it up for guest 3, so let's properly check and forward the edat cpuflags. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: support setting the ibcDavid Hildenbrand
As soon as we forward an ibc to guest 2 (indicated via kvm->arch.model.ibc), he can also use it for guest 3. Let's properly round the ibc up/down, so we avoid any potential validity icpts from the underlying SIE, if it doesn't simply round the values. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: optimize gmap prefix mappingDavid Hildenbrand
In order to not always map the prefix, we have to take care of certain aspects that implicitly unmap the prefix: - Changes to the prefix address - Changes to MSO, because the HVA of the prefix is changed - Changes of the gmap shadow (e.g. unshadowed, asce or edat changes) By properly handling these cases, we can stop remapping the prefix when there is no reason to do so. This also allows us now to not acquire any gmap shadow locks when rerunning the vsie and still having a valid gmap shadow. Please note, to detect changing gmap shadows, we have to keep the reference of the gmap shadow. The address of a gmap shadow does otherwise not reliably indicate if the gmap shadow has changed (the memory chunk could get reused). Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21KVM: s390: vsie: initial support for nested virtualizationDavid Hildenbrand
This patch adds basic support for nested virtualization on s390x, called VSIE (virtual SIE) and allows it to be used by the guest if the necessary facilities are supported by the hardware and enabled for the guest. In order to make this work, we have to shadow the sie control block provided by guest 2. In order to gain some performance, we have to reuse the same shadow blocks as good as possible. For now, we allow as many shadow blocks as we have VCPUs (that way, every VCPU can run the VSIE concurrently). We have to watch out for the prefix getting unmapped out of our shadow gmap and properly get the VCPU out of VSIE in that case, to fault the prefix pages back in. We use the PROG_REQUEST bit for that purpose. This patch is based on an initial prototype by Tobias Elpelt. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21mm/page_ref: introduce page_ref_inc_returnDavid Hildenbrand
Let's introduce that helper. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20s390: introduce page_to_virt() and pfn_to_virt()David Hildenbrand
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20KVM: s390: backup the currently enabled gmap when scheduled outDavid Hildenbrand
Nested virtualization will have to enable own gmaps. Current code would enable the wrong gmap whenever scheduled out and back in, therefore resulting in the wrong gmap being enabled. This patch reenables the last enabled gmap, therefore avoiding having to touch vcpu->arch.gmap when enabling a different gmap. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20KVM: s390: fast path for shadow gmaps in gmap notifierDavid Hildenbrand
The default kvm gmap notifier doesn't have to handle shadow gmaps. So let's just directly exit in case we get notified about one. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20s390/mm: don't fault everything in read-write in gmap_pte_op_fixup()David Hildenbrand
Let's not fault in everything in read-write but limit it to read-only where possible. When restricting access rights, we already have the required protection level in our hands. When reading from guest 2 storage (gmap_read_table), it is obviously PROT_READ. When shadowing a pte, the required protection level is given via the guest 2 provided pte. Based on an initial patch by Martin Schwidefsky. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>