summaryrefslogtreecommitdiff
path: root/arch/arm/kvm/mmu.c
AgeCommit message (Collapse)Author
2014-12-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM update from Paolo Bonzini: "3.19 changes for KVM: - spring cleaning: removed support for IA64, and for hardware- assisted virtualization on the PPC970 - ARM, PPC, s390 all had only small fixes For x86: - small performance improvements (though only on weird guests) - usual round of hardware-compliancy fixes from Nadav - APICv fixes - XSAVES support for hosts and guests. XSAVES hosts were broken because the (non-KVM) XSAVES patches inadvertently changed the KVM userspace ABI whenever XSAVES was enabled; hence, this part is going to stable. Guest support is just a matter of exposing the feature and CPUID leaves support" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits) KVM: move APIC types to arch/x86/ KVM: PPC: Book3S: Enable in-kernel XICS emulation by default KVM: PPC: Book3S HV: Improve H_CONFER implementation KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register KVM: PPC: Book3S HV: Remove code for PPC970 processors KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions KVM: PPC: Book3S HV: Simplify locking around stolen time calculations arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function arch: powerpc: kvm: book3s_pr.c: Remove unused function arch: powerpc: kvm: book3s.c: Remove some unused functions arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked KVM: PPC: Book3S HV: ptes are big endian KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI KVM: PPC: Book3S HV: Fix KSM memory corruption KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI KVM: PPC: Book3S HV: Fix computation of tlbie operand KVM: PPC: Book3S HV: Add missing HPTE unlock KVM: PPC: BookE: Improve irq inject tracepoint arm/arm64: KVM: Require in-kernel vgic for the arch timers ...
2014-12-13arm/arm64: KVM: Introduce stage2_unmap_vmChristoffer Dall
Introduce a new function to unmap user RAM regions in the stage2 page tables. This is needed on reboot (or when the guest turns off the MMU) to ensure we fault in pages again and make the dcache, RAM, and icache coherent. Using unmap_stage2_range for the whole guest physical range does not work, because that unmaps IO regions (such as the GIC) which will not be recreated or in the best case faulted in on a page-by-page basis. Call this function on secondary and subsequent calls to the KVM_ARM_VCPU_INIT ioctl so that a reset VCPU will detect the guest Stage-1 MMU is off when faulting in pages and make the caches coherent. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-11-26arm/arm64: kvm: drop inappropriate use of kvm_is_mmio_pfn()Ard Biesheuvel
Instead of using kvm_is_mmio_pfn() to decide whether a host region should be stage 2 mapped with device attributes, add a new static function kvm_is_device_pfn() that disregards RAM pages with the reserved bit set, as those should usually not be mapped as device memory. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-26arm64: KVM: fix unmapping with 48-bit VAsMark Rutland
Currently if using a 48-bit VA, tearing down the hyp page tables (which can happen in the absence of a GICH or GICV resource) results in the rather nasty splat below, evidently becasue we access a table that doesn't actually exist. Commit 38f791a4e499792e (arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2) added a pgd_none check to __create_hyp_mappings to account for the additional level of tables, but didn't add a corresponding check to unmap_range, and this seems to be the source of the problem. This patch adds the missing pgd_none check, ensuring we don't try to access tables that don't exist. Original splat below: kvm [1]: Using HYP init bounce page @83fe94a000 kvm [1]: Cannot obtain GICH resource Unable to handle kernel paging request at virtual address ffff7f7fff000000 pgd = ffff800000770000 [ffff7f7fff000000] *pgd=0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc2+ #89 task: ffff8003eb500000 ti: ffff8003eb45c000 task.ti: ffff8003eb45c000 PC is at unmap_range+0x120/0x580 LR is at free_hyp_pgds+0xac/0xe4 pc : [<ffff80000009b768>] lr : [<ffff80000009cad8>] pstate: 80000045 sp : ffff8003eb45fbf0 x29: ffff8003eb45fbf0 x28: ffff800000736000 x27: ffff800000735000 x26: ffff7f7fff000000 x25: 0000000040000000 x24: ffff8000006f5000 x23: 0000000000000000 x22: 0000007fffffffff x21: 0000800000000000 x20: 0000008000000000 x19: 0000000000000000 x18: ffff800000648000 x17: ffff800000537228 x16: 0000000000000000 x15: 000000000000001f x14: 0000000000000000 x13: 0000000000000001 x12: 0000000000000020 x11: 0000000000000062 x10: 0000000000000006 x9 : 0000000000000000 x8 : 0000000000000063 x7 : 0000000000000018 x6 : 00000003ff000000 x5 : ffff800000744188 x4 : 0000000000000001 x3 : 0000000040000000 x2 : ffff800000000000 x1 : 0000007fffffffff x0 : 000000003fffffff Process swapper/0 (pid: 1, stack limit = 0xffff8003eb45c058) Stack: (0xffff8003eb45fbf0 to 0xffff8003eb460000) fbe0: eb45fcb0 ffff8003 0009cad8 ffff8000 fc00: 00000000 00000080 00736140 ffff8000 00736000 ffff8000 00000000 00007c80 fc20: 00000000 00000080 006f5000 ffff8000 00000000 00000080 00743000 ffff8000 fc40: 00735000 ffff8000 006d3030 ffff8000 006fe7b8 ffff8000 00000000 00000080 fc60: ffffffff 0000007f fdac1000 ffff8003 fd94b000 ffff8003 fda47000 ffff8003 fc80: 00502b40 ffff8000 ff000000 ffff7f7f fdec6000 00008003 fdac1630 ffff8003 fca0: eb45fcb0 ffff8003 ffffffff 0000007f eb45fd00 ffff8003 0009b378 ffff8000 fcc0: ffffffea 00000000 006fe000 ffff8000 00736728 ffff8000 00736120 ffff8000 fce0: 00000040 00000000 00743000 ffff8000 006fe7b8 ffff8000 0050cd48 00000000 fd00: eb45fd60 ffff8003 00096070 ffff8000 006f06e0 ffff8000 006f06e0 ffff8000 fd20: fd948b40 ffff8003 0009a320 ffff8000 00000000 00000000 00000000 00000000 fd40: 00000ae0 00000000 006aa25c ffff8000 eb45fd60 ffff8003 0017ca44 00000002 fd60: eb45fdc0 ffff8003 0009a33c ffff8000 006f06e0 ffff8000 006f06e0 ffff8000 fd80: fd948b40 ffff8003 0009a320 ffff8000 00000000 00000000 00735000 ffff8000 fda0: 006d3090 ffff8000 006aa25c ffff8000 00735000 ffff8000 006d3030 ffff8000 fdc0: eb45fdd0 ffff8003 000814c0 ffff8000 eb45fe50 ffff8003 006aaac4 ffff8000 fde0: 006ddd90 ffff8000 00000006 00000000 006d3000 ffff8000 00000095 00000000 fe00: 006a1e90 ffff8000 00735000 ffff8000 006d3000 ffff8000 006aa25c ffff8000 fe20: 00735000 ffff8000 006d3030 ffff8000 eb45fe50 ffff8003 006fac68 ffff8000 fe40: 00000006 00000006 fe293ee6 ffff8003 eb45feb0 ffff8003 004f8ee8 ffff8000 fe60: 004f8ed4 ffff8000 00735000 ffff8000 00000000 00000000 00000000 00000000 fe80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 fea0: 00000000 00000000 00000000 00000000 00000000 00000000 000843d0 ffff8000 fec0: 004f8ed4 ffff8000 00000000 00000000 00000000 00000000 00000000 00000000 fee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ff00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ff20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ff40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ff60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ff80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ffa0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000005 00000000 ffe0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 Call trace: [<ffff80000009b768>] unmap_range+0x120/0x580 [<ffff80000009cad4>] free_hyp_pgds+0xa8/0xe4 [<ffff80000009b374>] kvm_arch_init+0x268/0x44c [<ffff80000009606c>] kvm_init+0x24/0x260 [<ffff80000009a338>] arm_init+0x18/0x24 [<ffff8000000814bc>] do_one_initcall+0x88/0x1a0 [<ffff8000006aaac0>] kernel_init_freeable+0x148/0x1e8 [<ffff8000004f8ee4>] kernel_init+0x10/0xd4 Code: 8b000263 92628479 d1000720 eb01001f (f9400340) ---[ end trace 3bc230562e926fa4 ]--- Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jungseok Lee <jungseoklee85@gmail.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-25arm, arm64: KVM: handle potential incoherency of readonly memslotsArd Biesheuvel
Readonly memslots are often used to implement emulation of ROMs and NOR flashes, in which case the guest may legally map these regions as uncached. To deal with the incoherency associated with uncached guest mappings, treat all readonly memslots as incoherent, and ensure that pages that belong to regions tagged as such are flushed to DRAM before being passed to the guest. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-11-25arm, arm64: KVM: allow forced dcache flush on page faultsLaszlo Ersek
To allow handling of incoherent memslots in a subsequent patch, this patch adds a paramater 'ipa_uncached' to cache_coherent_guest_page() so that we can instruct it to flush the page's contents to DRAM even if the guest has caching globally enabled. Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-11-25arm/arm64: kvm: drop inappropriate use of kvm_is_mmio_pfn()Ard Biesheuvel
Instead of using kvm_is_mmio_pfn() to decide whether a host region should be stage 2 mapped with device attributes, add a new static function kvm_is_device_pfn() that disregards RAM pages with the reserved bit set, as those should usually not be mapped as device memory. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-10-15arm: kvm: STRICT_MM_TYPECHECKS fix for user_mem_abortSteve Capper
Commit: b886576 ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping introduced some code in user_mem_abort that failed to compile if STRICT_MM_TYPECHECKS was enabled. This patch fixes up the failing comparison. Signed-off-by: Steve Capper <steve.capper@linaro.org> Reviewed-by: Kim Phillips <kim.phillips@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-14arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZEChristoffer Dall
When creating or moving a memslot, make sure the IPA space is within the addressable range of the guest. Otherwise, user space can create too large a memslot and KVM would try to access potentially unallocated page table entries when inserting entries in the Stage-2 page tables. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-14arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2Christoffer Dall
This patch adds the necessary support for all host kernel PGSIZE and VA_SPACE configuration options for both EL2 and the Stage-2 page tables. However, for 40bit and 42bit PARange systems, the architecture mandates that VTCR_EL2.SL0 is maximum 1, resulting in fewer levels of stage-2 pagge tables than levels of host kernel page tables. At the same time, systems with a PARange > 42bit, we limit the IPA range by always setting VTCR_EL2.T0SZ to 24. To solve the situation with different levels of page tables for Stage-2 translation than the host kernel page tables, we allocate a dummy PGD with pointers to our actual inital level Stage-2 page table, in order for us to reuse the kernel pgtable manipulation primitives. Reproducing all these in KVM does not look pretty and unnecessarily complicates the 32-bit side. Systems with a PARange < 40bits are not yet supported. [ I have reworked this patch from its original form submitted by Jungseok to take the architecture constraints into consideration. There were too many changes from the original patch for me to preserve the authorship. Thanks to Catalin Marinas for his help in figuring out a good solution to this challenge. I have also fixed various bugs and missing error code handling from the original patch. - Christoffer ] Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Jungseok Lee <jungseoklee85@gmail.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-13arm/arm64: KVM: map MMIO regions at creation timeArd Biesheuvel
There is really no point in faulting in memory regions page by page if they are not backed by demand paged system RAM but by a linear passthrough mapping of a host MMIO region. So instead, detect such regions at setup time and install the mappings for the backing all at once. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-10arm/arm64: KVM: add 'writable' parameter to kvm_phys_addr_ioremapArd Biesheuvel
Add support for read-only MMIO passthrough mappings by adding a 'writable' parameter to kvm_phys_addr_ioremap. For the moment, mappings will be read-write even if 'writable' is false, but once the definition of PAGE_S2_DEVICE gets changed, those mappings will be created read-only. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-10arm/arm64: KVM: fix potential NULL dereference in user_mem_abort()Ard Biesheuvel
Handle the potential NULL return value of find_vma_intersection() before dereferencing it. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-10arm/arm64: KVM: use __GFP_ZERO not memset() to get zeroed pagesArd Biesheuvel
Pass __GFP_ZERO to __get_free_pages() instead of calling memset() explicitly. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-09-26arm/arm64: KVM: Report correct FSC for unsupported fault typesChristoffer Dall
When we catch something that's not a permission fault or a translation fault, we log the unsupported FSC in the kernel log, but we were masking off the bottom bits of the FSC which was not very helpful. Also correctly report the FSC for data and instruction faults rather than telling people it was a DFCS, which doesn't exist in the ARM ARM. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-09-11ARM/arm64: KVM: fix use of WnR bit in kvm_is_write_fault()Ard Biesheuvel
The ISS encoding for an exception from a Data Abort has a WnR bit[6] that indicates whether the Data Abort was caused by a read or a write instruction. While there are several fields in the encoding that are only valid if the ISV bit[24] is set, WnR is not one of them, so we can read it unconditionally. Instead of fixing both implementations of kvm_is_write_fault() in place, reimplement it just once using kvm_vcpu_dabt_iswrite(), which already does the right thing with respect to the WnR bit. Also fix up the callers to pass 'vcpu' Acked-by: Laszlo Ersek <lersek@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-08-27arm/arm64: KVM: Support KVM_CAP_READONLY_MEMChristoffer Dall
When userspace loads code and data in a read-only memory regions, KVM needs to be able to handle this on arm and arm64. Specifically this is used when running code directly from a read-only flash device; the common scenario is a UEFI blob loaded with the -bios option in QEMU. Note that the MMIO exit on writes to a read-only memory is ABI and can be used to emulate block-erase style flash devices. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-07-11ARM: KVM: user_mem_abort: support stage 2 MMIO page mappingKim Phillips
A userspace process can map device MMIO memory via VFIO or /dev/mem, e.g., for platform device passthrough support in QEMU. During early development, we found the PAGE_S2 memory type being used for MMIO mappings. This patch corrects that by using the more strongly ordered memory type for device MMIO mappings: PAGE_S2_DEVICE. Signed-off-by: Kim Phillips <kim.phillips@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-07-11ARM: KVM: Unmap IPA on memslot delete/moveEric Auger
Currently when a KVM region is deleted or moved after KVM_SET_USER_MEMORY_REGION ioctl, the corresponding intermediate physical memory is not unmapped. This patch corrects this and unmaps the region's IPA range in kvm_arch_commit_memory_region using unmap_stage2_range. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-07-11arm/arm64: KVM: Fix and refactor unmap_rangeChristoffer Dall
unmap_range() was utterly broken, to quote Marc, and broke in all sorts of situations. It was also quite complicated to follow and didn't follow the usual scheme of having a separate iterating function for each level of page tables. Address this by refactoring the code and introduce a pgd_clear() function. Reviewed-by: Jungseok Lee <jays.lee@samsung.com> Reviewed-by: Mario Smarduch <m.smarduch@samsung.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-04-28arm: KVM: fix possible misalignment of PGDs and bounce pageMark Salter
The kvm/mmu code shared by arm and arm64 uses kalloc() to allocate a bounce page (if hypervisor init code crosses page boundary) and hypervisor PGDs. The problem is that kalloc() does not guarantee the proper alignment. In the case of the bounce page, the page sized buffer allocated may also cross a page boundary negating the purpose and leading to a hang during kvm initialization. Likewise the PGDs allocated may not meet the minimum alignment requirements of the underlying MMU. This patch uses __get_free_page() to guarantee the worst case alignment needs of the bounce page and PGDs on both arm and arm64. Cc: <stable@vger.kernel.org> # 3.10+ Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-03-03ARM: KVM: fix warning in mmu.cMarc Zyngier
Compiling with THP enabled leads to the following warning: arch/arm/kvm/mmu.c: In function ‘unmap_range’: arch/arm/kvm/mmu.c:177:39: warning: ‘pte’ may be used uninitialized in this function [-Wmaybe-uninitialized] if (kvm_pmd_huge(*pmd) || page_empty(pte)) { ^ Code inspection reveals that these two cases are mutually exclusive, so GCC is a bit overzealous here. Silence it anyway by initializing pte to NULL and testing it later on. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-03-03arm64: KVM: flush VM pages before letting the guest enable cachesMarc Zyngier
When the guest runs with caches disabled (like in an early boot sequence, for example), all the writes are diectly going to RAM, bypassing the caches altogether. Once the MMU and caches are enabled, whatever sits in the cache becomes suddenly visible, which isn't what the guest expects. A way to avoid this potential disaster is to invalidate the cache when the MMU is being turned on. For this, we hook into the SCTLR_EL1 trapping code, and scan the stage-2 page tables, invalidating the pages/sections that have already been mapped in. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-03-03ARM: KVM: introduce kvm_p*d_addr_endMarc Zyngier
The use of p*d_addr_end with stage-2 translation is slightly dodgy, as the IPA is 40bits, while all the p*d_addr_end helpers are taking an unsigned long (arm64 is fine with that as unligned long is 64bit). The fix is to introduce 64bit clean versions of the same helpers, and use them in the stage-2 page table code. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-03-03arm64: KVM: force cache clean on page fault when caches are offMarc Zyngier
In order for the guest with caches off to observe data written contained in a given page, we need to make sure that page is committed to memory, and not just hanging in the cache (as guest accesses are completely bypassing the cache until it decides to enable it). For this purpose, hook into the coherent_icache_guest_page function and flush the region if the guest SCTLR_EL1 register doesn't show the MMU and caches as being enabled. The function also get renamed to coherent_cache_guest_page. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-01-08arm/arm64: KVM: relax the requirements of VMA alignment for THPMarc Zyngier
The THP code in KVM/ARM is a bit restrictive in not allowing a THP to be used if the VMA is not 2MB aligned. Actually, it is not so much the VMA that matters, but the associated memslot: A process can perfectly mmap a region with no particular alignment restriction, and then pass a 2MB aligned address to KVM. In this case, KVM will only use this 2MB aligned region, and will ignore the range between vma->vm_start and memslot->userspace_addr. It can also choose to place this memslot at whatever alignment it wants in the IPA space. In the end, what matters is the relative alignment of the user space and IPA mappings with respect to a 2M page. They absolutely must be the same if you want to use THP. Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2013-12-11arm/arm64: kvm: Use virt_to_idmap instead of virt_to_phys for idmap mappingsSantosh Shilimkar
KVM initialisation fails on architectures implementing virt_to_idmap() because virt_to_phys() on such architectures won't fetch you the correct idmap page. So update the KVM ARM code to use the virt_to_idmap() to fix the issue. Since the KVM code is shared between arm and arm64, we create kvm_virt_to_phys() and handle the redirection in respective headers. Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2013-11-19Merge tag 'kvm-arm-fixes-3.13-1' of ↵Gleb Natapov
git://git.linaro.org/people/cdall/linux-kvm-arm into next Fix percpu vmalloc allocations
2013-11-16arm/arm64: KVM: Fix hyp mappings of vmalloc regionsChristoffer Dall
Using virt_to_phys on percpu mappings is horribly wrong as it may be backed by vmalloc. Introduce kvm_kaddr_to_phys which translates both types of valid kernel addresses to the corresponding physical address. At the same time resolves a typing issue where we were storing the physical address as a 32 bit unsigned long (on arm), truncating the physical address for addresses above the 4GB limit. This caused breakage on Keystone. Cc: <stable@vger.kernel.org> [3.10+] Reported-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2013-10-17KVM: ARM: Transparent huge page (THP) supportChristoffer Dall
Support transparent huge pages in KVM/ARM and KVM/ARM64. The transparent_hugepage_adjust is not very pretty, but this is also how it's solved on x86 and seems to be simply an artifact on how THPs behave. This should eventually be shared across architectures if possible, but that can always be changed down the road. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2013-10-17KVM: ARM: Support hugetlbfs backed huge pagesChristoffer Dall
Support huge pages in KVM/ARM and KVM/ARM64. The pud_huge checking on the unmap path may feel a bit silly as the pud_huge check is always defined to false, but the compiler should be smart about this. Note: This deals only with VMAs marked as huge which are allocated by users through hugetlbfs only. Transparent huge pages can only be detected by looking at the underlying pages (or the page tables themselves) and this patch so far simply maps these on a page-by-page level in the Stage-2 page tables. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2013-09-05Merge branches 'debug-choice', 'devel-stable' and 'misc' into for-linusRussell King
2013-08-13ARM: 7808/1: KVM: mm: Get rid of L_PTE_USER ref from PAGE_S2_DEVICEChristoffer Dall
THe L_PTE_USER actually has nothing to do with stage 2 mappings and the L_PTE_S2_RDWR value sets the readable bit, which was what L_PTE_USER was used for before proper handling of stage 2 memory defines. Changelog: [v3]: Drop call to kvm_set_s2pte_writable in mmu.c [v2]: Change default mappings to be r/w instead of r/o, as per Marc Zyngier's suggestion. Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-08-07arm64: KVM: fix 2-level page tables unmappingMarc Zyngier
When using 64kB pages, we only have two levels of page tables, meaning that PGD, PUD and PMD are fused. In this case, trying to refcount PUDs and PMDs independently is a a complete disaster, as they are the same. We manage to get it right for the allocation (stage2_set_pte uses {pmd,pud}_none), but the unmapping path clears both pud and pmd refcounts, which fails spectacularly with 2-level page tables. The fix is to avoid calling clear_pud_entry when both the pmd and pud pages are empty. For this, and instead of introducing another pud_empty function, consolidate both pte_empty and pmd_empty into page_empty (the code is actually identical) and use that to also test the validity of the pud. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2013-08-07ARM: KVM: Fix unaligned unmap_range leakChristoffer Dall
The unmap_range function did not properly cover the case when the start address was not aligned to PMD_SIZE or PUD_SIZE and an entire pte table or pmd table was cleared, causing us to leak memory when incrementing the addr. The fix is to always move onto the next page table entry boundary instead of adding the full size of the VA range covered by the corresponding table level entry. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2013-07-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "On the x86 side, there are some optimizations and documentation updates. The big ARM/KVM change for 3.11, support for AArch64, will come through Catalin Marinas's tree. s390 and PPC have misc cleanups and bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits) KVM: PPC: Ignore PIR writes KVM: PPC: Book3S PR: Invalidate SLB entries properly KVM: PPC: Book3S PR: Allow guest to use 1TB segments KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry KVM: PPC: Book3S PR: Fix proto-VSID calculations KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL KVM: Fix RTC interrupt coalescing tracking kvm: Add a tracepoint write_tsc_offset KVM: MMU: Inform users of mmio generation wraparound KVM: MMU: document fast invalidate all mmio sptes KVM: MMU: document fast invalidate all pages KVM: MMU: document fast page fault KVM: MMU: document mmio page fault KVM: MMU: document write_flooding_count KVM: MMU: document clear_spte_count KVM: MMU: drop kvm_mmu_zap_mmio_sptes KVM: MMU: init kvm generation close to mmio wrap-around value KVM: MMU: add tracepoint for check_mmio_spte KVM: MMU: fast invalidate all mmio sptes ...
2013-06-26ARM: KVM: get rid of S2_PGD_SIZEMarc Zyngier
S2_PGD_SIZE defines the number of pages used by a stage-2 PGD and is unused, except for a VM_BUG_ON check that missuses the define. As the check is very unlikely to ever triggered except in circumstances where KVM is the least of our worries, just kill both the define and the VM_BUG_ON check. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-06-03ARM: KVM: be more thorough when invalidating TLBsMarc Zyngier
The KVM/ARM MMU code doesn't take care of invalidating TLBs before freeing a {pte,pmd} table. This could cause problems if the page is reallocated and then speculated into by another CPU. Reported-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-04-28ARM: KVM: perform HYP initilization for hotplugged CPUsMarc Zyngier
Now that we have the necessary infrastructure to boot a hotplugged CPU at any point in time, wire a CPU notifier that will perform the HYP init for the incoming CPU. Note that this depends on the platform code and/or firmware to boot the incoming CPU with HYP mode enabled and return to the kernel by following the normal boot path (HYP stub installed). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-04-28ARM: KVM: switch to a dual-step HYP init codeMarc Zyngier
Our HYP init code suffers from two major design issues: - it cannot support CPU hotplug, as we tear down the idmap very early - it cannot perform a TLB invalidation when switching from init to runtime mappings, as pages are manipulated from PL1 exclusively The hotplug problem mandates that we keep two sets of page tables (boot and runtime). The TLB problem mandates that we're able to transition from one PGD to another while in HYP, invalidating the TLBs in the process. To be able to do this, we need to share a page between the two page tables. A page that will have the same VA in both configurations. All we need is a VA that has the following properties: - This VA can't be used to represent a kernel mapping. - This VA will not conflict with the physical address of the kernel text The vectors page seems to satisfy this requirement: - The kernel never maps anything else there - The kernel text being copied at the beginning of the physical memory, it is unlikely to use the last 64kB (I doubt we'll ever support KVM on a system with something like 4MB of RAM, but patches are very welcome). Let's call this VA the trampoline VA. Now, we map our init page at 3 locations: - idmap in the boot pgd - trampoline VA in the boot pgd - trampoline VA in the runtime pgd The init scenario is now the following: - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd, runtime stack, runtime vectors - Enable the MMU with the boot pgd - Jump to a target into the trampoline page (remember, this is the same physical page!) - Now switch to the runtime pgd (same VA, and still the same physical page!) - Invalidate TLBs - Set stack and vectors - Profit! (or eret, if you only care about the code). Note that we keep the boot mapping permanently (it is not strictly an idmap anymore) to allow for CPU hotplug in later patches. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-04-28ARM: KVM: rework HYP page table freeingMarc Zyngier
There is no point in freeing HYP page tables differently from Stage-2. They now have the same requirements, and should be dealt with the same way. Promote unmap_stage2_range to be The One True Way, and get rid of a number of nasty bugs in the process (good thing we never actually called free_hyp_pmds before...). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-04-28ARM: KVM: move to a KVM provided HYP idmapMarc Zyngier
After the HYP page table rework, it is pretty easy to let the KVM code provide its own idmap, rather than expecting the kernel to provide it. It takes actually less code to do so. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-04-28ARM: KVM: fix HYP mapping limitations around zeroMarc Zyngier
The current code for creating HYP mapping doesn't like to wrap around zero, which prevents from mapping anything into the last page of the virtual address space. It doesn't take much effort to remove this limitation, making the code more consistent with the rest of the kernel in the process. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-04-28ARM: KVM: simplify HYP mapping populationMarc Zyngier
The way we populate HYP mappings is a bit convoluted, to say the least. Passing a pointer around to keep track of the current PFN is quite odd, and we end-up having two different PTE accessors for no good reason. Simplify the whole thing by unifying the two PTE accessors, passing a pgprot_t around, and moving the various validity checks to the upper layers. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-03-06ARM: KVM: sanitize freeing of HYP page tablesMarc Zyngier
Instead of trying to free everything from PAGE_OFFSET to the top of memory, use the virt_addr_valid macro to check the upper limit. Also do the same for the vmalloc region where the IO mappings are allocated. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-03-06ARM: KVM: change kvm_tlb_flush_vmid to kvm_tlb_flush_vmid_ipaMarc Zyngier
v8 is capable of invalidating Stage-2 by IPA, but v7 is not. Change kvm_tlb_flush_vmid() to take an IPA parameter, which is then ignored by the invalidation code (and nuke the whole TLB as it always did). This allows v8 to implement a more optimized strategy. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2013-03-06ARM: KVM: move include of asm/idmap.h to kvm_mmu.hMarc Zyngier
Since the arm64 code doesn't have a global asm/idmap.h file, move the inclusion to asm/kvm_mmu.h. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2013-03-06ARM: KVM: fix fault_ipa computingMarc Zyngier
The ARM ARM says that HPFAR reports bits [39:12] of the faulting IPA, and we need to complement it with the bottom 12 bits of the faulting VA. This is always 12 bits, irrespective of the page size. Makes it clearer in the code. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2013-03-06ARM: KVM: fix address validation for HYP mappingsMarc Zyngier
__create_hyp_mappings() performs some kind of address validation before creating the mapping, by verifying that the start address is above PAGE_OFFSET. This check is not completely correct for kernel memory (the upper boundary has to be checked as well so we do not end up with highmem pages), and wrong for IO mappings (the mapping must exist in the vmalloc region). Fix this by using the proper predicates (virt_addr_valid and is_vmalloc_addr), which also work correctly on ARM64 (where the vmalloc region is below PAGE_OFFSET). Also change the BUG_ON() into a less agressive error return. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2013-03-06ARM: KVM: allow HYP mappings to be at an offset from kernel mappingsMarc Zyngier
arm64 cannot represent the kernel VAs in HYP mode, because of the lack of TTBR1 at EL2. A way to cope with this situation is to have HYP VAs to be an offset from the kernel VAs. Introduce macros to convert a kernel VA to a HYP VA, make the HYP mapping functions use these conversion macros. Also change the documentation to reflect the existence of the offset. On ARM, where we can have an identity mapping between kernel and HYP, the macros are without any effect. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>