summaryrefslogtreecommitdiff
path: root/arch/s390/mm/pgtable.c
AgeCommit message (Collapse)Author
2023-06-19s390: allow pte_offset_map_lock() to failHugh Dickins
In rare transient cases, not yet made possible, pte_offset_map() and pte_offset_map_lock() may not find a page table: handle appropriately. Add comment on mm's contract with s390 above __zap_zero_pages(), and fix old comment there: must be called after THP was disabled. Link: https://lkml.kernel.org/r/3ff29363-336a-9733-12a1-5c31a45c8aeb@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: John David Anglin <dave.anglin@bell.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-14s390/mm: add support for RDP (Reset DAT-Protection)Gerald Schaefer
RDP instruction allows to reset DAT-protection bit in a PTE, with less CPU synchronization overhead than IPTE instruction. In particular, IPTE can cause machine-wide synchronization overhead, and excessive IPTE usage can negatively impact machine performance. RDP can be used instead of IPTE, if the new PTE only differs in SW bits and _PAGE_PROTECT HW bit, for PTE protection changes from RO to RW. SW PTE bit changes are allowed, e.g. for dirty and young tracking, but none of the other HW-defined part of the PTE must change. This is because the architecture forbids such changes to an active and valid PTE, which is why invalidation with IPTE is always used first, before writing a new entry. The RDP optimization helps mainly for fault-driven SW dirty-bit tracking. Writable PTEs are initially always mapped with HW _PAGE_PROTECT bit set, to allow SW dirty-bit accounting on first write protection fault, where the DAT-protection would then be reset. The reset is now done with RDP instead of IPTE, if RDP instruction is available. RDP cannot always guarantee that the DAT-protection reset is propagated to all CPUs immediately. This means that spurious TLB protection faults on other CPUs can now occur. For this, common code provides a flush_tlb_fix_spurious_fault() handler, which will now be used to do a CPU-local TLB flush. However, this will clear the whole TLB of a CPU, and not just the affected entry. For more fine-grained flushing, by simply doing a (local) RDP again, flush_tlb_fix_spurious_fault() would need to also provide the PTE pointer. Note that spurious TLB protection faults cannot really be distinguished from racing pagetable updates, where another thread already installed the correct PTE. In such a case, the local TLB flush would be unnecessary overhead, but overall reduction of CPU synchronization overhead by not using IPTE is still expected to be beneficial. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-06-01s390/mm: use non-quiescing sske for KVM switch to keyed guestChristian Borntraeger
The switch to a keyed guest does not require a classic sske as the other guest CPUs are not accessing the key before the switch is complete. By using the NQ SSKE things are faster especially with multiple guests. Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Suggested-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20220530092706.11637-3-borntraeger@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-03-01s390/mm,pgtable: don't use pte_val()/pXd_val() as lvalueHeiko Carstens
Convert pgtable code so pte_val()/pXd_val() aren't used as lvalue anymore. This allows in later step to convert pte_val()/pXd_val() to functions, which in turn makes it impossible to use these macros to modify page table entries like they have been used before. Therefore a construct like this: pte_val(*pte) = __pa(addr) | prot; which would directly write into a page table, isn't possible anymore with the last step of this series. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-01s390/mm: use set_pXd()/set_pte() helper functions everywhereHeiko Carstens
Use the new set_pXd()/set_pte() helper functions at all places where page table entries are modified. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-25s390/mm: optimize reset_guest_reference_bit()David Hildenbrand
We already optimize get_guest_storage_key() to assume that if we don't have a PTE table and don't have a huge page mapped that the storage key is 0. Similarly, optimize reset_guest_reference_bit() to simply do nothing if there is no PTE table and no huge page mapped. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20210909162248.14969-10-david@redhat.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-25s390/mm: optimize set_guest_storage_key()David Hildenbrand
We already optimize get_guest_storage_key() to assume that if we don't have a PTE table and don't have a huge page mapped that the storage key is 0. Similarly, optimize set_guest_storage_key() to simply do nothing in case the key to set is 0. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20210909162248.14969-9-david@redhat.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-25s390/mm: no need for pte_alloc_map_lock() if we know the pmd is presentDavid Hildenbrand
pte_map_lock() is sufficient. Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20210909162248.14969-8-david@redhat.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-25s390/mm: fix VMA and page table handling code in storage key handling functionsDavid Hildenbrand
There are multiple things broken about our storage key handling functions: 1. We should not walk/touch page tables outside of VMA boundaries when holding only the mmap sem in read mode. Evil user space can modify the VMA layout just before this function runs and e.g., trigger races with page table removal code since commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap"). gfn_to_hva() will only translate using KVM memory regions, but won't validate the VMA. 2. We should not allocate page tables outside of VMA boundaries: if evil user space decides to map hugetlbfs to these ranges, bad things will happen because we suddenly have PTE or PMD page tables where we shouldn't have them. 3. We don't handle large PUDs that might suddenly appeared inside our page table hierarchy. Don't manually allocate page tables, properly validate that we have VMA and bail out on pud_large(). All callers of page table handling functions, except get_guest_storage_key(), call fixup_user_fault() in case they receive an -EFAULT and retry; this will allocate the necessary page tables if required. To keep get_guest_storage_key() working as expected and not requiring kvm_s390_get_skeys() to call fixup_user_fault() distinguish between "there is simply no page table or huge page yet and the key is assumed to be 0" and "this is a fault to be reported". Although commit 637ff9efe5ea ("s390/mm: Add huge pmd storage key handling") introduced most of the affected code, it was actually already broken before when using get_locked_pte() without any VMA checks. Note: Ever since commit 637ff9efe5ea ("s390/mm: Add huge pmd storage key handling") we can no longer set a guest storage key (for example from QEMU during VM live migration) without actually resolving a fault. Although we would have created most page tables, we would choke on the !pmd_present(), requiring a call to fixup_user_fault(). I would have thought that this is problematic in combination with postcopy life migration ... but nobody noticed and this patch doesn't change the situation. So maybe it's just fine. Fixes: 9fcf93b5de06 ("KVM: S390: Create helper function get_guest_storage_key") Fixes: 24d5dd0208ed ("s390/kvm: Provide function for setting the guest storage key") Fixes: a7e19ab55ffd ("KVM: s390: handle missing storage-key facility") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20210909162248.14969-5-david@redhat.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-25s390/mm: validate VMA in PGSTE manipulation functionsDavid Hildenbrand
We should not walk/touch page tables outside of VMA boundaries when holding only the mmap sem in read mode. Evil user space can modify the VMA layout just before this function runs and e.g., trigger races with page table removal code since commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap"). gfn_to_hva() will only translate using KVM memory regions, but won't validate the VMA. Further, we should not allocate page tables outside of VMA boundaries: if evil user space decides to map hugetlbfs to these ranges, bad things will happen because we suddenly have PTE or PMD page tables where we shouldn't have them. Similarly, we have to check if we suddenly find a hugetlbfs VMA, before calling get_locked_pte(). Fixes: 2d42f9477320 ("s390/kvm: Add PGSTE manipulation functions") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20210909162248.14969-4-david@redhat.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-09-07s390/mm: fix kernel doc commentsHeiko Carstens
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-01mm: remove special swap entry functionsAlistair Popple
Patch series "Add support for SVM atomics in Nouveau", v11. Introduction ============ Some devices have features such as atomic PTE bits that can be used to implement atomic access to system memory. To support atomic operations to a shared virtual memory page such a device needs access to that page which is exclusive of the CPU. This series introduces a mechanism to temporarily unmap pages granting exclusive access to a device. These changes are required to support OpenCL atomic operations in Nouveau to shared virtual memory (SVM) regions allocated with the CL_MEM_SVM_ATOMICS clSVMAlloc flag. A more complete description of the OpenCL SVM feature is available at https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/ OpenCL_API.html#_shared_virtual_memory . Implementation ============== Exclusive device access is implemented by adding a new swap entry type (SWAP_DEVICE_EXCLUSIVE) which is similar to a migration entry. The main difference is that on fault the original entry is immediately restored by the fault handler instead of waiting. Restoring the entry triggers calls to MMU notifers which allows a device driver to revoke the atomic access permission from the GPU prior to the CPU finalising the entry. Patches ======= Patches 1 & 2 refactor existing migration and device private entry functions. Patches 3 & 4 rework try_to_unmap_one() by splitting out unrelated functionality into separate functions - try_to_migrate_one() and try_to_munlock_one(). Patch 5 renames some existing code but does not introduce functionality. Patch 6 is a small clean-up to swap entry handling in copy_pte_range(). Patch 7 contains the bulk of the implementation for device exclusive memory. Patch 8 contains some additions to the HMM selftests to ensure everything works as expected. Patch 9 is a cleanup for the Nouveau SVM implementation. Patch 10 contains the implementation of atomic access for the Nouveau driver. Testing ======= This has been tested with upstream Mesa 21.1.0 and a simple OpenCL program which checks that GPU atomic accesses to system memory are atomic. Without this series the test fails as there is no way of write-protecting the page mapping which results in the device clobbering CPU writes. For reference the test is available at https://ozlabs.org/~apopple/opencl_svm_atomics/ Further testing has been performed by adding support for testing exclusive access to the hmm-tests kselftests. This patch (of 10): Remove multiple similar inline functions for dealing with different types of special swap entries. Both migration and device private swap entries use the swap offset to store a pfn. Instead of multiple inline functions to obtain a struct page for each swap entry type use a common function pfn_swap_entry_to_page(). Also open-code the various entry_to_pfn() functions as this results is shorter code that is easier to understand. Link: https://lkml.kernel.org/r/20210616105937.23201-1-apopple@nvidia.com Link: https://lkml.kernel.org/r/20210616105937.23201-2-apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-14s390/pci: Implement ioremap_wc/prot() with MIONiklas Schnelle
With our current support for the new MIO PCI instructions, write combining/write back MMIO memory can be obtained via the pci_iomap_wc() and pci_iomap_wc_range() functions. This is achieved by using the write back address for a specific bar as provided in clp_store_query_pci_fn() These functions are however not widely used and instead drivers often rely on ioremap_wc() and ioremap_prot(), which on other platforms enable write combining using a PTE flag set through the pgrprot value. While we do not have a write combining flag in the low order flag bits of the PTE like x86_64 does, with MIO support, there is a write back bit in the physical address (bit 1 on z15) and thus also the PTE. Which bit is used to toggle write back and whether it is available at all, is however not fixed in the architecture. Instead we get this information from the CLP Store Logical Processor Characteristics for PCI command. When the write back bit is not provided we fall back to the existing behavior. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-08-07mm: remove unneeded includes of <asm/pgalloc.h>Mike Rapoport
Patch series "mm: cleanup usage of <asm/pgalloc.h>" Most architectures have very similar versions of pXd_alloc_one() and pXd_free_one() for intermediate levels of page table. These patches add generic versions of these functions in <asm-generic/pgalloc.h> and enable use of the generic functions where appropriate. In addition, functions declared and defined in <asm/pgalloc.h> headers are used mostly by core mm and early mm initialization in arch and there is no actual reason to have the <asm/pgalloc.h> included all over the place. The first patch in this series removes unneeded includes of <asm/pgalloc.h> In the end it didn't work out as neatly as I hoped and moving pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require unnecessary changes to arches that have custom page table allocations, so I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local to mm/. This patch (of 8): In most cases <asm/pgalloc.h> header is required only for allocations of page table memory. Most of the .c files that include that header do not use symbols declared in <asm/pgalloc.h> and do not require that header. As for the other header files that used to include <asm/pgalloc.h>, it is possible to move that include into the .c file that actually uses symbols from <asm/pgalloc.h> and drop the include from the header file. The process was somewhat automated using sed -i -E '/[<"]asm\/pgalloc\.h/d' \ $(grep -L -w -f /tmp/xx \ $(git grep -E -l '[<"]asm/pgalloc\.h')) where /tmp/xx contains all the symbols defined in arch/*/include/asm/pgalloc.h. [rppt@linux.ibm.com: fix powerpc warning] Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Joerg Roedel <joro@8bytes.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mm: don't include asm/pgtable.h if linux/mm.h is already includedMike Rapoport
Patch series "mm: consolidate definitions of page table accessors", v2. The low level page table accessors (pXY_index(), pXY_offset()) are duplicated across all architectures and sometimes more than once. For instance, we have 31 definition of pgd_offset() for 25 supported architectures. Most of these definitions are actually identical and typically it boils down to, e.g. static inline unsigned long pmd_index(unsigned long address) { return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); } static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) { return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address); } These definitions can be shared among 90% of the arches provided XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined. For architectures that really need a custom version there is always possibility to override the generic version with the usual ifdefs magic. These patches introduce include/linux/pgtable.h that replaces include/asm-generic/pgtable.h and add the definitions of the page table accessors to the new header. This patch (of 12): The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the functions involving page table manipulations, e.g. pte_alloc() and pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h> in the files that include <linux/mm.h>. The include statements in such cases are remove with a simple loop: for f in $(git grep -l "include <linux/mm.h>") ; do sed -i -e '/include <asm\/pgtable.h>/ d' $f done Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-10s390/mm: silence compiler warning when compiling without CONFIG_PGSTEThomas Huth
If CONFIG_PGSTE is not set (e.g. when compiling without KVM), GCC complains: CC arch/s390/mm/pgtable.o arch/s390/mm/pgtable.c:413:15: warning: ‘pmd_alloc_map’ defined but not used [-Wunused-function] static pmd_t *pmd_alloc_map(struct mm_struct *mm, unsigned long addr) ^~~~~~~~~~~~~ Wrap the function with "#ifdef CONFIG_PGSTE" to silence the warning. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-03-05mm: update ptep_modify_prot_commit to take old pte value as argAneesh Kumar K.V
Architectures like ppc64 require to do a conditional tlb flush based on the old and new value of pte. Enable that by passing old pte value as the arg. Link: http://lkml.kernel.org/r/20190116085035.29729-3-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: update ptep_modify_prot_start/commit to take vm_area_struct as argAneesh Kumar K.V
Patch series "NestMMU pte upgrade workaround for mprotect", v5. We can upgrade pte access (R -> RW transition) via mprotect. We need to make sure we follow the recommended pte update sequence as outlined in commit bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang") for such updates. This patch series does that. This patch (of 5): Some architectures may want to call flush_tlb_range from these helpers. Link: http://lkml.kernel.org/r/20190116085035.29729-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-18s390: remove the ptep_modify_prot_{start,commit} exportsChristoph Hellwig
These two functions are only used by core MM code, so no need to export them. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-07-30s390/mm: Add huge page gmap linking supportJanosch Frank
Let's allow huge pmd linking when enabled through the KVM_CAP_S390_HPAGE_1M capability. Also we can now restrict gmap invalidation and notification to the cases where the capability has been activated and save some cycles when that's not the case. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-30s390/mm: Add huge pmd storage key handlingJanosch Frank
Storage keys for guests with huge page mappings have to be managed in hardware. There are no PGSTEs for PMDs that we could use to retain the guests's logical view of the key. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-30s390/mm: Add huge page dirty sync supportJanosch Frank
To do dirty loging with huge pages, we protect huge pmds in the gmap. When they are written to, we unprotect them and mark them dirty. We introduce the function gmap_test_and_clear_dirty_pmd which handles dirty sync for huge pages. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com>
2018-07-30s390/mm: Add gmap pmd invalidation and clearingJanosch Frank
If the host invalidates a pmd, we also have to invalidate the corresponding gmap pmds, as well as flush them from the TLB. This is necessary, as we don't share the pmd tables between host and guest as we do with ptes. The clearing part of these three new functions sets a guest pmd entry to _SEGMENT_ENTRY_EMPTY, so the guest will fault on it and we will re-link it. Flushing the gmap is not necessary in the host's lazy local and csp cases. Both purge the TLB completely. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: David Hildenbrand <david@redhat.com>
2018-05-17KVM: s390: Add storage key facility interpretation controlJanosch Frank
Up to now we always expected to have the storage key facility available for our (non-VSIE) KVM guests. For huge page support, we need to be able to disable it, so let's introduce that now. We add the use_skf variable to manage KVM storage key facility usage. Also we rename use_skey in the mm context struct to uses_skeys to make it more clear that it is an indication that the vm actively uses storage keys. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: Farhan Ali <alifm@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-11-24s390: mm: add SPDX identifiers to the remaining filesGreg Kroah-Hartman
It's good to have SPDX identifiers in all files to make it easier to audit the kernel tree for correct licenses. Update the arch/s390/mm/ files with the correct SPDX license identifier based on the license text in the file itself. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This work is based on a script and data from Thomas Gleixner, Philippe Ombredanne, and Kate Stewart. Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-07Merge tag 'kvm-s390-next-4.14-2' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux KVM: s390: Fixes and features for 4.14 - merge of topic branch tlb-flushing from the s390 tree to get the no-dat base features - merge of kvm/master to avoid conflicts with additional sthyi fixes - wire up the no-dat enhancements in KVM - multiple epoch facility (z14 feature) - Configuration z/Architecture Mode - more sthyi fixes - gdb server range checking fix - small code cleanups
2017-08-29KVM: s390: expose no-DAT to guest and migration supportClaudio Imbrenda
The STFLE bit 147 indicates whether the ESSA no-DAT operation code is valid, the bit is not normally provided to the host; the host is instead provided with an SCLP bit that indicates whether guests can support the feature. This patch: * enables the STFLE bit in the guest if the corresponding SCLP bit is present in the host. * adds support for migrating the no-DAT bit in the PGSTEs * fixes the software interpretation of the ESSA instruction that is used when migrating, both for the new operation code and for the old "set stable", as per specifications. Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-07-25s390/mm,kvm: use nodat PGSTE tag to optimize TLB flushingMartin Schwidefsky
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-25s390/mm: add guest ASCE TLB flush optimizationMartin Schwidefsky
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-25s390/mm: add no-dat TLB flush optimizationMartin Schwidefsky
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-13s390/mm: set change and reference bit on lazy key enablementChristian Borntraeger
When we enable storage keys for a guest lazily, we reset the ACC and F values. That is correct assuming that these are 0 on a clear reset and the guest obviously has not used any key setting instruction. We also zero out the change and reference bit. This is not correct as the architecture prefers over-indication instead of under-indication for the keyless->keyed transition. This patch fixes the behaviour and always sets guest change and guest reference for all guest storage keys on the keyless -> keyed switch. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12s390/mm: implement 5 level pages tablesMartin Schwidefsky
Add the logic to upgrade the page table for a 64-bit process to five levels. This increases the TASK_SIZE from 8PB to 16EB-4K. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-04-20s390/kvm: Add PGSTE manipulation functionsClaudio Imbrenda
Add PGSTE manipulation functions: * set_pgste_bits sets specific bits in a PGSTE * get_pgste returns the whole PGSTE * pgste_perform_essa manipulates a PGSTE to set specific storage states * ESSA_[SG]ET_* macros used to indicate the action for manipulate_pgste Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Reviewed-by: Janosch Frank <frankja@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-03-02KVM: s390: Fix guest migration for huge guests resulting in panicJanosch Frank
While we can technically not run huge page guests right now, we can setup a guest with huge pages. Trying to migrate it will trigger a VM_BUG_ON and, if the kernel is not configured to panic on a BUG, it will happily try to work on non-existing page table entries. With this patch, we always return "dirty" if we encounter a large page when migrating. This at least fixes the immediate problem until we have proper handling for both kind of pages. Fixes: 15f36eb ("KVM: s390: Add proper dirty bitmap support to S390 kvm.") Cc: <stable@vger.kernel.org> # 3.16+ Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-02-22Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "4.11 is going to be a relatively large release for KVM, with a little over 200 commits and noteworthy changes for most architectures. ARM: - GICv3 save/restore - cache flushing fixes - working MSI injection for GICv3 ITS - physical timer emulation MIPS: - various improvements under the hood - support for SMP guests - a large rewrite of MMU emulation. KVM MIPS can now use MMU notifiers to support copy-on-write, KSM, idle page tracking, swapping, ballooning and everything else. KVM_CAP_READONLY_MEM is also supported, so that writes to some memory regions can be treated as MMIO. The new MMU also paves the way for hardware virtualization support. PPC: - support for POWER9 using the radix-tree MMU for host and guest - resizable hashed page table - bugfixes. s390: - expose more features to the guest - more SIMD extensions - instruction execution protection - ESOP2 x86: - improved hashing in the MMU - faster PageLRU tracking for Intel CPUs without EPT A/D bits - some refactoring of nested VMX entry/exit code, preparing for live migration support of nested hypervisors - expose yet another AVX512 CPUID bit - host-to-guest PTP support - refactoring of interrupt injection, with some optimizations thrown in and some duct tape removed. - remove lazy FPU handling - optimizations of user-mode exits - optimizations of vcpu_is_preempted() for KVM guests generic: - alternative signaling mechanism that doesn't pound on tsk->sighand->siglock" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (195 commits) x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64 x86/paravirt: Change vcp_is_preempted() arg type to long KVM: VMX: use correct vmcs_read/write for guest segment selector/base x86/kvm/vmx: Defer TR reload after VM exit x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss x86/kvm/vmx: Simplify segment_base() x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels x86/kvm/vmx: Don't fetch the TSS base from the GDT x86/asm: Define the kernel TSS limit in a macro kvm: fix page struct leak in handle_vmon KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now KVM: Return an error code only as a constant in kvm_get_dirty_log() KVM: Return an error code only as a constant in kvm_get_dirty_log_protect() KVM: Return directly after a failed copy_from_user() in kvm_vm_compat_ioctl() KVM: x86: remove code for lazy FPU handling KVM: race-free exit from KVM_RUN without POSIX signals KVM: PPC: Book3S HV: Turn "KVM guest htab" message into a debug message KVM: PPC: Book3S PR: Ratelimit copy data failure error messages KVM: Support vCPU-based gfn->hva cache KVM: use separate generations for each address space ...
2017-02-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: - New entropy generation for the pseudo random number generator. - Early boot printk output via sclp to help debug crashes on boot. This needs to be enabled with a kernel parameter. - Add proper no-execute support with a bit in the page table entry. - Bug fixes and cleanups. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (65 commits) s390/syscall: fix single stepped system calls s390/zcrypt: make ap_bus explicitly non-modular s390/zcrypt: Removed unneeded debug feature directory creation. s390: add missing "do {} while (0)" loop constructs to multiline macros s390/mm: add cond_resched call to kernel page table dumper s390: get rid of MACHINE_HAS_PFMF and MACHINE_HAS_HPAGE s390/mm: make memory_block_size_bytes available for !MEMORY_HOTPLUG s390: replace ACCESS_ONCE with READ_ONCE s390: Audit and remove any remaining unnecessary uses of module.h s390: mm: Audit and remove any unnecessary uses of module.h s390: kernel: Audit and remove any unnecessary uses of module.h s390/kdump: Use "LINUX" ELF note name instead of "CORE" s390: add no-execute support s390: report new vector facilities s390: use correct input data address for setup_randomness s390/sclp: get rid of common response code handling s390/sclp: don't add new lines to each printed string s390/sclp: make early sclp code readable s390/sclp: disable early sclp code as soon as the base sclp driver is active s390/sclp: move early printk code to drivers ...
2017-02-08s390: add no-execute supportMartin Schwidefsky
Bit 0x100 of a page table, segment table of region table entry can be used to disallow code execution for the virtual addresses associated with the entry. There is one tricky bit, the system call to return from a signal is part of the signal frame written to the user stack. With a non-executable stack this would stop working. To avoid breaking things the protection fault handler checks the opcode that caused the fault for 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn) and injects a system call. This is preferable to the alternative solution with a stub function in the vdso because it works for vdso=off and statically linked binaries as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-30KVM: s390: Fix RRBE return code not being CCJanosch Frank
reset_guest_reference_bit needs to return the CC, so we can set it in the guest PSW when emulating RRBE. Right now it only returns 0. Let's fix that. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-01-24s390/mm: Fix cmma unused transfer from pgste into pteChristian Borntraeger
The last pgtable rework silently disabled the CMMA unused state by setting a local pte variable (a parameter) instead of propagating it back into the caller. Fix it. Fixes: ebde765c0e85 ("s390/mm: uninline ptep_xxx functions from pgtable.h") Cc: stable@vger.kernel.org # v4.6+ Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-24s390/mm: merge local / non-local IDTE helperMartin Schwidefsky
Merge the __p[m|u]xdp_idte and __p[m|u]dp_idte_local functions into a single __p[m|u]dp_idte function with an additional parameter. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-24s390/mm: merge local / non-local IPTE helperMartin Schwidefsky
Merge the __ptep_ipte and __ptep_ipte_local functions into a single __ptep_ipte function with an additional parameter. The __pte_ipte_range function is still extra as the while loops makes it hard to merge. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: - ARM: GICv3 ITS emulation and various fixes. Removal of the old VGIC implementation. - s390: support for trapping software breakpoints, nested virtualization (vSIE), the STHYI opcode, initial extensions for CPU model support. - MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups, preliminary to this and the upcoming support for hardware virtualization extensions. - x86: support for execute-only mappings in nested EPT; reduced vmexit latency for TSC deadline timer (by about 30%) on Intel hosts; support for more than 255 vCPUs. - PPC: bugfixes. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits) KVM: PPC: Introduce KVM_CAP_PPC_HTM MIPS: Select HAVE_KVM for MIPS64_R{2,6} MIPS: KVM: Reset CP0_PageMask during host TLB flush MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX() MIPS: KVM: Sign extend MFC0/RDHWR results MIPS: KVM: Fix 64-bit big endian dynamic translation MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase MIPS: KVM: Use 64-bit CP0_EBase when appropriate MIPS: KVM: Set CP0_Status.KX on MIPS64 MIPS: KVM: Make entry code MIPS64 friendly MIPS: KVM: Use kmap instead of CKSEG0ADDR() MIPS: KVM: Use virt_to_phys() to get commpage PFN MIPS: Fix definition of KSEGX() for 64-bit KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD kvm: x86: nVMX: maintain internal copy of current VMCS KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures KVM: arm64: vgic-its: Simplify MAPI error handling KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers KVM: arm64: vgic-its: Turn device_id validation into generic ID validation ...
2016-07-06s390/mm: add support for 2GB hugepagesGerald Schaefer
This adds support for 2GB hugetlbfs pages on s390. Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-20s390/mm: shadow pages with real guest requested protectionDavid Hildenbrand
We really want to avoid manually handling protection for nested virtualization. By shadowing pages with the protection the guest asked us for, the SIE can handle most protection-related actions for us (e.g. special handling for MVPG) and we can directly forward protection exceptions to the guest. PTEs will now always be shadowed with the correct _PAGE_PROTECT flag. Unshadowing will take care of any guest changes to the parent PTE and any host changes to the host PTE. If the host PTE doesn't have the fitting access rights or is not available, we have to fix it up. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20s390/mm: add shadow gmap supportMartin Schwidefsky
For a nested KVM guest the outer KVM host needs to create shadow page tables for the nested guest. This patch adds the basic support to the guest address space (gmap) code. For each guest address space the inner KVM host creates, the first outer KVM host needs to create shadow page tables. The address space is identified by the ASCE loaded into the control register 1 at the time the inner SIE instruction for the second nested KVM guest is executed. The outer KVM host creates the shadow tables starting with the table identified by the ASCE on a on-demand basis. The outer KVM host will get repeated faults for all the shadow tables needed to run the second KVM guest. While a shadow page table for the second KVM guest is active the access to the origin region, segment and page tables needs to be restricted for the first KVM guest. For region and segment and page tables the first KVM guest may read the memory, but write attempt has to lead to an unshadow. This is done using the page invalid and read-only bits in the page table of the first KVM guest. If the first guest re-accesses one of the origin pages of a shadow, it gets a fault and the affected parts of the shadow page table hierarchy needs to be removed again. PGSTE tables don't have to be shadowed, as all interpretation assist can't deal with the invalid bits in the shadow pte being set differently than the original ones provided by the first KVM guest. Many bug fixes and improvements by David Hildenbrand. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20s390/mm: extended gmap pte notifierMartin Schwidefsky
The current gmap pte notifier forces a pte into to a read-write state. If the pte is invalidated the gmap notifier is called to inform KVM that the mapping will go away. Extend this approach to allow read-write, read-only and no-access as possible target states and call the pte notifier for any change to the pte. This mechanism is used to temporarily set specific access rights for a pte without doing the heavy work of a true mprotect call. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-13s390/mm: simplify the TLB flushing codeMartin Schwidefsky
ptep_flush_lazy and pmdp_flush_lazy use mm->context.attach_count to decide between a lazy TLB flush vs an immediate TLB flush. The field contains two 16-bit counters, the number of CPUs that have the mm attached and can create TLB entries for it and the number of CPUs in the middle of a page table update. The __tlb_flush_asce, ptep_flush_direct and pmdp_flush_direct functions use the attach counter and a mask check with mm_cpumask(mm) to decide between a local flush local of the current CPU and a global flush. For all these functions the decision between lazy vs immediate and local vs global TLB flush can be based on CPU masks. There are two masks: the mm->context.cpu_attach_mask with the CPUs that are actively using the mm, and the mm_cpumask(mm) with the CPUs that have used the mm since the last full flush. The decision between lazy vs immediate flush is based on the mm->context.cpu_attach_mask, to decide between local vs global flush the mm_cpumask(mm) is used. With this patch all checks will use the CPU masks, the old counter mm->context.attach_count with its two 16-bit values is turned into a single counter mm->context.flush_count that keeps track of the number of CPUs with incomplete page table updates. The sole user of this counter is finish_arch_post_lock_switch() which waits for the end of all page table updates. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/mm: fix vunmap vs finish_arch_post_lock_switchMartin Schwidefsky
The vunmap_pte_range() function calls ptep_get_and_clear() without any locking. ptep_get_and_clear() uses ptep_xchg_lazy()/ptep_flush_direct() for the page table update. ptep_flush_direct requires that preemption is disabled, but without any locking this is not the case. If the kernel preempts the task while the attach_counter is increased an endless loop in finish_arch_post_lock_switch() will occur the next time the task is scheduled. Add explicit preempt_disable()/preempt_enable() calls to the relevant functions in arch/s390/mm/pgtable.c. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13KVM: s390/mm: Fix CMMA reset during rebootChristian Borntraeger
commit 1e133ab296f ("s390/mm: split arch/s390/mm/pgtable.c") factored out the page table handling code from __gmap_zap and __s390_reset_cmma into ptep_zap_unused and added a simple flag that tells which one of the function (reset or not) is to be made. This also changed the behaviour, as it also zaps unused page table entries on reset. Turns out that this is wrong as s390_reset_cmma uses the page walker, which DOES NOT take the ptl lock. The most simple fix is to not do the zapping part on reset (which uses the walker) Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Fixes: 1e133ab296f ("s390/mm: split arch/s390/mm/pgtable.c") Cc: stable@vger.kernel.org # 4.6+ Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-10KVM: s390: handle missing storage-key facilityDavid Hildenbrand
Without the storage-key facility, SIE won't interpret SSKE, ISKE and RRBE for us. So let's add proper interception handlers that will be called if lazy sske cannot be enabled. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>