summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2017-06-22x86/ioapic: Create named irq domainThomas Gleixner
Use the fwnode to create a named domain so diagnosis works, but only when the the ioapic is not device tree based. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235443.752782603@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-22x86/vector: Create named irq domainThomas Gleixner
Use the fwnode to create a named domain so diagnosis works. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235443.673635238@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-22x86/apic: Add name to irq chipThomas Gleixner
Add the missing name, so debugging will work proper. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235443.266561988@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-22x86/tsc: Call check_system_tsc_reliable() before unsynchronized_tsc()Zhenzhong Duan
tsc_clocksource_reliable is initialized in check_system_tsc_reliable(), but it is checked in unsynchronized_tsc() which is called before the initialization. In practice that's not an issue because systems which mark the TSC reliable have X86_FEATURE_CONSTANT_TSC set as well, which is evaluated in unsynchronized_tsc() before tsc_clocksource_reliable. Reorder the calls so initialization happens before usage. [ tglx: Massaged changelog ] Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/b1532ef7-cd9f-45f7-9f49-48dd2a5c2495@default
2017-06-22x86/hyperv: Read TSC frequency from a synthetic MSRVitaly Kuznetsov
It was found that SMI_TRESHOLD of 50000 is not enough for Hyper-V guests in nested environment and falling back to counting jiffies is not an option for Gen2 guests as they don't have PIT. As Hyper-V provides TSC frequency in a synthetic MSR we can just use this information instead of doing a error prone calibration. Reported-and-tested-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jork Loeser <jloeser@microsoft.com> Cc: devel@linuxdriverproject.org Cc: "K. Y. Srinivasan" <kys@microsoft.com> Link: http://lkml.kernel.org/r/20170622100730.18112-3-vkuznets@redhat.com
2017-06-22x86/hyperv: Check frequency MSRs presence according to the specificationVitaly Kuznetsov
Hyper-V TLFS specifies two bits which should be checked before accessing frequency MSRs: - AccessFrequencyMsrs (BIT(11) in EAX) which indicates if we have access to frequency MSRs. - FrequencyMsrsAvailable (BIT(8) in EDX) which indicates is these MSRs are present. Rename and specify these bits accordingly. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Ladi Prosek <lprosek@redhat.com> Cc: Jork Loeser <jloeser@microsoft.com> Cc: devel@linuxdriverproject.org Cc: "K. Y. Srinivasan" <kys@microsoft.com> Link: http://lkml.kernel.org/r/20170622100730.18112-2-vkuznets@redhat.com
2017-06-22x86/debug: Extend the lower bound of crash kernel low reservationsJiri Bohac
The following change in 2013: 0212f9159694 ("x86: Add Crash kernel low reservation") ... introduced reserve_crashkernel_low(). This function is used to reserve crash kernel memory either if crashkernel=size,low is given on the command line or if the region reserved by reserve_crashkernel is entirely above 4G. reserve_crashkernel_low() tries to find a block of 'low_size' bytes. But there seems to be no good reason to restrict the lower bound of the range to 'low_size'. Make memblock_find_in_range() search from the start of memory. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20170616161602.2r7birrf2y3ylv6v@dwarf.suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22x86/mm: Remove reset_lazy_tlbstate()Andy Lutomirski
The only call site also calls idle_task_exit(), and idle_task_exit() puts us into a clean state by explicitly switching to init_mm. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/3acc7ad02a2ec060d2321a1e0f6de1cb90069517.1498022414.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22Merge branch 'linus' into x86/mm, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22x86/apic: Make arch_init_msi/htirq_domain __initDou Liyang
These two functions are only called by arch_early_irq_init(), which is an __init function, so mark them __init as well. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1498101341-10182-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22x86/apic: Make init_legacy_irqs() __initDou Liyang
This function is only called by arch_early_irq_init(), which is an __init function, so mark the child function __init as well. In addition mark it inline for the !CONFIG_X86_IO_APIC case. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1498040061-5332-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20x86/MCE, xen/mcelog: Make /dev/mcelog registration messages more preciseJuergen Gross
When running under Xen as dom0, /dev/mcelog is being provided by Xen instead of the normal mcelog character device of the MCE core. Convert an error message being issued by the MCE core in this case to an informative message that Xen has registered the device. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170614084059.19294-1-jgross@suse.com
2017-06-20x86/boot/64: Put __startup_64() into .head.textKirill A. Shutemov
Put __startup_64() and fixup_pointer() into .head.text section to make sure it's always near startup_64() and always callable. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel test robot <fengguang.wu@intel.com> Cc: wfg@linux.intel.com Link: http://lkml.kernel.org/r/20170616113024.ajmif63cmcszry5a@black.fi.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20x86/microcode/intel: Save pointer to ucode patch for early AP loadingBorislav Petkov
Normally, when the initrd is gone, we can't search it for microcode blobs to apply anymore. For that we need to stash away the patch in our own storage. And save_microcode_in_initrd_intel() looks like the proper place to do that from. So in order for early loading to work, invalidate the intel_ucode_patch pointer to the patch *before* scanning the initrd one last time. If the scanning code finds a microcode patch, it will assign that pointer again, this time with our own storage's address. This way, early microcode application during resume-from-RAM works too, even after the initrd is long gone. Tested-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170614140626.4462-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20x86/microcode: Look for the initrd at the correct address on 32-bitBorislav Petkov
Early during boot, the BSP finds the ramdisk's position from boot_params but by the time the APs get to boot, the BSP has continued in the mean time and has potentially managed to relocate that ramdisk. And in that case, the APs need to find the ramdisk at its new position, in *physical* memory as they're running before paging has been enabled. Thus, get the updated physical location of the ramdisk which is in the relocated_ramdisk variable. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170614140626.4462-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20x86/nmi: Fix timeout test in test_nmi_ipi()Dan Carpenter
We're supposed to exit the loop with "timeout" set to zero. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Fixes: 99e8b9ca90d6 ("x86, NMI: Add NMI IPI selftest") Link: http://lkml.kernel.org/r/20170619105304.GA23995@elgon.mountain Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20Merge branch 'WIP.sched/core' into sched/coreIngo Molnar
Conflicts: kernel/sched/Makefile Pick up the waitqueue related renames - it didn't get much feedback, so it appears to be uncontroversial. Famous last words? ;-) Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20x86/hpet: Do not use smp_processor_id() in preemptible codeBorislav Petkov
When hpet=force is supplied on the kernel command line and the HPET supports the Legacy Replacement Interrupt Route option (HPET_ID_LEGSUP), the legacy interrupts init code uses the boot CPU's mask initially by calling smp_processor_id() assuming that it is running on the BSP. It does run on the BSP but the code region is preemptible and the preemption check fires. Simply use the BSP's id directly to avoid the warning. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170620093154.18472-1-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-19mm: larger stack guard gap, between vmasHugh Dickins
Stack guard page is a useful feature to reduce a risk of stack smashing into a different mapping. We have been using a single page gap which is sufficient to prevent having stack adjacent to a different mapping. But this seems to be insufficient in the light of the stack usage in userspace. E.g. glibc uses as large as 64kB alloca() in many commonly used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX] which is 256kB or stack strings with MAX_ARG_STRLEN. This will become especially dangerous for suid binaries and the default no limit for the stack size limit because those applications can be tricked to consume a large portion of the stack and a single glibc call could jump over the guard page. These attacks are not theoretical, unfortunatelly. Make those attacks less probable by increasing the stack guard gap to 1MB (on systems with 4k pages; but make it depend on the page size because systems with larger base pages might cap stack allocations in the PAGE_SIZE units) which should cover larger alloca() and VLA stack allocations. It is obviously not a full fix because the problem is somehow inherent, but it should reduce attack space a lot. One could argue that the gap size should be configurable from userspace, but that can be done later when somebody finds that the new 1MB is wrong for some special case applications. For now, add a kernel command line option (stack_guard_gap) to specify the stack gap size (in page units). Implementation wise, first delete all the old code for stack guard page: because although we could get away with accounting one extra page in a stack vma, accounting a larger gap can break userspace - case in point, a program run with "ulimit -S -v 20000" failed when the 1MB gap was counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK and strict non-overcommit mode. Instead of keeping gap inside the stack vma, maintain the stack guard gap as a gap between vmas: using vm_start_gap() in place of vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few places which need to respect the gap - mainly arch_get_unmapped_area(), and and the vma tree's subtree_gap support for that. Original-patch-by: Oleg Nesterov <oleg@redhat.com> Original-patch-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Tested-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-15iommu/vt-d: Correctly disable Intel IOMMU force onShaohua Li
I made a mistake in commit bfd20f1. We should skip the force on with the option enabled instead of vice versa. Not sure why this passed our performance test, sorry. Fixes: bfd20f1cc850 ('x86, iommu/vt-d: Add an option to disable Intel IOMMU force on') Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-06-14x86/mce: Update bootlog description to reflect behavior on AMDYazen Ghannam
The bootlog option is only disabled by default on AMD Fam10h and older systems. Update bootlog description to say this. Change the family value to hex to avoid confusion. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170613162835.30750-9-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14x86/mce: Don't disable MCA banks when offlining a CPU on AMDYazen Ghannam
AMD systems have non-core, shared MCA banks within a die. These banks are controlled by a master CPU per die. If this CPU is offlined then all the shared banks are disabled in addition to the CPU's core banks. Also, Fam17h systems may have SMT enabled. The MCA_CTL register is shared between SMT thread siblings. If a CPU is offlined then all its sibling's MCA banks are also disabled. Extend the existing vendor check to AMD too. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> [ Fix up comment. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170613162835.30750-8-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14x86/mce/mce-inject: Preset the MCE injection structBorislav Petkov
Populate the MCE injection struct before doing initial injection so that values which don't change have sane defaults. Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20170613162835.30750-7-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14x86/mce: Clean up include filesBorislav Petkov
Not really needed. Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20170613162835.30750-6-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14x86/mce: Get rid of register_mce_write_callback()Borislav Petkov
Make the mcelog call a notifier which lands in the injector module and does the injection. This allows for mce-inject to be a normal kernel module now. Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20170613162835.30750-5-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14x86/mce: Merge mce_amd_inj into mce-injectBorislav Petkov
Reuse mce_amd_inj's debugfs interface so that mce-inject can benefit from it too. The old functionality is still preserved under CONFIG_X86_MCELOG_LEGACY. Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20170613162835.30750-4-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14x86/mce/AMD: Use saved threshold block info in interrupt handlerYazen Ghannam
In the amd_threshold_interrupt() handler, we loop through every possible block in each bank and rediscover the block's address and if it's valid, e.g. valid, counter present and not locked. However, we already have the address saved in the threshold blocks list for each CPU and bank. The list only contains blocks that have passed all the valid checks. Besides the redundancy, there's also a smp_call_function* in get_block_address() which causes a warning when servicing the interrupt: WARNING: CPU: 0 PID: 0 at kernel/smp.c:281 smp_call_function_single+0xdd/0xf0 ... Call Trace: <IRQ> rdmsr_safe_on_cpu() get_block_address.isra.2() amd_threshold_interrupt() smp_threshold_interrupt() threshold_interrupt() because we do get called in an interrupt handler *with* interrupts disabled, which can result in a deadlock. Drop the redundant valid checks and move the overflow check, logging and block reset into a separate function. Check the first block then iterate over the rest. This procedure is needed since the first block is used as the head of the list. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170613162835.30750-3-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14x86/mce/AMD: Use msr_stat when clearing MCA_STATUSYazen Ghannam
The value of MCA_STATUS is used as the MSR when clearing MCA_STATUS. This may cause the following warning: unchecked MSR access error: WRMSR to 0x11b (tried to write 0x0000000000000000) Call Trace: <IRQ> smp_threshold_interrupt() threshold_interrupt() Use msr_stat instead which has the MSR address. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: 37d43acfd79f ("x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers") Link: http://lkml.kernel.org/r/20170613162835.30750-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14Merge tag 'v4.12-rc5' into ras/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13x86/boot/64: Add support of additional page table level during early bootKirill A. Shutemov
This patch adds support for 5-level paging during early boot. It generalizes boot for 4- and 5-level paging on 64-bit systems with compile-time switch between them. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-10-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13x86/boot/64: Rename init_level4_pgt and early_level4_pgtKirill A. Shutemov
With CONFIG_X86_5LEVEL=y, level 4 is no longer top level of page tables. Let's give these variable more generic names: init_top_pgt and early_top_pgt. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-9-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13x86/boot/64: Rewrite startup_64() in CKirill A. Shutemov
The patch write most of startup_64 logic in C. This is preparation for 5-level paging enabling. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-8-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3()Andy Lutomirski
The kernel has several code paths that read CR3. Most of them assume that CR3 contains the PGD's physical address, whereas some of them awkwardly use PHYSICAL_PAGE_MASK to mask off low bits. Add explicit mask macros for CR3 and convert all of the CR3 readers. This will keep them from breaking when PCID is enabled. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/883f8fb121f4616c1c1427ad87350bb2f5ffeca1.1497288170.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13Merge branch 'sched/urgent' into x86/mm, to pick up dependent fixIngo Molnar
Andy will need the following scheduler fix for the PCID series: 252d2a4117bc: sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off() So do a cross-merge. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13x86/time: Make setup_default_timer_irq() staticDou Liyang
This function isn't used outside of time.c, so let's mark it static. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1497321029-29049-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-12x86/debug: Handle early WARN_ONs properPeter Zijlstra
Hans managed to trigger a WARN very early in the boot which killed his (Virtual) box. The reason is that the recent rework of WARN() to use UD0 forgot to add the fixup_bug() call to early_fixup_exception(). As a result the kernel does not handle the WARN_ON injected UD0 exception and panics. Add the missing fixup call, so early UD's injected by WARN() get handled. Fixes: 9a93848fe787 ("x86/debug: Implement __WARN() using UD0") Reported-and-tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Frank Mehnert <frank.mehnert@oracle.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Michael Thayer <michael.thayer@oracle.com> Link: http://lkml.kernel.org/r/20170612180108.w4vgu2ckucmllf3a@hirez.programming.kicks-ass.net
2017-06-11Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "Bug fixes (ARM, s390, x86)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: async_pf: avoid async pf injection when in guest mode KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation arm: KVM: Allow unaligned accesses at HYP arm64: KVM: Allow unaligned accesses at EL2 arm64: KVM: Preserve RES1 bits in SCTLR_EL2 KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages KVM: nVMX: Fix exception injection kvm: async_pf: fix rcu_irq_enter() with irqs enabled KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction KVM: s390: fix ais handling vs cpu model KVM: arm/arm64: Fix isues with GICv2 on GICv3 migration
2017-06-08x86/microcode/intel: Clear patch pointer before jettisoning the initrdDominik Brodowski
During early boot, load_ucode_intel_ap() uses __load_ucode_intel() to obtain a pointer to the relevant microcode patch (embedded in the initrd), and stores this value in 'intel_ucode_patch' to speed up the microcode patch application for subsequent CPUs. On resuming from suspend-to-RAM, however, load_ucode_ap() calls load_ucode_intel_ap() for each non-boot-CPU. By then the initramfs is long gone so the pointer stored in 'intel_ucode_patch' no longer points to a valid microcode patch. Clear that pointer so that we effectively fall back to the CPU hotplug notifier callbacks to update the microcode. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> [ Edit and massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.10.. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170607095819.9754-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-08x86/ldt: Rename ldt_struct::size to ::nr_entriesBorislav Petkov
... because this is exactly what it is: the number of entries in the LDT. Calling it "size" is simply confusing and it is actually begging to be called "nr_entries" or somesuch, especially if you see constructs like: alloc_size = size * LDT_ENTRY_SIZE; since LDT_ENTRY_SIZE is the size of a single entry. There should be no functionality change resulting from this patch, as the before/after output from tools/testing/selftests/x86/ldt_gdt.c shows. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170606173116.13977-1-bp@alien8.de [ Renamed 'n_entries' to 'nr_entries' ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-06kvm: async_pf: fix rcu_irq_enter() with irqs enabledPaolo Bonzini
native_safe_halt enables interrupts, and you just shouldn't call rcu_irq_enter() with interrupts enabled. Reorder the call with the following local_irq_disable() to respect the invariant. Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-06-05x86/mm: Rework lazy TLB to track the actual loaded mmAndy Lutomirski
Lazy TLB state is currently managed in a rather baroque manner. AFAICT, there are three possible states: - Non-lazy. This means that we're running a user thread or a kernel thread that has called use_mm(). current->mm == current->active_mm == cpu_tlbstate.active_mm and cpu_tlbstate.state == TLBSTATE_OK. - Lazy with user mm. We're running a kernel thread without an mm and we're borrowing an mm_struct. We have current->mm == NULL, current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state != TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0). The current cpu is set in mm_cpumask(current->active_mm). CR3 points to current->active_mm->pgd. The TLB is up to date. - Lazy with init_mm. This happens when we call leave_mm(). We have current->mm == NULL, current->active_mm == cpu_tlbstate.active_mm, but that mm is only relelvant insofar as the scheduler is tracking it for refcounting. cpu_tlbstate.state != TLBSTATE_OK. The current cpu is clear in mm_cpumask(current->active_mm). CR3 points to swapper_pg_dir, i.e. init_mm->pgd. This patch simplifies the situation. Other than perf, x86 stops caring about current->active_mm at all. We have cpu_tlbstate.loaded_mm pointing to the mm that CR3 references. The TLB is always up to date for that mm. leave_mm() just switches us to init_mm. There are no longer any special cases for mm_cpumask, and switch_mm() switches mms without worrying about laziness. After this patch, cpu_tlbstate.state serves only to tell the TLB flush code whether it may switch to init_mm instead of doing a normal flush. This makes fairly extensive changes to xen_exit_mmap(), which used to look a bit like black magic. Perf is unchanged. With or without this change, perf may behave a bit erratically if it tries to read user memory in kernel thread context. We should build on this patch to teach perf to never look at user memory when cpu_tlbstate.loaded_mm != current->mm. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05x86/cpu/cyrix: Add alternative Device ID of Geode GX1 SoCChristian Sünkenberg
A SoC variant of Geode GX1, notably NSC branded SC1100, seems to report an inverted Device ID in its DIR0 configuration register, specifically 0xb instead of the expected 0x4. Catch this presumably quirky version so it's properly recognized as GX1 and has its cache switched to write-back mode, which provides a significant performance boost in most workloads. SC1100's datasheet "Geode™ SC1100 Information Appliance On a Chip", states in section 1.1.7.1 "Device ID" that device identification values are specified in SC1100's device errata. These, however, seem to not have been publicly released. Wading through a number of boot logs and /proc/cpuinfo dumps found on pastebin and blogs, this patch should mostly be relevant for a number of now admittedly aging Soekris NET4801 and PC Engines WRAP devices, the latter being the platform this issue was discovered on. Performance impact was verified using "openssl speed", with write-back caching scaling throughput between -3% and +41%. Signed-off-by: Christian Sünkenberg <christian.suenkenberg@student.kit.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1496596719.26725.14.camel@student.kit.edu Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-04x86/tsc: Remove the TSC_ADJUST clampPeter Zijlstra
Now that all affected platforms have a microcode update; and we check this and disable TSC_DEADLINE and print a microcode revision update error if its too old, we can remove the TSC_ADJUST clamp. This should help with systems where the second socket runs ahead of the first socket and needs a negative adjustment. In this case we'd hit the 0 clamp and give up for not achieving synchronization. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: kevin.b.stanton@intel.com Link: http://lkml.kernel.org/r/20170531155306.100950003@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-04x86/apic: Add TSC_DEADLINE quirk due to errataPeter Zijlstra
Due to errata it is possible for the TSC_DEADLINE timer to misbehave after using TSC_ADJUST. A microcode update is available to fix this situation. Avoid using the TSC_DEADLINE timer if it is affected by this issue and report the required microcode version. [ tglx: Renamed function to apic_check_deadline_errata() ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: kevin.b.stanton@intel.com Link: http://lkml.kernel.org/r/20170531155306.050849877@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-04x86/apic: Change the lapic name in deadline modePeter Zijlstra
So that we can more easily see in what mode the lapic timer operates. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: kevin.b.stanton@intel.com Link: http://lkml.kernel.org/r/20170531155305.989808008@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-05-29x86/debug/32: Convert a smp_processor_id() call to raw to avoid ↵Borislav Petkov
DEBUG_PREEMPT warning ... to raw_smp_processor_id() to not trip the BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 check. The reasoning behind it is that __warn() already uses the raw_ variants but the show_regs() path on 32-bit doesn't. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170528092212.fiod7kygpjm23m3o@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-29x86/microcode/AMD: Change load_microcode_amd()'s param to bool to fix ↵Borislav Petkov
preemptibility bug With CONFIG_DEBUG_PREEMPT enabled, I get: BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 caller is debug_smp_processor_id CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc2+ #2 Call Trace: dump_stack check_preemption_disabled debug_smp_processor_id save_microcode_in_initrd_amd ? microcode_init save_microcode_in_initrd ... because, well, it says it above, we're using smp_processor_id() in preemptible code. But passing the CPU number is not really needed. It is only used to determine whether we're on the BSP, and, if so, to save the microcode patch for early loading. [ We don't absolutely need to do it on the BSP but we do that customarily there. ] Instead, convert that function parameter to a boolean which denotes whether the patch should be saved or not, thereby avoiding the use of smp_processor_id() in preemptible code. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170528200414.31305-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-27Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A series of fixes for X86: - The final fix for the end-of-stack issue in the unwinder - Handle non PAT systems gracefully - Prevent access to uninitiliazed memory - Move early delay calaibration after basic init - Fix Kconfig help text - Fix a cross compile issue - Unbreak older make versions" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/timers: Move simple_udelay_calibration past init_hypervisor_platform x86/alternatives: Prevent uninitialized stack byte read in apply_alternatives() x86/PAT: Fix Xorg regression on CPUs that don't support PAT x86/watchdog: Fix Kconfig help text file path reference to lockup watchdog documentation x86/build: Permit building with old make versions x86/unwind: Add end-of-stack check for ftrace handlers Revert "x86/entry: Fix the end of the stack for newly forked tasks" x86/boot: Use CROSS_COMPILE prefix for readelf
2017-05-27Merge branch 'ras-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fixes from Thomas Gleixner: "Two fixlets for RAS: - Export memory_error() so the NFIT module can utilize it - Handle memory errors in NFIT correctly" * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: acpi, nfit: Fix the memory error check in nfit_handle_mce() x86/MCE: Export memory_error()
2017-05-26x86/ftrace: Make sure that ftrace trampolines are not RWXThomas Gleixner
ftrace use module_alloc() to allocate trampoline pages. The mapping of module_alloc() is RWX, which makes sense as the memory is written to right after allocation. But nothing makes these pages RO after writing to them. Add proper set_memory_rw/ro() calls to protect the trampolines after modification. Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705251056410.1862@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>