summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/apic/apic.c
AgeCommit message (Collapse)Author
2019-11-05x86/apic/32: Avoid bogus LDR warningsJan Beulich
The removal of the LDR initialization in the bigsmp_32 APIC code unearthed a problem in setup_local_APIC(). The code checks unconditionally for a mismatch of the logical APIC id by comparing the early APIC id which was initialized in get_smp_config() with the actual LDR value in the APIC. Due to the removal of the bogus LDR initialization the check now can trigger on bigsmp_32 APIC systems emitting a warning for every booting CPU. This is of course a false positive because the APIC is not using logical destination mode. Restrict the check and the possibly resulting fixup to systems which are actually using the APIC in logical destination mode. [ tglx: Massaged changelog and added Cc stable ] Fixes: bae3a8d3308 ("x86/apic: Do not initialize LDR and DFR for bigsmp") Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/666d8f91-b5a8-1afd-7add-821e72a35f03@suse.com
2019-10-27x86/apic, x86/uprobes: Correct parameter names in kernel-doc commentsYi Wang
Rename parameter names to the correct ones used in the function. No functional changes. [ bp: Merge two patches into a single one. ] Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1571816442-22494-1-git-send-email-wang.yi59@zte.com.cn
2019-10-18x86: Use pr_warn instead of pr_warningKefeng Wang
As said in commit f2c2cbcc35d4 ("powerpc: Use pr_warn instead of pr_warning"), removing pr_warning so all logging messages use a consistent <prefix>_warn style. Let's do it. Link: http://lkml.kernel.org/r/20191018031850.48498-7-wangkefeng.wang@huawei.com To: linux-kernel@vger.kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Robert Richter <rric@kernel.org> Cc: Darren Hart <dvhart@infradead.org> Cc: Andy Shevchenko <andy@infradead.org> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-09-17Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Thomas Gleixner: - Cleanup the apic IPI implementation by removing duplicated code and consolidating the functions into the APIC core. - Implement a safe variant of the IPI broadcast mode. Contrary to earlier attempts this uses the core tracking of which CPUs have been brought online at least once so that a broadcast does not end up in some dead end in BIOS/SMM code when the CPU is still waiting for init. Once all CPUs have been brought up once, IPI broadcasting is enabled. Before that regular one by one IPIs are issued. - Drop the paravirt CR8 related functions as they have no user anymore - Initialize the APIC TPR to block interrupt 16-31 as they are reserved for CPU exceptions and should never be raised by any well behaving device. - Emit a warning when vector space exhaustion breaks the admin set affinity of an interrupt. - Make sure to use the NMI fallback when shutdown via reboot vector IPI fails. The original code had conditions which prevent the code path to be reached. - Annotate various APIC config variables as RO after init. [ The ipi broadcase change came in earlier through the cpu hotplug branch, but I left the explanation in the commit message since it was shared between the two different branches - Linus ] * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) x86/apic/vector: Warn when vector space exhaustion breaks affinity x86/apic: Annotate global config variables as "read-only after init" x86/apic/x2apic: Implement IPI shorthands support x86/apic/flat64: Remove the IPI shorthand decision logic x86/apic: Share common IPI helpers x86/apic: Remove the shorthand decision logic x86/smp: Enhance native_send_call_func_ipi() x86/smp: Move smp_function_call implementations into IPI code x86/apic: Provide and use helper for send_IPI_allbutself() x86/apic: Add static key to Control IPI shorthands x86/apic: Move no_ipi_broadcast() out of 32bit x86/apic: Add NMI_VECTOR wait to IPI shorthand x86/apic: Remove dest argument from __default_send_IPI_shortcut() x86/hotplug: Silence APIC and NMI when CPU is dead x86/cpu: Move arch_smt_update() to a neutral place x86/apic/uv: Make x2apic_extra_bits static x86/apic: Consolidate the apic local headers x86/apic: Move apic_flat_64 header into apic directory x86/apic: Move ipi header into apic directory x86/apic: Cleanup the include maze ...
2019-09-16Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu-feature updates from Ingo Molnar: - Rework the Intel model names symbols/macros, which were decades of ad-hoc extensions and added random noise. It's now a coherent, easy to follow nomenclature. - Add new Intel CPU model IDs: - "Tiger Lake" desktop and mobile models - "Elkhart Lake" model ID - and the "Lightning Mountain" variant of Airmont, plus support code - Add the new AVX512_VP2INTERSECT instruction to cpufeatures - Remove Intel MPX user-visible APIs and the self-tests, because the toolchain (gcc) is not supporting it going forward. This is the first, lowest-risk phase of MPX removal. - Remove X86_FEATURE_MFENCE_RDTSC - Various smaller cleanups and fixes * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) x86/cpu: Update init data for new Airmont CPU model x86/cpu: Add new Airmont variant to Intel family x86/cpu: Add Elkhart Lake to Intel family x86/cpu: Add Tiger Lake to Intel family x86: Correct misc typos x86/intel: Add common OPTDIFFs x86/intel: Aggregate microserver naming x86/intel: Aggregate big core graphics naming x86/intel: Aggregate big core mobile naming x86/intel: Aggregate big core client naming x86/cpufeature: Explain the macro duplication x86/ftrace: Remove mcount() declaration x86/PCI: Remove superfluous returns from void functions x86/msr-index: Move AMD MSRs where they belong x86/cpu: Use constant definitions for CPU models lib: Remove redundant ftrace flag removal x86/crash: Remove unnecessary comparison x86/bitops: Use __builtin_constant_p() directly instead of IS_IMMEDIATE() x86: Remove X86_FEATURE_MFENCE_RDTSC x86/mpx: Remove MPX APIs ...
2019-09-12Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "A KVM guest fix, and a kdump kernel relocation errors fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/timer: Force PIT initialization when !X86_FEATURE_ARAT x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors
2019-09-08x86/timer: Force PIT initialization when !X86_FEATURE_ARATJan Stancek
KVM guests with commit c8c4076723da ("x86/timer: Skip PIT initialization on modern chipsets") applied to guest kernel have been observed to have unusually higher CPU usage with symptoms of increase in vm exits for HLT and MSW_WRITE (MSR_IA32_TSCDEADLINE). This is caused by older QEMUs lacking support for X86_FEATURE_ARAT. lapic clock retains CLOCK_EVT_FEAT_C3STOP and nohz stays inactive. There's no usable broadcast device either. Do the PIT initialization if guest CPU lacks X86_FEATURE_ARAT. On real hardware it shouldn't matter as ARAT and DEADLINE come together. Fixes: c8c4076723da ("x86/timer: Skip PIT initialization on modern chipsets") Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-09-07Revert "x86/apic: Include the LDR when clearing out APIC registers"Linus Torvalds
This reverts commit 558682b5291937a70748d36fd9ba757fb25b99ae. Chris Wilson reports that it breaks his CPU hotplug test scripts. In particular, it breaks offlining and then re-onlining the boot CPU, which we treat specially (and the BIOS does too). The symptoms are that we can offline the CPU, but it then does not come back online again: smpboot: CPU 0 is now offline smpboot: Booting Node 0 Processor 0 APIC 0x0 smpboot: do_boot_cpu failed(-1) to wakeup CPU#0 Thomas says he knows why it's broken (my personal suspicion: our magic handling of the "cpu0_logical_apicid" thing), but for 5.3 the right fix is to just revert it, since we've never touched the LDR bits before, and it's not worth the risk to do anything else at this stage. [ Hotpluging of the boot CPU is special anyway, and should be off by default. See the "BOOTPARAM_HOTPLUG_CPU0" config option and the cpu0_hotplug kernel parameter. In general you should not do it, and it has various known limitations (hibernate and suspend require the boot CPU, for example). But it should work, even if the boot CPU is special and needs careful treatment - Linus ] Link: https://lore.kernel.org/lkml/156785100521.13300.14461504732265570003@skylake-alporthouse-com/ Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bandan Das <bsd@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-02Merge branch 'linus' into x86/cpu, to resolve conflictsIngo Molnar
Conflicts: tools/power/x86/turbostat/turbostat.c Recent turbostat changes conflicted with a pending rename of x86 model names in tip:x86/cpu, sort it out. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-28x86/intel: Aggregate microserver namingPeter Zijlstra
Currently big microservers have _XEON_D while small microservers have _X, Make it uniformly: _D. for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(X\|XEON_D\)"` do sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*ATOM.*\)_X/\1_D/g' \ -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_XEON_D/\1_D/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20190827195122.677152989@infradead.org
2019-08-28x86/intel: Aggregate big core graphics namingPeter Zijlstra
Currently big core clients with extra graphics on have: - _G - _GT3E Make it uniformly: _G for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_GT3E"` do sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_GT3E/\1_G/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20190827195122.622802314@infradead.org
2019-08-28x86/intel: Aggregate big core mobile namingPeter Zijlstra
Currently big core mobile chips have either: - _L - _ULT - _MOBILE Make it uniformly: _L. for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(MOBILE\|ULT\)"` do sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(MOBILE\|ULT\)/\1_L/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190827195122.568978530@infradead.org
2019-08-28x86/intel: Aggregate big core client namingPeter Zijlstra
Currently the big core client models either have: - no OPTDIFF - _CORE - _DESKTOP Make it uniformly: 'no OPTDIFF'. for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(CORE\|DESKTOP\)"` do sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(CORE\|DESKTOP\)/\1/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190827195122.513945586@infradead.org
2019-08-26x86/apic: Include the LDR when clearing out APIC registersBandan Das
Although APIC initialization will typically clear out the LDR before setting it, the APIC cleanup code should reset the LDR. This was discovered with a 32-bit KVM guest jumping into a kdump kernel. The stale bits in the LDR triggered a bug in the KVM APIC implementation which caused the destination mapping for VCPUs to be corrupted. Note that this isn't intended to paper over the KVM APIC bug. The kernel has to clear the LDR when resetting the APIC registers except when X2APIC is enabled. This lacks a Fixes tag because missing to clear LDR goes way back into pre git history. [ tglx: Made x2apic_enabled a function call as required ] Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190826101513.5080-3-bsd@redhat.com
2019-08-19x86/apic: Handle missing global clockevent gracefullyThomas Gleixner
Some newer machines do not advertise legacy timers. The kernel can handle that situation if the TSC and the CPU frequency are enumerated by CPUID or MSRs and the CPU supports TSC deadline timer. If the CPU does not support TSC deadline timer the local APIC timer frequency has to be known as well. Some Ryzens machines do not advertize legacy timers, but there is no reliable way to determine the bus frequency which feeds the local APIC timer when the machine allows overclocking of that frequency. As there is no legacy timer the local APIC timer calibration crashes due to a NULL pointer dereference when accessing the not installed global clock event device. Switch the calibration loop to a non interrupt based one, which polls either TSC (if frequency is known) or jiffies. The latter requires a global clockevent. As the machines which do not have a global clockevent installed have a known TSC frequency this is a non issue. For older machines where TSC frequency is not known, there is no known case where the legacy timers do not exist as that would have been reported long ago. Reported-by: Daniel Drake <drake@endlessm.com> Reported-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Daniel Drake <drake@endlessm.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908091443030.21433@nanos.tec.linutronix.de Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1142926#c12
2019-08-07x86/apic: Annotate global config variables as "read-only after init"Sean Christopherson
Mark the APIC's global config variables that are constant after boot as __ro_after_init to help document that the majority of the APIC config is not changed at runtime, and to harden the kernel a smidge. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190805212134.12001-1-sean.j.christopherson@intel.com
2019-07-25x86/hotplug: Silence APIC and NMI when CPU is deadThomas Gleixner
In order to support IPI/NMI broadcasting via the shorthand mechanism side effects of shorthands need to be mitigated: Shorthand IPIs and NMIs hit all CPUs including unplugged CPUs Neither of those can be handled on unplugged CPUs for obvious reasons. It would be trivial to just fully disable the APIC via the enable bit in MSR_APICBASE. But that's not possible because clearing that bit on systems based on the 3 wire APIC bus would require a hardware reset to bring it back as the APIC would lose track of bus arbitration. On systems with FSB delivery APICBASE could be disabled, but it has to be guaranteed that no interrupt is sent to the APIC while in that state and it's not clear from the SDM whether it still responds to INIT/SIPI messages. Therefore stay on the safe side and switch the APIC into soft disabled mode so it won't deliver any regular vector to the CPU. NMIs are still propagated to the 'dead' CPUs. To mitigate that add a check for the CPU being offline on early nmi entry and if so bail. Note, this cannot use the stop/restart_nmi() magic which is used in the alternatives code. A dead CPU cannot invoke nmi_enter() or anything else due to RCU and other reasons. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907241723290.1791@nanos.tec.linutronix.de
2019-07-25x86/apic: Make apic_pending_intr_clear() more robustThomas Gleixner
In course of developing shorthand based IPI support issues with the function which tries to clear eventually pending ISR bits in the local APIC were observed. 1) O-day testing triggered the WARN_ON() in apic_pending_intr_clear(). This warning is emitted when the function fails to clear pending ISR bits or observes pending IRR bits which are not delivered to the CPU after the stale ISR bit(s) are ACK'ed. Unfortunately the function only emits a WARN_ON() and fails to dump the IRR/ISR content. That's useless for debugging. Feng added spot on debug printk's which revealed that the stale IRR bit belonged to the APIC timer interrupt vector, but adding ad hoc debug code does not help with sporadic failures in the field. Rework the loop so the full IRR/ISR contents are saved and on failure dumped. 2) The loop termination logic is interesting at best. If the machine has no TSC or cpu_khz is not known yet it tries 1 million times to ack stale IRR/ISR bits. What? With TSC it uses the TSC to calculate the loop termination. It takes a timestamp at entry and terminates the loop when: (rdtsc() - start_timestamp) >= (cpu_hkz << 10) That's roughly one second. Both methods are problematic. The APIC has 256 vectors, which means that in theory max. 256 IRR/ISR bits can be set. In practice this is impossible and the chance that more than a few bits are set is close to zero. With the pure loop based approach the 1 million retries are complete overkill. With TSC this can terminate too early in a guest which is running on a heavily loaded host even with only a couple of IRR/ISR bits set. The reason is that after acknowledging the highest priority ISR bit, pending IRRs must get serviced first before the next round of acknowledge can take place as the APIC (real and virtualized) does not honour EOI without a preceeding interrupt on the CPU. And every APIC read/write takes a VMEXIT if the APIC is virtualized. While trying to reproduce the issue 0-day reported it was observed that the guest was scheduled out long enough under heavy load that it terminated after 8 iterations. Make the loop terminate after 512 iterations. That's plenty enough in any case and does not take endless time to complete. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.158847694@linutronix.de
2019-07-25x86/apic: Soft disable APIC before initializing itThomas Gleixner
If the APIC was already enabled on entry of setup_local_APIC() then disabling it soft via the SPIV register makes a lot of sense. That masks all LVT entries and brings it into a well defined state. Otherwise previously enabled LVTs which are not touched in the setup function stay unmasked and might surprise the just booting kernel. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.068290579@linutronix.de
2019-07-25x86/apic: Invoke perf_events_lapic_init() after enabling APICThomas Gleixner
If the APIC is soft disabled then unmasking an LVT entry does not work and the write is ignored. perf_events_lapic_init() tries to do so. Move the invocation after the point where the APIC has been enabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105218.962517234@linutronix.de
2019-07-22x86/apic: Initialize TPR to block interrupts 16-31Andy Lutomirski
The APIC, per spec, is fundamentally confused and thinks that interrupt vectors 16-31 are valid. This makes no sense -- the CPU reserves vectors 0-31 for exceptions (faults, traps, etc). Obviously, no device should actually produce an interrupt with vector 16-31, but robustness can be improved by setting the APIC TPR class to 1, which will prevent delivery of an interrupt with a vector below 32. Note: This is *not* intended as a security measure against attackers who control malicious hardware. Any PCI or similar hardware that can be controlled by an attacker MUST be behind a functional IOMMU that remaps interrupts. The purpose of this change is to reduce the chance that a certain class of device malfunctions crashes the kernel in hard-to-debug ways. Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/dc04a9f8b234d7b0956a8d2560b8945bcd9c4bf7.1563117760.git.luto@kernel.org
2019-07-16x86/apic: Silence -Wtype-limits compiler warningsQian Cai
There are many compiler warnings like this, In file included from ./arch/x86/include/asm/smp.h:13, from ./arch/x86/include/asm/mmzone_64.h:11, from ./arch/x86/include/asm/mmzone.h:5, from ./include/linux/mmzone.h:969, from ./include/linux/gfp.h:6, from ./include/linux/mm.h:10, from arch/x86/kernel/apic/io_apic.c:34: arch/x86/kernel/apic/io_apic.c: In function 'check_timer': ./arch/x86/include/asm/apic.h:37:11: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] if ((v) <= apic_verbosity) \ ^~ arch/x86/kernel/apic/io_apic.c:2160:2: note: in expansion of macro 'apic_printk' apic_printk(APIC_QUIET, KERN_INFO "..TIMER: vector=0x%02X " ^~~~~~~~~~~ ./arch/x86/include/asm/apic.h:37:11: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] if ((v) <= apic_verbosity) \ ^~ arch/x86/kernel/apic/io_apic.c:2207:4: note: in expansion of macro 'apic_printk' apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: " ^~~~~~~~~~~ APIC_QUIET is 0, so silence them by making apic_verbosity type int. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1562621805-24789-1-git-send-email-cai@lca.pw
2019-07-08Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x96 apic updates from Thomas Gleixner: "Updates for the x86 APIC interrupt handling and APIC timer: - Fix a long standing issue with spurious interrupts which was caused by the big vector management rework a few years ago. Robert Hodaszi provided finally enough debug data and an excellent initial failure analysis which allowed to understand the underlying issues. This contains a change to the core interrupt management code which is required to handle this correctly for the APIC/IO_APIC. The core changes are NOOPs for most architectures except ARM64. ARM64 is not impacted by the change as confirmed by Marc Zyngier. - Newer systems allow to disable the PIT clock for power saving causing panic in the timer interrupt delivery check of the IO/APIC when the HPET timer is not enabled either. While the clock could be turned on this would cause an endless whack a mole game to chase the proper register in each affected chipset. These systems provide the relevant frequencies for TSC, CPU and the local APIC timer via CPUID and/or MSRs, which allows to avoid the PIT/HPET based calibration. As the calibration code is the only usage of the legacy timers on modern systems and is skipped anyway when the frequencies are known already, there is no point in setting up the PIT and actually checking for the interrupt delivery via IO/APIC. To achieve this on a wide variety of platforms, the CPUID/MSR based frequency readout has been made more robust, which also allowed to remove quite some workarounds which turned out to be not longer required. Thanks to Daniel Drake for analysis, patches and verification" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Seperate unused system vectors from spurious entry again x86/irq: Handle spurious interrupt after shutdown gracefully x86/ioapic: Implement irq_get_irqchip_state() callback genirq: Add optional hardware synchronization for shutdown genirq: Fix misleading synchronize_irq() documentation genirq: Delay deactivation in free_irq() x86/timer: Skip PIT initialization on modern chipsets x86/apic: Use non-atomic operations when possible x86/apic: Make apic_bsp_setup() static x86/tsc: Set LAPIC timer period to crystal clock frequency x86/apic: Rename 'lapic_timer_frequency' to 'lapic_timer_period' x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency
2019-07-03x86/irq: Seperate unused system vectors from spurious entry againThomas Gleixner
Quite some time ago the interrupt entry stubs for unused vectors in the system vector range got removed and directly mapped to the spurious interrupt vector entry point. Sounds reasonable, but it's subtly broken. The spurious interrupt vector entry point pushes vector number 0xFF on the stack which makes the whole logic in __smp_spurious_interrupt() pointless. As a consequence any spurious interrupt which comes from a vector != 0xFF is treated as a real spurious interrupt (vector 0xFF) and not acknowledged. That subsequently stalls all interrupt vectors of equal and lower priority, which brings the system to a grinding halt. This can happen because even on 64-bit the system vector space is not guaranteed to be fully populated. A full compile time handling of the unused vectors is not possible because quite some of them are conditonally populated at runtime. Bring the entry stubs back, which wastes 160 bytes if all stubs are unused, but gains the proper handling back. There is no point to selectively spare some of the stubs which are known at compile time as the required code in the IDT management would be way larger and convoluted. Do not route the spurious entries through common_interrupt and do_IRQ() as the original code did. Route it to smp_spurious_interrupt() which evaluates the vector number and acts accordingly now that the real vector numbers are handed in. Fixup the pr_warn so the actual spurious vector (0xff) is clearly distiguished from the other vectors and also note for the vectored case whether it was pending in the ISR or not. "Spurious APIC interrupt (vector 0xFF) on CPU#0, should never happen." "Spurious interrupt vector 0xed on CPU#1. Acked." "Spurious interrupt vector 0xee on CPU#1. Not pending!." Fixes: 2414e021ac8d ("x86: Avoid building unused IRQ entry stubs") Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Jan Beulich <jbeulich@suse.com> Link: https://lkml.kernel.org/r/20190628111440.550568228@linutronix.de
2019-06-29x86/timer: Skip PIT initialization on modern chipsetsThomas Gleixner
Recent Intel chipsets including Skylake and ApolloLake have a special ITSSPRC register which allows the 8254 PIT to be gated. When gated, the 8254 registers can still be programmed as normal, but there are no IRQ0 timer interrupts. Some products such as the Connex L1430 and exone go Rugged E11 use this register to ship with the PIT gated by default. This causes Linux to fail to boot: Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. The panic happens before the framebuffer is initialized, so to the user, it appears as an early boot hang on a black screen. Affected products typically have a BIOS option that can be used to enable the 8254 and make Linux work (Chipset -> South Cluster Configuration -> Miscellaneous Configuration -> 8254 Clock Gating), however it would be best to make Linux support the no-8254 case. Modern sytems allow to discover the TSC and local APIC timer frequencies, so the calibration against the PIT is not required. These systems have always running timers and the local APIC timer works also in deep power states. So the setup of the PIT including the IO-APIC timer interrupt delivery checks are a pointless exercise. Skip the PIT setup and the IO-APIC timer interrupt checks on these systems, which avoids the panic caused by non ticking PITs and also speeds up the boot process. Thanks to Daniel for providing the changelog, initial analysis of the problem and testing against a variety of machines. Reported-by: Daniel Drake <drake@endlessm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Daniel Drake <drake@endlessm.com> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: linux@endlessm.com Cc: rafael.j.wysocki@intel.com Cc: hdegoede@redhat.com Link: https://lkml.kernel.org/r/20190628072307.24678-1-drake@endlessm.com
2019-06-22x86/apic: Fix integer overflow on 10 bit left shift of cpu_khzColin Ian King
The left shift of unsigned int cpu_khz will overflow for large values of cpu_khz, so cast it to a long long before shifting it to avoid overvlow. For example, this can happen when cpu_khz is 4194305, i.e. ~4.2 GHz. Addresses-Coverity: ("Unintentional integer overflow") Fixes: 8c3ba8d04924 ("x86, apic: ack all pending irqs when crashed/on kexec") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: kernel-janitors@vger.kernel.org Link: https://lkml.kernel.org/r/20190619181446.13635-1-colin.king@canonical.com
2019-06-16x86/apic: Make apic_bsp_setup() staticThomas Gleixner
No user outside of apic.c. Remove the stale and bogus function comment while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-05-21treewide: Add SPDX license identifier for missed filesThomas Gleixner
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-09x86/apic: Rename 'lapic_timer_frequency' to 'lapic_timer_period'Daniel Drake
This variable is a period unit (number of clock cycles per jiffy), not a frequency (which is number of cycles per second). Give it a more appropriate name. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Daniel Drake <drake@endlessm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: len.brown@intel.com Cc: linux@endlessm.com Cc: rafael.j.wysocki@intel.com Link: http://lkml.kernel.org/r/20190509055417.13152-2-drake@endlessm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-25x86/apic: Unify duplicated local apic timer clockevent initializationJacob Pan
Local APIC timer clockevent parameters can be calculated based on platform specific methods. However the code is mostly duplicated with the interrupt based calibration. The commit which increased the max_delta parameter updated only one place and made the implementations diverge. Unify it to prevent further damage. [ tglx: Rename function to lapic_init_clockevent() and adjust changelog a bit ] Fixes: 4aed89d6b515 ("x86, lapic-timer: Increase the max_delta to 31 bits") Reported-by: Daniel Drake <drake@endlessm.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/1556213272-63568-1-git-send-email-jacob.jun.pan@linux.intel.com
2018-12-08x86/kernel: Fix more -Wmissing-prototypes warningsBorislav Petkov
... with the goal of eventually enabling -Wmissing-prototypes by default. At least on x86. Make functions static where possible, otherwise add prototypes or make them visible through includes. asm/trace/ changes courtesy of Steven Rostedt <rostedt@goodmis.org>. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> # ACPI + cpufreq bits Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Travis <mike.travis@hpe.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yi Wang <wang.yi59@zte.com.cn> Cc: linux-acpi@vger.kernel.org
2018-10-31mm: remove include/linux/bootmem.hMike Rapoport
Move remaining definitions and declarations from include/linux/bootmem.h into include/linux/memblock.h and remove the redundant header. The includes were replaced with the semantic patch below and then semi-automated removal of duplicated '#include <linux/memblock.h> @@ @@ - #include <linux/bootmem.h> + #include <linux/memblock.h> [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal] Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Serge Semin <fancer.lancer@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-27x86/apic: Add Hygon Dhyana supportPu Wen
Add Hygon Dhyana support to the APIC subsystem. When running in 32 bit mode, bigsmp should be enabled if there are more than 8 cores online. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Link: https://lkml.kernel.org/r/7a557265a8c7c9e842fe60f9d8e064458801aef3.1537533369.git.puwen@hygon.cn
2018-08-14x86/smp: fix non-SMP broken build due to redefinition of ↵Vlastimil Babka
apic_id_is_primary_thread The function has an inline "return false;" definition with CONFIG_SMP=n but the "real" definition is also visible leading to "redefinition of ‘apic_id_is_primary_thread’" compiler error. Guard it with #ifdef CONFIG_SMP Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Fixes: 6a4d2657e048 ("x86/smp: Provide topology_is_primary_thread()") Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-14Merge branch 'l1tf-final' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Merge L1 Terminal Fault fixes from Thomas Gleixner: "L1TF, aka L1 Terminal Fault, is yet another speculative hardware engineering trainwreck. It's a hardware vulnerability which allows unprivileged speculative access to data which is available in the Level 1 Data Cache when the page table entry controlling the virtual address, which is used for the access, has the Present bit cleared or other reserved bits set. If an instruction accesses a virtual address for which the relevant page table entry (PTE) has the Present bit cleared or other reserved bits set, then speculative execution ignores the invalid PTE and loads the referenced data if it is present in the Level 1 Data Cache, as if the page referenced by the address bits in the PTE was still present and accessible. While this is a purely speculative mechanism and the instruction will raise a page fault when it is retired eventually, the pure act of loading the data and making it available to other speculative instructions opens up the opportunity for side channel attacks to unprivileged malicious code, similar to the Meltdown attack. While Meltdown breaks the user space to kernel space protection, L1TF allows to attack any physical memory address in the system and the attack works across all protection domains. It allows an attack of SGX and also works from inside virtual machines because the speculation bypasses the extended page table (EPT) protection mechanism. The assoicated CVEs are: CVE-2018-3615, CVE-2018-3620, CVE-2018-3646 The mitigations provided by this pull request include: - Host side protection by inverting the upper address bits of a non present page table entry so the entry points to uncacheable memory. - Hypervisor protection by flushing L1 Data Cache on VMENTER. - SMT (HyperThreading) control knobs, which allow to 'turn off' SMT by offlining the sibling CPU threads. The knobs are available on the kernel command line and at runtime via sysfs - Control knobs for the hypervisor mitigation, related to L1D flush and SMT control. The knobs are available on the kernel command line and at runtime via sysfs - Extensive documentation about L1TF including various degrees of mitigations. Thanks to all people who have contributed to this in various ways - patches, review, testing, backporting - and the fruitful, sometimes heated, but at the end constructive discussions. There is work in progress to provide other forms of mitigations, which might be less horrible performance wise for a particular kind of workloads, but this is not yet ready for consumption due to their complexity and limitations" * 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) x86/microcode: Allow late microcode loading with SMT disabled tools headers: Synchronise x86 cpufeatures.h for L1TF additions x86/mm/kmmio: Make the tracer robust against L1TF x86/mm/pat: Make set_memory_np() L1TF safe x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert x86/speculation/l1tf: Invert all not present mappings cpu/hotplug: Fix SMT supported evaluation KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry x86/speculation: Simplify sysfs report of VMX L1TF vulnerability Documentation/l1tf: Remove Yonah processors from not vulnerable list x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr() x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d x86: Don't include linux/irq.h from asm/hardirq.h x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d x86/irq: Demote irq_cpustat_t::__softirq_pending to u16 x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush() x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond' x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush() cpu/hotplug: detect SMT disabled by BIOS ...
2018-08-13Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic update from Thomas Gleixner: "Trivial cleanups of the APIC related code" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Trivial coding style fixes x86/vector: Merge allocate_vector() into assign_vector_locked()
2018-08-05x86: Don't include linux/irq.h from asm/hardirq.hNicolai Stange
The next patch in this series will have to make the definition of irq_cpustat_t available to entering_irq(). Inclusion of asm/hardirq.h into asm/apic.h would cause circular header dependencies like asm/smp.h asm/apic.h asm/hardirq.h linux/irq.h linux/topology.h linux/smp.h asm/smp.h or linux/gfp.h linux/mmzone.h asm/mmzone.h asm/mmzone_64.h asm/smp.h asm/apic.h asm/hardirq.h linux/irq.h linux/irqdesc.h linux/kobject.h linux/sysfs.h linux/kernfs.h linux/idr.h linux/gfp.h and others. This causes compilation errors because of the header guards becoming effective in the second inclusion: symbols/macros that had been defined before wouldn't be available to intermediate headers in the #include chain anymore. A possible workaround would be to move the definition of irq_cpustat_t into its own header and include that from both, asm/hardirq.h and asm/apic.h. However, this wouldn't solve the real problem, namely asm/harirq.h unnecessarily pulling in all the linux/irq.h cruft: nothing in asm/hardirq.h itself requires it. Also, note that there are some other archs, like e.g. arm64, which don't have that #include in their asm/hardirq.h. Remove the linux/irq.h #include from x86' asm/hardirq.h. Fix resulting compilation errors by adding appropriate #includes to *.c files as needed. Note that some of these *.c files could be cleaned up a bit wrt. to their set of #includes, but that should better be done from separate patches, if at all. Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-07-30x86/apic: Trivial coding style fixesYi Wang
There is inconsistent indenting in calibrate_APIC_clock() and activate_managed(). Remove the surplus TAB. Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: jgross@suse.com Cc: ville.syrjala@linux.intel.com Cc: len.brown@intel.com Cc: gregkh@linuxfoundation.org Cc: zhong.weidong@zte.com.cn Link: https://lkml.kernel.org/r/1532672103-32250-1-git-send-email-wang.yi59@zte.com.cn
2018-07-24x86/apic: Future-proof the TSC_DEADLINE quirk for SKXLen Brown
All SKX with stepping higher than 4 support the TSC_DEADLINE, no matter the microcode version. Without this patch, upcoming SKX steppings will not be able to use their TSC_DEADLINE timer. Signed-off-by: Len Brown <len.brown@intel.com> Cc: <stable@kernel.org> # v4.14+ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 616dd5872e ("x86/apic: Update TSC_DEADLINE quirk with additional SKX stepping") Link: http://lkml.kernel.org/r/d0c7129e509660be9ec6b233284b8d42d90659e8.1532207856.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-02Revert "x86/apic: Ignore secondary threads if nosmt=force"Thomas Gleixner
Dave Hansen reported, that it's outright dangerous to keep SMT siblings disabled completely so they are stuck in the BIOS and wait for SIPI. The reason is that Machine Check Exceptions are broadcasted to siblings and the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or reboots the machine. The MCE chapter in the SDM contains the following blurb: Because the logical processors within a physical package are tightly coupled with respect to shared hardware resources, both logical processors are notified of machine check errors that occur within a given physical processor. If machine-check exceptions are enabled when a fatal error is reported, all the logical processors within a physical package are dispatched to the machine-check exception handler. If machine-check exceptions are disabled, the logical processors enter the shutdown state and assert the IERR# signal. When enabling machine-check exceptions, the MCE flag in control register CR4 should be set for each logical processor. Reverting the commit which ignores siblings at enumeration time solves only half of the problem. The core cpuhotplug logic needs to be adjusted as well. This thoughtful engineered mechanism also turns the boot process on all Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU before the secondary CPUs are brought up. Depending on the number of physical cores the window in which this situation can happen is smaller or larger. On a HSW-EX it's about 750ms: MCE is enabled on the boot CPU: [ 0.244017] mce: CPU supports 22 MCE banks The corresponding sibling #72 boots: [ 1.008005] .... node #0, CPUs: #72 That means if an MCE hits on physical core 0 (logical CPUs 0 and 72) between these two points the machine is going to shutdown. At least it's a known safe state. It's obvious that the early boot can be hit by an MCE as well and then runs into the same situation because MCEs are not yet enabled on the boot CPU. But after enabling them on the boot CPU, it does not make any sense to prevent the kernel from recovering. Adjust the nosmt kernel parameter documentation as well. Reverts: 2207def700f9 ("x86/apic: Ignore secondary threads if nosmt=force") Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tony Luck <tony.luck@intel.com>
2018-06-21x86/apic: Ignore secondary threads if nosmt=forceThomas Gleixner
nosmt on the kernel command line merely prevents the onlining of the secondary SMT siblings. nosmt=force makes the APIC detection code ignore the secondary SMT siblings completely, so they even do not show up as possible CPUs. That reduces the amount of memory allocations for per cpu variables and saves other resources from being allocated too large. This is not fully equivalent to disabling SMT in the BIOS because the low level SMT enabling in the BIOS can result in partitioning of resources between the siblings, which is not undone by just ignoring them. Some CPUs can use the full resources when their sibling is not onlined, but this is depending on the CPU family and model and it's not well documented whether this applies to all partitioned resources. That means depending on the workload disabling SMT in the BIOS might result in better performance. Linus analysis of the Intel manual: The intel optimization manual is not very clear on what the partitioning rules are. I find: "In general, the buffers for staging instructions between major pipe stages are partitioned. These buffers include µop queues after the execution trace cache, the queues after the register rename stage, the reorder buffer which stages instructions for retirement, and the load and store buffers. In the case of load and store buffers, partitioning also provided an easier implementation to maintain memory ordering for each logical processor and detect memory ordering violations" but some of that partitioning may be relaxed if the HT thread is "not active": "In Intel microarchitecture code name Sandy Bridge, the micro-op queue is statically partitioned to provide 28 entries for each logical processor, irrespective of software executing in single thread or multiple threads. If one logical processor is not active in Intel microarchitecture code name Ivy Bridge, then a single thread executing on that processor core can use the 56 entries in the micro-op queue" but I do not know what "not active" means, and how dynamic it is. Some of that partitioning may be entirely static and depend on the early BIOS disabling of HT, and even if we park the cores, the resources will just be wasted. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21x86/smp: Provide topology_is_primary_thread()Thomas Gleixner
If the CPU is supporting SMT then the primary thread can be found by checking the lower APIC ID bits for zero. smp_num_siblings is used to build the mask for the APIC ID bits which need to be taken into account. This uses the MPTABLE or ACPI/MADT supplied APIC ID, which can be different than the initial APIC ID in CPUID. But according to AMD the lower bits have to be consistent. Intel gave a tentative confirmation as well. Preparatory patch to support disabling SMT at boot/runtime. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org>
2018-04-02Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Ingo Molnar: "The main x86 APIC/IOAPIC changes in this cycle were: - Robustify kexec support to more carefully restore IRQ hardware state before calling into kexec/kdump kernels. (Baoquan He) - Clean up the local APIC code a bit (Dou Liyang) - Remove unused callbacks (David Rientjes)" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Finish removing unused callbacks x86/apic: Drop logical_smp_processor_id() inline x86/apic: Modernize the pending interrupt code x86/apic: Move pending interrupt check code into it's own function x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specified x86/apic: Rename variables and functions related to x86_io_apic_ops x86/apic: Remove the (now) unused disable_IO_APIC() function x86/apic: Fix restoring boot IRQ mode in reboot and kexec/kdump x86/apic: Split disable_IO_APIC() into two functions to fix CONFIG_KEXEC_JUMP=y x86/apic: Split out restore_boot_irq_mode() from disable_IO_APIC() x86/apic: Make setup_local_APIC() static x86/apic: Simplify init_bsp_APIC() usage x86/x2apic: Mark set_x2apic_phys_mode() as __init
2018-03-01x86/apic: Drop logical_smp_processor_id() inlineDou Liyang
The logical_smp_processor_id() inline which is only called in setup_local_APIC() on x86_32 systems has no real value. Drop it and directly use GET_APIC_LOGICAL_ID() at the call site and use a more suitable variable name for readability Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: andy.shevchenko@gmail.com Cc: bhe@redhat.com Cc: ebiederm@xmission.com Link: https://lkml.kernel.org/r/20180301055930.2396-4-douly.fnst@cn.fujitsu.com
2018-03-01x86/apic: Modernize the pending interrupt codeDou Liyang
The pending interrupt check code is old, update the following: - Use for_each_set_bit() instead of open coding it - Replace printk() with pr_err() - Get rid of printk line breaks - Make curly braces balanced Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: bhe@redhat.com Cc: ebiederm@xmission.com Link: https://lkml.kernel.org/r/20180301055930.2396-3-douly.fnst@cn.fujitsu.com
2018-03-01x86/apic: Move pending interrupt check code into it's own functionDou Liyang
The pending interrupt check code is mixed with the local APIC setup code, that looks messy. Extract the related code, move it into a new function named apic_pending_intr_clear(). Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: bhe@redhat.com Cc: ebiederm@xmission.com Link: https://lkml.kernel.org/r/20180301055930.2396-2-douly.fnst@cn.fujitsu.com
2018-02-17x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specifiedBaoquan He
Currently the kdump kernel becomes very slow if 'noapic' is specified. Normal kernel doesn't have this bug. Kernel parameter 'noapic' is used to disable IO-APIC in system for testing or special purpose. Here the root cause is that in kdump kernel LAPIC is disabled since commit: 522e664644 ("x86/apic: Disable I/O APIC before shutdown of the local APIC") In this case we need set up through-local-APIC on boot CPU in setup_local_APIC(). In normal kernel the legacy irq mode is enabled by the BIOS. If it is virtual wire mode, the local-APIC has been enabled and set as through-local-APIC. Though we fixed the regression introduced by commit 522e664644, to further improve robustness set up the through-local-APIC mode explicitly, do not rely on the default boot IRQ mode. Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: douly.fnst@cn.fujitsu.com Cc: joro@8bytes.org Cc: prarit@redhat.com Cc: uobergfe@redhat.com Link: http://lkml.kernel.org/r/20180214054656.3780-7-bhe@redhat.com [ Rewrote the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-16x86/apic: Make setup_local_APIC() staticDou Liyang
This function isn't used outside of apic.c, so let's mark it static. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bhe@redhat.com Cc: ebiederm@xmission.com Link: http://lkml.kernel.org/r/20180214062554.21020-1-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-15x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_steppingJia Zhang
x86_mask is a confusing name which is hard to associate with the processor's stepping. Additionally, correct an indent issue in lib/cpu.c. Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com> [ Updated it to more recent kernels. ] Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-14Revert "x86/apic: Remove init_bsp_APIC()"Ville Syrjälä
This reverts commit b371ae0d4a194b178817b0edfb6a7395c7aec37a. It causes boot hangs on old P3/P4 systems when the local APIC is enforced in UP mode. Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: yinghai@kernel.org Cc: bhe@redhat.com Link: https://lkml.kernel.org/r/20171128145350.21560-1-ville.syrjala@linux.intel.com