summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-06-21nvme-pci: limit max IO size and segments to avoid high order allocationsJens Axboe
nvme requires an sg table allocation for each request. If the request is large, then the allocation can become quite large. For instance, with our default software settings of 1280KB IO size, we'll need 10248 bytes of sg table. That turns into a 2nd order allocation, which we can't always guarantee. If we fail the allocation, blk-mq will retry it later. But there's no guarantee that we'll EVER be able to allocate that much contigious memory. Limit the IO size such that we never need more than a single page of memory. That's a lot faster and more reliable. Then back that allocation with a mempool, so that we know we'll always be able to succeed the allocation at some point. Signed-off-by: Jens Axboe <axboe@kernel.dk> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-06-21locking/lockdep: Do not record IRQ state within lockdep codeSteven Rostedt (VMware)
While debugging where things were going wrong with mapping enabling/disabling interrupts with the lockdep state and actual real enabling and disabling interrupts, I had to silent the IRQ disabling/enabling in debug_check_no_locks_freed() because it was always showing up as it was called before the splat was. Use raw_local_irq_save/restore() for not only debug_check_no_locks_freed() but for all internal lockdep functions, as they hide useful information about where interrupts were used incorrectly last. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: https://lkml.kernel.org/lkml/20180404140630.3f4f4c7a@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21KVM: arm64: Prevent KVM_COMPAT from being selectedMarc Zyngier
There is very little point in trying to support the 32bit KVM/arm API on arm64, and this was never an anticipated use case. Let's make it clear by not selecting KVM_COMPAT. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-06-21KVM: Enforce error in ioctl for compat tasks when !KVM_COMPATMarc Zyngier
The current behaviour of the compat ioctls is a bit odd. We provide a compat_ioctl method when KVM_COMPAT is set, and NULL otherwise. But NULL means that the normal, non-compat ioctl should be used directly for compat tasks, and there is no way to actually prevent a compat task from issueing KVM ioctls. This patch changes this behaviour, by always registering a compat_ioctl method, even if KVM_COMPAT is not selected. In that case, the callback will always return -EINVAL. Fixes: de8e5d744051568c8aad ("KVM: Disable compat ioctl for s390") Reported-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-06-21kernel.h: Fix a typo in commentWei Wang
Signed-off-by: Wei Wang <wvw@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Crt Mori <cmo@melexis.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: gregkh@linuxfoundation.org Cc: wei.vince.wang@gmail.com Link: https://lkml.kernel.org/lkml/20180424212241.16013-1-wvw@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn()Oleg Nesterov
insn_get_length() has the side-effect of processing the entire instruction but only if it was decoded successfully, otherwise insn_complete() can fail and in this case we need to just return an error without warning. Reported-by: syzbot+30d675e3ca03c1c351e7@syzkaller.appspotmail.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: syzkaller-bugs@googlegroups.com Link: https://lkml.kernel.org/lkml/20180518162739.GA5559@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21nvme-pci: move nvme_kill_queues to nvme_remove_dead_ctrlJianchao Wang
There is race between nvme_remove and nvme_reset_work that can lead to io hang. nvme_remove nvme_reset_work -> nvme_remove_dead_ctrl -> nvme_dev_disable -> quiesce request_queue -> queue remove_work -> cancel_work_sync reset_work -> nvme_remove_namespaces -> splice ctrl->namespaces nvme_remove_dead_ctrl_work -> nvme_kill_queues -> nvme_ns_remove do nothing -> blk_cleanup_queue -> blk_freeze_queue Finally, the request_queue is quiesced state when wait freeze, we will get io hang here. To fix it, move the nvme_kill_queues from nvme_remove_dead_ctrl_work to nvme_remove_dead_ctrl. Suggested-by: Keith Busch <keith.busch@linux.intel.com> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-06-21x86/platform/UV: Add kernel parameter to set memory block sizemike.travis@hpe.com
Add a kernel parameter that allows setting UV memory block size. This is to provide an adjustment for new forms of PMEM and other DIMM memory that might require alignment restrictions other than scanning the global address table for the required minimum alignment. The value set will be further adjusted by both the GAM range table scan as well as restrictions imposed by set_memory_block_size_order(). Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Andrew Banman <andrew.banman@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Cc: jgross@suse.com Cc: kirill.shutemov@linux.intel.com Cc: mhocko@suse.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/lkml/20180524201711.854849120@stormcage.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21x86/platform/UV: Use new set memory block size functionmike.travis@hpe.com
Add a call to the new function to "adjust" the current fixed UV memory block size of 2GB so it can be changed to a different physical boundary. This accommodates changes in the Intel BIOS, and therefore UV BIOS, which now can align boundaries different than the previous UV standard of 2GB. It also flags any UV Global Address boundaries from BIOS that cause a change in the mem block size (boundary). The current boundary of 2GB has been used on UV since the first system release in 2009 with Linux 2.6 and has worked fine. But the new NVDIMM persistent memory modules (PMEM), along with the Intel BIOS changes to support these modules caused the memory block size boundary to be set to a lower limit. Intel only guarantees that this minimum boundary at 64MB though the current Linux limit is 128MB. Note that the default remains 2GB if no changes occur. Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Andrew Banman <andrew.banman@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Cc: jgross@suse.com Cc: kirill.shutemov@linux.intel.com Cc: mhocko@suse.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/lkml/20180524201711.732785782@stormcage.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21x86/platform/UV: Add adjustable set memory block size functionmike.travis@hpe.com
Add a new function to "adjust" the current fixed UV memory block size of 2GB so it can be changed to a different physical boundary. This is out of necessity so arch dependent code can accommodate specific BIOS requirements which can align these new PMEM modules at less than the default boundaries. A "set order" type of function was used to insure that the memory block size will be a power of two value without requiring a validity check. 64GB was chosen as the upper limit for memory block size values to accommodate upcoming 4PB systems which have 6 more bits of physical address space (46 becoming 52). Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Andrew Banman <andrew.banman@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Cc: jgross@suse.com Cc: kirill.shutemov@linux.intel.com Cc: mhocko@suse.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/lkml/20180524201711.609546602@stormcage.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21x86/spectre_v1: Disable compiler optimizations over array_index_mask_nospec()Dan Williams
Mark Rutland noticed that GCC optimization passes have the potential to elide necessary invocations of the array_index_mask_nospec() instruction sequence, so mark the asm() volatile. Mark explains: "The volatile will inhibit *some* cases where the compiler could lift the array_index_nospec() call out of a branch, e.g. where there are multiple invocations of array_index_nospec() with the same arguments: if (idx < foo) { idx1 = array_idx_nospec(idx, foo) do_something(idx1); } < some other code > if (idx < foo) { idx2 = array_idx_nospec(idx, foo); do_something_else(idx2); } ... since the compiler can determine that the two invocations yield the same result, and reuse the first result (likely the same register as idx was in originally) for the second branch, effectively re-writing the above as: if (idx < foo) { idx = array_idx_nospec(idx, foo); do_something(idx); } < some other code > if (idx < foo) { do_something_else(idx); } ... if we don't take the first branch, then speculatively take the second, we lose the nospec protection. There's more info on volatile asm in the GCC docs: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile " Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: babdde2698d4 ("x86: Implement array_index_mask_nospec") Link: https://lkml.kernel.org/lkml/152838798950.14521.4893346294059739135.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21Merge branches 'acpi-soc' and 'acpi-processor'Rafael J. Wysocki
These are a stable-candidate suspend/resume fix of the ACPI driver for Intel SoCs (LPSS) and an inline stub fix for the ACPI processor driver. * acpi-soc: ACPI / LPSS: Avoid PM quirks on suspend and resume from S3 * acpi-processor: ACPI / processor: Finish making acpi_processor_ppc_has_changed() void
2018-06-21Merge branch 'pm-tools'Rafael J. Wysocki
These are turbostat utility updates for 4.18-rc2 including two fixes for recent regressions and some minor extensions. * pm-tools: tools/power turbostat: version 18.06.20 tools/power turbostat: add the missing command line switches tools/power turbostat: add single character tokens to help tools/power turbostat: alphabetize the help output tools/power turbostat: fix segfault on 'no node' machines tools/power turbostat: add optional APIC X2APIC columns tools/power turbostat: decode cpuid.1.HT tools/power turbostat: fix show/hide issues resulting from mis-merge
2018-06-21x86/pti: Don't report XenPV as vulnerableJiri Kosina
Xen PV domain kernel is not by design affected by meltdown as it's enforcing split CR3 itself. Let's not report such systems as "Vulnerable" in sysfs (we're also already forcing PTI to off in X86_HYPER_XEN_PV cases); the security of the system ultimately depends on presence of mitigation in the Hypervisor, which can't be easily detected from DomU; let's report that. Reported-and-tested-by: Mike Latimer <mlatimer@suse.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Juergen Gross <jgross@suse.com> Cc: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1806180959080.6203@cbobk.fhfr.pm [ Merge the user-visible string into a single line. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21Merge branches 'pm-core' and 'pm-opp'Rafael J. Wysocki
These are a PM core fix and an OPP framework fix for 4.18-rc2, both "stable" material. * pm-core: PM / core: Fix supplier device runtime PM usage counter imbalance * pm-opp: PM / OPP: Update voltage in case freq == old_freq
2018-06-21x86/build: Remove unnecessary preparation for purgatoryMasahiro Yamada
kexec-purgatory.c is properly generated when Kbuild descend into the arch/x86/purgatory/. Thus the 'archprepare' target is redundant. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/1529401422-28838-3-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21Revert "kexec/purgatory: Add clean-up for purgatory directory"Masahiro Yamada
Reverts the following commit: b0108f9e93d0 ("kexec: purgatory: add clean-up for purgatory directory") ... which incorrectly stated that the kexec-purgatory.c and purgatory.ro files were not removed after 'make mrproper'. In fact, they are. You can confirm it after reverting it. $ make mrproper $ touch arch/x86/purgatory/kexec-purgatory.c $ touch arch/x86/purgatory/purgatory.ro $ make mrproper CLEAN arch/x86/purgatory $ ls arch/x86/purgatory/ entry64.S Makefile purgatory.c setup-x86_64.S stack.S string.c This is obvious from the build system point of view. arch/x86/Makefile adds 'arch/x86' to core-y. Hence 'make clean' descends like this: arch/x86/Kbuild -> arch/x86/purgatory/Makefile Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/1529401422-28838-2-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21KVM: arm/arm64: add WARN_ON if size is not PAGE_SIZE aligned in ↵Jia He
unmap_stage2_range There is a panic in armv8a server(QDF2400) under memory pressure tests (start 20 guests and run memhog in the host). ---------------------------------begin-------------------------------- [35380.800950] BUG: Bad page state in process qemu-kvm pfn:dd0b6 [35380.805825] page:ffff7fe003742d80 count:-4871 mapcount:-2126053375 mapping: (null) index:0x0 [35380.815024] flags: 0x1fffc00000000000() [35380.818845] raw: 1fffc00000000000 0000000000000000 0000000000000000 ffffecf981470000 [35380.826569] raw: dead000000000100 dead000000000200 ffff8017c001c000 0000000000000000 [35380.805825] page:ffff7fe003742d80 count:-4871 mapcount:-2126053375 mapping: (null) index:0x0 [35380.815024] flags: 0x1fffc00000000000() [35380.818845] raw: 1fffc00000000000 0000000000000000 0000000000000000 ffffecf981470000 [35380.826569] raw: dead000000000100 dead000000000200 ffff8017c001c000 0000000000000000 [35380.834294] page dumped because: nonzero _refcount [...] --------------------------------end-------------------------------------- The root cause might be what was fixed at [1]. But from the KVM points of view, it would be better if the issue was caught earlier. If the size is not PAGE_SIZE aligned, unmap_stage2_range might unmap the wrong(more or less) page range. Hence it caused the "BUG: Bad page state" Let's WARN in that case, so that the issue is obvious. [1] https://lkml.org/lkml/2018/5/3/1042 Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: jia.he@hxt-semitech.com [maz: tidied up commit message] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-06-21rseq/cleanup: Do not abort rseq c.s. in child on fork()Mathieu Desnoyers
Considering that we explicitly forbid system calls in rseq critical sections, it is not valid to issue a fork or clone system call within a rseq critical section, so rseq_fork() is not required to restart an active rseq c.s. in the child process. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ben Maurer <bmaurer@fb.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Lameter <cl@linux.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Link: https://lore.kernel.org/lkml/20180619133230.4087-4-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21rseq/selftests/arm: Align 'struct rseq_cs' on 32 bytesMathieu Desnoyers
uapi/linux/rseq.h aligns 'struct rseq_cs' on 32 bytes. Satisfy this alignment requirement in its definition within the rseq-arm.h inline assembly as well. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ben Maurer <bmaurer@fb.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Lameter <cl@linux.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Link: https://lore.kernel.org/lkml/20180619133230.4087-3-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21rseq/selftests: Make run_param_test.sh executableMathieu Desnoyers
The executable bit of the run_param_test.sh script got lost in the merge. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ben Maurer <bmaurer@fb.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Lameter <cl@linux.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Link: https://lore.kernel.org/lkml/20180619133230.4087-2-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21x86/xen: Add call of speculative_store_bypass_ht_init() to PV pathsJuergen Gross
Commit: 1f50ddb4f418 ("x86/speculation: Handle HT correctly on AMD") ... added speculative_store_bypass_ht_init() to the per-CPU initialization sequence. speculative_store_bypass_ht_init() needs to be called on each CPU for PV guests, too. Reported-by: Brian Woods <brian.woods@amd.com> Tested-by: Brian Woods <brian.woods@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: xen-devel@lists.xenproject.org Fixes: 1f50ddb4f4189243c05926b842dc1a0332195f31 ("x86/speculation: Handle HT correctly on AMD") Link: https://lore.kernel.org/lkml/20180621084331.21228-1-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21drm/bridge/sii8620: fix display of packed pixel modes in MHL2Maciej Purski
Currently packed pixel modes in MHL2 can't be displayed. The device automatically recognizes output format, so setting format other than RGB causes failure. Fix it by writing proper values to registers. Tested on MHL1 and MHL2 using various vendors' dongles both in DVI and HDMI mode. Signed-off-by: Maciej Purski <m.purski@samsung.com> Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/1516706239-9104-1-git-send-email-m.purski@samsung.com
2018-06-21KVM: arm64: Avoid mistaken attempts to save SVE state for vcpusDave Martin
Commit e6b673b ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing") uses fpsimd_save() to save the FPSIMD state for a vcpu when scheduling the vcpu out. However, currently current's value of TIF_SVE is restored before calling fpsimd_save() which means that fpsimd_save() may erroneously attempt to save SVE state from the vcpu. This enables current's vector state to be polluted with guest data. current->thread.sve_state may be unallocated or not large enough, so this can also trigger a NULL dereference or buffer overrun. Instead of this, TIF_SVE should be configured properly for the guest when calling fpsimd_save() with the vcpu context loaded. This patch ensures this by delaying restoration of current's TIF_SVE until after the call to fpsimd_save(). Fixes: e6b673b741ea ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing") Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-06-21KVM: arm64/sve: Fix SVE trap restoration for non-current tasksDave Martin
Commit e6b673b ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing") attempts to restore the configuration of userspace SVE trapping via a call to fpsimd_bind_task_to_cpu(), but the logic for determining when to do this is not correct. The patch makes the errnoenous assumption that the only task that may try to enter userspace with the currently loaded FPSIMD/SVE register content is current. This may not be the case however: if some other user task T is scheduled on the CPU during the execution of the KVM run loop, and the vcpu does not try to use the registers in the meantime, then T's state may be left there intact. If T happens to be the next task to enter userspace on this CPU then the hooks for reloading the register state and configuring traps will be skipped. (Also, current never has SVE state at this point anyway and should always have the trap enabled, as a side-effect of the ioctl() syscall needed to reach the KVM run loop in the first place.) This patch instead restores the state of the EL0 trap from the state observed at the most recent vcpu_load(), ensuring that the trap is set correctly for the loaded context (if any). Fixes: e6b673b741ea ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing") Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-06-21KVM: arm64: Don't mask softirq with IRQs disabled in vcpu_put()Dave Martin
Commit e6b673b ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing") introduces a specific helper kvm_arch_vcpu_put_fp() for saving the vcpu FPSIMD state during vcpu_put(). This function uses local_bh_disable()/_enable() to protect the FPSIMD context manipulation from interruption by softirqs. This approach is not correct, because vcpu_put() can be invoked either from the KVM host vcpu thread (when exiting the vcpu run loop), or via a preempt notifier. In the former case, only preemption is disabled. In the latter case, the function is called from inside __schedule(), which means that IRQs are disabled. Use of local_bh_disable()/_enable() with IRQs disabled is considerd an error, resulting in lockdep splats while running VMs if lockdep is enabled. This patch disables IRQs instead of attempting to disable softirqs, avoiding the problem of calling local_bh_enable() with IRQs disabled in the __schedule() path. This creates an additional interrupt blackout during vcpu run loop exit, but this is the rare case and the blackout latency is still less than that of __schedule(). Fixes: e6b673b741ea ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing") Reported-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-06-21arm64: Introduce sysreg_clear_set()Mark Rutland
Currently we have a couple of helpers to manipulate bits in particular sysregs: * config_sctlr_el1(u32 clear, u32 set) * change_cpacr(u64 val, u64 mask) The parameters of these differ in naming convention, order, and size, which is unfortunate. They also differ slightly in behaviour, as change_cpacr() skips the sysreg write if the bits are unchanged, which is a useful optimization when sysreg writes are expensive. Before we gain yet another sysreg manipulation function, let's unify these with a common helper, providing a consistent order for clear/set operands, and the write skipping behaviour from change_cpacr(). Code will be migrated to the new helper in subsequent patches. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Dave Martin <dave.martin@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-06-21KVM: arm/arm64: Drop resource size check for GICV windowArd Biesheuvel
When booting a 64 KB pages kernel on a ACPI GICv3 system that implements support for v2 emulation, the following warning is produced GICV size 0x2000 not a multiple of page size 0x10000 and support for v2 emulation is disabled, preventing GICv2 VMs from being able to run on such hosts. The reason is that vgic_v3_probe() performs a sanity check on the size of the window (it should be a multiple of the page size), while the ACPI MADT parsing code hardcodes the size of the window to 8 KB. This makes sense, considering that ACPI does not bother to describe the size in the first place, under the assumption that platforms implementing ACPI will follow the architecture and not put anything else in the same 64 KB window. So let's just drop the sanity check altogether, and assume that the window is at least 64 KB in size. Fixes: 909777324588 ("KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-06-21nvme-fc: release io queues to allow fast failJames Smart
Rather than leaving io queues quiesced after tearing down an association, restart them. This allows ios to be replayed, with fastfail ios terminating and non-fastfail getting into loops of retry. This follows rdma's lead. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimber.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-06-20nbd: Add the nbd NBD_DISCONNECT_ON_CLOSE config flag.Doron Roberts-Kedes
If NBD_DISCONNECT_ON_CLOSE is set on a device, then the driver will issue a disconnect from nbd_release if the device has no remaining bdev->bd_openers. Fix ret val so reconfigure with only setting the flag succeeds. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-21Merge branch 'drm-fixes-4.18' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes Bunch of amdgpu fixes mostly all going to stable. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180620190021.2775-1-alexander.deucher@amd.com
2018-06-21Merge branch 'turbostat' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat utility changes for 4.18-rc2 from Len Brown. "This includes two regression fixes, plus a couple more random, but worthy, patches." * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: version 18.06.20 tools/power turbostat: add the missing command line switches tools/power turbostat: add single character tokens to help tools/power turbostat: alphabetize the help output tools/power turbostat: fix segfault on 'no node' machines tools/power turbostat: add optional APIC X2APIC columns tools/power turbostat: decode cpuid.1.HT tools/power turbostat: fix show/hide issues resulting from mis-merge
2018-06-21Documentation: intel_pstate: Fix typoRafael J. Wysocki
Fix a typo in the intel_pstate admin-guide documentation. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-06-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Here are eight fairly small fixes collected over the last two weeks. Regression and crashing bug fixes: - mlx4/5: Fixes for issues found from various checkers - A resource tracking and uverbs regression in the core code - qedr: NULL pointer regression found during testing - rxe: Various small bugs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: IB/rxe: Fix missing completion for mem_reg work requests RDMA/core: Save kernel caller name when creating CQ using ib_create_cq() IB/uverbs: Fix ordering of ucontext check in ib_uverbs_write IB/mlx4: Fix an error handling path in 'mlx4_ib_rereg_user_mr()' RDMA/qedr: Fix NULL pointer dereference when running over iWARP without RDMA-CM IB/mlx5: Fix return value check in flow_counters_set_data() IB/mlx5: Fix memory leak in mlx5_ib_create_flow IB/rxe: avoid double kfree skb
2018-06-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix crash on bpf_prog_load() errors, from Daniel Borkmann. 2) Fix ATM VCC memory accounting, from David Woodhouse. 3) fib6_info objects need RCU freeing, from Eric Dumazet. 4) Fix SO_BINDTODEVICE handling for TCP sockets, from David Ahern. 5) Fix clobbered error code in enic_open() failure path, from Govindarajulu Varadarajan. 6) Propagate dev_get_valid_name() error returns properly, from Li RongQing. 7) Fix suspend/resume in davinci_emac driver, from Bartosz Golaszewski. 8) Various act_ife fixes (recursive locking, IDR leaks, etc.) from Davide Caratti. 9) Fix buggy checksum handling in sungem driver, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits) ip: limit use of gso_size to udp stmmac: fix DMA channel hang in half-duplex mode net: stmmac: socfpga: add additional ocp reset line for Stratix10 net: sungem: fix rx checksum support bpfilter: ignore binary files bpfilter: fix build error net/usb/drivers: Remove useless hrtimer_active check net/sched: act_ife: preserve the action control in case of error net/sched: act_ife: fix recursive lock and idr leak net: ethernet: fix suspend/resume in davinci_emac net: propagate dev_get_valid_name return code enic: do not overwrite error code net/tcp: Fix socket lookups with SO_BINDTODEVICE ptp: replace getnstimeofday64() with ktime_get_real_ts64() net/ipv6: respect rcu grace period before freeing fib6_info net: net_failover: fix typo in net_failover_slave_register() ipvlan: use ETH_MAX_MTU as max mtu net: hamradio: use eth_broadcast_addr enic: initialize enic->rfs_h.lock in enic_probe MAINTAINERS: Add Sam as the maintainer for NCSI ...
2018-06-20block: sed-opal: Fix a couple off by one bugsDan Carpenter
resp->num is the number of tokens in resp->tok[]. It gets set in response_parse(). So if n == resp->num then we're reading beyond the end of the data. Fixes: 455a7b238cd6 ("block: Add Sed-opal library") Reviewed-by: Scott Bauer <scott.bauer@intel.com> Tested-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-20tools/power turbostat: version 18.06.20Len Brown
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-20tools/power turbostat: add the missing command line switchesNathan Ciobanu
Document the missing command line tokens in the help() function. Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-20tools/power turbostat: add single character tokens to helpNathan Ciobanu
Improve the help() output by adding the single character tokens (e.g -a). Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-20tools/power turbostat: alphabetize the help outputNathan Ciobanu
Sort the command line arguments output of help() in alphabetical order in line with other linux tools. Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-20tools/power turbostat: fix segfault on 'no node' machinesNathan Ciobanu
Running turbostat on machines that don't expose nodes in sysfs (no /sys/bus/node) causes a segfault or a -nan value diesplayed in the log. This is caused by physical_node_id being reported as -1 and logical_node_id being calculated as a negative number resulting in the new GET_THREAD/GET_CORE returning an incorrect address. Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-20tools/power turbostat: add optional APIC X2APIC columnsLen Brown
Add APIC and X2APIC columns to the topology section. They are disabled-by-default -- enable like so: --debug or --enable APIC,X2APIC Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-20tools/power turbostat: decode cpuid.1.HTLen Brown
eg. the "HT" here: CPUID(1): SSE3 MONITOR - EIST TM2 TSC MSR ACPI-TM HT TM Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-20tools/power turbostat: fix show/hide issues resulting from mis-mergeLen Brown
The --show and --hide options failed on "Node", which was listed as "Node%". The --show and --hide options were generally fouled-up do due to come content merges that scrambled the list of column name indexes. Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-20blk-mq-debugfs: Off by one in blk_mq_rq_state_name()Dan Carpenter
If rq_state == ARRAY_SIZE() then we read one element beyond the end of the blk_mq_rq_state_name_array[] array. Fixes: ec6dcf63c55c ("blk-mq-debugfs: Show more request state information") Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-20nvmet: reset keep alive timer in controller enableMax Gurtuvoy
Controllers that are not yet enabled should not really enforce keep alive timeouts, but we still want to track a timeout and cleanup in case a host died before it enabled the controller. Hence, simply reset the keep alive timer when the controller is enabled. Suggested-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-06-20nvme-rdma: don't override opts->queue_sizeSagi Grimberg
That is user argument, and theoretically controller limits can change over time (over reconnects/resets). Instead, use the sqsize controller attribute to check queue depth boundaries and use it to the tagset allocation. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-06-20nvme-rdma: Fix command completion race at error recoveryIsrael Rukshin
The race is between completing the request at error recovery work and rdma completions. If we cancel the request before getting the good rdma completion we get a NULL deref of the request MR at nvme_rdma_process_nvme_rsp(). When Canceling the request we return its mr to the mr pool (set mr to NULL) and also unmap its data. Canceling the requests while the rdma queues are active is not safe. Because rdma queues are active and we get good rdma completions that can use the mr pointer which may be NULL. Completing the request too soon may lead also to performing DMA to/from user buffers which might have been already unmapped. The commit fixes the race by draining the QP before starting the abort commands mechanism. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-06-20nvme-rdma: fix possible free of a non-allocated async event bufferSagi Grimberg
If nvme_rdma_configure_admin_queue fails before we allocated the async event buffer, we will falsly free it because nvme_rdma_free_queue is freeing it. Fix it by allocating the buffer right after nvme_rdma_alloc_queue and free it right before nvme_rdma_queue_free to maintain orderly reverse cleanup sequence. Reported-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-06-20nvme-rdma: fix possible double free condition when failing to create a ↵Sagi Grimberg
controller Failures after nvme_init_ctrl will defer resource cleanups to .free_ctrl when the reference is released, hence we should not free the controller queues for these failures. Fix that by moving controller queues allocation before controller initialization and correctly freeing them for failures before initialization and skip them for failures after initialization. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>