linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2019-07-25	btrfs: Fix deadlock caused by missing memory barrier	Nikolay Borisov
	Commit 06297d8cefca ("btrfs: switch extent_buffer blocking_writers from atomic to int") changed the type of blocking_writers but forgot to adjust relevant code in btrfs_tree_unlock by converting the smp_mb__after_atomic to smp_mb. This opened up the possibility of a deadlock due to re-ordering of setting blocking_writers and checking/waking up the waiter. This particular lockup is explained in a comment above waitqueue_active() function. Fix it by converting the memory barrier to a full smp_mb, accounting for the fact that blocking_writers is a simple integer. Fixes: 06297d8cefca ("btrfs: switch extent_buffer blocking_writers from atomic to int") Tested-by: Johannes Thumshirn <jthumshirn@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-25	perf/x86/intel: Mark expected switch fall-throughs	Gustavo A. R. Silva
	In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warnings: arch/x86/events/intel/core.c: In function ‘intel_pmu_init’: arch/x86/events/intel/core.c:4959:8: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/events/intel/core.c:5008:8: warning: this statement may fall through [-Wimplicit-fallthrough=] Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190624161913.GA32270@embeddedor Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25	perf/core: Fix creating kernel counters for PMUs that override event->cpu	Leonard Crestez
	Some hardware PMU drivers will override perf_event.cpu inside their event_init callback. This causes a lockdep splat when initialized through the kernel API: WARNING: CPU: 0 PID: 250 at kernel/events/core.c:2917 ctx_sched_out+0x78/0x208 pc : ctx_sched_out+0x78/0x208 Call trace: ctx_sched_out+0x78/0x208 __perf_install_in_context+0x160/0x248 remote_function+0x58/0x68 generic_exec_single+0x100/0x180 smp_call_function_single+0x174/0x1b8 perf_install_in_context+0x178/0x188 perf_event_create_kernel_counter+0x118/0x160 Fix this by calling perf_install_in_context with event->cpu, just like perf_event_open Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Frank Li <Frank.li@nxp.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/c4ebe0503623066896d7046def4d6b1e06e0eb2e.1563972056.git.leonard.crestez@nxp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25	perf/x86: Apply more accurate check on hypervisor platform	Zhenzhong Duan
	check_msr is used to fix a bug report in guest where KVM doesn't support LBR MSR and cause #GP. The msr check is bypassed on real HW to workaround a false failure, see commit d0e1a507bdc7 ("perf/x86/intel: Disable check_msr for real HW") When running a guest with CONFIG_HYPERVISOR_GUEST not set or "nopv" enabled, current check isn't enough and #GP could trigger. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1564022366-18293-1-git-send-email-zhenzhong.duan@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25	perf/x86/intel: Fix invalid Bit 13 for Icelake MSR_OFFCORE_RSP_x register	Yunying Sun
	The Intel SDM states that bit 13 of Icelake's MSR_OFFCORE_RSP_x register is valid, and used for counting hardware generated prefetches of L3 cache. Update the bitmask to allow bit 13. Before: $ perf stat -e cpu/event=0xb7,umask=0x1,config1=0x1bfff/u sleep 3 Performance counter stats for 'sleep 3': <not supported> cpu/event=0xb7,umask=0x1,config1=0x1bfff/u After: $ perf stat -e cpu/event=0xb7,umask=0x1,config1=0x1bfff/u sleep 3 Performance counter stats for 'sleep 3': 9,293 cpu/event=0xb7,umask=0x1,config1=0x1bfff/u Signed-off-by: Yunying Sun <yunying.sun@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: bp@alien8.de Cc: hpa@zytor.com Cc: jolsa@redhat.com Cc: namhyung@kernel.org Link: https://lkml.kernel.org/r/20190724082932.12833-1-yunying.sun@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25	perf/x86/intel: Fix SLOTS PEBS event constraint	Kan Liang
	Sampling SLOTS event and ref-cycles event in a group on Icelake gives EINVAL. SLOTS event is the event stands for the fixed counter 3, not fixed counter 2. Wrong mask was set to SLOTS event in intel_icl_pebs_event_constraints[]. Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support") Link: https://lkml.kernel.org/r/20190723200429.8180-1-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25	locking/mutex: Test for initialized mutex	Sebastian Andrzej Siewior
	An uninitialized/ zeroed mutex will go unnoticed because there is no check for it. There is a magic check in the unlock's slowpath path which might go unnoticed if the unlock happens in the fastpath. Add a ->magic check early in the mutex_lock() and mutex_trylock() path. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190703092125.lsdf4gpsh2plhavb@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25	locking/lockdep: Clean up #ifdef checks	Arnd Bergmann
	As Will Deacon points out, CONFIG_PROVE_LOCKING implies TRACE_IRQFLAGS, so the conditions I added in the previous patch, and some others in the same file can be simplified by only checking for the former. No functional change. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Yuyang Du <duyuyang@gmail.com> Fixes: 886532aee3cd ("locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING") Link: https://lkml.kernel.org/r/20190628102919.2345242-1-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25	locking/lockdep: Hide unused 'class' variable	Arnd Bergmann
	The usage is now hidden in an #ifdef, so we need to move the variable itself in there as well to avoid this warning: kernel/locking/lockdep_proc.c:203:21: error: unused variable 'class' [-Werror,-Wunused-variable] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qian Cai <cai@lca.pw> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yuyang Du <duyuyang@gmail.com> Cc: frederic@kernel.org Fixes: 68d41d8c94a3 ("locking/lockdep: Fix lock used or unused stats error") Link: https://lkml.kernel.org/r/20190715092809.736834-1-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25	locking/rwsem: Add ACQUIRE comments	Peter Zijlstra
	Since we just reviewed read_slowpath for ACQUIRE correctness, add a few coments to retain our findings. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25	tty/ldsem, locking/rwsem: Add missing ACQUIRE to read_failed sleep loop	Peter Zijlstra
	While reviewing rwsem down_slowpath, Will noticed ldsem had a copy of a bug we just found for rwsem. X = 0; CPU0 CPU1 rwsem_down_read() for (;;) { set_current_state(TASK_UNINTERRUPTIBLE); X = 1; rwsem_up_write(); rwsem_mark_wake() atomic_long_add(adjustment, &sem->count); smp_store_release(&waiter->task, NULL); if (!waiter.task) break; ... } r = X; Allows 'r == 0'. Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 4898e640caf0 ("tty: Add timed, writer-prioritized rw semaphore") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25	lcoking/rwsem: Add missing ACQUIRE to read_slowpath sleep loop	Peter Zijlstra
	While reviewing another read_slowpath patch, both Will and I noticed another missing ACQUIRE, namely: X = 0; CPU0 CPU1 rwsem_down_read() for (;;) { set_current_state(TASK_UNINTERRUPTIBLE); X = 1; rwsem_up_write(); rwsem_mark_wake() atomic_long_add(adjustment, &sem->count); smp_store_release(&waiter->task, NULL); if (!waiter.task) break; ... } r = X; Allows 'r == 0'. Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25	locking/rwsem: Add missing ACQUIRE to read_slowpath exit when queue is empty	Jan Stancek
	LTP mtest06 has been observed to occasionally hit "still mapped when deleted" and following BUG_ON on arm64. The extra mapcount originated from pagefault handler, which handled pagefault for vma that has already been detached. vma is detached under mmap_sem write lock by detach_vmas_to_be_unmapped(), which also invalidates vmacache. When the pagefault handler (under mmap_sem read lock) calls find_vma(), vmacache_valid() wrongly reports vmacache as valid. After rwsem down_read() returns via 'queue empty' path (as of v5.2), it does so without an ACQUIRE on sem->count: down_read() __down_read() rwsem_down_read_failed() __rwsem_down_read_failed_common() raw_spin_lock_irq(&sem->wait_lock); if (list_empty(&sem->wait_list)) { if (atomic_long_read(&sem->count) >= 0) { raw_spin_unlock_irq(&sem->wait_lock); return sem; The problem can be reproduced by running LTP mtest06 in a loop and building the kernel (-j $NCPUS) in parallel. It does reproduces since v4.20 on arm64 HPE Apollo 70 (224 CPUs, 256GB RAM, 2 nodes). It triggers reliably in about an hour. The patched kernel ran fine for 10+ hours. Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dbueso@suse.de Fixes: 4b486b535c33 ("locking/rwsem: Exit read lock slowpath if queue empty & no writer") Link: https://lkml.kernel.org/r/50b8914e20d1d62bb2dee42d342836c2c16ebee7.1563438048.git.jstancek@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25	locking/rwsem: Don't call owner_on_cpu() on read-owner	Waiman Long
	For writer, the owner value is cleared on unlock. For reader, it is left intact on unlock for providing better debugging aid on crash dump and the unlock of one reader may not mean the lock is free. As a result, the owner_on_cpu() shouldn't be used on read-owner as the task pointer value may not be valid and it might have been freed. That is the case in rwsem_spin_on_owner(), but not in rwsem_can_spin_on_owner(). This can lead to use-after-free error from KASAN. For example, BUG: KASAN: use-after-free in rwsem_down_write_slowpath (/home/miguel/kernel/linux/kernel/locking/rwsem.c:669 /home/miguel/kernel/linux/kernel/locking/rwsem.c:1125) Fix this by checking for RWSEM_READER_OWNED flag before calling owner_on_cpu(). Reported-by: Luis Henriques <lhenriques@suse.com> Tested-by: Luis Henriques <lhenriques@suse.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Fixes: 94a9717b3c40e ("locking/rwsem: Make rwsem->owner an atomic_long_t") Link: https://lkml.kernel.org/r/81e82d5b-5074-77e8-7204-28479bbe0df0@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25	sched/fair: Use RCU accessors consistently for ->numa_group	Jann Horn
	The old code used RCU annotations and accessors inconsistently for ->numa_group, which can lead to use-after-frees and NULL dereferences. Let all accesses to ->numa_group use proper RCU helpers to prevent such issues. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Fixes: 8c8a743c5087 ("sched/numa: Use {cpu, pid} to create task groups for shared faults") Link: https://lkml.kernel.org/r/20190716152047.14424-3-jannh@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25	sched/fair: Don't free p->numa_faults with concurrent readers	Jann Horn
	When going through execve(), zero out the NUMA fault statistics instead of freeing them. During execve, the task is reachable through procfs and the scheduler. A concurrent /proc/*/sched reader can read data from a freed ->numa_faults allocation (confirmed by KASAN) and write it back to userspace. I believe that it would also be possible for a use-after-free read to occur through a race between a NUMA fault and execve(): task_numa_fault() can lead to task_numa_compare(), which invokes task_weight() on the currently running task of a different CPU. Another way to fix this would be to make ->numa_faults RCU-managed or add extra locking, but it seems easier to wipe the NUMA fault statistics on execve. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Fixes: 82727018b0d3 ("sched/numa: Call task_numa_free() from do_execve()") Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25	test_firmware: fix a memory leak bug	Wenwen Wang
	In test_firmware_init(), the buffer pointed to by the global pointer 'test_fw_config' is allocated through kzalloc(). Then, the buffer is initialized in __test_firmware_config_init(). In the case that the initialization fails, the following execution in test_firmware_init() needs to be terminated with an error code returned to indicate this failure. However, the allocated buffer is not freed on this execution path, leading to a memory leak bug. To fix the above issue, free the allocated buffer before returning from test_firmware_init(). Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Link: https://lore.kernel.org/r/1563084696-6865-1-git-send-email-wang6495@umn.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	hpet: Fix division by zero in hpet_time_div()	Kefeng Wang
	The base value in do_div() called by hpet_time_div() is truncated from unsigned long to uint32_t, resulting in a divide-by-zero exception. UBSAN: Undefined behaviour in ../drivers/char/hpet.c:572:2 division by zero CPU: 1 PID: 23682 Comm: syz-executor.3 Not tainted 4.4.184.x86_64+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 0000000000000000 b573382df1853d00 ffff8800a3287b98 ffffffff81ad7561 ffff8800a3287c00 ffffffff838b35b0 ffffffff838b3860 ffff8800a3287c20 0000000000000000 ffff8800a3287bb0 ffffffff81b8f25e ffffffff838b35a0 Call Trace: [<ffffffff81ad7561>] __dump_stack lib/dump_stack.c:15 [inline] [<ffffffff81ad7561>] dump_stack+0xc1/0x120 lib/dump_stack.c:51 [<ffffffff81b8f25e>] ubsan_epilogue+0x12/0x8d lib/ubsan.c:166 [<ffffffff81b900cb>] __ubsan_handle_divrem_overflow+0x282/0x2c8 lib/ubsan.c:262 [<ffffffff823560dd>] hpet_time_div drivers/char/hpet.c:572 [inline] [<ffffffff823560dd>] hpet_ioctl_common drivers/char/hpet.c:663 [inline] [<ffffffff823560dd>] hpet_ioctl_common.cold+0xa8/0xad drivers/char/hpet.c:577 [<ffffffff81e63d56>] hpet_ioctl+0xc6/0x180 drivers/char/hpet.c:676 [<ffffffff81711590>] vfs_ioctl fs/ioctl.c:43 [inline] [<ffffffff81711590>] file_ioctl fs/ioctl.c:470 [inline] [<ffffffff81711590>] do_vfs_ioctl+0x6e0/0xf70 fs/ioctl.c:605 [<ffffffff81711eb4>] SYSC_ioctl fs/ioctl.c:622 [inline] [<ffffffff81711eb4>] SyS_ioctl+0x94/0xc0 fs/ioctl.c:613 [<ffffffff82846003>] tracesys_phase2+0x90/0x95 The main C reproducer autogenerated by syzkaller, syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0); memcpy((void*)0x20000100, "/dev/hpet\000", 10); syscall(__NR_openat, 0xffffffffffffff9c, 0x20000100, 0, 0); syscall(__NR_ioctl, r[0], 0x40086806, 0x40000000000000); Fix it by using div64_ul(). Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Zhang HongJun <zhanghongjun2@huawei.com> Cc: stable <stable@vger.kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20190711132757.130092-1-wangkefeng.wang@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	eeprom: make older eeprom drivers select NVMEM_SYSFS	Arseny Solokha
	misc/eeprom/{at24,at25,eeprom_93xx46} drivers all register their corresponding devices in the nvmem framework in compat mode which requires nvmem sysfs interface to be present. The latter, however, has been split out from nvmem under a separate Kconfig in commit ae0c2d725512 ("nvmem: core: add NVMEM_SYSFS Kconfig"). As a result, probing certain I2C-attached EEPROMs now fails with at24: probe of 0-0050 failed with error -38 because of a stub implementation of nvmem_sysfs_setup_compat() in drivers/nvmem/nvmem.h. Update the nvmem dependency for these drivers so they could load again: at24 0-0050: 32768 byte 24c256 EEPROM, writable, 64 bytes/write Cc: Adrian Bunk <bunk@kernel.org> Cc: Bartosz Golaszewski <brgl@bgdev.pl> Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru> Link: https://lore.kernel.org/r/20190716111236.27803-1-asolokha@kb.kras.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	vt: Grab console_lock around con_is_bound in show_bind	Daniel Vetter
	Not really harmful not to, but also not harm in grabbing the lock. And this shuts up a new WARNING I introduced in commit ddde3c18b700 ("vt: More locking checks"). Reported-by: Jens Remus <jremus@linux.ibm.com> Cc: linux-kernel@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linux-fbdev@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Martin Hostettler <textshell@uchuujin.de> Cc: Adam Borowski <kilobyte@angband.pl> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Sam Ravnborg <sam@ravnborg.org> Fixes: ddde3c18b700 ("vt: More locking checks") Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Tested-by: Jens Remus <jremus@linux.ibm.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Link: https://lore.kernel.org/r/20190718080903.22622-1-daniel.vetter@ffwll.ch Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	vmw_balloon: Remove Julien from the maintainers list	Nadav Amit
	Julien will not be a maintainer anymore. Signed-off-by: Nadav Amit <namit@vmware.com> Link: https://lore.kernel.org/r/20190702100519.7464-1-namit@vmware.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	x86/speculation/mds: Apply more accurate check on hypervisor platform	Zhenzhong Duan
	X86_HYPER_NATIVE isn't accurate for checking if running on native platform, e.g. CONFIG_HYPERVISOR_GUEST isn't set or "nopv" is enabled. Checking the CPU feature bit X86_FEATURE_HYPERVISOR to determine if it's running on native platform is more accurate. This still doesn't cover the platforms on which X86_FEATURE_HYPERVISOR is unsupported, e.g. VMware, but there is nothing which can be done about this scenario. Fixes: 8a4b06d391b0 ("x86/speculation/mds: Add sysfs reporting for MDS") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1564022349-17338-1-git-send-email-zhenzhong.duan@oracle.com
2019-07-25	x86/hpet: Undo the early counter is counting check	Thomas Gleixner
	Rui reported that on a Pentium D machine which has HPET forced enabled because it is not advertised by ACPI, the early counter is counting check leads to a silent boot hang. The reason is that the ordering of checking the counter first and then reconfiguring the HPET fails to work on that machine. As the HPET is not advertised and presumably not initialized by the BIOS the early enable and the following reconfiguration seems to bring it into a broken state. Adding clocksource=jiffies to the command line results in the following clocksource watchdog warning: clocksource: timekeeping watchdog on CPU1: Marking clocksource 'tsc-early' as unstable because the skew is too large: clocksource: 'hpet' wd_now: 33 wd_last: 33 mask: ffffffff That clearly shows that the HPET is not counting after it got reconfigured and reenabled. If the counter is not working then the HPET timer is not expiring either, which explains the boot hang. Move the counter is counting check after the full configuration again to unbreak these systems. Reported-by: Rui Salvaterra <rsalvaterra@gmail.com> Fixes: 3222daf970f3 ("x86/hpet: Separate counter check out of clocksource register code") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Rui Salvaterra <rsalvaterra@gmail.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907250810530.1791@nanos.tec.linutronix.de
2019-07-25	tty: serial: netx: Delete driver	Linus Walleij
	The Netx ARM machine was deleted from the kernel. This driver had no users and has to go. Cc: Robert Schwebel <r.schwebel@pengutronix.de> Cc: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20190722065146.4844-1-linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	xhci: Fix crash if scatter gather is used with Immediate Data Transfer (IDT).	Mathias Nyman
	A second regression was found in the immediate data transfer (IDT) support which was added to 5.2 kernel IDT is used to transfer small amounts of data (up to 8 bytes) in the field normally used for data dma address, thus avoiding dma mapping. If the data was not already dma mapped, then IDT support assumed data was in urb->transfer_buffer, and did not take into accound that even small amounts of data (8 bytes) can be in a scatterlist instead. This caused a NULL pointer dereference when sg_dma_len() was used with non-dma mapped data. Solve this by not using IDT if scatter gather buffer list is used. Fixes: 33e39350ebd2 ("usb: xhci: add Immediate Data Transfer support") Cc: <stable@vger.kernel.org> # v5.2 Reported-by: Maik Stohn <maik.stohn@seal-one.com> Tested-by: Maik Stohn <maik.stohn@seal-one.com> CC: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/1564044861-1445-1-git-send-email-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	usb: usb251xb: Reallow swap-dx-lanes to apply to the upstream port	Lucas Stach
	This is a partial revert of 73d31def1aab "usb: usb251xb: Create a ports field collector method", which broke a existing devicetree (arch/arm64/boot/dts/freescale/imx8mq.dtsi). There is no reason why the swap-dx-lanes property should not apply to the upstream port. The reason given in the breaking commit was that it's inconsitent with respect to other port properties, but in fact it is not. All other properties which only apply to the downstream ports explicitly reject port 0, so there is pretty strong precedence that the driver referred to the upstream port as port 0. So there is no inconsistency in this property at all, other than the swapping being also applicable to the upstream port. CC: stable@vger.kernel.org #5.2 Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Link: https://lore.kernel.org/r/20190719084407.28041-3-l.stach@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	Revert "usb: usb251xb: Add US port lanes inversion property"	Lucas Stach
	This property isn't needed and not yet used anywhere. The swap-dx-lanes property is perfectly fine for doing the swap on the upstream port lanes. CC: stable@vger.kernel.org #5.2 Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Link: https://lore.kernel.org/r/20190719084407.28041-2-l.stach@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	Revert "usb: usb251xb: Add US lanes inversion dts-bindings"	Lucas Stach
	This reverts commit 3342ce35a1, as there is no need for this separate property and it breaks compatibility with existing devicetree files (arch/arm64/boot/dts/freescale/imx8mq.dtsi). CC: stable@vger.kernel.org #5.2 Fixes: 3342ce35a183 ("usb: usb251xb: Add US lanes inversion dts-bindings") Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Link: https://lore.kernel.org/r/20190719084407.28041-1-l.stach@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	iomap: fix Invalid License ID	Masahiro Yamada
	Detected by: $ ./scripts/spdxcheck.py fs/iomap/Makefile: 1:27 Invalid License ID: GPL-2.0-or-newer Fixes: 1c230208f53d ("iomap: start moving code to fs/iomap/") Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	treewide: remove SPDX "WITH Linux-syscall-note" from kernel-space headers again	Masahiro Yamada
	The "WITH Linux-syscall-note" exception exists for headers exported to user space. It is strange to add it to non-exported headers. Commit 687a3e4d8e61 ("treewide: remove SPDX "WITH Linux-syscall-note" from kernel-space headers") did cleanups some months ago, but it looks like we need to do this periodically. This patch was generated by the following script: git grep -l -e Linux-syscall-note \ -- :.h :^arch//include/uapi/asm/.h :^include/uapi/ :^tools \| while read file do sed -i -e 's/($GPL-[^[:space:]]$ WITH Linux-syscall-note)/\1/g' \ -e 's/ WITH Linux-syscall-note//g' $file done I did not commit drivers/staging/android/uapi/ion.h . This header is not currently exported, but somebody may plan to move it to include/uapi/ when the time comes. I am not sure. Anyway, it will be better to check the license inconsistency in drivers/staging/android/uapi/. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	treewide: add "WITH Linux-syscall-note" to SPDX tag of uapi headers	Masahiro Yamada
	UAPI headers licensed under GPL are supposed to have exception "WITH Linux-syscall-note" so that they can be included into non-GPL user space application code. The exception note is missing in some UAPI headers. Some of them slipped in by the treewide conversion commit b24413180f56 ("License cleanup: add SPDX GPL-2.0 license identifier to files with no license"). Just run: $ git show --oneline b24413180f56 -- arch/x86/include/uapi/asm/ I believe they are not intentional, and should be fixed too. This patch was generated by the following script: git grep -l --not -e Linux-syscall-note --and -e SPDX-License-Identifier \ -- :arch//include/uapi/asm/.h :include/uapi/ :^/Kbuild \| while read file do sed -i -e '/[[:space:]]OR[[:space:]]/s/$GPL-[^[:space:]]$/(\1 WITH Linux-syscall-note)/g' \ -e '/[[:space:]]or[[:space:]]/s/$GPL-[^[:space:]]$/(\1 WITH Linux-syscall-note)/g' \ -e '/[[:space:]]OR[[:space:]]/!{/[[:space:]]or[[:space:]]/!s/$GPL-[^[:space:]]$/\1 WITH Linux-syscall-note/g}' $file done After this patch is applied, there are 5 UAPI headers that do not contain "WITH Linux-syscall-note". They are kept untouched since this exception applies only to GPL variants. $ git grep --not -e Linux-syscall-note --and -e SPDX-License-Identifier \ -- :arch//include/uapi/asm/.h :include/uapi/ :^/Kbuild include/uapi/drm/panfrost_drm.h:/ SPDX-License-Identifier: MIT / include/uapi/linux/batman_adv.h:/ SPDX-License-Identifier: MIT / include/uapi/linux/qemu_fw_cfg.h:/ SPDX-License-Identifier: BSD-3-Clause / include/uapi/linux/vbox_err.h:/ SPDX-License-Identifier: MIT / include/uapi/linux/virtio_iommu.h:/ SPDX-License-Identifier: BSD-3-Clause */ Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	Merge branch 'pm-cpufreq'	Rafael J. Wysocki
	* pm-cpufreq: cpufreq/pasemi: fix use-after-free in pas_cpufreq_cpu_init()
2019-07-25	usb: wusbcore: fix unbalanced get/put cluster_id	Phong Tran
	syzboot reported that https://syzkaller.appspot.com/bug?extid=fd2bd7df88c606eea4ef There is not consitency parameter in cluste_id_get/put calling. In case of getting the id with result is failure, the wusbhc->cluster_id will not be updated and this can not be used for wusb_cluster_id_put(). Tested report https://groups.google.com/d/msg/syzkaller-bugs/0znZopp3-9k/oxOrhLkLEgAJ Reproduce and gdb got the details: 139 addr = wusb_cluster_id_get(); (gdb) n 140 if (addr == 0) (gdb) print addr $1 = 254 '\376' (gdb) n 142 result = __hwahc_set_cluster_id(hwahc, addr); (gdb) print result $2 = -71 (gdb) break wusb_cluster_id_put Breakpoint 3 at 0xffffffff836e3f20: file drivers/usb/wusbcore/wusbhc.c, line 384. (gdb) s Thread 2 hit Breakpoint 3, wusb_cluster_id_put (id=0 '\000') at drivers/usb/wusbcore/wusbhc.c:384 384 id = 0xff - id; (gdb) n 385 BUG_ON(id >= CLUSTER_IDS); (gdb) print id $3 = 255 '\377' Reported-by: syzbot+fd2bd7df88c606eea4ef@syzkaller.appspotmail.com Signed-off-by: Phong Tran <tranmanphong@gmail.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20190724020601.15257-1-tranmanphong@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	usb/hcd: Fix a NULL vs IS_ERR() bug in usb_hcd_setup_local_mem()	Dan Carpenter
	The devm_memremap() function doesn't return NULL, it returns error pointers. Fixes: b0310c2f09bb ("USB: use genalloc for USB HCs with local memory") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20190607135709.GC16718@mwanda Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	usb-storage: Add a limitation for blk_queue_max_hw_sectors()	Yoshihiro Shimoda
	This patch fixes an issue that the following error happens on swiotlb environment: xhci-hcd ee000000.usb: swiotlb buffer is full (sz: 524288 bytes), total 32768 (slots), used 1338 (slots) On the kernel v5.1, block settings of a usb-storage with SuperSpeed were the following so that the block layer will allocate buffers up to 64 KiB, and then the issue didn't happen. max_segment_size = 65536 max_hw_sectors_kb = 1024 After the commit 09324d32d2a0 ("block: force an unlimited segment size on queues with a virt boundary") is applied, the block settings are the following. So, the block layer will allocate buffers up to 1024 KiB, and then the issue happens: max_segment_size = 4294967295 max_hw_sectors_kb = 1024 To fix the issue, the usb-storage driver checks the maximum size of a mapping for the device and then adjusts the max_hw_sectors_kb if required. After this patch is applied, the block settings will be the following, and then the issue doesn't happen. max_segment_size = 4294967295 max_hw_sectors_kb = 256 Fixes: 09324d32d2a0 ("block: force an unlimited segment size on queues with a virt boundary") Cc: stable <stable@vger.kernel.org> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/1563793105-20597-1-git-send-email-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	usb: pci-quirks: Minor cleanup for AMD PLL quirk	Ryan Kennedy
	usb_amd_find_chipset_info() is used for chipset detection for several quirks. It is strange that its return value indicates the need for the PLL quirk, which means it is often ignored. This patch adds a function specifically for checking the PLL quirk like the other ones. Additionally, rename probe_result to something more appropriate. Signed-off-by: Ryan Kennedy <ryan5544@gmail.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/20190704153529.9429-3-ryan5544@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	usb: pci-quirks: Correct AMD PLL quirk detection	Ryan Kennedy
	The AMD PLL USB quirk is incorrectly enabled on newer Ryzen chipsets. The logic in usb_amd_find_chipset_info currently checks for unaffected chipsets rather than affected ones. This broke once a new chipset was added in e788787ef. It makes more sense to reverse the logic so it won't need to be updated as new chipsets are added. Note that the core of the workaround in usb_amd_quirk_pll does correctly check the chipset. Signed-off-by: Ryan Kennedy <ryan5544@gmail.com> Fixes: e788787ef4f9 ("usb:xhci:Add quirk for Certain failing HP keyboard on reset after resume") Cc: stable <stable@vger.kernel.org> Acked-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/20190704153529.9429-2-ryan5544@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25	ALSA: hda - Add a conexant codec entry to let mute led work	Hui Wang
	This conexant codec isn't in the supported codec list yet, the hda generic driver can drive this codec well, but on a Lenovo machine with mute/mic-mute leds, we need to apply CXT_FIXUP_THINKPAD_ACPI to make the leds work. After adding this codec to the list, the driver patch_conexant.c will apply THINKPAD_ACPI to this machine. Cc: stable@vger.kernel.org Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-07-25	objtool: Improve UACCESS coverage	Peter Zijlstra
	A clang build reported an (obvious) double CLAC while a GCC build did not; it turns out that objtool only re-visits instructions if the first visit was with AC=0. If OTOH the first visit was with AC=1, it completely ignores any subsequent visit, even when it has AC=0. Fix this by using a visited mask instead of a boolean, and (explicitly) mark the AC state. $ ./objtool check -b --no-fp --retpoline --uaccess drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0x22: redundant UACCESS disable drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0xea: (alt) drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0xffffffffffffffff: (branch) drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0xd9: (alt) drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0xb2: (branch) drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0x39: (branch) drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0x0: <=== (func) Reported-by: Josh Poimboeuf <jpoimboe@redhat.com> Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Link: https://github.com/ClangBuiltLinux/linux/issues/617 Link: https://lkml.kernel.org/r/5359166aad2d53f3145cd442d83d0e5115e0cd17.1564007838.git.jpoimboe@redhat.com
2019-07-25	ALSA: hda - Fix intermittent CORB/RIRB stall on Intel chips	Takashi Iwai
	It turned out that the recent Intel HD-audio controller chips show a significant stall during the system PM resume intermittently. It doesn't happen so often and usually it may read back successfully after one or more seconds, but in some rare worst cases the driver went into fallback mode. After trial-and-error, we found out that the communication stall seems covered by issuing the sync after each verb write, as already done for AMD and other chipsets. So this patch enables the write-sync flag for the recent Intel chips, Skylake and onward, as a workaround. Also, since Broxton and co have the very same driver flags as Skylake, refer to the Skylake driver flags instead of defining the same contents again for simplification. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=201901 Reported-and-tested-by: Todd Brandt <todd.e.brandt@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-07-24	dt-bindings: interrupt-controller: al-fic: remove redundant binding	Talel Shenhar
	Remove dt binding description for standard binding. Signed-off-by: Talel Shenhar <talel@amazon.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-24	ktest: Fix some typos in config-bisect.pl	Masanari Iida
	This patch fixes some spelling typos in config-bisect.pl Link: http://lkml.kernel.org/r/20190723032445.14220-1-standby24x7@gmail.com Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-24	access: avoid the RCU grace period for the temporary subjective credentials	Linus Torvalds
	It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU work because it installs a temporary credential that gets allocated and freed for each system call. The allocation and freeing overhead is mostly benign, but because credentials can be accessed under the RCU read lock, the freeing involves a RCU grace period. Which is not a huge deal normally, but if you have a lot of access() calls, this causes a fair amount of seconday damage: instead of having a nice alloc/free patterns that hits in hot per-CPU slab caches, you have all those delayed free's, and on big machines with hundreds of cores, the RCU overhead can end up being enormous. But it turns out that all of this is entirely unnecessary. Exactly because access() only installs the credential as the thread-local subjective credential, the temporary cred pointer doesn't actually need to be RCU free'd at all. Once we're done using it, we can just free it synchronously and avoid all the RCU overhead. So add a 'non_rcu' flag to 'struct cred', which can be set by users that know they only use it in non-RCU context (there are other potential users for this). We can make it a union with the rcu freeing list head that we need for the RCU case, so this doesn't need any extra storage. Note that this also makes 'get_current_cred()' clear the new non_rcu flag, in case we have filesystems that take a long-term reference to the cred and then expect the RCU delayed freeing afterwards. It's not entirely clear that this is required, but it makes for clear semantics: the subjective cred remains non-RCU as long as you only access it synchronously using the thread-local accessors, but you _can_ use it as a generic cred if you want to. It is possible that we should just remove the whole RCU markings for ->cred entirely. Only ->real_cred is really supposed to be accessed through RCU, and the long-term cred copies that nfs uses might want to explicitly re-enable RCU freeing if required, rather than have get_current_cred() do it implicitly. But this is a "minimal semantic changes" change for the immediate problem. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jan Glauber <jglauber@marvell.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com> Cc: Greg KH <greg@kroah.com> Cc: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-24	Merge tag 'powerpc-5.3-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "An assortment of non-regression fixes that have accumulated since the start of the merge window. - A fix for a user triggerable oops on machines where transactional memory is disabled, eg. Power9 bare metal, Power8 with TM disabled on the command line, or all Power7 or earlier machines. - Three fixes for handling of PMU and power saving registers when running nested KVM on Power9. - Two fixes for bugs found while stress testing the XIVE interrupt controller code, also on Power9. - A fix to allow guests to boot under Qemu/KVM on Power9 using the the Hash MMU with >= 1TB of memory. - Two fixes for bugs in the recent DMA cleanup, one of which could lead to checkstops. - And finally three fixes for the PAPR SCM nvdimm driver. Thanks to: Alexey Kardashevskiy, Andrea Arcangeli, Cédric Le Goater, Christoph Hellwig, David Gibson, Gautham R. Shenoy, Michael Neuling, Oliver O'Halloran, Satheesh Rajendran, Shawn Anastasio, Suraj Jitindar Singh, Vaibhav Jain" * tag 'powerpc-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL powerpc/pseries: Update SCM hcall op-codes in hvcall.h powerpc/tm: Fix oops on sigreturn on systems without TM powerpc/dma: Fix invalid DMA mmap behavior KVM: PPC: Book3S HV: XIVE: fix rollback when kvmppc_xive_create fails powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask() powerpc: fix off by one in max_zone_pfn initialization for ZONE_DMA KVM: PPC: Book3S HV: Save and restore guest visible PSSCR bits on pseries powerpc/pmu: Set pmcregs_in_use in paca when running as LPAR KVM: PPC: Book3S HV: Always save guest pmu for guest capable of nesting powerpc/mm: Limit rma_size to 1TB when running without HV mode
2019-07-24	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull KVM fixes from Paolo Bonzini: "Bugfixes, a pvspinlock optimization, and documentation moving" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption Documentation: move Documentation/virtual to Documentation/virt KVM: nVMX: Set cached_vmcs12 and cached_shadow_vmcs12 NULL after free KVM: X86: Dynamically allocate user_fpu KVM: X86: Fix fpu state crash in kvm guest Revert "kvm: x86: Use task structs fpu field for user" KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
2019-07-24	Merge tag 'dma-mapping-5.3-2' of git://git.infradead.org/users/hch/dma-mapping	Linus Torvalds
	Pull dma-mapping regression fix from Christoph Hellwig: "Ensure that dma_addressing_limited doesn't crash on devices without a dma mask (Eric Auger)" * tag 'dma-mapping-5.3-2' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: use dma_get_mask in dma_addressing_limited
2019-07-24	selinux: check sidtab limit before adding a new entry	Ondrej Mosnacek
	We need to error out when trying to add an entry above SIDTAB_MAX in sidtab_reverse_lookup() to avoid overflow on the odd chance that this happens. Cc: stable@vger.kernel.org Fixes: ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve performance") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-07-24	dt-bindings: clk: allwinner,sun4i-a10-ccu: Correct path in $id	Rob Herring
	The path in the schema '$id' value is wrong. Fix it. Cc: Michael Turquette <mturquette@baylibre.com> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Chen-Yu Tsai <wens@csie.org> Cc: linux-clk@vger.kernel.org Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-24	KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption	Wanpeng Li
	Commit 11752adb (locking/pvqspinlock: Implement hybrid PV queued/unfair locks) introduces hybrid PV queued/unfair locks - queued mode (no starvation) - unfair mode (good performance on not heavily contended lock) The lock waiter goes into the unfair mode especially in VMs with over-commit vCPUs since increaing over-commitment increase the likehood that the queue head vCPU may have been preempted and not actively spinning. However, reschedule queue head vCPU timely to acquire the lock still can get better performance than just depending on lock stealing in over-subscribe scenario. Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM: ebizzy -M vanilla boosting improved 1VM 23520 25040 6% 2VM 8000 13600 70% 3VM 3100 5400 74% The lock holder vCPU yields to the queue head vCPU when unlock, to boost queue head vCPU which is involuntary preemption or the one which is voluntary halt due to fail to acquire the lock after a short spin in the guest. Cc: Waiman Long <longman@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-24	x86/entry/32: Pass cr2 to do_async_page_fault()	Matt Mullins
	Commit a0d14b8909de ("x86/mm, tracing: Fix CR2 corruption") added the address parameter to do_async_page_fault(), but does not pass it from the 32-bit entry point. To plumb it through, factor-out common_exception_read_cr2 in the same fashion as common_exception, and uses it from both page_fault and async_page_fault. For a 32-bit KVM guest, this fixes: Run /sbin/init as init process Starting init: /sbin/init exists but couldn't execute it (error -14) Fixes: a0d14b8909de ("x86/mm, tracing: Fix CR2 corruption") Signed-off-by: Matt Mullins <mmullins@fb.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190724042058.24506-1-mmullins@fb.com