summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2016-08-16vlan: use a valid default mtu value for vlan over macsecPaolo Abeni
[ Upstream commit 18d3df3eab23796d7f852f9c6bb60962b8372ced ] macsec can't cope with mtu frames which need vlan tag insertion, and vlan device set the default mtu equal to the underlying dev's one. By default vlan over macsec devices use invalid mtu, dropping all the large packets. This patch adds a netif helper to check if an upper vlan device needs mtu reduction. The helper is used during vlan devices initialization to set a valid default and during mtu updating to forbid invalid, too bit, mtu values. The helper currently only check if the lower dev is a macsec device, if we get more users, we need to update only the helper (possibly reserving an additional IFF bit). Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10radix-tree: fix radix_tree_iter_retry() for tagged iterators.Andrey Ryabinin
commit 3cb9185c67304b2a7ea9be73e7d13df6fb2793a1 upstream. radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags. Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot() leading to crash: RIP: radix_tree_next_slot include/linux/radix-tree.h:473 find_get_pages_tag+0x334/0x930 mm/filemap.c:1452 .... Call Trace: pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960 mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516 ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736 do_writepages+0x97/0x100 mm/page-writeback.c:2364 __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300 filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490 ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115 vfs_fsync_range+0x10a/0x250 fs/sync.c:195 vfs_fsync fs/sync.c:209 do_fsync+0x42/0x70 fs/sync.c:219 SYSC_fdatasync fs/sync.c:232 SyS_fdatasync+0x19/0x20 fs/sync.c:230 entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207 We must reset iterator's tags to bail out from radix_tree_next_slot() and go to the slow-path in radix_tree_next_chunk(). Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup") Link: http://lkml.kernel.org/r/1468495196-10604-1-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10mm: memcontrol: fix cgroup creation failure after many small jobsJohannes Weiner
commit 73f576c04b9410ed19660f74f97521bee6e1c546 upstream. The memory controller has quite a bit of state that usually outlives the cgroup and pins its CSS until said state disappears. At the same time it imposes a 16-bit limit on the CSS ID space to economically store IDs in the wild. Consequently, when we use cgroups to contain frequent but small and short-lived jobs that leave behind some page cache, we quickly run into the 64k limitations of outstanding CSSs. Creating a new cgroup fails with -ENOSPC while there are only a few, or even no user-visible cgroups in existence. Although pinning CSSs past cgroup removal is common, there are only two instances that actually need an ID after a cgroup is deleted: cache shadow entries and swapout records. Cache shadow entries reference the ID weakly and can deal with the CSS having disappeared when it's looked up later. They pose no hurdle. Swap-out records do need to pin the css to hierarchically attribute swapins after the cgroup has been deleted; though the only pages that remain swapped out after offlining are tmpfs/shmem pages. And those references are under the user's control, so they are manageable. This patch introduces a private 16-bit memcg ID and switches swap and cache shadow entries over to using that. This ID can then be recycled after offlining when the CSS remains pinned only by objects that don't specifically need it. This script demonstrates the problem by faulting one cache page in a new cgroup and deleting it again: set -e mkdir -p pages for x in `seq 128000`; do [ $((x % 1000)) -eq 0 ] && echo $x mkdir /cgroup/foo echo $$ >/cgroup/foo/cgroup.procs echo trex >pages/$x echo $$ >/cgroup/cgroup.procs rmdir /cgroup/foo done When run on an unpatched kernel, we eventually run out of possible IDs even though there are no visible cgroups: [root@ham ~]# ./cssidstress.sh [...] 65000 mkdir: cannot create directory '/cgroup/foo': No space left on device After this patch, the IDs get released upon cgroup destruction and the cache and css objects get released once memory reclaim kicks in. [hannes@cmpxchg.org: init the IDR] Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org Fixes: b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined groups") Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: John Garcia <john.garcia@mesosphere.io> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10mm: thp: refix false positive BUG in page_move_anon_rmap()Hugh Dickins
commit 5a49973d7143ebbabd76e1dcd69ee42e349bb7b9 upstream. The VM_BUG_ON_PAGE in page_move_anon_rmap() is more trouble than it's worth: the syzkaller fuzzer hit it again. It's still wrong for some THP cases, because linear_page_index() was never intended to apply to addresses before the start of a vma. That's easily fixed with a signed long cast inside linear_page_index(); and Dmitry has tested such a patch, to verify the false positive. But why extend linear_page_index() just for this case? when the avoidance in page_move_anon_rmap() has already grown ugly, and there's no reason for the check at all (nothing else there is using address or index). Remove address arg from page_move_anon_rmap(), remove VM_BUG_ON_PAGE, remove CONFIG_DEBUG_VM PageTransHuge adjustment. And one more thing: should the compound_head(page) be done inside or outside page_move_anon_rmap()? It's usually pushed down to the lowest level nowadays (and mm/memory.c shows no other explicit use of it), so I think it's better done in page_move_anon_rmap() than by caller. Fixes: 0798d3c022dc ("mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()") Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1607120444540.12528@eggly.anvils Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10vmlinux.lds: account for destructor sectionsDmitry Vyukov
commit e41f501d391265ff568f3e49d6128cc30856a36f upstream. If CONFIG_KASAN is enabled and gcc is configured with --disable-initfini-array and/or gold linker is used, gcc emits .ctors/.dtors and .text.startup/.text.exit sections instead of .init_array/.fini_array. .dtors section is not explicitly accounted in the linker script and messes vvar/percpu layout. We want: ffffffff822bfd80 D _edata ffffffff822c0000 D __vvar_beginning_hack ffffffff822c0000 A __vvar_page ffffffff822c0080 0000000000000098 D vsyscall_gtod_data ffffffff822c1000 A __init_begin ffffffff822c1000 D init_per_cpu__irq_stack_union ffffffff822c1000 A __per_cpu_load ffffffff822d3000 D init_per_cpu__gdt_page We got: ffffffff8279a600 D _edata ffffffff8279b000 A __vvar_page ffffffff8279c000 A __init_begin ffffffff8279c000 D init_per_cpu__irq_stack_union ffffffff8279c000 A __per_cpu_load ffffffff8279e000 D __vvar_beginning_hack ffffffff8279e080 0000000000000098 D vsyscall_gtod_data ffffffff827ae000 D init_per_cpu__gdt_page This happens because __vvar_page and .vvar get different addresses in arch/x86/kernel/vmlinux.lds.S: . = ALIGN(PAGE_SIZE); __vvar_page = .; .vvar : AT(ADDR(.vvar) - LOAD_OFFSET) { /* work around gold bug 13023 */ __vvar_beginning_hack = .; Discard .dtors/.fini_array/.text.exit, since we don't call dtors. Merge .text.startup into init text. Link: http://lkml.kernel.org/r/1467386363-120030-1-git-send-email-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10uapi: export lirc.h headerMauro Carvalho Chehab
commit 12cb22bb8ae9aff9d72a9c0a234f26d641b20eb6 upstream. This header contains the userspace API for lirc. This is a fixup for commit b7be755733dc ("[media] bz#75751: Move internal header file lirc.h to uapi/"). It moved the header to the right place, but it forgot to add it at Kbuild. So, despite being at uapi, it is not copied to the right place. Fixes: b7be755733dc44c72 ("[media] bz#75751: Move internal header file lirc.h to uapi/") Link: http://lkml.kernel.org/r/320c765d32bfc82c582e336d52ffe1026c73c644.1468439021.git.mchehab@s-opensource.com Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Cc: Alec Leamas <leamas.alec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10x86/quirks: Add early quirk to reset Apple AirPort cardLukas Wunner
commit abb2bafd295fe962bbadc329dbfb2146457283ac upstream. The EFI firmware on Macs contains a full-fledged network stack for downloading OS X images from osrecovery.apple.com. Unfortunately on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331 wireless card on every boot and leaves it enabled even after ExitBootServices has been called. The card continues to assert its IRQ line, causing spurious interrupts if the IRQ is shared. It also corrupts memory by DMAing received packets, allowing for remote code execution over the air. This only stops when a driver is loaded for the wireless card, which may be never if the driver is not installed or blacklisted. The issue seems to be constrained to the Broadcom 4331. Chris Milsted has verified that the newer Broadcom 4360 built into the MacBookPro11,3 (2013/2014) does not exhibit this behaviour. The chances that Apple will ever supply a firmware fix for the older machines appear to be zero. The solution is to reset the card on boot by writing to a reset bit in its mmio space. This must be done as an early quirk and not as a plain vanilla PCI quirk to successfully combat memory corruption by DMAed packets: Matthew Garrett found out in 2012 that the packets are written to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html). This type of memory is made available to the page allocator by efi_free_boot_services(). Plain vanilla PCI quirks run much later, in subsys initcall level. In-between a time window would be open for memory corruption. Random crashes occurring in this time window and attributed to DMAed packets have indeed been observed in the wild by Chris Bainbridge. When Matthew Garrett analyzed the memory corruption issue in 2012, he sought to fix it with a grub quirk which transitions the card to D3hot: http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56 This approach does not help users with other bootloaders and while it may prevent DMAed packets, it does not cure the spurious interrupts emanating from the card. Unfortunately the card's mmio space is inaccessible in D3hot, so to reset it, we have to undo the effect of Matthew's grub patch and transition the card back to D0. Note that the quirk takes a few shortcuts to reduce the amount of code: The size of BAR 0 and the location of the PM capability is identical on all affected machines and therefore hardcoded. Only the address of BAR 0 differs between models. Also, it is assumed that the BCMA core currently mapped is the 802.11 core. The EFI driver seems to always take care of this. Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback towards finding the best solution to this problem. The following should be a comprehensive list of affected models: iMac13,1 2012 21.5" [Root Port 00:1c.3 = 8086:1e16] iMac13,2 2012 27" [Root Port 00:1c.3 = 8086:1e16] Macmini5,1 2011 i5 2.3 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini5,2 2011 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini5,3 2011 i7 2.0 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini6,1 2012 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1e12] Macmini6,2 2012 i7 2.3 GHz [Root Port 00:1c.1 = 8086:1e12] MacBookPro8,1 2011 13" [Root Port 00:1c.1 = 8086:1c12] MacBookPro8,2 2011 15" [Root Port 00:1c.1 = 8086:1c12] MacBookPro8,3 2011 17" [Root Port 00:1c.1 = 8086:1c12] MacBookPro9,1 2012 15" [Root Port 00:1c.1 = 8086:1e12] MacBookPro9,2 2012 13" [Root Port 00:1c.1 = 8086:1e12] MacBookPro10,1 2012 15" [Root Port 00:1c.1 = 8086:1e12] MacBookPro10,2 2012 13" [Root Port 00:1c.1 = 8086:1e12] For posterity, spurious interrupts caused by the Broadcom 4331 wireless card resulted in splats like this (stacktrace omitted): irq 17: nobody cared (try booting with the "irqpoll" option) handlers: [<ffffffff81374370>] pcie_isr [<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci] [<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec] Disabling IRQ #17 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732 Tested-by: Konstantin Simanov <k.simanov@stlk.ru> # [MacBookPro8,1] Tested-by: Lukas Wunner <lukas@wunner.de> # [MacBookPro9,1] Tested-by: Bryan Paradis <bryan.paradis@gmail.com> # [MacBookPro9,2] Tested-by: Andrew Worsley <amworsley@gmail.com> # [MacBookPro10,1] Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2] Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Rafał Miłecki <zajec5@gmail.com> Acked-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris Milsted <cmilsted@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Michael Buesch <m@bues.ch> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: b43-dev@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linux-wireless@vger.kernel.org Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de [ Did minor readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27drm/ttm: Make ttm_bo_mem_compat availableSinclair Yeh
commit 94477bff390aa4612d2332c8abafaae0a13d6923 upstream. There are cases where it is desired to see if a proposed placement is compatible with a buffer object before calling ttm_bo_validate(). Signed-off-by: Sinclair Yeh <syeh@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27vfs: add d_real_inode() helperMiklos Szeredi
commit a118084432d642eeccb961c7c8cc61525a941fcb upstream. Needed by the following fix. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27net_sched: fix mirrored packets checksumWANG Cong
[ Upstream commit 82a31b9231f02d9c1b7b290a46999d517b0d312a ] Similar to commit 9b368814b336 ("net: fix bridge multicast packet checksum validation") we need to fixup the checksum for CHECKSUM_COMPLETE when pushing skb on RX path. Otherwise we get similar splats. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27packet: Use symmetric hash for PACKET_FANOUT_HASH.David S. Miller
[ Upstream commit eb70db8756717b90c01ccc765fdefc4dd969fc74 ] People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that they want packets going in both directions on a flow to hash to the same bucket. The core kernel SKB hash became non-symmetric when the ipv6 flow label and other entities were incorporated into the standard flow hash order to increase entropy. But there are no users of PACKET_FANOUT_HASH who want an assymetric hash, they all want a symmetric one. Therefore, use the flow dissector to compute a flat symmetric hash over only the protocol, addresses and ports. This hash does not get installed into and override the normal skb hash, so this change has no effect whatsoever on the rest of the stack. Reported-by: Eric Leblond <eric@regit.org> Tested-by: Eric Leblond <eric@regit.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27rpc: share one xps between all backchannelsJ. Bruce Fields
commit 39a9beab5acb83176e8b9a4f0778749a09341f1f upstream. The spec allows backchannels for multiple clients to share the same tcp connection. When that happens, we need to use the same xprt for all of them. Similarly, we need the same xps. This fixes list corruption introduced by the multipath code. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Trond Myklebust <trondmy@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27nfsd4/rpc: move backchannel create logic into rpc codeJ. Bruce Fields
commit d50039ea5ee63c589b0434baa5ecf6e5075bb6f9 upstream. Also simplify the logic a bit. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Trond Myklebust <trondmy@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27cpuidle: Do not access cpuidle_devices when !CONFIG_CPU_IDLECatalin Marinas
commit 9bd616e3dbedfc103f158197c8ad93678849b1ed upstream. The cpuidle_devices per-CPU variable is only defined when CPU_IDLE is enabled. Commit c8cc7d4de7a4 ("sched/idle: Reorganize the idle loop") removed the #ifdef CONFIG_CPU_IDLE around cpuidle_idle_call() with the compiler optimising away __this_cpu_read(cpuidle_devices). However, with CONFIG_UBSAN && !CONFIG_CPU_IDLE, this optimisation no longer happens and the kernel fails to link since cpuidle_devices is not defined. This patch introduces an accessor function for the current CPU cpuidle device (returning NULL when !CONFIG_CPU_IDLE) and uses it in cpuidle_idle_call(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27locking/static_key: Fix concurrent static_key_slow_inc()Paolo Bonzini
commit 4c5ea0a9cd02d6aa8adc86e100b2a4cff8d614ff upstream. The following scenario is possible: CPU 1 CPU 2 static_key_slow_inc() atomic_inc_not_zero() -> key.enabled == 0, no increment jump_label_lock() atomic_inc_return() -> key.enabled == 1 now static_key_slow_inc() atomic_inc_not_zero() -> key.enabled == 1, inc to 2 return ** static key is wrong! jump_label_update() jump_label_unlock() Testing the static key at the point marked by (**) will follow the wrong path for jumps that have not been patched yet. This can actually happen when creating many KVM virtual machines with userspace LAPIC emulation; just run several copies of the following program: #include <fcntl.h> #include <unistd.h> #include <sys/ioctl.h> #include <linux/kvm.h> int main(void) { for (;;) { int kvmfd = open("/dev/kvm", O_RDONLY); int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0); close(ioctl(vmfd, KVM_CREATE_VCPU, 1)); close(vmfd); close(kvmfd); } return 0; } Every KVM_CREATE_VCPU ioctl will attempt a static_key_slow_inc() call. The static key's purpose is to skip NULL pointer checks and indeed one of the processes eventually dereferences NULL. As explained in the commit that introduced the bug: 706249c222f6 ("locking/static_keys: Rework update logic") jump_label_update() needs key.enabled to be true. The solution adopted here is to temporarily make key.enabled == -1, and use go down the slow path when key.enabled <= 0. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 706249c222f6 ("locking/static_keys: Rework update logic") Link: http://lkml.kernel.org/r/1466527937-69798-1-git-send-email-pbonzini@redhat.com [ Small stylistic edits to the changelog and the code. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27locking/qspinlock: Fix spin_unlock_wait() some morePeter Zijlstra
commit 2c610022711675ee908b903d242f0b90e1db661f upstream. While this prior commit: 54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()") ... fixes spin_is_locked() and spin_unlock_wait() for the usage in ipc/sem and netfilter, it does not in fact work right for the usage in task_work and futex. So while the 2 locks crossed problem: spin_lock(A) spin_lock(B) if (!spin_is_locked(B)) spin_unlock_wait(A) foo() foo(); ... works with the smp_mb() injected by both spin_is_locked() and spin_unlock_wait(), this is not sufficient for: flag = 1; smp_mb(); spin_lock() spin_unlock_wait() if (!flag) // add to lockless list // iterate lockless list ... because in this scenario, the store from spin_lock() can be delayed past the load of flag, uncrossing the variables and loosing the guarantee. This patch reworks spin_is_locked() and spin_unlock_wait() to work in both cases by exploiting the observation that while the lock byte store can be delayed, the contender must have registered itself visibly in other state contained in the word. It also allows for architectures to override both functions, as PPC and ARM64 have an additional issue for which we currently have no generic solution. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Giovanni Gherdovich <ggherdovich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <waiman.long@hpe.com> Cc: Will Deacon <will.deacon@arm.com> Fixes: 54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()") Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27USB: EHCI: declare hostpc register as zero-length arrayAlan Stern
commit 7e8b3dfef16375dbfeb1f36a83eb9f27117c51fd upstream. The HOSTPC extension registers found in some EHCI implementations form a variable-length array, with one element for each port. Therefore the hostpc field in struct ehci_regs should be declared as a zero-length array, not a single-element array. This fixes a problem reported by UBSAN. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de> Tested-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27IB/rdmavt: Correct qp_priv_alloc() return value testMike Marciniszyn
commit c755f4afa66ad3ed98870bd3254f37c47fb2c800 upstream. The current drivers return errors from this calldown wrapped in an ERR_PTR(). The rdmavt code incorrectly tests for NULL. The code is fixed to use IS_ERR() and change ret according to the driver return value. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27IB/core: Fix bit curruption in ib_device_cap_flags structureMax Gurtovoy
commit 47355b3cd7d3c9c5226bff7c449b9d269fb17fa6 upstream. ib_device_cap_flags 64-bit expansion caused caps overlapping and made consumers read wrong device capabilities. For example IB_DEVICE_SG_GAPS_REG was falsely read by the iser driver causing it to use a non-existing capability. This happened because signed int becomes sign extended when converted it to u64. Fix this by casting IB_DEVICE_ON_DEMAND_PAGING enumeration to ULL. Fixes: f5aa9159a418 ('IB/core: Add arbitrary sg_list support') Reported-by: Robert LeBlanc <robert@leblancnet.us> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11bpf, perf: delay release of BPF prog after grace periodDaniel Borkmann
[ Upstream commit ceb56070359b7329b5678b5d95a376fcb24767be ] Commit dead9f29ddcc ("perf: Fix race in BPF program unregister") moved destruction of BPF program from free_event_rcu() callback to __free_event(), which is problematic if used with tail calls: if prog A is attached as trace event directly, but at the same time present in a tail call map used by another trace event program elsewhere, then we need to delay destruction via RCU grace period since it can still be in use by the program doing the tail call (the prog first needs to be dropped from the tail call map, then trace event with prog A attached destroyed, so we get immediate destruction). Fixes: dead9f29ddcc ("perf: Fix race in BPF program unregister") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Jann Horn <jann@thejh.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11sock_diag: do not broadcast raw socket destructionWillem de Bruijn
[ Upstream commit 9a0fee2b552b1235fb1706ae1fc664ae74573be8 ] Diag intends to broadcast tcp_sk and udp_sk socket destruction. Testing sk->sk_protocol for IPPROTO_TCP/IPPROTO_UDP alone is not sufficient for this. Raw sockets can have the same type. Add a test for sk->sk_type. Fixes: eb4cb008529c ("sock_diag: define destruction multicast groups") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11net: Don't forget pr_fmt on net_dbg_ratelimited for CONFIG_DYNAMIC_DEBUGJason A. Donenfeld
[ Upstream commit daddef76c3deaaa7922f9d7b18edbf0a061215c3 ] The implementation of net_dbg_ratelimited in the CONFIG_DYNAMIC_DEBUG case was added with 2c94b5373 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case"). The implementation strategy was to take the usual definition of the dynamic_pr_debug macro, but alter it by adding a call to "net_ratelimit()" in the if statement. This is, in fact, the correct approach. However, while doing this, the author of the commit forgot to surround fmt by pr_fmt, resulting in unprefixed log messages appearing in the console. So, this commit adds back the pr_fmt(fmt) invocation, making net_dbg_ratelimited properly consistent across DEBUG, no DEBUG, and DYNAMIC_DEBUG cases, and bringing parity with the behavior of dynamic_pr_debug as well. Fixes: 2c94b5373 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Tim Bingham <tbingham@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: introduce and use xt_copy_counters_from_userFlorian Westphal
commit d7591f0c41ce3e67600a982bab6989ef0f07b3ce upstream. The three variants use same copy&pasted code, condense this into a helper and use that. Make sure info.name is 0-terminated. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: xt_compat_match_from_user doesn't need a retvalFlorian Westphal
commit 0188346f21e6546498c2a0f84888797ad4063fc5 upstream. Always returned 0. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: check for bogus target offsetFlorian Westphal
commit ce683e5f9d045e5d67d1312a42b359cb2ab2a13c upstream. We're currently asserting that targetoff + targetsize <= nextoff. Extend it to also check that targetoff is >= sizeof(xt_entry). Since this is generic code, add an argument pointing to the start of the match/target, we can then derive the base structure size from the delta. We also need the e->elems pointer in a followup change to validate matches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: add compat version of xt_check_entry_offsetsFlorian Westphal
commit fc1221b3a163d1386d1052184202d5dc50d302d1 upstream. 32bit rulesets have different layout and alignment requirements, so once more integrity checks get added to xt_check_entry_offsets it will reject well-formed 32bit rulesets. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: add and use xt_check_entry_offsetsFlorian Westphal
commit 7d35812c3214afa5b37a675113555259cfd67b98 upstream. Currently arp/ip and ip6tables each implement a short helper to check that the target offset is large enough to hold one xt_entry_target struct and that t->u.target_size fits within the current rule. Unfortunately these checks are not sufficient. To avoid adding new tests to all of ip/ip6/arptables move the current checks into a helper, then extend this helper in followup patches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding maskMarc Zyngier
commit dd5f1b049dc139876801db3cdd0f20d21fd428cc upstream. The INTID mask is wrong, and is made a signed value, which has nteresting effects in the KVM emulation. Let's sanitize it. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24uapi glibc compat: fix compilation when !__USE_MISC in glibcNicolas Dichtel
[ Upstream commit f0a3fdca794d1e68ae284ef4caefe681f7c18e89 ] These structures are defined only if __USE_MISC is set in glibc net/if.h headers, ie when _BSD_SOURCE or _SVID_SOURCE are defined. CC: Jan Engelhardt <jengelh@inai.de> CC: Josh Boyer <jwboyer@fedoraproject.org> CC: Stephen Hemminger <shemming@brocade.com> CC: Waldemar Brodkorb <mail@waldemar-brodkorb.de> CC: Gabriel Laskar <gabriel@lse.epita.fr> CC: Mikko Rapeli <mikko.rapeli@iki.fi> Fixes: 4a91cb61bb99 ("uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24switchdev: pass pointer to fib_info instead of copyJiri Pirko
[ Upstream commit da4ed55165d41b1073f9a476f1c18493e9bf8c8e ] The problem is that fib_info->nh is [0] so the struct fib_info allocation size depends on number of nexthops. If we just copy fib_info, we do not copy the nexthops info and driver accesses memory which is not ours. Given the fact that fib4 does not defer operations and therefore it does not need copy, just pass the pointer down to drivers as it was done before. Fixes: 850d0cbc91 ("switchdev: remove pointers from switchdev objects") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07drm/imx: Match imx-ipuv3-crtc components using device node in platform dataPhilipp Zabel
commit 310944d148e3600dcff8b346bee7fa01d34903b1 upstream. The component master driver imx-drm-core matches component devices using their of_node. Since commit 950b410dd1ab ("gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading"), the imx-ipuv3-crtc dev->of_node is not set during probing. Before that, of_node was set and caused an of: modalias to be used instead of the platform: modalias, which broke module autoloading. On the other hand, if dev->of_node is not set yet when the imx-ipuv3-crtc probe function calls component_add, component matching in imx-drm-core fails. While dev->of_node will be set once the next component tries to bring up the component master, imx-drm-core component binding will never succeed if one of the crtc devices is probed last. Add of_node to the component platform data and match against the pdata->of_node instead of dev->of_node in imx-drm-core to work around this problem. Fixes: 950b410dd1ab ("gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading") Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Tested-by: Fabio Estevam <fabio.estevam@nxp.com> Tested-by: Lothar Waßmann <LW@KARO-electronics.de> Tested-by: Heiko Schocher <hs@denx.de> Tested-by: Chris Ruehl <chris.ruehl@gtsys.com.hk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07drm: Add helper for DP++ adaptorsVille Syrjälä
commit b3daa5ef52c26acd7432c787989bd92d48070c76 upstream. Add a helper which aids in the identification of DP dual mode (aka. DP++) adaptors. There are several types of adaptors specified: type 1 DVI, type 1 HDMI, type 2 DVI, type 2 HDMI Type 1 adaptors have a max TMDS clock limit of 165MHz, type 2 adaptors may go as high as 300MHz and they provide a register informing the source device what the actual limit is. Supposedly also type 1 adaptors may optionally implement this register. This TMDS clock limit is the main reason why we need to identify these adaptors. Type 1 adaptors provide access to their internal registers and the sink DDC bus through I2C. Type 2 adaptors provide this access both via I2C and I2C-over-AUX. A type 2 source device may choose to implement either of these methods. If a source device implements the I2C-over-AUX method, then the driver will obviously need specific support for such adaptors since the port is driven like an HDMI port, but DDC communication happes over the AUX channel. This helper should be enough to identify the adaptor type (some type 1 DVI adaptors may be a slight exception) and the maximum TMDS clock limit. Another feature that may be available is control over the TMDS output buffers on the adaptor, possibly allowing for some power saving when the TMDS link is down. Other user controllable features that may be available in the adaptors are downstream i2c bus speed control when using i2c-over-aux, and some control over the CEC pin. I chose not to provide any helper functions for those since I have no use for them in i915 at this time. The rest of the registers in the adaptor are mostly just information, eg. IEEE OUI, hardware and firmware revision, etc. v2: Pass adaptor type to helper functions to ease driver implementation Fix a bunch of typoes (Paulo) Add DRM_DP_DUAL_MODE_UNKNOWN for the case where we don't (yet) know the type (Paulo) Reject 0x00 and 0xff DP_DUAL_MODE_MAX_TMDS_CLOCK values (Paulo) Adjust drm_dp_dual_mode_detect() type2 vs. type1 detection to ease future LSPCON enabling Remove the unused DP_DUAL_MODE_LAST_RESERVED define v3: Fix kernel doc function argument descriptions (Jani) s/NONE/UNKNOWN/ in drm_dp_dual_mode_detect() docs Add kernel doc for enum drm_dp_dual_mode_type Actually build the docs Fix more typoes v4: Adjust code indentation of type2 adaptor detection (Shashank) Add debug messages for failurs cases (Shashank) v5: EXPORT_SYMBOL(drm_dp_dual_mode_read) (Paulo) Cc: Tore Anderson <tore@fud.no> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Shashank Sharma <shashank.sharma@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Shashank Sharma <shashank.sharma@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Shashank Sharma <shashank.sharma@intel.com> (v4) Link: http://patchwork.freedesktop.org/patch/msgid/1462542412-25533-1-git-send-email-ville.syrjala@linux.intel.com (cherry picked from commit ede53344dbfd1dd43bfd73eb6af743d37c56a7c3) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07mm: use phys_addr_t for reserve_bootmem_region() argumentsStefan Bader
commit 4b50bcc7eda4d3cc9e3f2a0aa60e590fedf728c5 upstream. Since commit 92923ca3aace ("mm: meminit: only set page reserved in the memblock region") the reserved bit is set on reserved memblock regions. However start and end address are passed as unsigned long. This is only 32bit on i386, so it can end up marking the wrong pages reserved for ranges at 4GB and above. This was observed on a 32bit Xen dom0 which was booted with initial memory set to a value below 4G but allowing to balloon in memory (dom0_mem=1024M for example). This would define a reserved bootmem region for the additional memory (for example on a 8GB system there was a reverved region covering the 4GB-8GB range). But since the addresses were passed on as unsigned long, this was actually marking all pages from 0 to 4GB as reserved. Fixes: 92923ca3aacef63 ("mm: meminit: only set page reserved in the memblock region") Link: http://lkml.kernel.org/r/1463491221-10573-1-git-send-email-stefan.bader@canonical.com Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01scsi: Add intermediate STARGET_REMOVE state to scsi_target_stateJohannes Thumshirn
commit f05795d3d771f30a7bdc3a138bf714b06d42aa95 upstream. Add intermediate STARGET_REMOVE state to scsi_target_state to avoid running into the BUG_ON() in scsi_target_reap(). The STARGET_REMOVE state is only valid in the path from scsi_remove_target() to scsi_target_destroy() indicating this target is going to be removed. This re-fixes the problem introduced in commits bc3f02a795d3 ("[SCSI] scsi_remove_target: fix softlockup regression on hot remove") and 40998193560d ("scsi: restart list search after unlock in scsi_remove_target") in a more comprehensive way. [mkp: Included James' fix for scsi_target_destroy()] Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Fixes: 40998193560dab6c3ce8d25f4fa58a23e252ef38 Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01SIGNAL: Move generic copy_siginfo() to signal.hJames Hogan
commit ca9eb49aa9562eaadf3cea071ec7018ad6800425 upstream. The generic copy_siginfo() is currently defined in asm-generic/siginfo.h, after including uapi/asm-generic/siginfo.h which defines the generic struct siginfo. However this makes it awkward for an architecture to use it if it has to define its own struct siginfo (e.g. MIPS and potentially IA64), since it means that asm-generic/siginfo.h can only be included after defining the arch-specific siginfo, which may be problematic if the arch-specific definition needs definitions from uapi/asm-generic/siginfo.h. It is possible to work around this by first including uapi/asm-generic/siginfo.h to get the constants before defining the arch-specific siginfo, and include asm-generic/siginfo.h after. However uapi headers can't be included by other uapi headers, so that first include has to be in an ifdef __kernel__, with the non __kernel__ case including the non-UAPI header instead. Instead of that mess, move the generic copy_siginfo() definition into linux/signal.h, which allows an arch-specific uapi/asm/siginfo.h to include asm-generic/siginfo.h and define the arch-specific siginfo, and for the generic copy_siginfo() to see that arch-specific definition. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Petr Malat <oss@malat.biz> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Christopher Ferris <cferris@google.com> Cc: linux-arch@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-ia64@vger.kernel.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/12478/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()Peter Zijlstra
commit 54cf809b9512be95f53ed4a5e3b631d1ac42f0fa upstream. Similar to commits: 51d7d5205d33 ("powerpc: Add smp_mb() to arch_spin_is_locked()") d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers") qspinlock suffers from the fact that the _Q_LOCKED_VAL store is unordered inside the ACQUIRE of the lock. And while this is not a problem for the regular mutual exclusive critical section usage of spinlocks, it breaks creative locking like: spin_lock(A) spin_lock(B) spin_unlock_wait(B) if (!spin_is_locked(A)) do_something() do_something() In that both CPUs can end up running do_something at the same time, because our _Q_LOCKED_VAL store can drop past the spin_unlock_wait() spin_is_locked() loads (even on x86!!). To avoid making the normal case slower, add smp_mb()s to the less used spin_unlock_wait() / spin_is_locked() side of things to avoid this problem. Reported-and-tested-by: Davidlohr Bueso <dave@stgolabs.net> Reported-by: Giovanni Gherdovich <ggherdovich@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01Fix OpenSSH pty regression on closeBrian Bloniarz
commit 0f40fbbcc34e093255a2b2d70b6b0fb48c3f39aa upstream. OpenSSH expects the (non-blocking) read() of pty master to return EAGAIN only if it has received all of the slave-side output after it has received SIGCHLD. This used to work on pre-3.12 kernels. This fix effectively forces non-blocking read() and poll() to block for parallel i/o to complete for all ttys. It also unwinds these changes: 1) f8747d4a466ab2cafe56112c51b3379f9fdb7a12 tty: Fix pty master read() after slave closes 2) 52bce7f8d4fc633c9a9d0646eef58ba6ae9a3b73 pty, n_tty: Simplify input processing on final close 3) 1a48632ffed61352a7810ce089dc5a8bcd505a60 pty: Fix input race when closing Inspired by analysis and patch from Marc Aurele La France <tsi@tuyoix.net> Reported-by: Volth <openssh@volth.com> Reported-by: Marc Aurele La France <tsi@tuyoix.net> BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=52 BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=2492 Signed-off-by: Brian Bloniarz <brian.bloniarz@gmail.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01usb: core: hub: hub_port_init lock controller instead of busChris Bainbridge
commit feb26ac31a2a5cb88d86680d9a94916a6343e9e6 upstream. The XHCI controller presents two USB buses to the system - one for USB2 and one for USB3. The hub init code (hub_port_init) is reentrant but only locks one bus per thread, leading to a race condition failure when two threads attempt to simultaneously initialise a USB2 and USB3 device: [ 8.034843] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command [ 13.183701] usb 3-3: device descriptor read/all, error -110 On a test system this failure occurred on 6% of all boots. The call traces at the point of failure are: Call Trace: [<ffffffff81b9bab7>] schedule+0x37/0x90 [<ffffffff817da7cd>] usb_kill_urb+0x8d/0xd0 [<ffffffff8111e5e0>] ? wake_up_atomic_t+0x30/0x30 [<ffffffff817dafbe>] usb_start_wait_urb+0xbe/0x150 [<ffffffff817db10c>] usb_control_msg+0xbc/0xf0 [<ffffffff817d07de>] hub_port_init+0x51e/0xb70 [<ffffffff817d4697>] hub_event+0x817/0x1570 [<ffffffff810f3e6f>] process_one_work+0x1ff/0x620 [<ffffffff810f3dcf>] ? process_one_work+0x15f/0x620 [<ffffffff810f4684>] worker_thread+0x64/0x4b0 [<ffffffff810f4620>] ? rescuer_thread+0x390/0x390 [<ffffffff810fa7f5>] kthread+0x105/0x120 [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200 [<ffffffff81ba183f>] ret_from_fork+0x3f/0x70 [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200 Call Trace: [<ffffffff817fd36d>] xhci_setup_device+0x53d/0xa40 [<ffffffff817fd87e>] xhci_address_device+0xe/0x10 [<ffffffff817d047f>] hub_port_init+0x1bf/0xb70 [<ffffffff811247ed>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff817d4697>] hub_event+0x817/0x1570 [<ffffffff810f3e6f>] process_one_work+0x1ff/0x620 [<ffffffff810f3dcf>] ? process_one_work+0x15f/0x620 [<ffffffff810f4684>] worker_thread+0x64/0x4b0 [<ffffffff810f4620>] ? rescuer_thread+0x390/0x390 [<ffffffff810fa7f5>] kthread+0x105/0x120 [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200 [<ffffffff81ba183f>] ret_from_fork+0x3f/0x70 [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200 Which results from the two call chains: hub_port_init usb_get_device_descriptor usb_get_descriptor usb_control_msg usb_internal_control_msg usb_start_wait_urb usb_submit_urb / wait_for_completion_timeout / usb_kill_urb hub_port_init hub_set_address xhci_address_device xhci_setup_device Mathias Nyman explains the current behaviour violates the XHCI spec: hub_port_reset() will end up moving the corresponding xhci device slot to default state. As hub_port_reset() is called several times in hub_port_init() it sounds reasonable that we could end up with two threads having their xhci device slots in default state at the same time, which according to xhci 4.5.3 specs still is a big no no: "Note: Software shall not transition more than one Device Slot to the Default State at a time" So both threads fail at their next task after this. One fails to read the descriptor, and the other fails addressing the device. Fix this in hub_port_init by locking the USB controller (instead of an individual bus) to prevent simultaneous initialisation of both buses. Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in parallel") Link: https://lkml.org/lkml/2016/2/8/312 Link: https://lkml.org/lkml/2016/2/4/748 Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com> Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01USB: leave LPM alone if possible when binding/unbinding interface driversAlan Stern
commit 6fb650d43da3e7054984dc548eaa88765a94d49f upstream. When a USB driver is bound to an interface (either through probing or by claiming it) or is unbound from an interface, the USB core always disables Link Power Management during the transition and then re-enables it afterward. The reason is because the driver might want to prevent hub-initiated link power transitions, in which case the HCD would have to recalculate the various LPM parameters. This recalculation takes place when LPM is re-enabled and the new parameters are sent to the device and its parent hub. However, if the driver does not want to prevent hub-initiated link power transitions then none of this work is necessary. The parameters don't need to be recalculated, and LPM doesn't need to be disabled and re-enabled. It turns out that disabling and enabling LPM can be time-consuming, enough so that it interferes with user programs that want to claim and release interfaces rapidly via usbfs. Since the usbfs kernel driver doesn't set the disable_hub_initiated_lpm flag, we can speed things up and get the user programs to work by leaving LPM alone whenever the flag isn't set. And while we're improving the way disable_hub_initiated_lpm gets used, let's also fix its kerneldoc. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Matthew Giassa <matthew@giassa.net> CC: Mathias Nyman <mathias.nyman@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01can: fix handling of unmodifiable configuration optionsOliver Hartkopp
commit bb208f144cf3f59d8f89a09a80efd04389718907 upstream. As described in 'can: m_can: tag current CAN FD controllers as non-ISO' (6cfda7fbebe) it is possible to define fixed configuration options by setting the according bit in 'ctrlmode' and clear it in 'ctrlmode_supported'. This leads to the incovenience that the fixed configuration bits can not be passed by netlink even when they have the correct values (e.g. non-ISO, FD). This patch fixes that issue and not only allows fixed set bit values to be set again but now requires(!) to provide these fixed values at configuration time. A valid CAN FD configuration consists of a nominal/arbitration bittiming, a data bittiming and a control mode with CAN_CTRLMODE_FD set - which is now enforced by a new can_validate() function. This fix additionally removed the inconsistency that was prohibiting the support of 'CANFD-only' controller drivers, like the RCar CAN FD. For this reason a new helper can_set_static_ctrlmode() has been introduced to provide a proper interface to handle static enabled CAN controller options. Reported-by: Ramesh Shanmugasundaram <ramesh.shanmugasundaram@bp.renesas.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Reviewed-by: Ramesh Shanmugasundaram <ramesh.shanmugasundaram@bp.renesas.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01fscrypto/f2fs: allow fs-specific key prefix for fs encryptionJaegeuk Kim
commit b5a7aef1ef436ec005fef0efe31a676ec5f4ab31 upstream. This patch allows fscrypto to handle a second key prefix given by filesystem. The main reason is to provide backward compatibility, since previously f2fs used "f2fs:" as a crypto prefix instead of "fscrypt:". Later, ext4 should also provide key_prefix() to give "ext4:". One concern decribed by Ted would be kinda double check overhead of prefixes. In x86, for example, validate_user_key consumes 8 ms after boot-up, which turns out derive_key_aes() consumed most of the time to load specific crypto module. After such the cold miss, it shows almost zero latencies, which treats as a negligible overhead. Note that request_key() detects wrong prefix in prior to derive_key_aes() even. Cc: Ted Tso <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Overlayfs fixes from Miklos, assorted fixes from me. Stable fodder of varying severity, all sat in -next for a while" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ovl: ignore permissions on underlying lookup vfs: add lookup_hash() helper vfs: rename: check backing inode being equal vfs: add vfs_select_inode() helper get_rock_ridge_filename(): handle malformed NM entries ecryptfs: fix handling of directory opening atomic_open(): fix the handling of create_error fix the copy vs. map logics in blk_rq_map_user_iov() do_splice_to(): cap the size before passing to ->splice_read()
2016-05-13Merge branch 'for-4.6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "During v4.6-rc1 cgroup namespace support was merged. There is an issue where it's impossible to tell whether a given cgroup mount point is bind mounted or namespaced. Serge has been working on the issue but it took longer than expected to resolve, so the late pull request. Given that it's a completely new feature and the patches don't touch anything else, the risk seems acceptable. However, if this is too late, an alternative is plugging new cgroup ns creation for v4.6 and retrying for v4.7" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix compile warning kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces kernfs_path_from_node_locked: don't overwrite nlen
2016-05-13Merge tag 'regulator-fix-v4.6-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A small collection of driver specific fixes for the regulator subsysetem: - Fix handling of probe deferral for GPIO regulators - Fix a typo in the module alias for DA9053 - Fix the definition of BUCK9 in the S2MPS11 driver. This change looks larger than it is because an irregularity in the hardware means that the macro used to define bucks 6-10 needs duplicating and tweaking to have a separate macro for 9 - Fix a series of errors in the definitions of the LDOs the AXP20x regulators, some of which had always been present and some of which were introduced in the merge window" * tag 'regulator-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: da9063: Correct module alias prefix to fix module autoloading regulator: axp20x: Fix axp22x ldo_io registration error on cold boot regulator: axp20x: Fix axp22x ldo_io voltage ranges regulator: axp20x: Fix LDO4 linear voltage range regulator: s2mps11: Fix invalid selector mask and voltages for buck9 regulator: gpio: check return value of of_get_named_gpio
2016-05-13Merge remote-tracking branches 'regulator/fix/axp20x', ↵Mark Brown
'regulator/fix/da9063', 'regulator/fix/gpio' and 'regulator/fix/s2mps11' into regulator-linus
2016-05-12mm: thp: calculate the mapcount correctly for THP pages during WP faultsAndrea Arcangeli
This will provide fully accuracy to the mapcount calculation in the write protect faults, so page pinning will not get broken by false positive copy-on-writes. total_mapcount() isn't the right calculation needed in reuse_swap_page(), so this introduces a page_trans_huge_mapcount() that is effectively the full accurate return value for page_mapcount() if dealing with Transparent Hugepages, however we only use the page_trans_huge_mapcount() during COW faults where it strictly needed, due to its higher runtime cost. This also provide at practical zero cost the total_mapcount information which is needed to know if we can still relocate the page anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we can reuse the page no matter if it's a pte or a pmd_trans_huge triggering the fault, but we can only relocate the page anon_vma to the local vma->anon_vma if we're sure it's only this "vma" mapping the whole THP physical range. Kirill A. Shutemov discovered the problem with moving the page anon_vma to the local vma->anon_vma in a previous version of this patch and another problem in the way page_move_anon_rmap() was called. Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a previous version, because reuse_swap_page must be a macro to call page_trans_huge_mapcount from swap.h, so this uses a macro again instead of an inline function. With this change at least it's a less dangerous usage than it was before, because "page" is used only once now, while with the previous code reuse_swap_page(page++) would have called page_mapcount on page+1 and it would have increased page twice instead of just once. Dean Luick noticed an uninitialized variable that could result in a rmap inefficiency for the non-THP case in a previous version. Mike Marciniszyn said: : Our RDMA tests are seeing an issue with memory locking that bisects to : commit 61f5d698cc97 ("mm: re-enable THP") : : The test program registers two rather large MRs (512M) and RDMA : writes data to a passive peer using the first and RDMA reads it back : into the second MR and compares that data. The sizes are chosen randomly : between 0 and 1024 bytes. : : The test will get through a few (<= 4 iterations) and then gets a : compare error. : : Tracing indicates the kernel logical addresses associated with the individual : pages at registration ARE correct , the data in the "RDMA read response only" : packets ARE correct. : : The "corruption" occurs when the packet crosse two pages that are not physically : contiguous. The second page reads back as zero in the program. : : It looks like the user VA at the point of the compare error no longer points to : the same physical address as was registered. : : This patch totally resolves the issue! Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name> Reviewed-by: Dean Luick <dean.luick@intel.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Tested-by: Josh Collier <josh.d.collier@intel.com> Cc: Marc Haber <mh+linux-kernel@zugschlus.de> Cc: <stable@vger.kernel.org> [4.5] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-11Merge branch 'ovl-fixes' into for-linusAl Viro
2016-05-10vfs: add lookup_hash() helperMiklos Szeredi
Overlayfs needs lookup without inode_permission() and already has the name hash (in form of dentry->d_name on overlayfs dentry). It also doesn't support filesystems with d_op->d_hash() so basically it only needs the actual hashed lookup from lookup_one_len_unlocked() So add a new helper that does unlocked lookup of a hashed name. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-05-10vfs: add vfs_select_inode() helperMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> # v4.2+
2016-05-10export tc ife uapi headerJamal Hadi Salim
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>