Age | Commit message (Collapse) | Author |
|
[ Upstream commit 18d3df3eab23796d7f852f9c6bb60962b8372ced ]
macsec can't cope with mtu frames which need vlan tag insertion, and
vlan device set the default mtu equal to the underlying dev's one.
By default vlan over macsec devices use invalid mtu, dropping
all the large packets.
This patch adds a netif helper to check if an upper vlan device
needs mtu reduction. The helper is used during vlan devices
initialization to set a valid default and during mtu updating to
forbid invalid, too bit, mtu values.
The helper currently only check if the lower dev is a macsec device,
if we get more users, we need to update only the helper (possibly
reserving an additional IFF bit).
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3cb9185c67304b2a7ea9be73e7d13df6fb2793a1 upstream.
radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags.
Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot()
leading to crash:
RIP: radix_tree_next_slot include/linux/radix-tree.h:473
find_get_pages_tag+0x334/0x930 mm/filemap.c:1452
....
Call Trace:
pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960
mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516
ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736
do_writepages+0x97/0x100 mm/page-writeback.c:2364
__filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300
filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490
ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115
vfs_fsync_range+0x10a/0x250 fs/sync.c:195
vfs_fsync fs/sync.c:209
do_fsync+0x42/0x70 fs/sync.c:219
SYSC_fdatasync fs/sync.c:232
SyS_fdatasync+0x19/0x20 fs/sync.c:230
entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207
We must reset iterator's tags to bail out from radix_tree_next_slot()
and go to the slow-path in radix_tree_next_chunk().
Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup")
Link: http://lkml.kernel.org/r/1468495196-10604-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 73f576c04b9410ed19660f74f97521bee6e1c546 upstream.
The memory controller has quite a bit of state that usually outlives the
cgroup and pins its CSS until said state disappears. At the same time
it imposes a 16-bit limit on the CSS ID space to economically store IDs
in the wild. Consequently, when we use cgroups to contain frequent but
small and short-lived jobs that leave behind some page cache, we quickly
run into the 64k limitations of outstanding CSSs. Creating a new cgroup
fails with -ENOSPC while there are only a few, or even no user-visible
cgroups in existence.
Although pinning CSSs past cgroup removal is common, there are only two
instances that actually need an ID after a cgroup is deleted: cache
shadow entries and swapout records.
Cache shadow entries reference the ID weakly and can deal with the CSS
having disappeared when it's looked up later. They pose no hurdle.
Swap-out records do need to pin the css to hierarchically attribute
swapins after the cgroup has been deleted; though the only pages that
remain swapped out after offlining are tmpfs/shmem pages. And those
references are under the user's control, so they are manageable.
This patch introduces a private 16-bit memcg ID and switches swap and
cache shadow entries over to using that. This ID can then be recycled
after offlining when the CSS remains pinned only by objects that don't
specifically need it.
This script demonstrates the problem by faulting one cache page in a new
cgroup and deleting it again:
set -e
mkdir -p pages
for x in `seq 128000`; do
[ $((x % 1000)) -eq 0 ] && echo $x
mkdir /cgroup/foo
echo $$ >/cgroup/foo/cgroup.procs
echo trex >pages/$x
echo $$ >/cgroup/cgroup.procs
rmdir /cgroup/foo
done
When run on an unpatched kernel, we eventually run out of possible IDs
even though there are no visible cgroups:
[root@ham ~]# ./cssidstress.sh
[...]
65000
mkdir: cannot create directory '/cgroup/foo': No space left on device
After this patch, the IDs get released upon cgroup destruction and the
cache and css objects get released once memory reclaim kicks in.
[hannes@cmpxchg.org: init the IDR]
Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org
Fixes: b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined groups")
Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: John Garcia <john.garcia@mesosphere.io>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5a49973d7143ebbabd76e1dcd69ee42e349bb7b9 upstream.
The VM_BUG_ON_PAGE in page_move_anon_rmap() is more trouble than it's
worth: the syzkaller fuzzer hit it again. It's still wrong for some THP
cases, because linear_page_index() was never intended to apply to
addresses before the start of a vma.
That's easily fixed with a signed long cast inside linear_page_index();
and Dmitry has tested such a patch, to verify the false positive. But
why extend linear_page_index() just for this case? when the avoidance in
page_move_anon_rmap() has already grown ugly, and there's no reason for
the check at all (nothing else there is using address or index).
Remove address arg from page_move_anon_rmap(), remove VM_BUG_ON_PAGE,
remove CONFIG_DEBUG_VM PageTransHuge adjustment.
And one more thing: should the compound_head(page) be done inside or
outside page_move_anon_rmap()? It's usually pushed down to the lowest
level nowadays (and mm/memory.c shows no other explicit use of it), so I
think it's better done in page_move_anon_rmap() than by caller.
Fixes: 0798d3c022dc ("mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()")
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1607120444540.12528@eggly.anvils
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e41f501d391265ff568f3e49d6128cc30856a36f upstream.
If CONFIG_KASAN is enabled and gcc is configured with
--disable-initfini-array and/or gold linker is used, gcc emits
.ctors/.dtors and .text.startup/.text.exit sections instead of
.init_array/.fini_array. .dtors section is not explicitly accounted in
the linker script and messes vvar/percpu layout.
We want:
ffffffff822bfd80 D _edata
ffffffff822c0000 D __vvar_beginning_hack
ffffffff822c0000 A __vvar_page
ffffffff822c0080 0000000000000098 D vsyscall_gtod_data
ffffffff822c1000 A __init_begin
ffffffff822c1000 D init_per_cpu__irq_stack_union
ffffffff822c1000 A __per_cpu_load
ffffffff822d3000 D init_per_cpu__gdt_page
We got:
ffffffff8279a600 D _edata
ffffffff8279b000 A __vvar_page
ffffffff8279c000 A __init_begin
ffffffff8279c000 D init_per_cpu__irq_stack_union
ffffffff8279c000 A __per_cpu_load
ffffffff8279e000 D __vvar_beginning_hack
ffffffff8279e080 0000000000000098 D vsyscall_gtod_data
ffffffff827ae000 D init_per_cpu__gdt_page
This happens because __vvar_page and .vvar get different addresses in
arch/x86/kernel/vmlinux.lds.S:
. = ALIGN(PAGE_SIZE);
__vvar_page = .;
.vvar : AT(ADDR(.vvar) - LOAD_OFFSET) {
/* work around gold bug 13023 */
__vvar_beginning_hack = .;
Discard .dtors/.fini_array/.text.exit, since we don't call dtors.
Merge .text.startup into init text.
Link: http://lkml.kernel.org/r/1467386363-120030-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 12cb22bb8ae9aff9d72a9c0a234f26d641b20eb6 upstream.
This header contains the userspace API for lirc.
This is a fixup for commit b7be755733dc ("[media] bz#75751: Move
internal header file lirc.h to uapi/"). It moved the header to the
right place, but it forgot to add it at Kbuild. So, despite being at
uapi, it is not copied to the right place.
Fixes: b7be755733dc44c72 ("[media] bz#75751: Move internal header file lirc.h to uapi/")
Link: http://lkml.kernel.org/r/320c765d32bfc82c582e336d52ffe1026c73c644.1468439021.git.mchehab@s-opensource.com
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Alec Leamas <leamas.alec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit abb2bafd295fe962bbadc329dbfb2146457283ac upstream.
The EFI firmware on Macs contains a full-fledged network stack for
downloading OS X images from osrecovery.apple.com. Unfortunately
on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331
wireless card on every boot and leaves it enabled even after
ExitBootServices has been called. The card continues to assert its IRQ
line, causing spurious interrupts if the IRQ is shared. It also corrupts
memory by DMAing received packets, allowing for remote code execution
over the air. This only stops when a driver is loaded for the wireless
card, which may be never if the driver is not installed or blacklisted.
The issue seems to be constrained to the Broadcom 4331. Chris Milsted
has verified that the newer Broadcom 4360 built into the MacBookPro11,3
(2013/2014) does not exhibit this behaviour. The chances that Apple will
ever supply a firmware fix for the older machines appear to be zero.
The solution is to reset the card on boot by writing to a reset bit in
its mmio space. This must be done as an early quirk and not as a plain
vanilla PCI quirk to successfully combat memory corruption by DMAed
packets: Matthew Garrett found out in 2012 that the packets are written
to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html).
This type of memory is made available to the page allocator by
efi_free_boot_services(). Plain vanilla PCI quirks run much later, in
subsys initcall level. In-between a time window would be open for memory
corruption. Random crashes occurring in this time window and attributed
to DMAed packets have indeed been observed in the wild by Chris
Bainbridge.
When Matthew Garrett analyzed the memory corruption issue in 2012, he
sought to fix it with a grub quirk which transitions the card to D3hot:
http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56
This approach does not help users with other bootloaders and while it
may prevent DMAed packets, it does not cure the spurious interrupts
emanating from the card. Unfortunately the card's mmio space is
inaccessible in D3hot, so to reset it, we have to undo the effect of
Matthew's grub patch and transition the card back to D0.
Note that the quirk takes a few shortcuts to reduce the amount of code:
The size of BAR 0 and the location of the PM capability is identical
on all affected machines and therefore hardcoded. Only the address of
BAR 0 differs between models. Also, it is assumed that the BCMA core
currently mapped is the 802.11 core. The EFI driver seems to always take
care of this.
Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback
towards finding the best solution to this problem.
The following should be a comprehensive list of affected models:
iMac13,1 2012 21.5" [Root Port 00:1c.3 = 8086:1e16]
iMac13,2 2012 27" [Root Port 00:1c.3 = 8086:1e16]
Macmini5,1 2011 i5 2.3 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini5,2 2011 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini5,3 2011 i7 2.0 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini6,1 2012 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1e12]
Macmini6,2 2012 i7 2.3 GHz [Root Port 00:1c.1 = 8086:1e12]
MacBookPro8,1 2011 13" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro8,2 2011 15" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro8,3 2011 17" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro9,1 2012 15" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro9,2 2012 13" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro10,1 2012 15" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro10,2 2012 13" [Root Port 00:1c.1 = 8086:1e12]
For posterity, spurious interrupts caused by the Broadcom 4331 wireless
card resulted in splats like this (stacktrace omitted):
irq 17: nobody cared (try booting with the "irqpoll" option)
handlers:
[<ffffffff81374370>] pcie_isr
[<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci]
[<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec]
Disabling IRQ #17
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732
Tested-by: Konstantin Simanov <k.simanov@stlk.ru> # [MacBookPro8,1]
Tested-by: Lukas Wunner <lukas@wunner.de> # [MacBookPro9,1]
Tested-by: Bryan Paradis <bryan.paradis@gmail.com> # [MacBookPro9,2]
Tested-by: Andrew Worsley <amworsley@gmail.com> # [MacBookPro10,1]
Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris Milsted <cmilsted@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Michael Buesch <m@bues.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: b43-dev@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de
[ Did minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 94477bff390aa4612d2332c8abafaae0a13d6923 upstream.
There are cases where it is desired to see if a proposed placement
is compatible with a buffer object before calling ttm_bo_validate().
Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a118084432d642eeccb961c7c8cc61525a941fcb upstream.
Needed by the following fix.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 82a31b9231f02d9c1b7b290a46999d517b0d312a ]
Similar to commit 9b368814b336 ("net: fix bridge multicast packet checksum validation")
we need to fixup the checksum for CHECKSUM_COMPLETE when
pushing skb on RX path. Otherwise we get similar splats.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit eb70db8756717b90c01ccc765fdefc4dd969fc74 ]
People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that
they want packets going in both directions on a flow to hash to the
same bucket.
The core kernel SKB hash became non-symmetric when the ipv6 flow label
and other entities were incorporated into the standard flow hash order
to increase entropy.
But there are no users of PACKET_FANOUT_HASH who want an assymetric
hash, they all want a symmetric one.
Therefore, use the flow dissector to compute a flat symmetric hash
over only the protocol, addresses and ports. This hash does not get
installed into and override the normal skb hash, so this change has
no effect whatsoever on the rest of the stack.
Reported-by: Eric Leblond <eric@regit.org>
Tested-by: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 39a9beab5acb83176e8b9a4f0778749a09341f1f upstream.
The spec allows backchannels for multiple clients to share the same tcp
connection. When that happens, we need to use the same xprt for all of
them. Similarly, we need the same xps.
This fixes list corruption introduced by the multipath code.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Trond Myklebust <trondmy@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d50039ea5ee63c589b0434baa5ecf6e5075bb6f9 upstream.
Also simplify the logic a bit.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Trond Myklebust <trondmy@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9bd616e3dbedfc103f158197c8ad93678849b1ed upstream.
The cpuidle_devices per-CPU variable is only defined when CPU_IDLE is
enabled. Commit c8cc7d4de7a4 ("sched/idle: Reorganize the idle loop")
removed the #ifdef CONFIG_CPU_IDLE around cpuidle_idle_call() with the
compiler optimising away __this_cpu_read(cpuidle_devices). However, with
CONFIG_UBSAN && !CONFIG_CPU_IDLE, this optimisation no longer happens
and the kernel fails to link since cpuidle_devices is not defined.
This patch introduces an accessor function for the current CPU cpuidle
device (returning NULL when !CONFIG_CPU_IDLE) and uses it in
cpuidle_idle_call().
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4c5ea0a9cd02d6aa8adc86e100b2a4cff8d614ff upstream.
The following scenario is possible:
CPU 1 CPU 2
static_key_slow_inc()
atomic_inc_not_zero()
-> key.enabled == 0, no increment
jump_label_lock()
atomic_inc_return()
-> key.enabled == 1 now
static_key_slow_inc()
atomic_inc_not_zero()
-> key.enabled == 1, inc to 2
return
** static key is wrong!
jump_label_update()
jump_label_unlock()
Testing the static key at the point marked by (**) will follow the
wrong path for jumps that have not been patched yet. This can
actually happen when creating many KVM virtual machines with userspace
LAPIC emulation; just run several copies of the following program:
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>
int main(void)
{
for (;;) {
int kvmfd = open("/dev/kvm", O_RDONLY);
int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
close(ioctl(vmfd, KVM_CREATE_VCPU, 1));
close(vmfd);
close(kvmfd);
}
return 0;
}
Every KVM_CREATE_VCPU ioctl will attempt a static_key_slow_inc() call.
The static key's purpose is to skip NULL pointer checks and indeed one
of the processes eventually dereferences NULL.
As explained in the commit that introduced the bug:
706249c222f6 ("locking/static_keys: Rework update logic")
jump_label_update() needs key.enabled to be true. The solution adopted
here is to temporarily make key.enabled == -1, and use go down the
slow path when key.enabled <= 0.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 706249c222f6 ("locking/static_keys: Rework update logic")
Link: http://lkml.kernel.org/r/1466527937-69798-1-git-send-email-pbonzini@redhat.com
[ Small stylistic edits to the changelog and the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2c610022711675ee908b903d242f0b90e1db661f upstream.
While this prior commit:
54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")
... fixes spin_is_locked() and spin_unlock_wait() for the usage
in ipc/sem and netfilter, it does not in fact work right for the
usage in task_work and futex.
So while the 2 locks crossed problem:
spin_lock(A) spin_lock(B)
if (!spin_is_locked(B)) spin_unlock_wait(A)
foo() foo();
... works with the smp_mb() injected by both spin_is_locked() and
spin_unlock_wait(), this is not sufficient for:
flag = 1;
smp_mb(); spin_lock()
spin_unlock_wait() if (!flag)
// add to lockless list
// iterate lockless list
... because in this scenario, the store from spin_lock() can be delayed
past the load of flag, uncrossing the variables and loosing the
guarantee.
This patch reworks spin_is_locked() and spin_unlock_wait() to work in
both cases by exploiting the observation that while the lock byte
store can be delayed, the contender must have registered itself
visibly in other state contained in the word.
It also allows for architectures to override both functions, as PPC
and ARM64 have an additional issue for which we currently have no
generic solution.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Giovanni Gherdovich <ggherdovich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <waiman.long@hpe.com>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7e8b3dfef16375dbfeb1f36a83eb9f27117c51fd upstream.
The HOSTPC extension registers found in some EHCI implementations form
a variable-length array, with one element for each port. Therefore
the hostpc field in struct ehci_regs should be declared as a
zero-length array, not a single-element array.
This fixes a problem reported by UBSAN.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
Tested-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c755f4afa66ad3ed98870bd3254f37c47fb2c800 upstream.
The current drivers return errors from this calldown
wrapped in an ERR_PTR().
The rdmavt code incorrectly tests for NULL.
The code is fixed to use IS_ERR() and change ret according
to the driver return value.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 47355b3cd7d3c9c5226bff7c449b9d269fb17fa6 upstream.
ib_device_cap_flags 64-bit expansion caused caps overlapping
and made consumers read wrong device capabilities. For example
IB_DEVICE_SG_GAPS_REG was falsely read by the iser driver causing
it to use a non-existing capability. This happened because signed
int becomes sign extended when converted it to u64. Fix this by
casting IB_DEVICE_ON_DEMAND_PAGING enumeration to ULL.
Fixes: f5aa9159a418 ('IB/core: Add arbitrary sg_list support')
Reported-by: Robert LeBlanc <robert@leblancnet.us>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ceb56070359b7329b5678b5d95a376fcb24767be ]
Commit dead9f29ddcc ("perf: Fix race in BPF program unregister") moved
destruction of BPF program from free_event_rcu() callback to __free_event(),
which is problematic if used with tail calls: if prog A is attached as
trace event directly, but at the same time present in a tail call map used
by another trace event program elsewhere, then we need to delay destruction
via RCU grace period since it can still be in use by the program doing the
tail call (the prog first needs to be dropped from the tail call map, then
trace event with prog A attached destroyed, so we get immediate destruction).
Fixes: dead9f29ddcc ("perf: Fix race in BPF program unregister")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Jann Horn <jann@thejh.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9a0fee2b552b1235fb1706ae1fc664ae74573be8 ]
Diag intends to broadcast tcp_sk and udp_sk socket destruction.
Testing sk->sk_protocol for IPPROTO_TCP/IPPROTO_UDP alone is not
sufficient for this. Raw sockets can have the same type.
Add a test for sk->sk_type.
Fixes: eb4cb008529c ("sock_diag: define destruction multicast groups")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit daddef76c3deaaa7922f9d7b18edbf0a061215c3 ]
The implementation of net_dbg_ratelimited in the CONFIG_DYNAMIC_DEBUG
case was added with 2c94b5373 ("net: Implement net_dbg_ratelimited() for
CONFIG_DYNAMIC_DEBUG case"). The implementation strategy was to take the
usual definition of the dynamic_pr_debug macro, but alter it by adding a
call to "net_ratelimit()" in the if statement. This is, in fact, the
correct approach.
However, while doing this, the author of the commit forgot to surround
fmt by pr_fmt, resulting in unprefixed log messages appearing in the
console. So, this commit adds back the pr_fmt(fmt) invocation, making
net_dbg_ratelimited properly consistent across DEBUG, no DEBUG, and
DYNAMIC_DEBUG cases, and bringing parity with the behavior of
dynamic_pr_debug as well.
Fixes: 2c94b5373 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Tim Bingham <tbingham@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d7591f0c41ce3e67600a982bab6989ef0f07b3ce upstream.
The three variants use same copy&pasted code, condense this into a
helper and use that.
Make sure info.name is 0-terminated.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0188346f21e6546498c2a0f84888797ad4063fc5 upstream.
Always returned 0.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ce683e5f9d045e5d67d1312a42b359cb2ab2a13c upstream.
We're currently asserting that targetoff + targetsize <= nextoff.
Extend it to also check that targetoff is >= sizeof(xt_entry).
Since this is generic code, add an argument pointing to the start of the
match/target, we can then derive the base structure size from the delta.
We also need the e->elems pointer in a followup change to validate matches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fc1221b3a163d1386d1052184202d5dc50d302d1 upstream.
32bit rulesets have different layout and alignment requirements, so once
more integrity checks get added to xt_check_entry_offsets it will reject
well-formed 32bit rulesets.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7d35812c3214afa5b37a675113555259cfd67b98 upstream.
Currently arp/ip and ip6tables each implement a short helper to check that
the target offset is large enough to hold one xt_entry_target struct and
that t->u.target_size fits within the current rule.
Unfortunately these checks are not sufficient.
To avoid adding new tests to all of ip/ip6/arptables move the current
checks into a helper, then extend this helper in followup patches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dd5f1b049dc139876801db3cdd0f20d21fd428cc upstream.
The INTID mask is wrong, and is made a signed value, which has
nteresting effects in the KVM emulation. Let's sanitize it.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f0a3fdca794d1e68ae284ef4caefe681f7c18e89 ]
These structures are defined only if __USE_MISC is set in glibc net/if.h
headers, ie when _BSD_SOURCE or _SVID_SOURCE are defined.
CC: Jan Engelhardt <jengelh@inai.de>
CC: Josh Boyer <jwboyer@fedoraproject.org>
CC: Stephen Hemminger <shemming@brocade.com>
CC: Waldemar Brodkorb <mail@waldemar-brodkorb.de>
CC: Gabriel Laskar <gabriel@lse.epita.fr>
CC: Mikko Rapeli <mikko.rapeli@iki.fi>
Fixes: 4a91cb61bb99 ("uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit da4ed55165d41b1073f9a476f1c18493e9bf8c8e ]
The problem is that fib_info->nh is [0] so the struct fib_info
allocation size depends on number of nexthops. If we just copy fib_info,
we do not copy the nexthops info and driver accesses memory which is not
ours.
Given the fact that fib4 does not defer operations and therefore it does
not need copy, just pass the pointer down to drivers as it was done
before.
Fixes: 850d0cbc91 ("switchdev: remove pointers from switchdev objects")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 310944d148e3600dcff8b346bee7fa01d34903b1 upstream.
The component master driver imx-drm-core matches component devices using
their of_node. Since commit 950b410dd1ab ("gpu: ipu-v3: Fix imx-ipuv3-crtc
module autoloading"), the imx-ipuv3-crtc dev->of_node is not set during
probing. Before that, of_node was set and caused an of: modalias to be
used instead of the platform: modalias, which broke module autoloading.
On the other hand, if dev->of_node is not set yet when the imx-ipuv3-crtc
probe function calls component_add, component matching in imx-drm-core
fails. While dev->of_node will be set once the next component tries to
bring up the component master, imx-drm-core component binding will never
succeed if one of the crtc devices is probed last.
Add of_node to the component platform data and match against the
pdata->of_node instead of dev->of_node in imx-drm-core to work around
this problem.
Fixes: 950b410dd1ab ("gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading")
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Tested-by: Fabio Estevam <fabio.estevam@nxp.com>
Tested-by: Lothar Waßmann <LW@KARO-electronics.de>
Tested-by: Heiko Schocher <hs@denx.de>
Tested-by: Chris Ruehl <chris.ruehl@gtsys.com.hk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b3daa5ef52c26acd7432c787989bd92d48070c76 upstream.
Add a helper which aids in the identification of DP dual mode
(aka. DP++) adaptors. There are several types of adaptors
specified: type 1 DVI, type 1 HDMI, type 2 DVI, type 2 HDMI
Type 1 adaptors have a max TMDS clock limit of 165MHz, type 2 adaptors
may go as high as 300MHz and they provide a register informing the
source device what the actual limit is. Supposedly also type 1 adaptors
may optionally implement this register. This TMDS clock limit is the
main reason why we need to identify these adaptors.
Type 1 adaptors provide access to their internal registers and the sink
DDC bus through I2C. Type 2 adaptors provide this access both via I2C
and I2C-over-AUX. A type 2 source device may choose to implement either
of these methods. If a source device implements the I2C-over-AUX
method, then the driver will obviously need specific support for such
adaptors since the port is driven like an HDMI port, but DDC
communication happes over the AUX channel.
This helper should be enough to identify the adaptor type (some
type 1 DVI adaptors may be a slight exception) and the maximum TMDS
clock limit. Another feature that may be available is control over
the TMDS output buffers on the adaptor, possibly allowing for some
power saving when the TMDS link is down.
Other user controllable features that may be available in the adaptors
are downstream i2c bus speed control when using i2c-over-aux, and
some control over the CEC pin. I chose not to provide any helper
functions for those since I have no use for them in i915 at this time.
The rest of the registers in the adaptor are mostly just information,
eg. IEEE OUI, hardware and firmware revision, etc.
v2: Pass adaptor type to helper functions to ease driver implementation
Fix a bunch of typoes (Paulo)
Add DRM_DP_DUAL_MODE_UNKNOWN for the case where we don't (yet) know
the type (Paulo)
Reject 0x00 and 0xff DP_DUAL_MODE_MAX_TMDS_CLOCK values (Paulo)
Adjust drm_dp_dual_mode_detect() type2 vs. type1 detection to
ease future LSPCON enabling
Remove the unused DP_DUAL_MODE_LAST_RESERVED define
v3: Fix kernel doc function argument descriptions (Jani)
s/NONE/UNKNOWN/ in drm_dp_dual_mode_detect() docs
Add kernel doc for enum drm_dp_dual_mode_type
Actually build the docs
Fix more typoes
v4: Adjust code indentation of type2 adaptor detection (Shashank)
Add debug messages for failurs cases (Shashank)
v5: EXPORT_SYMBOL(drm_dp_dual_mode_read) (Paulo)
Cc: Tore Anderson <tore@fud.no>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Shashank Sharma <shashank.sharma@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Shashank Sharma <shashank.sharma@intel.com> (v4)
Link: http://patchwork.freedesktop.org/patch/msgid/1462542412-25533-1-git-send-email-ville.syrjala@linux.intel.com
(cherry picked from commit ede53344dbfd1dd43bfd73eb6af743d37c56a7c3)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4b50bcc7eda4d3cc9e3f2a0aa60e590fedf728c5 upstream.
Since commit 92923ca3aace ("mm: meminit: only set page reserved in the
memblock region") the reserved bit is set on reserved memblock regions.
However start and end address are passed as unsigned long. This is only
32bit on i386, so it can end up marking the wrong pages reserved for
ranges at 4GB and above.
This was observed on a 32bit Xen dom0 which was booted with initial
memory set to a value below 4G but allowing to balloon in memory
(dom0_mem=1024M for example). This would define a reserved bootmem
region for the additional memory (for example on a 8GB system there was
a reverved region covering the 4GB-8GB range). But since the addresses
were passed on as unsigned long, this was actually marking all pages
from 0 to 4GB as reserved.
Fixes: 92923ca3aacef63 ("mm: meminit: only set page reserved in the memblock region")
Link: http://lkml.kernel.org/r/1463491221-10573-1-git-send-email-stefan.bader@canonical.com
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f05795d3d771f30a7bdc3a138bf714b06d42aa95 upstream.
Add intermediate STARGET_REMOVE state to scsi_target_state to avoid
running into the BUG_ON() in scsi_target_reap(). The STARGET_REMOVE
state is only valid in the path from scsi_remove_target() to
scsi_target_destroy() indicating this target is going to be removed.
This re-fixes the problem introduced in commits bc3f02a795d3 ("[SCSI]
scsi_remove_target: fix softlockup regression on hot remove") and
40998193560d ("scsi: restart list search after unlock in
scsi_remove_target") in a more comprehensive way.
[mkp: Included James' fix for scsi_target_destroy()]
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: 40998193560dab6c3ce8d25f4fa58a23e252ef38
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ca9eb49aa9562eaadf3cea071ec7018ad6800425 upstream.
The generic copy_siginfo() is currently defined in
asm-generic/siginfo.h, after including uapi/asm-generic/siginfo.h which
defines the generic struct siginfo. However this makes it awkward for an
architecture to use it if it has to define its own struct siginfo (e.g.
MIPS and potentially IA64), since it means that asm-generic/siginfo.h
can only be included after defining the arch-specific siginfo, which may
be problematic if the arch-specific definition needs definitions from
uapi/asm-generic/siginfo.h.
It is possible to work around this by first including
uapi/asm-generic/siginfo.h to get the constants before defining the
arch-specific siginfo, and include asm-generic/siginfo.h after. However
uapi headers can't be included by other uapi headers, so that first
include has to be in an ifdef __kernel__, with the non __kernel__ case
including the non-UAPI header instead.
Instead of that mess, move the generic copy_siginfo() definition into
linux/signal.h, which allows an arch-specific uapi/asm/siginfo.h to
include asm-generic/siginfo.h and define the arch-specific siginfo, and
for the generic copy_siginfo() to see that arch-specific definition.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Petr Malat <oss@malat.biz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Christopher Ferris <cferris@google.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/12478/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 54cf809b9512be95f53ed4a5e3b631d1ac42f0fa upstream.
Similar to commits:
51d7d5205d33 ("powerpc: Add smp_mb() to arch_spin_is_locked()")
d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers")
qspinlock suffers from the fact that the _Q_LOCKED_VAL store is
unordered inside the ACQUIRE of the lock.
And while this is not a problem for the regular mutual exclusive
critical section usage of spinlocks, it breaks creative locking like:
spin_lock(A) spin_lock(B)
spin_unlock_wait(B) if (!spin_is_locked(A))
do_something() do_something()
In that both CPUs can end up running do_something at the same time,
because our _Q_LOCKED_VAL store can drop past the spin_unlock_wait()
spin_is_locked() loads (even on x86!!).
To avoid making the normal case slower, add smp_mb()s to the less used
spin_unlock_wait() / spin_is_locked() side of things to avoid this
problem.
Reported-and-tested-by: Davidlohr Bueso <dave@stgolabs.net>
Reported-by: Giovanni Gherdovich <ggherdovich@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0f40fbbcc34e093255a2b2d70b6b0fb48c3f39aa upstream.
OpenSSH expects the (non-blocking) read() of pty master to return
EAGAIN only if it has received all of the slave-side output after
it has received SIGCHLD. This used to work on pre-3.12 kernels.
This fix effectively forces non-blocking read() and poll() to
block for parallel i/o to complete for all ttys. It also unwinds
these changes:
1) f8747d4a466ab2cafe56112c51b3379f9fdb7a12
tty: Fix pty master read() after slave closes
2) 52bce7f8d4fc633c9a9d0646eef58ba6ae9a3b73
pty, n_tty: Simplify input processing on final close
3) 1a48632ffed61352a7810ce089dc5a8bcd505a60
pty: Fix input race when closing
Inspired by analysis and patch from Marc Aurele La France <tsi@tuyoix.net>
Reported-by: Volth <openssh@volth.com>
Reported-by: Marc Aurele La France <tsi@tuyoix.net>
BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=52
BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=2492
Signed-off-by: Brian Bloniarz <brian.bloniarz@gmail.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit feb26ac31a2a5cb88d86680d9a94916a6343e9e6 upstream.
The XHCI controller presents two USB buses to the system - one for USB2
and one for USB3. The hub init code (hub_port_init) is reentrant but
only locks one bus per thread, leading to a race condition failure when
two threads attempt to simultaneously initialise a USB2 and USB3 device:
[ 8.034843] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
[ 13.183701] usb 3-3: device descriptor read/all, error -110
On a test system this failure occurred on 6% of all boots.
The call traces at the point of failure are:
Call Trace:
[<ffffffff81b9bab7>] schedule+0x37/0x90
[<ffffffff817da7cd>] usb_kill_urb+0x8d/0xd0
[<ffffffff8111e5e0>] ? wake_up_atomic_t+0x30/0x30
[<ffffffff817dafbe>] usb_start_wait_urb+0xbe/0x150
[<ffffffff817db10c>] usb_control_msg+0xbc/0xf0
[<ffffffff817d07de>] hub_port_init+0x51e/0xb70
[<ffffffff817d4697>] hub_event+0x817/0x1570
[<ffffffff810f3e6f>] process_one_work+0x1ff/0x620
[<ffffffff810f3dcf>] ? process_one_work+0x15f/0x620
[<ffffffff810f4684>] worker_thread+0x64/0x4b0
[<ffffffff810f4620>] ? rescuer_thread+0x390/0x390
[<ffffffff810fa7f5>] kthread+0x105/0x120
[<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200
[<ffffffff81ba183f>] ret_from_fork+0x3f/0x70
[<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200
Call Trace:
[<ffffffff817fd36d>] xhci_setup_device+0x53d/0xa40
[<ffffffff817fd87e>] xhci_address_device+0xe/0x10
[<ffffffff817d047f>] hub_port_init+0x1bf/0xb70
[<ffffffff811247ed>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff817d4697>] hub_event+0x817/0x1570
[<ffffffff810f3e6f>] process_one_work+0x1ff/0x620
[<ffffffff810f3dcf>] ? process_one_work+0x15f/0x620
[<ffffffff810f4684>] worker_thread+0x64/0x4b0
[<ffffffff810f4620>] ? rescuer_thread+0x390/0x390
[<ffffffff810fa7f5>] kthread+0x105/0x120
[<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200
[<ffffffff81ba183f>] ret_from_fork+0x3f/0x70
[<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200
Which results from the two call chains:
hub_port_init
usb_get_device_descriptor
usb_get_descriptor
usb_control_msg
usb_internal_control_msg
usb_start_wait_urb
usb_submit_urb / wait_for_completion_timeout / usb_kill_urb
hub_port_init
hub_set_address
xhci_address_device
xhci_setup_device
Mathias Nyman explains the current behaviour violates the XHCI spec:
hub_port_reset() will end up moving the corresponding xhci device slot
to default state.
As hub_port_reset() is called several times in hub_port_init() it
sounds reasonable that we could end up with two threads having their
xhci device slots in default state at the same time, which according to
xhci 4.5.3 specs still is a big no no:
"Note: Software shall not transition more than one Device Slot to the
Default State at a time"
So both threads fail at their next task after this.
One fails to read the descriptor, and the other fails addressing the
device.
Fix this in hub_port_init by locking the USB controller (instead of an
individual bus) to prevent simultaneous initialisation of both buses.
Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in parallel")
Link: https://lkml.org/lkml/2016/2/8/312
Link: https://lkml.org/lkml/2016/2/4/748
Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6fb650d43da3e7054984dc548eaa88765a94d49f upstream.
When a USB driver is bound to an interface (either through probing or
by claiming it) or is unbound from an interface, the USB core always
disables Link Power Management during the transition and then
re-enables it afterward. The reason is because the driver might want
to prevent hub-initiated link power transitions, in which case the HCD
would have to recalculate the various LPM parameters. This
recalculation takes place when LPM is re-enabled and the new
parameters are sent to the device and its parent hub.
However, if the driver does not want to prevent hub-initiated link
power transitions then none of this work is necessary. The parameters
don't need to be recalculated, and LPM doesn't need to be disabled and
re-enabled.
It turns out that disabling and enabling LPM can be time-consuming,
enough so that it interferes with user programs that want to claim and
release interfaces rapidly via usbfs. Since the usbfs kernel driver
doesn't set the disable_hub_initiated_lpm flag, we can speed things up
and get the user programs to work by leaving LPM alone whenever the
flag isn't set.
And while we're improving the way disable_hub_initiated_lpm gets used,
let's also fix its kerneldoc.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Matthew Giassa <matthew@giassa.net>
CC: Mathias Nyman <mathias.nyman@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bb208f144cf3f59d8f89a09a80efd04389718907 upstream.
As described in 'can: m_can: tag current CAN FD controllers as non-ISO'
(6cfda7fbebe) it is possible to define fixed configuration options by
setting the according bit in 'ctrlmode' and clear it in 'ctrlmode_supported'.
This leads to the incovenience that the fixed configuration bits can not be
passed by netlink even when they have the correct values (e.g. non-ISO, FD).
This patch fixes that issue and not only allows fixed set bit values to be set
again but now requires(!) to provide these fixed values at configuration time.
A valid CAN FD configuration consists of a nominal/arbitration bittiming, a
data bittiming and a control mode with CAN_CTRLMODE_FD set - which is now
enforced by a new can_validate() function. This fix additionally removed the
inconsistency that was prohibiting the support of 'CANFD-only' controller
drivers, like the RCar CAN FD.
For this reason a new helper can_set_static_ctrlmode() has been introduced to
provide a proper interface to handle static enabled CAN controller options.
Reported-by: Ramesh Shanmugasundaram <ramesh.shanmugasundaram@bp.renesas.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Reviewed-by: Ramesh Shanmugasundaram <ramesh.shanmugasundaram@bp.renesas.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b5a7aef1ef436ec005fef0efe31a676ec5f4ab31 upstream.
This patch allows fscrypto to handle a second key prefix given by filesystem.
The main reason is to provide backward compatibility, since previously f2fs
used "f2fs:" as a crypto prefix instead of "fscrypt:".
Later, ext4 should also provide key_prefix() to give "ext4:".
One concern decribed by Ted would be kinda double check overhead of prefixes.
In x86, for example, validate_user_key consumes 8 ms after boot-up, which turns
out derive_key_aes() consumed most of the time to load specific crypto module.
After such the cold miss, it shows almost zero latencies, which treats as a
negligible overhead.
Note that request_key() detects wrong prefix in prior to derive_key_aes() even.
Cc: Ted Tso <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"Overlayfs fixes from Miklos, assorted fixes from me.
Stable fodder of varying severity, all sat in -next for a while"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ovl: ignore permissions on underlying lookup
vfs: add lookup_hash() helper
vfs: rename: check backing inode being equal
vfs: add vfs_select_inode() helper
get_rock_ridge_filename(): handle malformed NM entries
ecryptfs: fix handling of directory opening
atomic_open(): fix the handling of create_error
fix the copy vs. map logics in blk_rq_map_user_iov()
do_splice_to(): cap the size before passing to ->splice_read()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"During v4.6-rc1 cgroup namespace support was merged. There is an
issue where it's impossible to tell whether a given cgroup mount point
is bind mounted or namespaced. Serge has been working on the issue
but it took longer than expected to resolve, so the late pull request.
Given that it's a completely new feature and the patches don't touch
anything else, the risk seems acceptable. However, if this is too
late, an alternative is plugging new cgroup ns creation for v4.6 and
retrying for v4.7"
* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix compile warning
kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call
cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces
kernfs_path_from_node_locked: don't overwrite nlen
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A small collection of driver specific fixes for the regulator
subsysetem:
- Fix handling of probe deferral for GPIO regulators
- Fix a typo in the module alias for DA9053
- Fix the definition of BUCK9 in the S2MPS11 driver. This change
looks larger than it is because an irregularity in the hardware
means that the macro used to define bucks 6-10 needs duplicating
and tweaking to have a separate macro for 9
- Fix a series of errors in the definitions of the LDOs the AXP20x
regulators, some of which had always been present and some of which
were introduced in the merge window"
* tag 'regulator-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: da9063: Correct module alias prefix to fix module autoloading
regulator: axp20x: Fix axp22x ldo_io registration error on cold boot
regulator: axp20x: Fix axp22x ldo_io voltage ranges
regulator: axp20x: Fix LDO4 linear voltage range
regulator: s2mps11: Fix invalid selector mask and voltages for buck9
regulator: gpio: check return value of of_get_named_gpio
|
|
'regulator/fix/da9063', 'regulator/fix/gpio' and 'regulator/fix/s2mps11' into regulator-linus
|
|
This will provide fully accuracy to the mapcount calculation in the
write protect faults, so page pinning will not get broken by false
positive copy-on-writes.
total_mapcount() isn't the right calculation needed in
reuse_swap_page(), so this introduces a page_trans_huge_mapcount()
that is effectively the full accurate return value for page_mapcount()
if dealing with Transparent Hugepages, however we only use the
page_trans_huge_mapcount() during COW faults where it strictly needed,
due to its higher runtime cost.
This also provide at practical zero cost the total_mapcount
information which is needed to know if we can still relocate the page
anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we
can reuse the page no matter if it's a pte or a pmd_trans_huge
triggering the fault, but we can only relocate the page anon_vma to
the local vma->anon_vma if we're sure it's only this "vma" mapping the
whole THP physical range.
Kirill A. Shutemov discovered the problem with moving the page
anon_vma to the local vma->anon_vma in a previous version of this
patch and another problem in the way page_move_anon_rmap() was called.
Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a
previous version, because reuse_swap_page must be a macro to call
page_trans_huge_mapcount from swap.h, so this uses a macro again
instead of an inline function. With this change at least it's a less
dangerous usage than it was before, because "page" is used only once
now, while with the previous code reuse_swap_page(page++) would have
called page_mapcount on page+1 and it would have increased page twice
instead of just once.
Dean Luick noticed an uninitialized variable that could result in a
rmap inefficiency for the non-THP case in a previous version.
Mike Marciniszyn said:
: Our RDMA tests are seeing an issue with memory locking that bisects to
: commit 61f5d698cc97 ("mm: re-enable THP")
:
: The test program registers two rather large MRs (512M) and RDMA
: writes data to a passive peer using the first and RDMA reads it back
: into the second MR and compares that data. The sizes are chosen randomly
: between 0 and 1024 bytes.
:
: The test will get through a few (<= 4 iterations) and then gets a
: compare error.
:
: Tracing indicates the kernel logical addresses associated with the individual
: pages at registration ARE correct , the data in the "RDMA read response only"
: packets ARE correct.
:
: The "corruption" occurs when the packet crosse two pages that are not physically
: contiguous. The second page reads back as zero in the program.
:
: It looks like the user VA at the point of the compare error no longer points to
: the same physical address as was registered.
:
: This patch totally resolves the issue!
Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Tested-by: Josh Collier <josh.d.collier@intel.com>
Cc: Marc Haber <mh+linux-kernel@zugschlus.de>
Cc: <stable@vger.kernel.org> [4.5]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
Overlayfs needs lookup without inode_permission() and already has the name
hash (in form of dentry->d_name on overlayfs dentry). It also doesn't
support filesystems with d_op->d_hash() so basically it only needs
the actual hashed lookup from lookup_one_len_unlocked()
So add a new helper that does unlocked lookup of a hashed name.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org> # v4.2+
|
|
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|