Age | Commit message (Collapse) | Author |
|
commit d8f9cc328c8888369880e2527e9186d745f2bbf6 upstream.
To allow rereg_user_mr to modify the MR from read-only to writable without
using get_user_pages again, we needed to define the initial MR as writable.
However, this was originally done unconditionally, without taking into
account the writability of the underlying virtual memory.
As a result, any attempt to register a read-only MR over read-only
virtual memory failed.
To fix this, do not add the writable flag bit when the user virtual memory
is not writable (e.g. const memory).
However, when the underlying memory is NOT writable (and we therefore
do not define the initial MR as writable), the IB core adds a
"force writable" flag to its user-pages request. If this succeeds,
the reg_user_mr caller gets a writable copy of the original pages.
If the user-space caller then does a rereg_user_mr operation to enable
writability, this will succeed. This should not be allowed, since
the original virtual memory was not writable.
Cc: <stable@vger.kernel.org>
Fixes: 9376932d0c26 ("IB/mlx4_ib: Add support for user MR re-registration")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 08bb558ac11ab944e0539e78619d7b4c356278bd upstream.
Make the MR writability flags check, which is performed in umem.c,
a static inline function in file ib_verbs.h
This allows the function to be used by low-level infiniband drivers.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2fd1d2c4ceb2248a727696962cf3370dc9f5a0a4 upstream.
Andrei Vagin writes:
FYI: This bug has been reproduced on 4.11.7
> BUG: Dentry ffff895a3dd01240{i=4e7c09a,n=lo} still in use (1) [unmount of proc proc]
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 13588 at fs/dcache.c:1445 umount_check+0x6e/0x80
> CPU: 1 PID: 13588 Comm: kworker/1:1 Not tainted 4.11.7-200.fc25.x86_64 #1
> Hardware name: CompuLab sbc-flt1/fitlet, BIOS SBCFLT_0.08.04 06/27/2015
> Workqueue: events proc_cleanup_work
> Call Trace:
> dump_stack+0x63/0x86
> __warn+0xcb/0xf0
> warn_slowpath_null+0x1d/0x20
> umount_check+0x6e/0x80
> d_walk+0xc6/0x270
> ? dentry_free+0x80/0x80
> do_one_tree+0x26/0x40
> shrink_dcache_for_umount+0x2d/0x90
> generic_shutdown_super+0x1f/0xf0
> kill_anon_super+0x12/0x20
> proc_kill_sb+0x40/0x50
> deactivate_locked_super+0x43/0x70
> deactivate_super+0x5a/0x60
> cleanup_mnt+0x3f/0x90
> mntput_no_expire+0x13b/0x190
> kern_unmount+0x3e/0x50
> pid_ns_release_proc+0x15/0x20
> proc_cleanup_work+0x15/0x20
> process_one_work+0x197/0x450
> worker_thread+0x4e/0x4a0
> kthread+0x109/0x140
> ? process_one_work+0x450/0x450
> ? kthread_park+0x90/0x90
> ret_from_fork+0x2c/0x40
> ---[ end trace e1c109611e5d0b41 ]---
> VFS: Busy inodes after unmount of proc. Self-destruct in 5 seconds. Have a nice day...
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: _raw_spin_lock+0xc/0x30
> PGD 0
Fix this by taking a reference to the super block in proc_sys_prune_dcache.
The superblock reference is the core of the fix however the sysctl_inodes
list is converted to a hlist so that hlist_del_init_rcu may be used. This
allows proc_sys_prune_dache to remove inodes the sysctl_inodes list, while
not causing problems for proc_sys_evict_inode when if it later choses to
remove the inode from the sysctl_inodes list. Removing inodes from the
sysctl_inodes list allows proc_sys_prune_dcache to have a progress
guarantee, while still being able to drop all locks. The fact that
head->unregistering is set in start_unregistering ensures that no more
inodes will be added to the the sysctl_inodes list.
Previously the code did a dance where it delayed calling iput until the
next entry in the list was being considered to ensure the inode remained on
the sysctl_inodes list until the next entry was walked to. The structure
of the loop in this patch does not need that so is much easier to
understand and maintain.
Cc: stable@vger.kernel.org
Reported-by: Andrei Vagin <avagin@gmail.com>
Tested-by: Andrei Vagin <avagin@openvz.org>
Fixes: ace0c791e6c3 ("proc/sysctl: Don't grab i_lock under sysctl_lock.")
Fixes: d6cffbbe9a7e ("proc/sysctl: prune stale dentries during unregistering")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ace0c791e6c3cf5ef37cad2df69f0d90ccc40ffb upstream.
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> This patch has locking problem. I've got lockdep splat under LTP.
>
> [ 6633.115456] ======================================================
> [ 6633.115502] [ INFO: possible circular locking dependency detected ]
> [ 6633.115553] 4.9.10-debug+ #9 Tainted: G L
> [ 6633.115584] -------------------------------------------------------
> [ 6633.115627] ksm02/284980 is trying to acquire lock:
> [ 6633.115659] (&sb->s_type->i_lock_key#4){+.+...}, at: [<ffffffff816bc1ce>] igrab+0x1e/0x80
> [ 6633.115834] but task is already holding lock:
> [ 6633.115882] (sysctl_lock){+.+...}, at: [<ffffffff817e379b>] unregister_sysctl_table+0x6b/0x110
> [ 6633.116026] which lock already depends on the new lock.
> [ 6633.116026]
> [ 6633.116080]
> [ 6633.116080] the existing dependency chain (in reverse order) is:
> [ 6633.116117]
> -> #2 (sysctl_lock){+.+...}:
> -> #1 (&(&dentry->d_lockref.lock)->rlock){+.+...}:
> -> #0 (&sb->s_type->i_lock_key#4){+.+...}:
>
> d_lock nests inside i_lock
> sysctl_lock nests inside d_lock in d_compare
>
> This patch adds i_lock nesting inside sysctl_lock.
Al Viro <viro@ZenIV.linux.org.uk> replied:
> Once ->unregistering is set, you can drop sysctl_lock just fine. So I'd
> try something like this - use rcu_read_lock() in proc_sys_prune_dcache(),
> drop sysctl_lock() before it and regain after. Make sure that no inodes
> are added to the list ones ->unregistering has been set and use RCU list
> primitives for modifying the inode list, with sysctl_lock still used to
> serialize its modifications.
>
> Freeing struct inode is RCU-delayed (see proc_destroy_inode()), so doing
> igrab() is safe there. Since we don't drop inode reference until after we'd
> passed beyond it in the list, list_for_each_entry_rcu() should be fine.
I agree with Al Viro's analsysis of the situtation.
Fixes: d6cffbbe9a7e ("proc/sysctl: prune stale dentries during unregistering")
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Tested-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d6cffbbe9a7e51eb705182965a189457c17ba8a3 upstream.
Currently unregistering sysctl table does not prune its dentries.
Stale dentries could slowdown sysctl operations significantly.
For example, command:
# for i in {1..100000} ; do unshare -n -- sysctl -a &> /dev/null ; done
creates a millions of stale denties around sysctls of loopback interface:
# sysctl fs.dentry-state
fs.dentry-state = 25812579 24724135 45 0 0 0
All of them have matching names thus lookup have to scan though whole
hash chain and call d_compare (proc_sys_compare) which checks them
under system-wide spinlock (sysctl_lock).
# time sysctl -a > /dev/null
real 1m12.806s
user 0m0.016s
sys 1m12.400s
Currently only memory reclaimer could remove this garbage.
But without significant memory pressure this never happens.
This patch collects sysctl inodes into list on sysctl table header and
prunes all their dentries once that table unregisters.
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> On 10.02.2017 10:47, Al Viro wrote:
>> how about >> the matching stats *after* that patch?
>
> dcache size doesn't grow endlessly, so stats are fine
>
> # sysctl fs.dentry-state
> fs.dentry-state = 92712 58376 45 0 0 0
>
> # time sysctl -a &>/dev/null
>
> real 0m0.013s
> user 0m0.004s
> sys 0m0.008s
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 119e1ef80ecfe0d1deb6378d4ab41f5b71519de1 upstream.
__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntpu() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.
Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with reference that'll need to be
dropped by caller; however, if that races with final mntput()
we can end up with caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped. Solution: in "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by racing
final mntput() that has missed our increment and if it has -
undo the increment and treat that as "failure, caller doesn't
need to drop anything" case.
It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there. The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...
Reported-by: Oleg Nesterov <oleg@redhat.com>
Fixes: 48a066e72d97 ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9ea0a46ca2c318fcc449c1e6b62a7230a17888f1 upstream.
mntput_no_expire() does the calculation of total refcount under mount_lock;
unfortunately, the decrement (as well as all increments) are done outside
of it, leading to false positives in the "are we dropping the last reference"
test. Consider the following situation:
* mnt is a lazy-umounted mount, kept alive by two opened files. One
of those files gets closed. Total refcount of mnt is 2. On CPU 42
mntput(mnt) (called from __fput()) drops one reference, decrementing component
* After it has looked at component #0, the process on CPU 0 does
mntget(), incrementing component #0, gets preempted and gets to run again -
on CPU 69. There it does mntput(), which drops the reference (component #69)
and proceeds to spin on mount_lock.
* On CPU 42 our first mntput() finishes counting. It observes the
decrement of component #69, but not the increment of component #0. As the
result, the total it gets is not 1 as it should've been - it's 0. At which
point we decide that vfsmount needs to be killed and proceed to free it and
shut the filesystem down. However, there's still another opened file
on that filesystem, with reference to (now freed) vfsmount, etc. and we are
screwed.
It's not a wide race, but it can be reproduced with artificial slowdown of
the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.
Fix consists of moving the refcount decrement under mount_lock; the tricky
part is that we want (and can) keep the fast case (i.e. mount that still
has non-NULL ->mnt_ns) entirely out of mount_lock. All places that zero
mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
before that mntput(). IOW, if mntput() observes (under rcu_read_lock())
a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
be dropped.
Reported-by: Jann Horn <jannh@google.com>
Tested-by: Jann Horn <jannh@google.com>
Fixes: 48a066e72d97 ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4c0d7cd5c8416b1ef41534d19163cb07ffaa03ab upstream.
RCU pathwalk relies upon the assumption that anything that changes
->d_inode of a dentry will invalidate its ->d_seq. That's almost
true - the one exception is that the final dput() of already unhashed
dentry does *not* touch ->d_seq at all. Unhashing does, though,
so for anything we'd found by RCU dcache lookup we are fine.
Unfortunately, we can *start* with an unhashed dentry or jump into
it.
We could try and be careful in the (few) places where that could
happen. Or we could just make the final dput() invalidate the damn
thing, unhashed or not. The latter is much simpler and easier to
backport, so let's do it that way.
Reported-by: "Dae R. Jeong" <threeearcat@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 90bad5e05bcdb0308cfa3d3a60f5c0b9c8e2efb3 upstream.
Since mountpoint crossing can happen without leaving lazy mode,
root dentries do need the same protection against having their
memory freed without RCU delay as everything else in the tree.
It's partially hidden by RCU delay between detaching from the
mount tree and dropping the vfsmount reference, but the starting
point of pathwalk can be on an already detached mount, in which
case umount-caused RCU delay has already passed by the time the
lazy pathwalk grabs rcu_read_lock(). If the starting point
happens to be at the root of that vfsmount *and* that vfsmount
covers the entire filesystem, we get trouble.
Fixes: 48a066e72d97 ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b5b1404d0815894de0690de8a1ab58269e56eae6 upstream.
This is purely a preparatory patch for upcoming changes during the 4.19
merge window.
We have a function called "boot_cpu_state_init()" that isn't really
about the bootup cpu state: that is done much earlier by the similarly
named "boot_cpu_init()" (note lack of "state" in name).
This function initializes some hotplug CPU state, and needs to run after
the percpu data has been properly initialized. It even has a comment to
that effect.
Except it _doesn't_ actually run after the percpu data has been properly
initialized. On x86 it happens to do that, but on at least arm and
arm64, the percpu base pointers are initialized by the arch-specific
'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().
This had some unexpected results, and in particular we have a patch
pending for the merge window that did the obvious cleanup of using
'this_cpu_write()' in the cpu hotplug init code:
- per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
+ this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
which is obviously the right thing to do. Except because of the
ordering issue, it actually failed miserably and unexpectedly on arm64.
So this just fixes the ordering, and changes the name of the function to
be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
hotplug state, because the core CPU state was supposed to have already
been done earlier.
Marked for stable, since the (not yet merged) patch that will show this
problem is marked for stable.
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
enabled
commit 1214fd7b497400d200e3f4e64e2338b303a20949 upstream.
Surround scsi_execute() calls with scsi_autopm_get_device() and
scsi_autopm_put_device(). Note: removing sr_mutex protection from the
scsi_cd_get() and scsi_cd_put() calls is safe because the purpose of
sr_mutex is to serialize cdrom_*() calls.
This patch avoids that complaints similar to the following appear in the
kernel log if runtime power management is enabled:
INFO: task systemd-udevd:650 blocked for more than 120 seconds.
Not tainted 4.18.0-rc7-dbg+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd-udevd D28176 650 513 0x00000104
Call Trace:
__schedule+0x444/0xfe0
schedule+0x4e/0xe0
schedule_preempt_disabled+0x18/0x30
__mutex_lock+0x41c/0xc70
mutex_lock_nested+0x1b/0x20
__blkdev_get+0x106/0x970
blkdev_get+0x22c/0x5a0
blkdev_open+0xe9/0x100
do_dentry_open.isra.19+0x33e/0x570
vfs_open+0x7c/0xd0
path_openat+0x6e3/0x1120
do_filp_open+0x11c/0x1c0
do_sys_open+0x208/0x2d0
__x64_sys_openat+0x59/0x70
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Maurizio Lombardi <mlombard@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: <stable@vger.kernel.org>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fdcb613d49321b5bf5d5a1bd0fba8e7c241dcc70 upstream.
The LPSS PWM device on on Bay Trail and Cherry Trail devices has a set
of private registers at offset 0x800, the current lpss_device_desc for
them already sets the LPSS_SAVE_CTX flag to have these saved/restored
over device-suspend, but the current lpss_device_desc was not setting
the prv_offset field, leading to the regular device registers getting
saved/restored instead.
This is causing the PWM controller to no longer work, resulting in a black
screen, after a suspend/resume on systems where the firmware clears the
APB clock and reset bits at offset 0x804.
This commit fixes this by properly setting prv_offset to 0x800 for
the PWM devices.
Cc: stable@vger.kernel.org
Fixes: e1c748179754 ("ACPI / LPSS: Add Intel BayTrail ACPI mode PWM")
Fixes: 1bfbd8eb8a7f ("ACPI / LPSS: Add ACPI IDs for Intel Braswell")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Rafael J . Wysocki <rjw@rjwysocki.net>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d472b3a6cf63cd31cae1ed61930f07e6cd6671b5 upstream.
skb_shinfo() can change when calling __pskb_pull_tail(): Don't cache
its return value.
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3c53776e29f81719efcf8f7a6e30cdf753bee94d upstream.
Way back in 4.9, we committed 4cd13c21b207 ("softirq: Let ksoftirqd do
its job"), and ever since we've had small nagging issues with it. For
example, we've had:
1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred")
8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()")
all of which worked around some of the effects of that commit.
The DVB people have also complained that the commit causes excessive USB
URB latencies, which seems to be due to the USB code using tasklets to
schedule USB traffic. This seems to be an issue mainly when already
living on the edge, but waiting for ksoftirqd to handle it really does
seem to cause excessive latencies.
Now Hanna Hawa reports that this issue isn't just limited to USB URB and
DVB, but also causes timeout problems for the Marvell SoC team:
"I'm facing kernel panic issue while running raid 5 on sata disks
connected to Macchiatobin (Marvell community board with Armada-8040
SoC with 4 ARMv8 cores of CA72) Raid 5 built with Marvell DMA engine
and async_tx mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2)
uses a tasklet to clean the done descriptors from the queue"
The latency problem causes a panic:
mv_xor_v2 f0400000.xor: dma_sync_wait: timeout!
Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction
We've discussed simply just reverting the original commit entirely, and
also much more involved solutions (with per-softirq threads etc). This
patch is intentionally stupid and fairly limited, because the issue
still remains, and the other solutions either got sidetracked or had
other issues.
We should probably also consider the timer softirqs to be synchronous
and not be delayed to ksoftirqd (since they were the issue with the
earlier watchdog problems), but that should be done as a separate patch.
This does only the tasklet cases.
Reported-and-tested-by: Hanna Hawa <hannah@marvell.com>
Reported-and-tested-by: Josef Griebichler <griebichler.josef@gmx.at>
Reported-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 12c8f25a016dff69ee284aa3338bebfd2cfcba33 upstream.
KASAN uses the __no_sanitize_address macro to disable instrumentation of
particular functions. Right now it's defined only for GCC build, which
causes false positives when clang is used.
This patch adds a definition for clang.
Note, that clang's revision 329612 or higher is required.
[andreyknvl@google.com: remove redundant #ifdef CONFIG_KASAN check]
Link: http://lkml.kernel.org/r/c79aa31a2a2790f6131ed607c58b0dd45dd62a6c.1523967959.git.andreyknvl@google.com
Link: http://lkml.kernel.org/r/4ad725cc903f8534f8c8a60f0daade5e3d674f8d.1523554166.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Paul Lawrence <paullawrence@google.com>
Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sodagudi Prasad <psodagud@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fedb8da96355f5f64353625bf96dc69423ad1826 upstream.
For years I thought all parisc machines executed loads and stores in
order. However, Jeff Law recently indicated on gcc-patches that this is
not correct. There are various degrees of out-of-order execution all the
way back to the PA7xxx processor series (hit-under-miss). The PA8xxx
series has full out-of-order execution for both integer operations, and
loads and stores.
This is described in the following article:
http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml
For this reason, we need to define mb() and to insert a memory barrier
before the store unlocking spinlocks. This ensures that all memory
accesses are complete prior to unlocking. The ldcw instruction performs
the same function on entry.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 66509a276c8c1d19ee3f661a41b418d101c57d29 upstream.
Enable the -mlong-calls compiler option by default, because otherwise in most
cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F
relocations. This fixes building the 64-bit defconfig.
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3ab2011ea368ec3433ad49e1b9e1c7b70d2e65df upstream.
There is a race condition in tpm_common_write function allowing
two threads on the same /dev/tpm<N>, or two different applications
on the same /dev/tpmrm<N> to overwrite each other commands/responses.
Fixed this by taking the priv->buffer_mutex early in the function.
Also converted the priv->data_pending from atomic to a regular size_t
type. There is no need for it to be atomic since it is only touched
under the protection of the priv->buffer_mutex.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5012284700775a4e6e3fbe7eac4c543c4874b559 upstream.
Commit 8844618d8aa7: "ext4: only look at the bg_flags field if it is
valid" will complain if block group zero does not have the
EXT4_BG_INODE_ZEROED flag set. Unfortunately, this is not correct,
since a freshly created file system has this flag cleared. It gets
almost immediately after the file system is mounted read-write --- but
the following somewhat unlikely sequence will end up triggering a
false positive report of a corrupted file system:
mkfs.ext4 /dev/vdc
mount -o ro /dev/vdc /vdc
mount -o remount,rw /dev/vdc
Instead, when initializing the inode table for block group zero, test
to make sure that itable_unused count is not too large, since that is
the case that will result in some or all of the reserved inodes
getting cleared.
This fixes the failures reported by Eric Whiteney when running
generic/230 and generic/231 in the the nojournal test case.
Fixes: 8844618d8aa7 ("ext4: only look at the bg_flags field if it is valid")
Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
commit 92d34134193e5b129dc24f8d79cb9196626e8d7a upstream.
The code is assuming the buffer is max_size length, but we weren't
allocating enough space for it.
Signed-off-by: Shankara Pailoor <shankarapailoor@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b697d7d8c741f27b728a878fc55852b06d0f6f5e upstream.
The __get_txreq() function can return a pointer, ERR_PTR(-EBUSY), or NULL.
All of the relevant call sites look for IS_ERR, so the NULL return would
lead to a NULL pointer exception.
Do not use the ERR_PTR mechanism for this function.
Update all call sites to handle the return value correctly.
Clean up error paths to reflect return value.
Fixes: 45842abbb292 ("staging/rdma/hfi1: move txreq header code")
Cc: <stable@vger.kernel.org> # 4.9.x+
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e01e80634ecdde1dd113ac43b3adad21b47f3957 upstream.
One of the classes of kernel stack content leaks[1] is exposing the
contents of prior heap or stack contents when a new process stack is
allocated. Normally, those stacks are not zeroed, and the old contents
remain in place. In the face of stack content exposure flaws, those
contents can leak to userspace.
Fixing this will make the kernel no longer vulnerable to these flaws, as
the stack will be wiped each time a stack is assigned to a new process.
There's not a meaningful change in runtime performance; it almost looks
like it provides a benefit.
Performing back-to-back kernel builds before:
Run times: 157.86 157.09 158.90 160.94 160.80
Mean: 159.12
Std Dev: 1.54
and after:
Run times: 159.31 157.34 156.71 158.15 160.81
Mean: 158.46
Std Dev: 1.46
Instead of making this a build or runtime config, Andy Lutomirski
recommended this just be enabled by default.
[1] A noisy search for many kinds of stack content leaks can be seen here:
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak
I did some more with perf and cycle counts on running 100,000 execs of
/bin/true.
before:
Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
Mean: 221015379122.60
Std Dev: 4662486552.47
after:
Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
Mean: 217745009865.40
Std Dev: 5935559279.99
It continues to look like it's faster, though the deviation is rather
wide, but I'm not sure what I could do that would be less noisy. I'm
open to ideas!
Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ Srivatsa: Backported to 4.9.y ]
Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu>
Reviewed-by: Srinidhi Rao <srinidhir@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ca182551857cc2c1e6a2b7f1e72090a137a15008 upstream.
Kmemleak considers any pointers on task stacks as references. This
patch clears newly allocated and reused vmap stacks.
Link: http://lkml.kernel.org/r/150728990124.744199.8403409836394318684.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ Srivatsa: Backported to 4.9.y ]
Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 58152ecbbcc6a0ce7fddd5bf5f6ee535834ece0c upstream.
In case skb in out_or_order_queue is the result of
multiple skbs coalescing, we would like to get a proper gso_segs
counter tracking, so that future tcp_drop() can report an accurate
number.
I chose to not implement this tracking for skbs in receive queue,
since they are not dropped, unless socket is disconnected.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bd3599a0e142cd73edd3b6801068ac3f48ac771a upstream.
When we clone a range into a file we can end up dropping existing
extent maps (or trimming them) and replacing them with new ones if the
range to be cloned overlaps with a range in the destination inode.
When that happens we add the new extent maps to the list of modified
extents in the inode's extent map tree, so that a "fast" fsync (the flag
BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps
and log corresponding extent items. However, at the end of range cloning
operation we do truncate all the pages in the affected range (in order to
ensure future reads will not get stale data). Sometimes this truncation
will release the corresponding extent maps besides the pages from the page
cache. If this happens, then a "fast" fsync operation will miss logging
some extent items, because it relies exclusively on the extent maps being
present in the inode's extent tree, leading to data loss/corruption if
the fsync ends up using the same transaction used by the clone operation
(that transaction was not committed in the meanwhile). An extent map is
released through the callback btrfs_invalidatepage(), which gets called by
truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The
later ends up calling try_release_extent_mapping() which will release the
extent map if some conditions are met, like the file size being greater
than 16Mb, gfp flags allow blocking and the range not being locked (which
is the case during the clone operation) nor being the extent map flagged
as pinned (also the case for cloning).
The following example, turned into a test for fstests, reproduces the
issue:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo
$ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar
$ xfs_io -c "fsync" /mnt/bar
# reflink destination offset corresponds to the size of file bar,
# 2728Kb minus 4Kb.
$ xfs_io -c ""reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar
$ xfs_io -c "fsync" /mnt/bar
$ md5sum /mnt/bar
95a95813a8c2abc9aa75a6c2914a077e /mnt/bar
<power fail>
$ mount /dev/sdb /mnt
$ md5sum /mnt/bar
207fd8d0b161be8a84b945f0df8d5f8d /mnt/bar
# digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the
# power failure
In the above example, the destination offset of the clone operation
corresponds to the size of the "bar" file minus 4Kb. So during the clone
operation, the extent map covering the range from 2572Kb to 2728Kb gets
trimmed so that it ends at offset 2724Kb, and a new extent map covering
the range from 2724Kb to 11724Kb is created. So at the end of the clone
operation when we ask to truncate the pages in the range from 2724Kb to
2724Kb + 15908Kb, the page invalidation callback ends up removing the new
extent map (through try_release_extent_mapping()) when the page at offset
2724Kb is passed to that callback.
Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent
map is removed at try_release_extent_mapping(), forcing the next fsync to
search for modified extents in the fs/subvolume tree instead of relying on
the presence of extent maps in memory. This way we can continue doing a
"fast" fsync if the destination range of a clone operation does not
overlap with an existing range or if any of the criteria necessary to
remove an extent map at try_release_extent_mapping() is not met (file
size not bigger then 16Mb or gfp flags do not allow blocking).
CC: stable@vger.kernel.org # 3.16+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9f9e3e0d4dd3338b3f3dde080789f71901e1e4ff upstream.
Make sure to call reinit_completion() before dma is started to avoid race
condition where reinit_completion() is called after complete() and before
wait_for_completion_timeout().
Signed-off-by: Esben Haabendal <eha@deif.com>
Fixes: ce1a78840ff7 ("i2c: imx: add DMA support for freescale i2c driver")
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 73c8d8945505acdcbae137c2e00a1232e0be709f upstream.
Maintain the tracing on/off setting of the ring_buffer when switching
to the trace buffer snapshot.
Taking a snapshot is done by swapping the backup ring buffer
(max_tr_buffer). But since the tracing on/off setting is defined
by the ring buffer, when swapping it, the tracing on/off setting
can also be changed. This causes a strange result like below:
/sys/kernel/debug/tracing # cat tracing_on
1
/sys/kernel/debug/tracing # echo 0 > tracing_on
/sys/kernel/debug/tracing # cat tracing_on
0
/sys/kernel/debug/tracing # echo 1 > snapshot
/sys/kernel/debug/tracing # cat tracing_on
1
/sys/kernel/debug/tracing # echo 1 > snapshot
/sys/kernel/debug/tracing # cat tracing_on
0
We don't touch tracing_on, but snapshot changes tracing_on
setting each time. This is an anomaly, because user doesn't know
that each "ring_buffer" stores its own tracing-enable state and
the snapshot is done by swapping ring buffers.
Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
Cc: stable@vger.kernel.org
Fixes: debdd57f5145 ("tracing: Make a snapshot feature available from userspace")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[ Updated commit log and comment in the code ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a0040c0145945d3bd203df8fa97f6dfa819f3f7d upstream.
Hyper-V instances support PCI pass-through which is implemented through PV
pci-hyperv driver. When a device is passed through, a new root PCI bus is
created in the guest. The bus sits on top of VMBus and has no associated
information in ACPI. acpi_pci_add_bus() in this case proceeds all the way
to acpi_evaluate_dsm(), which reports
ACPI: \: failed to evaluate _DSM (0x1001)
While acpi_pci_slot_enumerate() and acpiphp_enumerate_slots() are protected
against ACPI_HANDLE() being NULL and do nothing, acpi_evaluate_dsm() is not
and gives us the error. It seems the correct fix is to not do anything in
acpi_pci_add_bus() in such cases.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sinan Kaya <okaya@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 44de022c4382541cebdd6de4465d1f4f465ff1dd upstream.
Ext4_check_descriptors() was getting called before s_gdb_count was
initialized. So for file systems w/o the meta_bg feature, allocation
bitmaps could overlap the block group descriptors and ext4 wouldn't
notice.
For file systems with the meta_bg feature enabled, there was a
fencepost error which would cause the ext4_check_descriptors() to
incorrectly believe that the block allocation bitmap overlaps with the
block group descriptor blocks, and it would reject the mount.
Fix both of these problems.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Gilbert <bgilbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 91874ecf32e41b5d86a4cb9d60e0bee50d828058 upstream.
It's legal to have 64 groups for netlink_sock.
As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe
only to first 32 groups.
The check for correctness of .bind() userspace supplied parameter
is done by applying mask made from ngroups shift. Which broke Android
as they have 64 groups and the shift for mask resulted in an overflow.
Fixes: 61f4b23769f0 ("netlink: Don't shift with UB on nlk->ngroups")
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Reported-and-Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 61f4b23769f0cc72ae62c9a81cf08f0397d40da8 ]
On i386 nlk->ngroups might be 32 or 0. Which leads to UB, resulting in
hang during boot.
Check for 0 ngroups and use (unsigned long long) as a type to shift.
Fixes: 7acf9d4237c4 ("netlink: Do not subscribe to non-existent groups").
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7acf9d4237c46894e0fa0492dd96314a41742e84 ]
Make ABI more strict about subscribing to group > ngroups.
Code doesn't check for that and it looks bogus.
(one can subscribe to non-existing group)
Still, it's possible to bind() to all possible groups with (-1)
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 80d20d35af1edd632a5e7a3b9c0ab7ceff92769e upstream.
local_timer_softirq_pending() checks whether the timer softirq is
pending with: local_softirq_pending() & TIMER_SOFTIRQ.
This is wrong because TIMER_SOFTIRQ is the softirq number and not a
bitmask. So the test checks for the wrong bit.
Use BIT(TIMER_SOFTIRQ) instead.
Fixes: 5d62c183f9e9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()")
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Cc: bigeasy@linutronix.de
Cc: peterz@infradead.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d1f0301b3333eef5efbfa1fe0f0edbea01863d5d upstream.
The support of force threading interrupts which are set up with both a
primary and a threaded handler wreckaged the setup of regular requested
threaded interrupts (primary handler == NULL).
The reason is that it does not check whether the primary handler is set to
the default handler which wakes the handler thread. Instead it replaces the
thread handler with the primary handler as it would do with force threaded
interrupts which have been requested via request_irq(). So both the primary
and the thread handler become the same which then triggers the warnon that
the thread handler tries to wakeup a not configured secondary thread.
Fortunately this only happens when the driver omits the IRQF_ONESHOT flag
when requesting the threaded interrupt, which is normaly caught by the
sanity checks when force irq threading is disabled.
Fix it by skipping the force threading setup when a regular threaded
interrupt is requested. As a consequence the interrupt request which lacks
the IRQ_ONESHOT flag is rejected correctly instead of silently wreckaging
it.
Fixes: 2a1d3ab8986d ("genirq: Handle force threading of irqs with primary and thread handler")
Reported-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b4146c4929ef61d5afca011474d59d0918a0cd82 upstream.
Propagate the task management completion status properly to avoid
unnecessary waits for commands to complete.
Fixes: faef62d13463 ("[SCSI] qla2xxx: Fix Task Management command asynchronous handling")
Cc: <stable@vger.kernel.org>
Signed-off-by: Anil Gurumurthy <anil.gurumurthy@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b08abbd9f5996309f021684f9ca74da30dcca36a upstream.
During unload process, the chip can encounter problem where a FW dump would
be captured. For this case, the full reset sequence will be skip to bring
the chip back to full operational state.
Fixes: e315cd28b9ef ("[SCSI] qla2xxx: Code changes for qla data structure refactoring")
Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
commit c170e5a8d222537e98aa8d4fddb667ff7a2ee114 upstream.
Fix a minor memory leak when there is an error opening a /dev/sg device.
Fixes: cc833acbee9d ("sg: O_EXCL and other lock handling")
Cc: <stable@vger.kernel.org>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a6a00918d4ad8718c3ccde38c02cec17f116b2fd upstream.
This is needed to ensure ->is_unity is correct when the plane was
previously configured to output a multi-planar format with scaling
enabled, and is then being reconfigured to output a uniplanar format.
Fixes: fc04023fafec ("drm/vc4: Add support for YUV planes.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20180724133601.32114-1-boris.brezillon@bootlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 46d8c4b28652d35dc6cfb5adf7f54e102fc04384 upstream.
This was detected by the self-test thanks to Ard's chunking patch.
I finally got around to testing this out on my ancient Via box. It
turns out that the workaround got the assembly wrong and we end up
doing count + initial cycles of the loop instead of just count.
This obviously causes corruption, either by overwriting the source
that is yet to be processed, or writing over the end of the buffer.
On CPUs that don't require the workaround only ECB is affected.
On Nano CPUs both ECB and CBC are affected.
This patch fixes it by doing the subtraction prior to the assembly.
Fixes: a76c1c23d0c3 ("crypto: padlock-aes - work around Nano CPU...")
Cc: <stable@vger.kernel.org>
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 63aff65573d73eb8dda4732ad4ef222dd35e4862 upstream.
VPID for the nested vcpu is allocated at vmx_create_vcpu whenever nested
vmx is turned on with the module parameter.
However, it's only freed if the L1 guest has executed VMXON which is not
a given.
As a result, on a system with nested==on every creation+deletion of an
L1 vcpu without running an L2 guest results in leaking one vpid. Since
the total number of vpids is limited to 64k, they can eventually get
exhausted, preventing L2 from starting.
Delay allocation of the L2 vpid until VMXON emulation, thus matching its
freeing.
Fixes: 5c614b3583e7b6dab0c86356fa36c2bcbb8322a0
Cc: stable@vger.kernel.org
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 89da619bc18d79bca5304724c11d4ba3b67ce2c6 upstream.
Kernel panic when with high memory pressure, calltrace looks like,
PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java"
#0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb
#1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942
#2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30
#3 [ffff881ec7ed7778] oops_end at ffffffff816902c8
#4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46
#5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc
#6 [ffff881ec7ed7838] __node_set at ffffffff81680300
#7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f
#8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5
#9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8
[exception RIP: _raw_spin_lock_irqsave+47]
RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046
RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8
RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008
RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098
R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
It happens in the pagefault and results in double pagefault
during compacting pages when memory allocation fails.
Analysed the vmcore, the page leads to second pagefault is corrupted
with _mapcount=-256, but private=0.
It's caused by the race between migration and ballooning, and lock
missing in virtballoon_migratepage() of virtio_balloon driver.
This patch fix the bug.
Fixes: e22504296d4f64f ("virtio_balloon: introduce migration primitives to balloon pages")
Cc: stable@vger.kernel.org
Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Huang Chong <huang.chong@zte.com.cn>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c8e8cd579bb4265651df8223730105341e61a2d1 upstream.
'call' is a user-controlled value, so sanitize the array index after the
bounds check to avoid speculating past the bounds of the 'nargs' array.
Found with the help of Smatch:
net/socket.c:2508 __do_sys_socketcall() warn: potential spectre issue
'nargs' [r] (local cap)
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 72c05f32f4a5055c9c8fe889bb6903ec959c0aad upstream.
ems_usb_probe() allocates memory for dev->tx_msg_buffer, but there
is no its deallocation in ems_usb_disconnect().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 71755ee5350b63fb1f283de8561cdb61b47f4d1d upstream.
The squashfs fragment reading code doesn't actually verify that the
fragment is inside the fragment table. The end result _is_ verified to
be inside the image when actually reading the fragment data, but before
that is done, we may end up taking a page fault because the fragment
table itself might not even exist.
Another report from Anatoly and his endless squashfs image fuzzing.
Reported-by: Анатолий Тросиненко <anatoly.trosinenko@gmail.com>
Acked-by:: Phillip Lougher <phillip.lougher@gmail.com>,
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d512584780d3e6a7cacb2f482834849453d444a1 upstream.
Anatoly reports another squashfs fuzzing issue, where the decompression
parameters themselves are in a compressed block.
This causes squashfs_read_data() to be called in order to read the
decompression options before the decompression stream having been set
up, making squashfs go sideways.
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Acked-by: Phillip Lougher <phillip.lougher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b7d0f08e9129c45ed41bc0cfa8e77067881e45fd ]
WoL won't work in PCI-based setups because we are not saving the PCI EP
state before entering suspend state and not allowing D3 wake.
Fix this by using a wrapper around stmmac_{suspend/resume} which
correctly sets the PCI EP state.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit bc5b6c0b62b932626a135f516a41838c510c6eba ]
'protocol' is a user-controlled value, so sanitize it after the bounds
check to avoid using it for speculative out-of-bounds access to arrays
indexed by it.
This addresses the following accesses detected with the help of smatch:
* net/netlink/af_netlink.c:654 __netlink_create() warn: potential
spectre issue 'nlk_cb_mutex_keys' [w]
* net/netlink/af_netlink.c:654 __netlink_create() warn: potential
spectre issue 'nlk_cb_mutex_key_strings' [w]
* net/netlink/af_netlink.c:685 netlink_create() warn: potential spectre
issue 'nl_table' [w] (local cap)
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a94c689e6c9e72e722f28339e12dff191ee5a265 ]
If a DSA slave network device was previously disabled, there is no need
to suspend or resume it.
Fixes: 2446254915a7 ("net: dsa: allow switch drivers to implement suspend/resume hooks")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|