Age | Commit message (Collapse) | Author |
|
From: Jann Horn <jannh@google.com>
[ Upstream commit 0c17d1d2c61936401f4702e1846e2c19b200f958 ]
Properly handle register truncation to a smaller size.
The old code first mirrors the clearing of the high 32 bits in the bitwise
tristate representation, which is correct. But then, it computes the new
arithmetic bounds as the intersection between the old arithmetic bounds and
the bounds resulting from the bitwise tristate representation. Therefore,
when coerce_reg_to_32() is called on a number with bounds
[0xffff'fff8, 0x1'0000'0007], the verifier computes
[0xffff'fff8, 0xffff'ffff] as bounds of the truncated number.
This is incorrect: The truncated number could also be in the range [0, 7],
and no meaningful arithmetic bounds can be computed in that case apart from
the obvious [0, 0xffff'ffff].
Starting with v4.14, this is exploitable by unprivileged users as long as
the unprivileged_bpf_disabled sysctl isn't set.
Debian assigned CVE-2017-16996 for this issue.
v2:
- flip the mask during arithmetic bounds calculation (Ben Hutchings)
v3:
- add CVE number (Ben Hutchings)
Fixes: b03c9f9fdc37 ("bpf/verifier: track signed and unsigned min/max values")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
From: Jann Horn <jannh@google.com>
[ Upstream commit 95a762e2c8c942780948091f8f2a4f32fce1ac6f ]
Distinguish between
BPF_ALU64|BPF_MOV|BPF_K (load 32-bit immediate, sign-extended to 64-bit)
and BPF_ALU|BPF_MOV|BPF_K (load 32-bit immediate, zero-padded to 64-bit);
only perform sign extension in the first case.
Starting with v4.14, this is exploitable by unprivileged users as long as
the unprivileged_bpf_disabled sysctl isn't set.
Debian assigned CVE-2017-16995 for this issue.
v3:
- add CVE number (Ben Hutchings)
Fixes: 484611357c19 ("bpf: allow access into map value arrays")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
From: Edward Cree <ecree@solarflare.com>
[ Upstream commit 4374f256ce8182019353c0c639bb8d0695b4c941 ]
Incorrect signed bounds were being computed.
If the old upper signed bound was positive and the old lower signed bound was
negative, this could cause the new upper signed bound to be too low,
leading to security issues.
Fixes: b03c9f9fdc37 ("bpf/verifier: track signed and unsigned min/max values")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
[jannh@google.com: changed description to reflect bug impact]
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 283ca526a9bd75aed7350220d7b1f8027d99c3fd ]
When tracing and networking programs are both attached in the
system and both use event-output helpers that eventually call
into perf_event_output(), then we could end up in a situation
where the tracing attached program runs in user context while
a cls_bpf program is triggered on that same CPU out of softirq
context.
Since both rely on the same per-cpu perf_sample_data, we could
potentially corrupt it. This can only ever happen in a combination
of the two types; all tracing programs use a bpf_prog_active
counter to bail out in case a program is already running on
that CPU out of a different context. XDP and cls_bpf programs
by themselves don't have this issue as they run in the same
context only. Therefore, split both perf_sample_data so they
cannot be accessed from each other.
Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data")
Reported-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Song Liu <songliubraving@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
From: Alexei Starovoitov <ast@fb.com>
[ Upstream commit c131187db2d3fa2f8bf32fdf4e9a4ef805168467 ]
when the verifier detects that register contains a runtime constant
and it's compared with another constant it will prune exploration
of the branch that is guaranteed not to be taken at runtime.
This is all correct, but malicious program may be constructed
in such a way that it always has a constant comparison and
the other branch is never taken under any conditions.
In this case such path through the program will not be explored
by the verifier. It won't be taken at run-time either, but since
all instructions are JITed the malicious program may cause JITs
to complain about using reserved fields, etc.
To fix the issue we have to track the instructions explored by
the verifier and sanitize instructions that are dead at run time
with NOPs. We cannot reject such dead code, since llvm generates
it for valid C code, since it doesn't do as much data flow
analysis as the verifier does.
Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a15f7fc20389a8827d5859907568b201234d4b79 ]
There are a small number of 'generic fields' (comm/COMM/cpu/CPU) that
are found by trace_find_event_field() but are only meant for
filtering. Specifically, they unlike normal fields, they have a size
of 0 and thus wreak havoc when used as a histogram key.
Exclude these (return -EINVAL) when used as histogram keys.
Link: http://lkml.kernel.org/r/956154cbc3e8a4f0633d619b886c97f0f0edf7b4.1506105045.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3382290ed2d5e275429cef510ab21889d3ccd164 upstream.
[ Note, this is a Git cherry-pick of the following commit:
506458efaf15 ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()")
... for easier x86 PTI code testing and back-porting. ]
READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it
can be used instead of lockless_dereference() without any change in
semantics.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 95b982b45122c57da2ee0b46cce70775e1d987af ]
Problem: This flag does not get cleared currently in the suspend or
resume path in the following cases:
* In case some driver's suspend routine returns an error.
* Successful s2idle case
* etc?
Why is this a problem: What happens is that the next suspend attempt
could fail even though the user did not enable the flag by writing to
/sys/power/wakeup_count. This is 1 use case how the issue can be seen
(but similar use case with driver suspend failure can be thought of):
1. Read /sys/power/wakeup_count
2. echo count > /sys/power/wakeup_count
3. echo freeze > /sys/power/wakeup_count
4. Let the system suspend, and wakeup the system using some wake source
that calls pm_wakeup_event() e.g. power button or something.
5. Note that the combined wakeup count would be incremented due
to the pm_wakeup_event() in the resume path.
6. After resuming the events_check_enabled flag is still set.
At this point if the user attempts to freeze again (without writing to
/sys/power/wakeup_count), the suspend would fail even though there has
been no wake event since the past resume.
Address that by clearing the flag just before a resume is completed,
so that it is always cleared for the corner cases mentioned above.
Signed-off-by: Rajat Jain <rajatja@google.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cef31d9af908243421258f1df35a4a644604efbe upstream.
timer_create() specifies via sigevent->sigev_notify the signal delivery for
the new timer. The valid modes are SIGEV_NONE, SIGEV_SIGNAL, SIGEV_THREAD
and (SIGEV_SIGNAL | SIGEV_THREAD_ID).
The sanity check in good_sigevent() is only checking the valid combination
for the SIGEV_THREAD_ID bit, i.e. SIGEV_SIGNAL, but if SIGEV_THREAD_ID is
not set it accepts any random value.
This has no real effects on the posix timer and signal delivery code, but
it affects show_timer() which handles the output of /proc/$PID/timers. That
function uses a string array to pretty print sigev_notify. The access to
that array has no bound checks, so random sigev_notify cause access beyond
the array bounds.
Add proper checks for the valid notify modes and remove the SIGEV_THREAD_ID
masking from various code pathes as SIGEV_NONE can never be set in
combination with SIGEV_THREAD_ID.
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f73c52a5bcd1710994e53fbccc378c42b97a06b6 upstream.
Daniel Wagner reported a crash on the BeagleBone Black SoC.
This is a single CPU architecture, and does not have a functional
arch_send_call_function_single_ipi() implementation which can crash
the kernel if that is called.
As it only has one CPU, it shouldn't be called, but if the kernel is
compiled for SMP, the push/pull RT scheduling logic now calls it for
irq_work if the one CPU is overloaded, it can use that function to call
itself and crash the kernel.
Ideally, we should disable the SCHED_FEAT(RT_PUSH_IPI) if the system
only has a single CPU. But SCHED_FEAT is a constant if sched debugging
is turned off. Another fix can also be used, and this should also help
with normal SMP machines. That is, do not initiate the pull code if
there's only one RT overloaded CPU, and that CPU happens to be the
current CPU that is scheduling in a lower priority task.
Even on a system with many CPUs, if there's many RT tasks waiting to
run on a single CPU, and that CPU schedules in another RT task of lower
priority, it will initiate the PULL logic in case there's a higher
priority RT task on another CPU that is waiting to run. But if there is
no other CPU with waiting RT tasks, it will initiate the RT pull logic
on itself (as it still has RT tasks waiting to run). This is a wasted
effort.
Not only does this help with SMP code where the current CPU is the only
one with RT overloaded tasks, it should also solve the issue that
Daniel encountered, because it will prevent the PULL logic from
executing, as there's only one CPU on the system, and the check added
here will cause it to exit the RT pull code.
Reported-by: Daniel Wagner <wagi@monom.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Fixes: 4bdced5c9 ("sched/rt: Simplify the IPI based RT balancing logic")
Link: http://lkml.kernel.org/r/20171202130454.4cbbfe8d@vmware.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 90e406f96f630c07d631a021fd4af10aac913e77 upstream.
The default NR_CPUS can be very large, but actual possible nr_cpu_ids
usually is very small. For my x86 distribution, the NR_CPUS is 8192 and
nr_cpu_ids is 4. About 2 pages are wasted.
Most machines don't have so many CPUs, so define a array with NR_CPUS
just wastes memory. So let's allocate the buffer dynamically when need.
With this change, the mutext tracing_cpumask_update_lock also can be
removed now, which was used to protect mask_str.
Link: http://lkml.kernel.org/r/1512013183-19107-1-git-send-email-changbin.du@intel.com
Fixes: 36dfe9252bd4c ("ftrace: make use of tracing_cpumask")
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bdcf0a423ea1c40bbb40e7ee483b50fc8aa3d758 upstream.
In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.
This patch:
- Make groups_sort globally visible.
- Move the call to groups_sort to the modifiers of group_info
- Remove the call to groups_sort from set_groups
Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 173743dd99a49c956b124a74c8aacb0384739a4c ]
Prior to this patch we enabled audit in audit_init(), which is too
late for PID 1 as the standard initcalls are run after the PID 1 task
is forked. This means that we never allocate an audit_context (see
audit_alloc()) for PID 1 and therefore miss a lot of audit events
generated by PID 1.
This patch enables audit as early as possible to help ensure that when
PID 1 is forked it can allocate an audit_context if required.
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 33e8a907804428109ce1d12301c3365d619cc4df ]
The API to end auditing has historically been for auditd to set the
pid to 0. This patch restores that functionality.
See: https://github.com/linux-audit/audit-kernel/issues/69
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 92ee46efeb505ead3ab06d3c5ce695637ed5f152 ]
Fengguang Wu reported that running the rcuperf test during boot can cause
the jump_label_test() to hit a WARN_ON(). The issue is that the core jump
label code relies on kernel_text_address() to detect when it can no longer
update branches that may be contained in __init sections. The
kernel_text_address() in turn assumes that if the system_state variable is
greter than or equal to SYSTEM_RUNNING then __init sections are no longer
valid (since the assumption is that they have been freed). However, when
rcuperf is setup to run in early boot it can call kernel_power_off() which
sets the system_state to SYSTEM_POWER_OFF.
Since rcuperf initialization is invoked via a module_init(), we can make
the dependency of jump_label_test() needing to complete before rcuperf
explicit by calling it via early_initcall().
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1510609727-2238-1-git-send-email-jbaron@akamai.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 89ad2fa3f043a1e8daae193bcb5fe34d5f8caf28 ]
pcpu_freelist_pop() needs the same lockdep awareness than
pcpu_freelist_populate() to avoid a false positive.
[ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
switchto-defaul/12508 [HC0[0]:SC0[6]:HE0:SE0] is trying to acquire:
(&htab->buckets[i].lock){......}, at: [<ffffffff9dc099cb>] __htab_percpu_map_update_elem+0x1cb/0x300
and this task is already holding:
(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}, at: [<ffffffff9e135848>] __dev_queue_xmit+0
x868/0x1240
which would create a new lock dependency:
(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...} -> (&htab->buckets[i].lock){......}
but this new dependency connects a SOFTIRQ-irq-safe lock:
(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}
... which became SOFTIRQ-irq-safe at:
[<ffffffff9db5931b>] __lock_acquire+0x42b/0x1f10
[<ffffffff9db5b32c>] lock_acquire+0xbc/0x1b0
[<ffffffff9da05e38>] _raw_spin_lock+0x38/0x50
[<ffffffff9e135848>] __dev_queue_xmit+0x868/0x1240
[<ffffffff9e136240>] dev_queue_xmit+0x10/0x20
[<ffffffff9e1965d9>] ip_finish_output2+0x439/0x590
[<ffffffff9e197410>] ip_finish_output+0x150/0x2f0
[<ffffffff9e19886d>] ip_output+0x7d/0x260
[<ffffffff9e19789e>] ip_local_out+0x5e/0xe0
[<ffffffff9e197b25>] ip_queue_xmit+0x205/0x620
[<ffffffff9e1b8398>] tcp_transmit_skb+0x5a8/0xcb0
[<ffffffff9e1ba152>] tcp_write_xmit+0x242/0x1070
[<ffffffff9e1baffc>] __tcp_push_pending_frames+0x3c/0xf0
[<ffffffff9e1b3472>] tcp_rcv_established+0x312/0x700
[<ffffffff9e1c1acc>] tcp_v4_do_rcv+0x11c/0x200
[<ffffffff9e1c3dc2>] tcp_v4_rcv+0xaa2/0xc30
[<ffffffff9e191107>] ip_local_deliver_finish+0xa7/0x240
[<ffffffff9e191a36>] ip_local_deliver+0x66/0x200
[<ffffffff9e19137d>] ip_rcv_finish+0xdd/0x560
[<ffffffff9e191e65>] ip_rcv+0x295/0x510
[<ffffffff9e12ff88>] __netif_receive_skb_core+0x988/0x1020
[<ffffffff9e130641>] __netif_receive_skb+0x21/0x70
[<ffffffff9e1306ff>] process_backlog+0x6f/0x230
[<ffffffff9e132129>] net_rx_action+0x229/0x420
[<ffffffff9da07ee8>] __do_softirq+0xd8/0x43d
[<ffffffff9e282bcc>] do_softirq_own_stack+0x1c/0x30
[<ffffffff9dafc2f5>] do_softirq+0x55/0x60
[<ffffffff9dafc3a8>] __local_bh_enable_ip+0xa8/0xb0
[<ffffffff9db4c727>] cpu_startup_entry+0x1c7/0x500
[<ffffffff9daab333>] start_secondary+0x113/0x140
to a SOFTIRQ-irq-unsafe lock:
(&head->lock){+.+...}
... which became SOFTIRQ-irq-unsafe at:
... [<ffffffff9db5971f>] __lock_acquire+0x82f/0x1f10
[<ffffffff9db5b32c>] lock_acquire+0xbc/0x1b0
[<ffffffff9da05e38>] _raw_spin_lock+0x38/0x50
[<ffffffff9dc0b7fa>] pcpu_freelist_pop+0x7a/0xb0
[<ffffffff9dc08b2c>] htab_map_alloc+0x50c/0x5f0
[<ffffffff9dc00dc5>] SyS_bpf+0x265/0x1200
[<ffffffff9e28195f>] entry_SYSCALL_64_fastpath+0x12/0x17
other info that might help us debug this:
Chain exists of:
dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2 --> &htab->buckets[i].lock --> &head->lock
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&head->lock);
local_irq_disable();
lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);
lock(&htab->buckets[i].lock);
<Interrupt>
lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);
*** DEADLOCK ***
Fixes: e19494edab82 ("bpf: introduce percpu_freelist")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 98159d977f71c3b3dee898d1c34e56f520b094e7 ]
Patch series "A few round_pipe_size() and pipe-max-size fixups", v3.
While backporting Michael's "pipe: fix limit handling" patchset to a
distro-kernel, Mikulas noticed that current upstream pipe limit handling
contains a few problems:
1 - procfs signed wrap: echo'ing a large number into
/proc/sys/fs/pipe-max-size and then cat'ing it back out shows a
negative value.
2 - round_pipe_size() nr_pages overflow on 32bit: this would
subsequently try roundup_pow_of_two(0), which is undefined.
3 - visible non-rounded pipe-max-size value: there is no mutual
exclusion or protection between the time pipe_max_size is assigned
a raw value from proc_dointvec_minmax() and when it is rounded.
4 - unsigned long -> unsigned int conversion makes for potential odd
return errors from do_proc_douintvec_minmax_conv() and
do_proc_dopipe_max_size_conv().
This version underwent the same testing as v1:
https://marc.info/?l=linux-kernel&m=150643571406022&w=2
This patch (of 4):
pipe_max_size is defined as an unsigned int:
unsigned int pipe_max_size = 1048576;
but its procfs/sysctl representation is an integer:
static struct ctl_table fs_table[] = {
...
{
.procname = "pipe-max-size",
.data = &pipe_max_size,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = &pipe_proc_fn,
.extra1 = &pipe_min_size,
},
...
that is signed:
int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
size_t *lenp, loff_t *ppos)
{
...
ret = proc_dointvec_minmax(table, write, buf, lenp, ppos)
This leads to signed results via procfs for large values of pipe_max_size:
% echo 2147483647 >/proc/sys/fs/pipe-max-size
% cat /proc/sys/fs/pipe-max-size
-2147483648
Use unsigned operations on this variable to avoid such negative values.
Link: http://lkml.kernel.org/r/1507658689-11669-2-git-send-email-joe.lawrence@redhat.com
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c07d35338081d107e57cf37572d8cc931a8e32e2 upstream.
kallsyms_symbol_next() returns a boolean (true on success). Currently
kdb_read() tests the return value with an inequality that
unconditionally evaluates to true.
This is fixed in the obvious way and, since the conditional branch is
supposed to be unreachable, we also add a WARN_ON().
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 46febd37f9c758b05cd25feae8512f22584742fe upstream.
Commit 31487f8328f2 ("smp/cfd: Convert core to hotplug state machine")
accidently put this step on the wrong place. The step should be at the
cpuhp_ap_states[] rather than the cpuhp_bp_states[].
grep smpcfd /sys/devices/system/cpu/hotplug/states
40: smpcfd:prepare
129: smpcfd:dying
"smpcfd:dying" was missing before.
So was the invocation of the function smpcfd_dying_cpu().
Fixes: 31487f8328f2 ("smp/cfd: Convert core to hotplug state machine")
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lkml.kernel.org/r/20171128131954.81229-1-jiangshanlai@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a30b85df7d599f626973e9cd3056fe755bd778e0 ]
We want to wait for all potentially preempted kprobes trampoline
execution to have completed. This guarantees that any freed
trampoline memory is not in use by any task in the system anymore.
synchronize_rcu_tasks() gives such a guarantee, so use it.
Also, this guarantees to wait for all potentially preempted tasks
on the instructions which will be replaced with a jump.
Since this becomes a problem only when CONFIG_PREEMPT=y, enable
CONFIG_TASKS_RCU=y for synchronize_rcu_tasks() in that case.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naveen N . Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150845661962.5443.17724352636247312231.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a9cd8194e1e6bd09619954721dfaf0f94fe2003e ]
Event timestamps are serialized using ctx->lock, make sure to hold it
over reading all values.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4f8413a3a799c958f7a10a6310a451e6b8aef5ad upstream.
When requesting a shared interrupt, we assume that the firmware
support code (DT or ACPI) has called irqd_set_trigger_type
already, so that we can retrieve it and check that the requester
is being reasonnable.
Unfortunately, we still have non-DT, non-ACPI systems around,
and these guys won't call irqd_set_trigger_type before requesting
the interrupt. The consequence is that we fail the request that
would have worked before.
We can either chase all these use cases (boring), or address it
in core code (easier). Let's have a per-irq_desc flag that
indicates whether irqd_set_trigger_type has been called, and
let's just check it when checking for a shared interrupt.
If it hasn't been set, just take whatever the interrupt
requester asks.
Fixes: 382bd4de6182 ("genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs")
Reported-and-tested-by: Petr Cvek <petrcvekcz@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4bdced5c9a2922521e325896a7bbbf0132c94e56 upstream.
When a CPU lowers its priority (schedules out a high priority task for a
lower priority one), a check is made to see if any other CPU has overloaded
RT tasks (more than one). It checks the rto_mask to determine this and if so
it will request to pull one of those tasks to itself if the non running RT
task is of higher priority than the new priority of the next task to run on
the current CPU.
When we deal with large number of CPUs, the original pull logic suffered
from large lock contention on a single CPU run queue, which caused a huge
latency across all CPUs. This was caused by only having one CPU having
overloaded RT tasks and a bunch of other CPUs lowering their priority. To
solve this issue, commit:
b6366f048e0c ("sched/rt: Use IPI to trigger RT task push migration instead of pulling")
changed the way to request a pull. Instead of grabbing the lock of the
overloaded CPU's runqueue, it simply sent an IPI to that CPU to do the work.
Although the IPI logic worked very well in removing the large latency build
up, it still could suffer from a large number of IPIs being sent to a single
CPU. On a 80 CPU box, I measured over 200us of processing IPIs. Worse yet,
when I tested this on a 120 CPU box, with a stress test that had lots of
RT tasks scheduling on all CPUs, it actually triggered the hard lockup
detector! One CPU had so many IPIs sent to it, and due to the restart
mechanism that is triggered when the source run queue has a priority status
change, the CPU spent minutes! processing the IPIs.
Thinking about this further, I realized there's no reason for each run queue
to send its own IPI. As all CPUs with overloaded tasks must be scanned
regardless if there's one or many CPUs lowering their priority, because
there's no current way to find the CPU with the highest priority task that
can schedule to one of these CPUs, there really only needs to be one IPI
being sent around at a time.
This greatly simplifies the code!
The new approach is to have each root domain have its own irq work, as the
rto_mask is per root domain. The root domain has the following fields
attached to it:
rto_push_work - the irq work to process each CPU set in rto_mask
rto_lock - the lock to protect some of the other rto fields
rto_loop_start - an atomic that keeps contention down on rto_lock
the first CPU scheduling in a lower priority task
is the one to kick off the process.
rto_loop_next - an atomic that gets incremented for each CPU that
schedules in a lower priority task.
rto_loop - a variable protected by rto_lock that is used to
compare against rto_loop_next
rto_cpu - The cpu to send the next IPI to, also protected by
the rto_lock.
When a CPU schedules in a lower priority task and wants to make sure
overloaded CPUs know about it. It increments the rto_loop_next. Then it
atomically sets rto_loop_start with a cmpxchg. If the old value is not "0",
then it is done, as another CPU is kicking off the IPI loop. If the old
value is "0", then it will take the rto_lock to synchronize with a possible
IPI being sent around to the overloaded CPUs.
If rto_cpu is greater than or equal to nr_cpu_ids, then there's either no
IPI being sent around, or one is about to finish. Then rto_cpu is set to the
first CPU in rto_mask and an IPI is sent to that CPU. If there's no CPUs set
in rto_mask, then there's nothing to be done.
When the CPU receives the IPI, it will first try to push any RT tasks that is
queued on the CPU but can't run because a higher priority RT task is
currently running on that CPU.
Then it takes the rto_lock and looks for the next CPU in the rto_mask. If it
finds one, it simply sends an IPI to that CPU and the process continues.
If there's no more CPUs in the rto_mask, then rto_loop is compared with
rto_loop_next. If they match, everything is done and the process is over. If
they do not match, then a CPU scheduled in a lower priority task as the IPI
was being passed around, and the process needs to start again. The first CPU
in rto_mask is sent the IPI.
This change removes this duplication of work in the IPI logic, and greatly
lowers the latency caused by the IPIs. This removed the lockup happening on
the 120 CPU machine. It also simplifies the code tremendously. What else
could anyone ask for?
Thanks to Peter Zijlstra for simplifying the rto_loop_start atomic logic and
supplying me with the rto_start_trylock() and rto_start_unlock() helper
functions.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Clark Williams <williams@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Wood <swood@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170424114732.1aac6dc4@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 upstream.
The current implementation of synchronize_sched_expedited() incorrectly
assumes that resched_cpu() is unconditional, which it is not. This means
that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
fails as follows (analysis by Neeraj Upadhyay):
o CPU1 is waiting for expedited wait to complete:
sync_rcu_exp_select_cpus
rdp->exp_dynticks_snap & 0x1 // returns 1 for CPU5
IPI sent to CPU5
synchronize_sched_expedited_wait
ret = swait_event_timeout(rsp->expedited_wq,
sync_rcu_preempt_exp_done(rnp_root),
jiffies_stall);
expmask = 0x20, CPU 5 in idle path (in cpuidle_enter())
o CPU5 handles IPI and fails to acquire rq lock.
Handles IPI
sync_sched_exp_handler
resched_cpu
returns while failing to try lock acquire rq->lock
need_resched is not set
o CPU5 calls rcu_idle_enter() and as need_resched is not set, goes to
idle (schedule() is not called).
o CPU 1 reports RCU stall.
Given that resched_cpu() is now used only by RCU, this commit fixes the
assumption by making resched_cpu() unconditional.
Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Suggested-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 07458f6a5171d97511dfbdf6ce549ed2ca0280c7 upstream.
'cached_raw_freq' is used to get the next frequency quickly but should
always be in sync with sg_policy->next_freq. There is a case where it is
not and in such cases it should be reset to avoid switching to incorrect
frequencies.
Consider this case for example:
- policy->cur is 1.2 GHz (Max)
- New request comes for 780 MHz and we store that in cached_raw_freq.
- Based on 780 MHz, we calculate the effective frequency as 800 MHz.
- We then see the CPU wasn't idle recently and choose to keep the next
freq as 1.2 GHz.
- Now we have cached_raw_freq is 780 MHz and sg_policy->next_freq is
1.2 GHz.
- Now if the utilization doesn't change in then next request, then the
next target frequency will still be 780 MHz and it will match with
cached_raw_freq. But we will choose 1.2 GHz instead of 800 MHz here.
Fixes: b7eaf1aab9f8 (cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely)
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 135bd1a230bb69a68c9808a7d25467318900b80a upstream.
The pending-callbacks check in rcu_prepare_for_idle() is backwards.
It should accelerate if there are pending callbacks, but the check
rather uselessly accelerates only if there are no callbacks. This commit
therefore inverts this check.
Fixes: 15fecf89e46a ("srcu: Abstract multi-tail callback list handling")
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull final power management fixes from Rafael Wysocki:
"These fix a regression in the schedutil cpufreq governor introduced by
a recent change and blacklist Dell XPS13 9360 from using the Low Power
S0 Idle _DSM interface which triggers serious problems on one of these
machines.
Specifics:
- Prevent the schedutil cpufreq governor from using the utilization
of a wrong CPU in some cases which started to happen after one of
the recent changes in it (Chris Redpath).
- Blacklist Dell XPS13 9360 from using the Low Power S0 Idle _DSM
interface as that causes serious issue (related to NVMe) to appear
on one of these machines, even though the other Dells XPS13 9360 in
somewhat different HW configurations behave correctly (Rafael
Wysocki)"
* tag 'pm-final-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / PM: Blacklist Low Power S0 Idle _DSM for Dell XPS13 9360
cpufreq: schedutil: Examine the correct CPU when we update util
|
|
* pm-cpufreq-sched:
cpufreq: schedutil: Examine the correct CPU when we update util
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
"Another fix for a really old bug.
It only affects drain_workqueue() which isn't used often and even then
triggers only during a pretty small race window, so it isn't too
surprising that it stayed hidden for so long.
The fix is straight-forward and low-risk. Kudos to Li Bin for
reporting and fixing the bug"
* 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Fix NULL pointer dereference
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Various fixes:
- synchronize kernel and tooling headers
- cgroup support fix
- two tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools/headers: Synchronize kernel ABI headers
perf/cgroup: Fix perf cgroup hierarchy support
perf tools: Unwind properly location after REJECT
perf symbols: Fix memory corruption because of zero length symbols
|
|
After commit 674e75411fc2 (sched: cpufreq: Allow remote cpufreq
callbacks) we stopped to always read the utilization for the CPU we
are running the governor on, and instead we read it for the CPU
which we've been told has updated utilization. This is stored in
sugov_cpu->cpu.
The value is set in sugov_register() but we clear it in sugov_start()
which leads to always looking at the utilization of CPU0 instead of
the correct one.
Fix this by consolidating the initialization code into sugov_start().
Fixes: 674e75411fc2 (sched: cpufreq: Allow remote cpufreq callbacks)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: Brendan Jackman <brendan.jackman@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
We want to fix an objtool build warning that got introduced in the latest upstream kernel.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull initial SPDX identifiers from Greg KH:
"License cleanup: add SPDX license identifiers to some files
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the
'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally
binding shorthand, which can be used instead of the full boiler plate
text.
This patch is based on work done by Thomas Gleixner and Kate Stewart
and Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset
of the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to
license had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied
to a file was done in a spreadsheet of side by side results from of
the output of two independent scanners (ScanCode & Windriver)
producing SPDX tag:value files created by Philippe Ombredanne.
Philippe prepared the base worksheet, and did an initial spot review
of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537
files assessed. Kate Stewart did a file by file comparison of the
scanner results in the spreadsheet to determine which SPDX license
identifier(s) to be applied to the file. She confirmed any
determination that was not immediately clear with lawyers working with
the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained
>5 lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that
was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that
became the concluded license(s).
- when there was disagreement between the two scanners (one detected
a license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply
(and which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases,
confirmation by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.
The Windriver scanner is based on an older version of FOSSology in
part, so they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot
checks in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect
the correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial
patch version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch
license was not GPL-2.0 WITH Linux-syscall-note to ensure that the
applied SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
License cleanup: add SPDX license identifier to uapi header files with a license
License cleanup: add SPDX license identifier to uapi header files with no license
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
|
|
In commit 30d6e0a4190d ("futex: Remove duplicated code and fix undefined
behaviour"), I let FUTEX_WAKE_OP to fail on invalid op. Namely when op
should be considered as shift and the shift is out of range (< 0 or > 31).
But strace's test suite does this madness:
futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xa0caffee);
futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xbadfaced);
futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xffffffff);
When I pick the first 0xa0caffee, it decodes as:
0x80000000 & 0xa0caffee: oparg is shift
0x70000000 & 0xa0caffee: op is FUTEX_OP_OR
0x0f000000 & 0xa0caffee: cmp is FUTEX_OP_CMP_EQ
0x00fff000 & 0xa0caffee: oparg is sign-extended 0xcaf = -849
0x00000fff & 0xa0caffee: cmparg is sign-extended 0xfee = -18
That means the op tries to do this:
(futex |= (1 << (-849))) == -18
which is completely bogus. The new check of op in the code is:
if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) {
if (oparg < 0 || oparg > 31)
return -EINVAL;
oparg = 1 << oparg;
}
which results obviously in the "Invalid argument" errno:
FAIL: futex
===========
futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xa0caffee) = -1: Invalid argument
futex.test: failed test: ../futex failed with code 1
So let us soften the failure to print only a (ratelimited) message, crop
the value and continue as if it were right. When userspace keeps up, we
can switch this to return -EINVAL again.
[v2] Do not return 0 immediatelly, proceed with the cropped value.
Fixes: 30d6e0a4190d ("futex: Remove duplicated code and fix undefined behaviour")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <dvhart@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull signal bugfix from Eric Biederman:
"When making the generic support for SIGEMT conditional on the presence
of SIGEMT I made a typo that causes it to fail to activate. It was
noticed comparatively quickly but the bug report just made it to me
today"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal: Fix name of SIGEMT in #if defined() check
|
|
Commit cc731525f26a ("signal: Remove kernel interal si_code magic")
added a check for SIGMET and NSIGEMT being defined. That SIGMET should
in fact be SIGEMT, with SIGEMT being defined in
arch/{alpha,mips,sparc}/include/uapi/asm/signal.h
This was actually pointed out by BenHutchings in a lwn.net comment
here https://lwn.net/Comments/734608/
Fixes: cc731525f26a ("signal: Remove kernel interal si_code magic")
Signed-off-by: Andrew Clayton <andrew@digital-domain.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Guenter reported:
There is still a problem. When running
echo 6 > /proc/sys/kernel/watchdog_thresh
echo 5 > /proc/sys/kernel/watchdog_thresh
repeatedly, the message
NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
stops after a while (after ~10-30 iterations, with fluctuations).
Maybe watchdog_cpus needs to be atomic ?
That's correct as this again is affected by the asynchronous nature of the
smpboot thread unpark mechanism.
CPU 0 CPU1 CPU2
write(watchdog_thresh, 6)
stop()
park()
update()
start()
unpark()
thread->unpark()
cnt++;
write(watchdog_thresh, 5) thread->unpark()
stop()
park() thread->park()
cnt--; cnt++;
update()
start()
unpark()
That's not a functional problem, it just affects the informational message.
Convert watchdog_cpus to atomic_t to prevent the problem
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20171101181126.j727fqjmdthjz4xk@redhat.com
|
|
Simplify deferred event destroy")
Guenter reported a crash in the watchdog/perf code, which is caused by
cleanup() and enable() running concurrently. The reason for this is:
The watchdog functions are serialized via the watchdog_mutex and cpu
hotplug locking, but the enable of the perf based watchdog happens in
context of the unpark callback of the smpboot thread. But that unpark
function is not synchronous inside the locking. The unparking of the thread
just wakes it up and leaves so there is no guarantee when the thread is
executing.
If it starts running _before_ the cleanup happened then it will create a
event and overwrite the dead event pointer. The new event is then cleaned
up because the event is marked dead.
lock(watchdog_mutex);
lockup_detector_reconfigure();
cpus_read_lock();
stop();
park()
update();
start();
unpark()
cpus_read_unlock(); thread runs()
overwrite dead event ptr
cleanup();
free new event, which is active inside perf....
unlock(watchdog_mutex);
The park side is safe as that actually waits for the thread to reach
parked state.
Commit a33d44843d45 removed the protection against this kind of scenario
under the stupid assumption that the hotplug serialization and the
watchdog_mutex cover everything.
Bring it back.
Reverts: a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Feels-stupid Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710312145190.1942@nanos
|
|
Dmitry (through syzbot) reported being able to trigger the WARN in
get_pi_state() and a use-after-free on:
raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
Both are due to this race:
exit_pi_state_list() put_pi_state()
lock(&curr->pi_lock)
while() {
pi_state = list_first_entry(head);
hb = hash_futex(&pi_state->key);
unlock(&curr->pi_lock);
dec_and_test(&pi_state->refcount);
lock(&hb->lock)
lock(&pi_state->pi_mutex.wait_lock) // uaf if pi_state free'd
lock(&curr->pi_lock);
....
unlock(&curr->pi_lock);
get_pi_state(); // WARN; refcount==0
The problem is we take the reference count too late, and don't allow it
being 0. Fix it by using inc_not_zero() and simply retrying the loop
when we fail to get a refcount. In that case put_pi_state() should
remove the entry from the list.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Gratian Crisan <gratian.crisan@ni.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: dvhart@infradead.org
Cc: syzbot <bot+2af19c9e1ffe4d4ee1d16c56ae7580feaee75765@syzkaller.appspotmail.com>
Cc: syzkaller-bugs@googlegroups.com
Cc: <stable@vger.kernel.org>
Fixes: c74aef2d06a9 ("futex: Fix pi_state->owner serialization")
Link: http://lkml.kernel.org/r/20171031101853.xpfh72y643kdfhjs@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Now that SK_REDIRECT is no longer a valid return code. Remove it
from the UAPI completely. Then do a namespace remapping internal
to sockmap so SK_REDIRECT is no longer externally visible.
Patchs primary change is to do a namechange from SK_REDIRECT to
__SK_REDIRECT
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When queue_work() is used in irq (not in task context), there is
a potential case that trigger NULL pointer dereference.
----------------------------------------------------------------
worker_thread()
|-spin_lock_irq()
|-process_one_work()
|-worker->current_pwq = pwq
|-spin_unlock_irq()
|-worker->current_func(work)
|-spin_lock_irq()
|-worker->current_pwq = NULL
|-spin_unlock_irq()
//interrupt here
|-irq_handler
|-__queue_work()
//assuming that the wq is draining
|-is_chained_work(wq)
|-current_wq_worker()
//Here, 'current' is the interrupted worker!
|-current->current_pwq is NULL here!
|-schedule()
----------------------------------------------------------------
Avoid it by checking for task context in current_wq_worker(), and
if not in task context, we shouldn't use the 'current' to check the
condition.
Reported-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Li Bin <huawei.libin@huawei.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 8d03ecfe4718 ("workqueue: reimplement is_chained_work() using current_wq_worker()")
Cc: stable@vger.kernel.org # v3.9+
|
|
The following commit:
864c2357ca89 ("perf/core: Do not set cpuctx->cgrp for unscheduled cgroups")
made list_update_cgroup_event() skip setting cpuctx->cgrp if no cgroup event
targets %current's cgroup.
This breaks perf_event's hierarchical support because events which target one
of the ancestors get ignored.
Fix it by using cgroup_is_descendant() test instead of equality.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Cc: stable@vger.kernel.org # v4.9+
Fixes: 864c2357ca89 ("perf/core: Do not set cpuctx->cgrp for unscheduled cgroups")
Link: http://lkml.kernel.org/r/20171028164237.GA972780@devbig577.frc2.facebook.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pull networking fixes from David Miller:
1) Fix route leak in xfrm_bundle_create().
2) In mac80211, validate user rate mask before configuring it. From
Johannes Berg.
3) Properly enforce memory limits in fair queueing code, from Toke
Hoiland-Jorgensen.
4) Fix lockdep splat in inet_csk_route_req(), from Eric Dumazet.
5) Fix TSO header allocation and management in mvpp2 driver, from Yan
Markman.
6) Don't take socket lock in BH handler in strparser code, from Tom
Herbert.
7) Don't show sockets from other namespaces in AF_UNIX code, from
Andrei Vagin.
8) Fix double free in error path of tap_open(), from Girish Moodalbail.
9) Fix TX map failure path in igb and ixgbe, from Jean-Philippe Brucker
and Alexander Duyck.
10) Fix DCB mode programming in stmmac driver, from Jose Abreu.
11) Fix err_count handling in various tunnels (ipip, ip6_gre). From Xin
Long.
12) Properly align SKB head before building SKB in tuntap, from Jason
Wang.
13) Avoid matching qdiscs with a zero handle during lookups, from Cong
Wang.
14) Fix various endianness bugs in sctp, from Xin Long.
15) Fix tc filter callback races and add selftests which trigger the
problem, from Cong Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
selftests: Introduce a new test case to tc testsuite
selftests: Introduce a new script to generate tc batch file
net_sched: fix call_rcu() race on act_sample module removal
net_sched: add rtnl assertion to tcf_exts_destroy()
net_sched: use tcf_queue_work() in tcindex filter
net_sched: use tcf_queue_work() in rsvp filter
net_sched: use tcf_queue_work() in route filter
net_sched: use tcf_queue_work() in u32 filter
net_sched: use tcf_queue_work() in matchall filter
net_sched: use tcf_queue_work() in fw filter
net_sched: use tcf_queue_work() in flower filter
net_sched: use tcf_queue_work() in flow filter
net_sched: use tcf_queue_work() in cgroup filter
net_sched: use tcf_queue_work() in bpf filter
net_sched: use tcf_queue_work() in basic filter
net_sched: introduce a workqueue for RCU callbacks of tc filter
sctp: fix some type cast warnings introduced since very beginning
sctp: fix a type cast warnings that causes a_rwnd gets the wrong value
sctp: fix some type cast warnings introduced by transport rhashtable
sctp: fix some type cast warnings introduced by stream reconf
...
|
|
Recent additions to support multiple programs in cgroups impose
a strict requirement, "all yes is yes, any no is no". To enforce
this the infrastructure requires the 'no' return code, SK_DROP in
this case, to be 0.
To apply these rules to SK_SKB program types the sk_actions return
codes need to be adjusted.
This fix adds SK_PASS and makes 'SK_DROP = 0'. Finally, remove
SK_ABORTED to remove any chance that the API may allow aborted
program flows to be passed up the stack. This would be incorrect
behavior and allow programs to break existing policies.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SK_SKB program types use bpf_compute_data to store the end of the
packet data. However, bpf_compute_data assumes the cb is stored in the
qdisc layer format. But, for SK_SKB this is the wrong layer of the
stack for this type.
It happens to work (sort of!) because in most cases nothing happens
to be overwritten today. This is very fragile and error prone.
Fortunately, we have another hole in tcp_skb_cb we can use so lets
put the data_end value there.
Note, SK_SKB program types do not use data_meta, they are failed by
sk_skb_is_valid_access().
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
"This is a fix for an old bug in workqueue. Workqueue used a mutex to
arbitrate who gets to be the manager of a pool. When the manager role
gets released, the mutex gets unlocked while holding the pool's
irqsafe spinlock. This can lead to deadlocks as mutex's internal
spinlock isn't irqsafe. This got discovered by recent fixes to mutex
lockdep annotations.
The fix is a bit invasive for rc6 but if anything were wrong with the
fix it would likely have already blown up in -next, and we want the
fix in -stable anyway"
* 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: replace pool->manager_arb mutex with a flag
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp/hotplug fix from Thomas Gleixner:
"The recent rework of the callback invocation missed to cleanup the
leftovers of the operation, so under certain circumstances a
subsequent CPU hotplug operation accesses stale data and crashes.
Clean it up."
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Reset node state after operation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of small fixes mostly in the irq drivers area:
- Make the tango irq chip work correctly, which requires a new
function in the generiq irq chip implementation
- A set of updates to the GIC-V3 ITS driver removing a bogus BUG_ON()
and parsing the VCPU table size correctly"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: generic chip: remove irq_gc_mask_disable_reg_and_ack()
irqchip/tango: Use irq_gc_mask_disable_and_ack_set
genirq: generic chip: Add irq_gc_mask_disable_and_ack_set()
irqchip/gic-v3-its: Add missing changes to support 52bit physical address
irqchip/gic-v3-its: Fix the incorrect parsing of VCPU table size
irqchip/gic-v3-its: Fix the incorrect BUG_ON in its_init_vpe_domain()
DT: arm,gic-v3: Update the ITS size in the examples
|