summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-03-16perf/core: Better explain the inherit magicPeter Zijlstra
While going through the event inheritance code Oleg got confused. Add some comments to better explain the silent dissapearance of orphaned events. So what happens is that at perf_event_release_kernel() time; when an event looses its connection to userspace (and ceases to exist from the user's perspective) we can still have an arbitrary amount of inherited copies of the event. We want to synchronously find and remove all these child events. Since that requires a bit of lock juggling, there is the possibility that concurrent clone()s will create new child events. Therefore we first mark the parent event as DEAD, which marks all the extant child events as orphaned. We then avoid copying orphaned events; in order to avoid getting more of them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fweisbec@gmail.com Link: http://lkml.kernel.org/r/20170316125823.289567442@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16perf/core: Simplify perf_event_free_task()Peter Zijlstra
We have ctx->event_list that contains all events; no need to repeatedly iterate the group lists to find them all. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fweisbec@gmail.com Link: http://lkml.kernel.org/r/20170316125823.239678244@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16perf/core: Fix event inheritance on fork()Peter Zijlstra
While hunting for clues to a use-after-free, Oleg spotted that perf_event_init_context() can loose an error value with the result that fork() can succeed even though we did not fully inherit the perf event context. Spotted-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: oleg@redhat.com Cc: stable@vger.kernel.org Fixes: 889ff0150661 ("perf/core: Split context's event group list into pinned and non-pinned lists") Link: http://lkml.kernel.org/r/20170316125823.190342547@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16perf/core: Fix use-after-free in perf_release()Peter Zijlstra
Dmitry reported syzcaller tripped a use-after-free in perf_release(). After much puzzlement Oleg spotted the below scenario: Task1 Task2 fork() perf_event_init_task() /* ... */ goto bad_fork_$foo; /* ... */ perf_event_free_task() mutex_lock(ctx->lock) perf_free_event(B) perf_event_release_kernel(A) mutex_lock(A->child_mutex) list_for_each_entry(child, ...) { /* child == B */ ctx = B->ctx; get_ctx(ctx); mutex_unlock(A->child_mutex); mutex_lock(A->child_mutex) list_del_init(B->child_list) mutex_unlock(A->child_mutex) /* ... */ mutex_unlock(ctx->lock); put_ctx() /* >0 */ free_task(); mutex_lock(ctx->lock); mutex_lock(A->child_mutex); /* ... */ mutex_unlock(A->child_mutex); mutex_unlock(ctx->lock) put_ctx() /* 0 */ ctx->task && !TOMBSTONE put_task_struct() /* UAF */ This patch closes the hole by making perf_event_free_task() destroy the task <-> ctx relation such that perf_event_release_kernel() will no longer observe the now dead task. Spotted-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fweisbec@gmail.com Cc: oleg@redhat.com Cc: stable@vger.kernel.org Fixes: c6e5b73242d2 ("perf: Synchronously clean up child events") Link: http://lkml.kernel.org/r/20170314155949.GE32474@worktop Link: http://lkml.kernel.org/r/20170316125823.140295131@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16powerpc: Wire up statx() syscallChandan Rajendra
Test runs on a ppc64 BE guest succeeded. linux/samples/statx/test-statx program was executed on the following file types, 1. Regular file 2. Directory 3. device file 4. symlink 5. Named pipe The test run also included invoking test-statx with the runtime options provided in the main() function of test-statx.c Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-16sched/deadline: Use deadline instead of period when calculating overflowSteven Rostedt (VMware)
I was testing Daniel's changes with his test case, and tweaked it a little. Instead of having the runtime equal to the deadline, I increased the deadline ten fold. Daniel's test case had: attr.sched_runtime = 2 * 1000 * 1000; /* 2 ms */ attr.sched_deadline = 2 * 1000 * 1000; /* 2 ms */ attr.sched_period = 2 * 1000 * 1000 * 1000; /* 2 s */ To make it more interesting, I changed it to: attr.sched_runtime = 2 * 1000 * 1000; /* 2 ms */ attr.sched_deadline = 20 * 1000 * 1000; /* 20 ms */ attr.sched_period = 2 * 1000 * 1000 * 1000; /* 2 s */ The results were rather surprising. The behavior that Daniel's patch was fixing came back. The task started using much more than .1% of the CPU. More like 20%. Looking into this I found that it was due to the dl_entity_overflow() constantly returning true. That's because it uses the relative period against relative runtime vs the absolute deadline against absolute runtime. runtime / (deadline - t) > dl_runtime / dl_period There's even a comment mentioning this, and saying that when relative deadline equals relative period, that the equation is the same as using deadline instead of period. That comment is backwards! What we really want is: runtime / (deadline - t) > dl_runtime / dl_deadline We care about if the runtime can make its deadline, not its period. And then we can say "when the deadline equals the period, the equation is the same as using dl_period instead of dl_deadline". After correcting this, now when the task gets enqueued, it can throttle correctly, and Daniel's fix to the throttling of sleeping deadline tasks works even when the runtime and deadline are not the same. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luca Abeni <luca.abeni@santannapisa.it> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it> Link: http://lkml.kernel.org/r/02135a27f1ae3fe5fd032568a5a2f370e190e8d7.1488392936.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16sched/deadline: Throttle a constrained deadline task activated after the ↵Daniel Bristot de Oliveira
deadline During the activation, CBS checks if it can reuse the current task's runtime and period. If the deadline of the task is in the past, CBS cannot use the runtime, and so it replenishes the task. This rule works fine for implicit deadline tasks (deadline == period), and the CBS was designed for implicit deadline tasks. However, a task with constrained deadline (deadine < period) might be awakened after the deadline, but before the next period. In this case, replenishing the task would allow it to run for runtime / deadline. As in this case deadline < period, CBS enables a task to run for more than the runtime / period. In a very loaded system, this can cause a domino effect, making other tasks miss their deadlines. To avoid this problem, in the activation of a constrained deadline task after the deadline but before the next period, throttle the task and set the replenishing timer to the begin of the next period, unless it is boosted. Reproducer: --------------- %< --------------- int main (int argc, char **argv) { int ret; int flags = 0; unsigned long l = 0; struct timespec ts; struct sched_attr attr; memset(&attr, 0, sizeof(attr)); attr.size = sizeof(attr); attr.sched_policy = SCHED_DEADLINE; attr.sched_runtime = 2 * 1000 * 1000; /* 2 ms */ attr.sched_deadline = 2 * 1000 * 1000; /* 2 ms */ attr.sched_period = 2 * 1000 * 1000 * 1000; /* 2 s */ ts.tv_sec = 0; ts.tv_nsec = 2000 * 1000; /* 2 ms */ ret = sched_setattr(0, &attr, flags); if (ret < 0) { perror("sched_setattr"); exit(-1); } for(;;) { /* XXX: you may need to adjust the loop */ for (l = 0; l < 150000; l++); /* * The ideia is to go to sleep right before the deadline * and then wake up before the next period to receive * a new replenishment. */ nanosleep(&ts, NULL); } exit(0); } --------------- >% --------------- On my box, this reproducer uses almost 50% of the CPU time, which is obviously wrong for a task with 2/2000 reservation. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luca Abeni <luca.abeni@santannapisa.it> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it> Link: http://lkml.kernel.org/r/edf58354e01db46bf42df8d2dd32418833f68c89.1488392936.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16sched/deadline: Make sure the replenishment timer fires in the next periodDaniel Bristot de Oliveira
Currently, the replenishment timer is set to fire at the deadline of a task. Although that works for implicit deadline tasks because the deadline is equals to the begin of the next period, that is not correct for constrained deadline tasks (deadline < period). For instance: f.c: --------------- %< --------------- int main (void) { for(;;); } --------------- >% --------------- # gcc -o f f.c # trace-cmd record -e sched:sched_switch \ -e syscalls:sys_exit_sched_setattr \ chrt -d --sched-runtime 490000000 \ --sched-deadline 500000000 \ --sched-period 1000000000 0 ./f # trace-cmd report | grep "{pid of ./f}" After setting parameters, the task is replenished and continue running until being throttled: f-11295 [003] 13322.113776: sys_exit_sched_setattr: 0x0 The task is throttled after running 492318 ms, as expected: f-11295 [003] 13322.606094: sched_switch: f:11295 [-1] R ==> watchdog/3:32 [0] But then, the task is replenished 500719 ms after the first replenishment: <idle>-0 [003] 13322.614495: sched_switch: swapper/3:0 [120] R ==> f:11295 [-1] Running for 490277 ms: f-11295 [003] 13323.104772: sched_switch: f:11295 [-1] R ==> swapper/3:0 [120] Hence, in the first period, the task runs 2 * runtime, and that is a bug. During the first replenishment, the next deadline is set one period away. So the runtime / period starts to be respected. However, as the second replenishment took place in the wrong instant, the next replenishment will also be held in a wrong instant of time. Rather than occurring in the nth period away from the first activation, it is taking place in the (nth period - relative deadline). Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Luca Abeni <luca.abeni@santannapisa.it> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it> Link: http://lkml.kernel.org/r/ac50d89887c25285b47465638354b63362f8adff.1488392936.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16locking/rwsem: Fix down_write_killable() for CONFIG_RWSEM_GENERIC_SPINLOCK=yNiklas Cassel
We hang if SIGKILL has been sent, but the task is stuck in down_read() (after do_exit()), even though no task is doing down_write() on the rwsem in question: INFO: task libupnp:21868 blocked for more than 120 seconds. libupnp D 0 21868 1 0x08100008 ... Call Trace: __schedule() schedule() __down_read() do_exit() do_group_exit() __wake_up_parent() This bug has already been fixed for CONFIG_RWSEM_XCHGADD_ALGORITHM=y in the following commit: 04cafed7fc19 ("locking/rwsem: Fix down_write_killable()") ... however, this bug also exists for CONFIG_RWSEM_GENERIC_SPINLOCK=y. Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <mhocko@suse.com> Cc: <stable@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Niklas Cassel <niklass@axis.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: d47996082f52 ("locking/rwsem: Introduce basis for down_write_killable()") Link: http://lkml.kernel.org/r/1487981873-12649-1-git-send-email-niklass@axis.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16sched/loadavg: Use {READ,WRITE}_ONCE() for sample windowMatt Fleming
'calc_load_update' is accessed without any kind of locking and there's a clear assumption in the code that only a single value is read or written. Make this explicit by using READ_ONCE() and WRITE_ONCE(), and avoid unintentionally seeing multiple values, or having the load/stores split. Technically the loads in calc_global_*() don't require this since those are the only functions that update 'calc_load_update', but I've added the READ_ONCE() for consistency. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: http://lkml.kernel.org/r/20170217120731.11868-3-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accountingMatt Fleming
If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to the pending sample window time on exit, setting the next update not one window into the future, but two. This situation on exiting NO_HZ is described by: this_rq->calc_load_update < jiffies < calc_load_update In this scenario, what we should be doing is: this_rq->calc_load_update = calc_load_update [ next window ] But what we actually do is: this_rq->calc_load_update = calc_load_update + LOAD_FREQ [ next+1 window ] This has the effect of delaying load average updates for potentially up to ~9seconds. This can result in huge spikes in the load average values due to per-cpu uninterruptible task counts being out of sync when accumulated across all CPUs. It's safe to update the per-cpu active count if we wake between sample windows because any load that we left in 'calc_load_idle' will have been zero'd when the idle load was folded in calc_global_load(). This issue is easy to reproduce before, commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking") just by forking short-lived process pipelines built from ps(1) and grep(1) in a loop. I'm unable to reproduce the spikes after that commit, but the bug still seems to be present from code review. Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again") Link: http://lkml.kernel.org/r/20170217120731.11868-2-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16sched/deadline: Add missing update_rq_clock() in dl_task_timer()Wanpeng Li
The following warning can be triggered by hot-unplugging the CPU on which an active SCHED_DEADLINE task is running on: ------------[ cut here ]------------ WARNING: CPU: 7 PID: 0 at kernel/sched/sched.h:833 replenish_dl_entity+0x71e/0xc40 rq->clock_update_flags < RQCF_ACT_SKIP CPU: 7 PID: 0 Comm: swapper/7 Tainted: G B 4.11.0-rc1+ #24 Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016 Call Trace: <IRQ> dump_stack+0x85/0xc4 __warn+0x172/0x1b0 warn_slowpath_fmt+0xb4/0xf0 ? __warn+0x1b0/0x1b0 ? debug_check_no_locks_freed+0x2c0/0x2c0 ? cpudl_set+0x3d/0x2b0 replenish_dl_entity+0x71e/0xc40 enqueue_task_dl+0x2ea/0x12e0 ? dl_task_timer+0x777/0x990 ? __hrtimer_run_queues+0x270/0xa50 dl_task_timer+0x316/0x990 ? enqueue_task_dl+0x12e0/0x12e0 ? enqueue_task_dl+0x12e0/0x12e0 __hrtimer_run_queues+0x270/0xa50 ? hrtimer_cancel+0x20/0x20 ? hrtimer_interrupt+0x119/0x600 hrtimer_interrupt+0x19c/0x600 ? trace_hardirqs_off+0xd/0x10 local_apic_timer_interrupt+0x74/0xe0 smp_apic_timer_interrupt+0x76/0xa0 apic_timer_interrupt+0x93/0xa0 The DL task will be migrated to a suitable later deadline rq once the DL timer fires and currnet rq is offline. The rq clock of the new rq should be updated. This patch fixes it by updating the rq clock after holding the new rq's rq lock. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1488865888-15894-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16x86/mpx: Make unnecessarily global function staticTobias Klauser
Make the function get_user_bd_entry() static as it is not used outside of arch/x86/mm/mpx.c This fixes a sparse warning. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16Merge branch 'drm-fixes-4.11' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes A few amd fixes. * 'drm-fixes-4.11' of git://people.freedesktop.org/~agd5f/linux: drm/amd/amdgpu: Fix debugfs reg read/write address width drm/amdgpu/si: add dpm quirk for Oland drm/radeon/si: add dpm quirk for Oland drm: amd: remove broken include path drm/amd/powerplay: fix copy error in smu7_clockpoweragting.c drm/amdgpu: fix parser init error path to avoid crash in parser fini drm/amd/amdgpu: Disable GFX_PG on Carrizo until compute issues solved
2017-03-15Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Four small fixes for this cycle: - followup fix from Neil for a fix that went in before -rc2, ensuring that we always see the full per-task bio_list. - fix for blk-mq-sched from me that ensures that we retain similar direct-to-issue behavior on running the queue. - fix from Sagi fixing a potential NULL pointer dereference in blk-mq on spurious CPU unplug. - a memory leak fix in writeback from Tahsin, fixing a case where device removal of a mounted device can leak a struct wb_writeback_work" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq-sched: don't run the queue async from blk_mq_try_issue_directly() writeback: fix memory leak in wb_queue_work() blk-mq: Fix tagset reinit in the presence of cpu hot-unplug blk: Ensure users for current->bio_list can see the full list.
2017-03-16cpufreq: Fix and clean up show_cpuinfo_cur_freq()Rafael J. Wysocki
There is a missing newline in show_cpuinfo_cur_freq(), so add it, but while at it clean that function up somewhat too. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: All applicable <stable@vger.kernel.org>
2017-03-15parisc: Avoid compiler warnings with access_ok()Helge Deller
Commit 09b871ffd4d8 (parisc: Define access_ok() as macro) missed to mark uaddr as used, which then gives compiler warnings about unused variables. Fix it by comparing uaddr to uaddr which then gets optimized away by the compiler. Signed-off-by: Helge Deller <deller@gmx.de> Fixes: 09b871ffd4d8 ("parisc: Define access_ok() as macro")
2017-03-15drm/amd/amdgpu: Fix debugfs reg read/write address widthTom St Denis
The MMIO space is wider now so we mask the lower 22 bits instead of 18. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-03-15drm/amdgpu/si: add dpm quirk for OlandAlex Deucher
OLAND 0x1002:0x6604 0x1028:0x066F 0x00 seems to have problems with higher sclks. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2017-03-15drm/radeon/si: add dpm quirk for OlandAlex Deucher
OLAND 0x1002:0x6604 0x1028:0x066F 0x00 seems to have problems with higher sclks. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2017-03-15parisc: Wire up statx system callHelge Deller
Signed-off-by: Helge Deller <deller@gmx.de>
2017-03-15parisc: Optimize flush_kernel_vmap_range and invalidate_kernel_vmap_rangeJohn David Anglin
The previously submitted patch did not resolve the random segmentation faults observed on the phantom buildd system. There are still unresolved problems with the Debian 4.8 and 4.9 kernels on C8000. The attached patch removes the flush of the offset map pages and does a whole data cache flush for large ranges. No other arch flushes the offset map in these routines as far as I can tell. I have not observed any random segmentation faults on rp3440 in two weeks of testing with 4.10.0 and 4.10.1. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Helge Deller <deller@gmx.de>
2017-03-15parisc: support R_PARISC_SECREL32 relocation in modulesMikulas Patocka
The parisc kernel doesn't work with CONFIG_MODVERSIONS since the commit 71810db27c1c853b335675bee335d893bc3d324b. It can't load modules with the error: "module unix: Unknown relocation: 41". The commit changes __kcrctab from 64-bit valus to 32-bit values. The assembler generates R_PARISC_SECREL32 secrel relocation for them and the module loader doesn't support this relocation. This patch adds the R_PARISC_SECREL32 relocation to the module loader. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org # v4.10+ Signed-off-by: Helge Deller <deller@gmx.de>
2017-03-15Merge remote-tracking branch 'mkp-scsi/4.11/scsi-fixes' into fixesJames Bottomley
2017-03-15Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is a rather large set of fixes. The bulk are for lpfc correcting a lot of issues in the new NVME driver code which just went in in the merge window. The others are: - fix a hang in the vmware paravirt driver caused by incorrect handling of the new MSI vector allocation - long standing bug in storvsc, which recent block changes turned from being a harmless annoyance into a hang - yet more fallout (in mpt3sas) from the changes to device blocking The remainder are small fixes and updates" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (34 commits) scsi: lpfc: Add shutdown method for kexec scsi: storvsc: Workaround for virtual DVD SCSI version scsi: lpfc: revise version number to 11.2.0.10 scsi: lpfc: code cleanups in NVME initiator discovery scsi: lpfc: code cleanups in NVME initiator base scsi: lpfc: correct rdp diag portnames scsi: lpfc: remove dead sli3 nvme code scsi: lpfc: correct double print scsi: lpfc: Rename LPFC_MAX_EQ_DELAY to LPFC_MAX_EQ_DELAY_EQID_CNT scsi: lpfc: Rework lpfc Kconfig for NVME options scsi: lpfc: add transport eh_timed_out reference scsi: lpfc: Fix eh_deadline setting for sli3 adapters. scsi: lpfc: add NVME exchange aborts scsi: lpfc: Fix nvme allocation bug on failed nvme_fc_register_localport scsi: lpfc: Fix IO submission if WQ is full scsi: lpfc: Fix NVME CMD IU byte swapped word 1 problem scsi: lpfc: Fix RCTL value on NVME LS request and response scsi: lpfc: Fix crash during Hardware error recovery on SLI3 adapters scsi: lpfc: fix missing spin_unlock on sql_list_lock scsi: lpfc: don't dereference dma_buf->iocbq before null check ...
2017-03-15scsi: lpfc: Finalize Kconfig options for nvmeJames Smart
Reviewing the result of what was just added for Kconfig, we made a poor choice. It worked well for full kernel builds, but not so much for how it would be deployed on a distro. Here's the final result: - lpfc will compile in NVME initiator and/or NVME target support based on whether the kernel has the corresponding subsystem support. Kconfig is not used to drive this specifically for lpfc. - There is a module parameter, lpfc_enable_fc4_type, that indicates whether the ports will do FCP-only or FCP & NVME (NVME-only not yet possible due to dependency on fc transport). As FCP & NVME divvys up exchange resources, and given NVME will not be often initially, the default is changed to FCP only. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-15scsi: ufs: don't check unsigned type for a negative valueTomas Winkler
Fix compilation warning: drivers/scsi/ufs/ufshcd.c:7645:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] if ((value < UFS_PM_LVL_0) || (value >= UFS_PM_LVL_MAX)) Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-15scsi: hpsa: do not timeout reset operationsDon Brace
Resets can take longer than DEFAULT_TIMEOUT. Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-15scsi: hpsa: limit outstanding rescansDon Brace
Avoid rescan storms. No need to queue another if one is pending. Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-15scsi: hpsa: update check for logical volume statusDon Brace
- Add in a new case for volume offline. Resolves internal testing bug for multilun array management. - Return correct status for failed TURs. Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-15Merge tag 'gfs2-4.11-rc3.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Bob Peterson: "This is an emergency patch for 4.11-rc3 The GFS2 developers uncovered a really nasty problem that can lead to random corruption and kernel panic, much like the last one. Andreas Gruenbacher wrote a simple one-line patch to fix the problem." * tag 'gfs2-4.11-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Avoid alignment hole in struct lm_lockname
2017-03-15Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: - self-test failure of crc32c on powerpc - regressions of ecb(aes) when used with xts/lrw in s5p-sss - a number of bugs in the omap RNG driver * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: s5p-sss - Fix spinlock recursion on LRW(AES) hwrng: omap - Do not access INTMASK_REG on EIP76 hwrng: omap - use devm_clk_get() instead of of_clk_get() hwrng: omap - write registers after enabling the clock crypto: s5p-sss - Fix completing crypto request in IRQ handler crypto: powerpc - Fix initialisation of crc32c context
2017-03-15cpufreq: intel_pstate: Avoid percentages in limits-related computationsRafael J. Wysocki
Currently, intel_pstate_update_perf_limits() first converts the policy minimum and maximum limits into percentages of the maximum turbo frequency (rounding up to an integer) and then converts these percentages to fractions (by using fixed-point arithmetic to divide them by 100). That introduces a rounding error unnecessarily, because the fractions can be obtained by carrying out fixed-point divisions directly on the input numbers. Rework the computations in intel_pstate_hwp_set() to use fractions instead of percentages (and drop redundant local variables from there) and modify intel_pstate_update_perf_limits() to compute the fractions directly and percentages out of them. While at it, introduce percent_ext_fp() for converting percentages to fractions (with extended number of fraction bits) and use it in the computations. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-16openrisc: Export symbols needed by modulesStafford Horne
This was detected by allmodconfig, errors reported: ERROR: "empty_zero_page" [net/ceph/libceph.ko] undefined! ERROR: "__ucmpdi2" [lib/842/842_decompress.ko] undefined! ERROR: "empty_zero_page" [fs/nfs/objlayout/objlayoutdriver.ko] undefined! ERROR: "empty_zero_page" [fs/exofs/exofs.ko] undefined! ERROR: "empty_zero_page" [fs/crypto/fscrypto.ko] undefined! ERROR: "__ucmpdi2" [fs/btrfs/btrfs.ko] undefined! ERROR: "pm_power_off" [drivers/regulator/act8865-regulator.ko] undefined! ERROR: "__ucmpdi2" [drivers/media/i2c/adv7842.ko] undefined! ERROR: "__clear_user" [drivers/md/dm-mod.ko] undefined! ERROR: "__clear_user" [net/netfilter/x_tables.ko] undefined! Signed-off-by: Stafford Horne <shorne@gmail.com>
2017-03-16openrisc: fix issue handling 8 byte get_user callsStafford Horne
Was getting the following error with allmodconfig: ERROR: "__get_user_bad" [lib/test_user_copy.ko] undefined! This was simply a missing break statement, causing an unwanted fall through. Signed-off-by: Stafford Horne <shorne@gmail.com>
2017-03-16openrisc: xchg: fix `computed is not used` warningStafford Horne
When building allmodconfig this warning shows. fs/ocfs2/file.c: In function 'ocfs2_file_write_iter': ./arch/openrisc/include/asm/cmpxchg.h:81:3: warning: value computed is not used [-Wunused-value] ((typeof(*(ptr)))__xchg((unsigned long)(with), (ptr), sizeof(*(ptr)))) ^ Applying the same patch logic that was done to the cmpxchg macro. Signed-off-by: Stafford Horne <shorne@gmail.com>
2017-03-15gfs2: Avoid alignment hole in struct lm_locknameAndreas Gruenbacher
Commit 88ffbf3e03 switches to using rhashtables for glocks, hashing over the entire struct lm_lockname instead of its individual fields. On some architectures, struct lm_lockname contains a hole of uninitialized memory due to alignment rules, which now leads to incorrect hash values. Get rid of that hole. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> CC: <stable@vger.kernel.org> #v4.3+
2017-03-15xfs: verify inline directory data forksDarrick J. Wong
When we're reading or writing the data fork of an inline directory, check the contents to make sure we're not overflowing buffers or eating garbage data. xfs/348 corrupts an inline symlink into an inline directory, triggering a buffer overflow bug. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> --- v2: add more checks consistent with _dir2_sf_check and make the verifier usable from anywhere.
2017-03-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Ensure that mtu is at least IPV6_MIN_MTU in ipv6 VTI tunnel driver, from Steffen Klassert. 2) Fix crashes when user tries to get_next_key on an LPM bpf map, from Alexei Starovoitov. 3) Fix detection of VLAN fitlering feature for bnx2x VF devices, from Michal Schmidt. 4) We can get a divide by zero when TCP socket are morphed into listening state, fix from Eric Dumazet. 5) Fix socket refcounting bugs in skb_complete_wifi_ack() and skb_complete_tx_timestamp(). From Eric Dumazet. 6) Use after free in dccp_feat_activate_values(), also from Eric Dumazet. 7) Like bonding team needs to use ETH_MAX_MTU as netdev->max_mtu, from Jarod Wilson. 8) Fix use after free in vrf_xmit(), from David Ahern. 9) Don't do UDP Fragmentation Offload on IPComp ipsec packets, from Alexey Kodanev. 10) Properly check napi_complete_done() return value in order to decide whether to re-enable IRQs or not in amd-xgbe driver, from Thomas Lendacky. 11) Fix double free of hwmon device in marvell phy driver, from Andrew Lunn. 12) Don't crash on malformed netlink attributes in act_connmark, from Etienne Noss. 13) Don't remove routes with a higher metric in ipv6 ECMP route replace, from Sabrina Dubroca. 14) Don't write into a cloned SKB in ipv6 fragmentation handling, from Florian Westphal. 15) Fix routing redirect races in dccp and tcp, basically the ICMP handler can't modify the socket's cached route in it's locked by the user at this moment. From Jon Maxwell. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (108 commits) qed: Enable iSCSI Out-of-Order qed: Correct out-of-bound access in OOO history qed: Fix interrupt flags on Rx LL2 qed: Free previous connections when releasing iSCSI qed: Fix mapping leak on LL2 rx flow qed: Prevent creation of too-big u32-chains qed: Align CIDs according to DORQ requirement mlxsw: reg: Fix SPVMLR max record count mlxsw: reg: Fix SPVM max record count net: Resend IGMP memberships upon peer notification. dccp: fix memory leak during tear-down of unsuccessful connection request tun: fix premature POLLOUT notification on tun devices dccp/tcp: fix routing redirect race ucc/hdlc: fix two little issue vxlan: fix ovs support net: use net->count to check whether a netns is alive or not bridge: drop netfilter fake rtable unconditionally ipv6: avoid write to a possibly cloned skb net: wimax/i2400m: fix NULL-deref at probe isdn/gigaset: fix NULL-deref at probe ...
2017-03-15Merge tag 'drm-intel-fixes-2017-03-14' of ↵Dave Airlie
git://anongit.freedesktop.org/git/drm-intel into drm-fixes drm/i915 fixes for v4.11-rc3 * tag 'drm-intel-fixes-2017-03-14' of git://anongit.freedesktop.org/git/drm-intel: drm/i915: Fix forcewake active domain tracking drm/i915: Nuke skl_update_plane debug message from the pipe update critical section drm/i915: use correct node for handling cache domain eviction drm/i915: Drain the freed state from the tail of the next commit drm/i915: Nuke debug messages from the pipe update critical section drm/i915: Use pagecache write to prepopulate shmemfs from pwrite-ioctl drm/i915: Store a permanent error in obj->mm.pages drm/i915: Move updating color management to before vblank evasion drm/i915/gen9: Increase PCODE request timeout to 50ms drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3 drm/i915: Remove the vma from the drm_mm if binding fails drm/i915/fbdev: Stop repeating tile configuration on stagnation drm/i915/glk: Fix watermark computations for third sprite plane drm/i915: Squelch any ktime/jiffie rounding errors for wait-ioctl
2017-03-15Merge branch 'for-upstream/malidp-fixes' of git://linux-arm.org/linux-ld ↵Dave Airlie
into drm-fixes * 'for-upstream/malidp-fixes' of git://linux-arm.org/linux-ld: drm: mali-dp: Fix smart layer not going to composition drm: mali-dp: Remove mclk rate management
2017-03-15Merge tag 'omapdrm-4.11-fixes' of ↵Dave Airlie
git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux into drm-fixes omapdrm fixes for v4.11 - Fix types in omapdrm uapi header to avoid userspace compilation errors - Fix dmabuf mmap for dma_alloc'ed buffers * tag 'omapdrm-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: uapi: fix drm/omap_drm.h userspace compilation errors drm/omap: fix dmabuf mmap for dma_alloc'ed buffers
2017-03-15Merge tag 'tilcdc-4.11-fixes' of https://github.com/jsarha/linux into drm-fixesDave Airlie
drm/tilcdc fixes for Linux v4.11 * tag 'tilcdc-4.11-fixes' of https://github.com/jsarha/linux: drm/tilcdc: Set framebuffer DMA address to HW only if CRTC is enabled drm/tilcdc: Fix hardcoded fail-return value in tilcdc_crtc_create()
2017-03-14drm: amd: remove broken include pathArnd Bergmann
The AMD ACP driver adds "-I../acp -I../acp/include" to the gcc command line, which makes no sense, since these are evaluated relative to the build directory. When we build with "make W=1", they instead cause a warning: cc1: error: ../acp/: No such file or directory [-Werror=missing-include-dirs] cc1: error: ../acp/include: No such file or directory [-Werror=missing-include-dirs] cc1: all warnings being treated as errors ../scripts/Makefile.build:289: recipe for target 'drivers/gpu/drm/amd/amdgpu/amdgpu_drv.o' failed ../scripts/Makefile.build:289: recipe for target 'drivers/gpu/drm/amd/amdgpu/amdgpu_device.o' failed ../scripts/Makefile.build:289: recipe for target 'drivers/gpu/drm/amd/amdgpu/amdgpu_kms.o' failed This removes the subdir-ccflags variable that evidently did not serve any purpose here. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-03-14Merge branch 'for-4.11-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Three cgroup fixes. Nothing critical: - the pids controller could trigger suspicious RCU warning spuriously. Fixed. - in the debug controller, %p -> %pK to protect kernel pointer from getting exposed. - documentation formatting fix" * 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroups: censor kernel pointer in debug files cgroup/pids: remove spurious suspicious RCU usage warning cgroup: Fix indenting in PID controller documentation
2017-03-14Merge branch 'for-4.11-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "Three libata fixes: - fix for a circular reference bug in sysfs code which prevented pata_legacy devices from being released after probe failure, which in turn prevented devres from releasing the associated resources. - drop spurious WARN in the command issue path which can be triggered by a legitimate passthrough command. - an ahci_qoriq specific fix" * 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ahci: qoriq: correct the sata ecc setting error libata: drop WARN from protocol error in ata_sff_qc_issue() libata: transport: Remove circular dependency at free time
2017-03-14Merge branch 'for-4.11-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "If a delayed work is queued with NULL @wq, workqueue code explodes after the timer expires at which point it's difficult to tell who the culprit was. This actually happened and the offender was net/smc this time. Add an explicit sanity check for it in the queueing path" * 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq
2017-03-14Merge branch 'for-4.11-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu fixes from Tejun Heo: - the allocation path was updating pcpu_nr_empty_pop_pages without the required locking which can lead to incorrect handling of empty chunks (e.g. keeping too many around), which is buggy but shouldn't lead to critical failures. Fixed by adding the locking - a trivial patch to drop an unused param from pcpu_get_pages() * 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: remove unused chunk_alloc parameter from pcpu_get_pages() percpu: acquire pcpu_lock when updating pcpu_nr_empty_pop_pages
2017-03-14x86/intel_rdt: Put group node in rdtgroup_kn_unlockJiri Olsa
The rdtgroup_kn_unlock waits for the last user to release and put its node. But it's calling kernfs_put on the node which calls the rdtgroup_kn_unlock, which might not be the group's directory node, but another group's file node. This race could be easily reproduced by running 2 instances of following script: mount -t resctrl resctrl /sys/fs/resctrl/ pushd /sys/fs/resctrl/ mkdir krava echo "krava" > krava/schemata rmdir krava popd umount /sys/fs/resctrl It triggers the slub debug error message with following command line config: slub_debug=,kernfs_node_cache. Call kernfs_put on the group's node to fix it. Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Shaohua Li <shli@fb.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1489501253-20248-1-git-send-email-jolsa@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-14x86/unwind: Fix last frame check for aligned function stacksJosh Poimboeuf
Pavel Machek reported the following warning on x86-32: WARNING: kernel stack frame pointer at f50cdf98 in swapper/2:0 has bad value (null) The warning is caused by the unwinder not realizing that it reached the end of the stack, due to an unusual prologue which gcc sometimes generates for aligned stacks. The prologue is based on a gcc feature called the Dynamic Realign Argument Pointer (DRAP). It's almost always enabled for aligned stacks when -maccumulate-outgoing-args isn't set. This issue is similar to the one fixed by the following commit: 8023e0e2a48d ("x86/unwind: Adjust last frame check for aligned function stacks") ... but that fix was specific to x86-64. Make the fix more generic to cover x86-32 as well, and also ensure that the return address referred to by the frame pointer is a copy of the original return address. Fixes: acb4608ad186 ("x86/unwind: Create stack frames for saved syscall registers") Reported-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/50d4924db716c264b14f1633037385ec80bf89d2.1489465609.git.jpoimboe@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>