summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-06-29powerpc/64s: Handle data breakpoints in Radix modeNaveen N. Rao
commit d89ba5353f301971dd7d2f9fdf25c4432728f38e upstream. On Power9, trying to use data breakpoints throws the splat shown below. This is because the check for a data breakpoint in DSISR is in do_hash_page(), which is not called when in Radix mode. Unable to handle kernel paging request for data at address 0xc000000000e19218 Faulting instruction address: 0xc0000000001155e8 cpu 0x0: Vector: 300 (Data Access) at [c0000000ef1e7b20] pc: c0000000001155e8: find_pid_ns+0x48/0xe0 lr: c000000000116ac4: find_task_by_vpid+0x44/0x90 sp: c0000000ef1e7da0 msr: 9000000000009033 dar: c000000000e19218 dsisr: 400000 Move the check to handle_page_fault() so as to catch data breakpoints in both Hash and Radix MMU modes. We have to change the check in do_hash_page() against 0xa410 to use 0xa450, so as to include the value of (DSISR_DABRMATCH << 16). There are two sites that call handle_page_fault() when in Radix, both already pass DSISR in r4. Fixes: caca285e5ab4 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code") Reported-by: Shriya R. Kulkarni <shriykul@in.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Fix the fall-through case on hash, we need to reload DSISR] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29powerpc/kprobes: Pause function_graph tracing during jprobes handlingNaveen N. Rao
commit a9f8553e935f26cb5447f67e280946b0923cd2dc upstream. This fixes a crash when function_graph and jprobes are used together. This is essentially commit 237d28db036e ("ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing"), but for powerpc. Jprobes breaks function_graph tracing since the jprobe hook needs to use jprobe_return(), which never returns back to the hook, but instead to the original jprobe'd function. The solution is to momentarily pause function_graph tracing before invoking the jprobe hook and re-enable it when returning back to the original jprobe'd function. Fixes: 6794c78243bf ("powerpc64: port of the function graph tracer") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29signal: Only reschedule timers on signals timers have sentEric W. Biederman
commit 57db7e4a2d92c2d3dfbca4ef8057849b2682436b upstream. Thomas Gleixner wrote: > The CRIU support added a 'feature' which allows a user space task to send > arbitrary (kernel) signals to itself. The changelog says: > > The kernel prevents sending of siginfo with positive si_code, because > these codes are reserved for kernel. I think we can allow a task to > send such a siginfo to itself. This operation should not be dangerous. > > Quite contrary to that claim, it turns out that it is outright dangerous > for signals with info->si_code == SI_TIMER. The following code sequence in > a user space task allows to crash the kernel: > > id = timer_create(CLOCK_XXX, ..... signo = SIGX); > timer_set(id, ....); > info->si_signo = SIGX; > info->si_code = SI_TIMER: > info->_sifields._timer._tid = id; > info->_sifields._timer._sys_private = 2; > rt_[tg]sigqueueinfo(..., SIGX, info); > sigemptyset(&sigset); > sigaddset(&sigset, SIGX); > rt_sigtimedwait(sigset, info); > > For timers based on CLOCK_PROCESS_CPUTIME_ID, CLOCK_THREAD_CPUTIME_ID this > results in a kernel crash because sigwait() dequeues the signal and the > dequeue code observes: > > info->si_code == SI_TIMER && info->_sifields._timer._sys_private != 0 > > which triggers the following callchain: > > do_schedule_next_timer() -> posix_cpu_timer_schedule() -> arm_timer() > > arm_timer() executes a list_add() on the timer, which is already armed via > the timer_set() syscall. That's a double list add which corrupts the posix > cpu timer list. As a consequence the kernel crashes on the next operation > touching the posix cpu timer list. > > Posix clocks which are internally implemented based on hrtimers are not > affected by this because hrtimer_start() can handle already armed timers > nicely, but it's a reliable way to trigger the WARN_ON() in > hrtimer_forward(), which complains about calling that function on an > already armed timer. This problem has existed since the posix timer code was merged into 2.5.63. A few releases earlier in 2.5.60 ptrace gained the ability to inject not just a signal (which linux has supported since 1.0) but the full siginfo of a signal. The core problem is that the code will reschedule in response to signals getting dequeued not just for signals the timers sent but for other signals that happen to a si_code of SI_TIMER. Avoid this confusion by testing to see if the queued signal was preallocated as all timer signals are preallocated, and so far only the timer code preallocates signals. Move the check for if a timer needs to be rescheduled up into collect_signal where the preallocation check must be performed, and pass the result back to dequeue_signal where the code reschedules timers. This makes it clear why the code cares about preallocated timers. Reported-by: Thomas Gleixner <tglx@linutronix.de> History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Reference: 66dd34ad31e5 ("signal: allow to send any siginfo to itself") Reference: 1669ce53e2ff ("Add PTRACE_GETSIGINFO and PTRACE_SETSIGINFO") Fixes: db8b50ba75f2 ("[PATCH] POSIX clocks & timers") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29random: silence compiler warnings and fix raceJason A. Donenfeld
commit 4a072c71f49b0a0e495ea13423bdb850da73c58c upstream. Odd versions of gcc for the sh4 architecture will actually warn about flags being used while uninitialized, so we set them to zero. Non crazy gccs will optimize that out again, so it doesn't make a difference. Next, over aggressive gccs could inline the expression that defines use_lock, which could then introduce a race resulting in a lock imbalance. By using READ_ONCE, we prevent that fate. Finally, we make that assignment const, so that gcc can still optimize a nice amount. Finally, we fix a potential deadlock between primary_crng.lock and batched_entropy_reset_lock, where they could be called in opposite order. Moving the call to invalidate_batched_entropy to outside the lock rectifies this issue. Fixes: b169c13de473a85b3c859bb36216a4cb5f00a54a Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29HID: Add quirk for Dell PIXART OEM mouseSebastian Parschauer
commit 3db28271f0feae129262d30e41384a7c4c767987 upstream. This mouse is also known under other IDs. It needs the quirk ALWAYS_POLL or will disconnect in runlevel 1 or 3. Signed-off-by: Sebastian Parschauer <sparschauer@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29cxgb4: notify uP to route ctrlq compl to rdma rspqRaju Rangoju
commit dec6b33163d24e2c19ba521c89fffbaab53ae986 upstream. During the module initialisation there is a possible race (basically race between uld and lld) where neither the uld nor lld notifies the uP about where to route the ctrl queue completions. LLD skips notifying uP as the rdma queues were not created by then (will leave it to ULD to notify the uP). As the ULD comes up, it also skips notifying the uP as the flag FULL_INIT_DONE is not set yet (ULD assumes that the interface is not up yet). Consequently, this race between uld and lld leaves uP unnotified about where to send the ctrl queue completions to, leading to iwarp RI_RES WR failure. Here is the race: CPU 0 CPU1 - allocates nic rx queus - t4_sge_alloc_ctrl_txq() (if rdma rsp queues exists, tell uP to route ctrl queue compl to rdma rspq) - acquires the mutex_lock - allocates rdma response queues - if FULL_INIT_DONE set, tell uP to route ctrl queue compl to rdma rspq - relinquishes mutex_lock - acquires the mutex_lock - enable_rx() - set FULL_INIT_DONE - relinquishes mutex_lock This patch fixes the above issue. Fixes: e7519f9926f1('cxgb4: avoid enabling napi twice to the same queue') Signed-off-by: Raju Rangoju <rajur@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29CIFS: Fix some return values in case of error in 'crypt_message'Christophe Jaillet
commit 517a6e43c4872c89794af5b377fa085e47345952 upstream. 'rc' is known to be 0 at this point. So if 'init_sg' or 'kzalloc' fails, we should return -ENOMEM instead. Also remove a useless 'rc' in a debug message as it is meaningless here. Fixes: 026e93dc0a3ee ("CIFS: Encrypt SMB3 requests before sending") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29CIFS: Improve readdir verbosityPavel Shilovsky
commit dcd87838c06f05ab7650b249ebf0d5b57ae63e1e upstream. Downgrade the loglevel for SMB2 to prevent filling the log with messages if e.g. readdir was interrupted. Also make SMB2 and SMB1 codepaths do the same logging during readdir. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29KVM: PPC: Book3S HV: Save/restore host values of debug registersPaul Mackerras
commit 7ceaa6dcd8c6f59588428cec37f3c8093dd1011f upstream. At present, HV KVM on POWER8 and POWER9 machines loses any instruction or data breakpoint set in the host whenever a guest is run. Instruction breakpoints are currently only used by xmon, but ptrace and the perf_event subsystem can set data breakpoints as well as xmon. To fix this, we save the host values of the debug registers (CIABR, DAWR and DAWRX) before entering the guest and restore them on exit. To provide space to save them in the stack frame, we expand the stack frame allocated by kvmppc_hv_entry() from 112 to 144 bytes. Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08) Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exitPaul Mackerras
commit 4c3bb4ccd074e1a0552078c0bf94c662367a1658 upstream. This restores several special-purpose registers (SPRs) to sane values on guest exit that were missed before. TAR and VRSAVE are readable and writable by userspace, and we need to save and restore them to prevent the guest from potentially affecting userspace execution (not that TAR or VRSAVE are used by any known program that run uses the KVM_RUN ioctl). We save/restore these in kvmppc_vcpu_run_hv() rather than on every guest entry/exit. FSCR affects userspace execution in that it can prohibit access to certain facilities by userspace. We restore it to the normal value for the task on exit from the KVM_RUN ioctl. IAMR is normally 0, and is restored to 0 on guest exit. However, with a radix host on POWER9, it is set to a value that prevents the kernel from executing user-accessible memory. On POWER9, we save IAMR on guest entry and restore it on guest exit to the saved value rather than 0. On POWER8 we continue to set it to 0 on guest exit. PSPB is normally 0. We restore it to 0 on guest exit to prevent userspace taking advantage of the guest having set it non-zero (which would allow userspace to set its SMT priority to high). UAMOR is normally 0. We restore it to 0 on guest exit to prevent the AMR from being used as a covert channel between userspace processes, since the AMR is not context-switched at present. Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08) Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29KVM: PPC: Book3S HV: Context-switch EBB registers properlyPaul Mackerras
commit ca8efa1df1d15a1795a2da57f9f6aada6ed6b946 upstream. This adds code to save the values of three SPRs (special-purpose registers) used by userspace to control event-based branches (EBBs), which are essentially interrupts that get delivered directly to userspace. These registers are loaded up with guest values when entering the guest, and their values are saved when exiting the guest, but we were not saving the host values and restoring them before going back to userspace. On POWER8 this would only affect userspace programs which explicitly request the use of EBBs and also use the KVM_RUN ioctl, since the only source of EBBs on POWER8 is the PMU, and there is an explicit enable bit in the PMU registers (and those PMU registers do get properly context-switched between host and guest). On POWER9 there is provision for externally-generated EBBs, and these are not subject to the control in the PMU registers. Since these registers only affect userspace, we can save them when we first come in from userspace and restore them before returning to userspace, rather than saving/restoring the host values on every guest entry/exit. Similarly, we don't need to worry about their values on offline secondary threads since they execute in the context of the idle task, which never executes in userspace. Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08) Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29KVM: PPC: Book3S HV: Ignore timebase offset on POWER9 DD1Paul Mackerras
commit 3d3efb68c19e539f0535c93a5258c1299270215f upstream. POWER9 DD1 has an erratum where writing to the TBU40 register, which is used to apply an offset to the timebase, can cause the timebase to lose counts. This results in the timebase on some CPUs getting out of sync with other CPUs, which then results in misbehaviour of the timekeeping code. To work around the problem, we make KVM ignore the timebase offset for all guests on POWER9 DD1 machines. This means that live migration cannot be supported on POWER9 DD1 machines. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29KVM: PPC: Book3S HV: Preserve userspace HTM state properlyPaul Mackerras
commit 46a704f8409f79fd66567ad3f8a7304830a84293 upstream. If userspace attempts to call the KVM_RUN ioctl when it has hardware transactional memory (HTM) enabled, the values that it has put in the HTM-related SPRs TFHAR, TFIAR and TEXASR will get overwritten by guest values. To fix this, we detect this condition and save those SPR values in the thread struct, and disable HTM for the task. If userspace goes to access those SPRs or the HTM facility in future, a TM-unavailable interrupt will occur and the handler will reload those SPRs and re-enable HTM. If userspace has started a transaction and suspended it, we would currently lose the transactional state in the guest entry path and would almost certainly get a "TM Bad Thing" interrupt, which would cause the host to crash. To avoid this, we detect this case and return from the KVM_RUN ioctl with an EINVAL error, with the KVM exit reason set to KVM_EXIT_FAIL_ENTRY. Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08) Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29KVM: PPC: Book3S HV: Cope with host using large decrementer modePaul Mackerras
commit 2f2724630f7a8d582470f03ee56b96746767d270 upstream. POWER9 introduces a new mode for the decrementer register, called large decrementer mode, in which the decrementer counter is 56 bits wide rather than 32, and reads are sign-extended rather than zero-extended. For the decrementer, this new mode is optional and controlled by a bit in the LPCR. The hypervisor decrementer (HDEC) is 56 bits wide on POWER9 and has no mode control. Since KVM code reads and writes the decrementer and hypervisor decrementer registers in a few places, it needs to be aware of the need to treat the decrementer value as a 64-bit quantity, and only do a 32-bit sign extension when large decrementer mode is not in effect. Similarly, the HDEC should always be treated as a 64-bit quantity on POWER9. We define a new EXTEND_HDEC macro to encapsulate the feature test for POWER9 and the sign extension. To enable the sign extension to be removed in large decrementer mode, we test the LPCR_LD bit in the host LPCR image stored in the struct kvm for the guest. If is set then large decrementer mode is enabled and the sign extension should be skipped. This is partly based on an earlier patch by Oliver O'Halloran. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29KVM: s390: gaccess: fix real-space designation asce handling for gmap shadowsHeiko Carstens
commit addb63c18a0d52a9ce2611d039f981f7b6148d2b upstream. For real-space designation asces the asce origin part is only a token. The asce token origin must not be used to generate an effective address for storage references. This however is erroneously done within kvm_s390_shadow_tables(). Furthermore within the same function the wrong parts of virtual addresses are used to generate a corresponding real address (e.g. the region second index is used as region first index). Both of the above can result in incorrect address translations. Only for real space designations with a token origin of zero and addresses below one megabyte the translation was correct. Furthermore replace a "!asce.r" statement with a "!*fake" statement to make it more obvious that a specific condition has nothing to do with the architecture, but with the fake handling of real space designations. Fixes: 3218f7094b6b ("s390/mm: support real-space for gmap shadows") Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29KVM: MIPS: Fix maybe-uninitialized build failureJames Cowgill
commit e27a9eca5d4a392b96ce5d5238c8d637bcb0a52c upstream. This commit fixes a "maybe-uninitialized" build failure in arch/mips/kvm/tlb.c when KVM, DYNAMIC_DEBUG and JUMP_LABEL are all enabled. The failure is: In file included from ./include/linux/printk.h:329:0, from ./include/linux/kernel.h:13, from ./include/asm-generic/bug.h:15, from ./arch/mips/include/asm/bug.h:41, from ./include/linux/bug.h:4, from ./include/linux/thread_info.h:11, from ./include/asm-generic/current.h:4, from ./arch/mips/include/generated/asm/current.h:1, from ./include/linux/sched.h:11, from arch/mips/kvm/tlb.c:13: arch/mips/kvm/tlb.c: In function ‘kvm_mips_host_tlb_inv’: ./include/linux/dynamic_debug.h:126:3: error: ‘idx_kernel’ may be used uninitialized in this function [-Werror=maybe-uninitialized] __dynamic_pr_debug(&descriptor, pr_fmt(fmt), \ ^~~~~~~~~~~~~~~~~~ arch/mips/kvm/tlb.c:169:16: note: ‘idx_kernel’ was declared here int idx_user, idx_kernel; ^~~~~~~~~~ There is a similar error relating to "idx_user". Both errors were observed with GCC 6. As far as I can tell, it is impossible for either idx_user or idx_kernel to be uninitialized when they are later read in the calls to kvm_debug, but to satisfy the compiler, add zero initializers to both variables. Signed-off-by: James Cowgill <James.Cowgill@imgtec.com> Fixes: 57e3869cfaae ("KVM: MIPS/TLB: Generalise host TLB invalidate to kernel ASID") Acked-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29KVM: x86: fix singlestepping over syscallPaolo Bonzini
commit c8401dda2f0a00cd25c0af6a95ed50e478d25de4 upstream. TF is handled a bit differently for syscall and sysret, compared to the other instructions: TF is checked after the instruction completes, so that the OS can disable #DB at a syscall by adding TF to FMASK. When the sysret is executed the #DB is taken "as if" the syscall insn just completed. KVM emulates syscall so that it can trap 32-bit syscall on Intel processors. Fix the behavior, otherwise you could get #DB on a user stack which is not nice. This does not affect Linux guests, as they use an IST or task gate for #DB. This fixes CVE-2017-7518. Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29perf probe: Fix probe definition for inlined functionsBjörn Töpel
commit 7598f8bc1383ffd77686cb4e92e749bef3c75937 upstream. In commit 613f050d68a8 ("perf probe: Fix to probe on gcc generated functions in modules"), the offset from symbol is, incorrectly, added to the trace point address. This leads to incorrect probe trace points for inlined functions and when using relative line number on symbols. Prior this patch: $ perf probe -m nf_nat -D in_range p:probe/in_range nf_nat:in_range.isra.9+0 $ perf probe -m i40e -D i40e_clean_rx_irq p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2212 $ perf probe -m i40e -D i40e_clean_rx_irq:16 p:probe/i40e_clean_rx_irq i40e:i40e_lan_xmit_frame+626 After: $ perf probe -m nf_nat -D in_range p:probe/in_range nf_nat:in_range.isra.9+0 $ perf probe -m i40e -D i40e_clean_rx_irq p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+1106 $ perf probe -m i40e -D i40e_clean_rx_irq:16 p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2665 Committer testing: Using 'pfunct', a tool found in the 'dwarves' package [1], one can ask what are the functions that while not being explicitely marked as inline, were inlined by the compiler: # pfunct --cc_inlined /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko | head __ew32 e1000_regdump e1000e_dump_ps_pages e1000_desc_unused e1000e_systim_to_hwtstamp e1000e_rx_hwtstamp e1000e_update_rdt_wa e1000e_update_tdt_wa e1000_put_txbuf e1000_consume_page Then ask 'perf probe' to produce the kprobe_tracer probe definitions for two of them: # perf probe -m e1000e -D e1000e_rx_hwtstamp p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+74 # perf probe -m e1000e -D e1000_consume_page p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+876 p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+1506 p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074 Now lets concentrate on the 'e1000_consume_page' one, that was inlined twice in e1000_clean_jumbo_rx_irq(), lets see what readelf says about the DWARF tags for that function: $ readelf -wi /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko <SNIP> <1><13e27b>: Abbrev Number: 121 (DW_TAG_subprogram) <13e27c> DW_AT_name : (indirect string, offset: 0xa8945): e1000_clean_jumbo_rx_irq <13e287> DW_AT_low_pc : 0x17a30 <3><13e6ef>: Abbrev Number: 119 (DW_TAG_inlined_subroutine) <13e6f0> DW_AT_abstract_origin: <0x13ed2c> <13e6f4> DW_AT_low_pc : 0x17be6 <SNIP> <1><13ed2c>: Abbrev Number: 142 (DW_TAG_subprogram) <13ed2e> DW_AT_name : (indirect string, offset: 0xa54c3): e1000_consume_page So, the first time in e1000_clean_jumbo_rx_irq() where e1000_consume_page() is inlined is at PC 0x17be6, which subtracted from e1000_clean_jumbo_rx_irq()'s address, gives us the offset we should use in the probe definition: 0x17be6 - 0x17a30 = 438 but above we have 876, which is twice as much. Lets see the second inline expansion of e1000_consume_page() in e1000_clean_jumbo_rx_irq(): <3><13e86e>: Abbrev Number: 119 (DW_TAG_inlined_subroutine) <13e86f> DW_AT_abstract_origin: <0x13ed2c> <13e873> DW_AT_low_pc : 0x17d21 0x17d21 - 0x17a30 = 753 So we where adding it at twice the offset from the containing function as we should. And then after this patch: # perf probe -m e1000e -D e1000e_rx_hwtstamp p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+37 # perf probe -m e1000e -D e1000_consume_page p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+438 p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+753 p:probe/e1000_consume_page_2 e1000e:e1000_clean_jumbo_rx_irq+1353 # Which matches the two first expansions and shows that because we were doubling the offset it would spill over the next function: readelf -sw /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko 673: 0000000000017a30 1626 FUNC LOCAL DEFAULT 2 e1000_clean_jumbo_rx_irq 674: 0000000000018090 2013 FUNC LOCAL DEFAULT 2 e1000_clean_rx_irq_ps This is the 3rd inline expansion of e1000_consume_page() in e1000_clean_jumbo_rx_irq(): <3><13ec77>: Abbrev Number: 119 (DW_TAG_inlined_subroutine) <13ec78> DW_AT_abstract_origin: <0x13ed2c> <13ec7c> DW_AT_low_pc : 0x17f79 0x17f79 - 0x17a30 = 1353 So: 0x17a30 + 2 * 1353 = 0x184c2 And: 0x184c2 - 0x18090 = 1074 Which explains the bogus third expansion for e1000_consume_page() to end up at: p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074 All fixed now :-) [1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/ Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 613f050d68a8 ("perf probe: Fix to probe on gcc generated functions in modules") Link: http://lkml.kernel.org/r/20170621164134.5701-1-bjorn.topel@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29perf/x86/intel: Add 1G DTLB load/store miss support for SKLKan Liang
commit fb3a5055cd7098f8d1dd0cd38d7172211113255f upstream. Current DTLB load/store miss events (0x608/0x649) only counts 4K,2M and 4M page size. Need to extend the events to support any page size (4K/2M/4M/1G). The complete DTLB load/store miss events are: DTLB_LOAD_MISSES.WALK_COMPLETED 0xe08 DTLB_STORE_MISSES.WALK_COMPLETED 0xe49 Signed-off-by: Kan Liang <Kan.liang@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Link: http://lkml.kernel.org/r/20170619142609.11058-1-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29lib/cmdline.c: fix get_options() overflow while parsing rangesIlya Matveychikov
commit a91e0f680bcd9e10c253ae8b62462a38bd48f09f upstream. When using get_options() it's possible to specify a range of numbers, like 1-100500. The problem is that it doesn't track array size while calling internally to get_range() which iterates over the range and fills the memory with numbers. Link: http://lkml.kernel.org/r/2613C75C-B04D-4BFF-82A6-12F97BA0F620@gmail.com Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29fs/dax.c: fix inefficiency in dax_writeback_mapping_range()Jan Kara
commit 1eb643d02b21412e603b42cdd96010a2ac31c05f upstream. dax_writeback_mapping_range() fails to update iteration index when searching radix tree for entries needing cache flushing. Thus each pagevec worth of entries is searched starting from the start which is inefficient and prone to livelocks. Update index properly. Link: http://lkml.kernel.org/r/20170619124531.21491-1-jack@suse.cz Fixes: 9973c98ecfda3 ("dax: add support for fsync/sync") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAILNeilBrown
commit 9fa4eb8e490a28de40964b1b0e583d8db4c7e57c upstream. If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL ioctl, autofs4_d_automount() will return ERR_PTR(status) with that status to follow_automount(), which will then dereference an invalid pointer. So treat a positive status the same as zero, and map to ENOENT. See comment in systemd src/core/automount.c::automount_send_ready(). Link: http://lkml.kernel.org/r/871sqwczx5.fsf@notabene.neil.brown.name Signed-off-by: NeilBrown <neilb@suse.com> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29powerpc/perf: Fix oops when kthread execs user processRavi Bangoria
commit bf05fc25f268cd62f147f368fe65ad3e5b04fe9f upstream. When a kthread calls call_usermodehelper() the steps are: 1. allocate current->mm 2. load_elf_binary() 3. populate current->thread.regs While doing this, interrupts are not disabled. If there is a perf interrupt in the middle of this process (i.e. step 1 has completed but not yet reached to step 3) and if perf tries to read userspace regs, kernel oops with following log: Unable to handle kernel paging request for data at address 0x00000000 Faulting instruction address: 0xc0000000000da0fc ... Call Trace: perf_output_sample_regs+0x6c/0xd0 perf_output_sample+0x4e4/0x830 perf_event_output_forward+0x64/0x90 __perf_event_overflow+0x8c/0x1e0 record_and_restart+0x220/0x5c0 perf_event_interrupt+0x2d8/0x4d0 performance_monitor_exception+0x54/0x70 performance_monitor_common+0x158/0x160 --- interrupt: f01 at avtab_search_node+0x150/0x1a0 LR = avtab_search_node+0x100/0x1a0 ... load_elf_binary+0x6e8/0x15a0 search_binary_handler+0xe8/0x290 do_execveat_common.isra.14+0x5f4/0x840 call_usermodehelper_exec_async+0x170/0x210 ret_from_kernel_thread+0x5c/0x7c Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace pt_regs are not set. Fixes: ed4a4ef85cf5 ("powerpc/perf: Add support for sampling interrupt register state") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29fs/exec.c: account for argv/envp pointersKees Cook
commit 98da7d08850fb8bdeb395d6368ed15753304aa0c upstream. When limiting the argv/envp strings during exec to 1/4 of the stack limit, the storage of the pointers to the strings was not included. This means that an exec with huge numbers of tiny strings could eat 1/4 of the stack limit in strings and then additional space would be later used by the pointers to the strings. For example, on 32-bit with a 8MB stack rlimit, an exec with 1677721 single-byte strings would consume less than 2MB of stack, the max (8MB / 4) amount allowed, but the pointers to the strings would consume the remaining additional stack space (1677721 * 4 == 6710884). The result (1677721 + 6710884 == 8388605) would exhaust stack space entirely. Controlling this stack exhaustion could result in pathological behavior in setuid binaries (CVE-2017-1000365). [akpm@linux-foundation.org: additional commenting from Kees] Fixes: b6a2fea39318 ("mm: variable length argument support") Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Qualys Security Advisory <qsa@qualys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29ALSA: hda - Apply quirks to Broxton-T, tooTakashi Iwai
commit c7ecb9068e6772c43941ce609f08bc53f36e1dce upstream. Broxton-T was a forgotten child and we didn't apply the quirks for Skylake+ properly. Meanwhile, a quirk for reducing the DMA latency seems specific to the early Broxton model, so we leave as is. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29ALSA: hda - Add Coffelake PCI IDMegha Dey
commit e79b0006c45c9b0b22f3ea54ff6e256b34c1f208 upstream. Coffelake is another Intel part, so need to add PCI ID for it. Signed-off-by: Megha Dey <megha.dey@intel.com> Signed-off-by: Subhransu S. Prusty <subhransu.s.prusty@intel.com> Acked-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29ALSA: pcm: Don't treat NULL chmap as a fatal errorTakashi Iwai
commit 2deaeaf102d692cb6f764123b1df7aa118a8e97c upstream. The standard PCM chmap helper callbacks treat the NULL info->chmap as a fatal error and spews the kernel warning with stack trace when CONFIG_SND_DEBUG is on. This was OK, originally it was supposed to be always static and non-NULL. But, as the recent addition of Intel LPE audio driver shows, the chmap content may vary dynamically, and it can be even NULL when disconnected. The user still sees the kernel warning unnecessarily. For clearing such a confusion, this patch simply removes the snd_BUG_ON() in each place, just returns an error without warning. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29ALSA: firewire-lib: Fix stall of process context at packet errorTakashi Sakamoto
commit 4a9bfafc64f44ef83de4e00ca1b57352af6cd8c2 upstream. At Linux v3.5, packet processing can be done in process context of ALSA PCM application as well as software IRQ context for OHCI 1394. Below is an example of the callgraph (some calls are omitted). ioctl(2) with e.g. HWSYNC (sound/core/pcm_native.c) ->snd_pcm_common_ioctl1() ->snd_pcm_hwsync() ->snd_pcm_stream_lock_irq (sound/core/pcm_lib.c) ->snd_pcm_update_hw_ptr() ->snd_pcm_udpate_hw_ptr0() ->struct snd_pcm_ops.pointer() (sound/firewire/*) = Each handler on drivers in ALSA firewire stack (sound/firewire/amdtp-stream.c) ->amdtp_stream_pcm_pointer() (drivers/firewire/core-iso.c) ->fw_iso_context_flush_completions() ->struct fw_card_driver.flush_iso_completion() (drivers/firewire/ohci.c) = flush_iso_completions() ->struct fw_iso_context.callback.sc (sound/firewire/amdtp-stream.c) = in_stream_callback() or out_stream_callback() ->... ->snd_pcm_stream_unlock_irq When packet queueing error occurs or detecting invalid packets in 'in_stream_callback()' or 'out_stream_callback()', 'snd_pcm_stop_xrun()' is called on local CPU with disabled IRQ. (sound/firewire/amdtp-stream.c) in_stream_callback() or out_stream_callback() ->amdtp_stream_pcm_abort() ->snd_pcm_stop_xrun() ->snd_pcm_stream_lock_irqsave() ->snd_pcm_stop() ->snd_pcm_stream_unlock_irqrestore() The process is stalled on the CPU due to attempt to acquire recursive lock. [ 562.630853] INFO: rcu_sched detected stalls on CPUs/tasks: [ 562.630861] 2-...: (1 GPs behind) idle=37d/140000000000000/0 softirq=38323/38323 fqs=7140 [ 562.630862] (detected by 3, t=15002 jiffies, g=21036, c=21035, q=5933) [ 562.630866] Task dump for CPU 2: [ 562.630867] alsa-source-OXF R running task 0 6619 1 0x00000008 [ 562.630870] Call Trace: [ 562.630876] ? vt_console_print+0x79/0x3e0 [ 562.630880] ? msg_print_text+0x9d/0x100 [ 562.630883] ? up+0x32/0x50 [ 562.630885] ? irq_work_queue+0x8d/0xa0 [ 562.630886] ? console_unlock+0x2b6/0x4b0 [ 562.630888] ? vprintk_emit+0x312/0x4a0 [ 562.630892] ? dev_vprintk_emit+0xbf/0x230 [ 562.630895] ? do_sys_poll+0x37a/0x550 [ 562.630897] ? dev_printk_emit+0x4e/0x70 [ 562.630900] ? __dev_printk+0x3c/0x80 [ 562.630903] ? _raw_spin_lock+0x20/0x30 [ 562.630909] ? snd_pcm_stream_lock+0x31/0x50 [snd_pcm] [ 562.630914] ? _snd_pcm_stream_lock_irqsave+0x2e/0x40 [snd_pcm] [ 562.630918] ? snd_pcm_stop_xrun+0x16/0x70 [snd_pcm] [ 562.630922] ? in_stream_callback+0x3e6/0x450 [snd_firewire_lib] [ 562.630925] ? handle_ir_packet_per_buffer+0x8e/0x1a0 [firewire_ohci] [ 562.630928] ? ohci_flush_iso_completions+0xa3/0x130 [firewire_ohci] [ 562.630932] ? fw_iso_context_flush_completions+0x15/0x20 [firewire_core] [ 562.630935] ? amdtp_stream_pcm_pointer+0x2d/0x40 [snd_firewire_lib] [ 562.630938] ? pcm_capture_pointer+0x19/0x20 [snd_oxfw] [ 562.630943] ? snd_pcm_update_hw_ptr0+0x47/0x3d0 [snd_pcm] [ 562.630945] ? poll_select_copy_remaining+0x150/0x150 [ 562.630947] ? poll_select_copy_remaining+0x150/0x150 [ 562.630952] ? snd_pcm_update_hw_ptr+0x10/0x20 [snd_pcm] [ 562.630956] ? snd_pcm_hwsync+0x45/0xb0 [snd_pcm] [ 562.630960] ? snd_pcm_common_ioctl1+0x1ff/0xc90 [snd_pcm] [ 562.630962] ? futex_wake+0x90/0x170 [ 562.630966] ? snd_pcm_capture_ioctl1+0x136/0x260 [snd_pcm] [ 562.630970] ? snd_pcm_capture_ioctl+0x27/0x40 [snd_pcm] [ 562.630972] ? do_vfs_ioctl+0xa3/0x610 [ 562.630974] ? vfs_read+0x11b/0x130 [ 562.630976] ? SyS_ioctl+0x79/0x90 [ 562.630978] ? entry_SYSCALL_64_fastpath+0x1e/0xad This commit fixes the above bug. This assumes two cases: 1. Any error is detected in software IRQ context of OHCI 1394 context. In this case, PCM substream should be aborted in packet handler. On the other hand, it should not be done in any process context. TO distinguish these two context, use 'in_interrupt()' macro. 2. Any error is detect in process context of ALSA PCM application. In this case, PCM substream should not be aborted in packet handler because PCM substream lock is acquired. The task to abort PCM substream should be done in ALSA PCM core. For this purpose, SNDRV_PCM_POS_XRUN is returned at 'struct snd_pcm_ops.pointer()'. Suggested-by: Clemens Ladisch <clemens@ladisch.de> Fixes: e9148dddc3c7("ALSA: firewire-lib: flush completed packets when reading PCM position") Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29xen-blkback: don't leak stack data via response ringJan Beulich
commit 089bc0143f489bd3a4578bdff5f4ca68fb26f341 upstream. Rather than constructing a local structure instance on the stack, fill the fields directly on the shared ring, just like other backends do. Build on the fact that all response structure flavors are actually identical (the old code did make this assumption too). This is XSA-216. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29xen/blkback: fix disconnect while I/Os in flightJuergen Gross
commit 46464411307746e6297a034a9983a22c9dfc5a0c upstream. Today disconnecting xen-blkback is broken in case there are still I/Os in flight: xen_blkif_disconnect() will bail out early without releasing all resources in the hope it will be called again when the last request has terminated. This, however, won't happen as xen_blkif_free() won't be called on termination of the last running request: xen_blkif_put() won't decrement the blkif refcnt to 0 as xen_blkif_disconnect() didn't finish before thus some xen_blkif_put() calls in xen_blkif_disconnect() didn't happen. To solve this deadlock xen_blkif_disconnect() and xen_blkif_alloc_rings() shouldn't use xen_blkif_put() and xen_blkif_get() but use some other way to do their accounting of resources. This at once fixes another error in xen_blkif_disconnect(): when it returned early with -EBUSY for another ring than 0 it would call xen_blkif_put() again for already handled rings on a subsequent call. This will lead to inconsistencies in the refcnt handling. Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Steven Haigh <netwiz@crc.id.au> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29clk: sunxi-ng: sun5i: Fix ahb_bist_clk definitionBoris Brezillon
commit 370d9192719e6c174167888cf9240df2542e3b4b upstream. AHB BIST gate is actually controlled with bit 7. This bug was detected while trying to use the NAND controller which is using the DMA engine to transfer data to the NAND. Since the ahb_bist_clk gate bit conflicts with the ahb_dma_clk gate bit, the core was disabling the DMA engine clock as part of its 'disable unused clks' procedure, which was causing all DMA transfers to fail after this point. Fixes: 5e73761786d6 ("clk: sunxi-ng: Add sun5i CCU driver") Reported-by: Angus Ainslie <angus@akkea.ca> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Tested-by: Angus Ainslie <angus@akkea.ca> Reviewed-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Michael Turquette <mturquette@baylibre.com> Link: lkml.kernel.org/r/1495643669-28221-1-git-send-email-boris.brezillon@free-electrons.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29clk: sunxi-ng: v3s: Fix usb otg device reset bitYong Deng
commit 7ffc781ec46ef1e9aedb482f5f04425bd8bb2753 upstream. V3S's usb otg device reset bit should be 24, not 23. Signed-off-by: Yong Deng <iemdey@gmail.com> Reviewed-By: Icenowy Zheng <icenowy@aosc.io> Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29clk: sunxi-ng: a31: Correct lcd1-ch1 clock register offsetChen-Yu Tsai
commit 38b8f823864707eb1cf331d2247608c419ed388c upstream. The register offset for the lcd1-ch1 clock was incorrectly pointing to the lcd0-ch1 clock. This resulted in the lcd0-ch1 clock being disabled when the clk core disables unused clocks. This then stops the simplefb HDMI output path. Reported-by: Bob Ham <rah@settrans.net> Fixes: c6e6c96d8fa6 ("clk: sunxi-ng: Add A31/A31s clocks") Signed-off-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24Linux 4.11.7v4.11.7Greg Kroah-Hartman
2017-06-24mm: fix new crash in unmapped_area_topdown()Hugh Dickins
commit f4cb767d76cf7ee72f97dd76f6cfa6c76a5edc89 upstream. Trinity gets kernel BUG at mm/mmap.c:1963! in about 3 minutes of mmap testing. That's the VM_BUG_ON(gap_end < gap_start) at the end of unmapped_area_topdown(). Linus points out how MAP_FIXED (which does not have to respect our stack guard gap intentions) could result in gap_end below gap_start there. Fix that, and the similar case in its alternative, unmapped_area(). Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas") Reported-by: Dave Jones <davej@codemonkey.org.uk> Debugged-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24Allow stack to grow up to address space limitHelge Deller
commit bd726c90b6b8ce87602208701b208a208e6d5600 upstream. Fix expand_upwards() on architectures with an upward-growing stack (parisc, metag and partly IA-64) to allow the stack to reliably grow exactly up to the address space limit given by TASK_SIZE. Signed-off-by: Helge Deller <deller@gmx.de> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24mm: larger stack guard gap, between vmasHugh Dickins
commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream. Stack guard page is a useful feature to reduce a risk of stack smashing into a different mapping. We have been using a single page gap which is sufficient to prevent having stack adjacent to a different mapping. But this seems to be insufficient in the light of the stack usage in userspace. E.g. glibc uses as large as 64kB alloca() in many commonly used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX] which is 256kB or stack strings with MAX_ARG_STRLEN. This will become especially dangerous for suid binaries and the default no limit for the stack size limit because those applications can be tricked to consume a large portion of the stack and a single glibc call could jump over the guard page. These attacks are not theoretical, unfortunatelly. Make those attacks less probable by increasing the stack guard gap to 1MB (on systems with 4k pages; but make it depend on the page size because systems with larger base pages might cap stack allocations in the PAGE_SIZE units) which should cover larger alloca() and VLA stack allocations. It is obviously not a full fix because the problem is somehow inherent, but it should reduce attack space a lot. One could argue that the gap size should be configurable from userspace, but that can be done later when somebody finds that the new 1MB is wrong for some special case applications. For now, add a kernel command line option (stack_guard_gap) to specify the stack gap size (in page units). Implementation wise, first delete all the old code for stack guard page: because although we could get away with accounting one extra page in a stack vma, accounting a larger gap can break userspace - case in point, a program run with "ulimit -S -v 20000" failed when the 1MB gap was counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK and strict non-overcommit mode. Instead of keeping gap inside the stack vma, maintain the stack guard gap as a gap between vmas: using vm_start_gap() in place of vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few places which need to respect the gap - mainly arch_get_unmapped_area(), and and the vma tree's subtree_gap support for that. Original-patch-by: Oleg Nesterov <oleg@redhat.com> Original-patch-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Tested-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [wt: backport to 4.11: adjust context] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24ARM: dts: am335x-sl50: Fix cannot claim requested pins for spi0Enric Balletbo i Serra
commit db145db99f5bf30acc12d7450b9ad0707072a7be upstream. We don't need to bitbang these pins anymore, instead we muxed these pins as SPI, after this change, done in commit 6c69f726, we introduced the following error: pinctrl-single 44e10800.pinmux: pin PIN85 already requested \ by 44e10800.pinmux; cannot claim for 48030000.spi pinctrl-single 44e10800.pinmux: pin-85 (48030000.spi) status -22 Fixes: 6c69f726 ("ARM: dts: am335x-sl50: Enable SPI0 interface and Flash Memory") Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24ARM: dts: am335x-sl50: Fix card detect pin for mmc1Enric Balletbo i Serra
commit 56b74ed9c1e8050408b9beeee25888a49a458997 upstream. The second version of the hardware moved the card detect pin from gpio0_6 to gpio1_9, as we won't support the first hardware version fix the pinmux configuration of this pin. Fixes: 8584d4fc ("ARM: dts: am335x-sl50: Add Toby-Churchill SL50 board support.") Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24crypto: Work around deallocated stack frame reference gcc bug on sparc.David Miller
commit d41519a69b35b10af7fda867fb9100df24fdf403 upstream. On sparc, if we have an alloca() like situation, as is the case with SHASH_DESC_ON_STACK(), we can end up referencing deallocated stack memory. The result can be that the value is clobbered if a trap or interrupt arrives at just the right instruction. It only occurs if the function ends returning a value from that alloca() area and that value can be placed into the return value register using a single instruction. For example, in lib/libcrc32c.c:crc32c() we end up with a return sequence like: return %i7+8 lduw [%o5+16], %o0 ! MEM[(u32 *)__shash_desc.1_10 + 16B], %o5 holds the base of the on-stack area allocated for the shash descriptor. But the return released the stack frame and the register window. So if an intererupt arrives between 'return' and 'lduw', then the value read at %o5+16 can be corrupted. Add a data compiler barrier to work around this problem. This is exactly what the gcc fix will end up doing as well, and it absolutely should not change the code generated for other cpus (unless gcc on them has the same bug :-) With crucial insight from Eric Sandeen. Reported-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24MIPS: .its targets depend on vmlinuxPaul Burton
commit bcd7c45e0d5a82be9a64b90050f0e09d41a50758 upstream. The .its targets require information about the kernel binary, such as its entry point, which is extracted from the vmlinux ELF. We therefore require that the ELF is built before the .its files are generated. Declare this requirement in the Makefile such that make will ensure this is always the case, otherwise in corner cases we can hit issues as the .its is generated with an incorrect (either invalid or stale) entry point. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: cf2a5e0bb4c6 ("MIPS: Support generating Flattened Image Trees (.itb)") Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16179/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24MIPS: Fix bnezc/jialc return address calculationPaul Burton
commit 1a73d9310e093fc3adffba4d0a67b9fab2ee3f63 upstream. The code handling the pop76 opcode (ie. bnezc & jialc instructions) in __compute_return_epc_for_insn() needs to set the value of $31 in the jialc case, which is encoded with rs = 0. However its check to differentiate bnezc (rs != 0) from jialc (rs = 0) was unfortunately backwards, meaning that if we emulate a bnezc instruction we clobber $31 & if we emulate a jialc instruction it actually behaves like a jic instruction. Fix this by inverting the check of rs to match the way the instructions are actually encoded. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: 28d6f93d201d ("MIPS: Emulate the new MIPS R6 BNEZC and JIALC instructions") Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16178/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24virtio_balloon: disable VIOMMU supportMichael S. Tsirkin
commit e41b1355508debe45fda33ef8c03ff3ba5d206b9 upstream. virtio balloon bypasses the DMA API entirely so does not support the VIOMMU right now. It's not clear we need that support, for now let's just make sure we don't pretend to support it. Cc: Wei Wang <wei.w.wang@intel.com> Fixes: 1a937693993f ("virtio: new feature to detect IOMMU device quirk") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24alarmtimer: Rate limit periodic intervalsThomas Gleixner
commit ff86bf0c65f14346bf2440534f9ba5ac232c39a0 upstream. The alarmtimer code has another source of potentially rearming itself too fast. Interval timers with a very samll interval have a similar CPU hog effect as the previously fixed overflow issue. The reason is that alarmtimers do not implement the normal protection against this kind of problem which the other posix timer use: timer expires -> queue signal -> deliver signal -> rearm timer This scheme brings the rearming under scheduler control and prevents permanently firing timers which hog the CPU. Bringing this scheme to the alarm timer code is a major overhaul because it lacks all the necessary mechanisms completely. So for a quick fix limit the interval to one jiffie. This is not problematic in practice as alarmtimers are usually backed by an RTC for suspend which have 1 second resolution. It could be therefor argued that the resolution of this clock should be set to 1 second in general, but that's outside the scope of this fix. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kostya Serebryany <kcc@google.com> Cc: syzkaller <syzkaller@googlegroups.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/20170530211655.896767100@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24alarmtimer: Prevent overflow of relative timersThomas Gleixner
commit f4781e76f90df7aec400635d73ea4c35ee1d4765 upstream. Andrey reported a alartimer related RCU stall while fuzzing the kernel with syzkaller. The reason for this is an overflow in ktime_add() which brings the resulting time into negative space and causes immediate expiry of the timer. The following rearm with a small interval does not bring the timer back into positive space due to the same issue. This results in a permanent firing alarmtimer which hogs the CPU. Use ktime_add_safe() instead which detects the overflow and clamps the result to KTIME_SEC_MAX. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kostya Serebryany <kcc@google.com> Cc: syzkaller <syzkaller@googlegroups.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/20170530211655.802921648@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24genirq: Release resources in __setup_irq() error pathHeiner Kallweit
commit fa07ab72cbb0d843429e61bf179308aed6cbe0dd upstream. In case __irq_set_trigger() fails the resources requested via irq_request_resources() are not released. Add the missing release call into the error handling path. Fixes: c1bacbae8192 ("genirq: Provide irq_request/release_resources chip callbacks") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/655538f5-cb20-a892-ff15-fbd2dd1fa4ec@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off()Andy Lutomirski
commit 252d2a4117bc181b287eeddf848863788da733ae upstream. idle_task_exit() can be called with IRQs on x86 on and therefore should use switch_mm(), not switch_mm_irqs_off(). This doesn't seem to cause any problems right now, but it will confuse my upcoming TLB flush changes. Nonetheless, I think it should be backported because it's trivial. There won't be any meaningful performance impact because idle_task_exit() is only used when offlining a CPU. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: f98db6013c55 ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler") Link: http://lkml.kernel.org/r/ca3d1a9fa93a0b49f5a8ff729eda3640fb6abdf9.1497034141.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24iio: adc: meson-saradc: fix potential crash in meson_sar_adc_clear_fifoMartin Blumenstingl
commit 103a07d4278203d6299798cd74cdc4d209801cac upstream. meson_sar_adc_clear_fifo passes a 0 as value-pointer to regmap_read(). In case of the meson-saradc driver this ends up in regmap_mmio_read(), where the value-pointer is de-referenced unconditionally to assign the value which was read. Fix this by passing an actual pointer, even though all we want to do is to discard the value. As a side-effect this fixes a sparse warning ("Using plain integer as NULL pointer") as reported by Paolo Cretaro. Fixes: 3adbf3427330 ("iio: adc: add a driver for the SAR ADC found in Amlogic Meson SoCs") Reported-by: Paolo Cretaro <paolocretaro@gmail.com> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24staging: iio: ad7152: Fix deadlock in ad7152_write_raw_samp_freq()Alexey Khoroshilov
commit 95264c8c6a9040e84edda883dbbe9d193c41f46c upstream. ad7152_write_raw_samp_freq() is called by ad7152_write_raw() with chip->state_lock held. So, there is unavoidable deadlock when ad7152_write_raw_samp_freq() locks the mutex itself. The patch removes unneeded locking. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Fixes: 6572389bcc11 ("staging: iio: cdc: ad7152: Implement IIO_CHAN_INFO_SAMP_FREQ attribute") Acked-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24iio: imu: inv_mpu6050: add accel lpf setting for chip >= MPU6500Jean-Baptiste Maneyrol
commit 948588e25b8af5e66962ed3f53e1cae1656fa5af upstream. Starting from MPU6500, accelerometer dlpf is set in a separate register named ACCEL_CONFIG_2. Add this new register in the map and set it for the corresponding chips. Signed-off-by: Jean-Baptiste Maneyrol <jmaneyrol@invensense.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>