summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-12-14mm, oom_reaper: fix memory corruptionMichal Hocko
David Rientjes has reported the following memory corruption while the oom reaper tries to unmap the victims address space BUG: Bad page map in process oom_reaper pte:6353826300000000 pmd:00000000 addr:00007f50cab1d000 vm_flags:08100073 anon_vma:ffff9eea335603f0 mapping: (null) index:7f50cab1d file: (null) fault: (null) mmap: (null) readpage: (null) CPU: 2 PID: 1001 Comm: oom_reaper Call Trace: unmap_page_range+0x1068/0x1130 __oom_reap_task_mm+0xd5/0x16b oom_reaper+0xff/0x14c kthread+0xc1/0xe0 Tetsuo Handa has noticed that the synchronization inside exit_mmap is insufficient. We only synchronize with the oom reaper if tsk_is_oom_victim which is not true if the final __mmput is called from a different context than the oom victim exit path. This can trivially happen from context of any task which has grabbed mm reference (e.g. to read /proc/<pid>/ file which requires mm etc.). The race would look like this oom_reaper oom_victim task mmget_not_zero do_exit mmput __oom_reap_task_mm mmput __mmput exit_mmap remove_vma unmap_page_range Fix this issue by providing a new mm_is_oom_victim() helper which operates on the mm struct rather than a task. Any context which operates on a remote mm struct should use this helper in place of tsk_is_oom_victim. The flag is set in mark_oom_victim and never cleared so it is stable in the exit_mmap path. Debugged by Tetsuo Handa. Link: http://lkml.kernel.org/r/20171210095130.17110-1-mhocko@kernel.org Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: David Rientjes <rientjes@google.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Andrea Argangeli <andrea@kernel.org> Cc: <stable@vger.kernel.org> [4.14] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14kernel: make groups_sort calling a responsibility group_info allocatorsThiago Rafael Becker
In testing, we found that nfsd threads may call set_groups in parallel for the same entry cached in auth.unix.gid, racing in the call of groups_sort, corrupting the groups for that entry and leading to permission denials for the client. This patch: - Make groups_sort globally visible. - Move the call to groups_sort to the modifiers of group_info - Remove the call to groups_sort from set_groups Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: NeilBrown <neilb@suse.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()'Christophe JAILLET
A semaphore is acquired before this check, so we must release it before leaving. Link: http://lkml.kernel.org/r/20171211211009.4971-1-christophe.jaillet@wanadoo.fr Fixes: b7f0554a56f2 ("mm: fail get_vaddr_frames() for filesystem-dax mappings") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David Sterba <dsterba@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14tools/slabinfo-gnuplot: force to use bash shellLiu, Changcheng
On some linux distributions, the default link of sh is dash which deoesn't support split array like "${var//,/ }" It's better to force to use bash shell directly. Link: http://lkml.kernel.org/r/20171208093751.GA175471@sofia Signed-off-by: Liu Changcheng <changcheng.liu@intel.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14kcov: fix comparison callback signatureDmitry Vyukov
Fix a silly copy-paste bug. We truncated u32 args to u16. Link: http://lkml.kernel.org/r/20171207101134.107168-1-dvyukov@google.com Fixes: ded97d2c2b2c ("kcov: support comparison operands collection") Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: syzkaller@googlegroups.com Cc: Alexander Potapenko <glider@google.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14mm/slab.c: do not hash pointers when debugging slabGeert Uytterhoeven
If CONFIG_DEBUG_SLAB/CONFIG_DEBUG_SLAB_LEAK are enabled, the slab code prints extra debug information when e.g. corruption is detected. This includes pointers, which are not very useful when hashed. Fix this by using %px to print unhashed pointers instead where it makes sense, and by removing the printing of a last user pointer referring to code. [geert+renesas@glider.be: v2] Link: http://lkml.kernel.org/r/1513179267-2509-1-git-send-email-geert+renesas@glider.be Link: http://lkml.kernel.org/r/1512641861-5113-1-git-send-email-geert+renesas@glider.be Fixes: ad67b74d2469d9b8 ("printk: hash addresses printed with %p") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Tobin C . Harding" <me@tobin.cc> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14mm/page_alloc.c: avoid excessive IRQ disabled times in free_unref_page_list()Lucas Stach
Since commit 9cca35d42eb6 ("mm, page_alloc: enable/disable IRQs once when freeing a list of pages") we see excessive IRQ disabled times of up to 25ms on an embedded ARM system (tracing overhead included). This is due to graphics buffers being freed back to the system via release_pages(). Graphics buffers can be huge, so it's not hard to hit cases where the list of pages to free has 2048 entries. Disabling IRQs while freeing all those pages is clearly not a good idea. Introduce a batch limit, which allows IRQ servicing once every few pages. The batch count is the same as used in other parts of the MM subsystem when dealing with IRQ disabled regions. Link: http://lkml.kernel.org/r/20171207170314.4419-1-l.stach@pengutronix.de Fixes: 9cca35d42eb6 ("mm, page_alloc: enable/disable IRQs once when freeing a list of pages") Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14mm/memory.c: mark wp_huge_pmd() inline to prevent build failureGeert Uytterhoeven
With gcc 4.1.2: mm/memory.o: In function `wp_huge_pmd': memory.c:(.text+0x9b4): undefined reference to `do_huge_pmd_wp_page' Interestingly, wp_huge_pmd() is emitted in the assembler output, but never called. Apparently replacing the call to pmd_write() in __handle_mm_fault() by a call to the more complex pmd_access_permitted() reduced the ability of the compiler to remove unused code. Fix this by marking wp_huge_pmd() inline, like was done in commit 91a90140f998 ("mm/memory.c: mark create_huge_pmd() inline to prevent build failure") for a similar problem. [akpm@linux-foundation.org: add comment] Link: http://lkml.kernel.org/r/1512335500-10889-1-git-send-email-geert@linux-m68k.org Fixes: c7da82b894e9eef6 ("mm: replace pmd_write with pmd_access_permitted in fault + gup paths") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14scripts/faddr2line: fix CROSS_COMPILE unset errorLiu, Changcheng
faddr2line hit var unbound error when CROSS_COMPILE isn't set since nounset option is set in bash script. Link: http://lkml.kernel.org/r/20171206013022.GA83929@sofia Fixes: 95a879825419 ("scripts/faddr2line: extend usage on generic arch") Signed-off-by: Liu Changcheng <changcheng.liu@intel.com> Reported-by: Richard Weinberger <richard.weinberger@gmail.com> Reviewed-by: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: NeilBrown <neilb@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14Documentation/vm/zswap.txt: update with same-value filled page featureSrividya Desireddy
Update zswap document with details on same-value filled pages identification feature. The usage of zswap.same_filled_pages_enabled module parameter is explained. Link: http://lkml.kernel.org/r/20171206114852epcms5p6973b02a9f455d5d3c765eafda0fe2631@epcms5p6 Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com> Acked-by: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14exec: avoid gcc-8 warning for get_task_commArnd Bergmann
gcc-8 warns about using strncpy() with the source size as the limit: fs/exec.c:1223:32: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess] This is indeed slightly suspicious, as it protects us from source arguments without NUL-termination, but does not guarantee that the destination is terminated. This keeps the strncpy() to ensure we have properly padded target buffer, but ensures that we use the correct length, by passing the actual length of the destination buffer as well as adding a build-time check to ensure it is exactly TASK_COMM_LEN. There are only 23 callsites which I all reviewed to ensure this is currently the case. We could get away with doing only the check or passing the right length, but it doesn't hurt to do both. Link: http://lkml.kernel.org/r/20171205151724.1764896-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Kees Cook <keescook@chromium.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Aleksa Sarai <asarai@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14autofs: fix careless error in recent commitNeilBrown
Commit ecc0c469f277 ("autofs: don't fail mount for transient error") was meant to replace an 'if' with a 'switch', but instead added the 'switch' leaving the case in place. Link: http://lkml.kernel.org/r/87zi6wstmw.fsf@notabene.neil.brown.name Fixes: ecc0c469f277 ("autofs: don't fail mount for transient error") Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: NeilBrown <neilb@suse.com> Cc: Ian Kent <raven@themaw.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14string.h: workaround for increased stack usageArnd Bergmann
The hardened strlen() function causes rather large stack usage in at least one file in the kernel, in particular when CONFIG_KASAN is enabled: drivers/media/usb/em28xx/em28xx-dvb.c: In function 'em28xx_dvb_init': drivers/media/usb/em28xx/em28xx-dvb.c:2062:1: error: the frame size of 3256 bytes is larger than 204 bytes [-Werror=frame-larger-than=] Analyzing this problem led to the discovery that gcc fails to merge the stack slots for the i2c_board_info[] structures after we strlcpy() into them, due to the 'noreturn' attribute on the source string length check. I reported this as a gcc bug, but it is unlikely to get fixed for gcc-8, since it is relatively easy to work around, and it gets triggered rarely. An earlier workaround I did added an empty inline assembly statement before the call to fortify_panic(), which works surprisingly well, but is really ugly and unintuitive. This is a new approach to the same problem, this time addressing it by not calling the 'extern __real_strnlen()' function for string constants where __builtin_strlen() is a compile-time constant and therefore known to be safe. We do this by checking if the last character in the string is a compile-time constant '\0'. If it is, we can assume that strlen() of the string is also constant. As a side-effect, this should also improve the object code output for any other call of strlen() on a string constant. [akpm@linux-foundation.org: add comment] Link: http://lkml.kernel.org/r/20171205215143.3085755-1-arnd@arndb.de Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365 Link: https://patchwork.kernel.org/patch/9980413/ Link: https://patchwork.kernel.org/patch/9974047/ Fixes: 6974f0c4555 ("include/linux/string.h: add the option of fortified string.h functions") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Martin Wilck <mwilck@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14mm/kmemleak.c: make cond_resched() rate-limiting more efficientAndrew Morton
Commit bde5f6bc68db ("kmemleak: add scheduling point to kmemleak_scan()") tries to rate-limit the frequency of cond_resched() calls, but does it in a way which might incur an expensive division operation in the inner loop. Simplify this. Fixes: bde5f6bc68db5 ("kmemleak: add scheduling point to kmemleak_scan()") Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Yisheng Xie <xieyisheng1@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14lib/rbtree,drm/mm: add rbtree_replace_node_cached()Chris Wilson
Add a variant of rbtree_replace_node() that maintains the leftmost cache of struct rbtree_root_cached when replacing nodes within the rbtree. As drm_mm is the only rb_replace_node() being used on an interval tree, the mistake looks fairly self-contained. Furthermore the only user of drm_mm_replace_node() is its testsuite... Testcase: igt/drm_mm/replace Link: http://lkml.kernel.org/r/20171122100729.3742-1-chris@chris-wilson.co.uk Link: https://patchwork.freedesktop.org/patch/msgid/20171109212435.9265-1-chris@chris-wilson.co.uk Fixes: f808c13fd373 ("lib/interval_tree: fast overlap detection") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Davidlohr Bueso <dbueso@suse.de> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14include/linux/idr.h: add #include <linux/bug.h>Wei Wang
The <linux/bug.h> was removed from radix-tree.h by commit f5bba9d11a25 ("include/linux/radix-tree.h: remove unneeded #include <linux/bug.h>"). Since that commit, tools/testing/radix-tree/ couldn't pass compilation due to tools/testing/radix-tree/idr.c:17: undefined reference to WARN_ON_ONCE. This patch adds the bug.h header to idr.h to solve the issue. Link: http://lkml.kernel.org/r/1511963726-34070-2-git-send-email-wei.w.wang@intel.com Fixes: f5bba9d11a2 ("include/linux/radix-tree.h: remove unneeded #include <linux/bug.h>") Signed-off-by: Wei Wang <wei.w.wang@intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Jan Kara <jack@suse.cz> Cc: Eric Biggers <ebiggers@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14Merge tag '4.15-rc-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Small SMB3 fixes for stable and 4.15rc" * tag '4.15-rc-smb3' of git://git.samba.org/sfrench/cifs-2.6: CIFS: don't log STATUS_NOT_FOUND errors for DFS cifs: fix NULL deref in SMB2_read
2017-12-14Merge tag 'drm-misc-fixes-2017-12-14' of ↵Linus Torvalds
git://anongit.freedesktop.org/drm/drm-misc Pull drm fixes from Daniel Vetter: - two fixes for new core features - a corner case fix for the connnector_iter fix from last week (this one is cc: stable) - one vc4 fix * tag 'drm-misc-fixes-2017-12-14' of git://anongit.freedesktop.org/drm/drm-misc: drm/drm_lease: Prevent deadlock in case drm_lease_create() fails drm: rework delayed connector cleanup in connector_iter drm: Update edid-derived drm_display_info fields at edid property set [v2] drm/vc4: Release fence after signalling
2017-12-14virtio_mmio: fix devm cleanupMark Rutland
Recent rework of the virtio_mmio probe/remove paths balanced a devm_ioremap() with an iounmap() rather than its devm variant. This ends up corrupting the devm datastructures, and results in the following boot-time splat on arm64 under QEMU 2.9.0: [ 3.450397] ------------[ cut here ]------------ [ 3.453822] Trying to vfree() nonexistent vm area (00000000c05b4844) [ 3.460534] WARNING: CPU: 1 PID: 1 at mm/vmalloc.c:1525 __vunmap+0x1b8/0x220 [ 3.475898] Kernel panic - not syncing: panic_on_warn set ... [ 3.475898] [ 3.493933] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc3 #1 [ 3.513109] Hardware name: linux,dummy-virt (DT) [ 3.525382] Call trace: [ 3.531683] dump_backtrace+0x0/0x368 [ 3.543921] show_stack+0x20/0x30 [ 3.547767] dump_stack+0x108/0x164 [ 3.559584] panic+0x25c/0x51c [ 3.569184] __warn+0x29c/0x31c [ 3.576023] report_bug+0x1d4/0x290 [ 3.586069] bug_handler.part.2+0x40/0x100 [ 3.597820] bug_handler+0x4c/0x88 [ 3.608400] brk_handler+0x11c/0x218 [ 3.613430] do_debug_exception+0xe8/0x318 [ 3.627370] el1_dbg+0x18/0x78 [ 3.634037] __vunmap+0x1b8/0x220 [ 3.648747] vunmap+0x6c/0xc0 [ 3.653864] __iounmap+0x44/0x58 [ 3.659771] devm_ioremap_release+0x34/0x68 [ 3.672983] release_nodes+0x404/0x880 [ 3.683543] devres_release_all+0x6c/0xe8 [ 3.695692] driver_probe_device+0x250/0x828 [ 3.706187] __driver_attach+0x190/0x210 [ 3.717645] bus_for_each_dev+0x14c/0x1f0 [ 3.728633] driver_attach+0x48/0x78 [ 3.740249] bus_add_driver+0x26c/0x5b8 [ 3.752248] driver_register+0x16c/0x398 [ 3.757211] __platform_driver_register+0xd8/0x128 [ 3.770860] virtio_mmio_init+0x1c/0x24 [ 3.782671] do_one_initcall+0xe0/0x398 [ 3.791890] kernel_init_freeable+0x594/0x660 [ 3.798514] kernel_init+0x18/0x190 [ 3.810220] ret_from_fork+0x10/0x18 To fix this, we can simply rip out the explicit cleanup that the devm infrastructure will do for us when our probe function returns an error code, or when our remove function returns. We only need to ensure that we call put_device() if a call to register_virtio_device() fails in the probe path. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Fixes: 7eb781b1bbb7136f ("virtio_mmio: add cleanup for virtio_mmio_probe") Fixes: 25f32223bce5c580 ("virtio_mmio: add cleanup for virtio_mmio_remove") Cc: Cornelia Huck <cohuck@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: weiping zhang <zhangweiping@didichuxing.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
2017-12-14arm64/sve: Report SVE to userspace via CPUID only if supportedDave Martin
Currently, the SVE field in ID_AA64PFR0_EL1 is visible unconditionally to userspace via the CPU ID register emulation, irrespective of the kernel config. This means that if a kernel configured with CONFIG_ARM64_SVE=n is run on SVE-capable hardware, userspace will see SVE reported as present in the ID regs even though the kernel forbids execution of SVE instructions. This patch makes the exposure of the SVE field in ID_AA64PFR0_EL1 conditional on CONFIG_ARM64_SVE=y. Since future architecture features are likely to encounter a similar requirement, this patch adds a suitable helper macros for use when declaring config-conditional ID register fields. Fixes: 43994d824e84 ("arm64/sve: Detect SVE and activate runtime support") Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-14Merge branch 'bpf-bpftool-cgroup-ops'Daniel Borkmann
Roman Gushchin says: ==================== This patchset adds basic cgroup bpf operations to bpftool. Right now there is no convenient way to perform these operations. The /samples/bpf/load_sock_ops.c implements attach/detacg operations, but only for BPF_CGROUP_SOCK_OPS programs. Bps (part of bcc) implements bpf introspection, but lacks any cgroup-related specific. I find having a tool to perform these basic operations in the kernel tree very useful, as it can be used in the corresponding bpf documentation without creating additional dependencies. And bpftool seems to be a right tool to extend with such functionality. v4: - ATTACH_FLAGS and ATTACH_TYPE are listed and described in docs and usage - ATTACH_FLAG names converted to "multi" and "override" - do_attach() recognizes ATTACH_FLAG abbreviations, e.g "mul" - Local variables sorted ("reverse Christmas tree") - unknown attach flags value will be never truncated v3: - SRC replaced with OBJ in prog load docs - Output unknown attach type in hex - License header in SPDX format - Minor style fixes (e.g. variable reordering) v2: - Added prog load operations - All cgroup operations are looking like bpftool cgroup <command> - All cgroup-related stuff is moved to a separate file - Added support for attach flags - Added support for attaching/detaching programs by id, pinned name, etc - Changed cgroup detach arguments order - Added empty json output for succesful programs - Style fixed: includes order, strncmp and macroses, error handling - Added man pages v1: https://lwn.net/Articles/740366/ ==================== Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-14bpftool: implement cgroup bpf operationsRoman Gushchin
This patch adds basic cgroup bpf operations to bpftool: cgroup list, attach and detach commands. Usage is described in the corresponding man pages, and examples are provided. Syntax: $ bpftool cgroup list CGROUP $ bpftool cgroup attach CGROUP ATTACH_TYPE PROG [ATTACH_FLAGS] $ bpftool cgroup detach CGROUP ATTACH_TYPE PROG Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: David Ahern <dsahern@gmail.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-14bpftool: implement prog load commandRoman Gushchin
Add the prog load command to load a bpf program from a specified binary file and pin it to bpffs. Usage description and examples are given in the corresponding man page. Syntax: $ bpftool prog load OBJ FILE FILE is a non-existing file on bpffs. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Quentin Monnet <quentin.monnet@netronome.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-14libbpf: prefer global symbols as bpf program name sourceRoman Gushchin
Libbpf picks the name of the first symbol in the corresponding elf section to use as a program name. But without taking symbol's scope into account it may end's up with some local label as a program name. E.g.: $ bpftool prog 1: type 15 name LBB0_10 tag 0390a5136ba23f5c loaded_at Dec 07/17:22 uid 0 xlated 456B not jited memlock 4096B Fix this by preferring global symbols as program name. For instance: $ bpftool prog 1: type 15 name bpf_prog1 tag 0390a5136ba23f5c loaded_at Dec 07/17:26 uid 0 xlated 456B not jited memlock 4096B Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Quentin Monnet <quentin.monnet@netronome.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-14libbpf: add ability to guess program type based on section nameRoman Gushchin
The bpf_prog_load() function will guess program type if it's not specified explicitly. This functionality will be used to implement loading of different programs without asking a user to specify the program type. In first order it will be used by bpftool. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Quentin Monnet <quentin.monnet@netronome.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-14arm64: fix CONFIG_DEBUG_WX address reportingMark Rutland
In ptdump_check_wx(), we pass walk_pgd() a start address of 0 (rather than VA_START) for the init_mm. This means that any reported W&X addresses are offset by VA_START, which is clearly wrong and can make them appear like userspace addresses. Fix this by telling the ptdump code that we're walking init_mm starting at VA_START. We don't need to update the addr_markers, since these are still valid bounds regardless. Cc: <stable@vger.kernel.org> Fixes: 1404d6f13e47 ("arm64: dump: Add checking for writable and exectuable pages") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Laura Abbott <labbott@redhat.com> Reported-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-14ovl: fix overlay: warning prefixAmir Goldstein
Conform two stray warning messages to the standard overlayfs: prefix. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-12-14drm/drm_lease: Prevent deadlock in case drm_lease_create() failsMarius Vlad
This case can been seen when creating the lease with the same objects passed. [ 605.515097] 2 locks held by testapp/3337: [ 605.519027] #0: (&dev->mode_config.idr_mutex){......}, at: [<ffff0000085f1664>] drm_mode_create_lease_ioctl+0x384/0x858 [ 605.530045] #1: (&dev->mode_config.idr_mutex){......}, at: [<ffff0000085f11bc>] drm_lease_destroy+0x2c/0x110 Which was causing the process to hang: [ 605.398827] [<ffff0000080856cc>] __switch_to+0x94/0xa8 [ 605.404030] [<ffff000008c05d00>] __schedule+0x1b0/0x698 [ 605.409322] [<ffff000008c06224>] schedule+0x3c/0xa8 [ 605.414260] [<ffff000008c06628>] schedule_preempt_disabled+0x20/0x38 [ 605.420677] [<ffff000008c07370>] mutex_lock_nested+0x158/0x340 [ 605.426572] [<ffff0000085f11bc>] drm_lease_destroy+0x2c/0x110 [ 605.432389] [<ffff0000085cecf0>] drm_master_put+0xc0/0xc8 [ 605.437845] [<ffff0000085f175c>] drm_mode_create_lease_ioctl+0x47c/0x858 [ 605.444612] [<ffff0000085d4460>] drm_ioctl+0x198/0x448 [ 605.449811] [<ffff000008201134>] do_vfs_ioctl+0xa4/0x748 [ 605.455192] [<ffff000008201864>] SyS_ioctl+0x8c/0xa0 [ 605.460216] [<ffff000008082f4c>] __sys_trace_return+0x0/0x4 drm_mode_create_lease_ioctl() calls drm_lease_create() which acquires a lock on dev->mode_config.idr_mutex. In case of failure, drm_lease_create() calls drm_master_put() which in turn tries to acquire the same lock when calling drm_lease_destroy(). v2: - Reverse the order at exit in case of fail, so that unlocking takes place before dropping the reference. - Include detail information about deadlock (Daniel Vetter) Signed-off-by: Marius Vlad <marius-cristian.vlad@nxp.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20171213181048.32719-1-marius-cristian.vlad@nxp.com
2017-12-13Merge tag 'xfs-4.15-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: "Here are a few more bug fixes & cleanups for 4.15-rc4: - clean up duplicate includes - remove ancient 'no-alloc' crap code that occasionally caused hard fs shutdowns due to lack of proper space reservations - fix regression in FIEMAP behavior when reporting xattr extents" * tag 'xfs-4.15-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: make iomap_begin functions trim iomaps consistently xfs: remove "no-allocation" reservations for file creations fs: xfs: remove duplicate includes
2017-12-13Merge tag 'riscv-for-linus-4.15-rc4-riscv_fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux Pull RISC-V fixes from Palmer Dabbelt: "This contains three small fixes: - A fix to a typo in sys_riscv_flush_icache. This only effects error handling, but I think it's a small and obvious enough change that it's sane outside the merge window. - The addition of smp_mb__after_spinlock(), which was recently removed due to an incorrect comment. This is largly a comment change (as there's a big one now), and while it's necessary for complience with the RISC-V memory model the lack of this fence shouldn't manifest as a bug on current implementations. Nonetheless, it still seems saner to have the fence in 4.15. - The removal of some of the HVC_RISCV_SBI driver that snuck into the arch port. This is compile-time dead code in 4.15 (as the driver isn't in yet), and during the review process we found a better way to implement early printk on RISC-V. While this change doesn't do anything, it will make staging our HVC driver easier: without this change the HVC driver we hope to upstream won't build on 4.15 (because the 4.15 arch code would reference a function that no longer exists). I don't think this is the last patch set we'll want for 4.15: I think I'll want to remove some of the first-level irqchip driver that snuck in as well, which will look a lot like the HVC patch here. This is pending some asm-generic cleanup I'm doing that I haven't quite gotten clean enough to send out yet, though, but hopefully it'll be ready by next week (and still OK for that late)" * tag 'riscv-for-linus-4.15-rc4-riscv_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux: RISC-V: Remove unused CONFIG_HVC_RISCV_SBI code RISC-V: Resurrect smp_mb__after_spinlock() RISC-V: Logical vs Bitwise typo
2017-12-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2017-12-13 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Addition of explicit scheduling points to map alloc/free in order to avoid having to hold the CPU for too long, from Eric. 2) Fixing of a corruption in overlapping perf_event_output calls from different BPF prog types on the same CPU out of different contexts, from Daniel. 3) Fallout fixes for recent correction of broken uapi for BPF_PROG_TYPE_PERF_EVENT. um had a missing asm header that needed to be pulled in from asm-generic and for BPF selftests the asm-generic include did not work, so similar asm include scheme was adapted for that problematic header that perf is having with other header files under tools, from Daniel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13drm: rework delayed connector cleanup in connector_iterDaniel Vetter
PROBE_DEFER also uses system_wq to reprobe drivers, which means when that again fails, and we try to flush the overall system_wq (to get all the delayed connectore cleanup work_struct completed), we deadlock. Fix this by using just a single cleanup work, so that we can only flush that one and don't block on anything else. That means a free list plus locking, a standard pattern. v2: - Correctly free connectors only on last ref. Oops (Chris). - use llist_head/node (Chris). v3 - Add init_llist_head (Chris). Fixes: a703c55004e1 ("drm: safely free connectors from connector_iter") Fixes: 613051dac40d ("drm: locking&new iterators for connector_list") Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Dave Airlie <airlied@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Sean Paul <seanpaul@chromium.org> Cc: <stable@vger.kernel.org> # v4.11+: 613051dac40d ("drm: locking&new iterators for connector_list" Cc: <stable@vger.kernel.org> # v4.11+ Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: David Airlie <airlied@linux.ie> Cc: Javier Martinez Canillas <javier@dowhile0.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Guillaume Tucker <guillaume.tucker@collabora.com> Cc: Mark Brown <broonie@kernel.org> Cc: Kevin Hilman <khilman@baylibre.com> Cc: Matt Hart <matthew.hart@linaro.org> Cc: Thierry Escande <thierry.escande@collabora.co.uk> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Enric Balletbo i Serra <enric.balletbo@collabora.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171213124936.17914-1-daniel.vetter@ffwll.ch
2017-12-13bpf/tracing: fix kernel/events/core.c compilation errorYonghong Song
Commit f371b304f12e ("bpf/tracing: allow user space to query prog array on the same tp") introduced a perf ioctl command to query prog array attached to the same perf tracepoint. The commit introduced a compilation error under certain config conditions, e.g., (1). CONFIG_BPF_SYSCALL is not defined, or (2). CONFIG_TRACING is defined but neither CONFIG_UPROBE_EVENTS nor CONFIG_KPROBE_EVENTS is defined. Error message: kernel/events/core.o: In function `perf_ioctl': core.c:(.text+0x98c4): undefined reference to `bpf_event_query_prog_array' This patch fixed this error by guarding the real definition under CONFIG_BPF_EVENTS and provided static inline dummy function if CONFIG_BPF_EVENTS was not defined. It renamed the function from bpf_event_query_prog_array to perf_event_query_prog_array and moved the definition from linux/bpf.h to linux/trace_events.h so the definition is in proximity to other prog_array related functions. Fixes: f371b304f12e ("bpf/tracing: allow user space to query prog array on the same tp") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-13Merge branch 'mlx4-misc-fixes'David S. Miller
Tariq Toukan says: ==================== mlx4 misc fixes This patchset contains misc bug fixes from the team to the mlx4 Core and Eth drivers. Patch 1 by Eugenia fixes an MTU issue in selftest. Patch 2 by Eran fixes an accounting issue in the resource tracker. Patch 3 by Eran fixes a race condition that causes counter inconsistency. Series generated against net commit: 200809716aed fou: fix some member types in guehdr v2: Patch 2: Add reviewer credit, rephrase commit message. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13net/mlx4_en: Fill all counters under one call of stats lockEran Ben Elisha
Before this patch, the stats_lock was acquired twice. In between the locks Driver sent command to gather some more statistics (per priority and counter statistics). If the stats lock was acquired by get statistics NDO in between we would have report out of sync counters. Fix this by collecting all stats from Firmware in advance and then fill the Software structs under one lock. Fixes: 0b131561a7d6 ("net/mlx4_en: Add Flow control statistics display via ethtool") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13net/mlx4_core: Fix wrong calculation of free countersEran Ben Elisha
The field res_free indicates the total number of counters which are available for allocation (reserved and unreserved). Fixed a bug where the reserved counters were subtracted from res_free before any allocation was performed. Before this fix, free counters which were not reserved could not be allocated. Fixes: 9de92c60beaa ("net/mlx4_core: Adjust counter grant policy in the resource tracker") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13net/mlx4_en: Fix selftest for small MTUsEugenia Emantayev
Set the minimal MTU threshold for running loopback selftest. MTU should be big enough to include packet payload, NET_IP_ALIGN, Ethernet headers and preamble length. Fixes: e7c1c2c46201 ("mlx4_en: Added self diagnostics test implementation") Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13net: dsa: lan9303: Introduce lan9303_read_waitEgil Hjelmeland
Simplify lan9303_indirect_phy_wait_for_completion() and lan9303_switch_wait_for_completion() by using a new function lan9303_read_wait() Changes v1 -> v2: - param 'mask' type u32 - removed param 'value' (will probably never be used) - add newline before return Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13net: phy: marvell: avoid configuring fiber page for SGMII-to-CopperRussell King
When in SGMII-to-Copper mode, the fiber page is used for the MAC facing link, and does not require configuration of the fiber auto-negotiation settings. Avoid trying. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13dwc-xlgmac: Add co-maintainerJie Deng
Jose Abreu will join to maintain dwc-xlgmac. He will help with new feature development for this driver. Thanks Jose and welcome on board! Signed-off-by: Jie Deng <jiedeng@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13tcp: refresh tcp_mstamp from timers callbacksEric Dumazet
Only the retransmit timer currently refreshes tcp_mstamp We should do the same for delayed acks and keepalives. Even if RFC 7323 does not request it, this is consistent to what linux did in the past, when TS values were based on jiffies. Fixes: 385e20706fac ("tcp: use tp->tcp_mstamp in output path") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Mike Maloney <maloney@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Mike Maloney <maloney@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13tcp: fix potential underestimation on rcv_rttWei Wang
When ms timestamp is used, current logic uses 1us in tcp_rcv_rtt_update() when the real rcv_rtt is within 1 - 999us. This could cause rcv_rtt underestimation. Fix it by always using a min value of 1ms if ms timestamp is used. Fixes: 645f4c6f2ebd ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps") Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13Merge branch 'hv_netvsc-minor-changes'David S. Miller
Stephen Hemminger says: ==================== hv_netvsc: minor changes This includes minor cleanup of code in send and receive path and also a new statistic to check for allocation failures. This also eliminates some of the extra RCU when not needed. There is a theoritical bug where buffered data could be blocked for longer than necessary if the ring buffer got full. This has not been seen in the wild, found by inspection. The reference count between net device and internal RNDIS is not needed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13hv_netvsc: empty current transmit aggregation if flow blockedStephen Hemminger
If the transmit queue is known full, then don't keep aggregating data. And the cp_partial flag which indicates that the current aggregation buffer is full can be folded in to avoid more conditionals. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13hv_netvsc: remove open_cnt reference countStephen Hemminger
There is only ever a single instance of network device object referencing the internal rndis object. Therefore the open_cnt atomic is not necessary. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13hv_netvsc: pass netvsc_device to receive callbackStephen Hemminger
The netvsc_receive_callback function was using RCU to find the appropriate underlying netvsc_device. Since calling function already had that pointer, this was unnecessary. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13hv_netvsc: simplify function args in receive status pathStephen Hemminger
The caller (netvsc_receive) already has the net device pointer, and should just pass that to functions rather than the hyperv device. This eliminates several impossible error paths in the process. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13hv_netvsc: track memory allocation failures in ethtool statsStephen Hemminger
When skb can not be allocated, update ethtool statisitics rather than rx_dropped which is intended for netif_receive. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13hv_netvsc: copy_to_send buf can be voidStephen Hemminger
Since only caller does not care about return value. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13Merge branch 'phylink-dsa-prep'David S. Miller
Florian Fainelli says: ==================== PHYLINK preparatory patches for DSA In preparation for having DSA migrate to PHYLINK, I had to come up with a number of preparatory patches: - we need to be able to pass phy_flags from an external component calling phylink_of_phy_connect() - DSA tries to connect through OF first, then fallsback using its own internal MDIO bus, in that case we would both show an error, but also not know what the correct phy_interface_t would be, instead use the PHY device/driver provided one - Finally bcm_sf2 makes use of all possible PHYs out there: internal, external, fixed, and MoCA, the latter requires a bit of help to signal link notifications through a MMIO interrupt, as well a report a correct PORT type Changes in v2: - rebased against latest net-next/master - added kernel doc documentation - dropped error message in phylink_of_phy_connect() as suggested by Russell ==================== Signed-off-by: David S. Miller <davem@davemloft.net>