summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-08-13mm/usercopy: use memory range to be accessed for wraparound checkIsaac J. Manjarres
Currently, when checking to see if accessing n bytes starting at address "ptr" will cause a wraparound in the memory addresses, the check in check_bogus_address() adds an extra byte, which is incorrect, as the range of addresses that will be accessed is [ptr, ptr + (n - 1)]. This can lead to incorrectly detecting a wraparound in the memory address, when trying to read 4 KB from memory that is mapped to the the last possible page in the virtual address space, when in fact, accessing that range of memory would not cause a wraparound to occur. Use the memory range that will actually be accessed when considering if accessing a certain amount of bytes will cause the memory address to wrap around. Link: http://lkml.kernel.org/r/1564509253-23287-1-git-send-email-isaacm@codeaurora.org Fixes: f5509cc18daa ("mm: Hardened usercopy") Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Co-developed-by: Prasad Sodagudi <psodagud@codeaurora.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Trilok Soni <tsoni@codeaurora.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13mm: kmemleak: disable early logging in case of errorCatalin Marinas
If an error occurs during kmemleak_init() (e.g. kmem cache cannot be created), kmemleak is disabled but kmemleak_early_log remains enabled. Subsequently, when the .init.text section is freed, the log_early() function no longer exists. To avoid a page fault in such scenario, ensure that kmemleak_disable() also disables early logging. Link: http://lkml.kernel.org/r/20190731152302.42073-1-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13mm/vmalloc.c: fix percpu free VM area search criteriaKuppuswamy Sathyanarayanan
Recent changes to the vmalloc code by commit 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation") can cause spurious percpu allocation failures. These, in turn, can result in panic()s in the slub code. One such possible panic was reported by Dave Hansen in following link https://lkml.org/lkml/2019/6/19/939. Another related panic observed is, RIP: 0033:0x7f46f7441b9b Call Trace: dump_stack+0x61/0x80 pcpu_alloc.cold.30+0x22/0x4f mem_cgroup_css_alloc+0x110/0x650 cgroup_apply_control_enable+0x133/0x330 cgroup_mkdir+0x41b/0x500 kernfs_iop_mkdir+0x5a/0x90 vfs_mkdir+0x102/0x1b0 do_mkdirat+0x7d/0xf0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 VMALLOC memory manager divides the entire VMALLOC space (VMALLOC_START to VMALLOC_END) into multiple VM areas (struct vm_areas), and it mainly uses two lists (vmap_area_list & free_vmap_area_list) to track the used and free VM areas in VMALLOC space. And pcpu_get_vm_areas(offsets[], sizes[], nr_vms, align) function is used for allocating congruent VM areas for percpu memory allocator. In order to not conflict with VMALLOC users, pcpu_get_vm_areas allocates VM areas near the end of the VMALLOC space. So the search for free vm_area for the given requirement starts near VMALLOC_END and moves upwards towards VMALLOC_START. Prior to commit 68ad4a330433, the search for free vm_area in pcpu_get_vm_areas() involves following two main steps. Step 1: Find a aligned "base" adress near VMALLOC_END. va = free vm area near VMALLOC_END Step 2: Loop through number of requested vm_areas and check, Step 2.1: if (base < VMALLOC_START) 1. fail with error Step 2.2: // end is offsets[area] + sizes[area] if (base + end > va->vm_end) 1. Move the base downwards and repeat Step 2 Step 2.3: if (base + start < va->vm_start) 1. Move to previous free vm_area node, find aligned base address and repeat Step 2 But Commit 68ad4a330433 removed Step 2.2 and modified Step 2.3 as below: Step 2.3: if (base + start < va->vm_start || base + end > va->vm_end) 1. Move to previous free vm_area node, find aligned base address and repeat Step 2 Above change is the root cause of spurious percpu memory allocation failures. For example, consider a case where a relatively large vm_area (~ 30 TB) was ignored in free vm_area search because it did not pass the base + end < vm->vm_end boundary check. Ignoring such large free vm_area's would lead to not finding free vm_area within boundary of VMALLOC_start to VMALLOC_END which in turn leads to allocation failures. So modify the search algorithm to include Step 2.2. Link: http://lkml.kernel.org/r/20190729232139.91131-1-sathyanarayanan.kuppuswamy@linux.intel.com Fixes: 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation") Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reported-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: sathyanarayanan kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13mm/memcontrol.c: fix use after free in mem_cgroup_iter()Miles Chen
This patch is sent to report an use after free in mem_cgroup_iter() after merging commit be2657752e9e ("mm: memcg: fix use after free in mem_cgroup_iter()"). I work with android kernel tree (4.9 & 4.14), and commit be2657752e9e ("mm: memcg: fix use after free in mem_cgroup_iter()") has been merged to the trees. However, I can still observe use after free issues addressed in the commit be2657752e9e. (on low-end devices, a few times this month) backtrace: css_tryget <- crash here mem_cgroup_iter shrink_node shrink_zones do_try_to_free_pages try_to_free_pages __perform_reclaim __alloc_pages_direct_reclaim __alloc_pages_slowpath __alloc_pages_nodemask To debug, I poisoned mem_cgroup before freeing it: static void __mem_cgroup_free(struct mem_cgroup *memcg) for_each_node(node) free_mem_cgroup_per_node_info(memcg, node); free_percpu(memcg->stat); + /* poison memcg before freeing it */ + memset(memcg, 0x78, sizeof(struct mem_cgroup)); kfree(memcg); } The coredump shows the position=0xdbbc2a00 is freed. (gdb) p/x ((struct mem_cgroup_per_node *)0xe5009e00)->iter[8] $13 = {position = 0xdbbc2a00, generation = 0x2efd} 0xdbbc2a00: 0xdbbc2e00 0x00000000 0xdbbc2800 0x00000100 0xdbbc2a10: 0x00000200 0x78787878 0x00026218 0x00000000 0xdbbc2a20: 0xdcad6000 0x00000001 0x78787800 0x00000000 0xdbbc2a30: 0x78780000 0x00000000 0x0068fb84 0x78787878 0xdbbc2a40: 0x78787878 0x78787878 0x78787878 0xe3fa5cc0 0xdbbc2a50: 0x78787878 0x78787878 0x00000000 0x00000000 0xdbbc2a60: 0x00000000 0x00000000 0x00000000 0x00000000 0xdbbc2a70: 0x00000000 0x00000000 0x00000000 0x00000000 0xdbbc2a80: 0x00000000 0x00000000 0x00000000 0x00000000 0xdbbc2a90: 0x00000001 0x00000000 0x00000000 0x00100000 0xdbbc2aa0: 0x00000001 0xdbbc2ac8 0x00000000 0x00000000 0xdbbc2ab0: 0x00000000 0x00000000 0x00000000 0x00000000 0xdbbc2ac0: 0x00000000 0x00000000 0xe5b02618 0x00001000 0xdbbc2ad0: 0x00000000 0x78787878 0x78787878 0x78787878 0xdbbc2ae0: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2af0: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b00: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b10: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b20: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b30: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b40: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b50: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b60: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b70: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b80: 0x78787878 0x78787878 0x00000000 0x78787878 0xdbbc2b90: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2ba0: 0x78787878 0x78787878 0x78787878 0x78787878 In the reclaim path, try_to_free_pages() does not setup sc.target_mem_cgroup and sc is passed to do_try_to_free_pages(), ..., shrink_node(). In mem_cgroup_iter(), root is set to root_mem_cgroup because sc->target_mem_cgroup is NULL. It is possible to assign a memcg to root_mem_cgroup.nodeinfo.iter in mem_cgroup_iter(). try_to_free_pages struct scan_control sc = {...}, target_mem_cgroup is 0x0; do_try_to_free_pages shrink_zones shrink_node mem_cgroup *root = sc->target_mem_cgroup; memcg = mem_cgroup_iter(root, NULL, &reclaim); mem_cgroup_iter() if (!root) root = root_mem_cgroup; ... css = css_next_descendant_pre(css, &root->css); memcg = mem_cgroup_from_css(css); cmpxchg(&iter->position, pos, memcg); My device uses memcg non-hierarchical mode. When we release a memcg: invalidate_reclaim_iterators() reaches only dead_memcg and its parents. If non-hierarchical mode is used, invalidate_reclaim_iterators() never reaches root_mem_cgroup. static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg) { struct mem_cgroup *memcg = dead_memcg; for (; memcg; memcg = parent_mem_cgroup(memcg) ... } So the use after free scenario looks like: CPU1 CPU2 try_to_free_pages do_try_to_free_pages shrink_zones shrink_node mem_cgroup_iter() if (!root) root = root_mem_cgroup; ... css = css_next_descendant_pre(css, &root->css); memcg = mem_cgroup_from_css(css); cmpxchg(&iter->position, pos, memcg); invalidate_reclaim_iterators(memcg); ... __mem_cgroup_free() kfree(memcg); try_to_free_pages do_try_to_free_pages shrink_zones shrink_node mem_cgroup_iter() if (!root) root = root_mem_cgroup; ... mz = mem_cgroup_nodeinfo(root, reclaim->pgdat->node_id); iter = &mz->iter[reclaim->priority]; pos = READ_ONCE(iter->position); css_tryget(&pos->css) <- use after free To avoid this, we should also invalidate root_mem_cgroup.nodeinfo.iter in invalidate_reclaim_iterators(). [cai@lca.pw: fix -Wparentheses compilation warning] Link: http://lkml.kernel.org/r/1564580753-17531-1-git-send-email-cai@lca.pw Link: http://lkml.kernel.org/r/20190730015729.4406-1-miles.chen@mediatek.com Fixes: 5ac8fb31ad2e ("mm: memcontrol: convert reclaim iterator to simple css refcounting") Signed-off-by: Miles Chen <miles.chen@mediatek.com> Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13mm/z3fold.c: fix z3fold_destroy_pool() race conditionHenry Burns
The constraint from the zpool use of z3fold_destroy_pool() is there are no outstanding handles to memory (so no active allocations), but it is possible for there to be outstanding work on either of the two wqs in the pool. Calling z3fold_deregister_migration() before the workqueues are drained means that there can be allocated pages referencing a freed inode, causing any thread in compaction to be able to trip over the bad pointer in PageMovable(). Link: http://lkml.kernel.org/r/20190726224810.79660-2-henryburns@google.com Fixes: 1f862989b04a ("mm/z3fold.c: support page migration") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Jonathan Adams <jwadams@google.com> Cc: Vitaly Vul <vitaly.vul@sony.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13mm/z3fold.c: fix z3fold_destroy_pool() orderingHenry Burns
The constraint from the zpool use of z3fold_destroy_pool() is there are no outstanding handles to memory (so no active allocations), but it is possible for there to be outstanding work on either of the two wqs in the pool. If there is work queued on pool->compact_workqueue when it is called, z3fold_destroy_pool() will do: z3fold_destroy_pool() destroy_workqueue(pool->release_wq) destroy_workqueue(pool->compact_wq) drain_workqueue(pool->compact_wq) do_compact_page(zhdr) kref_put(&zhdr->refcount) __release_z3fold_page(zhdr, ...) queue_work_on(pool->release_wq, &pool->work) *BOOM* So compact_wq needs to be destroyed before release_wq. Link: http://lkml.kernel.org/r/20190726224810.79660-1-henryburns@google.com Fixes: 5d03a6613957 ("mm/z3fold.c: use kref to prevent page free/compact race") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Jonathan Adams <jwadams@google.com> Cc: Vitaly Vul <vitaly.vul@sony.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13mm: mempolicy: handle vma with unmovable pages mapped correctly in mbindYang Shi
When running syzkaller internally, we ran into the below bug on 4.9.x kernel: kernel BUG at mm/huge_memory.c:2124! invalid opcode: 0000 [#1] SMP KASAN CPU: 0 PID: 1518 Comm: syz-executor107 Not tainted 4.9.168+ #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011 task: ffff880067b34900 task.stack: ffff880068998000 RIP: split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124 Call Trace: split_huge_page include/linux/huge_mm.h:100 [inline] queue_pages_pte_range+0x7e1/0x1480 mm/mempolicy.c:538 walk_pmd_range mm/pagewalk.c:50 [inline] walk_pud_range mm/pagewalk.c:90 [inline] walk_pgd_range mm/pagewalk.c:116 [inline] __walk_page_range+0x44a/0xdb0 mm/pagewalk.c:208 walk_page_range+0x154/0x370 mm/pagewalk.c:285 queue_pages_range+0x115/0x150 mm/mempolicy.c:694 do_mbind mm/mempolicy.c:1241 [inline] SYSC_mbind+0x3c3/0x1030 mm/mempolicy.c:1370 SyS_mbind+0x46/0x60 mm/mempolicy.c:1352 do_syscall_64+0x1d2/0x600 arch/x86/entry/common.c:282 entry_SYSCALL_64_after_swapgs+0x5d/0xdb Code: c7 80 1c 02 00 e8 26 0a 76 01 <0f> 0b 48 c7 c7 40 46 45 84 e8 4c RIP [<ffffffff81895d6b>] split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124 RSP <ffff88006899f980> with the below test: uint64_t r[1] = {0xffffffffffffffff}; int main(void) { syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0); intptr_t res = 0; res = syscall(__NR_socket, 0x11, 3, 0x300); if (res != -1) r[0] = res; *(uint32_t*)0x20000040 = 0x10000; *(uint32_t*)0x20000044 = 1; *(uint32_t*)0x20000048 = 0xc520; *(uint32_t*)0x2000004c = 1; syscall(__NR_setsockopt, r[0], 0x107, 0xd, 0x20000040, 0x10); syscall(__NR_mmap, 0x20fed000, 0x10000, 0, 0x8811, r[0], 0); *(uint64_t*)0x20000340 = 2; syscall(__NR_mbind, 0x20ff9000, 0x4000, 0x4002, 0x20000340, 0x45d4, 3); return 0; } Actually the test does: mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000 socket(AF_PACKET, SOCK_RAW, 768) = 3 setsockopt(3, SOL_PACKET, PACKET_TX_RING, {block_size=65536, block_nr=1, frame_size=50464, frame_nr=1}, 16) = 0 mmap(0x20fed000, 65536, PROT_NONE, MAP_SHARED|MAP_FIXED|MAP_POPULATE|MAP_DENYWRITE, 3, 0) = 0x20fed000 mbind(..., MPOL_MF_STRICT|MPOL_MF_MOVE) = 0 The setsockopt() would allocate compound pages (16 pages in this test) for packet tx ring, then the mmap() would call packet_mmap() to map the pages into the user address space specified by the mmap() call. When calling mbind(), it would scan the vma to queue the pages for migration to the new node. It would split any huge page since 4.9 doesn't support THP migration, however, the packet tx ring compound pages are not THP and even not movable. So, the above bug is triggered. However, the later kernel is not hit by this issue due to commit d44d363f6578 ("mm: don't assume anonymous pages have SwapBacked flag"), which just removes the PageSwapBacked check for a different reason. But, there is a deeper issue. According to the semantic of mbind(), it should return -EIO if MPOL_MF_MOVE or MPOL_MF_MOVE_ALL was specified and MPOL_MF_STRICT was also specified, but the kernel was unable to move all existing pages in the range. The tx ring of the packet socket is definitely not movable, however, mbind() returns success for this case. Although the most socket file associates with non-movable pages, but XDP may have movable pages from gup. So, it sounds not fine to just check the underlying file type of vma in vma_migratable(). Change migrate_page_add() to check if the page is movable or not, if it is unmovable, just return -EIO. But do not abort pte walk immediately, since there may be pages off LRU temporarily. We should migrate other pages if MPOL_MF_MOVE* is specified. Set has_unmovable flag if some paged could not be not moved, then return -EIO for mbind() eventually. With this change the above test would return -EIO as expected. [yang.shi@linux.alibaba.com: fix review comments from Vlastimil] Link: http://lkml.kernel.org/r/1563556862-54056-3-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1561162809-59140-3-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and ↵Yang Shi
MPOL_MF_STRICT were specified When both MPOL_MF_MOVE* and MPOL_MF_STRICT was specified, mbind() should try best to migrate misplaced pages, if some of the pages could not be migrated, then return -EIO. There are three different sub-cases: 1. vma is not migratable 2. vma is migratable, but there are unmovable pages 3. vma is migratable, pages are movable, but migrate_pages() fails If #1 happens, kernel would just abort immediately, then return -EIO, after a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified"). If #3 happens, kernel would set policy and migrate pages with best-effort, but won't rollback the migrated pages and reset the policy back. Before that commit, they behaves in the same way. It'd better to keep their behavior consistent. But, rolling back the migrated pages and resetting the policy back sounds not feasible, so just make #1 behave as same as #3. Userspace will know that not everything was successfully migrated (via -EIO), and can take whatever steps it deems necessary - attempt rollback, determine which exact page(s) are violating the policy, etc. Make queue_pages_range() return 1 to indicate there are unmovable pages or vma is not migratable. The #2 is not handled correctly in the current kernel, the following patch will fix it. [yang.shi@linux.alibaba.com: fix review comments from Vlastimil] Link: http://lkml.kernel.org/r/1563556862-54056-2-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1561162809-59140-2-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13mm/hmm: fix bad subpage pointer in try_to_unmap_oneRalph Campbell
When migrating an anonymous private page to a ZONE_DEVICE private page, the source page->mapping and page->index fields are copied to the destination ZONE_DEVICE struct page and the page_mapcount() is increased. This is so rmap_walk() can be used to unmap and migrate the page back to system memory. However, try_to_unmap_one() computes the subpage pointer from a swap pte which computes an invalid page pointer and a kernel panic results such as: BUG: unable to handle page fault for address: ffffea1fffffffc8 Currently, only single pages can be migrated to device private memory so no subpage computation is needed and it can be set to "page". [rcampbell@nvidia.com: add comment] Link: http://lkml.kernel.org/r/20190724232700.23327-4-rcampbell@nvidia.com Link: http://lkml.kernel.org/r/20190719192955.30462-4-rcampbell@nvidia.com Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration") Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13mm/hmm: fix ZONE_DEVICE anon page mapping reuseRalph Campbell
When a ZONE_DEVICE private page is freed, the page->mapping field can be set. If this page is reused as an anonymous page, the previous value can prevent the page from being inserted into the CPU's anon rmap table. For example, when migrating a pte_none() page to device memory: migrate_vma(ops, vma, start, end, src, dst, private) migrate_vma_collect() src[] = MIGRATE_PFN_MIGRATE migrate_vma_prepare() /* no page to lock or isolate so OK */ migrate_vma_unmap() /* no page to unmap so OK */ ops->alloc_and_copy() /* driver allocates ZONE_DEVICE page for dst[] */ migrate_vma_pages() migrate_vma_insert_page() page_add_new_anon_rmap() __page_set_anon_rmap() /* This check sees the page's stale mapping field */ if (PageAnon(page)) return /* page->mapping is not updated */ The result is that the migration appears to succeed but a subsequent CPU fault will be unable to migrate the page back to system memory or worse. Clear the page->mapping field when freeing the ZONE_DEVICE page so stale pointer data doesn't affect future page use. Link: http://lkml.kernel.org/r/20190719192955.30462-3-rcampbell@nvidia.com Fixes: b7a523109fb5c9d2d6dd ("mm: don't clear ->mapping in hmm_devmem_free") Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Jan Kara <jack@suse.cz> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13mm: document zone device struct page field usageRalph Campbell
Patch series "mm/hmm: fixes for device private page migration", v3. Testing the latest linux git tree turned up a few bugs with page migration to and from ZONE_DEVICE private and anonymous pages. Hopefully it clarifies how ZONE_DEVICE private struct page uses the same mapping and index fields from the source anonymous page mapping. This patch (of 3): Struct page for ZONE_DEVICE private pages uses the page->mapping and and page->index fields while the source anonymous pages are migrated to device private memory. This is so rmap_walk() can find the page when migrating the ZONE_DEVICE private page back to system memory. ZONE_DEVICE pmem backed fsdax pages also use the page->mapping and page->index fields when files are mapped into a process address space. Add comments to struct page and remove the unused "_zd_pad_1" field to make this more clear. Link: http://lkml.kernel.org/r/20190724232700.23327-2-rcampbell@nvidia.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13riscv: fix flush_tlb_range() end address for flush_tlb_page()Paul Walmsley
The RISC-V kernel implementation of flush_tlb_page() when CONFIG_SMP is set is wrong. It passes zero to flush_tlb_range() as the final address to flush, but it should be at least 'addr'. Some other Linux architecture ports use the beginning address to flush, plus PAGE_SIZE, as the final address to flush. This might flush slightly more than what's needed, but it seems unlikely that being more clever would improve anything. So let's just take that implementation for now. While here, convert the macro into a static inline function, primarily to avoid unintentional multiple evaluations of 'addr'. This second version of the patch fixes a coding style issue found by Christoph Hellwig <hch@lst.de>. Reported-by: Andreas Schwab <schwab@suse.de> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-08-13Merge tag 'tpmdd-next-20190813' of git://git.infradead.org/users/jjs/linux-tpmddLinus Torvalds
Pull tpm fixes from Jarkko Sakkinen: "One more bug fix for the next release" * tag 'tpmdd-next-20190813' of git://git.infradead.org/users/jjs/linux-tpmdd: KEYS: trusted: allow module init if TPM is inactive or deactivated
2019-08-13Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fix from James Bottomley: "Single lpfc fix, for a single-cpu corner case" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: lpfc: Fix crash when cpu count is 1 and null irq affinity mask
2019-08-13KEYS: trusted: allow module init if TPM is inactive or deactivatedRoberto Sassu
Commit c78719203fc6 ("KEYS: trusted: allow trusted.ko to initialize w/o a TPM") allows the trusted module to be loaded even if a TPM is not found, to avoid module dependency problems. However, trusted module initialization can still fail if the TPM is inactive or deactivated. tpm_get_random() returns an error. This patch removes the call to tpm_get_random() and instead extends the PCR specified by the user with zeros. The security of this alternative is equivalent to the previous one, as either option prevents with a PCR update unsealing and misuse of sealed data by a user space process. Even if a PCR is extended with zeros, instead of random data, it is still computationally infeasible to find a value as input for a new PCR extend operation, to obtain again the PCR value that would allow unsealing. Cc: stable@vger.kernel.org Fixes: 240730437deb ("KEYS: trusted: explicitly use tpm_chip structure...") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Suggested-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
2019-08-13RDMA/siw: Change CQ flags from 64->32 bitsBernard Metzler
This patch changes the driver/user shared (mmapped) CQ notification flags field from unsigned 64-bits size to unsigned 32-bits size. This enables building siw on 32-bit architectures. This patch changes the siw-abi, but as siw was only just merged in this merge window cycle, there are no released kernels with the prior abi. We are making no attempt to be binary compatible with siw user space libraries prior to the merge of siw into the upstream kernel, only moving forward with upstream kernels and upstream rdma-core provided siw libraries are we guaranteeing compatibility. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190809151816.13018-1-bmt@zurich.ibm.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-13ALSA: hda/realtek - Add quirk for HP Envy x360Takashi Iwai
HP Envy x360 (AMD Ryzen-based model) with 103c:8497 needs the same quirk like HP Spectre x360 for enabling the mute LED over Mic3 pin. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=204373 Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-08-13MAINTAINERS: iomap: Remove fs/iomap.c recordDenis Efremov
Update MAINTAINERS to reflect that fs/iomap.c file was splitted into separate files in fs/iomap/ Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: linux-fsdevel@vger.kernel.org Fixes: cb7181ff4b1c ("iomap: move the main iteration code into a separate file") Signed-off-by: Denis Efremov <efremov@linux.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-13Merge tag 'gvt-fixes-2019-08-13' of https://github.com/intel/gvt-linux into ↵Jani Nikula
drm-intel-fixes gvt-fixes-2019-08-13 - Fix one use-after-free error (Dan) Signed-off-by: Jani Nikula <jani.nikula@intel.com> From: Zhenyu Wang <zhenyuw@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190813095845.GF19140@zhen-hp.sh.intel.com
2019-08-13mtd: spi-nor: Fix the disabling of write protection at initTudor Ambarus
spi_nor_spansion_clear_sr_bp() depends on spansion_quad_enable(). While spansion_quad_enable() is selected as default when initializing the flash parameters, the nor->quad_enable() method can be overwritten later on when parsing BFPT. Select the write protection disable mechanism at spi_nor_init() time, when the nor->quad_enable() method is already known. Fixes: 191f5c2ed4b6faba ("mtd: spi-nor: use 16-bit WRR command when QE is set on spansion flashes") Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-08-13arm64: cpufeature: Don't treat granule sizes as strictWill Deacon
If a CPU doesn't support the page size for which the kernel is configured, then we will complain and refuse to bring it online. For secondary CPUs (and the boot CPU on a system booting with EFI), we will also print an error identifying the mismatch. Consequently, the only time that the cpufeature code can detect a granule size mismatch is for a granule other than the one that is currently being used. Although we would rather such systems didn't exist, we've unfortunately lost that battle and Kevin reports that on his amlogic S922X (odroid-n2 board) we end up warning and taining with defconfig because 16k pages are not supported by all of the CPUs. In such a situation, we don't actually care about the feature mismatch, particularly now that KVM only exposes the sanitised view of the CPU registers (commit 93390c0a1b20 - "arm64: KVM: Hide unsupported AArch64 CPU features from guests"). Treat the granule fields as non-strict and let Kevin run without a tainted kernel. Cc: Marc Zyngier <maz@kernel.org> Reported-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org> [catalin.marinas@arm.com: changelog updated with KVM sanitised regs commit] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-08-12drm/amd/display: use kvmalloc for dc_state (v2)Alex Deucher
It's large and doesn't need contiguous memory. Fixes allocation failures in some cases. v2: kvfree the memory. Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-12drm/amdgpu: fix gfx9 soft recoveryPierre-Eric Pelloux-Prayer
The SOC15_REG_OFFSET() macro wasn't used, making the soft recovery fail. v2: use WREG32_SOC15 instead of WREG32 + SOC15_REG_OFFSET Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2019-08-12dt-bindings: fec: explicitly mark deprecated propertiesSven Van Asbroeck
fec's gpio phy reset properties have been deprecated. Update the dt-bindings documentation to explicitly mark them as such, and provide a short description of the recommended alternative. Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-08-12of: resolver: Add of_node_put() before return and breakNishka Dasgupta
Each iteration of for_each_child_of_node puts the previous node, but in the case of a return or break from the middle of the loop, there is no put, thus causing a memory leak. Hence add an of_node_put before the return or break in three places. Issue found with Coccinelle. Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-08-12xtensa: add missing isync to the cpu_reset TLB codeMax Filippov
ITLB entry modifications must be followed by the isync instruction before the new entries are possibly used. cpu_reset lacks one isync between ITLB way 6 initialization and jump to the identity mapping. Add missing isync to xtensa cpu_reset. Cc: stable@vger.kernel.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2019-08-12Merge tag 'iio-for-5.4a' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next Jonathan writes: First set of new device support, features and cleanups for IIO in the 5.4 cycle Note includes a merge from i3c tree to get support needed for stm_lsm6dsx driver support for l3c devices. Done from immutable branch. A counter subsystem patche in here as well. Alongside the new device support (which is always good), Chuhong's work on using devres managed APIs has cleaned up a number of drivers. New device support * adis16460 - New driver based on ADIS framework which needed addition of support for cs_change_delay. Includes device tree binding. * cros_ec - Support fo the veyron-minnie which uses an older interface. * lsm6dsx - Support for LSM6DSTR-C gyro + magnetometer sensor (new IDs mainly) - Support for ISM330DHCX acc + gyro sensor (extensive rework needed!) * Maxim 5432 - New driver support MAX5432-MAX5435 family of potentiometers. * noa1305 - New driver for this ON Semiconductor Ambient light sensor. Features and cleanups * tree wide - Drop error prints after platform_get_irq as already prints errors internally if any occur. * docs - Document mounting matrix. - Fix a missing newline at end of file. * ad2s1210 - Switch to device managed APIs for all of probe and drop explicit remove. * ad7192 - Add of_device_id array to explicity handling DT bindings. * ad7606 - Lots of rework leading to support for software configure modes in ad7616 parts. - Debugfs register access support. * am2315 - Switch to device managed APIs for all of probe and drop explicit remove. * apds9960 - Typo in module description. * cm36651 - Convert to i2c_new_dummy_device. - Swithc to device managed APIs for all of probe adn drop explicit remove. * cros_ec - Calibscale support for accel, gyro and magnetometer. - Tidy up some error codes to return the error from the stack rather than -EIO. - Determine protocol version. - Add a sign vector to the core to fix sensor rotation if necessary. Cannot just be done with mount matrix as already in use in many devices. - Tidy up INFO_SCALE being in both the separate and shared lists. - Drop a lot of dplicate code from the cros-ec-accel-legacy driver and use the core provided code instead. - Make frequency range available to userspace. * counter / ftm-quaddec - Switch to device managed APIs for all of probe adn drop explicit remove. * hdc100x - Switch to device managed APIs for all of probe and drop explicit remove. * hi8435 - Use gpiod_set_value_cansleep as we don't care here and there is a board out there where it needs to sleep. - Switch to device managed APIs for all of probe and drop explict remove. * hp03 - Convert to i2c_new_dummy_device. * maxim thermocouple - Switch to device managed APIs for all of probe and drop explicit remove. * mmc35240 - Fix typo in constant naming. * mpu6050 - Use devm_add_action_or_reset in place of explicit error handling. - Make text in Kconfig more explicit about which parts are supported. * mxc4005 - Switch to device managed APIs for all of probe and drop explicit remove. * pms7003 - Convert device tree bindings to yaml. - Add a MAINTAINERS entry * sc27xx - Introduce a local struct device *dev pointer to avoid lots of deref. - Use devm_add_action_or_reset in place of explicit error handling. * sca3000 - Typo fix in naming. * si1145 - Switch to device managed APIs for all of probe and drop explicit remove. * st_sensors - Lots of rework to enable switch to regmap. - Regmap conversion at the end. - Tidy up some inconsistencies in buffer setup ops. - Tidy up an oddity by dropping get_irq_data_ready function in favour of direct access. - Stop allocating buffer in buffer enable in favour of just embedding a large enough constant size buffer in the iio_priv accessed structure. * st_lsm6dsx - l3c device support (LSM6DSO and LSM6DSR) - tidy up irq return logic which was strangely written. - fix up an ABI quirk where this driver used separate scale attributes, even though they were always shared by type. * stk33xx - Device tree bindings include manufacturer ID. * stm32-adc - Add control for supply to analog switches including DT bindings. * stm32 timer - Drop the quadrature mode support. Believed there were no users so take this opportunity to drop this unwanted ABI. * tsl2772 - Switch to device mangage APIs for all of probe and drop explicit remove. - Use regulator_bulk_* APIs to reduce repitition. * veml6070 - Convert to i2c_new_dummy_device. * tag 'iio-for-5.4a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (84 commits) iio: hi8435: Drop hi8435_remove() by using devres for remaining elements iio: hi8435: Use gpiod_set_value_cansleep() iio:st_sensors: remove buffer allocation at each buffer enable iio: imu: inv_mpu6050: be more explicit on supported chips iio: light: noa1305: Add support for NOA1305 dt-bindings: Add binding document for NOA1305 iio: remove get_irq_data_ready() function pointer and use IRQ number directly iio: imu: st_lsm6dsx: make IIO_CHAN_INFO_SCALE shared by type iio: tsl2772: Use regulator_bulk_() APIs iio: tsl2772: Use devm_iio_device_register iio: tsl2772: Use devm_add_action_or_reset for tsl2772_chip_off iio: tsl2772: Use devm_add_action_or_reset iio: Remove dev_err() usage after platform_get_irq() iio: light: si1145: Use device-managed APIs iio:pressure: preenable/postenable/predisable fixup for ST press buffer iio:magn: preenable/postenable/predisable fixup for ST magn buffer iio:gyro: preenable/postenable/predisable fixup for ST gyro buffer iio:accel: preenable/postenable/predisable fixup for ST accel buffer dt-bindings: iio: imu: st_lsm6dsx: add ism330dhcx device bindings iio: imu: st_lsm6dsx: add support to ISM330DHCX ...
2019-08-12Merge tag 'iio-fixes-for-5.3b' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus Jonathan writes: Second set of IIO fix for the 5.3 cycle. * adf4371 - Calculation of the value to program to control the output frequency was incorrect. * max9611 - Fix temperature reading in probe. A recent fix for a wrong mask meant this code was looked at afresh. A second bug became obvious in which the return value was used inplace of the desired register value. This had no visible effect other than a communication test not actually testing the communications. * tag 'iio-fixes-for-5.3b' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: iio: adc: max9611: Fix temperature reading in probe iio: frequency: adf4371: Fix output frequency setting
2019-08-12USB: core: Fix races in character device registration and deregistraionAlan Stern
The syzbot fuzzer has found two (!) races in the USB character device registration and deregistration routines. This patch fixes the races. The first race results from the fact that usb_deregister_dev() sets usb_minors[intf->minor] to NULL before calling device_destroy() on the class device. This leaves a window during which another thread can allocate the same minor number but will encounter a duplicate name error when it tries to register its own class device. A typical error message in the system log would look like: sysfs: cannot create duplicate filename '/class/usbmisc/ldusb0' The patch fixes this race by destroying the class device first. The second race is in usb_register_dev(). When that routine runs, it first allocates a minor number, then drops minor_rwsem, and then creates the class device. If the device creation fails, the minor number is deallocated and the whole routine returns an error. But during the time while minor_rwsem was dropped, there is a window in which the minor number is allocated and so another thread can successfully open the device file. Typically this results in use-after-free errors or invalid accesses when the other thread closes its open file reference, because the kernel then tries to release resources that were already deallocated when usb_register_dev() failed. The patch fixes this race by keeping minor_rwsem locked throughout the entire routine. Reported-and-tested-by: syzbot+30cf45ebfe0b0c4847a1@syzkaller.appspotmail.com Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.1908121607590.1659-100000@iolanthe.rowland.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-12x86/fpu/math-emu: Address fallthrough warningsThomas Gleixner
/home/tglx/work/kernel/linus/linux/arch/x86/math-emu/errors.c: In function ‘FPU_printall’: /home/tglx/work/kernel/linus/linux/arch/x86/math-emu/errors.c:187:9: warning: this statement may fall through [-Wimplicit-fallthrough=] tagi = FPU_Special(r); ~~~~~^~~~~~~~~~~~~~~~ /home/tglx/work/kernel/linus/linux/arch/x86/math-emu/errors.c:188:3: note: here case TAG_Valid: ^~~~ /home/tglx/work/kernel/linus/linux/arch/x86/math-emu/fpu_trig.c: In function ‘fyl2xp1’: /home/tglx/work/kernel/linus/linux/arch/x86/math-emu/fpu_trig.c:1353:7: warning: this statement may fall through [-Wimplicit-fallthrough=] if (denormal_operand() < 0) ^ /home/tglx/work/kernel/linus/linux/arch/x86/math-emu/fpu_trig.c:1356:3: note: here case TAG_Zero: Remove the pointless 'break;' after 'continue;' while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-08-12x86/apic/32: Fix yet another implicit fallthrough warningBorislav Petkov
Fix arch/x86/kernel/apic/probe_32.c: In function ‘default_setup_apic_routing’: arch/x86/kernel/apic/probe_32.c:146:7: warning: this statement may fall through [-Wimplicit-fallthrough=] if (!APIC_XAPIC(version)) { ^ arch/x86/kernel/apic/probe_32.c:151:3: note: here case X86_VENDOR_HYGON: ^~~~ for 32-bit builds. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190811154036.29805-1-bp@alien8.de
2019-08-12xfs: don't crash on null attr fork xfs_bmapi_readDarrick J. Wong
Zorro Lang reported a crash in generic/475 if we try to inactivate a corrupt inode with a NULL attr fork (stack trace shortened somewhat): RIP: 0010:xfs_bmapi_read+0x311/0xb00 [xfs] RSP: 0018:ffff888047f9ed68 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff888047f9f038 RCX: 1ffffffff5f99f51 RDX: 0000000000000002 RSI: 0000000000000008 RDI: 0000000000000012 RBP: ffff888002a41f00 R08: ffffed10005483f0 R09: ffffed10005483ef R10: ffffed10005483ef R11: ffff888002a41f7f R12: 0000000000000004 R13: ffffe8fff53b5768 R14: 0000000000000005 R15: 0000000000000001 FS: 00007f11d44b5b80(0000) GS:ffff888114200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000ef6000 CR3: 000000002e176003 CR4: 00000000001606e0 Call Trace: xfs_dabuf_map.constprop.18+0x696/0xe50 [xfs] xfs_da_read_buf+0xf5/0x2c0 [xfs] xfs_da3_node_read+0x1d/0x230 [xfs] xfs_attr_inactive+0x3cc/0x5e0 [xfs] xfs_inactive+0x4c8/0x5b0 [xfs] xfs_fs_destroy_inode+0x31b/0x8e0 [xfs] destroy_inode+0xbc/0x190 xfs_bulkstat_one_int+0xa8c/0x1200 [xfs] xfs_bulkstat_one+0x16/0x20 [xfs] xfs_bulkstat+0x6fa/0xf20 [xfs] xfs_ioc_bulkstat+0x182/0x2b0 [xfs] xfs_file_ioctl+0xee0/0x12a0 [xfs] do_vfs_ioctl+0x193/0x1000 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x6f/0xb0 do_syscall_64+0x9f/0x4d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f11d39a3e5b The "obvious" cause is that the attr ifork is null despite the inode claiming an attr fork having at least one extent, but it's not so obvious why we ended up with an inode in that state. Reported-by: Zorro Lang <zlang@redhat.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204031 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2019-08-12xfs: remove more ondisk directory corruption assertsDarrick J. Wong
Continue our game of replacing ASSERTs for corrupt ondisk metadata with EFSCORRUPTED returns. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2019-08-12RDMA/core: Fix error code in stat_get_doit_qp()Dan Carpenter
We need to set the error codes on these paths. Currently the only possible error code is -EMSGSIZE so that's what the patch uses. Fixes: 83c2c1fcbd08 ("RDMA/nldev: Allow get counter mode through RDMA netlink") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190809101311.GA17867@mwanda Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-12RDMA/siw: Fix a memory leak in siw_init_cpulist()Dan Carpenter
The error handling code doesn't free siw_cpu_info.tx_valid_cpus[0]. The first iteration through the loop is a no-op so this is sort of an off by one bug. Also Bernard pointed out that we can remove the NULL assignment and simplify the code a bit. Fixes: bdcf26bf9b3a ("rdma/siw: network and RDMA core interface") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190809140904.GB3552@mwanda Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-12Merge tag 'misc-habanalabs-fixes-2019-08-12' of ↵Greg Kroah-Hartman
git://people.freedesktop.org/~gabbayo/linux into char-misc-next Oded writes: This tag contains a couple of important fixes: - Four fixes when running on s390 architecture (BE). With these fixes, the driver is fully functional on Big-endian architectures. The fixes include: - Validation/Patching of user packets - Completion queue handling - Internal H/W queues submission - Device IRQ unmasking operation - Fix to double free in an error path to avoid kernel corruption - Fix to DRAM usage accounting when a user process is terminated forcefully. * tag 'misc-habanalabs-fixes-2019-08-12' of git://people.freedesktop.org/~gabbayo/linux: habanalabs: fix device IRQ unmasking for BE host habanalabs: fix endianness handling for internal QMAN submission habanalabs: fix completion queue handling when host is BE habanalabs: fix endianness handling for packets from user habanalabs: fix DRAM usage accounting on context tear down habanalabs: Avoid double free in error flow
2019-08-12IB/mlx5: Fix use-after-free error while accessing ev_file pointerYishai Hadas
Call to uverbs_close_fd() releases file pointer to 'ev_file' and mlx5_ib_dev is going to be inaccessible. Cache pointer prior cleaning resources to solve the KASAN warning below. BUG: KASAN: use-after-free in devx_async_event_close+0x391/0x480 [mlx5_ib] Read of size 8 at addr ffff888301e3cec0 by task devx_direct_tes/4631 CPU: 1 PID: 4631 Comm: devx_direct_tes Tainted: G OE 5.3.0-rc1-for-upstream-dbg-2019-07-26_01-19-56-93 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014 Call Trace: dump_stack+0x9a/0xeb print_address_description+0x1e2/0x400 ? devx_async_event_close+0x391/0x480 [mlx5_ib] __kasan_report+0x15c/0x1df ? devx_async_event_close+0x391/0x480 [mlx5_ib] kasan_report+0xe/0x20 devx_async_event_close+0x391/0x480 [mlx5_ib] __fput+0x26a/0x7b0 task_work_run+0x10d/0x180 exit_to_usermode_loop+0x137/0x160 do_syscall_64+0x3c7/0x490 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f5df907d664 Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 6a cd 20 00 48 63 ff 85 c0 75 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 44 f3 c3 66 90 48 83 ec 18 48 89 7c 24 08 e8 RSP: 002b:00007ffd353cb958 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 000056017a88c348 RCX: 00007f5df907d664 RDX: 00007f5df969d400 RSI: 00007f5de8f1ec90 RDI: 0000000000000006 RBP: 00007f5df9681dc0 R08: 00007f5de8736410 R09: 000056017a9d2dd0 R10: 000000000000000b R11: 0000000000000246 R12: 00007f5de899d7d0 R13: 00007f5df96c4248 R14: 00007f5de8f1ecb0 R15: 000056017ae41308 Allocated by task 4631: save_stack+0x19/0x80 kasan_kmalloc.constprop.3+0xa0/0xd0 alloc_uobj+0x71/0x230 [ib_uverbs] alloc_begin_fd_uobject+0x2e/0xc0 [ib_uverbs] rdma_alloc_begin_uobject+0x96/0x140 [ib_uverbs] ib_uverbs_run_method+0xdf0/0x1940 [ib_uverbs] ib_uverbs_cmd_verbs+0x57e/0xdb0 [ib_uverbs] ib_uverbs_ioctl+0x177/0x260 [ib_uverbs] do_vfs_ioctl+0x18f/0x1010 ksys_ioctl+0x70/0x80 __x64_sys_ioctl+0x6f/0xb0 do_syscall_64+0x95/0x490 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 4631: save_stack+0x19/0x80 __kasan_slab_free+0x11d/0x160 slab_free_freelist_hook+0x67/0x1a0 kfree+0xb9/0x2a0 uverbs_close_fd+0x118/0x1c0 [ib_uverbs] devx_async_event_close+0x28a/0x480 [mlx5_ib] __fput+0x26a/0x7b0 task_work_run+0x10d/0x180 exit_to_usermode_loop+0x137/0x160 do_syscall_64+0x3c7/0x490 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff888301e3cda8 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 280 bytes inside of 512-byte region [ffff888301e3cda8, ffff888301e3cfa8) The buggy address belongs to the page: page:ffffea000c078e00 refcount:1 mapcount:0 mapping:ffff888352811300 index:0x0 compound_mapcount: 0 flags: 0x2fffff80010200(slab|head) raw: 002fffff80010200 ffffea000d152608 ffffea000c077808 ffff888352811300 raw: 0000000000000000 0000000000250025 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888301e3cd80: fc fc fc fc fc fb fb fb fb fb fb fb fb fb fb fb ffff888301e3ce00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888301e3ce80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888301e3cf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888301e3cf80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc Disabling lock debugging due to kernel taint Cc: <stable@vger.kernel.org> # 5.2 Fixes: 759738537142 ("IB/mlx5: Enable subscription for device events over DEVX") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Link: https://lore.kernel.org/r/20190808081538.28772-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-12staging: comedi: dt3000: Fix rounding up of timer divisorIan Abbott
`dt3k_ns_to_timer()` determines the prescaler and divisor to use to produce a desired timing period. It is influenced by a rounding mode and can round the divisor up, down, or to the nearest value. However, the code for rounding up currently does the same as rounding down! Fix ir by using the `DIV_ROUND_UP()` macro to calculate the divisor when rounding up. Also, change the types of the `divider`, `base` and `prescale` variables from `int` to `unsigned int` to avoid mixing signed and unsigned types in the calculations. Also fix a typo in a nearby comment: "improvment" => "improvement". Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20190812120814.21188-1-abbotti@mev.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-12staging: comedi: dt3000: Fix signed integer overflow 'divider * base'Ian Abbott
In `dt3k_ns_to_timer()` the following lines near the end of the function result in a signed integer overflow: prescale = 15; base = timer_base * (1 << prescale); divider = 65535; *nanosec = divider * base; (`divider`, `base` and `prescale` are type `int`, `timer_base` and `*nanosec` are type `unsigned int`. The value of `timer_base` will be either 50 or 100.) The main reason for the overflow is that the calculation for `base` is completely wrong. It should be: base = timer_base * (prescale + 1); which matches an earlier instance of this calculation in the same function. Reported-by: David Binderman <dcb314@hotmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Link: https://lore.kernel.org/r/20190812111517.26803-1-abbotti@mev.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-12staging: rtl8192u: fix spacing errorsStephen Brennan
Used checkpatch's --fix-inplace option for types SPACING, OPEN_BRACE, ELSE_AFTER_BRACE. Manually edited the resulting changes to correct for mistaken fixes and complete fixes that were only partially applied. Signed-off-by: Stephen Brennan <stephen@brennan.io> Link: https://lore.kernel.org/r/20190811225120.7308-1-stephen@brennan.io Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-12staging: rtl8712: r8712_construct_txaggr_cmd_hdr(): Change return typeNishka Dasgupta
Change return type of r8712_construct_txaggr_cmd_hdr from u8 to void as it always returns _SUCCESS and its return value is never used. Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com> Link: https://lore.kernel.org/r/20190809052353.5308-8-nishkadg.linux@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-12staging: rtl8712: r8712_construct_txaggr_cmd_desc(): Change return typeNishka Dasgupta
Change return type of r8712_construct_txaggr_cmd_desc from u8 to void (and remove its return statement) as it always returns _SUCCESS and its return value is never stored, checked or otherwise used. Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com> Link: https://lore.kernel.org/r/20190809052353.5308-7-nishkadg.linux@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-12staging: rtl8712: r8712_xmit_direct(): Change return typeNishka Dasgupta
Change return type of r8712_xmit_direct from int to void as its return value is never used. Remove return statement accordingly. Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com> Link: https://lore.kernel.org/r/20190809052353.5308-6-nishkadg.linux@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-12staging: rtl8712: r8712_free_xmitbuf(): Change return typeNishka Dasgupta
Change return type of r8712_free_xmitbuf from int to void (and remove its return values) as its return value is never used. Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com> Link: https://lore.kernel.org/r/20190809052353.5308-5-nishkadg.linux@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-12staging: rtl8712: make_wlanhdr(): Change return values and typeNishka Dasgupta
Change return values of make_wlanhdr from _SUCCESS/_FAIL to 0/-EINVAL. Modify call site to check for non-zero return values instead of _FAIL. Change return type from sint to int. Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com> Link: https://lore.kernel.org/r/20190809052353.5308-4-nishkadg.linux@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-12staging: rtl8712: r8712_update_attrib(): Change return values and typeNishka Dasgupta
Change return values of r8712_update_attrib from _SUCCESS and _FAIL to 0 and -ENOMEM or -EINVAL respectively. Modify call site to check for the new failure conditions. Also modify the return type from sint to int. Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com> Link: https://lore.kernel.org/r/20190809052353.5308-2-nishkadg.linux@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-12staging: rtl8712: _r8712_init_xmit_priv(): Change return values and typeNishka Dasgupta
Change the return values in _r8712_init_xmit_priv from _SUCCESS/_FAIL to 0/-ENOMEM respectively. Change return type from sint to int. Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com> Link: https://lore.kernel.org/r/20190809052353.5308-1-nishkadg.linux@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-12staging: wilc1000: return kernel error codes from wilc_wlan_stopAdham Abozaeid
return -EIO for bus failures, 0 otherwise. Signed-off-by: Adham Abozaeid <adham.abozaeid@microchip.com> Link: https://lore.kernel.org/r/20190809182510.22443-3-adham.abozaeid@microchip.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-12staging: wilc1000: Don't reset WILC CPU disgracefullyAdham Abozaeid
Send abort request to WILC from wilc_wlan_stop instead of resetting the CPU. The abort request was being sent from wilc_wlan_cleanup after the CPU was reset which wasn't the correct order. The abort request handler in the chip will take care of resetting the CPU. Signed-off-by: Adham Abozaeid <adham.abozaeid@microchip.com> Link: https://lore.kernel.org/r/20190809182510.22443-2-adham.abozaeid@microchip.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-12xen/blkback: fix memory leaksWenwen Wang
In read_per_ring_refs(), after 'req' and related memory regions are allocated, xen_blkif_map() is invoked to map the shared frame, irq, and etc. However, if this mapping process fails, no cleanup is performed, leading to memory leaks. To fix this issue, invoke the cleanup before returning the error. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>