summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-01-10mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}Dan Williams
Both arch_add_memory() and arch_remove_memory() expect a single threaded context. For example, arch/x86/mm/init_64.c::kernel_physical_mapping_init() does not hold any locks over this check and branch: if (pgd_val(*pgd)) { pud = (pud_t *)pgd_page_vaddr(*pgd); paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end), page_size_mask); continue; } pud = alloc_low_page(); paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end), page_size_mask); The result is that two threads calling devm_memremap_pages() simultaneously can end up colliding on pgd initialization. This leads to crash signatures like the following where the loser of the race initializes the wrong pgd entry: BUG: unable to handle kernel paging request at ffff888ebfff0000 IP: memcpy_erms+0x6/0x10 PGD 2f8e8fc067 PUD 0 /* <---- Invalid PUD */ Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 54 PID: 3818 Comm: systemd-udevd Not tainted 4.6.7+ #13 task: ffff882fac290040 ti: ffff882f887a4000 task.ti: ffff882f887a4000 RIP: memcpy_erms+0x6/0x10 [..] Call Trace: ? pmem_do_bvec+0x205/0x370 [nd_pmem] ? blk_queue_enter+0x3a/0x280 pmem_rw_page+0x38/0x80 [nd_pmem] bdev_read_page+0x84/0xb0 Hold the standard memory hotplug mutex over calls to arch_{add,remove}_memory(). Fixes: 41e94a851304 ("add devm_memremap_pages") Link: http://lkml.kernel.org/r/148357647831.9498.12606007370121652979.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10ocfs2: fix crash caused by stale lvb with fsdlm pluginEric Ren
The crash happens rather often when we reset some cluster nodes while nodes contend fiercely to do truncate and append. The crash backtrace is below: dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 locks on 971 resources dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 generation 5 done: 4 ms ocfs2: Begin replay journal (node 318952601, slot 2) on device (253,18) ocfs2: End replay journal (node 318952601, slot 2) on device (253,18) ocfs2: Beginning quota recovery on device (253,18) for slot 2 ocfs2: Finishing quota recovery on device (253,18) for slot 2 (truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode) (truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1 ------------[ cut here ]------------ kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470! invalid opcode: 0000 [#1] SMP Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse sd_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr parport joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio ehci_hcd usbcore serio_raw usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 Supported: No, Unsupported modules are loaded CPU: 1 PID: 30154 Comm: truncate Tainted: G OE N 4.4.21-69-default #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014 task: ffff88004ff6d240 ti: ffff880074e68000 task.ti: ffff880074e68000 RIP: 0010:[<ffffffffa05c8c30>] [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2] RSP: 0018:ffff880074e6bd50 EFLAGS: 00010282 RAX: 0000000000000074 RBX: 000000000000029e RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246 RBP: ffff880074e6bda8 R08: 000000003675dc7a R09: ffffffff82013414 R10: 0000000000034c50 R11: 0000000000000000 R12: ffff88003aab3448 R13: 00000000000002dc R14: 0000000000046e11 R15: 0000000000000020 FS: 00007f839f965700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f839f97e000 CR3: 0000000036723000 CR4: 00000000000006e0 Call Trace: ocfs2_setattr+0x698/0xa90 [ocfs2] notify_change+0x1ae/0x380 do_truncate+0x5e/0x90 do_sys_ftruncate.constprop.11+0x108/0x160 entry_SYSCALL_64_fastpath+0x12/0x6d Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff <0f> 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff RIP [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2] It's because ocfs2_inode_lock() get us stale LVB in which the i_size is not equal to the disk i_size. We mistakenly trust the LVB because the underlaying fsdlm dlm_lock() doesn't set lkb_sbflags with DLM_SBF_VALNOTVALID properly for us. But, why? The current code tries to downconvert lock without DLM_LKF_VALBLK flag to tell o2cb don't update RSB's LVB if it's a PR->NULL conversion, even if the lock resource type needs LVB. This is not the right way for fsdlm. The fsdlm plugin behaves different on DLM_LKF_VALBLK, it depends on DLM_LKF_VALBLK to decide if we care about the LVB in the LKB. If DLM_LKF_VALBLK is not set, fsdlm will skip recovering RSB's LVB from this lkb and set the right DLM_SBF_VALNOTVALID appropriately when node failure happens. The following diagram briefly illustrates how this crash happens: RSB1 is inode metadata lock resource with LOCK_TYPE_USES_LVB; The 1st round: Node1 Node2 RSB1: PR RSB1(master): NULL->EX ocfs2_downconvert_lock(PR->NULL, set_lvb==0) ocfs2_dlm_lock(no DLM_LKF_VALBLK) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - dlm_lock(no DLM_LKF_VALBLK) convert_lock(overwrite lkb->lkb_exflags with no DLM_LKF_VALBLK) RSB1: NULL RSB1: EX reset Node2 dlm_recover_rsbs() recover_lvb() /* The LVB is not trustable if the node with EX fails and * no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1. */ if(!(kb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance to return; * to invalid the LVB here. */ The 2nd round: Node 1 Node2 RSB1(become master from recovery) ocfs2_setattr() ocfs2_inode_lock(NULL->EX) /* dlm_lock() return the stale lvb without setting DLM_SBF_VALNOTVALID */ ocfs2_meta_lvb_is_trustable() return 1 /* so we don't refresh inode from disk */ ocfs2_truncate_file() mlog_bug_on_msg(disk isize != i_size_read(inode)) /* crash! */ The fix is quite straightforward. We keep to set DLM_LKF_VALBLK flag for dlm_lock() if the lock resource type needs LVB and the fsdlm plugin is uesed. Link: http://lkml.kernel.org/r/1481275846-6604-1-git-send-email-zren@suse.com Signed-off-by: Eric Ren <zren@suse.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10bpf: do not use KMALLOC_SHIFT_MAXMichal Hocko
Commit 01b3f52157ff ("bpf: fix allocation warnings in bpf maps and integer overflow") has added checks for the maximum allocateable size. It (ab)used KMALLOC_SHIFT_MAX for that purpose. While this is not incorrect it is not very clean because we already have KMALLOC_MAX_SIZE for this very reason so let's change both checks to use KMALLOC_MAX_SIZE instead. The original motivation for using KMALLOC_SHIFT_MAX was to work around an incorrect KMALLOC_MAX_SIZE which could lead to allocation warnings but it is no longer needed since "slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER". Link: http://lkml.kernel.org/r/20161220130659.16461-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDERMichal Hocko
Andrey Konovalov has reported the following warning triggered by the syzkaller fuzzer. WARNING: CPU: 1 PID: 9935 at mm/page_alloc.c:3511 __alloc_pages_nodemask+0x159c/0x1e20 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 9935 Comm: syz-executor0 Not tainted 4.9.0-rc7+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __alloc_pages_slowpath mm/page_alloc.c:3511 __alloc_pages_nodemask+0x159c/0x1e20 mm/page_alloc.c:3781 alloc_pages_current+0x1c7/0x6b0 mm/mempolicy.c:2072 alloc_pages include/linux/gfp.h:469 kmalloc_order+0x1f/0x70 mm/slab_common.c:1015 kmalloc_order_trace+0x1f/0x160 mm/slab_common.c:1026 kmalloc_large include/linux/slab.h:422 __kmalloc+0x210/0x2d0 mm/slub.c:3723 kmalloc include/linux/slab.h:495 ep_write_iter+0x167/0xb50 drivers/usb/gadget/legacy/inode.c:664 new_sync_write fs/read_write.c:499 __vfs_write+0x483/0x760 fs/read_write.c:512 vfs_write+0x170/0x4e0 fs/read_write.c:560 SYSC_write fs/read_write.c:607 SyS_write+0xfb/0x230 fs/read_write.c:599 entry_SYSCALL_64_fastpath+0x1f/0xc2 The issue is caused by a lack of size check for the request size in ep_write_iter which should be fixed. It, however, points to another problem, that SLUB defines KMALLOC_MAX_SIZE too large because the its KMALLOC_SHIFT_MAX is (MAX_ORDER + PAGE_SHIFT) which means that the resulting page allocator request might be MAX_ORDER which is too large (see __alloc_pages_slowpath). The same applies to the SLOB allocator which allows even larger sizes. Make sure that they are capped properly and never request more than MAX_ORDER order. Link: http://lkml.kernel.org/r/20161220130659.16461-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10dax: wrprotect pmd_t in dax_mapping_entry_mkcleanRoss Zwisler
Currently dax_mapping_entry_mkclean() fails to clean and write protect the pmd_t of a DAX PMD entry during an *sync operation. This can result in data loss in the following sequence: 1) mmap write to DAX PMD, dirtying PMD radix tree entry and making the pmd_t dirty and writeable 2) fsync, flushing out PMD data and cleaning the radix tree entry. We currently fail to mark the pmd_t as clean and write protected. 3) more mmap writes to the PMD. These don't cause any page faults since the pmd_t is dirty and writeable. The radix tree entry remains clean. 4) fsync, which fails to flush the dirty PMD data because the radix tree entry was clean. 5) crash - dirty data that should have been fsync'd as part of 4) could still have been in the processor cache, and is lost. Fix this by marking the pmd_t clean and write protected in dax_mapping_entry_mkclean(), which is called as part of the fsync operation 2). This will cause the writes in step 3) above to generate page faults where we'll re-dirty the PMD radix tree entry, resulting in flushes in the fsync that happens in step 4). Fixes: 4b4bb46d00b3 ("dax: clear dirty entry tags on cache flush") Link: http://lkml.kernel.org/r/1482272586-21177-3-git-send-email-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10mm: add follow_pte_pmd()Ross Zwisler
Patch series "Write protect DAX PMDs in *sync path". Currently dax_mapping_entry_mkclean() fails to clean and write protect the pmd_t of a DAX PMD entry during an *sync operation. This can result in data loss, as detailed in patch 2. This series is based on Dan's "libnvdimm-pending" branch, which is the current home for Jan's "dax: Page invalidation fixes" series. You can find a working tree here: https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=dax_pmd_clean This patch (of 2): Similar to follow_pte(), follow_pte_pmd() allows either a PTE leaf or a huge page PMD leaf to be found and returned. Link: http://lkml.kernel.org/r/1482272586-21177-2-git-send-email-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Dave Hansen <dave.hansen@intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10mm/thp/pagecache/collapse: free the pte page table on collapse for thp page ↵Aneesh Kumar K.V
cache. With THP page cache, when trying to build a huge page from regular pte pages, we just clear the pmd entry. We will take another fault and at that point we will find the huge page in the radix tree, thereby using the huge page to complete the page fault The second fault path will allocate the needed pgtable_t page for archs like ppc64. So no need to deposit the same in collapse path. Depositing them in the collapse path resulting in a pgtable_t memory leak also giving errors like BUG: non-zero nr_ptes on freeing mm: 3 Fixes: 953c66c2b22a ("mm: THP page cache support for ppc64") Link: http://lkml.kernel.org/r/20161212163428.6780-2-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10dax: fix deadlock with DAX 4k holesRoss Zwisler
Currently in DAX if we have three read faults on the same hole address we can end up with the following: Thread 0 Thread 1 Thread 2 -------- -------- -------- dax_iomap_fault grab_mapping_entry lock_slot <locks empty DAX entry> dax_iomap_fault grab_mapping_entry get_unlocked_mapping_entry <sleeps on empty DAX entry> dax_iomap_fault grab_mapping_entry get_unlocked_mapping_entry <sleeps on empty DAX entry> dax_load_hole find_or_create_page ... page_cache_tree_insert dax_wake_mapping_entry_waiter <wakes one sleeper> __radix_tree_replace <swaps empty DAX entry with 4k zero page> <wakes> get_page lock_page ... put_locked_mapping_entry unlock_page put_page <sleeps forever on the DAX wait queue> The crux of the problem is that once we insert a 4k zero page, all locking from then on is done in terms of that 4k zero page and any additional threads sleeping on the empty DAX entry will never be woken. Fix this by waking all sleepers when we replace the DAX radix tree entry with a 4k zero page. This will allow all sleeping threads to successfully transition from locking based on the DAX empty entry to locking on the 4k zero page. With the test case reported by Xiong this happens very regularly in my test setup, with some runs resulting in 9+ threads in this deadlocked state. With this fix I've been able to run that same test dozens of times in a loop without issue. Fixes: ac401cc78242 ("dax: New fault locking") Link: http://lkml.kernel.org/r/1483479365-13607-1-git-send-email-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: Xiong Zhou <xzhou@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> [4.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10MAINTAINERS: remove duplicate bug filling descriptionVlastimil Babka
I have noticed that two different descriptions for B: entries in MAINTAINERS were merged: commit 686564434e88 ("MAINTAINERS: Add bug tracking system location entry type") and 2de2bd95f456 ("MAINTAINERS: add "B:" for URI where to file bugs"). This patch keeps the description from 2de2bd95f456. There has been a discussion [1] about whether this more detailed description is useful and what it exactly implies. I find it more useful and general, and the author of 686564434e88 agreed in the end that either is fine. [1] https://lkml.org/lkml/2016/12/8/71 Link: http://lkml.kernel.org/r/20161219085158.12114-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10gro: Disable frag0 optimization on IPv6 ext headersHerbert Xu
The GRO fast path caches the frag0 address. This address becomes invalid if frag0 is modified by pskb_may_pull or its variants. So whenever that happens we must disable the frag0 optimization. This is usually done through the combination of gro_header_hard and gro_header_slow, however, the IPv6 extension header path did the pulling directly and would continue to use the GRO fast path incorrectly. This patch fixes it by disabling the fast path when we enter the IPv6 extension header path. Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address") Reported-by: Slava Shwartsman <slavash@mellanox.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10gro: Enter slow-path if there is no tailroomHerbert Xu
The GRO path has a fast-path where we avoid calling pskb_may_pull and pskb_expand by directly accessing frag0. However, this should only be done if we have enough tailroom in the skb as otherwise we'll have to expand it later anyway. This patch adds the check by capping frag0_len with the skb tailroom. Fixes: cb18978cbf45 ("gro: Open-code final pskb_may_pull") Reported-by: Slava Shwartsman <slavash@mellanox.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10mlx4: Return EOPNOTSUPP instead of ENOTSUPPMartin KaFai Lau
In commit b45f0674b997 ("mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs"), it changed EOPNOTSUPP to ENOTSUPP by mistake. This patch fixes it. Fixes: b45f0674b997 ("mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10net/af_iucv: don't use paged skbs for TX on HiperSocketsJulian Wiedmann
With commit e53743994e21 ("af_iucv: use paged SKBs for big outbound messages"), we transmit paged skbs for both of AF_IUCV's transport modes (IUCV or HiperSockets). The qeth driver for Layer 3 HiperSockets currently doesn't support NETIF_F_SG, so these skbs would just be linearized again by the stack. Avoid that overhead by using paged skbs only for IUCV transport. cc stable, since this also circumvents a significant skb leak when sending large messages (where the skb then needs to be linearized). Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> # v4.8+ Fixes: e53743994e21 ("af_iucv: use paged SKBs for big outbound messages") Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10net: add the AF_QIPCRTR entries to family name tablesAnna, Suman
Commit bdabad3e363d ("net: Add Qualcomm IPC router") introduced a new address family. Update the family name tables accordingly so that the lockdep initialization can use the proper names for this family. Cc: Courtney Cavin <courtney.cavin@sonymobile.com> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Suman Anna <s-anna@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10net: qrtr: Mark 'buf' as little endianStephen Boyd
Failure to mark this pointer as __le32 causes checkers like sparse to complain: net/qrtr/qrtr.c:274:16: warning: incorrect type in assignment (different base types) net/qrtr/qrtr.c:274:16: expected unsigned int [unsigned] [usertype] <noident> net/qrtr/qrtr.c:274:16: got restricted __le32 [usertype] <noident> net/qrtr/qrtr.c:275:16: warning: incorrect type in assignment (different base types) net/qrtr/qrtr.c:275:16: expected unsigned int [unsigned] [usertype] <noident> net/qrtr/qrtr.c:275:16: got restricted __le32 [usertype] <noident> net/qrtr/qrtr.c:276:16: warning: incorrect type in assignment (different base types) net/qrtr/qrtr.c:276:16: expected unsigned int [unsigned] [usertype] <noident> net/qrtr/qrtr.c:276:16: got restricted __le32 [usertype] <noident> Silence it. Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10net: dsa: Ensure validity of dst->ds[0]Florian Fainelli
It is perfectly possible to have non zero indexed switches being present in a DSA switch tree, in such a case, we will be deferencing a NULL pointer while dsa_cpu_port_ethtool_{setup,restore}. Be more defensive and ensure that dst->ds[0] is valid before doing anything with it. Fixes: 0c73c523cf73 ("net: dsa: Initialize CPU port ethtool ops per tree") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10Merge tag 'linux-kselftest-4.10-rc4-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "This update consists of fixes to use shell instead of bash to run tests in embedded devices where the only shell available is the busybox ash. Also included is a typo fix to a test result message" * tag 'linux-kselftest-4.10-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: x86/pkeys: fix spelling mistake: "itertation" -> "iteration" selftests: do not require bash to run netsocktests testcase selftests: do not require bash to run bpf tests selftests: do not require bash for the generated test
2017-01-10apm-emulation: move APM_MINOR_DEV to include/linux/miscdevice.hCorentin Labbe
This patch move the define for APM_MINOR_DEV to include/linux/miscdevice.h It is better that all minor number definitions are in the same place. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10ds1302: remove unneeded linux/miscdevice.h includeCorentin Labbe
drivers/char/ds1302.c does not use any miscdevice so the inclusion of linux/miscdevice.h is unnecessary. This patch remove this inclusion. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10mmtimer: add member name to the miscdevice declarationCorentin Labbe
Since the struct miscdevice have many members, it is dangerous to init it without members name relying only on member order. This patch add member name to the init declaration. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10misc: mic: double free on ioctl error pathDan Carpenter
This function only has one caller. Freeing "vdev" here leads to a use after free bug. There are several other error paths in this function but this is the only one which frees "vdev". It looks like the kfree() can be safely removed. Fixes: 61e9c905df78 ("misc: mic: Enable VOP host side functionality") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10misc: set error code when devm_kstrdup failsPan Bian
In function sram_reserve_regions(), the value of return variable ret should be negative on failures. However, the value of ret may be 0 even if the call to devm_kstrdup() returns a NULL pointer. This patch explicitly assigns "-ENOMEM" to ret on the path that devm_kstrdup() fails. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188651 Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10Drivers: hv: util: Backup: Fix a rescind processing issueK. Y. Srinivasan
VSS may use a char device to support the communication between the user level daemon and the driver. When the VSS channel is rescinded we need to make sure that the char device is fully cleaned up before we can process a new VSS offer from the host. Implement this logic. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10Drivers: hv: util: Fcopy: Fix a rescind processing issueK. Y. Srinivasan
Fcopy may use a char device to support the communication between the user level daemon and the driver. When the Fcopy channel is rescinded we need to make sure that the char device is fully cleaned up before we can process a new Fcopy offer from the host. Implement this logic. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10Drivers: hv: util: kvp: Fix a rescind processing issueK. Y. Srinivasan
KVP may use a char device to support the communication between the user level daemon and the driver. When the KVP channel is rescinded we need to make sure that the char device is fully cleaned up before we can process a new KVP offer from the host. Implement this logic. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10Drivers: hv: vmbus: Fix a rescind handling bugK. Y. Srinivasan
The host can rescind a channel that has been offered to the guest and once the channel is rescinded, the host does not respond to any requests on that channel. Deal with the case where the guest may be blocked waiting for a response from the host. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10hv: make CPU offlining prevention fine-grainedVitaly Kuznetsov
Since commit e513229b4c38 ("Drivers: hv: vmbus: prevent cpu offlining on newer hypervisors") cpu offlining was disabled. It is still true that we can't offline CPUs which have VMBus channels bound to them but we may have 'free' CPUs (e.v. we booted with maxcpus= parameter and onlined CPUs after VMBus was initialized), these CPUs may be disabled without issues. In future, we may even allow closing CPUs which have only sub-channels assinged to them by closing these sub-channels. All devices will continue to work. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10hv: switch to cpuhp state machine for synic init/cleanupVitaly Kuznetsov
To make it possible to online/offline CPUs switch to cpuhp infrastructure for doing hv_synic_init()/hv_synic_cleanup(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10Drivers: hv: vmbus: Prevent sending data on a rescinded channelK. Y. Srinivasan
After the channel is rescinded, the host does not read from the rescinded channel. Fail writes to a channel that has already been rescinded. If we permit writes on a rescinded channel, since the host will not respond we will have situations where we will be unable to unload vmbus drivers that cannot have any outstanding requests to the host at the point they are unoaded. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10hv: don't reset hv_context.tsc_page on crashVitaly Kuznetsov
It may happen that secondary CPUs are still alive and resetting hv_context.tsc_page will cause a consequent crash in read_hv_clock_tsc() as we don't check for it being not NULL there. It is safe as we're not freeing this page anyways. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10hv: init percpu_list in hv_synic_alloc()Vitaly Kuznetsov
Initializing hv_context.percpu_list in hv_synic_alloc() helps to prevent a crash in percpu_channel_enq() when not all CPUs were online during initialization and it naturally belongs there. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10hv: allocate synic pages for all present CPUsVitaly Kuznetsov
It may happen that not all CPUs are online when we do hv_synic_alloc() and in case more CPUs come online later we may try accessing these allocated structures. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()Vitaly Kuznetsov
DoS protection conditions were altered in WS2016 and now it's easy to get -EAGAIN returned from vmbus_post_msg() (e.g. when we try changing MTU on a netvsc device in a loop). All vmbus_post_msg() callers don't retry the operation and we usually end up with a non-functional device or crash. While host's DoS protection conditions are unknown to me my tests show that it can take up to 10 seconds before the message is sent so doing udelay() is not an option, we really need to sleep. Almost all vmbus_post_msg() callers are ready to sleep but there is one special case: vmbus_initiate_unload() which can be called from interrupt/NMI context and we can't sleep there. I'm also not sure about the lonely vmbus_send_tl_connect_request() which has no in-tree users but its external users are most likely waiting for the host to reply so sleeping there is also appropriate. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10virtio_blk: fix panic in initialization error pathOmar Sandoval
If blk_mq_init_queue() returns an error, it gets assigned to vblk->disk->queue. Then, when we call put_disk(), we end up calling blk_put_queue() with the ERR_PTR, causing a bad dereference. Fix it by only assigning to vblk->disk->queue on success. Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-10nbd: blk_mq_init_queue returns an error code on failure, not NULLJeff Moyer
Additionally, don't assign directly to disk->queue, otherwise blk_put_queue (called via put_disk) will choke (panic) on the errno stored there. Bug found by code inspection after Omar found a similar issue in virtio_blk. Compile-tested only. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-10virtio_blk: avoid DMA to stack for the sense bufferChristoph Hellwig
Most users of BLOCK_PC requests allocate the sense buffer on the stack, so to avoid DMA to the stack copy them to a field in the heap allocated virtblk_req structure. Without that any attempt at SCSI passthrough I/O, including the SG_IO ioctl from userspace will crash the kernel. Note that this includes running tools like hdparm even when the host does not have SCSI passthrough enabled. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-10do_direct_IO: Use inode->i_blkbits to compute block count to be cleanedChandan Rajendra
The code currently uses sdio->blkbits to compute the number of blocks to be cleaned. However sdio->blkbits is derived from the logical block size of the underlying block device (Refer to the definition of do_blockdev_direct_IO()). Due to this, generic/299 test would rarely fail when executed on an ext4 filesystem with 64k as the block size and when using a virtio based disk (having 512 byte as the logical block size) inside a kvm guest. This commit fixes the bug by using inode->i_blkbits to compute the number of blocks to be cleaned. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixed up by Jeff Moyer to only use/evaluate inode->i_blkbits once, to avoid issues with block size changes with IO in flight. Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-10net: skb_flow_get_be16() can be staticEric Dumazet
Removes following sparse complain : net/core/flow_dissector.c:70:8: warning: symbol 'skb_flow_get_be16' was not declared. Should it be static? Fixes: 972d3876faa8 ("flow dissector: ICMP support") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10ibmvscsis: Fix srp_transfer_data fail return codeBryant G. Ly
If srp_transfer_data fails within ibmvscsis_write_pending, then the most likely scenario is that the client timed out the op and removed the TCE mapping. Thus it will loop forever retrying the op that is pretty much guaranteed to fail forever. A better return code would be EIO instead of EAGAIN. Cc: stable@vger.kernel.org Reported-by: Steven Royer <seroyer@linux.vnet.ibm.com> Tested-by: Steven Royer <seroyer@linux.vnet.ibm.com> Signed-off-by: Bryant G. Ly <bgly@us.ibm.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
2017-01-10usb: musb: fix runtime PM in debugfsBin Liu
MUSB driver now has runtime PM support, but the debugfs driver misses the PM _get/_put() calls, which could cause MUSB register access failure. Cc: stable@vger.kernel.org # 4.9+ Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Bin Liu <b-liu@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-10Merge branch 'r8152-fix-autosuspend'David S. Miller
Hayes Wang says: ==================== r8152: fix autosuspend issue Avoid rx is split into two parts when runtime suspend occurs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10r8152: fix rx issue for runtime suspendhayeswang
Pause the rx and make sure the rx fifo is empty when the autosuspend occurs. If the rx data comes when the driver is canceling the rx urb, the host controller would stop getting the data from the device and continue it after next rx urb is submitted. That is, one continuing data is split into two different urb buffers. That let the driver take the data as a rx descriptor, and unexpected behavior happens. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10r8152: split rtl8152_suspend functionhayeswang
Split rtl8152_suspend() into rtl8152_system_suspend() and rtl8152_rumtime_suspend(). Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10target: support XCOPY requests without parametersDavid Disseldorp
SPC4r37 6.4.1 EXTENDED COPY(LID4) states (also applying to LID1 reqs): A parameter list length of zero specifies that the copy manager shall not transfer any data or alter any internal state, and this shall not be considered an error. This behaviour can be tested using the libiscsi ExtendedCopy.ParamHdr test. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
2017-01-10target: check for XCOPY parameter truncationDavid Disseldorp
Check for XCOPY header, CSCD descriptor and segment descriptor list truncation, and respond accordingly. SPC4r37 6.4.1 EXTENDED COPY(LID4) states (also applying to LID1 reqs): If the parameter list length causes truncation of the parameter list, then the copy manager shall transfer no data and shall terminate the EXTENDED COPY command with CHECK CONDITION status, with the sense key set to ILLEGAL REQUEST, and the additional sense code set to PARAMETER LIST LENGTH ERROR. This behaviour can be tested using the libiscsi ExtendedCopy.ParamHdr test. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
2017-01-10target: use XCOPY segment descriptor CSCD IDsDavid Disseldorp
The XCOPY specification in SPC4r37 states that the XCOPY source and destination device(s) should be derived from the copy source and copy destination (CSCD) descriptor IDs in the XCOPY segment descriptor. The CSCD IDs are generally (for block -> block copies), indexes into the corresponding CSCD descriptor list, e.g. ================================= EXTENDED COPY Header ================================= CSCD Descriptor List - entry 0 + LU ID <--------------<------------------\ - entry 1 | + LU ID <______________<_____________ | ================================= | | Segment Descriptor List | | - segment 0 | | + src CSCD ID = 0 --------->---------+----/ + dest CSCD ID = 1 ___________>______| + len + src lba + dest lba ================================= Currently LIO completely ignores the src and dest CSCD IDs in the Segment Descriptor List, and instead assumes that the first entry in the CSCD list corresponds to the source, and the second to the destination. This commit removes this assumption, by ensuring that the Segment Descriptor List is parsed prior to processing the CSCD Descriptor List. CSCD Descriptor List processing is modified to compare the current list index with the previously obtained src and dest CSCD IDs. Additionally, XCOPY requests where the src and dest CSCD IDs refer to the CSCD Descriptor List entry can now be successfully processed. Fixes: cbf031f ("target: Add support for EXTENDED_COPY copy offload") Link: https://bugzilla.kernel.org/show_bug.cgi?id=191381 Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
2017-01-10target: check XCOPY segment descriptor CSCD IDsDavid Disseldorp
Ensure that the segment descriptor CSCD descriptor ID values correspond to CSCD descriptor entries located in the XCOPY command parameter list. SPC4r37 6.4.6.1 Table 150 specifies this range as 0000h to 07FFh, where the CSCD descriptor location in the parameter list can be located via: 16 + (id * 32) Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> [ bvanassche: inserted "; " in the format string of an error message and also moved a "||" operator from the start of a line to the end of the previous line ] Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
2017-01-10target: simplify XCOPY wwn->se_dev lookup helperDavid Disseldorp
target_xcopy_locate_se_dev_e4() is used to locate an se_dev, based on the WWN provided with the XCOPY request. Remove a couple of unneeded arguments, and rely on the caller for the src/dst test. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
2017-01-10target: return UNSUPPORTED TARGET/SEGMENT DESC TYPE CODE senseDavid Disseldorp
Use UNSUPPORTED TARGET DESCRIPTOR TYPE CODE and UNSUPPORTED SEGMENT DESCRIPTOR TYPE CODE additional sense codes if a descriptor type in an XCOPY request is not supported, as specified in spc4r37 6.4.5 and 6.4.6. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
2017-01-10target: bounds check XCOPY total descriptor list lengthDavid Disseldorp
spc4r37 6.4.3.5 states: If the combined length of the CSCD descriptors and segment descriptors exceeds the allowed value, then the copy manager shall terminate the command with CHECK CONDITION status, with the sense key set to ILLEGAL REQUEST, and the additional sense code set to PARAMETER LIST LENGTH ERROR. This functionality can be tested using the libiscsi ExtendedCopy.DescrLimits test. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>