summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-09-21Merge branch 'bnx2x-enhancements'David S. Miller
Shahed Shaikh says: ==================== bnx2x: enhancements This series adds below changes - - support for VF spoof-check configuration through .ndo_set_vf_spoofchk. - workaround for MFW bug regarding unexpected bandwidth notifcation in single function mode. - supply VF link status as part of get VF config handling. ==================== Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
2018-09-21bnx2x: Provide VF link status in ndo_get_vf_configShahed Shaikh
Provide current link status of VF in ndo_get_vf_config handler. Signed-off-by: Shahed Shaikh <Shahed.Shaikh@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21bnx2x: Ignore bandwidth attention in single function modeShahed Shaikh
This is a workaround for FW bug - MFW generates bandwidth attention in single function mode, which is only expected to be generated in multi function mode. This undesired attention in SF mode results in incorrect HW configuration and resulting into Tx timeout. Signed-off-by: Shahed Shaikh <Shahed.Shaikh@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21bnx2x: Add VF spoof-checking configurationShahed Shaikh
Add support for `ndo_set_vf_spoofchk' to allow PF control over its VF spoof-checking configuration. Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21mISDN: remove redundant null pointer check before kfree_skbzhong jiang
kfree_skb has taken the null pointer into account. hence it is safe to remove the redundant null pointer check before kfree_skb. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21net: mscc: fix the frame extraction into the skbAntoine Tenart
When extracting frames from the Ocelot switch, the frame check sequence (FCS) is present at the end of the data extracted. The FCS was put into the sk buffer which introduced some issues (as length related ones), as the FCS shouldn't be part of an Rx sk buffer. This patch fixes the Ocelot switch extraction behaviour by discarding the FCS. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21vhost_net: add a missing error returnDan Carpenter
We accidentally left out this error return so it leads to some use after free bugs later on. Fixes: 0a0be13b8fe2 ("vhost_net: batch submitting XDP buffers to underlayer sockets") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21Merge branch 'kfree_skb-NULL'David S. Miller
zhong jiang says: ==================== net: remove redundant null pointer check before kfree_skb The issue is detected with the help of Coccinelle. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21ipv6: remove redundant null pointer check before kfree_skbzhong jiang
kfree_skb has taken the null pointer into account. hence it is safe to remove the redundant null pointer check before kfree_skb. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21net: cxgb3_main: remove redundant null pointer check before kfree_skbzhong jiang
kfree_skb has taken the null pointer into account. hence it is safe to remove the redundant null pointer check before kfree_skb. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21net: nci: remove redundant null pointer check before kfree_skbzhong jiang
kfree_skb has taken the null pointer into account. hence it is safe to remove the redundant null pointer check before kfree_skb. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21ipv4: remove redundant null pointer check before kfree_skbzhong jiang
kfree_skb has taken the null pointer into account. hence it is safe to remove the redundant null pointer check before kfree_skb. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21net: cxgb3: remove redundant null pointer check before kfree_skbzhong jiang
kfree_skb has taken the null pointer into account. hence it is safe to remove the redundant null pointer check before kfree_skb. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21net: tap: remove redundant null pointer check before kfree_skbzhong jiang
kfree_skb has taken the null pointer into account. hence it is safe to remove the redundant null pointer check before kfree_skb. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21net: neterion: remove redundant continuezhong jiang
The continue will not truely skip any code. hence it is safe to remove it. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21net: amd: remove redundant continuezhong jiang
The continue will not truely skip any code. hence it is safe to remove it. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21net_sched: change tcf_del_walker() to take idrinfo->lockVlad Buslov
Action API was changed to work with actions and action_idr in concurrency safe manner, however tcf_del_walker() still uses actions without taking a reference or idrinfo->lock first, and deletes them directly, disregarding possible concurrent delete. Change tcf_del_walker() to take idrinfo->lock while iterating over actions and use new tcf_idr_release_unsafe() to release them while holding the lock. And the blocking function fl_hw_destroy_tmplt() could be called when we put a filter chain, so defer it to a work queue. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> [xiyou.wangcong@gmail.com: heavily modify the code and changelog] Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmGreg Kroah-Hartman
Paolo writes: "It's mostly small bugfixes and cleanups, mostly around x86 nested virtualization. One important change, not related to nested virtualization, is that the ability for the guest kernel to trap CPUID instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is now masked by default. This is because the feature is detected through an MSR; a very bad idea that Intel seems to like more and more. Some applications choke if the other fields of that MSR are not initialized as on real hardware, hence we have to disable the whole MSR by default, as was the case before Linux 4.12." * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits) KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs kvm: selftests: Add platform_info_test KVM: x86: Control guest reads of MSR_PLATFORM_INFO KVM: x86: Turbo bits in MSR_PLATFORM_INFO nVMX x86: Check VPID value on vmentry of L2 guests nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2 KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv KVM: VMX: check nested state and CR4.VMXE against SMM kvm: x86: make kvm_{load|put}_guest_fpu() static x86/hyper-v: rename ipi_arg_{ex,non_ex} structures KVM: VMX: use preemption timer to force immediate VMExit KVM: VMX: modify preemption timer bit only when arming timer KVM: VMX: immediately mark preemption timer expired only for zero value KVM: SVM: Switch to bitmap_zalloc() KVM/MMU: Fix comment in walk_shadow_page_lockless_end() kvm: selftests: use -pthread instead of -lpthread KVM: x86: don't reset root in kvm_mmu_setup() kvm: mmu: Don't read PDPTEs when paging is not enabled x86/kvm/lapic: always disable MMIO interface in x2APIC mode KVM: s390: Make huge pages unavailable in ucontrol VMs ...
2018-09-21Merge tag 'upstream-4.19-rc4' of git://git.infradead.org/linux-ubifsGreg Kroah-Hartman
Richard writes: "This pull request contains fixes for UBIFS: - A wrong UBIFS assertion in mount code - Fix for a NULL pointer deref in mount code - Revert of a bad fix for xattrs" * tag 'upstream-4.19-rc4' of git://git.infradead.org/linux-ubifs: Revert "ubifs: xattr: Don't operate on deleted inodes" ubifs: drop false positive assertion ubifs: Check for name being NULL while mounting
2018-09-21Merge tag 'for-linus-20180920' of git://git.kernel.dk/linux-blockGreg Kroah-Hartman
Jens writes: "Storage fixes for 4.19-rc5 - Fix for leaking kernel pointer in floppy ioctl (Andy Whitcroft) - NVMe pull request from Christoph, and a single ANA log page fix (Hannes) - Regression fix for libata qd32 support, where we trigger an illegal active command transition. This fixes a CD-ROM detection issue that was reported, but could also trigger premature completion of the internal tag (me)" * tag 'for-linus-20180920' of git://git.kernel.dk/linux-block: floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl libata: mask swap internal and hardware tag nvme: count all ANA groups for ANA Log page
2018-09-21Merge tag 'drm-fixes-2018-09-21' of git://anongit.freedesktop.org/drm/drmGreg Kroah-Hartman
David writes: "drm fixes for 4.19-rc5: - core: fix debugfs for atomic, fix the check for atomic for non-modesetting drivers - amdgpu: adds a new PCI id, some kfd fixes and a sdma fix - i915: a bunch of GVT fixes. - vc4: scaling fix - vmwgfx: modesetting fixes and a old buffer eviction fix - udl: framebuffer destruction fix - sun4i: disable on R40 fix until next kernel - pl111: NULL termination on table fix" * tag 'drm-fixes-2018-09-21' of git://anongit.freedesktop.org/drm/drm: (21 commits) drm/amdkfd: Fix ATS capablity was not reported correctly on some APUs drm/amdkfd: Change the control stack MTYPE from UC to NC on GFX9 drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7 drm/vmwgfx: Fix buffer object eviction drm/vmwgfx: Don't impose STDU limits on framebuffer size drm/vmwgfx: limit mode size for all display unit to texture_max drm/vmwgfx: limit screen size to stdu_max during check_modeset drm/vmwgfx: don't check for old_crtc_state enable status drm/amdgpu: add new polaris pci id drm: sun4i: drop second PLL from A64 HDMI PHY drm: fix drm_drv_uses_atomic_modeset on non modesetting drivers. drm/i915/gvt: clear ggtt entries when destroy vgpu drm/i915/gvt: request srcu_read_lock before checking if one gfn is valid drm/i915/gvt: Add GEN9_CLKGATE_DIS_4 to default BXT mmio handler drm/i915/gvt: Init PHY related registers for BXT drm/atomic: Use drm_drv_uses_atomic_modeset() for debugfs creation drm/fb-helper: Remove set but not used variable 'connector_funcs' drm: udl: Destroy framebuffer only if it was initialized drm/sun4i: Remove R40 display pipeline compatibles drm/pl111: Make sure of_device_id tables are NULL terminated ...
2018-09-20Merge branch 'net-wean-netfilter-from-fib_nh'David S. Miller
David Ahern says: ==================== net: wean netfilter from fib_nh Two netfilter modules reference fib_nh. In both cases the code is only checking if a nexthop in a fib_info uses a specific device. Both instances essentially duplicate code from __fib_validate_source, so move that code into a helper and flip the netfilter modules to use it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20netfilter: nft_fib: Convert nft_fib4_eval to new dev helperDavid Ahern
Convert nft_fib4_eval to the new device checking helper and remove the duplicate code. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20netfilter: rpfilter: Convert rpfilter_lookup_reverse to new dev helperDavid Ahern
Convert rpfilter_lookup_reverse to the new device checking helper and remove the duplicate code. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20net/ipv4: Move device validation to helperDavid Ahern
Move the device matching check in __fib_validate_source to a helper and export it for use by netfilter modules. Code move only; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20r8169: fix autoneg issue on resume with RTL8168EHeiner Kallweit
It was reported that chip version 33 (RTL8168E) ends up with 10MBit/Half on a 1GBit link after resuming from S3 (with different link partners). For whatever reason the PHY on this chip doesn't properly start a renegotiation when soft-reset. Explicitly requesting a renegotiation fixes this. Fixes: a2965f12fde6 ("r8169: remove rtl8169_set_speed_xmii") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reported-by: Neil MacLeod <neil@nmacleod.com> Tested-by: Neil MacLeod <neil@nmacleod.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20scsi: lpfc: Synchronize access to remoteport via rportJames Smart
The driver currently uses the ndlp to get the local rport which is then used to get the nvme transport remoteport pointer. There can be cases where a stale remoteport pointer is obtained as synchronization isn't done through the different dereferences. Correct by using locks to synchronize the dereferences. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-09-20scsi: ufs: Disable blk-mq for nowAdrian Hunter
blk-mq does not support runtime pm, so disable blk-mq support for now. Fixes: d5038a13eca7 ("scsi: core: switch to scsi-mq by default") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-09-21Merge branch 'drm-fixes-4.19' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes A few fixes for 4.19: - Add a new polaris pci id - KFD fixes for raven and gfx7 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180920155850.5455-1-alexander.deucher@amd.com
2018-09-21Merge branch 'vmwgfx-fixes-4.19' of ↵Dave Airlie
git://people.freedesktop.org/~thomash/linux into drm-fixes A couple of modesetting fixes and a fix for a long-standing buffer-eviction problem cc'd stable. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thellstrom@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180920063935.35492-1-thellstrom@vmware.com
2018-09-20x86/mm: Expand static page table for fixmap spaceFeng Tang
We met a kernel panic when enabling earlycon, which is due to the fixmap address of earlycon is not statically setup. Currently the static fixmap setup in head_64.S only covers 2M virtual address space, while it actually could be in 4M space with different kernel configurations, e.g. when VSYSCALL emulation is disabled. So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2, and add a build time check to ensure that the fixmap is covered by the initial static page tables. Fixes: 1ad83c858c7d ("x86_64,vsyscall: Make vsyscall emulation configurable") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: kernel test robot <rong.a.chen@intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> (Xen parts) Cc: H Peter Anvin <hpa@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180920025828.23699-1-feng.tang@intel.com
2018-09-20ocfs2: fix ocfs2 read block panicJunxiao Bi
While reading block, it is possible that io error return due to underlying storage issue, in this case, BH_NeedsValidate was left in the buffer head. Then when reading the very block next time, if it was already linked into journal, that will trigger the following panic. [203748.702517] kernel BUG at fs/ocfs2/buffer_head_io.c:342! [203748.702533] invalid opcode: 0000 [#1] SMP [203748.702561] Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc dm_switch dm_queue_length dm_multipath bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas ipmi_ssif i2c_core ipmi_si ipmi_msghandler acpi_pad pcspkr sb_edac edac_core lpc_ich mfd_core shpchp sg tg3 ptp pps_core ext4 jbd2 mbcache2 sr_mod cdrom sd_mod ahci libahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [203748.703024] CPU: 7 PID: 38369 Comm: touch Not tainted 4.1.12-124.18.6.el6uek.x86_64 #2 [203748.703045] Hardware name: Dell Inc. PowerEdge R620/0PXXHP, BIOS 2.5.2 01/28/2015 [203748.703067] task: ffff880768139c00 ti: ffff88006ff48000 task.ti: ffff88006ff48000 [203748.703088] RIP: 0010:[<ffffffffa05e9f09>] [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2] [203748.703130] RSP: 0018:ffff88006ff4b818 EFLAGS: 00010206 [203748.703389] RAX: 0000000008620029 RBX: ffff88006ff4b910 RCX: 0000000000000000 [203748.703885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000023079fe [203748.704382] RBP: ffff88006ff4b8d8 R08: 0000000000000000 R09: ffff8807578c25b0 [203748.704877] R10: 000000000f637376 R11: 000000003030322e R12: 0000000000000000 [203748.705373] R13: ffff88006ff4b910 R14: ffff880732fe38f0 R15: 0000000000000000 [203748.705871] FS: 00007f401992c700(0000) GS:ffff880bfebc0000(0000) knlGS:0000000000000000 [203748.706370] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [203748.706627] CR2: 00007f4019252440 CR3: 00000000a621e000 CR4: 0000000000060670 [203748.707124] Stack: [203748.707371] ffff88006ff4b828 ffffffffa0609f52 ffff88006ff4b838 0000000000000001 [203748.707885] 0000000000000000 0000000000000000 ffff880bf67c3800 ffffffffa05eca00 [203748.708399] 00000000023079ff ffffffff81c58b80 0000000000000000 0000000000000000 [203748.708915] Call Trace: [203748.709175] [<ffffffffa0609f52>] ? ocfs2_inode_cache_io_unlock+0x12/0x20 [ocfs2] [203748.709680] [<ffffffffa05eca00>] ? ocfs2_empty_dir_filldir+0x80/0x80 [ocfs2] [203748.710185] [<ffffffffa05ec0cb>] ocfs2_read_dir_block_direct+0x3b/0x200 [ocfs2] [203748.710691] [<ffffffffa05f0fbf>] ocfs2_prepare_dx_dir_for_insert.isra.57+0x19f/0xf60 [ocfs2] [203748.711204] [<ffffffffa065660f>] ? ocfs2_metadata_cache_io_unlock+0x1f/0x30 [ocfs2] [203748.711716] [<ffffffffa05f4f3a>] ocfs2_prepare_dir_for_insert+0x13a/0x890 [ocfs2] [203748.712227] [<ffffffffa05f442e>] ? ocfs2_check_dir_for_entry+0x8e/0x140 [ocfs2] [203748.712737] [<ffffffffa061b2f2>] ocfs2_mknod+0x4b2/0x1370 [ocfs2] [203748.713003] [<ffffffffa061c385>] ocfs2_create+0x65/0x170 [ocfs2] [203748.713263] [<ffffffff8121714b>] vfs_create+0xdb/0x150 [203748.713518] [<ffffffff8121b225>] do_last+0x815/0x1210 [203748.713772] [<ffffffff812192e9>] ? path_init+0xb9/0x450 [203748.714123] [<ffffffff8121bca0>] path_openat+0x80/0x600 [203748.714378] [<ffffffff811bcd45>] ? handle_pte_fault+0xd15/0x1620 [203748.714634] [<ffffffff8121d7ba>] do_filp_open+0x3a/0xb0 [203748.714888] [<ffffffff8122a767>] ? __alloc_fd+0xa7/0x130 [203748.715143] [<ffffffff81209ffc>] do_sys_open+0x12c/0x220 [203748.715403] [<ffffffff81026ddb>] ? syscall_trace_enter_phase1+0x11b/0x180 [203748.715668] [<ffffffff816f0c9f>] ? system_call_after_swapgs+0xe9/0x190 [203748.715928] [<ffffffff8120a10e>] SyS_open+0x1e/0x20 [203748.716184] [<ffffffff816f0d5e>] system_call_fastpath+0x18/0xd7 [203748.716440] Code: 00 00 48 8b 7b 08 48 83 c3 10 45 89 f8 44 89 e1 44 89 f2 4c 89 ee e8 07 06 11 e1 48 8b 03 48 85 c0 75 df 8b 5d c8 e9 4d fa ff ff <0f> 0b 48 8b 7d a0 e8 dc c6 06 00 48 b8 00 00 00 00 00 00 00 10 [203748.717505] RIP [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2] [203748.717775] RSP <ffff88006ff4b818> Joesph ever reported a similar panic. Link: https://oss.oracle.com/pipermail/ocfs2-devel/2013-May/008931.html Link: http://lkml.kernel.org/r/20180912063207.29484-1-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <ge.changwei@h3c.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20mm: slowly shrink slabs with a relatively small number of objectsRoman Gushchin
9092c71bb724 ("mm: use sc->priority for slab shrink targets") changed the way that the target slab pressure is calculated and made it priority-based: delta = freeable >> priority; delta *= 4; do_div(delta, shrinker->seeks); The problem is that on a default priority (which is 12) no pressure is applied at all, if the number of potentially reclaimable objects is less than 4096 (1<<12). This causes the last objects on slab caches of no longer used cgroups to (almost) never get reclaimed. It's obviously a waste of memory. It can be especially painful, if these stale objects are holding a reference to a dying cgroup. Slab LRU lists are reparented on memcg offlining, but corresponding objects are still holding a reference to the dying cgroup. If we don't scan these objects, the dying cgroup can't go away. Most likely, the parent cgroup hasn't any directly charged objects, only remaining objects from dying children cgroups. So it can easily hold a reference to hundreds of dying cgroups. If there are no big spikes in memory pressure, and new memory cgroups are created and destroyed periodically, this causes the number of dying cgroups grow steadily, causing a slow-ish and hard-to-detect memory "leak". It's not a real leak, as the memory can be eventually reclaimed, but it could not happen in a real life at all. I've seen hosts with a steadily climbing number of dying cgroups, which doesn't show any signs of a decline in months, despite the host is loaded with a production workload. It is an obvious waste of memory, and to prevent it, let's apply a minimal pressure even on small shrinker lists. E.g. if there are freeable objects, let's scan at least min(freeable, scan_batch) objects. This fix significantly improves a chance of a dying cgroup to be reclaimed, and together with some previous patches stops the steady growth of the dying cgroups number on some of our hosts. Link: http://lkml.kernel.org/r/20180905230759.12236-1-guro@fb.com Fixes: 9092c71bb724 ("mm: use sc->priority for slab shrink targets") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Rik van Riel <riel@surriel.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20kernel/sys.c: remove duplicated includeYueHaibing
Link: http://lkml.kernel.org/r/20180821133424.18716-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20mm: shmem.c: Correctly annotate new inodes for lockdepJoel Fernandes (Google)
Directories and inodes don't necessarily need to be in the same lockdep class. For ex, hugetlbfs splits them out too to prevent false positives in lockdep. Annotate correctly after new inode creation. If its a directory inode, it will be put into a different class. This should fix a lockdep splat reported by syzbot: > ====================================================== > WARNING: possible circular locking dependency detected > 4.18.0-rc8-next-20180810+ #36 Not tainted > ------------------------------------------------------ > syz-executor900/4483 is trying to acquire lock: > 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at: inode_lock > include/linux/fs.h:765 [inline] > 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at: > shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602 > > but task is already holding lock: > 0000000025208078 (ashmem_mutex){+.+.}, at: ashmem_shrink_scan+0xb4/0x630 > drivers/staging/android/ashmem.c:448 > > which lock already depends on the new lock. > > -> #2 (ashmem_mutex){+.+.}: > __mutex_lock_common kernel/locking/mutex.c:925 [inline] > __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073 > mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088 > ashmem_mmap+0x55/0x520 drivers/staging/android/ashmem.c:361 > call_mmap include/linux/fs.h:1844 [inline] > mmap_region+0xf27/0x1c50 mm/mmap.c:1762 > do_mmap+0xa10/0x1220 mm/mmap.c:1535 > do_mmap_pgoff include/linux/mm.h:2298 [inline] > vm_mmap_pgoff+0x213/0x2c0 mm/util.c:357 > ksys_mmap_pgoff+0x4da/0x660 mm/mmap.c:1585 > __do_sys_mmap arch/x86/kernel/sys_x86_64.c:100 [inline] > __se_sys_mmap arch/x86/kernel/sys_x86_64.c:91 [inline] > __x64_sys_mmap+0xe9/0x1b0 arch/x86/kernel/sys_x86_64.c:91 > do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > -> #1 (&mm->mmap_sem){++++}: > __might_fault+0x155/0x1e0 mm/memory.c:4568 > _copy_to_user+0x30/0x110 lib/usercopy.c:25 > copy_to_user include/linux/uaccess.h:155 [inline] > filldir+0x1ea/0x3a0 fs/readdir.c:196 > dir_emit_dot include/linux/fs.h:3464 [inline] > dir_emit_dots include/linux/fs.h:3475 [inline] > dcache_readdir+0x13a/0x620 fs/libfs.c:193 > iterate_dir+0x48b/0x5d0 fs/readdir.c:51 > __do_sys_getdents fs/readdir.c:231 [inline] > __se_sys_getdents fs/readdir.c:212 [inline] > __x64_sys_getdents+0x29f/0x510 fs/readdir.c:212 > do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > -> #0 (&sb->s_type->i_mutex_key#9){++++}: > lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924 > down_write+0x8f/0x130 kernel/locking/rwsem.c:70 > inode_lock include/linux/fs.h:765 [inline] > shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602 > ashmem_shrink_scan+0x236/0x630 drivers/staging/android/ashmem.c:455 > ashmem_ioctl+0x3ae/0x13a0 drivers/staging/android/ashmem.c:797 > vfs_ioctl fs/ioctl.c:46 [inline] > file_ioctl fs/ioctl.c:501 [inline] > do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:685 > ksys_ioctl+0xa9/0xd0 fs/ioctl.c:702 > __do_sys_ioctl fs/ioctl.c:709 [inline] > __se_sys_ioctl fs/ioctl.c:707 [inline] > __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:707 > do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > other info that might help us debug this: > > Chain exists of: > &sb->s_type->i_mutex_key#9 --> &mm->mmap_sem --> ashmem_mutex > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(ashmem_mutex); > lock(&mm->mmap_sem); > lock(ashmem_mutex); > lock(&sb->s_type->i_mutex_key#9); > > *** DEADLOCK *** > > 1 lock held by syz-executor900/4483: > #0: 0000000025208078 (ashmem_mutex){+.+.}, at: > ashmem_shrink_scan+0xb4/0x630 drivers/staging/android/ashmem.c:448 Link: http://lkml.kernel.org/r/20180821231835.166639-1-joel@joelfernandes.org Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: NeilBrown <neilb@suse.com> Suggested-by: NeilBrown <neilb@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20fs/proc/kcore.c: fix invalid memory access in multi-page read optimizationDominique Martinet
The 'm' kcore_list item could point to kclist_head, and it is incorrect to look at m->addr / m->size in this case. There is no choice but to run through the list of entries for every address if we did not find any entry in the previous iteration Reset 'm' to NULL in that case at Omar Sandoval's suggestion. [akpm@linux-foundation.org: add comment] Link: http://lkml.kernel.org/r/1536100702-28706-1-git-send-email-asmadeus@codewreck.org Fixes: bf991c2231117 ("proc/kcore: optimize multiple page reads") Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Omar Sandoval <osandov@osandov.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: James Morse <james.morse@arm.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20mm: disable deferred struct page for 32-bit archesPasha Tatashin
Deferred struct page init is needed only on systems with large amount of physical memory to improve boot performance. 32-bit systems do not benefit from this feature. Jiri reported a problem where deferred struct pages do not work well with x86-32: [ 0.035162] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) [ 0.035725] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) [ 0.036269] Initializing CPU#0 [ 0.036513] Initializing HighMem for node 0 (00036ffe:0007ffe0) [ 0.038459] page:f6780000 is uninitialized and poisoned [ 0.038460] raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff [ 0.039509] page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page)) [ 0.040038] ------------[ cut here ]------------ [ 0.040399] kernel BUG at include/linux/page-flags.h:293! [ 0.040823] invalid opcode: 0000 [#1] SMP PTI [ 0.041166] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc1_pt_jiri #9 [ 0.041694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 [ 0.042496] EIP: free_highmem_page+0x64/0x80 [ 0.042839] Code: 13 46 d8 c1 e8 18 5d 83 e0 03 8d 04 c0 c1 e0 06 ff 80 ec 5f 44 d8 c3 8d b4 26 00 00 00 00 ba 08 65 28 d8 89 d8 e8 fc 71 02 00 <0f> 0b 8d 76 00 8d bc 27 00 00 00 00 ba d0 b1 26 d8 89 d8 e8 e4 71 [ 0.044338] EAX: 0000003c EBX: f6780000 ECX: 00000000 EDX: d856cbe8 [ 0.044868] ESI: 0007ffe0 EDI: d838df20 EBP: d838df00 ESP: d838defc [ 0.045372] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210086 [ 0.045913] CR0: 80050033 CR2: 00000000 CR3: 18556000 CR4: 00040690 [ 0.046413] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 0.046913] DR6: fffe0ff0 DR7: 00000400 [ 0.047220] Call Trace: [ 0.047419] add_highpages_with_active_regions+0xbd/0x10d [ 0.047854] set_highmem_pages_init+0x5b/0x71 [ 0.048202] mem_init+0x2b/0x1e8 [ 0.048460] start_kernel+0x1d2/0x425 [ 0.048757] i386_start_kernel+0x93/0x97 [ 0.049073] startup_32_smp+0x164/0x168 [ 0.049379] Modules linked in: [ 0.049626] ---[ end trace 337949378db0abbb ]--- We free highmem pages before their struct pages are initialized: mem_init() set_highmem_pages_init() add_highpages_with_active_regions() free_highmem_page() .. Access uninitialized struct page here.. Because there is no reason to have this feature on 32-bit systems, just disable it. Link: http://lkml.kernel.org/r/20180831150506.31246-1-pavel.tatashin@microsoft.com Fixes: 2e3ca40f03bb ("mm: relax deferred struct page requirements") Signed-off-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Reported-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20fork: report pid exhaustion correctlyKJ Tsanaktsidis
Make the clone and fork syscalls return EAGAIN when the limit on the number of pids /proc/sys/kernel/pid_max is exceeded. Currently, when the pid_max limit is exceeded, the kernel will return ENOSPC from the fork and clone syscalls. This is contrary to the documented behaviour, which explicitly calls out the pid_max case as one where EAGAIN should be returned. It also leads to really confusing error messages in userspace programs which will complain about a lack of disk space when they fail to create processes/threads for this reason. This error is being returned because alloc_pid() uses the idr api to find a new pid; when there are none available, idr_alloc_cyclic() returns -ENOSPC, and this is being propagated back to userspace. This behaviour has been broken before, and was explicitly fixed in commit 35f71bc0a09a ("fork: report pid reservation failure properly"), so I think -EAGAIN is definitely the right thing to return in this case. The current behaviour change dates from commit 95846ecf9dac ("pid: replace pid bitmap implementation with IDR AIP") and was I believe unintentional. This patch has no impact on the case where allocating a pid fails because the child reaper for the namespace is dead; that case will still return -ENOMEM. Link: http://lkml.kernel.org/r/20180903111016.46461-1-ktsanaktsidis@zendesk.com Fixes: 95846ecf9dac ("pid: replace pid bitmap implementation with IDR AIP") Signed-off-by: KJ Tsanaktsidis <ktsanaktsidis@zendesk.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Gargi Sharma <gs051095@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20MAINTAINERS: Add X86 MM entryThomas Gleixner
Dave, Andy and Peter are de facto overseing the mm parts of X86. Add an explicit maintainers entry. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Ingo Molnar <mingo@kernel.org>
2018-09-20x86/intel_rdt: Add Reinette as co-maintainer for RDTFenghua Yu
Reinette Chatre is doing great job on enabling pseudo-locking and other features in RDT. Add her as co-maintainer for RDT. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1537472228-221799-1-git-send-email-fenghua.yu@intel.com
2018-09-20Revert "ubifs: xattr: Don't operate on deleted inodes"Richard Weinberger
This reverts commit 11a6fc3dc743e22fb50f2196ec55bee5140d3c52. UBIFS wants to assert that xattr operations are only issued on files with positive link count. The said patch made this operations return -ENOENT for unlinked files such that the asserts will no longer trigger. This was wrong since xattr operations are perfectly fine on unlinked files. Instead the assertions need to be fixed/removed. Cc: <stable@vger.kernel.org> Fixes: 11a6fc3dc743 ("ubifs: xattr: Don't operate on deleted inodes") Reported-by: Koen Vandeputte <koen.vandeputte@ncentric.com> Tested-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-09-20ubifs: drop false positive assertionSascha Hauer
The following sequence triggers ubifs_assert(c, c->lst.taken_empty_lebs > 0); at the end of ubifs_remount_fs(): mount -t ubifs /dev/ubi0_0 /mnt echo 1 > /sys/kernel/debug/ubifs/ubi0_0/ro_error umount /mnt mount -t ubifs -o ro /dev/ubix_y /mnt mount -o remount,ro /mnt The resulting UBIFS assert failed in ubifs_remount_fs at 1878 (pid 161) is a false positive. In the case above c->lst.taken_empty_lebs has never been changed from its initial zero value. This will only happen when the deferred recovery is done. Fix this by doing the assertion only when recovery has been done already. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-09-20ubifs: Check for name being NULL while mountingRichard Weinberger
The requested device name can be NULL or an empty string. Check for that and refuse to continue. UBIFS has to do this manually since we cannot use mount_bdev(), which checks for this condition. Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system") Reported-by: syzbot+38bd0f7865e5c6379280@syzkaller.appspotmail.com Signed-off-by: Richard Weinberger <richard@nod.at>
2018-09-20sctp: update dst pmtu with the correct daddrXin Long
When processing pmtu update from an icmp packet, it calls .update_pmtu with sk instead of skb in sctp_transport_update_pmtu. However for sctp, the daddr in the transport might be different from inet_sock->inet_daddr or sk->sk_v6_daddr, which is used to update or create the route cache. The incorrect daddr will cause a different route cache created for the path. So before calling .update_pmtu, inet_sock->inet_daddr/sk->sk_v6_daddr should be updated with the daddr in the transport, and update it back after it's done. The issue has existed since route exceptions introduction. Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.") Reported-by: ian.periam@dialogic.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20bnxt_en: don't try to offload VLAN 'modify' actionDavide Caratti
bnxt offload code currently supports only 'push' and 'pop' operation: let .ndo_setup_tc() return -EOPNOTSUPP if VLAN 'modify' action is configured. Fixes: 2ae7408fedfe ("bnxt_en: bnxt: add TC flower filter offload support") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20net-next: mscc: remove unused ocelot_dev_gmii.hCorentin Labbe
The header ocelot_dev_gmii.h is unused since the inclusion of the driver. It is unused, lets just remove it. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLsLiran Alon
The handlers of IOCTLs in kvm_arch_vcpu_ioctl() are expected to set their return value in "r" local var and break out of switch block when they encounter some error. This is because vcpu_load() is called before the switch block which have a proper cleanup of vcpu_put() afterwards. However, KVM_{GET,SET}_NESTED_STATE IOCTLs handlers just return immediately on error without performing above mentioned cleanup. Thus, change these handlers to behave as expected. Fixes: 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE") Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Patrick Colp <patrick.colp@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20drm/amdkfd: Fix ATS capablity was not reported correctly on some APUsYong Zhao
Because CRAT_CU_FLAGS_IOMMU_PRESENT was not set in some BIOS crat, we need to workaround this. For future compatibility, we also overwrite the bit in capability according to the value of needs_iommu_device. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-20drm/amdkfd: Change the control stack MTYPE from UC to NC on GFX9Yong Zhao
CWSR fails on Raven if the control stack is MTYPE_UC, which is used for regular GART mappings. As a workaround we map it using MTYPE_NC. The MEC firmware expects the control stack at one page offset from the start of the MQD so it is part of the MQD allocation on GFXv9. AMDGPU added a memory allocation flag just for this purpose. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Yong Zhao <yong.zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-20drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7Amber Lin
A wrong register bit was examinated for checking SDMA status so it reports false failures. This typo only appears on gfx_v7. gfx_v8 checks the correct bit. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>