summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-04-08Merge branch 'l2tp-sockopt-errors'David S. Miller
Guillaume Nault says: ==================== l2tp: fix error handling of PPPoL2TP socket options Fix pppol2tp_[gs]etsockopt() so that they don't ignore errors returned by their helper functions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-08l2tp: don't mask errors in pppol2tp_getsockopt()Guillaume Nault
pppol2tp_getsockopt() doesn't take into account the error code returned by pppol2tp_tunnel_getsockopt() or pppol2tp_session_getsockopt(). If error occurs there, pppol2tp_getsockopt() continues unconditionally and reports erroneous values. Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-08l2tp: don't mask errors in pppol2tp_setsockopt()Guillaume Nault
pppol2tp_setsockopt() unconditionally overwrites the error value returned by pppol2tp_tunnel_setsockopt() or pppol2tp_session_setsockopt(), thus hiding errors from userspace. Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-08netlink: uapi: use hex numbers for NLM_F_* flagsJohannes Berg
It's rather confusing that the netlink message flags are numbered 1, 2, 4, 8, 16, 32, <unused>, 0x100. Make that more understandable by numbering the lower ones with hex constants as well. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-08nfp: don't dereference a null nn->eth_port to print a warningColin Ian King
On the case where nn->eth_port is null the warning message is printing the port by dereferencing this null pointer. Remove the deference to avoid a crash when printing the warning message. Detected by CoverityScan, CID#1426198 ("Dereference after null check") Fixes: ce22f5a2cbe3c627 ("nfp: separate high level and low level NSP headers") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-08Merge branch 'net-SO_COOKIE'David S. Miller
Chenbo Feng says: ==================== New getsockopt option to retrieve socket cookie In the current kernel socket cookie implementation, there is no simple and direct way to retrieve the socket cookie based on file descriptor. A process mat need to get it from sock fd if it want to correlate with sock_diag output or use a bpf map with new socket cookie function. If userspace wants to receive the socket cookie for a given socket fd, it must send a SOCK_DIAG_BY_FAMILY dump request and look for the 5-tuple. This is slow and can be ambiguous in the case of sockets that have the same 5-tuple (e.g., tproxy / transparent sockets, SO_REUSEPORT sockets, etc.). As shown in the example program. The xt_eBPF program is using socket cookie to record the network traffics statistics and with the socket cookie retrieved by getsockopt. The program can directly access to a specific socket data without scanning the whole bpf map. ==================== Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-08Sample program using SO_COOKIEChenbo Feng
Added a per socket traffic monitoring option to illustrate the usage of new getsockopt SO_COOKIE. The program is based on the socket traffic monitoring program using xt_eBPF and in the new option the data entry can be directly accessed using socket cookie. The cookie retrieved allow us to lookup an element in the eBPF for a specific socket. Signed-off-by: Chenbo Feng <fengc@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-08New getsockopt option to get socket cookieChenbo Feng
Introduce a new getsockopt operation to retrieve the socket cookie for a specific socket based on the socket fd. It returns a unique non-decreasing cookie for each socket. Tested: https://android-review.googlesource.com/#/c/358163/ Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Chenbo Feng <fengc@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-08Merge tag 'mlx5-updates-2017-04-16' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2017-04-16 This patchset provides some updates for the mlx5 drivers. From Majd, 1st patch, Adds ConnectX-6 and ConnectX-6 VF PCI IDs support. From Guy, 2nd patch, Adds RXFCS scatter support. 3rd patch, Small cleanup to make a function static. From Eran, 4th patch, Adds 4 zeros padding to ethtool FW version. 6th patch, Trevial code reuse cleanup From Inbar, 5th patch, Show board id in ethtool driver information From Saeed, 7th patch, Set default RX moderation parameters on driver load as a small fix for the latest fail-safe config feature. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-08staging: android: ashmem: lseek failed due to no FMODE_LSEEK.Shuxiao Zhang
vfs_llseek will check whether the file mode has FMODE_LSEEK, no return failure. But ashmem can be lseek, so add FMODE_LSEEK to ashmem file. Comment From Greg Hackmann: ashmem_llseek() passes the llseek() call through to the backing shmem file. 91360b02ab48 ("ashmem: use vfs_llseek()") changed this from directly calling the file's llseek() op into a VFS layer call. This also adds a check for the FMODE_LSEEK bit, so without that bit ashmem_llseek() now always fails with -ESPIPE. Fixes: 91360b02ab48 ("ashmem: use vfs_llseek()") Signed-off-by: Shuxiao Zhang <zhangshuxiao@xiaomi.com> Tested-by: Greg Hackmann <ghackmann@google.com> Cc: stable <stable@vger.kernel.org> # 3.18+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08i40e/i40evf: Use build_skb to build framesAlexander Duyck
This patch is meant to improve the performance of the Rx path. Specifically by using build_skb we have several distinct advantages. In the case of small frames we were previously using a copy-break approach. This means that we were allocating a page fragment to use for skb->head, and were having to copy the packet into that region. Both of those calls are now avoided since we just build the skb around the data. In the case of large frames the gains are much more significant. Specifically we were having to allocate skb->head, and copy the headers as before. However in addition we were having to parse the header using eth_get_headlen which could be quite expensive. All of this is avoided by building the frame around the data. I have seen gains as high as 30% when using VXLAN for instance due to just header pulling overhead. Finally with all this in place it also sets us up to start looking at enabling XDP. Specifically we now have a path in which the data is in the page and the frame is built around it. So if we parse it with XDP before we call build_skb we can take care of any necessary processing there. Change-ID: Id4bdd618e94473d41f892417e5d8019639e421e3 Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08i40e/i40evf: Add support for padding start of framesAlexander Duyck
This patch adds padding to the start of frames to make room for headroom for us to eventually start using build_skb. Right now we guarantee at least NET_SKB_PAD + NET_IP_ALIGN, however we allocate more space if more is available. For example on x86 the headroom should be 192 bytes. On systems that have too large of a cache line size to support storing 1.5K padding and shared info we default to using 3K buffers and reserve everything that isn't used for skb_shared_info or the data buffer for headroom. Change-ID: I33c641c9a1ea10cf7cc484c2d20985368d2d709a Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08i40e/i40evf: Add support for using order 1 pages with a 3K bufferAlexander Duyck
There are situations where adding padding to the front and back of an Rx buffer will require that we add additional padding. Specifically if NET_IP_ALIGN is non-zero, or the MTU size is larger than 7.5K we would need to use 2K buffers which leaves us with no room for the padding. To preemptively address these cases I am adding support for 3K buffers to the Rx path so that we can provide the additional padding needed in the event of NET_IP_ALIGN being non-zero or a cache line being greater than 64. Change-ID: I938bc1ba611285428df39a613cd66f98e60b55c7 Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08i40e: clean up historic deprecated flag definitionsJacob Keller
Since an early commit a few flags have no longer been used. Remove these definitions to reduce code clutter. Change-ID: I3589be4622574e747013cd4dc403e18b039f4965 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08i40e: remove I40E_FLAG_NEED_LINK_UPDATEAlice Michael
The I40E_FLAG_NEED_LINK_UPDATE was never used. Remove the flag definitions. Change-ID: If59d0c6b4af85ca27281f3183c54b055adb439a4 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08i40e: remove extraneous loop in i40e_vsi_wait_queues_disabledJacob Keller
We can simply check both Tx and Rx queues in a single loop, rather than repeating the loop twice. Change-ID: Ic06f26b0e3c2620e0e33c1a2999edda488e647ad Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08i40e: allow look-up of MAC address from Open Firmware or IDPROMJacob Keller
Look up the MAC address from the eth_get_platform_mac_address() function first before checking what the firmware provides. We already handle the case of re-writing the MAC-VLAN filter, so there is no need to add extra code for this. However, update the comment where we do this to indicate that it does impact the Open Firmware MAC address case. Change-ID: I73e59fbe0b0e7e6f3ee9f5170d0bd3a4d5faf4db Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08i40e: Simplify i40e_detect_recover_hung_queue logicAlan Brady
This patch greatly reduces the unneeded complexity in the i40e_detect_recover_hung_queue code path. The previous implementation set a 'hung bit' which would then get cleared while polling. If the detection routine was called a second time with the bit already set, we would issue a software interrupt. This patch makes it such that if interrupts are disabled and we have pending TX descriptors, we trigger a software interrupt since in, the worst case, queues are already clean and we have an extra interrupt. Additionally this patch removes the workaround for lost interrupts as calling napi_reschedule in this context can cause software interrupts to fire on the wrong CPU. Change-ID: Iae108582a3ceb6229ed1d22e4ed6e69cf97aad8d Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08i40e: Decrease the scope of rtnl lockMaciej Sosin
Previously rtnl lock was held during whole reset procedure that was stopping other PFs running their reset procedures. In the result reset was not handled properly and host reset was the only way to recover. Change-ID: I23c0771c0303caaa7bd64badbf0c667e25142954 Signed-off-by: Maciej Sosin <maciej.sosin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08i40e: Swap use of pf->flags and pf->hw_disabled_flags for ATR EvictionAlexander Duyck
This is a minor cleanup so that we are always updating pf->flags when we make a change to the private flags instead of updating a mix of either pf->flags and/or pf->hw_disabled_flags. In addition I went through and cleaned out all the spots where we were using the X722 define in regards to this flag. Lastly since we changed the logic I went through and flushed out any redundancy and cleaned up the handling of the flags in the Tx path. Change-ID: I79ff95a7272bb2533251ff11ef91e89ccb80b610 Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08i40e: update error message when trying to add invalid filtersJacob Keller
Re-word the error message displayed when adding a filter with an invalid flow type. Additionally, report a distinct error message when the IPv4 protocol is at fault. Change-ID: Iba3d85b87f8d383c97c8bdd180df34a6adf3ee67 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08i40e: only register client on iWarp-capable devicesMitch Williams
The client interface is only intended for use on devices that support iWarp. Only register with the client if this is the case. This fixes a panic when loading i40iw on X710 devices. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Reported-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc fixes from David Miller: "Several fixes here, mostly having to due with either build errors or memory corruptions depending upon whether you have THP enabled or not" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc: remove unused wp_works_ok macro sparc32: Export vac_cache_size to fix build error sparc64: Fix memory corruption when THP is enabled sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write() arch/sparc: Avoid DCTI Couples sparc64: kern_addr_valid regression sparc64: Add support for 2G hugepages sparc64: Fix size check in huge_pte_alloc
2017-04-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "ARM: - Fix a problem with GICv3 userspace save/restore - Clarify GICv2 userspace save/restore ABI - Be more careful in clearing GIC LRs - Add missing synchronization primitive to our MMU handling code PPC: - Check for a NULL return from kzalloc s390: - Prevent translation exception errors on valid page tables for the instruction-exection-protection support x86: - Fix Page-Modification Logging when running a nested guest" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: PPC: Book3S HV: Check for kmalloc errors in ioctl KVM: nVMX: initialize PML fields in vmcs02 KVM: nVMX: do not leak PML full vmexit to L1 KVM: arm/arm64: vgic: Fix GICC_PMR uaccess on GICv3 and clarify ABI KVM: arm64: Ensure LRs are clear when they should be kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd KVM: s390: remove change-recording override support arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm
2017-04-08Merge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds
Pull audit cleanup from Paul Moore: "A week later than I had hoped, but as promised, here is the audit uninline-fix we talked about during the last audit pull request. The patch is slightly different than what we originally discussed as it made more sense to keep the audit_signal_info() function in auditsc.c rather than move it and bunch of other related variables/definitions into audit.c/audit.h. At some point in the future I need to look at how the audit code is organized across kernel/audit*, I suspect we could do things a bit better, but it doesn't seem like a -rc release is a good place for that ;) Regardless, this patch passes our tests without problem and looks good for v4.11" * 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit: audit: move audit_signal_info() into kernel/auditsc.c
2017-04-08Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "10 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: move pcp and lru-pcp draining into single wq mailmap: update Yakir Yang email address mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff() dax: fix radix tree insertion race mm, thp: fix setting of defer+madvise thp defrag mode ptrace: fix PTRACE_LISTEN race corrupting task->state vmlinux.lds: add missing VMLINUX_SYMBOL macros mm/page_alloc.c: fix print order in show_free_areas() userfaultfd: report actual registered features in fdinfo mm: fix page_vma_mapped_walk() for ksm pages
2017-04-08mm: move pcp and lru-pcp draining into single wqMichal Hocko
We currently have 2 specific WQ_RECLAIM workqueues in the mm code. vmstat_wq for updating pcp stats and lru_add_drain_wq dedicated to drain per cpu lru caches. This seems more than necessary because both can run on a single WQ. Both do not block on locks requiring a memory allocation nor perform any allocations themselves. We will save one rescuer thread this way. On the other hand drain_all_pages() queues work on the system wq which doesn't have rescuer and so this depend on memory allocation (when all workers are stuck allocating and new ones cannot be created). Initially we thought this would be more of a theoretical problem but Hugh Dickins has reported: : 4.11-rc has been giving me hangs after hours of swapping load. At : first they looked like memory leaks ("fork: Cannot allocate memory"); : but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh" : before looking at /proc/meminfo one time, and the stat_refresh stuck : in D state, waiting for completion of flush_work like many kworkers. : kthreadd waiting for completion of flush_work in drain_all_pages(). This worker should be using WQ_RECLAIM as well in order to guarantee a forward progress. We can reuse the same one as for lru draining and vmstat. Link: http://lkml.kernel.org/r/20170307131751.24936-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Tested-by: Yang Li <pku.leo@gmail.com> Tested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08mailmap: update Yakir Yang email addressJeffy Chen
Set current email address to replace previous employers email addresses. Link: http://lkml.kernel.org/r/1491450722-6633-1-git-send-email-jeffy.chen@rock-chips.com Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()David Rientjes
We got need_resched() warnings in swap_cgroup_swapoff() because swap_cgroup_ctrl[type].length is particularly large. Reschedule when needed. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704061315270.80559@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08dax: fix radix tree insertion raceRoss Zwisler
While running generic/340 in my test setup I hit the following race. It can happen with kernels that support FS DAX PMDs, so v4.10 thru v4.11-rc5. Thread 1 Thread 2 -------- -------- dax_iomap_pmd_fault() grab_mapping_entry() spin_lock_irq() get_unlocked_mapping_entry() 'entry' is NULL, can't call lock_slot() spin_unlock_irq() radix_tree_preload() dax_iomap_pmd_fault() grab_mapping_entry() spin_lock_irq() get_unlocked_mapping_entry() ... lock_slot() spin_unlock_irq() dax_pmd_insert_mapping() <inserts a PMD mapping> spin_lock_irq() __radix_tree_insert() fails with -EEXIST <fall back to 4k fault, and die horribly when inserting a 4k entry where a PMD exists> The issue is that we have to drop mapping->tree_lock while calling radix_tree_preload(), but since we didn't have a radix tree entry to lock (unlike in the pmd_downgrade case) we have no protection against Thread 2 coming along and inserting a PMD at the same index. For 4k entries we handled this with a special-case response to -EEXIST coming from the __radix_tree_insert(), but this doesn't save us for PMDs because the -EEXIST case can also mean that we collided with a 4k entry in the radix tree at a different index, but one that is covered by our PMD range. So, correctly handle both the 4k and 2M collision cases by explicitly re-checking the radix tree for an entry at our index once we reacquire mapping->tree_lock. This patch has made it through a clean xfstests run with the current v4.11-rc5 based linux/master, and it also ran generic/340 500 times in a loop. It used to fail within the first 10 iterations. Link: http://lkml.kernel.org/r/20170406212944.2866-1-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: <stable@vger.kernel.org> [4.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08mm, thp: fix setting of defer+madvise thp defrag modeDavid Rientjes
Setting thp defrag mode of "defer+madvise" actually sets "defer" in the kernel due to the name similarity and the out-of-order way the string is checked in defrag_store(). Check the string in the correct order so that TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG is set appropriately for "defer+madvise". Fixes: 21440d7eb904 ("mm, thp: add new defer+madvise defrag option") Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704051814420.137626@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08ptrace: fix PTRACE_LISTEN race corrupting task->statebsegall@google.com
In PT_SEIZED + LISTEN mode STOP/CONT signals cause a wakeup against __TASK_TRACED. If this races with the ptrace_unfreeze_traced at the end of a PTRACE_LISTEN, this can wake the task /after/ the check against __TASK_TRACED, but before the reset of state to TASK_TRACED. This causes it to instead clobber TASK_WAKING, allowing a subsequent wakeup against TRACED while the task is still on the rq wake_list, corrupting it. Oleg said: "The kernel can crash or this can lead to other hard-to-debug problems. In short, "task->state = TASK_TRACED" in ptrace_unfreeze_traced() assumes that nobody else can wake it up, but PTRACE_LISTEN breaks the contract. Obviusly it is very wrong to manipulate task->state if this task is already running, or WAKING, or it sleeps again" [akpm@linux-foundation.org: coding-style fixes] Fixes: 9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL") Link: http://lkml.kernel.org/r/xm26y3vfhmkp.fsf_-_@bsegall-linux.mtv.corp.google.com Signed-off-by: Ben Segall <bsegall@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08vmlinux.lds: add missing VMLINUX_SYMBOL macrosJessica Yu
When __{start,end}_ro_after_init is referenced from C code, we run into the following build errors on blackfin: kernel/extable.c:169: undefined reference to `__start_ro_after_init' kernel/extable.c:169: undefined reference to `__end_ro_after_init' The build error is due to the fact that blackfin is one of the few arches that prepends an underscore '_' to all symbols defined in C. Fix this by wrapping __{start,end}_ro_after_init in vmlinux.lds.h with VMLINUX_SYMBOL(), which adds the necessary prefix for arches that have HAVE_UNDERSCORE_SYMBOL_PREFIX. Link: http://lkml.kernel.org/r/1491259387-15869-1-git-send-email-jeyu@redhat.com Signed-off-by: Jessica Yu <jeyu@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Eddie Kovsky <ewk@edkovsky.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08mm/page_alloc.c: fix print order in show_free_areas()Alexander Polakov
Fixes: 11fb998986a72a ("mm: move most file-based accounting to the node") Link: http://lkml.kernel.org/r/1490377730.30219.2.camel@beget.ru Signed-off-by: Alexander Polyakov <apolyakov@beget.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [4.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08userfaultfd: report actual registered features in fdinfoMike Rapoport
fdinfo for userfault file descriptor reports UFFD_API_FEATURES. Up until recently, the UFFD_API_FEATURES was defined as 0, therefore corresponding field in fdinfo always contained zero. Now, with introduction of several additional features, UFFD_API_FEATURES is not longer 0 and it seems better to report actual features requested for the userfaultfd object described by the fdinfo. First, the applications that were using userfault will still see zero at the features field in fdinfo. Next, reporting actual features rather than available features, gives clear indication of what userfault features are used by an application. Link: http://lkml.kernel.org/r/1491140181-22121-1-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08mm: fix page_vma_mapped_walk() for ksm pagesHugh Dickins
Doug Smythies reports oops with KSM in this backtrace, I've been seeing the same: page_vma_mapped_walk+0xe6/0x5b0 page_referenced_one+0x91/0x1a0 rmap_walk_ksm+0x100/0x190 rmap_walk+0x4f/0x60 page_referenced+0x149/0x170 shrink_active_list+0x1c2/0x430 shrink_node_memcg+0x67a/0x7a0 shrink_node+0xe1/0x320 kswapd+0x34b/0x720 Just as observed in commit 4b0ece6fa016 ("mm: migrate: fix remove_migration_pte() for ksm pages"), you cannot use page->index calculations on ksm pages. page_vma_mapped_walk() is relying on __vma_address(), where a ksm page can lead it off the end of the page table, and into whatever nonsense is in the next page, ending as an oops inside check_pte()'s pte_page(). KSM tells page_vma_mapped_walk() exactly where to look for the page, it does not need any page->index calculation: and that's so also for all the normal and file and anon pages - just not for THPs and their subpages. Get out early in most cases: instead of a PageKsm test, move down the earlier not-THP-page test, as suggested by Kirill. I'm also slightly worried that this loop can stray into other vmas, so added a vm_end test to prevent surprises; though I have not imagined anything worse than a very contrived case, in which a page mlocked in the next vma might be reclaimed because it is not mlocked in this vma. Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()") Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1704031104400.1118@eggly.anvils Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Doug Smythies <dsmythies@telus.net> Tested-by: Doug Smythies <dsmythies@telus.net> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-07net: thunderx: Enable TSO and checksum offloads for ipv6Thanneeru Srinivasulu
Adding support for TSO and checksum hardware offloads for ipv6. Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@cavium.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-07Merge branch 'dsa-mediatek-MT7530'David S. Miller
Sean Wang says: ==================== net-next: dsa: add Mediatek MT7530 support MT7530 is a 7-ports Gigabit Ethernet Switch that could be found on Mediatek router platforms such as MT7623A or MT7623N which includes 7-port Gigabit Ethernet MAC and 5-port Gigabit Ethernet PHY. Among these ports, The port from 0 to 4 are the user ports connecting with the remote devices while the port 5 and 6 are the CPU ports connecting into Mediatek Ethernet GMAC. The patch series integrated Mediatek MT7530 into DSA support which includes the most of the essential callbacks such as tag insertion for port distinguishing, port control, bridge offloading, STP setup and ethtool operations to allow DSA to model each user port into independently standalone netdevice as the other DSA driver had done. Changes since v1: - rebased into 4.11-rc1 - refined binding document including below five items - changed the type of mediatek,mcm into bool - used reset controller binding for MCM reset and removed "mediatek,ethsys" property from binding - reused CPU port's ethernet Phandle instead of creating new one and removed "mediatek,ethernet" property from binding - aligned naming for GPIO reset with dsa/marvell.txt - added phy-mode as required property child nodes within ports container - handled gpio reset with devm_gpiod_* API - refined comment words - removed condition for CDM setting since the setup looks both fine for all cases - allowed of_find_net_device_by_node() working with pointing the device node into real netdev instance - fixed Kbuild warnings Changes since v2: - reuse readx_poll_timeout() to poll - add proper macro instead of hard coding - treat inconsistent cpu port as warning - remove the usage for regmap-debugfs - show error message when invalid id is found - put the logic for the setup of trgmii into adjut_link() - refine and reuse logic between port_[disable,enable], and default port setup - correct typo Changes since v3: - used struct as the parameter for readx_poll_timeout() and kill extra lpriv defined - moved around function to get out of an additional declaration - fixed kbuild errors caused by missing proper include in the latest tree ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-07net-next: dsa: add dsa support for Mediatek MT7530 switchSean Wang
MT7530 is a 7-ports Gigabit Ethernet Switch that could be found on Mediatek router platforms such as MT7623A or MT7623N platform which includes 7-port Gigabit Ethernet MAC and 5-port Gigabit Ethernet PHY. Among these ports, The port from 0 to 4 are the user ports connecting with the remote devices while the port 5 and 6 are the CPU ports connecting into Mediatek Ethernet GMAC. For port 6, it can communicate with the CPU via Mediatek Ethernet GMAC through either the TRGMII or RGMII which could be controlled by phy-mode in the dt-bindings to specify which mode is preferred to use. And for port 5, only RGMII can be specified. However, currently, only port 6 is being supported in this DSA driver. The driver is made with the reference to qca8k and other existing DSA driver. The most of the essential callbacks of the DSA are already support in the driver, including tag insert for user port distinguishing, port control, bridge offloading, STP setup and ethtool operation to allow DSA to model each user port into a standalone netdevice as the other DSA driver had done. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Landen Chao <Landen.Chao@mediatek.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-07net-next: ethernet: mediatek: add device_node of GMAC pointing into the ↵Sean Wang
netdev instance the patch adds the setup of the corresponding device node of GMAC into the netdev instance which could allow other modules such as DSA to find the instance through the node in dt-bindings using of_find_net_device_by_node() call. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-07net-next: ethernet: mediatek: add CDM able to recognize the tag for DSASean Wang
The patch adds the setup for allowing CDM can recognize these packets with carrying port-distinguishing tag. Otherwise, these tagging packets will be handled incorrectly by CDM. The setup is also working out for general untag packets as well. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Landen Chao <Landen.Chao@mediatek.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-07net-next: dsa: add Mediatek tag RX/TX handlerSean Wang
Add the support for the 4-bytes tag for DSA port distinguishing inserted allowing receiving and transmitting the packet via the particular port. The tag is being added after the source MAC address in the ethernet header. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Landen Chao <Landen.Chao@mediatek.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-07dt-bindings: net: dsa: add Mediatek MT7530 bindingSean Wang
Add device-tree binding for Mediatek MT7530 switch. Cc: devicetree@vger.kernel.org Signed-off-by: Sean Wang <sean.wang@mediatek.com> Acked-by: Rob Herring <robh@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-07orangefs: move features validation to fix filesystem hangMartin Brandenburg
Without this fix (and another to the userspace component itself described later), the kernel will be unable to process any OrangeFS requests after the userspace component is restarted (due to a crash or at the administrator's behest). The bug here is that inside orangefs_remount, the orangefs_request_mutex is locked. When the userspace component restarts while the filesystem is mounted, it sends a ORANGEFS_DEV_REMOUNT_ALL ioctl to the device, which causes the kernel to send it a few requests aimed at synchronizing the state between the two. While this is happening the orangefs_request_mutex is locked to prevent any other requests going through. This is only half of the bugfix. The other half is in the userspace component which outright ignores(!) requests made before it considers the filesystem remounted, which is after the ioctl returns. Of course the ioctl doesn't return until after the userspace component responds to the request it ignores. The userspace component has been changed to allow ORANGEFS_VFS_OP_FEATURES regardless of the mount status. Mike Marshall says: "I've tested this patch against the fixed userspace part. This patch is real important, I hope it can make it into 4.11... Here's what happens when the userspace daemon is restarted, without the patch: ============================================= [ INFO: possible recursive locking detected ] [ 4.10.0-00007-ge98bdb3 #1 Not tainted ] --------------------------------------------- pvfs2-client-co/29032 is trying to acquire lock: (orangefs_request_mutex){+.+.+.}, at: service_operation+0x3c7/0x7b0 [orangefs] but task is already holding lock: (orangefs_request_mutex){+.+.+.}, at: dispatch_ioctl_command+0x1bf/0x330 [orangefs] CPU: 0 PID: 29032 Comm: pvfs2-client-co Not tainted 4.10.0-00007-ge98bdb3 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014 Call Trace: __lock_acquire+0x7eb/0x1290 lock_acquire+0xe8/0x1d0 mutex_lock_killable_nested+0x6f/0x6e0 service_operation+0x3c7/0x7b0 [orangefs] orangefs_remount+0xea/0x150 [orangefs] dispatch_ioctl_command+0x227/0x330 [orangefs] orangefs_devreq_ioctl+0x29/0x70 [orangefs] do_vfs_ioctl+0xa3/0x6e0 SyS_ioctl+0x79/0x90" Signed-off-by: Martin Brandenburg <martin@omnibond.com> Acked-by: Mike Marshall <hubcap@omnibond.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-07Merge tag 'pci-v4.11-fixes-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - fix ThunderX legacy firmware resources - fix ARTPEC-6 and DesignWare platform driver NULL pointer dereferences - fix HiSilicon link error * tag 'pci-v4.11-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: dwc: Fix dw_pcie_ops NULL pointer dereference PCI: dwc: Select PCI_HOST_COMMON for hisi PCI: thunder-pem: Fix legacy firmware PEM-specific resources
2017-04-07tcp: restrict F-RTO to work-around broken middle-boxesYuchung Cheng
The recent extension of F-RTO 89fe18e44 ("tcp: extend F-RTO to catch more spurious timeouts") interacts badly with certain broken middle-boxes. These broken boxes modify and falsely raise the receive window on the ACKs. During a timeout induced recovery, F-RTO would send new data packets to probe if the timeout is false or not. Since the receive window is falsely raised, the receiver would silently drop these F-RTO packets. The recovery would take N (exponentially backoff) timeouts to repair N packet losses. A TCP performance killer. Due to this unfortunate situation, this patch removes this extension to revert F-RTO back to the RFC specification. Fixes: 89fe18e44f7e ("tcp: extend F-RTO to catch more spurious timeouts") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-07blk-mq: Restart a single queue if tag sets are sharedBart Van Assche
To improve scalability, if hardware queues are shared, restart a single hardware queue in round-robin fashion. Rename blk_mq_sched_restart_queues() to reflect the new semantics. Remove blk_mq_sched_mark_restart_queue() because this function has no callers. Remove flag QUEUE_FLAG_RESTART because this patch removes the code that uses this flag. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07dm rq: Avoid that request processing stalls sporadicallyBart Van Assche
While running the srp-test software I noticed that request processing stalls sporadically at the beginning of a test, namely when mkfs is run against a dm-mpath device. Every time when that happened the following command was sufficient to resume request processing: echo run >/sys/kernel/debug/block/dm-0/state This patch avoids that such request processing stalls occur. The test I ran is as follows: while srp-test/run_tests -d -r 30 -t 02-mq; do :; done Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07scsi: Avoid that SCSI queues get stuckBart Van Assche
If a .queue_rq() function returns BLK_MQ_RQ_QUEUE_BUSY then the block driver that implements that function is responsible for rerunning the hardware queue once requests can be queued again successfully. commit 52d7f1b5c2f3 ("blk-mq: Avoid that requeueing starts stopped queues") removed the blk_mq_stop_hw_queue() call from scsi_queue_rq() for the BLK_MQ_RQ_QUEUE_BUSY case. Hence change all calls to functions that are intended to rerun a busy queue such that these examine all hardware queues instead of only stopped queues. Since no other functions than scsi_internal_device_block() and scsi_internal_device_unblock() should ever stop or restart a SCSI queue, change the blk_mq_delay_queue() call into a blk_mq_delay_run_hw_queue() call. Fixes: commit 52d7f1b5c2f3 ("blk-mq: Avoid that requeueing starts stopped queues") Fixes: commit 7e79dadce222 ("blk-mq: stop hardware queue in blk_mq_delay_queue()") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Long Li <longli@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07blk-mq: Introduce blk_mq_delay_run_hw_queue()Bart Van Assche
Introduce a function that runs a hardware queue unconditionally after a delay. Note: there is already a function that stops and restarts a hardware queue after a delay, namely blk_mq_delay_queue(). This function will be used in the next patch in this series. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Long Li <longli@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Jens Axboe <axboe@fb.com>